Table of Contents

Preface

Notations and Abbreviations

A Few Functions of Python®

1 Useful Maths

1.1. Basic concepts on probability
1.2. Conditional expectation
1.3. Projection theorem
1.4. Gaussianity
1.5. Random variable transformation
1.6. Fundamental theorems of statistics
1.7. A few probability distributions

2 Statistical Inferences

2.1. First step: visualizing data
2.2. Reduction of dataset dimensionality
2.3. Some vocabulary
2.4. Statistical model
2.5. Hypothesis testing
2.6. Statistical estimation

3 Inferences on HMM

3.1. Hidden Markov models (HMM)
3.2. Inferences on HMM
3.3. Filtering: general case
3.4. Gaussian linear case: Kalman algorithm
3.5. Discrete finite Markov case

4 Monte-Carlo Methods

4.1. Fundamental theorems
4.2. Stating the problem
4.3. Generating random variables
4.4. Variance reduction

5 Hints and Solutions

5.1. Useful maths
5.2. Statistical inferences
5.3. Inferences on HMM
5.4. Monte-Carlo methods

Bibliography

Index

List of Tables

2 Statistical Inferences

Table 2.1. Growth in mm under the action of 2 fertilizers
Table 2.2. Cavity values given in thousands. The fluoride level is given in ppm (parts per million)
Table 2.3. H height in cm, W weight in kg
Table 2.4. Temperature in degrees Fahrenheit and state of the O-ring: 1 signifies the existence of a fault and 0 the absence of faults. These data are related to the Space Shuttle Challenger disaster that occurred on January 28, 1986

4 Monte-Carlo Methods

Table 4.1. The initial probabilities ℙ{X₀ = i} and the transition probabilities ℙ{X_{n+1} = j | X_n = i} of a three-state homogeneous Markov chain. We verify that, for all i, ∑_j ℙ{X_{n+1} = j | X_n = i} = 1

List of Illustrations

1 Useful Maths

Figure 1.1. Orthogonality principle: the point X₀ of the subspace that is closest to X is such that X − X₀ is orthogonal to that subspace
Figure 1.2. The conditional expectation 𝔼{X|Y} is the orthogonal projection of X onto the set of measurable functions of Y. The expectation 𝔼{X} is the orthogonal projection of X onto the set of constant functions. Clearly, the set of constant functions is included in the set of measurable functions of Y

2 Statistical Inferences

Figure 2.1. Scatter plots of all the pairs of six variables. The couples (0, 5) and (3, 4) present a clear linear trend
Figure 2.2. Atmospheric CO₂ concentration from continuous air samples at Mauna Loa Observatory, Hawaii. The recorded data are weekly averages in parts per million by volume (ppmv), observed from March 1958 to December 2001. On the right-hand side, a zoom on 104 weeks shows a clear annual cycle. Some data are missing
Figure 2.3. Top: the histogram consists of 10 bins of equal size. The ordinates correspond to the number of values in different disjoint intervals. Bottom: the boxplot consists of a central box containing 50% of the values. The vertical line indicates the empirical median. The plus symbols indicate outliers.
Figure 2.4. Q-Q plot of the math grades of 303 students. The theoretical quantile values are derived from a Gaussian distribution. The graph indicates that the data can be considered as Gaussian
Figure 2.5. The two curves represent the respective probability densities of the test function Φ(X) under the hypothesis H₀ (m₀ = 0) and hypothesis H₁ (m₁ = 2). For a threshold value η, the light gray area represents the significance level or probability of false alarm, and the dark gray area represents the power or probability of detection
Figure 2.6. ROC curve. The statistical model consists of n i.i.d. Gaussian observations 𝒩(m, 1), where m ∈ {0, 1}. The hypothesis H₀ = {0} is tested by the likelihood ratio [2.18]. The higher the value of n, the closer the curve to the ideal point, with coordinates (0, 1). The significance level α is interpreted as the probability of a false alarm. The power β is interpreted as the probability of detection
Figure 2.7. Diagram showing the calculation of the GLRT
Figure 2.8. Flow, expressed in m³/s averaged over one year, of the Nile as measured at Aswan from 1871 to 1970. The curve shows an apparent change around 1899
Figure 2.9. Percentage of dental cavities observed as a function of fluoride levels in drinking water in ppm (parts per million)
Figure 2.10. Multimodal distribution
Figure 2.11. Step equal to 1/N for the five values of the series, ranked in increasing order
Figure 2.12. Cross-validation: one block is selected as the test base (for example, block no. 4 in this illustration) and the remaining blocks are used as the learning base; the roles are then switched so that each block serves in turn as the test base

3 Inferences on HMM

Figure 3.1. Directed acyclic graph (DAG) associated with an HMM
Figure 3.2. Lattice with 3 states for a sequence of length 6. At step n, we keep the best predecessors (solid arrows) and compute the three remaining path metrics

4 Monte-Carlo Methods

Figure 4.1. Typical form of a cumulative distribution function. We draw a value u uniformly between 0 and 1, and then deduce the realization t = F⁻¹(u)
Figure 4.2. The greater the value of M, the more samples need to be drawn before a value is accepted. The curve denoted MUq(x) in the figure corresponds to a draw of the r.v. U uniformly between 0 and 1. For this draw, the only accepted values of Y are those for which p(x) ≥ MUq(x)

5 Hints and Solutions

Figure 5.1. Prediction errors as a function of the assumed order of the model: ‘x’ for the learning base; ‘o’ for the test base. The observation model is of the form y = Zβ + σϵ, where Z includes p = 10 column vectors. As the number of predictors increases beyond the true value, we “learn” noise; this is known as overtraining
Figure 5.2. Results for the study of the filtering

Preface

This book addresses the fundamental bases of statistical inference. We shall presume throughout that readers have a good working knowledge of the Python® language and of the basic elements of digital signal processing.

The most recent version is Python® 3.x, but many people are still working with Python® 2.x versions. All the code provided in this book works with both versions. The official home page of the Python® Programming Language is https://www.python.org/. Spyder® is a useful open-source integrated development environment (IDE) for programming in the Python® language. We suggest using the Anaconda Python distribution, which includes both Python® and Spyder®. The Anaconda Python distribution is located at https://www.continuum.io/downloads/.

Most of the examples given in this book use the modules NumPy, which provides powerful numerical array objects; SciPy, with high-level data processing routines such as optimization, regression and interpolation; and Matplotlib, for plotting curves, histograms, box-and-whisker plots, etc. A list of useful functions is given in the section “A Few Functions of Python®”.

A brief outline of the contents of the book is given below.

Useful maths

In the first chapter, a short review of probability theory is presented, focusing on conditional probability, the projection theorem and random variable transformation. A number of statistical results are also presented, including the law of large numbers and the central limit theorem.
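As a quick numerical illustration of these two results, here is a minimal Python 3 sketch (not taken from Chapter 1). It uses uniform draws on (0, 1), whose mean is 1/2 and variance 1/12. The sample sizes and the seed are arbitrary choices.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility

# Law of large numbers: the empirical mean of n i.i.d. uniform draws
# on (0, 1) gets closer to the true mean 1/2 as n grows.
for n in (10, 1000, 100000):
    x = np.random.rand(n)
    print(n, x.mean())

# Central limit theorem: for large n, sqrt(n) * (mean - 1/2) / sigma is
# approximately N(0, 1), where sigma = sqrt(1/12) is the standard
# deviation of a single uniform draw.
n, n_trials = 200, 10000
means = np.random.rand(n_trials, n).mean(axis=1)
z = np.sqrt(n) * (means - 0.5) / np.sqrt(1.0 / 12)
print("empirical mean and std of z:", z.mean(), z.std())
```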

Statistical inferences

The second chapter is devoted to statistical inference. Statistical inference consists of deducing some features of interest from a set of observations with a given level of confidence. It encompasses a variety of techniques. In this chapter, we mainly focus on hypothesis testing, regression analysis, parameter estimation and the determination of confidence intervals. Key notions include the Cramér–Rao bound, the Neyman–Pearson theorem, likelihood ratio tests, the least squares method for linear models, the method of moments and the maximum likelihood approach. The least squares method is a standard approach in regression analysis, and it is discussed in detail.
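To make the least squares method concrete, the sketch below fits a linear model y = Zβ + noise by ordinary least squares with NumPy. The design matrix Z, the true coefficients and the noise level are made-up values chosen for illustration. They are not data from the book.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed

# Hypothetical data: N observations, an intercept column and one predictor t
N = 200
t = np.linspace(0, 1, N)
Z = np.column_stack((np.ones(N), t))
beta_true = np.array([1.0, 3.0])
y = Z.dot(beta_true) + 0.1 * np.random.randn(N)

# OLS estimate beta_hat = (Z^T Z)^{-1} Z^T y; lstsq solves the least squares
# problem without forming the inverse explicitly, which is numerically safer
beta_hat, residuals, rank, sv = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)  # close to beta_true
```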

Inferences on HMM

In many problems, the variables of interest are only partially observed. Hidden Markov models (HMM) are well suited to this kind of problem. Their applications cover a wide range of fields, such as speech processing, handwriting recognition, DNA analysis, and monitoring and control. HMM inference raises several problems. The key algorithms include the well-known Kalman filter, the Baum–Welch algorithm and the Viterbi algorithm, to list only the most famous ones.

Monte-Carlo methods

Monte-Carlo methods refer to a broad class of algorithms that serve to compute quantities of interest. Typically, these quantities are integrals, i.e. the expectations of a given function. The key idea is to use random sequences instead of deterministic sequences to achieve this result. The main issues are, first, the choice of the most appropriate random mechanism and, second, how to generate such a mechanism. In Chapter 4, the acceptance–rejection method, the Metropolis–Hastings algorithm, the Gibbs sampler, the importance sampling method, etc., are presented.
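The sketch below illustrates the key idea on the simplest possible case. It assumes that the quantity of interest is I = ∫₀¹ eˣ dx = e − 1, which equals the expectation 𝔼{e^U} for U uniform on (0, 1). This is plain Monte-Carlo integration, not one of the more elaborate methods of Chapter 4.

```python
import numpy as np

np.random.seed(2)  # arbitrary seed

# I = integral of exp(x) over (0, 1) = E{exp(U)}, U uniform on (0, 1):
# the Monte-Carlo estimate is the empirical mean of exp(U_1), ..., exp(U_N)
N = 100000
u = np.random.rand(N)
g = np.exp(u)
I_hat = g.mean()

# The standard error decreases as 1/sqrt(N)
std_err = g.std() / np.sqrt(N)
print(I_hat, np.e - 1, std_err)
```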

Maurice CHARBIT

October 2016

A Few Functions of Python®

To get function documentation, use .__doc__, e.g. print(range.__doc__), or help, e.g. help(zeros) or help('def'), or, in IPython consoles such as the one in Spyder, ?, e.g. range.count?
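The .__doc__ option can be used from a script, as in this minimal sketch. help() and ? are better suited to interactive use, so they appear only as comments.

```python
import numpy as np

# The docstring of an object is stored in its __doc__ attribute
print(range.__doc__)

# Print only the first lines of a long docstring, here that of numpy.zeros
print("\n".join(np.zeros.__doc__.splitlines()[:5]))

# Interactive alternatives (not run here):
#   help(np.zeros)   # formatted help page
#   help('def')      # help on a keyword, given as a string
#   range.count?     # IPython / Spyder console only
```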

– def: introduces a function definition
– if, else, elif: an if statement consists of a Boolean expression followed by one or more statements
– for: executes a sequence of statements multiple times
– while: repeats a statement or group of statements while a given condition is true
– 1j or complex: creates a complex value, e.g. a=1.3+1j*0.2 or a=complex(1.3,0.2)
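The short Python 3 script below uses each of these keywords once. The function name classify and the values are illustrative only.

```python
def classify(x):
    """Return a label describing the sign of x (def, if/elif/else)."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

# for: run the same statements for each item of a sequence
labels = []
for v in (-2, 0, 3):
    labels.append(classify(v))
print(labels)

# while: repeat as long as a condition holds (halve n until it drops below 1)
n, steps = 10.0, 0
while n >= 1:
    n = n / 2
    steps += 1
print(steps)  # 4

# complex numbers: two equivalent constructions
a = 1.3 + 1j * 0.2
b = complex(1.3, 0.2)
print(a == b, a.real, a.imag)
```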

Methods:

– type A=array([0,4,12,3]), then type A. followed by Tab: a list of methods appears, e.g. A.argmax, which returns the index of the maximum. For help, type e.g. A.dot?.
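These methods can also be called directly in a script. The sketch below (Python 3) uses the same array A.

```python
from numpy import array

A = array([0, 4, 12, 3])
print(A.argmax())                  # index of the maximum: 2
print(A.max(), A.sum(), A.mean())  # 12, 19 and 4.75
print(A.dot(A))                    # inner product of A with itself: 169
```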

Functions:

– int: converts a number or string to an integer
– len: returns the number of items in a container
– range: returns an object that produces a sequence of integers
– type: returns the object type
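A few calls illustrating these built-in functions. The comments describe Python 3 behavior; in Python 2, range returns a list.

```python
print(int("42"), int(3.9))          # 42 3 (int truncates toward zero)
print(len([1, 2, 3]), len("abc"))   # 3 3
r = range(2, 10, 3)                 # start 2, stop 10 (excluded), step 3
print(list(r))                      # [2, 5, 8]
print(type(r), type(1.5))           # <class 'range'> <class 'float'>
```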

From numpy:

– abs: returns the absolute value of the argument
– arange: returns evenly spaced values within a given interval
– argwhere: finds the indices of array elements that are non-zero, grouped by element
– array: creates an array
– cos, sin, tan: respectively calculate the cosine, the sine and the tangent
– cosh: calculates the hyperbolic cosine
– cumsum: calculates the cumulative sum of array elements
– diff: calculates the n-th discrete difference along a given axis
– dot: dot product of two arrays
– exp, log: respectively calculate the exponential, the logarithm
– fft: module whose function fft.fft computes the discrete Fourier transform with the FFT algorithm
– isinf: tests element-wise for positive or negative infinity
– isnan: tests element-wise for nan
– linspace: returns evenly spaced numbers over a specified interval
– loadtxt: loads data from a text file
– matrix: returns a matrix from an array-like object, or from a string of data
– max: returns the maximum of an array or maximum along an axis
– mean, std: respectively return the arithmetic mean and the standard deviation
– min: returns the minimum of an array or minimum along an axis
– nanmean, nanstd: respectively return the arithmetic mean and the standard deviation along a given axis while ignoring NaNs
– nansum: sum of array elements over a given axis, while ignoring NaNs
– ones: returns a new array of given shape and type, filled with ones
– pi: 3.141592653589793
– setdiff1d: returns the sorted, unique values of one array that are not in the other
– size: returns the number of elements along a given axis
– sort: returns a sorted copy of an array
– sqrt: computes the positive square-root of an array
– sum: sum of array elements over a given axis
– zeros: returns a new array of given shape and type, filled with zeroes
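The following sketch applies several of these NumPy functions to small arrays. The values are arbitrary. The comments give the expected values rather than the exact printed layout, which may vary between NumPy versions.

```python
import numpy as np

x = np.arange(0, 10, 2)          # 0, 2, 4, 6, 8 (stop value 10 excluded)
print(np.cumsum(x))              # running sums: 0, 2, 6, 12, 20
print(np.diff(x))                # first differences: 2, 2, 2, 2
print(np.linspace(0, 1, 5))      # 5 points, both ends included: 0, 0.25, ..., 1
print(np.argwhere(x > 4))        # indices 3 and 4, one row per element
print(np.setdiff1d(x, [2, 6]))   # values of x not in [2, 6]: 0, 4, 8

# NaN-aware versions ignore missing values
y = np.array([1.0, np.nan, 3.0])
print(np.mean(y), np.nanmean(y))  # nan, then 2.0
print(np.nansum(y), np.isnan(y))  # 4.0, then [False, True, False]
```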

From numpy.linalg:

– eig: computes the eigenvalues and right eigenvectors of a square array
– pinv: computes the (Moore–Penrose) pseudo-inverse of a matrix
– inv: computes the (multiplicative) inverse of a matrix
– svd: computes Singular Value Decomposition
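These decompositions can be checked numerically. The sketch below uses an arbitrary symmetric 2 × 2 matrix A and an arbitrary full-column-rank 3 × 2 matrix Z, and verifies each identity with numpy.allclose.

```python
import numpy as np
from numpy.linalg import eig, inv, pinv, svd

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eig: each column V[:, k] satisfies A V[:, k] = lam[k] V[:, k]
lam, V = eig(A)
print(np.allclose(A.dot(V), V * lam))

# For an invertible square matrix, inverse and pseudo-inverse coincide
print(np.allclose(inv(A), pinv(A)))

# svd: A = U diag(s) Vh
U, s, Vh = svd(A)
print(np.allclose(U.dot(np.diag(s)).dot(Vh), A))

# For a full column rank matrix Z, pinv(Z) equals (Z^T Z)^{-1} Z^T
Z = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
print(np.allclose(pinv(Z), inv(Z.T.dot(Z)).dot(Z.T)))
```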

From numpy.random:

– rand: draws random samples from a uniform distribution over [0, 1)
– randn: draws random samples from the “standard normal” distribution
– randint: draws random integers from ‘low’ (inclusive) to ‘high’ (exclusive)
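A minimal usage sketch; the seed and the array shapes are arbitrary choices:

```python
import numpy as np

np.random.seed(3)                      # fix the seed to reproduce the draws
u = np.random.rand(5)                  # 5 uniform draws on [0, 1)
g = np.random.randn(2, 3)              # 2 x 3 array of N(0, 1) draws
k = np.random.randint(1, 7, size=10)   # 10 dice rolls: integers 1 to 6
print(u)
print(g)
print(k)
```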

From scipy.stats:

(for the random distributions, use the methods .pdf, .cdf, .isf, .ppf, etc.)

– norm: Gaussian random distribution
– gamma: gamma random distribution
– f: Fisher random distribution
– t: Student’s random distribution
– chi2: chi-squared random distribution
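For instance, with distributions imported from scipy.stats (the values in the comments are rounded):

```python
from scipy.stats import norm, chi2, t

print(norm.pdf(0.0))       # density of N(0, 1) at 0: 1/sqrt(2*pi), about 0.3989
print(norm.cdf(1.96))      # about 0.975
print(norm.ppf(0.975))     # quantile (inverse c.d.f.): about 1.96
print(norm.isf(0.025))     # inverse survival function: same value as above

# Location and scale: P{X <= 3} for X ~ N(1, 2**2), i.e. cdf of N(0, 1) at 1
print(norm.cdf(3.0, loc=1.0, scale=2.0))   # about 0.8413
# Shape parameters such as the degrees of freedom are passed by name
print(chi2.ppf(0.95, df=3))   # about 7.815
print(t.ppf(0.975, df=10))    # about 2.228
```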

From scipy.linalg:

– sqrtm: computes matrix square root
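One typical use of sqrtm in statistics is to give white Gaussian samples a prescribed covariance. The sketch below assumes an arbitrary 2 × 2 covariance matrix C. Because C is symmetric positive definite, its principal square root S is symmetric, so S Sᵀ = S² = C.

```python
import numpy as np
from scipy.linalg import sqrtm

# An arbitrary symmetric positive definite matrix, used as a target covariance
C = np.array([[4.0, 1.0],
              [1.0, 2.0]])
S = sqrtm(C)                    # principal square root: S.dot(S) equals C
print(np.allclose(S.dot(S), C))

# Color white N(0, I) samples: X = S W has covariance S S^T = C
np.random.seed(4)
W = np.random.randn(2, 100000)
X = S.dot(W)
print(np.cov(X))                # empirical covariance, close to C
```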

From matplotlib.pyplot:

– box, boxplot, clf, figure, hist, legend, plot, show, subplot
– title, text, xlabel, xlim, xticks, ylabel, ylim, yticks
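The sketch below draws a histogram and a boxplot of simulated Gaussian data. It is similar in spirit to Figure 2.3 but does not reproduce it. The sketch selects the non-interactive Agg backend so that it runs without a display. In Spyder, you would omit that line and call show() instead of close().

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt

np.random.seed(5)          # arbitrary seed
x = np.random.randn(300)   # 300 simulated N(0, 1) values

plt.figure()
plt.subplot(2, 1, 1)
# hist returns the bin counts, the 11 bin edges and the drawn patches
counts, edges, patches = plt.hist(x, bins=10)
plt.title("Histogram (10 bins of equal width)")
plt.ylabel("count")
plt.subplot(2, 1, 2)
plt.boxplot(x)   # box and whiskers; the line inside the box is the median
plt.title("Boxplot")
plt.close()      # use plt.show() instead in an interactive session
```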

Datasets:

– statsmodels.api.datasets.co2, statsmodels.api.datasets.nile, statsmodels.api.datasets.star98, statsmodels.api.datasets.heart
– sklearn.datasets.load_boston, sklearn.datasets.load_diabetes
– scipy.misc.ascent

From sympy:

– Symbol, Matrix, diff, Inverse, trace, simplify
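A small symbolic sketch using these SymPy functions. The symbols x and a and the matrix M are arbitrary examples.

```python
from sympy import Symbol, Matrix, Inverse, diff, simplify, trace

x = Symbol("x")
a = Symbol("a")

# diff: symbolic derivative; x**3 * x**2 is automatically combined into x**5
print(diff(x**3 * x**2, x))             # 5*x**4
# simplify: cancels the common factor (x - 1)
print(simplify((x**2 - 1) / (x - 1)))   # x + 1

# A symbolic 2 x 2 matrix, its trace and its inverse
M = Matrix([[a, 1], [1, a]])
print(trace(M))                         # 2*a
print(Inverse(M))                       # unevaluated inverse expression
Minv = M.inv()                          # explicit inverse, valid when a**2 != 1
print(simplify(M * Minv))               # identity matrix
```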

Notations and Abbreviations

∅		empty set
𝟙_A(x)	=	1 if x ∈ A, 0 otherwise
(a,b]	=	{x : a < x ≤ b}
δ(t)
Re(z)		real part of z
Im(z)		imaginary part of z
i or j	=	√−1
I_N		identity matrix of size N
A^∗		complex conjugate of A
A^T		transpose of A
A^H		transpose-conjugate of A
A⁻¹		inverse matrix of A
A^#		pseudo-inverse matrix of A
r.v./rv		random variable
ℙ		probability measure
ℙ_θ		probability measure indexed by θ
𝔼{X}		expectation of X
𝔼_θ{X}		expectation of X under ℙ_θ
X_c = X − 𝔼{X}		zero-mean random variable
var(X) = 𝔼{|X_c|²}		variance of X
cov(X, Y) = 𝔼{X_c Y_c^H}		covariance of (X,Y)
cov(X) = cov(X, X) = var(X)		variance of X
𝔼{X|Y}		conditional expectation of X given Y
$a \xrightarrow{d} b$		a converges in distribution to b
$a \xrightarrow{P} b$		a converges in probability to b
$a \xrightarrow{a.s.} b$		a converges almost surely to b
d.o.f.		degree of freedom
ARMA		AutoRegressive Moving Average
AUC		Area Under the ROC curve
c.d.f.		Cumulative Distribution Function
CRB		Cramér–Rao Bound
EM		Expectation Maximization
GLRT		Generalized Likelihood Ratio Test
GEM		Generalized Expectation Maximization
GMM		Gaussian Mixture Model
HMM		Hidden Markov Model
i.i.d./iid		independent and identically distributed
LDA		Linear Discriminant Analysis
MC		Monte-Carlo
MLE		Maximum Likelihood Estimator
MME		Moment Method Estimator
MSE		Mean Square Error
OLS		Ordinary Least Squares
PCA		Principal Component Analysis
p.d.f.		Probability Density Function
ROC		Receiver Operating Characteristic
SNR		Signal to Noise Ratio
WLS		Weighted Least Squares
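The variance and covariance notations can be mirrored empirically by replacing expectations with sample means. The sketch below assumes real scalar r.v.s and takes cov(X, Y) = 𝔼{X_c Y_c^∗}. The simulated X and Y are illustrative only.

```python
import numpy as np

np.random.seed(6)  # arbitrary seed
N = 100000
X = np.random.randn(N)
Y = 0.5 * X + np.random.randn(N)   # correlated with X by construction

# Empirical counterparts of the notations, expectations replaced by means
Xc = X - X.mean()                  # X_c = X - E{X}
Yc = Y - Y.mean()
var_X = np.mean(np.abs(Xc) ** 2)   # var(X) = E{|X_c|^2}
cov_XY = np.mean(Xc * np.conj(Yc)) # assumed cov(X, Y) = E{X_c Y_c^*} for scalars
print(var_X, cov_XY)               # close to 1 and 0.5
print(np.isclose(np.mean(Xc * np.conj(Xc)), var_X))  # cov(X, X) = var(X)
```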