Transformations of random variables

For each random variable X one can determine three further quantities: the centered variable Y, the normalized variable V, and the reduced variable U. The centered random variable Y is the difference between the given random variable X and its mathematical expectation M(X), i.e. Y = X - M(X). The expectation of a centered random variable Y equals 0, and its variance equals the variance of the original random variable: M(Y) = 0, D(Y) = D(X). The distribution function F_Y(x) of the centered random variable Y is related to the distribution function F(x) of the original random variable X by the relation:

F_Y(x) = F(x + M(X)).

The densities of these random variables satisfy the equality

f_Y(x) = f(x + M(X)).

The normalized random variable V is the ratio of the given random variable X to its standard deviation σ(X), i.e. V = X / σ(X). The expectation and variance of the normalized random variable V are expressed through the characteristics of X as follows:

M(V) = M(X) / σ(X) = 1/v,  D(V) = 1,

where v is the coefficient of variation of the original random variable X. For the distribution function F_V(x) and density f_V(x) of the normalized random variable V we have:

F_V(x) = F(σ(X) x),  f_V(x) = σ(X) f(σ(X) x),

where F(x) is the distribution function of the original random variable X, and f(x) is its probability density.

The reduced random variable U is a centered and normalized random variable:

U = (X - M(X)) / σ(X).

For the reduced random variable, M(U) = 0 and D(U) = 1.

Normalized, centered and reduced random variables are used constantly both in theoretical studies and in algorithms, software products, and regulatory, technical and instructional documentation, in particular because the equalities M(Y) = 0, D(Y) = D(X), M(U) = 0, D(U) = 1 make it possible to simplify the justification of methods and the formulation of theorems and calculation formulas.
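As a minimal numeric illustration (a sketch in Python using only the standard library; the exponential sample and all variable names are ours, not from the text above), centering a sample drives its mean to 0 while leaving its variance unchanged, and reducing drives the variance to 1 as well:

```python
import random
import statistics

random.seed(1)
# Sample from an arbitrary distribution (exponential with mean 2).
xs = [random.expovariate(0.5) for _ in range(100_000)]

m = statistics.fmean(xs)        # estimate of M(X)
s = statistics.pstdev(xs)       # estimate of sigma(X)

ys = [x - m for x in xs]           # centered:  Y = X - M(X)
us = [(x - m) / s for x in xs]     # reduced:   U = (X - M(X)) / sigma(X)

# M(Y) = 0 and D(Y) = D(X); M(U) = 0 and D(U) = 1 (here exactly, up to
# rounding, because we standardized with the sample's own mean and deviation).
print(abs(statistics.fmean(us)) < 1e-9, abs(statistics.pvariance(us) - 1) < 1e-9)
```

Because the sample is standardized by its own mean and deviation, the check holds to rounding error rather than only up to sampling error.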

More general transformations of random variables are also used. Thus, if Y = aX + b, where a and b are some numbers, then

M(Y) = aM(X) + b,  D(Y) = a²D(X).  (8)

Example 7. If a = 1/σ(X) and b = -M(X)/σ(X), then Y is the reduced random variable U, and formulas (8) transform into formulas (7).

With each random variable X one can associate a set of random variables Y given by the formula Y = aX + b for various a > 0 and b. This set is called the scale-shift family generated by the random variable X. The distribution functions F_Y(x) constitute a scale-shift family of distributions generated by the distribution function F(x). Instead of Y = aX + b one often uses the notation

U = (X - c) / d.  (9)

The number c is called the shift parameter, and the number d the scale parameter. Formula (9) shows that X, the result of measuring a certain quantity, goes into U, the result of measuring the same quantity, if the origin of measurement is moved to the point c and a new unit of measurement, d times larger than the old one, is then used.

For the scale-shift family (9), the distribution of X is called standard. In probabilistic-statistical methods of decision making and other applied research, the standard normal distribution, the standard Weibull-Gnedenko distribution, the standard gamma distribution, etc. are used (see below).

Other transformations of random variables are also used. For example, for a positive random variable X one considers Y = lg X, where lg X is the decimal logarithm of X. The chain of equalities

F_Y(x) = P(lg X < x) = P(X < 10^x) = F(10^x)

connects the distribution functions of X and Y.
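This chain can be checked directly on simulated data (a sketch; the choice of an exponential law for the positive variable X is ours): the empirical distribution functions of X and Y = lg X coincide after the substitution x → 10^x.

```python
import math
import random

random.seed(2)
# A positive random variable X (exponential with rate 1) and Y = lg X.
xs = [random.expovariate(1.0) for _ in range(50_000)]

def F(t):
    """Empirical distribution function of X."""
    return sum(x < t for x in xs) / len(xs)

def F_Y(t):
    """Empirical distribution function of Y = lg X."""
    return sum(math.log10(x) < t for x in xs) / len(xs)

# F_Y(x) = P(lg X < x) = P(X < 10**x) = F(10**x)
print(abs(F_Y(0.2) - F(10 ** 0.2)) < 1e-3)
```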

The centered random variable corresponding to a random variable X is the difference between the random variable X and its mathematical expectation: X - M(X).

A random variable is called normalized if its variance is 1. A centered and normalized random variable is called standard.

The standard random variable Z corresponding to the random variable X is found by the formula:

Z = (X - M(X)) / σ(X).  (1.24)

1.2.5. Other numerical characteristics

The mode of a discrete random variable X is defined as a possible value x_m for which

P(X = x_m) ≥ P(X = x_i) for every possible value x_i.

The mode of a continuous random variable X is a real number M_0(X) defined as the point of maximum of the probability density f(x).

Thus, the mode of a random variable X is its most probable value, if such a value is unique. A mode may not exist, may have a single value (a unimodal distribution), or may have multiple values (a multimodal distribution).

The median of a continuous random variable X is a real number M_D(X) satisfying the condition

F(M_D(X)) = 1/2.

Since this equation can have many roots, the median is, generally speaking, determined ambiguously.

The initial moment of m-th order of a random variable X (if it exists) is the real number α_m determined by the formula

α_m = M(X^m).  (1.27)

The central moment of m-th order of a random variable X (if it exists) is the number μ_m determined by the formula

μ_m = M[(X - M(X))^m].  (1.28)

The expectation of a random variable X is its first initial moment, and the variance is its second central moment.

Among the moments of higher orders, the central moments of the 3rd and 4th orders are of particular importance.

The coefficient of asymmetry ("skewness") A(X) is the quantity

A(X) = μ_3 / σ(X)³.

The coefficient of kurtosis ("peakedness") E(X) is the quantity

E(X) = μ_4 / σ(X)⁴ - 3.
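For a distribution given by its series, all of these characteristics reduce to finite sums. A sketch (the three-point distribution below is a made-up example, not from the text):

```python
# Distribution series of a hypothetical discrete random variable X.
vals = [0, 1, 3]
probs = [0.5, 0.3, 0.2]

m = sum(x * p for x, p in zip(vals, probs))  # M(X): first initial moment

def mu(k):
    """Central moment of order k: M[(X - M(X))**k]."""
    return sum((x - m) ** k * p for x, p in zip(vals, probs))

sigma = mu(2) ** 0.5
A = mu(3) / sigma ** 3      # coefficient of asymmetry (skewness)
E = mu(4) / sigma ** 4 - 3  # coefficient of kurtosis

print(round(m, 6), round(mu(2), 6), A > 0)  # mean 0.9, variance 1.29, right-skewed
```

The long right tail at x = 3 makes the skewness positive, as expected.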

1.3. Some laws of distribution of discrete random variables

1.3.1. Geometric distribution

A discrete random variable X has a geometric distribution if its possible values 0, 1, 2, …, m, … correspond to the probabilities calculated by the formula

P(X = m) = q^m p,

where 0 < p < 1, q = 1 - p.

In practice, the geometric distribution occurs when a number of independent attempts are made to achieve some event A, and the probability of the event A occurring in each attempt is P(A) = p. The random variable X, the number of unsuccessful attempts (before the first experiment in which the event A appears), has a geometric distribution with the distribution series:

x_i | 0 | 1 | 2 | … | m | …
p_i | p | qp | q²p | … | q^m p | …

and numerical characteristics:

M(X) = q/p,  D(X) = q/p².  (1.30)
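The characteristics (1.30) can be verified by summing the series directly (a sketch; the value p = 0.3 and the truncation point are arbitrary choices of ours):

```python
# Geometric law: P(X = m) = q**m * p, X = number of failures before the first success.
p = 0.3
q = 1 - p

# Truncate the infinite series; the neglected tail of order q**1000 is negligible.
pm = [q ** m * p for m in range(1000)]

total = sum(pm)
mean = sum(m * pr for m, pr in enumerate(pm))
var = sum(m * m * pr for m, pr in enumerate(pm)) - mean ** 2

# Compare with formulas (1.30): M(X) = q/p, D(X) = q/p**2.
print(round(mean, 6), round(var, 6))
```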

1.3.2. Hypergeometric distribution

A discrete random variable X with possible values 0, 1, …, m, …, M has a hypergeometric distribution with parameters N, M, n if

P(X = m) = C(M, m) C(N - M, n - m) / C(N, n),  (1.31)

where M ≤ N, m ≤ n, n ≤ N, and m, n, N, M are integers (C(a, b) denotes the number of combinations of a elements taken b at a time).

A hypergeometric distribution occurs in cases like the following: there are N objects, of which M have a certain attribute. From the available N objects, n objects are selected at random.

The random variable X, the number of objects with the specified attribute among those selected, is distributed according to the hypergeometric law.

The hypergeometric distribution is used, in particular, when solving problems related to product quality control.

The mathematical expectation of a random variable having a hypergeometric distribution is:

M(X) = nM/N.  (1.32)
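Formulas (1.31) and (1.32) are easy to check by direct summation (a sketch; the parameter values N = 20, M = 7, n = 5 are arbitrary):

```python
from math import comb

# Hypergeometric law: N objects, M of them with the attribute, n drawn at random.
N, M, n = 20, 7, 5

# P(X = m) = C(M, m) * C(N - M, n - m) / C(N, n), formula (1.31)
pm = [comb(M, m) * comb(N - M, n - m) / comb(N, n) for m in range(min(M, n) + 1)]

total = sum(pm)                              # the probabilities sum to 1
mean = sum(m * p for m, p in enumerate(pm))  # should equal n * M / N = 1.75

print(round(total, 12), round(mean, 6))
```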

The difference between a random variable and its mathematical expectation is called the deviation, or centered random variable:

X - M(X).

The distribution series of a centered random variable has the form:

X - M(X) | x_1 - M(X) | x_2 - M(X) | … | x_n - M(X)
p | p_1 | p_2 | … | p_n

Properties of the centered random variable:

1. The mathematical expectation of the deviation is 0: M(X - M(X)) = 0.

2. The variance of the deviation of a random variable X from its mathematical expectation equals the variance of the random variable X itself: D(X - M(X)) = D(X).

In other words, the variance of a random variable and the variance of its deviation are equal.

If the deviation X - M(X) is divided by the standard deviation σ(X), we obtain a dimensionless centered random variable, which is called the standard (normalized) random variable:

Z = (X - M(X)) / σ(X).

Properties of the standard random variable:

    The mathematical expectation of a standard random variable is zero: M(Z) = 0.

    The variance of a standard random variable is 1: D(Z) = 1.
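Both properties can be confirmed directly from a distribution series (a sketch with made-up values):

```python
# Distribution series of a hypothetical discrete random variable X.
xs = [2, 5, 8, 10]
ps = [0.1, 0.4, 0.3, 0.2]

mx = sum(x * p for x, p in zip(xs, ps))              # M(X)
dx = sum((x - mx) ** 2 * p for x, p in zip(xs, ps))  # D(X)
sx = dx ** 0.5                                       # sigma(X)

# The standard variable Z = (X - M(X)) / sigma(X) keeps the same probabilities,
# only the values are shifted and scaled.
zs = [(x - mx) / sx for x in xs]
mz = sum(z * p for z, p in zip(zs, ps))
dz = sum((z - mz) ** 2 * p for z, p in zip(zs, ps))

print(abs(mz) < 1e-12, abs(dz - 1) < 1e-12)  # M(Z) = 0, D(Z) = 1
```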

    TASKS FOR INDEPENDENT SOLUTION

    In a lottery of 100 tickets, two prizes are drawn, worth 210 and 60 USD. Draw up the distribution law of the winnings for a person who has: a) 1 ticket, b) 2 tickets. Find the numerical characteristics.

    Two shooters each fire one shot at a target. The random variable X, the number of points scored in one shot by the first shooter, has the distribution law:

Z is the sum of points scored by both shooters. Determine the numerical characteristics.

    Two shooters shoot at their targets, firing one shot each independently of each other. The probability of hitting the target is 0.7 for the first shooter and 0.8 for the second. The random variable X_1 is the number of hits by the first shooter, X_2 the number of hits by the second. Find the distribution law of: a) the total number of hits; b) the random variable Z = 3X_1 - 2X_2. Determine the numerical characteristics of the total number of hits. Check the fulfillment of the properties of mathematical expectation and variance: M(3X - 2Y) = 3M(X) - 2M(Y), D(3X - 2Y) = 9D(X) + 4D(Y).

    The random variable X, the company's revenue, has the distribution law:

Find the distribution law of the random variable Z, the company's profit. Determine its numerical characteristics.

    Random variables X and U are independent and have the same distribution law:


Do the random variables X and X + U have the same distribution law?

    Prove that the standard random variable

Z = (X - M(X)) / σ(X)

has a variance equal to 1 and a mathematical expectation equal to 0.

The normalized random variable V is the ratio of the given random variable X to its standard deviation σ, i.e. V = X/σ.

The standard deviation is the square root of the variance: σ = √D(X).

The mathematical expectation and variance of the normalized random variable V are expressed through the characteristics of X as follows:

M(V) = M(X)/σ = 1/v,  D(V) = 1,

where v is the coefficient of variation of the original random variable X.

For the distribution function F_V(x) and the distribution density f_V(x) we have:

F_V(x) = F(σx),  f_V(x) = σf(σx),

where F(x) is the distribution function of the original random variable X, and f(x) is its probability density.

Correlation coefficient.

The correlation coefficient is an indicator of the nature of the mutual stochastic influence of changes in two random variables. It can take values from -1 to +1. If its absolute value is close to 1, there is a strong connection; if it is close to 0, the connection is absent or essentially nonlinear. When the correlation coefficient equals 1 in absolute value, we speak of a functional relationship (namely, a linear dependence); that is, the changes in the two quantities can be described by a linear function.

A process is called stochastic if it is described by random variables whose values change over time.

Pearson correlation coefficient.

For metric quantities, the Pearson correlation coefficient is used, the exact formula of which was derived by Karl Pearson, developing an idea of Francis Galton. Let X and Y be two random variables defined on the same probability space. Then their correlation coefficient is given by the formula:

ρ(X, Y) = M[(X - M(X))(Y - M(Y))] / (σ(X) σ(Y)).
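A sketch of computing this coefficient from simulated data (pure standard library; the linear model Y = 2X + noise is our own example, for which the theoretical value is 2/√5 ≈ 0.894):

```python
import random

random.seed(3)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + random.gauss(0, 1) for x in xs]   # Y depends linearly on X, plus noise

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5

r = cov / (sx * sy)   # Pearson correlation coefficient
print(round(r, 2))
```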

Chebyshev's inequalities.

Markov's inequality.

Markov's inequality in probability theory gives an estimate, in terms of the mathematical expectation, of the probability that a random variable exceeds a fixed positive constant in absolute value. The resulting estimate is usually quite rough; however, it allows one to get a certain idea of the distribution when the latter is not known explicitly.

Let a random variable X be defined on a probability space and let its mathematical expectation M|X| be finite. Then

P(|X| ≥ a) ≤ M|X| / a,

where a > 0.

Bienaymé-Chebyshev inequality.

If M(X²) < ∞ (M denotes the mathematical expectation), then for any ε > 0,

P(|X - M(X)| ≥ ε) ≤ D(X) / ε².
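The roughness of the Chebyshev bound is easy to see on simulated data (a sketch; the exponential distribution with M(X) = D(X) = 1 and the choice ε = 2 are ours):

```python
import random

random.seed(4)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]   # M(X) = 1, D(X) = 1

m = sum(xs) / n
d = sum((x - m) ** 2 for x in xs) / n

eps = 2.0
p_emp = sum(abs(x - m) >= eps for x in xs) / n   # empirical P(|X - M(X)| >= eps)
bound = d / eps ** 2                             # Chebyshev bound D(X)/eps**2

print(p_emp <= bound)  # the bound holds, here with a wide margin
```

For this distribution the true probability is about e^(-3) ≈ 0.05, while the bound gives 0.25: rough, but valid with no knowledge of the distribution.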

Law of large numbers.

The law of large numbers states that the empirical mean (arithmetic mean) of a sufficiently large finite sample from a fixed distribution is close to the theoretical mean (mathematical expectation) of that distribution. Depending on the type of convergence, a distinction is made between the weak law of large numbers, when convergence in probability takes place, and the strong law of large numbers, when convergence takes place almost surely.



There will always be a number of trials such that, with any probability given in advance, the frequency of occurrence of some event will differ as little as desired from its probability. The general meaning of the law of large numbers is that the combined action of a large number of random factors leads to a result that is almost independent of chance.

The weak law of large numbers.

Let X_1, X_2, … be independent identically distributed random variables with finite expectation M(X), and let S_n = (X_1 + … + X_n)/n. Then S_n → M(X) in probability.

The strong law of large numbers.

Under the same assumptions, S_n → M(X) almost surely.
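A simulation sketch of the law of large numbers for throws of a fair die (M(X) = 3.5; the sample sizes are arbitrary):

```python
import random

random.seed(5)

def sample_mean(n):
    """Arithmetic mean S_n of n simulated fair-die throws."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# As n grows, the empirical mean settles near the expectation M(X) = 3.5.
m_small = sample_mean(100)
m_large = sample_mean(1_000_000)
print(abs(m_large - 3.5) < 0.02)  # True with overwhelming probability
```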

In addition to the characteristics of position (the average, typical values of a random variable), a number of characteristics are used, each of which describes one or another property of the distribution. The so-called moments are most often used as such characteristics.

The concept of moment is widely used in mechanics to describe the distribution of masses (static moments, moments of inertia, etc.). Exactly the same techniques are used in probability theory to describe the basic properties of the distribution of a random variable. Most often, two types of moments are used in practice: initial and central.

The initial moment of the s-th order of a discrete random variable X is the sum:

α_s = Σ_i x_i^s p_i.  (5.7.1)

Obviously, this definition coincides with the definition of the initial moment of order s in mechanics, if masses p_1, p_2, …, p_n are concentrated on the abscissa axis at the points x_1, x_2, …, x_n.

For a continuous random variable X, the initial moment of the s-th order is the integral

α_s = ∫ x^s f(x) dx.  (5.7.2)

It is easy to see that the main characteristic of position introduced in the previous section, the mathematical expectation, is nothing other than the first initial moment of the random variable.

Using the mathematical expectation sign, the two formulas (5.7.1) and (5.7.2) can be combined into one. Indeed, formulas (5.7.1) and (5.7.2) are completely similar in structure to formulas (5.6.1) and (5.6.2), with the difference that instead of x_i and x they contain x_i^s and x^s, respectively. Therefore we can write a general definition of the initial moment of the s-th order, valid both for discrete and for continuous quantities:

α_s = M[X^s],  (5.7.3)

i.e. the initial moment of the s-th order of a random variable is the mathematical expectation of the s-th power of this random variable.

Before defining the central moment, we introduce a new concept of “centered random variable.”

Let there be a random variable X with mathematical expectation m_x. The centered random variable corresponding to X is the deviation of the random variable X from its mathematical expectation:

X° = X - m_x.

In what follows, we agree to denote the centered random variable corresponding to a given random variable by the same letter with the symbol ° at the top.

It is easy to verify that the mathematical expectation of a centered random variable equals zero. Indeed, for a discrete quantity

M[X°] = Σ_i (x_i - m_x) p_i = Σ_i x_i p_i - m_x Σ_i p_i = m_x - m_x = 0;

similarly for a continuous quantity.

Centering a random variable is obviously equivalent to moving the origin of coordinates to the "central" point, whose abscissa equals the mathematical expectation.

The moments of a centered random variable are called central moments. They are analogous to moments about the center of gravity in mechanics.

Thus, the central moment of order s of a random variable X is the mathematical expectation of the s-th power of the corresponding centered random variable:

μ_s = M[(X°)^s].  (5.7.6)

For a discrete random variable the s-th central moment is expressed by the sum

μ_s = Σ_i (x_i - m_x)^s p_i,  (5.7.7)

and for a continuous one by the integral

μ_s = ∫ (x - m_x)^s f(x) dx.  (5.7.8)

In what follows, in cases where there is no doubt which random variable a given moment belongs to, for brevity we will write simply α_s and μ_s instead of α_s[X] and μ_s[X].

Obviously, for any random variable the central moment of the first order is equal to zero:

μ_1 = M[X°] = 0,  (5.7.9)

since the mathematical expectation of a centered random variable is always equal to zero.

Let us derive relations connecting the central and initial moments of different orders. We will carry out the derivation only for discrete quantities; it is easy to verify that exactly the same relations hold for continuous quantities if we replace finite sums with integrals and probabilities with probability elements.

Consider the second central moment:

μ_2 = Σ_i (x_i - m_x)² p_i = Σ_i x_i² p_i - 2 m_x Σ_i x_i p_i + m_x² Σ_i p_i = α_2 - 2 m_x² + m_x² = α_2 - m_x².

Similarly, for the third central moment we obtain:

μ_3 = Σ_i (x_i - m_x)³ p_i = α_3 - 3 m_x α_2 + 3 m_x² α_1 - m_x³ = α_3 - 3 m_x α_2 + 2 m_x³.

Expressions for μ_4, etc., can be obtained in a similar way.

Thus, for the central moments of any random variable the following formulas are valid:

μ_1 = 0,
μ_2 = α_2 - m_x²,
μ_3 = α_3 - 3 m_x α_2 + 2 m_x³,
μ_4 = α_4 - 4 m_x α_3 + 6 m_x² α_2 - 3 m_x⁴.  (5.7.10)
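These identities are easy to verify numerically for any distribution series (a sketch with made-up values; α_s and μ_s are computed from their definitions and compared with the expressions above):

```python
# Distribution series of a hypothetical discrete random variable.
xs = [1, 2, 4, 7]
ps = [0.2, 0.3, 0.4, 0.1]

def alpha(s):
    """Initial moment of order s."""
    return sum(x ** s * p for x, p in zip(xs, ps))

m = alpha(1)

def mu(s):
    """Central moment of order s, by definition."""
    return sum((x - m) ** s * p for x, p in zip(xs, ps))

# The same central moments via the initial moments, formulas (5.7.10):
mu2 = alpha(2) - m ** 2
mu3 = alpha(3) - 3 * m * alpha(2) + 2 * m ** 3
mu4 = alpha(4) - 4 * m * alpha(3) + 6 * m ** 2 * alpha(2) - 3 * m ** 4

print(abs(mu(2) - mu2) < 1e-9, abs(mu(3) - mu3) < 1e-9, abs(mu(4) - mu4) < 1e-9)
```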

Generally speaking, moments can be considered not only relative to the origin (initial moments) or to the mathematical expectation (central moments), but also relative to an arbitrary point a:

μ_s[a] = M[(X - a)^s].  (5.7.11)

However, central moments have an advantage over all others: the first central moment, as we have seen, is always equal to zero, and the next one, the second central moment, is minimal for this choice of reference point. Let us prove it. For a discrete random variable at s = 2, formula (5.7.11) has the form:

μ_2[a] = Σ_i (x_i - a)² p_i.  (5.7.12)

Let us transform this expression:

μ_2[a] = M[(X - a)²] = M[X²] - 2a m_x + a² = (α_2 - m_x²) + (m_x - a)² = D[X] + (m_x - a)².

Obviously, this quantity reaches its minimum when a = m_x, i.e. when the moment is taken relative to the point m_x.

Of all the moments, the first initial moment (mathematical expectation) and the second central moment are most often used as characteristics of a random variable.

The second central moment is called the variance of the random variable. In view of the extreme importance of this characteristic among the other moments, we introduce a special designation for it: D[X].

By the definition of the central moment,

D[X] = M[(X°)²],  (5.7.13)

i.e. the variance of a random variable X is the mathematical expectation of the square of the corresponding centered variable.

Replacing the quantity X° in expression (5.7.13) with X - m_x, we also have:

D[X] = M[(X - m_x)²].  (5.7.14)

To calculate the variance directly, the following formulas are used:

D[X] = Σ_i (x_i - m_x)² p_i,  (5.7.15)

D[X] = ∫ (x - m_x)² f(x) dx,  (5.7.16)

for discrete and continuous quantities, respectively.

The variance of a random variable is a characteristic of dispersion, the scattering of the values of the random variable around its mathematical expectation. The word "dispersion" itself means "scattering".

If we turn to the mechanical interpretation of the distribution, then the dispersion is nothing more than the moment of inertia of a given mass distribution relative to the center of gravity (mathematical expectation).

The variance of a random variable has the dimension of the square of the random variable; for a visual characterization of scattering, it is more convenient to use a quantity whose dimension coincides with that of the random variable. To do this, the square root of the variance is taken. The resulting quantity is called the standard deviation of the random variable, denoted σ[X]:

σ[X] = √D[X].  (5.7.17)

To simplify notation, we will often use the abbreviations σ_x and D_x for the standard deviation and the variance. When there is no doubt which random variable these characteristics relate to, we will sometimes omit the index x and write simply σ and D. The words "standard deviation" will sometimes be abbreviated to s.d.

In practice, a formula is often used that expresses the variance of a random variable through its second initial moment (the second of formulas (5.7.10)). In the new notation it looks like:

D = α_2 - m_x².
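A sketch of the two ways of computing the variance on sample data (the uniform sample is an arbitrary choice of ours). The form D = α_2 - m_x² needs only one pass over the data, at the cost of possible cancellation error when the mean is large relative to the spread:

```python
import random

random.seed(6)
xs = [random.uniform(0, 10) for _ in range(100_000)]
n = len(xs)

m = sum(xs) / n                        # first initial moment (mean)
alpha2 = sum(x * x for x in xs) / n    # second initial moment
d_two_pass = sum((x - m) ** 2 for x in xs) / n   # definition (5.7.15)

# D[X] = alpha_2 - m**2, the second of formulas (5.7.10)
d_one_pass = alpha2 - m * m

print(abs(d_two_pass - d_one_pass) < 1e-6)
```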

Expectation and variance (or standard deviation) are the most commonly used characteristics of a random variable. They characterize the most important features of the distribution: its position and degree of scattering. For a more detailed description of the distribution, moments of higher orders are used.

The third central moment serves to characterize the asymmetry (or "skewness") of the distribution. If the distribution is symmetric with respect to the mathematical expectation (or, in the mechanical interpretation, the mass is distributed symmetrically with respect to the center of gravity), then all central moments of odd order (if they exist) are equal to zero. Indeed, in the sum

μ_s = Σ_i (x_i - m_x)^s p_i,

when the distribution law is symmetric with respect to m_x and s is odd, each positive term has a corresponding negative term equal in absolute value, so that the entire sum equals zero. The same is obviously true for the integral

μ_s = ∫ (x - m_x)^s f(x) dx,

which equals zero as the integral of an odd function over symmetric limits.

It is natural, therefore, to choose one of the odd moments as a characteristic of the distribution's asymmetry. The simplest of these is the third central moment. It has the dimension of the cube of the random variable; to obtain a dimensionless characteristic, the third moment is divided by the cube of the standard deviation. The resulting quantity is called the "asymmetry coefficient" or simply "skewness"; we will denote it Sk:

Sk = μ_3 / σ³.

Fig. 5.7.1 shows two asymmetric distributions: one of them (curve I) has positive skewness (Sk > 0); the other (curve II), negative skewness (Sk < 0).

The fourth central moment serves to characterize the so-called "steepness", i.e. the peakedness or flat-toppedness, of the distribution. These distribution properties are described by the so-called kurtosis. The kurtosis of a random variable X is the quantity

Ex = μ_4 / σ⁴ - 3.

The number 3 is subtracted from the ratio μ_4 / σ⁴ because for the very important and widespread normal distribution law (which we will get to know in detail later) μ_4 / σ⁴ = 3. Thus, for a normal distribution the kurtosis is zero; curves that are more peaked than the normal curve have positive kurtosis, and curves that are more flat-topped have negative kurtosis.
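This can be seen in simulation (a sketch): for samples from the normal law the estimated kurtosis is close to zero.

```python
import random

random.seed(7)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]

m = sum(xs) / n

def mu(k):
    """Sample central moment of order k."""
    return sum((x - m) ** k for x in xs) / n

kurtosis = mu(4) / mu(2) ** 2 - 3   # ~0 for the normal law
print(abs(kurtosis) < 0.1)
```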

Fig. 5.7.2 shows a normal distribution (curve I), a distribution with positive kurtosis (curve II), and a distribution with negative kurtosis (curve III).

In addition to the initial and central moments discussed above, in practice the so-called absolute moments (initial and central) are sometimes used, determined by the formulas

M[|X|^s],  M[|X°|^s].

Obviously, absolute moments of even orders coincide with ordinary moments.

Of the absolute moments, the most commonly used is the first absolute central moment,

M[|X°|] = M[|X - m_x|],  (5.7.21)

called the arithmetic mean deviation. Along with the variance and the standard deviation, the arithmetic mean deviation is sometimes used as a characteristic of scattering.

Expectation, mode, median, initial and central moments and, in particular, variance, standard deviation, skewness and kurtosis are the most commonly used numerical characteristics of random variables. In many practical problems, a complete characteristic of a random variable, the distribution law, is either not needed or cannot be obtained. In these cases, one is limited to an approximate description of the random variable using individual numerical characteristics, each of which expresses some characteristic property of the distribution.

Very often, numerical characteristics are used to approximately replace one distribution with another, and usually this replacement is made in such a way that several important moments remain unchanged.

Example 1. One experiment is carried out, as a result of which an event A may or may not appear; its probability equals p. The random variable X is the number of occurrences of the event A (the characteristic random variable of the event A). Determine its characteristics: mathematical expectation, variance, standard deviation.

Solution. The distribution series of the value X has the form:

X | 0 | 1
p | q | p

where q = 1 - p is the probability of the event not occurring.

Using formula (5.6.1) we find the mathematical expectation of X:

m_x = 0·q + 1·p = p.

The variance of X is determined by formula (5.7.15):

D_x = (0 - p)² q + (1 - p)² p = p²q + q²p = pq(p + q) = pq.

(We suggest that the reader obtain the same result by expressing the variance through the second initial moment.)

Example 2. Three independent shots are fired at a target; the probability of a hit on each shot is 0.4. The random variable X is the number of hits. Determine the characteristics of X: mathematical expectation, variance, s.d., skewness.

Solution. The distribution series of X has the form:

X | 0 | 1 | 2 | 3
p | 0.216 | 0.432 | 0.288 | 0.064

We calculate the numerical characteristics of X:

m_x = 0·0.216 + 1·0.432 + 2·0.288 + 3·0.064 = 1.2;

D_x = (0 - 1.2)²·0.216 + (1 - 1.2)²·0.432 + (2 - 1.2)²·0.288 + (3 - 1.2)²·0.064 = 0.72;

σ_x = √0.72 ≈ 0.85;

μ_3 = (0 - 1.2)³·0.216 + (1 - 1.2)³·0.432 + (2 - 1.2)³·0.288 + (3 - 1.2)³·0.064 = 0.144;

Sk = μ_3 / σ_x³ ≈ 0.144 / 0.611 ≈ 0.24.

Note that the same characteristics could be calculated much more simply using theorems on numerical characteristics of functions (see Chapter 10).
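Example 2 can also be reproduced numerically (a sketch; the distribution series is built from the binomial formula for 3 independent shots with hit probability 0.4):

```python
from math import comb

# X = number of hits in 3 independent shots with hit probability p = 0.4.
n, p = 3, 0.4
q = 1 - p
pm = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]  # 0.216, 0.432, 0.288, 0.064

m = sum(k * pr for k, pr in enumerate(pm))               # expectation
d = sum((k - m) ** 2 * pr for k, pr in enumerate(pm))    # variance
sd = d ** 0.5                                            # standard deviation
mu3 = sum((k - m) ** 3 * pr for k, pr in enumerate(pm))
sk = mu3 / sd ** 3                                       # skewness

print(round(m, 3), round(d, 3), round(sk, 3))
```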