When studying correlations, one tries to determine whether there is any relationship between two indicators in the same sample (for example, between the height and weight of children, or between IQ level and school performance) or between two different samples (for example, when comparing pairs of twins). If such a relationship exists, the next question is whether an increase in one indicator is accompanied by an increase (positive correlation) or a decrease (negative correlation) in the other.

In other words, correlation analysis helps to establish whether the values of one indicator can be predicted from the known values of another.

Until now, when analyzing the results of our experiment on the effects of marijuana, we have deliberately ignored one indicator: reaction time. Meanwhile, it would be interesting to check whether there is a connection between the effectiveness of reactions and their speed. This would allow us to assert, for example, that the slower a person is, the more accurate and efficient his actions are, and vice versa.

For this purpose, two different methods can be used: the parametric method of calculating the Bravais-Pearson coefficient (r), and the calculation of the Spearman rank correlation coefficient (r_s), which applies to ordinal data and is therefore nonparametric. First, however, let us understand what a correlation coefficient is.

Correlation coefficient

The correlation coefficient is a value that can vary from -1 to +1. In the case of a complete positive correlation, the coefficient is +1, and in the case of a complete negative correlation, it is -1. On a graph, this corresponds to a straight line passing through the points plotted for each pair of values.


If these points do not line up in a straight line but form a "cloud," the absolute value of the correlation coefficient becomes less than one and, as the cloud becomes rounder, approaches zero.

If the correlation coefficient is 0, there is no linear relationship between the two variables.
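The three situations just described can be checked numerically. The following is only a minimal sketch using made-up numbers: the function name `pearson_r` and the data are assumptions introduced for illustration. It shows points on a rising line, points on a falling line, and a case where the coefficient is zero even though one variable is computed from the other, which is a reminder that the coefficient only measures how well the points fit a straight line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]                            # illustrative values only
r_up = pearson_r(x, [2 * v + 1 for v in x])      # points on a rising straight line
r_down = pearson_r(x, [-3 * v + 4 for v in x])   # points on a falling straight line
r_curve = pearson_r(x, [v * v for v in x])       # y = x squared: related, but not linearly

print(r_up, r_down, r_curve)
```

The first two values are +1 and -1; the third is 0 although y is fully determined by x.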

In the humanities, a correlation is considered strong if its coefficient is greater than 0.60; if it exceeds 0.90, the correlation is considered very strong. However, the sample size is of great importance for drawing conclusions about the relationship between variables: the larger the sample, the more reliable the value of the obtained correlation coefficient. There are tables of critical values of the Bravais-Pearson and Spearman correlation coefficients for different numbers of degrees of freedom (equal to the number of pairs minus 2, i.e. n - 2). Only correlation coefficients greater than these critical values can be considered reliable. Thus, for a correlation coefficient of 0.70 to be reliable, at least 8 pairs of data must be included in the analysis (df = n - 2 = 6) when calculating r (Table B.4), and 7 pairs of data (df = n - 2 = 5) when calculating r_s (Table 5 in Appendix B.5).

Bravais–Pearson coefficient

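To calculate this coefficient, use the following formula (it may look different in different authors' presentations):

$$ r = \frac{\sum XY - n\,\bar{X}\,\bar{Y}}{(n - 1)\,S_X\,S_Y} $$

where $\sum XY$ is the sum of the products of the data from each pair;

$n$ is the number of pairs;

$\bar{X}$ is the mean of variable X;

$\bar{Y}$ is the mean of variable Y;

$S_X$ is the standard deviation of the distribution of X;

$S_Y$ is the standard deviation of the distribution of Y.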
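The quantities listed above can be turned into a short calculation. This is a sketch under stated assumptions, not the author's own procedure: it assumes the formula has the form r = (ΣXY - n·X̄·Ȳ) / ((n - 1)·S_X·S_Y), that the standard deviations are sample standard deviations (divisor n - 1), and it uses hypothetical data and function names.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation (divisor n - 1); this choice is an assumption."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def bravais_pearson(xs, ys):
    """r = (sum of XY - n * mean_X * mean_Y) / ((n - 1) * S_X * S_Y)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of products of each pair
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return (sum_xy - n * mean_x * mean_y) / ((n - 1) * sample_sd(xs) * sample_sd(ys))

# Hypothetical scores, invented purely to exercise the function
hits = [12, 15, 11, 18, 14, 16]
times = [9, 14, 10, 17, 12, 16]
r = bravais_pearson(hits, times)
print(round(r, 2))
```

With the n - 1 divisor in the standard deviations, this expression is algebraically the same as the usual covariance-based definition of the coefficient.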

We can now use this coefficient to determine whether there is a relationship between the subjects' reaction time and the effectiveness of their actions. Take, for example, the background level of the control group.

$$ n\,\bar{X}\,\bar{Y} = 15 \times 15.8 \times 13.4 = 3175.8; $$

$$ (n - 1)\,S_X\,S_Y = 14 \times 3.07 \times 2.29 = 98.42; $$

$$ r = \frac{\sum XY - 3175.8}{98.42} $$
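The two intermediate products can be re-checked in a few lines. This is a small sketch using only the figures given in the text (n = 15, means 15.8 and 13.4, standard deviations 3.07 and 2.29). The sum of products ΣXY is not given here, so the hypothetical helper `r_from_sum_xy` takes it as an argument rather than guessing it, and it assumes the formula r = (ΣXY - n·X̄·Ȳ) / ((n - 1)·S_X·S_Y).

```python
n = 15
mean_x, mean_y = 15.8, 13.4   # means of the two variables, as given in the text
s_x, s_y = 3.07, 2.29         # standard deviations, as given in the text

correction = n * mean_x * mean_y       # n * mean_X * mean_Y
denominator = (n - 1) * s_x * s_y      # (n - 1) * S_X * S_Y

def r_from_sum_xy(sum_xy):
    """Correlation coefficient once the sum of products of the pairs is known."""
    return (sum_xy - correction) / denominator

print(round(correction, 1), round(denominator, 2))
```

A sum of products below 3175.8 gives a negative coefficient, and one above it gives a positive coefficient.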

A negative correlation coefficient may mean that the longer the reaction time, the lower the performance. However, its value is too small to allow us to talk about a reliable relationship between these two variables.


(n - 1) S_X S_Y = ……

What conclusion can be drawn from these results? If you think there is a relationship between the variables, is it direct or inverse? Is it reliable [see Table 4 (in Appendix B.5) for the critical values of r]?

Spearman's rank correlation coefficient r_s

This coefficient is easier to calculate, but the results are less precise than with r. This is because the Spearman coefficient uses only the order of the data, not their quantitative values or the intervals between classes.

The point is that the Spearman rank correlation coefficient (r_s) only checks whether the ranking of the data in one series is the same as the ranking in another series for the same sample, paired with the first (for example, will students be "ranked" in the same way in psychology and in mathematics, or even by two different psychology teachers?). If the coefficient is close to +1, the two rankings are practically identical; if it is close to -1, we can speak of a complete inverse relationship.

The coefficient r_s is calculated by the formula

$$ r_s = 1 - \frac{6 \sum d^2}{n\,(n^2 - 1)} $$

where $d$ is the difference between the ranks of the paired values (regardless of its sign) and $n$ is the number of pairs.
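As a minimal sketch, assuming the standard form of the formula, r_s = 1 - 6Σd² / (n(n² - 1)), the calculation from two lists of ranks looks like this. The function name and the student ranks are hypothetical, chosen to echo the psychology and mathematics example above.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference in each pair."""
    n = len(ranks_a)
    # d is squared, so its sign does not matter
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks of five students in two subjects
psychology = [1, 2, 3, 4, 5]
mathematics = [2, 1, 4, 3, 5]
r_s = spearman_from_ranks(psychology, mathematics)
print(round(r_s, 2))   # sum of d^2 is 4, so r_s = 1 - 24/120 = 0.8
```

Identical rankings give +1 and exactly reversed rankings give -1, which matches the interpretation given above.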

Typically, this nonparametric test is used when conclusions are needed not so much about the intervals between the data as about their ranks, and also when the distribution curves are too asymmetrical to allow the use of parametric criteria such as the coefficient r (in these cases it may be necessary to convert quantitative data into ordinal data).

Since this is the case with the distribution of efficiency and reaction time values in the experimental group after exposure, you can repeat the calculations that you have already done for this group, this time not for the coefficient r but for r_s. This will allow you to see how much the two indicators differ*.

*It should be remembered that

1) for the number of hits, rank 1 corresponds to the highest, and 15 to the lowest performance, while for reaction time, rank 1 corresponds to the shortest time, and 15 to the longest;

2) tied (ex aequo) values are given the average of the ranks they would otherwise occupy.
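The two ranking rules in this note can be sketched as follows. This is an illustrative helper, not part of the original text: the name `average_ranks`, the `highest_first` flag and the sample values are assumptions.

```python
def average_ranks(values, highest_first=False):
    """Rank values from 1 upward; tied values share the mean of the positions they occupy."""
    ordered = sorted(values, reverse=highest_first)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    mean_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    return [mean_rank[value] for value in values]

# Number of hits: rank 1 goes to the highest performance
hit_ranks = average_ranks([10, 20, 20, 30], highest_first=True)
# Reaction time: rank 1 goes to the shortest time
time_ranks = average_ranks([0.5, 0.3, 0.3])
print(hit_ranks, time_ranks)
```

In the first call the two scores of 20 would occupy positions 2 and 3, so each receives rank 2.5.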

Thus, as with the coefficient r, the result obtained, although positive here, is unreliable. Which of the two results is more plausible: r = -0.48 or r_s = +0.24? This question can only arise if the results are reliable.

I would like to emphasize once again that the essence of these two coefficients is somewhat different. A negative coefficient r indicates that efficiency is often higher when the reaction time is shorter, whereas the coefficient r_s was calculated to check whether faster subjects always respond more accurately, and slower ones less accurately.

Since the coefficient r_s obtained for the experimental group after exposure is 0.24, no such trend is evident here. Try to work through the data for the control group after the intervention on your own, knowing that Σd² = 122.5:

r_s = ……; is it reliable?

What is your conclusion? ……


So, we have looked at various parametric and nonparametric statistical methods used in psychology. Our review was very superficial, and its main task was to make the reader understand that statistics is not as scary as it seems and requires mostly common sense. We remind you that the "experimental" data we dealt with here are fictitious and cannot serve as a basis for any conclusions. However, such an experiment would really be worth conducting. Since a purely classical technique was chosen for this experiment, the same statistical analysis could be used in many different experiments. In any case, it seems to us that we have outlined some main directions that may be useful to those who do not know where to start with a statistical analysis of their results.

In scientific research there is often a need to find a connection between outcome variables and factor variables (the yield of a crop and the amount of precipitation, the height and weight of a person in groups homogeneous by gender and age, heart rate and body temperature, etc.).

The latter (factor variables) are characteristics that contribute to changes in the former (outcome variables) associated with them.

Correlation analysis involves determining the relationship between the characteristics being studied, and therefore the tasks of correlation analysis can be supplemented with the following:

  • identification of factors that have the greatest influence on the resulting characteristic;
  • identification of previously unexplored causes of connections;
  • construction of a correlation model with its parametric analysis;
  • study of the significance of the relationship parameters and their interval estimation.

Displaying results

The results of correlation analysis can be presented in text and graphic forms. In the first case they are presented as a correlation coefficient, in the second - in the form of a scatter diagram.

When there is no correlation between the parameters, the points on the diagram are scattered chaotically. A moderate relationship shows a greater degree of order: the points lie at a more or less uniform distance from the central line (median). With a strong relationship the points tend toward a straight line, and at r = 1 the scatter plot is a straight line. With an inverse correlation the points run from the upper left to the lower right corner; with a direct correlation, from the lower left to the upper right.

3D representation of a scatter plot

In addition to the traditional 2D scatter plot display, a 3D graphical representation of correlation analysis is now used.

A scatterplot matrix is also used, which displays all pairwise plots in a single figure in matrix format. For n variables, the matrix contains n rows and n columns. The chart at the intersection of the i-th row and the j-th column is a plot of the variable Xi versus Xj. Thus, each row and each column corresponds to one dimension, and a single cell displays a scatter plot of two dimensions.
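The layout of such a matrix can be sketched without any plotting library. The example below only builds the n × n grid of point lists; the function name, the variable names and the data are hypothetical, and it assumes the common convention that "Xi versus Xj" puts Xi on the vertical axis and Xj on the horizontal axis.

```python
def scatterplot_matrix_cells(columns):
    """Build an n x n grid; cell (i, j) holds the points for plotting X_i against X_j."""
    names = list(columns)
    grid = []
    for row_name in names:            # X_i: vertical axis of every cell in this row
        row = []
        for col_name in names:        # X_j: horizontal axis of every cell in this column
            points = list(zip(columns[col_name], columns[row_name]))
            row.append((row_name, col_name, points))
        grid.append(row)
    return grid

# Hypothetical measurements for three variables
data = {
    "height": [150, 160, 170],
    "weight": [50, 60, 70],
    "age": [10, 11, 12],
}
grid = scatterplot_matrix_cells(data)
print(len(grid), len(grid[0]))   # 3 rows and 3 columns for 3 variables
print(grid[0][1])                # cell in row "height", column "weight"
```

Each cell's point list can then be passed to whatever plotting tool is in use.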

An example of using the correlation analysis method

An interesting study was undertaken in the UK. It examined the connection between smoking and lung cancer and was carried out by means of correlation analysis. The data are presented below.

Initial data for correlation analysis

Professional group

  • Farmers, foresters and fishermen
  • Miners and quarry workers
  • Manufacturers of gas, coke and chemicals
  • Manufacturers of glass and ceramics
  • Workers of furnaces, forges, foundries and rolling mills
  • Electrical and electronics workers
  • Engineering and related professions
  • Woodworking industries
  • Textile workers
  • Manufacturers of work clothes
  • Workers in the food, drink and tobacco industries
  • Paper and print manufacturers
  • Manufacturers of other products
  • Painters and decorators
  • Drivers of stationary engines, cranes, etc.
  • Workers not elsewhere included
  • Transport and communications workers
  • Warehouse workers, storekeepers, packers and filling machine workers
  • Office workers
  • Sports and recreation workers
  • Administrators and managers
  • Professionals, technicians and artists

We begin correlation analysis. For clarity, it is better to start the solution with a graphical method, for which we will construct a scatter diagram.

The diagram shows a direct relationship. However, it is difficult to draw an unambiguous conclusion from the graphical method alone, so we continue the correlation analysis. An example of calculating the correlation coefficient is presented below.

Using software (MS Excel will be described below as an example), we find a correlation coefficient of 0.716, which indicates a strong relationship between the parameters under study. To determine the statistical reliability of this value, we use the corresponding table of critical values: subtracting 2 from the 25 pairs of values gives 23 degrees of freedom, and in that row of the table we find the critical r for p = 0.01 (a stricter threshold is used because these are medical data; in other cases p = 0.05 is sufficient), which is 0.51. Since the calculated r is greater than the critical r, the correlation coefficient is considered statistically reliable.
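The table lookup described here can be cross-checked with a short calculation. This sketch rests on assumptions not stated in the text: that the table of critical r values is derived from Student's t distribution through t = r·sqrt(df / (1 - r²)), and that the two-tailed critical t for 23 degrees of freedom at p = 0.01 is 2.807, a value taken from a standard t table. The function names are illustrative.

```python
from math import sqrt

def t_statistic(r, n_pairs):
    """t value corresponding to a correlation coefficient r with n_pairs - 2 degrees of freedom."""
    df = n_pairs - 2
    return r * sqrt(df / (1 - r ** 2))

def critical_r(t_crit, df):
    """Critical correlation coefficient implied by a critical t value."""
    return t_crit / sqrt(t_crit ** 2 + df)

T_CRIT_DF23_P01 = 2.807   # assumed: two-tailed critical t for df = 23, p = 0.01

r = 0.716
n_pairs = 25
t = t_statistic(r, n_pairs)
r_crit = critical_r(T_CRIT_DF23_P01, n_pairs - 2)

print(round(t, 2), round(r_crit, 2))
print(r > r_crit)   # the calculated r exceeds the critical r
```

Under these assumptions the critical r comes out at about 0.51, in agreement with the tabulated value quoted in the text.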

The correlation coefficient measures the degree of relationship between two variables. Its calculation gives an idea of whether there is a relationship between two data sets. Unlike regression, correlation does not predict the values of quantities. However, calculating the coefficient is an important preliminary step in statistical analysis. For example, we found that the correlation coefficient between the level of foreign direct investment and the GDP growth rate is high. This suggests that, in order to ensure prosperity, it is necessary to create a favorable climate specifically for foreign entrepreneurs. Not such an obvious conclusion at first glance!

Correlation coefficient formula

