# How to calculate variance: 15 steps

Variance is a measure of how spread out a data set is. A low variance indicates that the values are clustered close together; a high variance indicates that they are widely dispersed. The concept is widely used in statistics. For example, comparing the variance of two data sets (such as outcomes for male and female patients) is one way to check whether a variable has a noticeable effect. Variance is also useful when building statistical models, since low variance can be a sign of overfitting.

## Steps

### Method 1 of 2: Calculate the variance of a sample

#### Step 1. Write down your sample data set

In most cases, statisticians only have access to a sample, that is, a subset of the population they are studying. For example, instead of analyzing the cost of every car in Germany, a statistician could find the cost of a random sample of a few thousand cars. The statistician can use this sample to get a good estimate of German car costs, but it will probably not match the actual costs exactly.

• Example:

An analysis of the number of muffins sold each day in a cafeteria gave you this random sample over 6 days: 17, 15, 23, 7, 9, 13. This data set is a sample and not a population, since you do not have the number of muffins sold on every day since the cafeteria opened.

• If you had the daily data for every day since the cafeteria opened, you could calculate the population variance as described in the second method of this article.

#### Step 2. Write down the formula for the sample variance

The variance of a data set indicates how spread out the data points are. The closer the variance is to zero, the more tightly the data points cluster together. When working with a sample, use the following formula to calculate the variance.

• $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

• $s^2$ is the variance. Variance is always measured in squared units.
• $x_i$ represents a term in your data set.
• The sign ∑, meaning "sum", tells you to calculate the following terms for each value of $x_i$, then add them together.
• $\bar{x}$ (written x̅ in running text) is the mean of the sample.
• $n$ is the number of data points in the sample.
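
The sample variance formula above can be sketched as a short Python function. This is only an illustration: the function name `sample_variance` and the list name `muffins` are made up for this example, and the data is the muffin sample from Step 1.

```python
def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        # With a single data point, n - 1 would be zero.
        raise ValueError("sample variance needs at least two data points")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


muffins = [17, 15, 23, 7, 9, 13]  # sample from Step 1
print(sample_variance(muffins))   # 33.2
```

The remaining steps of this method work through the same calculation by hand, one piece at a time.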

#### Step 3. Calculate the mean of the sample

The symbol x̅, or "x-bar", refers to the mean of a sample. Calculate it like any other mean: add all the data points together, then divide the result by the number of data points.

• Example:

First, add the data points together: 17 + 15 + 23 + 7 + 9 + 13 = 84.

Next, divide the result by the number of data points, which is 6 in this case: 84 ÷ 6 = 14.

Sample mean = x̅ = 14.

• You can think of the mean as the center of the data. If the data points cluster near the mean, the variance is low. If they are spread far from the mean, the variance is high.
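
As a quick check of this step, the mean can be computed in Python both by hand and with the standard library's `statistics.mean`. The variable names are illustrative only.

```python
import statistics

muffins = [17, 15, 23, 7, 9, 13]

total = sum(muffins)     # 17 + 15 + 23 + 7 + 9 + 13
n = len(muffins)         # number of data points
sample_mean = total / n

print(total, n, sample_mean)                    # 84 6 14.0
print(statistics.mean(muffins) == sample_mean)  # True
```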

#### Step 4. Subtract the mean from each data point

Now calculate $x_i - \bar{x}$, where $x_i$ represents each number in your data set. Each result is that number's deviation from the mean, or in other words, how far the value lies from the mean.

• Example:

$x_1 - \bar{x} = 17 - 14 = 3$

$x_2 - \bar{x} = 15 - 14 = 1$

$x_3 - \bar{x} = 23 - 14 = 9$

$x_4 - \bar{x} = 7 - 14 = -7$

$x_5 - \bar{x} = 9 - 14 = -5$

$x_6 - \bar{x} = 13 - 14 = -1$

• You can check your work, since the results should add up to zero. This follows from the definition of the mean: the negative results (distances from the mean to the smaller values) exactly cancel the positive results (distances from the mean to the larger values).
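
The deviations, and the check that they add up to zero, can be reproduced with a small Python sketch. The names are illustrative. With other data sets, floating-point rounding can leave a tiny nonzero remainder, so the comparison below uses a tolerance rather than exact equality.

```python
import math

muffins = [17, 15, 23, 7, 9, 13]
mean = sum(muffins) / len(muffins)

# Deviation of each data point from the sample mean.
deviations = [x - mean for x in muffins]
print(deviations)  # [3.0, 1.0, 9.0, -7.0, -5.0, -1.0]

# The deviations should cancel out; allow for rounding error in general.
print(math.isclose(sum(deviations), 0.0, abs_tol=1e-9))  # True
```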

#### Step 5. Square each result

As noted above, the deviations ($x_i - \bar{x}$) add up to zero. This means the average deviation will always be zero, so it tells you nothing about how spread out the data is. To solve this problem, square each deviation. This makes every number non-negative, so the positive and negative values no longer cancel each other out.

• Example:

$(x_1 - \bar{x})^2 = 3^2 = 9$

$(x_2 - \bar{x})^2 = 1^2 = 1$

$(x_3 - \bar{x})^2 = 9^2 = 81$

$(x_4 - \bar{x})^2 = (-7)^2 = 49$

$(x_5 - \bar{x})^2 = (-5)^2 = 25$

$(x_6 - \bar{x})^2 = (-1)^2 = 1$

• You now have the value $(x_i - \bar{x})^2$ for each data point in your sample.
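
In Python, squaring each deviation is a one-line list comprehension. The list `deviations` below simply restates the results of the previous step.

```python
deviations = [3, 1, 9, -7, -5, -1]  # results of Step 4

squared = [d ** 2 for d in deviations]
print(squared)                       # [9, 1, 81, 49, 25, 1]
print(all(s >= 0 for s in squared))  # True: no negative values remain
```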

#### Step 6. Sum the squared values

It's time to calculate the numerator of the formula: $\sum (x_i - \bar{x})^2$. The sign ∑ tells you to add up the expression that follows it for each value of $x_i$. You have already calculated $(x_i - \bar{x})^2$ for each value of $x_i$ in your sample, so all you need to do is add the results together.

• Example:

9 + 1 + 81 + 49 + 25 + 1 = 166.
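
The ∑ sign corresponds to Python's built-in `sum`. As a rough sketch, the numerator can be computed directly from the raw data; the variable names are illustrative.

```python
muffins = [17, 15, 23, 7, 9, 13]
mean = sum(muffins) / len(muffins)

# Numerator of the variance formula: the sum of squared deviations.
sum_of_squares = sum((x - mean) ** 2 for x in muffins)
print(sum_of_squares)  # 166.0
```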

#### Step 7. Divide by n - 1, where n is the number of data points

In the past, statisticians simply divided by n when calculating the variance of a sample. This gives the mean of the squared deviations, which exactly matches the variance of that particular sample. But remember, the sample is only an estimate of a larger population. If you took another random sample and did the same calculation, you would get a different result. It turns out that dividing by n - 1 instead of n gives a better estimate of the variance of the larger population, which is what you are really interested in. This correction is so common that it is now the accepted definition of the variance of a sample.

• Example:

There are six data points in our sample, so n = 6.

Variance of the sample = $s^2 = \frac{166}{6 - 1} = 33.2$.
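
The final division can be cross-checked against Python's standard library, whose `statistics.variance` function also divides by n - 1. The variable names here are illustrative.

```python
import math
import statistics

muffins = [17, 15, 23, 7, 9, 13]
sum_of_squares = 166  # result of Step 6
n = len(muffins)

s_squared = sum_of_squares / (n - 1)
print(s_squared)  # 33.2

# statistics.variance computes the sample variance (n - 1 in the denominator).
print(math.isclose(statistics.variance(muffins), s_squared))  # True
```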

#### Step 8. Understand the variance and the standard deviation

Note that because the formula contains an exponent, variance is measured in the squared unit of the original data. This can be difficult to interpret intuitively. Instead, it is often more convenient to use the standard deviation, which is simply the square root of the variance. This is why the variance of a sample is written $s^2$ and the standard deviation of a sample is written $s$.
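
For the muffin sample, the standard deviation is the square root of 33.2, or roughly 5.76 muffins. A minimal Python sketch, with illustrative variable names, takes the square root by hand and compares it with the standard library's `statistics.stdev`:

```python
import math
import statistics

muffins = [17, 15, 23, 7, 9, 13]

s_squared = statistics.variance(muffins)  # sample variance, in muffins squared
s = math.sqrt(s_squared)                  # standard deviation, in muffins

print(round(s, 2))                                   # 5.76
print(math.isclose(s, statistics.stdev(muffins)))    # True
```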

### Method 2 of 2: Calculate the variance of a population

#### Step 1. Start with a population data set

The term "population" refers to the complete set of relevant observations. For example, if you are studying the age of Texas residents, your population would include the age of every single Texas resident. You would normally organize a data set that large in a table, but here is a smaller example.

• Example:

There are exactly six fish tanks in one room of an aquarium. The six tanks contain the following numbers of fish:

$x_1 = 5$

$x_2 = 5$

$x_3 = 8$

$x_4 = 12$

$x_5 = 15$

$x_6 = 18$

#### Step 2. Write down the formula for the variance of the population

Since a population contains all the data you need, this formula gives you the exact variance of the population. To distinguish it from the sample variance (which is only an estimate), statisticians use different symbols.

• $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$

• $\sigma^2$ is the population variance. This is a lowercase sigma, squared. Variance is measured in squared units.
• $x_i$ represents a term in your data set.
• The terms after ∑ are calculated for each value of $x_i$, then summed.
• $\mu$ is the mean of the population.
• $n$ is the number of data points in the population.
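
The population formula differs from the sample formula only in its denominator. A rough Python sketch follows; the function name `population_variance` and the list name `tanks` are made up for this example, and the data is the fish tank population from Step 1.

```python
def population_variance(data):
    """Mean of the squared deviations from the population mean."""
    n = len(data)
    if n == 0:
        raise ValueError("population variance needs at least one data point")
    mu = sum(data) / n
    # Divide by n, not n - 1: the whole population is known.
    return sum((x - mu) ** 2 for x in data) / n


tanks = [5, 5, 8, 12, 15, 18]
print(population_variance(tanks))  # 24.25
```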

#### Step 3. Find the mean of the population

When analyzing a population, the symbol μ ("mu") represents the arithmetic mean. To find the mean, add all the data points together, then divide the result by the number of data points.

• Example:

mean = $\mu = \frac{5 + 5 + 8 + 12 + 15 + 18}{6} = \frac{63}{6} = 10.5$

#### Step 4. Subtract the mean from each data point

Data points close to the mean will give a result close to zero. Repeat the subtraction for each data point, and you will start to get a sense of how spread out the data is.

• Example:

$x_1 - \mu = 5 - 10.5 = -5.5$

$x_2 - \mu = 5 - 10.5 = -5.5$

$x_3 - \mu = 8 - 10.5 = -2.5$

$x_4 - \mu = 12 - 10.5 = 1.5$

$x_5 - \mu = 15 - 10.5 = 4.5$

$x_6 - \mu = 18 - 10.5 = 7.5$

#### Step 5. Square each result

At this point, some of your results from the previous step will be negative and some will be positive. If you picture your data on a number line, negative results represent numbers to the left of the mean and positive results represent numbers to the right of it. This is no good for calculating variance, since these numbers add up to zero. To avoid this, square each result.

• Example:

$(x_i - \mu)^2$ for each value of $i$ from 1 to 6:

$(-5.5)^2 = 30.25$

$(-5.5)^2 = 30.25$

$(-2.5)^2 = 6.25$

$(1.5)^2 = 2.25$

$(4.5)^2 = 20.25$

$(7.5)^2 = 56.25$
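
The subtraction and squaring steps for the fish tank population can be reproduced with two list comprehensions. This is a sketch with illustrative variable names.

```python
tanks = [5, 5, 8, 12, 15, 18]
mu = sum(tanks) / len(tanks)

deviations = [x - mu for x in tanks]
squared = [d ** 2 for d in deviations]

print(mu)          # 10.5
print(deviations)  # [-5.5, -5.5, -2.5, 1.5, 4.5, 7.5]
print(squared)     # [30.25, 30.25, 6.25, 2.25, 20.25, 56.25]
```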

#### Step 6. Find the mean of the squared values

You now have a value for each data point that is related (indirectly) to how far that data point lies from the mean. Find the mean of these values by adding them together, then dividing by the number of values.

• Example:

population variance = $\frac{30.25 + 30.25 + 6.25 + 2.25 + 20.25 + 56.25}{6} = \frac{145.5}{6} = 24.25$
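
This last division can be checked against Python's `statistics.pvariance`, which computes the population variance (dividing by n). The variable names are illustrative.

```python
import math
import statistics

tanks = [5, 5, 8, 12, 15, 18]
squared = [30.25, 30.25, 6.25, 2.25, 20.25, 56.25]  # results of Step 5

sigma_squared = sum(squared) / len(squared)
print(sum(squared))   # 145.5
print(sigma_squared)  # 24.25

# pvariance works from the raw data and should agree.
print(math.isclose(statistics.pvariance(tanks), sigma_squared))  # True
```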

#### Step 7. Connect this to the formula

If you are not sure how this matches the formula given at the start of this method, try writing out the whole problem longhand.

• After subtracting the mean from each data point and squaring the result, you have the values $(x_1 - \mu)^2$, $(x_2 - \mu)^2$, and so on up to $(x_n - \mu)^2$, where $x_n$ is the last data point in the set.
• To find the mean of these values, add them together and divide by n: $\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}$
• After rewriting the numerator in sigma notation, you have $\frac{\sum (x_i - \mu)^2}{n}$, which is the formula for the variance.

## Tips

• Since variance is difficult to interpret, this value is usually calculated as a starting point for calculating the standard deviation.
• Using n - 1 instead of n in the denominator when analyzing samples is a technique called Bessel's correction. The sample is only an estimate of the full population, and the sample mean is biased toward the sample: because it is calculated from the sample itself, the data points lie closer to it, on average, than to the true population mean, so dividing by n underestimates the variance. Bessel's correction removes this bias. This is related to the fact that once you have listed n - 1 data points, the final, nth data point is already determined, since only one value will produce the sample mean (x̅) used in the variance formula.
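
The effect of Bessel's correction can be seen on a small scale with a Python sketch. It treats the six fish tanks from Method 2 as the population and lists every possible sample of size 2, assuming the samples are drawn with replacement (an assumption made for this illustration, not something stated in the article). Averaged over all of these samples, dividing by n - 1 recovers the population variance, while dividing by n falls short.

```python
from itertools import product
from statistics import pvariance, variance

population = [5, 5, 8, 12, 15, 18]
n = 2  # sample size, kept tiny so every possible sample can be listed

# Every ordered sample of size n drawn with replacement (6 ** 2 = 36 samples).
samples = list(product(population, repeat=n))

# pvariance divides by n (no correction); variance divides by n - 1.
avg_uncorrected = sum(pvariance(s) for s in samples) / len(samples)
avg_corrected = sum(variance(s) for s in samples) / len(samples)

print(pvariance(population))  # 24.25  (true population variance)
print(avg_uncorrected)        # 12.125 (too small on average)
print(avg_corrected)          # 24.25  (matches on average)
```

Any single sample can still give a variance far from 24.25; the correction only ensures that the estimates are right on average.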