# Machine Learning with Python: Statistics

- Categories Machine Learning
- Date 31/05/2018
### Mean, Mode, Median

The mean is the average of a data set.

The mode is the most common number in a data set.

The median is the middle value of a data set once it has been sorted.

#### What is the Mean?

In statistics, the mean is the average of a set of data.

To find the mean, sum all the numbers and then divide by the number of items in the set. For example, take the following set of numbers: 21, 23, 24, 26, 28, 29, 30, 31, 33

First add them all together:

21 + 23 + 24 + 26 + 28 + 29 + 30 + 31 + 33 = 245

Then divide your answer by the number of items in your set. There are 9 numbers, so:

245 / 9 ≈ 27.222
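The same calculation can be sketched in a few lines of Python. This is only an illustration of the two steps above; the variable names are made up for the example.

```python
# The nine values from the worked example above.
data = [21, 23, 24, 26, 28, 29, 30, 31, 33]

total = sum(data)     # step 1: add all the values together
count = len(data)     # step 2: count the items in the set
mean = total / count  # step 3: divide the sum by the count

print(total, count, round(mean, 3))  # 245 9 27.222
```

The standard library's `statistics.mean(data)` should give the same value, so in practice you rarely need to write this by hand.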

#### What is the Mode?

The mode is the most common number in a set. For example, the mode in this set of numbers is 21:

21, 21, 21, 23, 24, 26, 26, 28, 29, 30, 31, 33
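One way to find the mode in Python is to count how often each value occurs and pick the most frequent one. The sketch below uses `collections.Counter` from the standard library; the variable names are chosen for illustration.

```python
from collections import Counter

# The twelve values from the example above.
data = [21, 21, 21, 23, 24, 26, 26, 28, 29, 30, 31, 33]

counts = Counter(data)  # maps each value to how often it occurs
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 21 3
```

`statistics.mode(data)` returns the same value here. Note that a data set can have more than one mode if several values tie for most frequent.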

#### What is the Median?

The median is the middle number in a data set. To find the median, list your data points in ascending order and then find the middle number. The middle number in this set is 28 as there are 4 numbers below it and 4 numbers above:

23, 24, 26, 26, 28, 29, 30, 31, 33

Note: If you have an even number of data points, average the middle two to find the median. For example, the median of this set of numbers is 28.5, which is (28 + 29) / 2.

23, 24, 26, 26, 28, 29, 30, 31, 33, 34
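Both cases, an odd and an even number of data points, can be sketched with a small helper. The function name `median` is just a name chosen for this example, not something from a particular library.

```python
def median(values):
    """Return the middle value of the sorted data (illustrative helper)."""
    ordered = sorted(values)  # the median is defined on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [23, 24, 26, 26, 28, 29, 30, 31, 33]
even_set = [23, 24, 26, 26, 28, 29, 30, 31, 33, 34]

print(median(odd_set))   # 28
print(median(even_set))  # 28.5
```

In real code, `statistics.median` from the standard library does the same job.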

### Normal Distribution

A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations. For example, the bell curve is seen in scores on tests like the SAT and GRE. On a graded test, the bulk of students will score around the average (C), while smaller numbers of students will score a B or D. An even smaller percentage of students score an F or an A. This creates a distribution that resembles a bell (hence the nickname). The bell curve is symmetrical. Half of the data will fall to the left of the mean; half will fall to the right.

Many quantities follow this type of pattern, at least approximately. That's why the normal distribution is widely used in business, statistics and in government bodies like the FDA. Commonly cited examples include:

- Heights of people.
- Measurement errors.
- Blood pressure.
- Points on a test.
- IQ scores.
- Salaries.

#### Properties of a normal distribution

- The mean, mode and median are all equal.
- The curve is symmetric about the center (i.e. around the mean, μ).
- Exactly half of the values are to the left of center and exactly half the values are to the right.
- The total area under the curve is 1.
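These properties can be checked numerically with `statistics.NormalDist` (available in Python 3.8 and later). The parameters below, a mean of 100 and a standard deviation of 15, are assumptions picked only for illustration.

```python
from statistics import NormalDist

# Assumed parameters, chosen only for this example.
dist = NormalDist(mu=100, sigma=15)

# Property 1: mean, median and mode coincide.
print(dist.mean, dist.median, dist.mode)

# Property 2: the curve has the same height at equal distances
# on either side of the mean.
left_density = dist.pdf(dist.mean - 10)
right_density = dist.pdf(dist.mean + 10)

# Property 3: half of the area lies to the left of the mean.
left_half = dist.cdf(dist.mean)

# Property 4: the total area is 1. Here it is approximated by the area
# within 10 standard deviations of the mean, which covers almost all of it.
total_area = dist.cdf(dist.mean + 10 * dist.stdev) - dist.cdf(dist.mean - 10 * dist.stdev)

print(left_density == right_density, left_half, round(total_area, 6))
```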

### Standard Deviation

Standard deviation is a measure of dispersion in statistics. "Dispersion" tells you how much your data is spread out. Specifically, standard deviation shows you how much your data is spread out around the mean or average. For example, are all your scores close to the average? Or are lots of scores way above (or way below) the average score?
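To make this concrete, the sketch below compares two small sets of scores that share the same mean but are spread out very differently. The score lists and the helper name `population_std` are invented for illustration. The helper computes the population standard deviation: the square root of the average squared distance from the mean.

```python
import math

# Hypothetical score lists, invented for illustration; both have a mean of 50.
tight_scores = [48, 49, 50, 51, 52]
wide_scores = [10, 30, 50, 70, 90]

def population_std(values):
    """Square root of the average squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

tight_std = population_std(tight_scores)
wide_std = population_std(wide_scores)

print(round(tight_std, 2))  # 1.41  (scores sit close to the average)
print(round(wide_std, 2))   # 28.28 (scores are far above and below it)
```

The standard library offers `statistics.pstdev` for this population version and `statistics.stdev` for the sample version, which divides by `n - 1` instead of `n`.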

#### What Does it Look Like on a Graph?

The bell curve (what statisticians call a "normal distribution") is commonly seen in statistics as a tool to understand standard deviation.

The following graph of a normal distribution represents a great deal of data in real life. The mean, or average, is represented by the Greek letter μ, in the center. Each segment (colored in dark blue to light blue) represents one standard deviation away from the mean. For example, 2σ means two standard deviations from the mean.
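How much of the data falls within each band can be computed directly. The sketch below uses the standard normal distribution (μ = 0, σ = 1) from `statistics.NormalDist`. The resulting shares are the well-known 68-95-99.7 rule, which is added here for illustration and is not stated in the text above.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mu=0, sigma=1

# Share of the distribution within k standard deviations of the mean.
shares = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

Because every normal distribution is a shifted and scaled version of the standard one, the same shares apply for any mean and standard deviation.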

