— Written by Triangles on October 16, 2015 • updated on November 06, 2015 • ID 19 —

An overview of basic probability concepts, histograms, discrete/continuous variables.

A *probability distribution* shows how each outcome of a random phenomenon is linked to its likelihood. For example, say you are measuring rain levels and you want to know the exact amount of rain for the next week. Given a set of possible outcomes (from 0 to 100 millimeters, for example), a probability distribution tells you the likelihood of each one of those outcomes, from impossible (0) to certain (1).

We previously met random variables, whose purpose is to store the result of a random phenomenon. Hence they are naturally linked to probability distributions: they hold the data that the probability distribution "processes" in order to churn out a likelihood.

Distributions are often represented as tables, histograms or formulas. There are dozens of types of distributions out there, but the basic discriminant is whether your data is *continuous* or *discrete*.

The probability distribution of a discrete random variable is called the **probability function** or the **probability mass function** (aka **PMF**). It's a function that takes as input a value from the random variable, noted as §x§, and spits out the probability for that value to happen, noted as §p(x)§ or §p_X(x)§ (the subscript just tells you which random variable you are working with).

By definition, the PMF §p_X(x)§ is the probability that our random variable takes the value §x§:

§ p_X(x) = P(X = x) §

That's actually pretty obvious and you will notice it in the example below. The PMF, to be valid, must satisfy a couple of conditions:

- §0 <= p_X(x) <= 1§ — the function must always churn out values between 0 and 1;
- §sum_{x}p_X(x) = 1§ — the sum of all single outputs must be exactly 1.

Say you have a digital picture made of 4 total colors: red, green, blue and white, with a random variable X defined as

§ X = {(0, "red"), (1, "green"), (2, "blue"), (3, "white"):} §

and the associated PMF function below:

§ p_X(x) = {(0.23, if x=0), (0.22, if x=1), (0.12, if x=2), (0.43, if x=3):} §

You don't know the actual picture, but someone told you that the PMF is defined like that and you believe them. Hence if you pick a random pixel from that picture there is, for example, a 0.43 (or 43%) chance of picking a white one. So the PMF of white turns out to be

§ p_X(3) = P(X = 3) = 0.43 §

We can also plot a histogram of the probability distribution for each color in the image.
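As a quick sketch, the PMF from the example can be written as a plain Python dictionary and checked against the two validity conditions (the colors and probabilities are the ones defined above):

```python
# PMF from the example: pixel color code -> probability.
pmf = {0: 0.23,  # red
       1: 0.22,  # green
       2: 0.12,  # blue
       3: 0.43}  # white

# Condition 1: every probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in pmf.values())

# Condition 2: the probabilities sum to exactly 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# p_X(3) = P(X = 3): the probability of picking a white pixel.
print(pmf[3])  # 0.43
```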

The probability distribution of a continuous random variable is called the **probability density function** (aka **PDF**). Unlike the PMF, a PDF cannot work with single, specific values. Let's go back to the opening example and suppose you want to know the probability that the amount of rain for the next week will be, say, 1.5 millimeters. Not 1.50001 or 1.49999, but the exact value of 1.5. Since such a measurement would require an infinite amount of precision (we are working with continuous random variables!), the probability drops to zero. For this very reason the PDF works with ranges of values instead.

Formally you have a probability density function §rho_X(x)§ (that's the Greek letter *rho*) for the random variable X, and you ask it for the probability that X falls within a specific range by taking the integral of the PDF between the two endpoints of that range:

§ int_{a}^{b} rho_X(x) dx = P(a < X < b) §

The integral does the trick for the range issue: integrating over a single point always produces zero.

On the other hand, a probability density function must satisfy conditions similar to those of a probability mass function, except that we deal with integrals instead of sums:

- §rho_X(x) >= 0§ — the function must always churn out non-negative values (unlike a PMF, a density may exceed 1 at some points, since probabilities come from areas under the curve, not from the values themselves);
- §int_{-oo}^{+oo} rho_X(x) dx = 1§ — the total probability over all possible values of the continuous random variable X is 1.
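The second condition can be checked numerically. Here I'm assuming a concrete density purely for illustration — the exponential density §rho_X(x) = e^{-x}§ for §x >= 0§ — and approximating its integral with the trapezoidal rule:

```python
import math

def rho(x):
    """Assumed example density: exponential, rho(x) = e^(-x) for x >= 0."""
    return math.exp(-x)

# Approximate the integral of rho over [0, 50] with the trapezoidal rule;
# the tail beyond 50 is negligible (e^-50 is about 2e-22).
n = 100_000
a, b = 0.0, 50.0
h = (b - a) / n
total = sum((rho(a + i * h) + rho(a + (i + 1) * h)) / 2 * h for i in range(n))

print(round(total, 4))  # ~1.0: the density integrates to 1
```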

Let's go back again to the initial scenario. Our continuous random variable X contains the amount of rain for the next week:

§ X = "exact amount of rain for the next week" §

Since we are playing with made-up numbers, I don't know what the actual probability density function looks like or what values it would spit out. Let's fake a §rho_X(x)§ for the purposes of this example.

The horizontal axis is the amount of rain (in millimeters), the vertical axis the probability density for that amount of rain. Now you could ask yourself, for example, what is the probability that the amount of rain will be between 1.0 and 2.0 millimeters:

§ P(1.0 < X < 2.0) = int_{1.0}^{2.0} rho_X(x) dx ~= 0.6 §

Which means that there is a 0.6 probability (or 60%) of measuring between 1.0 and 2.0 millimeters of rain in the next week.

Graphically you are asking for the area under the curve in the interval [1.0, 2.0]. That's why looking for a specific, precise probability value, like P(X = 1.0), does not make sense. It would be like asking for the area of a line, which is zero.
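Both ideas can be sketched numerically. I'm again assuming an exponential density §rho_X(x) = e^{-x}§ for illustration (not the faked density from the plot, so the numbers differ from the 0.6 above): integrating over a range gives a genuine probability, while integrating over a zero-width interval gives exactly 0.

```python
import math

def rho(x):
    """Assumed example density (not the one from the article's plot):
    exponential, rho(x) = e^(-x) for x >= 0."""
    return math.exp(-x)

def prob(a, b, n=10_000):
    """P(a < X < b) via trapezoidal integration of the density."""
    h = (b - a) / n
    return sum((rho(a + i * h) + rho(a + (i + 1) * h)) / 2 * h for i in range(n))

# Probability over a range: here e^-1 - e^-2, roughly 0.23.
print(round(prob(1.0, 2.0), 4))

# "Probability" of one exact value: the integral over a zero-width
# interval, which is always 0 -- the area of a line.
print(prob(1.5, 1.5))  # 0.0
```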

The **skewness** (§s§) defines how much a distribution is symmetric around the mean. In particular:

- §s = 0§: the distribution is symmetric;
- §s > 0§: the distribution is asymmetric with a long tail to the right;
- §s < 0§: the distribution is asymmetric with a long tail to the left.

The **kurtosis**, from Greek κυρτός, kurtos, meaning "curved", defines how much a distribution is "tailed" and "peaky". Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution.
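As a sketch, sample skewness and (excess) kurtosis can be computed from data with the standard moment-based formulas §s = m_3 / m_2^{3/2}§ and §k = m_4 / m_2^2 - 3§, where §m_i§ is the i-th central moment. The data set below is made up, chosen to have a long tail to the right:

```python
def moments(data):
    """Return (skewness, excess kurtosis) from the central moments:
    s = m3 / m2^(3/2),  k = m4 / m2^2 - 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Made-up data with a long tail to the right -> positive skewness.
data = [1, 2, 2, 3, 3, 3, 4, 4, 10]
s, k = moments(data)
print(s > 0)  # True: the long right tail gives s > 0
```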
