Understanding Probability With Python

Published: Fri 09 December 2022
By Alex

In python.

Probability is the branch of mathematics concerned with measuring uncertainty, and it is often used to support decisions.

Introduction

Sets

Python's set data type follows rules similar to sets in mathematics: a set is an unordered, unindexed collection of unique items. Items cannot be changed in place; they can only be added or removed.
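
Here is a quick sketch of these rules in Python. The variable names are only illustrative.

```python
# Duplicates are dropped when a set is created.
rolls = {1, 2, 2, 3, 3, 3}
print(len(rolls))  # 3

# Items can be added or removed, but not modified in place.
rolls.add(4)
rolls.discard(1)
print(rolls == {2, 3, 4})  # True

# Sets are unindexed, so rolls[0] raises a TypeError.
try:
    rolls[0]
    indexable = True
except TypeError:
    indexable = False
print(indexable)  # False
```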

Experiments and Events

An experiment returns values for one or more observations, and those observations carry some level of uncertainty. A single possible outcome of an experiment is a sample point, and the sample space is the set of all possible sample points for that experiment. For a coin flip, for example, the full sample space is written as follows:

my_set = {'H','T'} # set of sample points: heads and tails
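
As a small sketch of how a sample space can be used, the snippet below builds the sample space for two coin flips and treats an event as a subset of it. The names two_flips and at_least_one_head are my own, and the probability is only valid under the assumption that the coin is fair, so that every sample point is equally likely.

```python
from fractions import Fraction
from itertools import product

coin = {'H', 'T'}  # sample space of a single flip

# Sample space of two flips: every ordered pair of single-flip outcomes.
two_flips = set(product(coin, repeat=2))

# An event is a subset of the sample space, e.g. "at least one head".
at_least_one_head = {outcome for outcome in two_flips if 'H' in outcome}

# Assumes a fair coin, so all four sample points are equally likely.
p = Fraction(len(at_least_one_head), len(two_flips))
print(len(two_flips), p)  # 4 3/4
```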

Law of Large Numbers

We often assign probabilities to events occurring (outcomes): the chance that it will rain tomorrow, that candidate x will win an election, that a coin turns up heads. For some of these events the random experiment can be repeated a large number of times. We cannot repeat it infinitely many times, but a large number of repetitions still gives a reliable estimate of how often each outcome occurs. With a coin, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. https://en.wikipedia.org/wiki/Law_of_large_numbers One good application of this law appears in Joel Grus's book "Data Science from Scratch"; I added code from it below. Source: https://github.com/joelgrus/data-science-from-scratch/blob/master/scratch/probability.py

import enum, random

# An Enum is a typed set of enumerated values. We can use them
# to make our code more descriptive and readable.
class Kid(enum.Enum):
    BOY = 0
    GIRL = 1

def random_kid() -> Kid:
    return random.choice([Kid.BOY, Kid.GIRL])

both_girls = 0
older_girl = 0
either_girl = 0

random.seed(0)

for _ in range(10000):
    younger = random_kid()
    older = random_kid()
    if older == Kid.GIRL:
        older_girl += 1
    if older == Kid.GIRL and younger == Kid.GIRL:
        both_girls += 1
    if older == Kid.GIRL or younger == Kid.GIRL:
        either_girl += 1

print("P(both | older):", both_girls / older_girl)     # 0.514 ~ 1/2
print("P(both | either): ", both_girls / either_girl)  # 0.342 ~ 1/3

Coin Flip Again

Here I simulate flipping a coin many times: we flip the coin and record each observation as either heads or tails. As the number of flips grows, the observed proportion of heads converges to its 'true' probability. With a huge number of simulated flips in Python, the proportions of heads and tails both come out very close to 0.5.
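
A minimal sketch of such a simulation could look like this. The function name proportion_of_heads and the fixed seed are my own choices, and the exact numbers printed depend on the seed.

```python
import random

def proportion_of_heads(n_flips: int, rng: random.Random) -> float:
    """Flip a fair coin n_flips times and return the share of heads."""
    heads = sum(rng.choice('HT') == 'H' for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n, rng))
```

With few flips the proportion can be far from 0.5; with many flips it settles close to it.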

Set Operations: Union, Intersection and Complement

The probability of n different events can be expressed using the union, intersection, and complement of sets. When you predict the probability of two different events BOTH occurring, you need to take different rules of probability into account (they apply to more than two random events as well). In set theory, three operations are useful when you look at two sets that may or may not have common elements: union, intersection, and complement (the related set difference a - b keeps the observations in a, but not in b).

  • The union of two events is (A or B): any element that exists in at least one of the sets,
  • The intersection is (A and B): any element that exists in both of the sets,
  • The complement is all possible outcomes outside of the given set. A set together with its complement represents the entire sample space. In the die roll example, the set of even numbers and its complement, the set of odd numbers, cover all possible rolls: {1, 2, 3, 4, 5, 6}.
A = {1, 2, 3}
B = {3, 4, 5}
C =  {7, 8, 9, 10}

# Union of sets
# union of 2 sets
AUB = A|B # A.union(B)
# union of 3 sets
AUBUC = A|B|C # A.union(B, C)
print('\n','A:', A,'\n',
      'B:',B,'\n',
      'C:',C,'\n','\n',
      'Unions:', '\n',
      "A U B:", AUB,'\n',
      "A U B U C :", AUBUC,'\n')
# Intersection of sets
A_n_B = A & B
A_n_B_n_C = A & B & C
print('\n','Intersections:',
      '\n',"A AND B:", A_n_B,
      '\n', "A_n_B_n_C:", A_n_B_n_C)

# Disjunctive Union (Symmetric Difference)
A_xor_B = A ^ B
A_xor_B_xor_C = A ^ B ^ C
print('\n','Disjunctive Union (Symmetric Difference):',
      '\n',"A_xor_B:", A_xor_B,
      '\n',"A_xor_B_xor_C:", A_xor_B_xor_C)

# Relative complement (set difference): A - B is the part of A that lies in Bc (B′)
print('\n','Complement:', 
      '\n','set of elements in A that are not in B - Bc:',A - B,
      '\n','set of elements in B that are not in A - Ac:',B - A)
 A: {1, 2, 3} 
 B: {3, 4, 5} 
 C: {8, 9, 10, 7} 
 
 Unions: 
 A U B: {1, 2, 3, 4, 5} 
 A U B U C : {1, 2, 3, 4, 5, 7, 8, 9, 10} 


 Intersections: 
 A AND B: {3} 
 A_n_B_n_C: set()

 Disjunctive Union (Symmetric Difference): 
 A_xor_B: {1, 2, 4, 5} 
 A_xor_B_xor_C: {1, 2, 4, 5, 7, 8, 9, 10}

 Complement: 
 set of elements in A that are not in B - Bc: {1, 2} 
 set of elements in B that are not in A - Ac: {4, 5}
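
Python has no built-in complement operator, because a complement only makes sense relative to a sample space. A small sketch for the die roll example, with sample_space and evens as my own illustrative names:

```python
sample_space = {1, 2, 3, 4, 5, 6}  # all rolls of a six-sided die
evens = {2, 4, 6}

# The complement is the set difference taken against the whole sample space.
not_evens = sample_space - evens
print(not_evens)  # {1, 3, 5}

# An event and its complement are disjoint and together cover the sample space.
print(evens | not_evens == sample_space)  # True
print(evens & not_evens == set())         # True
```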

Independence vs Dependence of an Event

Event 2 is independent of event 1 if knowing that event 1 occurred does not affect the probability of event 2. Otherwise the events are dependent. Knowing whether events are independent or dependent tells us whether we must update the collection of possible outcomes after the first event. The probability of a dependent event depends on prior knowledge of how the set of possible outcomes has changed.

Box with Socks

An example with a box of socks illustrates independence and dependence:

[Image: probability_dependent.jpg]

In the picture you see a box with five socks: two are blue and three are red. If we take one sock out of the box, what is the probability that the second sock we take out is blue? In situation A we keep the first sock out, so the probability that the next sock is blue depends on the first color: 2/4 if the first sock was red, 1/4 if it was blue. In situation B we pick a sock, look at its color, and put it back, so the probability of getting a blue sock next does not change (2/5 in both cases). In situation A the events are dependent: the first draw affects the second.
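
The two situations can be sketched as a simulation. This is my own illustration, not the code behind the picture; the function name p_second_blue and its parameters are assumptions, and the estimates are approximate because they come from random draws.

```python
import random

def p_second_blue(replace: bool, trials: int = 100_000, seed: int = 0) -> dict:
    """Estimate P(second sock is blue | colour of first sock) by simulation."""
    rng = random.Random(seed)
    counts = {'blue': [0, 0], 'red': [0, 0]}  # first colour -> [blue seconds, draws]
    for _ in range(trials):
        box = ['blue'] * 2 + ['red'] * 3
        first = box.pop(rng.randrange(len(box)))
        if replace:
            box.append(first)  # situation B: put the sock back
        second = rng.choice(box)
        counts[first][0] += second == 'blue'
        counts[first][1] += 1
    return {colour: blue / total for colour, (blue, total) in counts.items()}

print('A (no replacement):  ', p_second_blue(replace=False))
print('B (with replacement):', p_second_blue(replace=True))
```

In situation A the estimates should land near 1/4 (first sock blue) and 2/4 (first sock red); in situation B both should land near 2/5.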

Card Deck

Another example is drawing a card from a deck without putting it back: each draw changes the cards that remain, so successive draws are dependent.

Rolling a Die

Suppose we roll a die twice. Event A is rolling a 6 on the first roll, and event B is rolling a 6 on the second roll. Are events A and B independent? Yes: the result of the first roll has no effect on the second.

Flipping a Coin

Coin flips are always independent of previous flips. Suppose we flip a coin four times. Event A is that the first two flips are both heads, and event B is that the last two flips are both heads. Events A and B are independent.
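
Both the die and the coin example can be checked by listing every outcome. The sketch below uses the standard test for independence, P(A and B) = P(A)⋅P(B), and assumes a fair die and a fair coin. The helper names prob and independent are my own.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """P(event) when every outcome in space is equally likely."""
    return Fraction(len(event & space), len(space))

def independent(a, b, space):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(a & b, space) == prob(a, space) * prob(b, space)

# Two die rolls: A = 6 on the first roll, B = 6 on the second roll.
rolls = set(product(range(1, 7), repeat=2))
six_first = {r for r in rolls if r[0] == 6}
six_second = {r for r in rolls if r[1] == 6}
print(independent(six_first, six_second, rolls))  # True

# Four coin flips: A = heads on flips 1 and 2, B = heads on flips 3 and 4.
flips = set(product('HT', repeat=4))
heads_first_two = {f for f in flips if f[:2] == ('H', 'H')}
heads_last_two = {f for f in flips if f[2:] == ('H', 'H')}
print(independent(heads_first_two, heads_last_two, flips))  # True
```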

Mutually Exclusive Events

Here is an example of mutually exclusive events: you have a box with 10 items of 2 different colors, 5 red and 5 blue. The probability that a randomly selected item is either red or blue is simply the sum of the probabilities of each individual event. The probability of drawing a red item is 5/10 = 1/2, and the probability of drawing a blue item is also 5/10 = 1/2. Therefore, the probability of drawing either type is 1/2 + 1/2 = 1. In a Venn diagram, two mutually exclusive events are shown as a pair of non-overlapping circles, meaning that no outcome of one event is also an outcome of the other.

Not Mutually Exclusive Events

Event A: rolling an odd number. Event B: rolling a number greater than three. These events are not mutually exclusive, because a roll of 5 belongs to both.
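
With Python sets the overlap is easy to check. A small sketch for a six-sided die, with variable names of my own choosing:

```python
odd = {1, 3, 5}                 # event A: rolling an odd number
greater_than_three = {4, 5, 6}  # event B: rolling a number greater than three

overlap = odd & greater_than_three
print(overlap)                             # {5}
print(odd.isdisjoint(greater_than_three))  # False, so not mutually exclusive
```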

Addition Rule

The addition rule describes the probability that one event OR another event (or both) occurs. Take a six-sided die. What if we want to find the probability that event A (rolling an odd number), event B (rolling a number greater than two), or both occur?

This is the probability of the union of A and B. When the events aren't mutually exclusive, the addition rule subtracts the intersection of A and B once, since it is counted twice in the sum of P(A) and P(B):

P(A or B)=P(A)+P(B)−P(A and B)

If the events are mutually exclusive you can still use the same formula with P(A and B) = 0, or you can drop the subtraction, since the intersection is empty and there is nothing to remove.

Here is the code representation of this rule:

def prob_A_or_B(A, B, all_possib):
    # probability of event a
    P_A = len(A)/len(all_possib)

    # probability of event b
    P_B = len(B)/len(all_possib)

    # intersection of events a and b
    inter = A.intersection(B)

    # probability of intersection of events a and b
    P_inter = len(inter)/len(all_possib)

    # addition rule: P(A or B) = P(A) + P(B) - P(A and B)
    return P_A + P_B - P_inter

odds = {1, 3, 5}
greater_than_two = {3, 4, 5, 6}
all_possible_rolls = {1, 2, 3, 4, 5, 6}
if __name__ == "__main__":
    # 3/6 + 4/6 - 2/6 = 5/6, about 0.833
    print(prob_A_or_B(odds, greater_than_two, all_possible_rolls))

The picture below shows the Venn diagram for the addition rule when both events can occur at the same time versus when they are mutually exclusive.

[Image: proba_add.jpg]

Conditional Probability

Two dependent events are illustrated with the sock example in the section "Independence vs Dependence of an Event". If the probability of the second event depends on whether the first event occurred, that probability is described as a conditional probability.

Given That Vertical Line...

Conditional probability measures the probability of one event occurring given that another has already occurred. The word "given" is denoted with a vertical line: P(Red Second ∣ Blue First)

[Image: probability_dependent.jpg]

In diagram A (we do not put the socks back) we know that: P(Red Second|Blue First) = $\frac{3}{4}$

In diagram B (each sock is put back after it is picked) the conditional probability of picking a red or a blue sock second is unaffected by the first color:

P(Red Second|Blue First) = P(Red Second)

and

P(Blue First|Red Second) = P(Blue First)
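
These conditional probabilities can be verified by counting. The sketch below lists every equally likely (first, second) pair of socks, keeps only the pairs where the first sock is blue, and counts how many of those have a red second sock. The function name and the index-based representation are my own assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

socks = ['blue', 'blue', 'red', 'red', 'red']
indices = range(len(socks))

def p_red_second_given_blue_first(draws):
    """P(Red Second | Blue First) over equally likely (first, second) index pairs."""
    blue_first = [(i, j) for i, j in draws if socks[i] == 'blue']
    red_second = [(i, j) for i, j in blue_first if socks[j] == 'red']
    return Fraction(len(red_second), len(blue_first))

without_replacement = list(permutations(indices, 2))  # diagram A: 20 ordered pairs
with_replacement = list(product(indices, repeat=2))   # diagram B: 25 ordered pairs

print(p_red_second_given_blue_first(without_replacement))  # 3/4
print(p_red_second_given_blue_first(with_replacement))     # 3/5
```

In diagram B the result, 3/5, is the same as the unconditional P(Red Second), which is what independence means.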

Multiplication Rule For Simultaneous Events

The multiplication rule describes the probability that two events, A and B, both happen: P(A and B), which is the probability of the intersection of A and B. The general formula is: P(A and B)=P(A)⋅P(B∣A)

Dependent Events

Take the box of five socks again: two are blue, and three are red. What is the probability of choosing a blue sock first AND a blue sock second if we pick two without replacement? Those events are dependent, so the general formula applies:

P(Blue 1st and Blue 2nd)=P(Blue 1st)⋅P(Blue 2nd∣Blue 1st)

P(Blue 1st and Blue 2nd)= $\frac{2}{5}*\frac{1}{4} = \frac{1}{10}$

Tree Diagram

Below I want to illustrate how to use tree diagrams to map out possible outcomes. Each branch represents a specific sequence of events. The probabilities of all branches sum to one. To calculate the probability that a given branch (sequence of outcomes) occurs, we multiply the probabilities along that branch.

[Image: three_diagram.png]

The product rule has the form P(1st Red and 2nd Blue)=(prob1 ∗ prob2)=final_prob. When first selecting a red sock and then a blue one, you will see: P(1st Red and 2nd Blue)=(0.6∗0.5)=0.3
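
The whole tree for the sock box (two blue, three red, no replacement) can be sketched in a few lines. The dictionary layout is my own choice; the probabilities come from the example above.

```python
from fractions import Fraction

# First-draw probabilities for a box with 2 blue and 3 red socks.
first = {'blue': Fraction(2, 5), 'red': Fraction(3, 5)}

# Second-draw probabilities given the first colour (no replacement).
second_given_first = {
    'blue': {'blue': Fraction(1, 4), 'red': Fraction(3, 4)},
    'red': {'blue': Fraction(2, 4), 'red': Fraction(2, 4)},
}

# Multiply along each branch: P(first and second) = P(first) * P(second | first).
branches = {
    (f, s): first[f] * p
    for f, given in second_given_first.items()
    for s, p in given.items()
}

for (f, s), p in branches.items():
    print(f'P(1st {f} and 2nd {s}) = {p}')
print('total:', sum(branches.values()))  # total: 1
```

The branch for a red sock followed by a blue one comes out as 3/10, which matches the 0.3 computed above.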

Independent Events

The probability of two independent events both occurring is easier to compute: P(A and B)=P(A)⋅P(B), because P(B∣A) = P(B). In the case of a fair coin, the probability of getting tails on both flips would be: P(A and B)=0.5⋅0.5=0.25. In the case of a fair die, the probability of getting 6 on both rolls would be: 1/6 * 1/6 = 1/36.

Example on Conditional Probability

As we saw previously, conditional probability enters the multiplication rule for both independent and dependent events.

Suppose that the following is true :

20 percent of the population has social anxiety.
80 percent of the population does not have it.

Now suppose a group of people take an introversion test. The possible results of the test are shown in the next set of branches.

If a person has anxiety, there is an 85% chance their test will be positive (introvert) and a 15% chance it will be negative (extrovert). This is labeled as:

P(INTR|ANX)=0.85
and
P(EXTR|ANX)=0.15

If a person does not have anxiety, there is a 98% chance their test will come back extrovert and a 2% chance it will come back introvert. This can be labeled as:

P(EXTR|NO ANX)=0.98
and
P(INTR|NO ANX)=0.02

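Finally, let’s look at the four possible pairs of outcomes that form the final branches of our diagram. Each one is the product of the probabilities along its branch, for example P(ANX and INTR)=0.20⋅0.85=0.17: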

P(ANX and INTR)=0.17
P(ANX and EXTR)=0.03
P(NO ANX and INTR)=0.016
P(NO ANX and EXTR)=0.784

Together, all potential outcomes add up to one.
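
The four joint probabilities can be reproduced by multiplying along each branch. This is a sketch with my own variable names; the inputs are the percentages stated above.

```python
p_anx = 0.20
p_intr_given = {'ANX': 0.85, 'NO ANX': 0.02}  # P(INTR | condition)
p_condition = {'ANX': p_anx, 'NO ANX': 1 - p_anx}

joint = {}
for condition, p_cond in p_condition.items():
    p_intr = p_intr_given[condition]
    joint[(condition, 'INTR')] = p_cond * p_intr
    joint[(condition, 'EXTR')] = p_cond * (1 - p_intr)

for pair, p in joint.items():
    print(pair, round(p, 3))
print('total:', round(sum(joint.values()), 3))
```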

It’s great that we have all this information. However, we are missing something. If someone gets a positive result, what is the probability that they have anxiety? Notationally, we can write this probability as:

P(ANX|INTR)

[Image: three_diag_anx.png]

Look carefully at the image: under the column for event 2 we expressed a set of conditional probabilities. It remains to calculate how likely a person is to have anxiety if that person gets a positive introvert test result, that is, P(ANX|INTR).

We can use the tree diagram to calculate the probability that a patient has anxiety, given that the patient tested positive for introversion. Pretty silly example, I know... you are a person who has tested positive for introversion and you can feel that you have anxiety, but some metrics are needed. That's why you are here.

Bayes’ Theorem

Now, look at how we solve this problem with Bayes’ theorem, which states the following:

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

$P(ANX|INTR) = \frac{P(INTR|ANX)P(ANX)}{P(INTR)}$

From the diagram above we know: P(INTR∣ANX)=0.85.

We also know: P(ANX)=0.20

Is P(INTR) something we know? There are four possible outcomes of the introversion test:

Having anxiety and testing introvert             # <--relevant
Having anxiety and testing extrovert
Not having anxiety and testing introvert         # <--relevant 
Not having anxiety and testing extrovert

The two outcomes where a patient tests positive are relevant for P(INTR).

P(INTR)=P(ANX and INTR)+P(NO ANX and INTR)

P(INTR)=0.17+0.016

P(INTR)=0.186

P(ANX∣INTR)=(0.85⋅0.20)/0.186≈0.914

There is a 91.4% chance that you actually are anxious given that you tested as an introvert. This is not obvious from the information outlined in our tree diagram.
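
The same calculation can be sketched as a small function. The name bayes and its parameter names are my own; the denominator P(B) is built from the two relevant branches, exactly as P(INTR) was above.

```python
def bayes(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) using Bayes' theorem and the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

p_anx_given_intr = bayes(p_b_given_a=0.85, p_a=0.20, p_b_given_not_a=0.02)
print(round(p_anx_given_intr, 3))  # 0.914
```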
