Python – Statistics – Probability & Sample Distribution

Probability & Sample Distribution

Conditional Probabilities

You’re testing for a disease and advertising that the test is 99% accurate:

    • if you have the disease, you will test positive 99% of the time,
    • if you don’t have the disease, you will test negative 99% of the time.

Let’s say that 1% of all people have the disease and someone tests positive.

What’s the probability that the person has the disease? The correct setup is Bayes’ theorem: P(disease | positive) = P(positive | disease) × P(disease) / P(positive).
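
The setup for this problem can be checked numerically. The following is a minimal sketch; the variable names are illustrative and the numbers are taken from the problem statement above.

```python
# Numbers from the problem statement; variable names are illustrative.
p_disease = 0.01              # prior: 1% of all people have the disease
p_pos_given_disease = 0.99    # test positive when you have the disease
p_neg_given_healthy = 0.99    # test negative when you don't have it

# False positive rate is the complement of the true negative rate.
p_pos_given_healthy = 1 - p_neg_given_healthy

# Total probability of a positive test (sick and healthy people combined).
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(round(p_disease_given_pos, 4))  # 0.5
```

Even with a "99% accurate" test, a positive result here only means a 50% chance of having the disease, because the disease itself is rare.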

Bayes’ theorem applied

You have two coins in your hand. Out of the two coins, one is a real coin (heads and tails) and the other is a faulty coin with tails on both sides.

You are blindfolded and forced to choose a random coin and then toss it in the air. The coin lands with tails facing upwards. Find the probability that this is the faulty coin.
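
One way to work this out is to apply Bayes' theorem with exact fractions. This is a small sketch under the assumption that each coin is equally likely to be picked; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: each coin is picked with probability 1/2.
p_faulty = Fraction(1, 2)
p_real = Fraction(1, 2)

# Likelihood of tails for each coin.
p_tails_given_faulty = Fraction(1)      # tails on both sides
p_tails_given_real = Fraction(1, 2)     # ordinary heads/tails coin

# Total probability of seeing tails.
p_tails = p_tails_given_faulty * p_faulty + p_tails_given_real * p_real

# Bayes' theorem: P(faulty | tails)
p_faulty_given_tails = p_tails_given_faulty * p_faulty / p_tails
print(p_faulty_given_tails)  # 2/3
```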

Central Limit Theorem

With a large enough collection of samples from the same population, the sample means will be approximately normally distributed.
Note that this makes almost no assumptions about the underlying distribution of the data (it only needs a finite variance); with a reasonably large sample size of roughly 30 or more, the approximation usually holds well whatever the population looks like.
The central limit theorem matters because it tells us the distribution of our sample mean will be approximately normal, so we can perform hypothesis tests.
More concretely, we can assess the likelihood that a given mean came from a particular distribution and then, based on this, reject or fail to reject our hypothesis. This underpins all of the A/B testing you see in practice.

 

Law of Large Numbers

The law of large numbers states that as the size of a sample increases, the sample mean more accurately reflects the population mean.

 

Handy tool –> list comprehension

 

Samples of a rolled dice

The law of large numbers is shown here –> the larger the sample, the closer the sample mean gets to the population mean. It’s important to distinguish between the law of large numbers and the central limit theorem in interviews.
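
A rough sketch of the die-rolling experiment, using a list comprehension to generate the rolls. The seed, the sample sizes and the helper name `sample_mean` are arbitrary choices for illustration, not taken from the original exercise.

```python
import random

random.seed(42)  # arbitrary seed, only for reproducibility

def sample_mean(n_rolls):
    """Roll a fair six-sided die n_rolls times and return the average."""
    rolls = [random.randint(1, 6) for _ in range(n_rolls)]  # list comprehension
    return sum(rolls) / n_rolls

# The population mean of a fair die is (1+2+3+4+5+6) / 6 = 3.5.
means = {n: sample_mean(n) for n in (10, 100, 1000, 100000)}
for n, m in means.items():
    print(n, round(m, 3))
```

The small samples can land noticeably away from 3.5, while the largest one should sit very close to it.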

Simulating the central limit theorem

 

 

Whether we took 100 or 1000 sample means, the distribution was still approximately normal. This will generally be the case when each sample is large enough (typically above 30 observations). That’s the central limit theorem at work.
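
The simulation can be sketched roughly as follows. The population used here (an exponential distribution with mean 1, which is clearly not normal), the seed and the function name `draw_sample_means` are assumptions made for illustration; the original exercise may have used different data.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only for reproducibility

SAMPLE_SIZE = 30  # observations per sample (the rule-of-thumb size)

def draw_sample_means(n_samples, sample_size=SAMPLE_SIZE):
    """Return the mean of each of n_samples samples from a skewed population."""
    means = []
    for _ in range(n_samples):
        # Exponential population with mean 1 and standard deviation 1
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

for n_samples in (100, 1000):
    means = draw_sample_means(n_samples)
    center = statistics.mean(means)
    spread = statistics.stdev(means)
    # Share of sample means within one standard deviation of their center
    within_one_sd = sum(abs(m - center) <= spread for m in means) / n_samples
    print(n_samples, round(center, 3), round(spread, 3), round(within_one_sd, 3))
```

If the sample means are approximately normal, their center should be near the population mean of 1, their spread near 1/sqrt(30) ≈ 0.18, and roughly 68% of them should fall within one standard deviation of the center.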

 

Probability Distributions

What is it?

  • describes the likelihood of each possible outcome
  • the probabilities must all add up to 1 and the distribution can be
    • discrete (like the roll of a die)
    • continuous (like the amount of rainfall)

Example of a continuous probability distribution: the total area under the curve adds up to 1.

  • Most common:
    • Bernoulli
    • Binomial
    • Poisson
    • Normal
      –> use the rvs() method of the scipy.stats distributions to simulate each of them, and visualize with matplotlib

Bernoulli distribution

Discrete distribution that models the probability of two possible outcomes (for example, heads or tails).

plt.hist(bernoulli.rvs(p=0.5, size=1000))

Both heads and tails have the same probability of 0.5, so the two values appear roughly equally often in this sample.
Since there are only 2 possible outcomes in a Bernoulli trial, the probability of one is always 1 minus the probability of the other.
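
To see this without plotting, here is a small standard-library sketch of the same experiment. It stands in for `bernoulli.rvs(p=0.5, size=1000)` from `scipy.stats`; the seed and variable names are arbitrary.

```python
import random

random.seed(1)  # arbitrary seed, only for reproducibility

p = 0.5        # probability of success (say, heads = 1)
n_draws = 1000

# Each draw is 1 with probability p and 0 otherwise.
draws = [1 if random.random() < p else 0 for _ in range(n_draws)]

share_ones = sum(draws) / n_draws
share_zeros = 1 - share_ones   # the other outcome gets the remaining share
print(share_ones, share_zeros)
```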

 

 

 

Binomial distribution

The sum of the outcomes of multiple independent Bernoulli trials, each of which has a defined success and failure.

plt.hist(binom.rvs(2, 0.5, size = 10000))

–> Result of a sample representing the number of heads in two consecutive coin flips using a fair coin, taking the form of a binomial distribution

It’s used to model the number of successful outcomes in a fixed number of trials where each trial has the same probability of success.
These parameters are often referred to as
k – number of successes
n – number of trials
p – probability of success

For this exercise, consider a game where you are trying to make a ball in a basket. You are given 10 shots and you know that you have an 80% chance of making a given shot. To simplify things, assume each shot is an independent event.
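
The exercise does not list its questions here, so the sketch below picks a few illustrative ones (exactly 8 makes, at least 8 makes, expected makes). It uses the binomial probability mass function written by hand with `math.comb`; `scipy.stats.binom.pmf(k, n, p)` should give the same values. The helper name `binom_pmf` is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.8  # 10 shots, 80% chance of making each one

# Illustrative questions (assumed, not from the original exercise):
p_exactly_8 = binom_pmf(8, n, p)
p_at_least_8 = sum(binom_pmf(k, n, p) for k in range(8, n + 1))
expected_makes = n * p  # mean of a binomial distribution

print(round(p_exactly_8, 4))   # 0.302
print(round(p_at_least_8, 4))  # 0.6778
print(expected_makes)          # 8.0
```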

 

Normal distribution

The normal distribution is a bell-shaped continuous probability distribution.

The overlay serves as a reminder of the 68–95–99.7 rule:
about 68% of values fall within 1 standard deviation of the mean
about 95% fall within 2 standard deviations
about 99.7% fall within 3 standard deviations
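
These percentages can be verified from the cumulative distribution function of a standard normal. A minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8+); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```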

 

Poisson distribution

Like the binomial distribution, the Poisson distribution represents a count: the number of times something happened.

It’s parameterized not by a probability p and a number of trials n, but by an average rate, denoted lambda (λ).
As the rate of events changes, the distribution changes as well.
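
To illustrate how the shape depends on the rate, the sketch below evaluates the Poisson probability mass function by hand for two arbitrary rates (1 and 4). The helper name `poisson_pmf` is hypothetical; `scipy.stats.poisson.pmf(k, mu)` should give the same values.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the average rate is lam."""
    return lam**k * exp(-lam) / factorial(k)

# Arbitrary example rates: a low rate piles up near 0,
# a higher rate shifts the mass to larger counts.
for lam in (1, 4):
    probs = [round(poisson_pmf(k, lam), 3) for k in range(8)]
    print(lam, probs)
```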

In a 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in a period of 1 hour?
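
A short sketch of one way to solve this, assuming the four 15-minute intervals are independent and the rate is constant (the usual Poisson assumptions, not stated explicitly in the question). Variable names are illustrative.

```python
from math import exp, log

p_at_least_one_15 = 0.20            # given: at least one star in 15 minutes
p_none_15 = 1 - p_at_least_one_15   # no star in 15 minutes

# One hour = four independent 15-minute intervals with no star in any of them.
p_none_60 = p_none_15 ** 4
p_at_least_one_60 = 1 - p_none_60
print(round(p_at_least_one_60, 4))  # 0.5904

# Same answer through the Poisson rate: P(0 events) = exp(-lambda).
rate_15 = -log(p_none_15)           # average stars per 15 minutes
p_poisson = 1 - exp(-4 * rate_15)   # the rate over one hour is 4 * rate_15
print(round(p_poisson, 4))          # 0.5904
```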