Solving Common Probability Problems with Python Pt.1 — Binomial

Scenario One:

from scipy import stats
import matplotlib.pyplot as plt
import pandas as pd

# Parameterize the case. Variable names are self-explanatory.
total_trial = 5
yes_odds = 1 / 6
num_of_yes = 3
culm_less_equal = num_of_yes - 1  # "at least 3" is the complement of "at most 2"

# Create a frozen binomial distribution: n = 5 rolls, p = 1/6 chance of a ONE per roll
binom = stats.binom(total_trial, yes_odds)

# Cumulative probability of at most 2 ONEs: P(X <= 2) = P(0) + P(1) + P(2)
less_equal_cdf = binom.cdf(culm_less_equal)

# The complement is the probability of at least 3 ONEs (3, 4, or 5 times): P(X >= 3) = 1 - P(X <= 2)
greater_cdf = 1 - less_equal_cdf
print(f"You have {round(greater_cdf * 100, 2)}% chance of having at least {num_of_yes} x ONEs. Good luck!")

# (Optional) Graph it.
# We want each individual outcome's probability as a list value,
# so use pmf() to get each data point.

# Declare and initialize the empty data variable.
pmf_dict = {
    "xtimes": [],
    "probability": []
}

# Compute the exact probability of 0, 1, 2, ... up to 5 ONEs with the
# probability mass function (PMF) and add each one to the dictionary.
for i in range(total_trial + 1):
    pmf = binom.pmf(i)
    pmf_dict["xtimes"].append(i)
    pmf_dict["probability"].append(pmf)

# Plot: visualize the data.
df = pd.DataFrame(pmf_dict)
print(df)
df.plot.bar(y="probability", x="xtimes")
plt.show()


# Simulation (optional) of the counting test.
# Play 100 rounds (each round = 5 rolls) and count the successes (at least 3 ONEs); repeat 10 times.
print("\n --- Simulation Starts --- \n")

for i in range(10):
    event_count = 0
    simulated = binom.rvs(size=100)  # number of ONEs in each of 100 five-roll rounds

    for j in range(100):
        if simulated[j] >= 3:
            event_count += 1

    print(f"{event_count} of 100 rounds had at least three ONEs.")

print("\n --- Simulation Ends --- \n")

Scenario Two:

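Question to answer: rolling a fair die, how likely is the first ONE to show up on the 1st, 2nd, ..., 5th roll, and how likely is it to get at least one ONE within 5 rolls?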

from scipy import stats
import matplotlib.pyplot as plt
import pandas as pd

# Parameterize the case.
total_trial = 5
yes_odds = 1 / 6

# Geometric Distribution instance
geom = stats.geom(yes_odds)

# Declare the empty data variable.
pmf_dict = {
    "num_of_trial": [],
    "probability": []
}

# Probability that the first ONE shows up exactly on roll 1, 2, ..., 5
for i in range(total_trial):
    pmf_dict["num_of_trial"].append(i + 1)
    pmf_dict["probability"].append(geom.pmf(i + 1))

# Make the DataFrame instance.
df = pd.DataFrame(pmf_dict)
print(df)

# Plot
df.plot.bar(x="num_of_trial")
plt.show()

# Get the cumulative probability: P(first ONE within 5 rolls)
cdf = geom.cdf(total_trial)
print(f"\nWithin one round of {total_trial} rolls, there is a {round(cdf * 100, 2)}% chance of getting at least one ONE.")
  • The chance of a ONE showing on the very first roll is 16.67%, which is simply 1/6.
  • The chance of the first ONE showing on the 2nd roll is 13.89%,
  • on the 3rd roll 11.57%, on the 4th 9.65%, and on the 5th 8.04%.
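The percentages in the list follow directly from the geometric formula P(first ONE on roll k) = (5/6)^(k-1) × 1/6. The minimal sketch below, assuming a fair die as in the scenario, recomputes them by hand, checks them against stats.geom, and shows that the 5-roll cumulative probability equals 1 - (5/6)^5, which is also 1 minus the binomial probability of zero ONEs in 5 rolls from Scenario One. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

p, n = 1 / 6, 5  # fair die, look at the first 5 rolls

# P(first ONE appears on roll k) = (1 - p)^(k - 1) * p, for k = 1, 2, 3, ...
manual_pmf = [(1 - p) ** (k - 1) * p for k in range(1, n + 1)]
scipy_pmf = stats.geom(p).pmf(np.arange(1, n + 1))  # scipy's geom support starts at k = 1

# P(at least one ONE within n rolls) = 1 - P(no ONE in n rolls) = 1 - (1 - p)^n
closed_form_cdf = 1 - (1 - p) ** n

# Same number from the binomial view: 1 - P(zero ONEs in 5 rolls)
binom_view = 1 - stats.binom(n, p).pmf(0)

for k, prob in zip(range(1, n + 1), manual_pmf):
    print(f"first ONE on roll {k}: {prob:.2%}")
print(f"sum of the five: {sum(manual_pmf):.2%}")
print(f"closed form:     {closed_form_cdf:.2%}")
print(f"binomial view:   {binom_view:.2%}")
```

It should print 16.67%, 13.89%, 11.57%, 9.65%, and 8.04%, followed by 59.81% three times: adding up "first ONE on roll k" for k = 1..5 is the same event as "at least one ONE in 5 rolls".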


I occasionally write about software, web, blockchain, machine learning, random thoughts.

Weiming Chen
