Math — Probability Distributions

Reha Tuncer · Math
Tags: Probability, Statistics, Distributions, Binomial, Normal, Python

A progressive study of four fundamental probability distributions implemented as Python classes — binomial, normal (Gaussian), Poisson, and exponential — with parameter estimation from data and PMF/PDF/CDF computation.


Learning Objectives

| # | Concept |
|---|---|
| 1 | Implement the binomial distribution: Bernoulli trials, $n$ and $p$ parameters |
| 2 | Implement the normal (Gaussian) distribution: mean $\mu$, standard deviation $\sigma$ |
| 3 | Implement the Poisson distribution: rate parameter $\lambda$, counting processes |
| 4 | Implement the exponential distribution: rate parameter $\lambda$, waiting times |
| 5 | Estimate distribution parameters from data using the method of moments |
| 6 | Compute PMF (probability mass function) for discrete distributions |
| 7 | Compute PDF (probability density function) for continuous distributions |
| 8 | Compute CDF (cumulative distribution function) for all four distributions |
| 9 | Convert between z-scores and x-values on the normal curve |

Task-by-Task Reference

Each task below highlights the unique challenge it posed and the new technique introduced — techniques from earlier tasks are not repeated.


Task 0 — Binomial Distribution (binomial.py)

Challenge: Model the number of successes in $n$ independent Bernoulli trials, each with success probability $p$, implementing the binomial PMF from scratch using combinatorial formulas.

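Approach: The constructor accepts either explicit $n$ and $p$ or estimates them from data. From data, compute the mean $\mu$, then the variance $\sigma^2$, then solve for $p = 1 - \sigma^2/\mu$ and $n = \text{round}(\mu/p)$. Both formulas follow from the binomial moments $\mu = np$ and $\sigma^2 = np(1-p)$, whose ratio is $\sigma^2/\mu = 1 - p$. The PMF computes ${n \choose k} p^k (1-p)^{n-k}$, with the binomial coefficient accumulated iteratively (as a product, not via a factorial function) to avoid overflow.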
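The estimation step described above can be sketched as a standalone function. This is a minimal illustration, not the code from `binomial.py`: the function name `estimate_binomial`, the error message, and the choice of the population variance (dividing by the number of samples) are all assumptions.

```python
def estimate_binomial(data):
    """Method-of-moments estimate of (n, p) from a list of observed counts.

    Hypothetical helper for illustration; names and checks are assumptions.
    """
    if len(data) < 2:
        raise ValueError("data must contain multiple values")
    mean = sum(data) / len(data)
    # Population variance (divide by N). Dividing by N - 1 is another
    # common convention and would give slightly different estimates.
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    p = 1 - variance / mean   # from variance / mean = 1 - p
    n = round(mean / p)       # from mean = n * p, rounded rather than truncated
    return n, p


# Sample with mean 2 and variance 1, so p = 0.5 and n = 4.
print(estimate_binomial([1, 3, 1, 3]))
```

Because $n$ is rounded, $n \cdot p$ no longer matches the sample mean exactly; some implementations recompute $p$ as mean$/n$ afterwards, which this sketch leaves out.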

New techniques introduced:

| Technique | Purpose |
|---|---|
| Method of moments estimation | Estimate $n$ and $p$ from sample mean and variance |
| Iterative binomial coefficient | Compute ${n \choose k}$ without factorials via product |
| `round()` vs `int()` for parameter estimation | Round $n$ to nearest integer (not truncate) |
| `self.p = float(p)`, `self.n = int(n)` | Explicit type casting for distribution parameters |

Key takeaway: The binomial distribution models "number of successes in $n$ trials." Parameters can be estimated from data: $p = 1 - \text{variance}/\text{mean}$, then $n = \text{round}(\text{mean}/p)$.
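One way to compute the binomial coefficient as a running product, and the PMF on top of it, is sketched below. The function names `binomial_coefficient` and `binomial_pmf` are hypothetical, and the exact loop used in `binomial.py` may differ.

```python
def binomial_coefficient(n, k):
    """n choose k as an iterative product, with no factorial calls."""
    k = min(k, n - k)  # symmetry: C(n, k) == C(n, n - k), shorter loop
    result = 1
    for i in range(1, k + 1):
        # After step i, result equals C(n - k + i, i), so the integer
        # division is always exact.
        result = result * (n - k + i) // i
    return result


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); 0 outside the support."""
    k = int(k)
    if k < 0 or k > n:
        return 0
    return binomial_coefficient(n, k) * p ** k * (1 - p) ** (n - k)


print(binomial_coefficient(10, 3))  # 120
print(binomial_pmf(2, 4, 0.5))      # 0.375
```

Interleaving multiplication and division keeps every intermediate value no larger than the final coefficient, which is the practical benefit of the product form over computing three full factorials.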


Task 1 — Normal Distribution (normal.py)

Challenge: Model the bell-shaped Gaussian distribution and compute probabilities on it, implementing the PDF formula $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ from scratch.

Approach: Store $\mu$ (mean) and $\sigma$ (stddev) as floats. Provide `z_score(x)` to convert x-values to z-scores, `x_value(z)` to convert back, `pdf(x)` for the density, and `cdf(x)`, which uses an approximation of the error function. Class constants `e` and `pi` are hardcoded for precision control.
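A minimal sketch of the class shape described above, covering the z-score conversions and the PDF. The method names come from the text; the digits chosen for `e` and `pi`, the default arguments, and the validation message are assumptions made for illustration.

```python
class Normal:
    """Illustrative normal distribution with hardcoded constants."""

    e = 2.7182818285   # assumed digits; the text does not state them
    pi = 3.1415926536  # assumed digits; the text does not state them

    def __init__(self, mean=0.0, stddev=1.0):
        if stddev <= 0:
            raise ValueError("stddev must be a positive value")
        self.mean = float(mean)
        self.stddev = float(stddev)

    def z_score(self, x):
        """Number of standard deviations that x lies from the mean."""
        return (x - self.mean) / self.stddev

    def x_value(self, z):
        """Inverse of z_score: map a z-score back to a raw value."""
        return self.stddev * z + self.mean

    def pdf(self, x):
        """Gaussian density at x."""
        exponent = -((x - self.mean) ** 2) / (2 * self.stddev ** 2)
        return self.e ** exponent / (self.stddev * (2 * self.pi) ** 0.5)


scores = Normal(mean=70, stddev=10)
print(scores.z_score(90))   # 2.0
print(scores.x_value(2.0))  # 90.0
```

Square roots are taken with `** 0.5`, so the sketch needs no imports at all, in keeping with the hardcoded-constants approach.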

New techniques introduced:

| Technique | Purpose |
|---|---|
| `z = (x - mean) / stddev` | Standardize a value to a z-score (number of standard deviations from the mean) |
| `x = stddev * z + mean` | Reverse standardization: z-score back to raw value |
| Gaussian PDF formula | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ |
| CDF via error function approximation | Compute cumulative probability using polynomial approximation of erf |
| Class constants `e` and `pi` | Pre-defined mathematical constants stored as class attributes |

Key takeaway: The normal distribution is defined by $\mu$ (center) and $\sigma$ (spread). Z-scores standardize any normal to $\mathcal{N}(0,1)$. The CDF answers "what's the probability of being at or below $x$?"
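The normal CDF can be written as $F(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. The sketch below plugs in one possible polynomial approximation of erf, the first five terms of its Maclaurin series. Which polynomial `normal.py` actually uses is not stated in the text, so treat this choice, the function names, and the digits of `PI` as assumptions.

```python
PI = 3.1415926536  # assumed hardcoded value of pi


def erf_approx(t):
    """First five terms of the Maclaurin series for erf(t)."""
    series = t - t ** 3 / 3 + t ** 5 / 10 - t ** 7 / 42 + t ** 9 / 216
    return 2 / PI ** 0.5 * series


def normal_cdf(x, mean, stddev):
    """Approximate P(X <= x) for X ~ N(mean, stddev^2)."""
    t = (x - mean) / (stddev * 2 ** 0.5)
    return 0.5 * (1 + erf_approx(t))


print(normal_cdf(70, 70, 10))  # 0.5 at the mean
print(normal_cdf(80, 70, 10))  # close to 0.8413, one stddev above the mean
```

A truncated series like this is accurate near the mean but degrades as $|x - \mu|$ grows; around three standard deviations the five-term version can return values outside $[0, 1]$, so it suits values close to the center rather than the tails.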


Task 2 — Poisson Distribution (poisson.py)

Challenge: Model the number of events occurring in a fixed interval, implementing the Poisson PMF $P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$ and the CDF as a sum.

Approach: The rate parameter $\lambda$ (`lambtha`) is either given or estimated as the sample mean. The PMF computes $\lambda^k e^{-\lambda} / k!$ using iterative factorial accumulation. The CDF sums PMF values from $0$ to $k$ using the same iterative factorial approach for efficiency.
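The three pieces above (rate estimation, PMF, CDF) might look like the following. The parameter name `lambtha` comes from the text; the function names, the digits of `E`, and the handling of negative `k` are assumptions. The CDF here reuses each term to build the next one, which is one way to avoid recomputing the factorial.

```python
E = 2.7182818285  # assumed hardcoded value of e


def estimate_lambtha(data):
    """Estimate the Poisson rate as the arithmetic mean of the data."""
    return sum(data) / len(data)


def poisson_pmf(k, lambtha):
    """P(X = k) with k! accumulated in a loop."""
    k = int(k)
    if k < 0:
        return 0
    factorial = 1
    for i in range(2, k + 1):
        factorial *= i
    return lambtha ** k * E ** (-lambtha) / factorial


def poisson_cdf(k, lambtha):
    """P(X <= k) as a running sum of PMF terms."""
    k = int(k)
    if k < 0:
        return 0
    term = E ** (-lambtha)   # P(X = 0)
    total = term
    for j in range(1, k + 1):
        term *= lambtha / j  # P(X = j) = P(X = j - 1) * lambtha / j
        total += term
    return total


rate = estimate_lambtha([1, 2, 3])  # 2.0
print(poisson_pmf(2, rate), poisson_cdf(3, rate))
```

The recurrence in `poisson_cdf` makes the whole sum cost one multiplication and one division per term, instead of rebuilding $\lambda^j$ and $j!$ from scratch for every $j$.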

New techniques introduced:

| Technique | Purpose |
|---|---|
| $\lambda = \frac{1}{n}\sum x_i$ | Estimate Poisson rate as the arithmetic mean of the data |
| Iterative $k!$ accumulation | Compute factorial incrementally to avoid recomputation |
| CDF = $\sum_{j=0}^{k} \text{PMF}(j)$ | Cumulative probability is the sum of individual PMF values |

Key takeaway: The Poisson distribution models count data: "how many events in a fixed interval?" $\lambda$ is both the mean and the variance. The factor $e^{-\lambda}$ in the PMF is itself the probability of zero events, $P(0)$.


Task 3 — Exponential Distribution (exponential.py)

Challenge: Model the waiting time between events in a Poisson process, implementing the exponential PDF $f(x) = \lambda e^{-\lambda x}$ and CDF $F(x) = 1 - e^{-\lambda x}$.

Approach: The rate $\lambda$ is either given or estimated as $1/\text{mean}$ of the data (the reciprocal of the sample mean). The PDF computes $\lambda e^{-\lambda x}$ directly. The CDF uses $1 - e^{-\lambda x}$, a simple closed form, unlike the Poisson CDF, which requires summation.
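Since both formulas are closed-form, the exponential case is short. The sketch below is illustrative only: the function names, the digits of `E`, and returning 0 for negative `x` are assumptions rather than details confirmed by the text.

```python
E = 2.7182818285  # assumed hardcoded value of e


def estimate_rate(data):
    """Estimate the rate as the reciprocal of the sample mean."""
    return len(data) / sum(data)


def exponential_pdf(x, lambtha):
    """Density of the waiting time at x; 0 for negative x."""
    if x < 0:
        return 0
    return lambtha * E ** (-lambtha * x)


def exponential_cdf(x, lambtha):
    """P(X <= x) in closed form; 0 for negative x."""
    if x < 0:
        return 0
    return 1 - E ** (-lambtha * x)


rate = estimate_rate([1, 2, 3])  # mean wait 2.0, so rate 0.5
print(exponential_pdf(1, rate), exponential_cdf(1, rate))
```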

New techniques introduced:

| Technique | Purpose |
|---|---|
| $\lambda = 1 / \bar{x}$ | Estimate exponential rate as reciprocal of sample mean |
| Exponential PDF: $\lambda e^{-\lambda x}$ | Memoryless continuous distribution for waiting times |
| Exponential CDF: $1 - e^{-\lambda x}$ | Closed-form cumulative probability, no summation needed |

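Key takeaway: The exponential distribution is the continuous waiting-time counterpart to the discrete Poisson count: if events arrive as a Poisson process with rate $\lambda$, the time between consecutive events is exponential with the same $\lambda$. It models waiting times with the "memoryless" property: $P(X > s+t \mid X > s) = P(X > t)$. The rate $\lambda$ is the inverse of the expected waiting time.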
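The memoryless property follows in one line from the exponential CDF. Writing the survival function as $P(X > x) = 1 - F(x) = e^{-\lambda x}$, and noting that for $s, t \ge 0$ the event $X > s+t$ already implies $X > s$:

$$
P(X > s+t \mid X > s) = \frac{P(X > s+t)}{P(X > s)} = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t)
$$

In words: having already waited $s$ units tells you nothing about how much longer the wait will be.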


Technique Inventory

| Task | New technique summarized | Category |
|---|---|---|
| 0 | Binomial PMF, method of moments for $n$ and $p$ | Discrete Distributions |
| 1 | Gaussian PDF/CDF, z-score standardization, erf approximation | Continuous Distributions |
| 2 | Poisson PMF/CDF, $\lambda$ as rate, iterative factorial summation | Discrete Distributions |
| 3 | Exponential PDF/CDF, $\lambda = 1/\bar{x}$, memoryless property | Continuous Distributions |

Resources