13  Distributions

A dataset is not only a collection of values.

It also has a shape.

Some values gather near a center.
Some stretch into long tails.
Some appear in clumps.
Some are symmetric.
Some are skewed.
Some are tightly concentrated.
Some are dominated by rare extremes.

A distribution is the formal way of describing that structure.

๐Ÿต If variation tells us that values differ, a distribution tells us how they differ.

Not just that the world fluctuates, but in what form it fluctuates.

This is one of the great achievements of statistics: it does not treat variation as mere disorder.
It learns to recognize the geometry of variation.


13.1 🔰 What a distribution describes

A distribution tells us how values are spread across possible outcomes.

Depending on the context, this may mean:

  • which outcomes are more likely
  • where values tend to cluster
  • how wide the spread is
  • whether the shape is symmetric
  • how heavy the tails are
  • whether the variable is discrete or continuous

So a distribution is not just a curve or a formula.

It is a structured answer to questions like:

  • Where do values tend to live?
  • How often do extremes occur?
  • Is there a typical value?
  • Is the process balanced or biased?
  • Are rare events negligible, or dominant?

13.2 ☯ Shape as meaning

Two datasets can share the same mean and variance and still behave very differently.

One may be symmetric.
Another may be sharply skewed.
One may have mild tails.
Another may allow violent extremes.

So statistics must go beyond center and spread.

A distribution is where shape enters reasoning.

It is the difference between:

  • a world of moderation
  • a world of concentration
  • a world of multiplicative growth
  • a world of repeated trials
  • a world where rare events matter more than average ones

โšœ๏ธ In this sense, distributions are not mere technicalities.
They are portraits of process.
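
The claim that center and spread are not enough can be checked directly. A minimal sketch (NumPy assumed; both samples are hypothetical): a symmetric Normal sample and a shifted Exponential sample share mean 0 and variance 1, yet one is balanced and the other has a long right tail.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Symmetric sample: standard Normal (mean 0, variance 1)
symmetric = rng.normal(0.0, 1.0, n)

# Skewed sample: Exponential(1) shifted left by its mean (also mean 0, variance 1)
skewed = rng.exponential(1.0, n) - 1.0

def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

print(symmetric.mean(), symmetric.var())      # both close to 0 and 1
print(skewed.mean(), skewed.var())            # also close to 0 and 1
print(skewness(symmetric), skewness(skewed))  # near 0 vs. clearly positive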


13.3 โš™๏ธ Discrete and continuous distributions

Some variables move in separate steps.

Examples:

  • number of successes
  • number of emails
  • number of failures
  • count of arrivals

These are often described with discrete distributions.

Other variables vary across intervals.

Examples:

  • height
  • time
  • temperature
  • weight
  • voltage

These are often described with continuous distributions.

This distinction matters because the mathematics changes, but the philosophical idea remains the same:

a distribution tells how possibility is populated
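
The mathematical difference can be made concrete: a discrete distribution assigns mass to separate outcomes, and those masses sum to 1; a continuous distribution assigns density to intervals, and that density integrates to 1. A minimal sketch (NumPy assumed) with a Binomial mass function and the standard Normal density:

```python
import numpy as np
from math import comb

# Discrete: Binomial(10, 0.3) puts probability mass on the integers 0..10
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
total_mass = sum(pmf)

# Continuous: the standard Normal spreads probability density over the real line
x = np.linspace(-6.0, 6.0, 10_001)
density = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
total_area = (density * (x[1] - x[0])).sum()  # Riemann-sum approximation

print(total_mass)  # 1 up to floating point
print(total_area)  # approximately 1
```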


13.4 💡 Why there are many distributions

No single shape can describe every process.

Different mechanisms produce different forms of variation.

  • repeated yes/no trials often lead to the Binomial
  • many small additive influences often lead to the Normal
  • uncertainty with small samples calls for the Student
  • multiplicative growth often leads to the Lognormal
  • concentration and rare extremes often suggest the Pareto

So a distribution is not chosen only because it is mathematically convenient.

It is chosen because it matches, or approximates, the structure of the process.
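
A small simulation can illustrate this mechanism-to-shape correspondence for three of the cases above (NumPy assumed; the factor ranges and trial counts are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
trials = 50_000

# Many small additive influences: sums come out roughly symmetric (Normal-like)
additive = rng.uniform(-1.0, 1.0, (trials, 30)).sum(axis=1)

# Many small multiplicative influences: products are positively skewed (Lognormal-like)
multiplicative = rng.uniform(0.5, 1.5, (trials, 30)).prod(axis=1)

# Repeated yes/no trials: counts of successes (Binomial by construction)
successes = (rng.random((trials, 20)) < 0.3).sum(axis=1)

def skewness(x):
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

print(skewness(additive))        # near 0: balanced
print(skewness(multiplicative))  # clearly positive: long right tail
print(successes.mean())          # near 20 * 0.3 = 6
```

The same ingredients (small random influences), combined additively or multiplicatively, produce visibly different shapes; the shape is a trace of the mechanism.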


13.5 🪄 A distribution is a model

This is important.

A distribution is not the data itself.
It is a model of how the data may have been generated.

That means every distribution is both:

  • descriptive
  • interpretive

It describes visible shape, but it also proposes an organizing logic beneath that shape.

This is why distributions are so central to statistical modeling.

They connect data to mechanism.


13.6 🧪 From data to model

A distribution is a model.

But models do not appear from nowhere.

They are extracted from data.

The question is:

How does a collection of observations become a smooth curve?


13.6.1 🔰 Counting before understanding

At first, we just see observations.

We count them.

We group them.

We build a histogram.

Each bar answers a simple question:

how often did values fall here?

Together, the bars summarize where our data can be found.
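
In code, this counting step is a one-liner. A minimal sketch with NumPy's `histogram` on a hypothetical sample of 200 heights:

```python
import numpy as np

rng = np.random.default_rng(7)
observations = rng.normal(170.0, 10.0, 200)  # 200 hypothetical heights in cm

# Group the observations into 8 bins and count how many fall in each
counts, edges = np.histogram(observations, bins=8)

# Each bar answers: how often did values fall here?
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:6.1f}, {right:6.1f})  {'#' * (count // 2)}  {count}")
```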


13.6.2 โš™๏ธ Stability through repetition

As more data arrives, something changes.

The histogram stops jumping wildly.

Patterns begin to stabilize, like magic. But this is not magic.

It is a consequence of a deep principle:

The Law of Large Numbers.

As the number of observations increases, relative frequencies settle.

Not exactly, but enough to reveal structure.
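
The settling can be watched directly. A minimal sketch (NumPy assumed): the running relative frequency of heads in a simulated fair-coin sequence jumps around early on, then settles near 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)

# 100,000 simulated fair-coin flips (True = heads)
flips = rng.random(100_000) < 0.5

# Relative frequency of heads after each flip
running_freq = np.cumsum(flips) / np.arange(1, flips.size + 1)

# Early values fluctuate; later values settle near 0.5
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: {running_freq[n - 1]:.4f}")
```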


13.6.3 โ˜ฏ๏ธ From bars to curves

Now something subtle happens.

If we refine the bins, and collect more data, the histogram begins to resemble a curve.

The bars dissolve into a shape.


13.6.4 💡 Interactive exploration

Try changing the number of observations and the number of bins.

Watch what happens.

Code
#| label: fig-hist-convergence
#| fig-cap: "Histogram approaching a probability density as data increases and bins are refined"

import numpy as np
import pandas as pd
import plotly.graph_objects as go

seed = 42
rng = np.random.default_rng(seed)

frames_spec = [
    (32, 8),
    (64, 10),
    (128, 12),
    (256, 16),
    (512, 20),
    (1024, 24),
    (2048, 30),
]

max_sample_size = max(sample_size for sample_size, _ in frames_spec)
full_sample = rng.normal(0, 1, max_sample_size)

x_grid = np.linspace(-4, 4, 400)
normal_density = (
    1 / np.sqrt(2 * np.pi)
) * np.exp(
    -0.5 * x_grid**2
)

first_sample_size, first_bins = frames_spec[0]
first_sample = full_sample[:first_sample_size]

fig = go.Figure(
    data = [
        go.Histogram(
            x = first_sample,
            histnorm = "probability density",
            nbinsx = first_bins,
            marker_color = "tomato",
            name = "Histogram"
        ),
        go.Scatter(
            x = x_grid,
            y = normal_density,
            mode = "lines",
            name = "Normal density"
        )
    ]
)

frames = []

for sample_size, bins_count in frames_spec:
    sample_prefix = full_sample[:sample_size]

    frames.append(
        go.Frame(
            data = [
                go.Histogram(
                    x = sample_prefix,
                    histnorm = "probability density",
                    nbinsx = bins_count,
                    marker_color = "tomato",
                    name = "Histogram"
                ),
                go.Scatter(
                    x = x_grid,
                    y = normal_density,
                    mode = "lines",
                    name = "Normal density"
                )
            ],
            name = f"N={sample_size}, bins={bins_count}"
        )
    )

fig.frames = frames

fig.update_layout(
    title = "Histogram → probability density",
    xaxis_title = "x",
    yaxis_title = "Density",
    bargap = 0.02,
    updatemenus = [
        {
            "type": "buttons",
            "buttons": [
                {
                    "label": "Play",
                    "method": "animate",
                    "args": [
                        None,
                        {
                            "frame": {"duration": 900, "redraw": True},
                            "transition": {"duration": 300},
                            "fromcurrent": True
                        }
                    ]
                }
            ]
        }
    ]
)

fig.show()
Figure 13.1: Histogram approaching a smooth distribution

13.7 โš ๏ธ The danger of decorative fitting

One of the great temptations in statistics is to fit a distribution because the curve looks elegant.

But elegance is not enough.

A distribution should be judged not only by visual resemblance, but by:

  • process plausibility
  • tail behavior
  • scale
  • support
  • assumptions
  • interpretive meaning

A Normal model fitted to a strictly positive quantity may match the center well while assigning probability to impossible negative values.
A light-tailed model may look smooth and miss the extremes that matter most.
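
That failure mode is easy to demonstrate. A minimal sketch (NumPy assumed; the data are hypothetical Lognormal "waiting times"): fit a Normal by matching mean and standard deviation, then ask how much probability the fit places below zero and how its tail compares with the observed extremes.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

# A strictly positive, right-skewed quantity (hypothetical waiting times)
data = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)

# "Fit" a Normal by matching the first two moments
mu, sd = data.mean(), data.std()

# Normal CDF at zero: probability the model assigns to impossible negative values
p_negative = 0.5 * (1 + erf((0.0 - mu) / (sd * sqrt(2.0))))
print(f"fitted Normal puts {p_negative:.1%} of its probability below zero")

# The light Normal tail also understates the real extremes
print(f"observed max: {data.max():.1f}  vs. Normal 3-sigma point: {mu + 3 * sd:.1f}")
```

The fit may look reasonable near the center; support and tails are where it betrays the process.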

So a good statistical eye does not ask only:

does the curve look nice?

It asks:

what kind of world would produce this shape?


13.8 🔰 The main players

In this Part we focus on several major distributions, not because they exhaust statistics, but because each represents a distinct style of variation:

  • Normal – balance, symmetry, additive influence
  • Student – caution under limited information
  • Lognormal – multiplicative growth and positive skew
  • Pareto – concentration and heavy extremes
  • Binomial – repeated yes/no structure

These are not just formulas.
They are ways reality organizes fluctuation.
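
The contrast between these styles can be previewed numerically. A minimal sketch (NumPy assumed; the shape parameter 1.16 is a conventional choice that yields roughly 80/20 concentration): compare how much of the total the top 20% of values hold in a Pareto sample versus a Normal one.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Heavy-tailed Pareto sample (classical Pareto = Lomax draw + 1)
pareto = np.sort(rng.pareto(1.16, n) + 1.0)
pareto_share = pareto[-n // 5:].sum() / pareto.sum()

# Light-tailed comparison: absolute values of a Normal sample
normal = np.sort(np.abs(rng.normal(0.0, 1.0, n)))
normal_share = normal[-n // 5:].sum() / normal.sum()

print(f"Pareto: top 20% of values hold {pareto_share:.0%} of the total")
print(f"Normal: top 20% of values hold {normal_share:.0%} of the total")
```

Under the Pareto a small fraction of values dominates the sum; under the Normal the total is shared far more evenly. That is the difference between a world of moderation and a world where rare events matter more than average ones.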


13.9 ๐Ÿต Final thought

A distribution is a grammar of variation.

It tells us not only what may happen, but how the space of outcomes is inhabited.

That is why distributions matter so deeply.

They transform raw difference into intelligible form.

And once one learns to read them, data begins to look less like a pile of values and more like the visible shadow of a process.