13  Distributions

A dataset is not only a collection of values.

It also has a shape.

Some values gather near a center.
Some stretch into long tails.
Some appear in clumps.
Some are symmetric.
Some are skewed.
Some are tightly concentrated.
Some are dominated by rare extremes.

A distribution is the formal way of describing that structure.

๐Ÿต If variation tells us that values differ, a distribution tells us how they differ.

Not just that the world fluctuates, but in what form it fluctuates.

This is one of the great achievements of statistics: it does not treat variation as mere disorder.
It learns to recognize the geometry of variation.


13.1 🔰 What a distribution describes

A distribution tells us how values are spread across possible outcomes.

Depending on the context, this may mean:

  • which outcomes are more likely
  • where values tend to cluster
  • how wide the spread is
  • whether the shape is symmetric
  • how heavy the tails are
  • whether the variable is discrete or continuous

So a distribution is not just a curve or a formula.

It is a structured answer to questions like:

  • Where do values tend to live?
  • How often do extremes occur?
  • Is there a typical value?
  • Is the process balanced or biased?
  • Are rare events negligible, or dominant?

13.2 ☯ Shape as meaning

Two datasets can share the same mean and variance and still behave very differently.

One may be symmetric.
Another may be sharply skewed.
One may have mild tails.
Another may allow violent extremes.

So statistics must go beyond center and spread.

A distribution is where shape enters reasoning.

It is the difference between:

  • a world of moderation
  • a world of concentration
  • a world of multiplicative growth
  • a world of repeated trials
  • a world where rare events matter more than average ones

โšœ๏ธ In this sense, distributions are not mere technicalities.
They are portraits of process.
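
The claim that center and spread are not enough can be checked directly. A minimal sketch (NumPy assumed; both samples are hypothetical): a symmetric Normal sample and a shifted Exponential sample share mean 0 and variance 1, yet one is balanced and the other has a long right tail.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Symmetric sample: standard Normal (mean 0, variance 1)
symmetric = rng.normal(0.0, 1.0, n)

# Skewed sample: Exponential(1) shifted left by its mean (also mean 0, variance 1)
skewed = rng.exponential(1.0, n) - 1.0

def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

print(symmetric.mean(), symmetric.var())      # both close to 0 and 1
print(skewed.mean(), skewed.var())            # also close to 0 and 1
print(skewness(symmetric), skewness(skewed))  # near 0 vs. clearly positive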


13.3 โš™๏ธ Discrete and continuous distributions

Some variables move in separate steps.

Examples:

  • number of successes
  • number of emails
  • number of failures
  • count of arrivals

These are often described with discrete distributions.

Other variables vary across intervals.

Examples:

  • height
  • time
  • temperature
  • weight
  • voltage

These are often described with continuous distributions.

This distinction matters because the mathematics changes, but the philosophical idea remains the same:

a distribution tells how possibility is populated
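
The mathematical difference can be made concrete: a discrete distribution assigns mass to separate outcomes, and those masses sum to 1; a continuous distribution assigns density to intervals, and that density integrates to 1. A minimal sketch (NumPy assumed) with a Binomial mass function and the standard Normal density:

```python
import numpy as np
from math import comb

# Discrete: Binomial(10, 0.3) puts probability mass on the integers 0..10
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
total_mass = sum(pmf)

# Continuous: the standard Normal spreads probability density over the real line
x = np.linspace(-6.0, 6.0, 10_001)
density = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
total_area = (density * (x[1] - x[0])).sum()  # Riemann-sum approximation

print(total_mass)  # 1 up to floating point
print(total_area)  # approximately 1
```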


13.4 💡 Why there are many distributions

No single shape can describe every process.

Different mechanisms produce different forms of variation.

  • repeated yes/no trials often lead to the Binomial
  • many small additive influences often lead to the Normal
  • uncertainty with small samples calls for the Student
  • multiplicative growth often leads to the Lognormal
  • concentration and rare extremes often suggest the Pareto

So a distribution is not chosen only because it is mathematically convenient.

It is chosen because it matches, or approximates, the structure of the process.
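
A small simulation can illustrate this mechanism-to-shape correspondence for three of the cases above (NumPy assumed; the factor ranges and trial counts are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
trials = 50_000

# Many small additive influences: sums come out roughly symmetric (Normal-like)
additive = rng.uniform(-1.0, 1.0, (trials, 30)).sum(axis=1)

# Many small multiplicative influences: products are positively skewed (Lognormal-like)
multiplicative = rng.uniform(0.5, 1.5, (trials, 30)).prod(axis=1)

# Repeated yes/no trials: counts of successes (Binomial by construction)
successes = (rng.random((trials, 20)) < 0.3).sum(axis=1)

def skewness(x):
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

print(skewness(additive))        # near 0: balanced
print(skewness(multiplicative))  # clearly positive: long right tail
print(successes.mean())          # near 20 * 0.3 = 6
```

The same ingredients (small random influences), combined additively or multiplicatively, produce visibly different shapes; the shape is a trace of the mechanism.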


13.5 🪄 A distribution is a model

This is important.

A distribution is not the data itself.
It is a model of how the data may have been generated.

That means every distribution is both:

  • descriptive
  • interpretive

It describes visible shape, but it also proposes an organizing logic beneath that shape.

This is why distributions are so central to statistical modeling.

They connect data to mechanism.


13.6 🧪 From data to model

A distribution is a model.

But models do not appear from nowhere.

They are extracted from data.

The question is:

How does a collection of observations become a smooth curve?


13.6.1 🔰 Counting before understanding

At first, we just see observations.

We count them.

We group them.

We build a histogram.

Each bar answers a simple question:

how often did values fall here?

Together, the bars summarize where our data can be found.
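
In code, this counting step is a one-liner. A minimal sketch with NumPy's `histogram` on a hypothetical sample of 200 heights:

```python
import numpy as np

rng = np.random.default_rng(7)
observations = rng.normal(170.0, 10.0, 200)  # 200 hypothetical heights in cm

# Group the observations into 8 bins and count how many fall in each
counts, edges = np.histogram(observations, bins=8)

# Each bar answers: how often did values fall here?
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:6.1f}, {right:6.1f})  {'#' * (count // 2)}  {count}")
```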


13.6.2 โš™๏ธ Stability through repetition

As more data arrives, something changes.

The histogram stops jumping wildly.

Patterns begin to stabilize, like magic. But this is not magic.

It is a consequence of a deep principle:

The Law of Large Numbers.

As the number of observations increases, relative frequencies settle.

Not exactly, but enough to reveal structure.
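
The settling can be watched directly. A minimal sketch (NumPy assumed): the running relative frequency of heads in a simulated fair-coin sequence jumps around early on, then settles near 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)

# 100,000 simulated fair-coin flips (True = heads)
flips = rng.random(100_000) < 0.5

# Relative frequency of heads after each flip
running_freq = np.cumsum(flips) / np.arange(1, flips.size + 1)

# Early values fluctuate; later values settle near 0.5
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: {running_freq[n - 1]:.4f}")
```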


13.6.3 โ˜ฏ๏ธ From bars to curves

Now something subtle happens.

If we refine the bins, and collect more data, the histogram begins to resemble a curve.

The bars dissolve into a shape.


13.6.4 💡 Interactive exploration

Try changing the number of observations and the number of bins.

Watch what happens.

Code
#| label: fig-hist-convergence
#| fig-cap: "Histogram approaching a probability density as data increases and bins are refined"

import numpy as np
import pandas as pd
import plotly.graph_objects as go

seed = 42
rng = np.random.default_rng(seed)

frames_spec = [
    (32, 8),
    (64, 10),
    (128, 12),
    (256, 16),
    (512, 20),
    (1024, 24),
    (2048, 30),
]

max_sample_size = max(sample_size for sample_size, _ in frames_spec)
full_sample = rng.normal(0, 1, max_sample_size)

x_grid = np.linspace(-4, 4, 400)
normal_density = (
    1 / np.sqrt(2 * np.pi)
) * np.exp(
    -0.5 * x_grid**2
)

first_sample_size, first_bins = frames_spec[0]
first_sample = full_sample[:first_sample_size]

fig = go.Figure(
    data = [
        go.Histogram(
            x = first_sample,
            histnorm = "probability density",
            nbinsx = first_bins,
            marker_color = "tomato",
            name = "Histogram"
        ),
        go.Scatter(
            x = x_grid,
            y = normal_density,
            mode = "lines",
            name = "Normal density"
        )
    ]
)

frames = []

for sample_size, bins_count in frames_spec:
    sample_prefix = full_sample[:sample_size]

    frames.append(
        go.Frame(
            data = [
                go.Histogram(
                    x = sample_prefix,
                    histnorm = "probability density",
                    nbinsx = bins_count,
                    marker_color = "tomato",
                    name = "Histogram"
                ),
                go.Scatter(
                    x = x_grid,
                    y = normal_density,
                    mode = "lines",
                    name = "Normal density"
                )
            ],
            name = f"N={sample_size}, bins={bins_count}"
        )
    )

fig.frames = frames

fig.update_layout(
    title = "Histogram → probability density",
    xaxis_title = "x",
    yaxis_title = "Density",
    bargap = 0.02,
    updatemenus = [
        {
            "type": "buttons",
            "buttons": [
                {
                    "label": "Play",
                    "method": "animate",
                    "args": [
                        None,
                        {
                            "frame": {"duration": 900, "redraw": True},
                            "transition": {"duration": 300},
                            "fromcurrent": True
                        }
                    ]
                }
            ]
        }
    ]
)

fig.show()
Figure 13.1: Histogram approaching a smooth distribution

13.7 โš ๏ธ The danger of decorative fitting

One of the great temptations in statistics is to fit a distribution because the curve looks elegant.

But elegance is not enough.

A distribution should be judged not only by visual resemblance, but by:

  • process plausibility
  • tail behavior
  • scale
  • support
  • assumptions
  • interpretive meaning

A Normal model fitted to a strictly positive quantity may match the center well while assigning probability to impossible negative values.
A light-tailed model may look smooth and miss the extremes that matter most.
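
That failure mode is easy to demonstrate. A minimal sketch (NumPy assumed; the data are hypothetical Lognormal "waiting times"): fit a Normal by matching mean and standard deviation, then ask how much probability the fit places below zero and how its tail compares with the observed extremes.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

# A strictly positive, right-skewed quantity (hypothetical waiting times)
data = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)

# "Fit" a Normal by matching the first two moments
mu, sd = data.mean(), data.std()

# Normal CDF at zero: probability the model assigns to impossible negative values
p_negative = 0.5 * (1 + erf((0.0 - mu) / (sd * sqrt(2.0))))
print(f"fitted Normal puts {p_negative:.1%} of its probability below zero")

# The light Normal tail also understates the real extremes
print(f"observed max: {data.max():.1f}  vs. Normal 3-sigma point: {mu + 3 * sd:.1f}")
```

The fit may look reasonable near the center; support and tails are where it betrays the process.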

So a good statistical eye does not ask only:

does the curve look nice?

It asks:

what kind of world would produce this shape?


13.8 🔰 The main players

In this Part we focus on several major distributions, not because they exhaust statistics, but because each represents a distinct style of variation:

  • Normal – balance, symmetry, additive influence
  • Student – caution under limited information
  • Lognormal – multiplicative growth and positive skew
  • Pareto – concentration and heavy extremes
  • Binomial – repeated yes/no structure

These are not just formulas.
They are ways reality organizes fluctuation.
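
The contrast between these styles can be previewed numerically. A minimal sketch (NumPy assumed; the shape parameter 1.16 is a conventional choice that yields roughly 80/20 concentration): compare how much of the total the top 20% of values hold in a Pareto sample versus a Normal one.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Heavy-tailed Pareto sample (classical Pareto = Lomax draw + 1)
pareto = np.sort(rng.pareto(1.16, n) + 1.0)
pareto_share = pareto[-n // 5:].sum() / pareto.sum()

# Light-tailed comparison: absolute values of a Normal sample
normal = np.sort(np.abs(rng.normal(0.0, 1.0, n)))
normal_share = normal[-n // 5:].sum() / normal.sum()

print(f"Pareto: top 20% of values hold {pareto_share:.0%} of the total")
print(f"Normal: top 20% of values hold {normal_share:.0%} of the total")
```

Under the Pareto a small fraction of values dominates the sum; under the Normal the total is shared far more evenly. That is the difference between a world of moderation and a world where rare events matter more than average ones.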


13.9 ๐Ÿต Final thought

A distribution is a grammar of variation.

It tells us not only what may happen, but how the space of outcomes is inhabited.

That is why distributions matter so deeply.

They transform raw difference into intelligible form.

And once one learns to read them, data begins to look less like a pile of values and more like the visible shadow of a process.