Probability concepts
The probability concepts we discuss are: sample spaces and events, probability axioms, discrete probability laws, the discrete uniform probability law, properties of probability laws, conditional probability, the total probability theorem, Bayes' rule, independence, and conditional independence.
Probability theory is a framework for quantifying and reasoning with uncertainty.
Two notions of probability
- Defining probability based on the frequency of occurrence of events (the frequentist approach).
- Defining probability based on subjective belief (the Bayesian approach).
Elements of a probabilistic model
- The sample space \(\Omega\), which is the set of all possible outcomes of an experiment.
- The probability law, which assigns to a set \(A\) of possible outcomes (i.e., an event) a nonnegative number \(\mathbf{P}(A)\) (i.e., the probability of \(A\)). This number reflects our knowledge/belief about the likelihood of the elements of \(A\).
Discrete sample space
- The set of possible outcomes defined at a suitable level of granularity, \(\Omega\), must be:
  - Mutually exclusive
  - Collectively exhaustive
Probability axioms
An event is a subset of the sample space. An event corresponds to grouping of one or more outcomes.
We assign probabilities to events.
The probability law must satisfy certain properties, called probability axioms.
- Nonnegativity: \(\mathbf{P}(A) \geq 0,\) for every event \(A\).
- Additivity: If \(A\) and \(B\) are two disjoint events, then the probability of their union satisfies:
\[\mathbf{P}(A \cup B)=\mathbf{P}(A)+\mathbf{P}(B)\]
  - If the sample space has an infinite number of elements and \(A_{1}, A_{2}, \ldots\) is a sequence of disjoint events, then:
\[\mathbf{P}\left(A_{1} \cup A_{2} \cup \cdots\right)=\mathbf{P}\left(A_{1}\right)+\mathbf{P}\left(A_{2}\right)+\cdots\]
- Normalization: The probability of the entire sample space \(\Omega\) must be equal to \(1\). That is, \(\mathbf{P}(\Omega)=1\).
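The three axioms can be checked mechanically on a small example. The sketch below (names are illustrative, assuming a fair six-sided die) enumerates every event and verifies nonnegativity, additivity for a pair of disjoint events, and normalization:

```python
from fractions import Fraction
from itertools import combinations

# A discrete probability law for a fair six-sided die,
# specified by the probabilities of the individual outcomes.
law = {s: Fraction(1, 6) for s in range(1, 7)}

def prob(event):
    """Probability of an event (a set of outcomes) under the law."""
    return sum(law[s] for s in event)

outcomes = set(law)

# Nonnegativity: P(A) >= 0 for every event A (all 2^6 subsets).
events = [set(c) for r in range(len(outcomes) + 1)
          for c in combinations(outcomes, r)]
assert all(prob(A) >= 0 for A in events)

# Additivity: P(A ∪ B) = P(A) + P(B) for disjoint A and B.
A, B = {1, 2}, {5}
assert prob(A | B) == prob(A) + prob(B)

# Normalization: P(Ω) = 1.
assert prob(outcomes) == 1
print("all axioms hold")
```

Using `Fraction` keeps the arithmetic exact, so the equalities hold without floating-point tolerance.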
Discrete probability law
Assume that the sample space consists of a finite number of possible outcomes, \(\left\{s_{1}, s_{2}, \ldots, s_{n}\right\}\).
The probability law is specified by the probabilities of the events that consist of a single element. That is, \(\mathbf{P}(s_{1}), \mathbf{P}(s_{2}), \cdots, \mathbf{P}(s_{n})\).
The probability of any event \(\left\{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\right\}\) is the sum of the probabilities of its elements:
\[\mathbf{P}\left(\left\{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\right\}\right)=\mathbf{P}\left(s_{i_1}\right)+\mathbf{P}\left(s_{i_2}\right)+\cdots+\mathbf{P}\left(s_{i_k}\right)\]
Discrete uniform probability law
Assume that the sample space consists of \(n\) equally likely possible outcomes. Then, the probability of any event \(A\) is:
\[\mathbf{P}(A)=\frac{\text {number of elements of } A}{n}\]
Some properties of probability laws
Let \(A, B,\) and \(C\) be events.
- If \(A \subset B\), then \(\mathbf{P}(A) \leq \mathbf{P}(B)\)
- \[\mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cap B)\]
- \[\mathbf{P}(A \cup B) \leq \mathbf{P}(A) + \mathbf{P}(B)\]
- \[\mathbf{P}(A \cup B \cup C) = \mathbf{P}(A)+ \mathbf{P}\left(A^{c} \cap B\right) + \mathbf{P}\left(A^{c} \cap B^{c} \cap C \right)\]
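These properties can be verified exhaustively on a small uniform sample space. A minimal sketch, assuming a fair six-sided die and three concrete events:

```python
from fractions import Fraction

# Uniform law on a fair six-sided die: P(A) = |A| / 6.
omega = set(range(1, 7))

def P(A):
    return Fraction(len(A & omega), len(omega))

A, B, C = {1, 2, 3}, {3, 4}, {4, 5, 6}

# Monotonicity: A ⊂ B implies P(A) <= P(B).
assert {1, 2} <= A and P({1, 2}) <= P(A)

# Inclusion-exclusion for two events.
assert P(A | B) == P(A) + P(B) - P(A & B)

# Union bound.
assert P(A | B) <= P(A) + P(B)

# Decomposition of a three-way union into disjoint pieces.
Ac, Bc = omega - A, omega - B
assert P(A | B | C) == P(A) + P(Ac & B) + P(Ac & Bc & C)
print("all properties verified")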
Buffon’s needle problem
A simulation of Buffon’s needle experiment. The estimated value of \(\pi\) (\(y\)-axis) approaches \(3.14\) as the number of estimations/tosses (\(x\)-axis) approaches infinity. Source: Wikipedia: Buffon’s needle problem
Buffon’s needle is a geometric probability problem named after Georges-Louis Leclerc, Comte de Buffon. Consider a floor made of parallel strips of wood of equal width. Visualize a line between any two adjacent wooden strips. Now the floor has equally spaced parallel lines. If we drop a needle onto the floor, what is the probability that the needle will lie across one of the parallel lines? The solution to Buffon’s needle problem, stated as a theorem, reads: The probability \(P(l,d)\) that a needle of length \(l\) will randomly land on a line, given a floor with equally spaced parallel lines at a distance \(d \ge l\) apart, is \(P(l,d) = \frac{2}{\pi} \frac{l}{d}\).
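The crossing probability in the theorem can be checked with a short Monte Carlo sketch (the function name and parameters below are illustrative):

```python
import math
import random

def buffon_crossing_fraction(l, d, n, seed=0):
    """Fraction of n random needle drops that cross a line.

    l: needle length; d: line spacing (d >= l).
    The distance from the needle's midpoint to the nearest line is
    uniform on [0, d/2]; the acute angle between the needle and the
    lines is uniform on [0, pi/2]. The needle crosses a line iff
    distance <= (l/2) * sin(angle).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0, d / 2)
        theta = rng.uniform(0, math.pi / 2)
        if x <= (l / 2) * math.sin(theta):
            hits += 1
    return hits / n

l, d, n = 1.0, 2.0, 200_000
p_hat = buffon_crossing_fraction(l, d, n)
p_true = (2 / math.pi) * (l / d)   # theorem: 2l / (pi d)
print(p_hat, p_true)
```

With 200,000 drops the empirical fraction agrees with \(\frac{2}{\pi}\frac{l}{d}\) to roughly two decimal places.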
We can estimate the value of \(\pi\) from the Buffon’s needle theorem. Rearranging the terms, we have \(\pi = \frac{2l}{Pd}\). Suppose we conduct the Buffon’s needle experiment by dropping a needle \(n\) times and observe that \(h\) of those needles crossed lines. Now, we can approximate the value of \(P\) as \(\frac{h}{n}\). This entails the following expression for \(\pi\):
\[\pi \approx \frac{2 l n}{d h}\]
Conditional probability
- The conditional probability of an event \(A,\) given an event \(B\) with \(\mathbf{P}(B)>0,\) is defined by
\[\mathbf{P}(A \mid B)=\frac{\mathbf{P}(A \cap B)}{\mathbf{P}(B)}\]
We are specifying a new (conditional) probability law on the same sample space \(\Omega.\) It can be viewed as a probability law on a new (reduced) universe \(B\).
All properties of probability laws remain valid for conditional probability laws.
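A concrete example, computing \(\mathbf{P}(A \mid B)=\mathbf{P}(A \cap B)/\mathbf{P}(B)\) for two fair dice (names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered rolls of two fair dice, all outcomes equally likely.
omega = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event & omega), len(omega))

def P_cond(A, B):
    """P(A | B) = P(A ∩ B) / P(B), defined when P(B) > 0."""
    assert P(B) > 0
    return P(A & B) / P(B)

A = {w for w in omega if sum(w) == 8}   # sum of the two dice is 8
B = {w for w in omega if w[0] == 3}     # first die shows 3

print(P_cond(A, B))   # 1/6: only (3, 5) has sum 8 with first die 3
```

Conditioning on \(B\) shrinks the universe to the six outcomes with first die 3, of which exactly one lies in \(A\).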
If the possible outcomes are finitely many and equally likely, then
\[\mathbf{P}(A \mid B) =\frac{\vert A \cap B \vert}{\vert B \vert}\]
Total probability theorem
Let \(A_{1}, \ldots, A_{n}\) be disjoint events that form a partition of the sample space. That is, each possible outcome is included in exactly one of the events \(A_{1}, \ldots, A_{n}\). Also, assume that \(\mathbf{P}\left(A_{i}\right)>0,\) for all \(i.\) Then, for any event \(B:\)
\[\begin{aligned} \mathbf{P}(B) &=\mathbf{P}\left(A_{1} \cap B\right)+\cdots+\mathbf{P}\left(A_{n} \cap B\right) \\ &=\mathbf{P}\left(A_{1}\right) \mathbf{P}\left(B \mid A_{1}\right)+\cdots+\mathbf{P}\left(A_{n}\right) \mathbf{P}\left(B \mid A_{n}\right) \end{aligned}\]
Independence
Two events \(A\) and \(B\) are independent if:
\[\mathbf{P}(A \cap B)=\mathbf{P}(A) \mathbf{P}(B)\]
Also, if \(\mathbf{P}(B)>0\), independence is equivalent to:
\[\mathbf{P}(A \mid B)=\mathbf{P}(A)\]
If \(A\) and \(B\) are independent, so are \(A\) and \(B^{c}\).
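All three statements can be checked exactly on two fair dice. A minimal sketch, assuming the events "first die even" and "second die even":

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}   # first die even
B = {w for w in omega if w[1] % 2 == 0}   # second die even
Bc = omega - B

# Independence: P(A ∩ B) = P(A) P(B).
assert P(A & B) == P(A) * P(B)

# Equivalent form when P(B) > 0: P(A | B) = P(A).
assert P(A & B) / P(B) == P(A)

# Independence of A and B is inherited by A and B^c.
assert P(A & Bc) == P(A) * P(Bc)
print("independent")
```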
Two events \(A\) and \(B\) are said to be conditionally independent, given another event \(C\) with \(\mathbf{P}(C)>0\), if
\[\mathbf{P}(A \cap B \mid C)=\mathbf{P}(A \mid C) \mathbf{P}(B \mid C)\]
Also, if \(\mathbf{P}(B \cap C)>0\), conditional independence is equivalent to:
\[\mathbf{P}(A \mid B \cap C) = \mathbf{P}(A \mid C)\]
Independence does not imply conditional independence, and vice versa.
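A classic counterexample, sketched in code with illustrative names: two independent fair coin flips are unconditionally independent, but not conditionally independent given the event that the flips differ.

```python
from fractions import Fraction
from itertools import product

# Two independent fair coin flips: H or T for each.
omega = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}    # first flip heads
B = {w for w in omega if w[1] == "H"}    # second flip heads
C = {w for w in omega if w[0] != w[1]}   # the two flips differ

# A and B are (unconditionally) independent.
assert P(A & B) == P(A) * P(B)

# But given C they are not: knowing C and A forces the second flip to tails.
lhs = P(A & B & C) / P(C)                       # P(A ∩ B | C) = 0
rhs = (P(A & C) / P(C)) * (P(B & C) / P(C))     # (1/2) * (1/2) = 1/4
assert lhs != rhs
print(lhs, rhs)   # 0 1/4
```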
Independence of several events
We say that the events \(A_{1}, A_{2}, \ldots, A_{n}\) are independent if \(\mathbf{P}\left(\bigcap_{i \in S} A_{i}\right)=\prod_{i \in S} \mathbf{P}\left(A_{i}\right),\quad\) for every subset \(S\) of \(\{1,2, \ldots, n\}\)
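The quantifier "for every subset \(S\)" matters: pairwise independence alone is not enough. A minimal sketch of the classic two-coin counterexample (names are illustrative):

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A1 = {w for w in omega if w[0] == "H"}    # first flip heads
A2 = {w for w in omega if w[1] == "H"}    # second flip heads
A3 = {w for w in omega if w[0] == w[1]}   # the two flips agree

events = [A1, A2, A3]

# Every pair satisfies the product rule ...
for X, Y in combinations(events, 2):
    assert P(X & Y) == P(X) * P(Y)

# ... but the full triple does not, so A1, A2, A3 are NOT independent:
# the definition requires the product rule for EVERY subset S.
assert P(A1 & A2 & A3) != P(A1) * P(A2) * P(A3)
print(P(A1 & A2 & A3), P(A1) * P(A2) * P(A3))   # 1/4 1/8
```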
Summary of counting results
- Permutations of \(n\) objects: \(n!\)
- \(k\)-permutations of \(n\) objects: \(n! /(n-k) !\)
- Combinations of \(k\) out of \(n\) objects: \(\left(\begin{array}{l}n \\ k\end{array}\right)=\frac{n !}{k !(n-k) !}\)
- Partitions of \(n\) objects into \(r\) groups, with the \(i\)th group having \(n_{i}\) objects: \(\left(\begin{array}{c}n \\ n_{1}, n_{2}, \ldots, n_{r}\end{array}\right)=\frac{n !}{n_{1} ! n_{2} ! \cdots n_{r} !}\)
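The counting formulas above map directly onto Python's standard library; the `multinomial` helper below is our own illustrative addition:

```python
import math

n, k = 10, 3

# Permutations of n objects: n!
assert math.factorial(n) == 3_628_800

# k-permutations of n objects: n! / (n - k)!
assert math.perm(n, k) == math.factorial(n) // math.factorial(n - k)

# Combinations of k out of n objects: n! / (k! (n - k)!)
assert math.comb(n, k) == 120

# Partitions of n objects into groups of sizes n_1, ..., n_r
# (the multinomial coefficient): n! / (n_1! n_2! ... n_r!)
def multinomial(*sizes):
    total = math.factorial(sum(sizes))
    for s in sizes:
        total //= math.factorial(s)
    return total

print(multinomial(2, 3, 5))   # 10! / (2! 3! 5!) = 2520
```

`math.perm` and `math.comb` are available from Python 3.8 onward.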