Probability concepts

The probability concepts we discuss are sample spaces and events, the probability axioms, the discrete probability law, the discrete uniform probability law, probability laws and their properties, conditional probability, the total probability theorem, Bayes rule, independence, and conditional independence.

Probability theory is a framework for quantifying and reasoning with uncertainty.

Two notions of probability

Probability can be interpreted in two common ways: as the long-run relative frequency of an event in repeated trials of an experiment (the frequentist view), or as a subjective degree of belief about the likelihood of an event (the Bayesian view). The probabilistic model below accommodates either interpretation.

Elements of a probabilistic model

  1. The sample space \(\Omega\), which is the set of all possible outcomes of an experiment.

  2. The probability law, which assigns to a set \(A\) of possible outcomes (i.e., an event) a nonnegative number \(\mathbf{P}(A)\), called the probability of \(A\). This number reflects our knowledge or belief about the likelihood of the outcomes in \(A\).

Discrete sample space

A sample space is discrete if it consists of a finite or countably infinite number of possible outcomes.

Probability axioms

An event is a subset of the sample space, i.e., a grouping of outcomes.

We assign probabilities to events.

The probability law must satisfy certain properties, called the probability axioms:

  1. (Nonnegativity) \(\mathbf{P}(A) \geq 0\) for every event \(A\).

  2. (Additivity) If \(A\) and \(B\) are disjoint events, then \(\mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B)\). More generally, for a sequence of disjoint events \(A_{1}, A_{2}, \ldots\):

\[\mathbf{P} \left(A_{1} \cup A_{2} \cup \cdots\right) = \mathbf{P}\left(A_{1}\right) + \mathbf{P} \left(A_{2} \right) + \cdots\]

  3. (Normalization) \(\mathbf{P}(\Omega) = 1\).

Discrete probability law

Assume that the sample space consists of a finite number of possible outcomes, \(\left\{s_{1}, s_{2}, \ldots, s_{n}\right\}\).

The probability law is specified by the probabilities of the events that consist of a single element. That is, \(\mathbf{P}(s_{1}), \mathbf{P}(s_{2}), \cdots, \mathbf{P}(s_{n})\).

The probability of any event \(\left\{s_{i_{1}}, s_{i_{2}}, \ldots, s_{i_{k}}\right\}\) is the sum of the probabilities of its elements:

\[\mathbf{P}\left(\left\{s_{i_{1}}, s_{i_{2}}, \ldots, s_{i_{k}}\right\}\right)=\mathbf{P}\left(s_{i_{1}}\right)+\mathbf{P}\left(s_{i_{2}}\right)+\cdots+\mathbf{P}\left(s_{i_{k}}\right)\]
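As a quick illustration, the following is a minimal sketch in Python; the outcomes and probabilities (a loaded four-sided die) are made-up numbers for the example:

```python
# A discrete probability law: map each outcome s_i to P(s_i).
# Hypothetical numbers for a loaded four-sided die.
law = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def prob(event, law):
    """P(event) is the sum of the probabilities of the event's elements."""
    return sum(law[s] for s in event)

assert abs(sum(law.values()) - 1.0) < 1e-12  # normalization axiom holds
print(prob({2, 4}, law))  # P({2, 4}) = 0.2 + 0.4 = 0.6
```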

Discrete uniform probability law

Assume that the sample space consists of \(n\) equally likely possible outcomes. Then, the probability of any event \(A\) is:

\[\mathbf{P}(A)=\frac{\text { number of elements of } A}{n}\]
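For instance, here is a small sketch that assumes a fair six-sided die as the experiment:

```python
omega = range(1, 7)  # sample space of a fair six-sided die: {1, ..., 6}
A = {s for s in omega if s % 2 == 0}  # the event "the roll is even"

# Discrete uniform probability law: P(A) = |A| / n.
print(len(A) / len(omega))  # 3 / 6 = 0.5
```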

Some properties of probability laws

Let \(A, B,\) and \(C\) be events. Then:

  1. If \(A \subset B\), then \(\mathbf{P}(A) \leq \mathbf{P}(B)\).

  2. \(\mathbf{P}(A \cup B)=\mathbf{P}(A)+\mathbf{P}(B)-\mathbf{P}(A \cap B)\).

  3. \(\mathbf{P}(A \cup B) \leq \mathbf{P}(A)+\mathbf{P}(B)\).

  4. \(\mathbf{P}(A \cup B \cup C)=\mathbf{P}(A)+\mathbf{P}\left(A^{c} \cap B\right)+\mathbf{P}\left(A^{c} \cap B^{c} \cap C\right)\).

Buffon’s needle problem

A simulation of Buffon’s needle experiment. The estimated value of \(\pi\) (\(y\)-axis) approaches \(3.14\) as the number of tosses (\(x\)-axis) grows. Source: [Wikipedia: Buffon's needle problem](https://en.wikipedia.org/wiki/Buffon%27s_needle_problem#/media/File:Buffon_needle_experiment_compressed.gif)

Buffon’s needle is a geometric probability problem named after Georges-Louis Leclerc, Comte de Buffon. Consider a floor made of parallel strips of wood of equal width. Visualize a line between any two adjacent strips, so that the floor has equally spaced parallel lines. If we drop a needle onto the floor, what is the probability that the needle will lie across one of these lines? The solution to Buffon’s needle problem, stated as a theorem, reads: the probability \(P(l,d)\) that a needle of length \(l\) will randomly land on a line, given a floor with equally spaced parallel lines a distance \(d \ge l\) apart, is \(P(l,d) = \frac{2}{\pi} \frac{l}{d}\).

We can estimate the value of \(\pi\) from Buffon’s needle theorem. Rearranging the terms, we have \(\pi = \frac{2l}{Pd}\). Suppose we conduct the experiment by dropping a needle \(n\) times and observe that \(h\) of those needles cross a line. Then we can approximate \(P\) by \(\frac{h}{n}\), which yields the following expression for \(\pi\):

\[\pi \approx \frac{2 l n}{d h}\]
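The following is a minimal Monte Carlo sketch of this estimate. The parameter values \(l=1\) and \(d=2\) are arbitrary choices, and sampling the needle's angle already uses math.pi, so this is a sanity check of the theorem rather than an independent derivation of \(\pi\):

```python
import math
import random

def estimate_pi(n, l=1.0, d=2.0, seed=0):
    """Estimate pi by dropping n needles of length l on lines spaced d >= l apart."""
    rng = random.Random(seed)
    h = 0  # number of needles that cross a line
    for _ in range(n):
        y = rng.uniform(0.0, d / 2)            # distance from needle's center to nearest line
        theta = rng.uniform(0.0, math.pi / 2)  # acute angle between needle and the lines
        if (l / 2) * math.sin(theta) >= y:     # the needle reaches the nearest line
            h += 1
    return 2 * l * n / (d * h)  # pi ≈ 2ln / (dh)

print(estimate_pi(1_000_000))  # approaches 3.14... as n grows
```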

Conditional probability

The conditional probability of an event \(A\), given an event \(B\) with \(\mathbf{P}(B)>0\), is defined as:

\[\mathbf{P}(A \mid B)=\frac{\mathbf{P}(A \cap B)}{\mathbf{P}(B)}\]

We are specifying a new (conditional) probability law on the same sample space \(\Omega\). It can be viewed as a probability law on a new (reduced) universe \(B\).

All properties of probability laws remain valid for conditional probability laws.

If the possible outcomes are finitely many and equally likely, then

\[\mathbf{P}(A \vert B) =\frac{\vert A \cap B \vert}{\vert B \vert}\]
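As a short illustration of the equally likely case, consider a hypothetical experiment with two fair dice; conditioning simply restricts the counting to outcomes in \(B\):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice: 36 equally likely outcomes

A = {(i, j) for (i, j) in omega if i + j == 8}  # the sum is 8
B = {(i, j) for (i, j) in omega if i == 3}      # the first die shows 3

# P(A | B) = |A ∩ B| / |B| for equally likely outcomes.
print(len(A & B) / len(B))  # 1/6: only (3, 5) works among the 6 outcomes in B
```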

Total probability theorem

Let \(A_{1}, \ldots, A_{n}\) be disjoint events that form a partition of the sample space. That is, each possible outcome is included in exactly one of the events \(A_{1}, \ldots, A_{n}\). Also, assume that \(\mathbf{P}\left(A_{i}\right)>0,\) for all \(i.\) Then, for any event \(B:\)

\[\begin{aligned} \mathbf{P}(B) &=\mathbf{P}\left(A_{1} \cap B\right)+\cdots+\mathbf{P}\left(A_{n} \cap B\right) \\ &=\mathbf{P}\left(A_{1}\right) \mathbf{P}\left(B \mid A_{1}\right)+\cdots+\mathbf{P}\left(A_{n}\right) \mathbf{P}\left(B \mid A_{n}\right) \end{aligned}\]
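Bayes rule

Under the same assumptions as the total probability theorem (disjoint events \(A_{1}, \ldots, A_{n}\) that form a partition of the sample space, with \(\mathbf{P}\left(A_{i}\right)>0\) for all \(i\)), Bayes rule relates the conditional probabilities \(\mathbf{P}\left(A_{i} \mid B\right)\) and \(\mathbf{P}\left(B \mid A_{i}\right)\). For any event \(B\) with \(\mathbf{P}(B)>0\):

\[\mathbf{P}\left(A_{i} \mid B\right)=\frac{\mathbf{P}\left(A_{i}\right) \mathbf{P}\left(B \mid A_{i}\right)}{\mathbf{P}\left(A_{1}\right) \mathbf{P}\left(B \mid A_{1}\right)+\cdots+\mathbf{P}\left(A_{n}\right) \mathbf{P}\left(B \mid A_{n}\right)}\]

The following is a minimal sketch in Python that applies both results; the priors and likelihoods are made-up numbers for illustration:

```python
# Hypothetical partition A_1, A_2, A_3 with priors P(A_i)
# and conditional probabilities P(B | A_i).
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.8]   # P(B | A_i)

# Total probability theorem: P(B) = sum_i P(A_i) P(B | A_i).
p_b = sum(p * l for p, l in zip(prior, likelihood))

# Bayes rule: P(A_i | B) = P(A_i) P(B | A_i) / P(B).
posterior = [p * l / p_b for p, l in zip(prior, likelihood)]
print(p_b)        # 0.33
print(posterior)  # the posteriors sum to 1
```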

Independence

Two events \(A\) and \(B\) are independent if:

\[\mathbf{P}(A \cap B)=\mathbf{P}(A) \mathbf{P}(B)\]

Also, if \(\mathbf{P}(B)>0\), independence is equivalent to:

\[\mathbf{P}(A \mid B)=\mathbf{P}(A)\]

If \(A\) and \(B\) are independent, so are \(A\) and \(B^{c}\).

Two events \(A\) and \(B\) are said to be conditionally independent, given another event \(C\) with \(\mathbf{P}(C)>0\), if

\[\mathbf{P}(A \cap B \mid C)=\mathbf{P}(A \mid C) \mathbf{P}(B \mid C)\]

Also, if \(\mathbf{P}(B \cap C)>0\), conditional independence is equivalent to:

\[\mathbf{P}(A \mid B \cap C) = \mathbf{P}(A \mid C)\]

Independence does not imply conditional independence, and vice versa.
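The following sketch illustrates the last point with a classic example, enumerated exhaustively: two fair coin tosses are independent, but they are not conditionally independent given the event that the tosses differ:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))  # two fair coin tosses, equally likely

def P(event):
    return Fraction(len(event), len(omega))

def P_given(event, cond):
    return Fraction(len(event & cond), len(cond))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads
C = {w for w in omega if w[0] != w[1]}  # the two tosses differ

assert P(A & B) == P(A) * P(B)                             # A, B independent
assert P_given(A & B, C) != P_given(A, C) * P_given(B, C)  # 0 != 1/4 given C
```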

Independence of several events

We say that the events \(A_{1}, A_{2}, \ldots, A_{n}\) are independent if

\[\mathbf{P}\left(\bigcap_{i \in S} A_{i}\right)=\prod_{i \in S} \mathbf{P}\left(A_{i}\right), \quad \text{for every subset } S \text{ of } \{1,2, \ldots, n\}\]

Note that pairwise independence (the condition restricted to two-element subsets \(S\)) does not by itself imply independence of all \(n\) events.

Summary of counting results

  1. Permutations of \(n\) objects: \(n !\)

  2. \(k\)-permutations of \(n\) objects: \(\frac{n !}{(n-k) !}\)

  3. Combinations of \(k\) out of \(n\) objects: \(\left(\begin{array}{l}n \\ k\end{array}\right)=\frac{n !}{k !(n-k) !}\)

  4. Partitions of \(n\) objects into \(r\) groups, with the \(i\)th group having \(n_{i}\) objects:

\[\left(\begin{array}{c}n \\ n_{1}, n_{2}, \ldots, n_{r}\end{array}\right)=\frac{n !}{n_{1} ! n_{2} ! \cdots n_{r} !}\]
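As a small sketch, these quantities can be computed directly with Python's math module; the multinomial helper below is an illustrative assumption, and the numbers are arbitrary:

```python
import math

n, k = 10, 3

print(math.factorial(n))  # permutations of n objects: n!
print(math.perm(n, k))    # k-permutations: n! / (n - k)!
print(math.comb(n, k))    # combinations: n choose k

def multinomial(n, parts):
    """n! / (n_1! n_2! ... n_r!), where sum(parts) == n."""
    assert sum(parts) == n
    result = math.factorial(n)
    for n_i in parts:
        result //= math.factorial(n_i)
    return result

print(multinomial(10, [5, 3, 2]))  # 2520 ways to split 10 objects into groups of 5, 3, 2
```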

