
Probability (II)

The Cumulative Distribution Function

3.2.1 The Cumulative Distribution Function

Definition 3.10

The cumulative distribution function (CDF) of a random variable X is a function F_X : \mathbb{R} \rightarrow [0,1] given by F_X(x) = P(X \leq x), \forall x \in \mathbb{R}

Example 3.9: Say we toss a coin twice. Let X be the number of heads we observe. Find the CDF of X.

Solution: Notice that X \sim \text{Binomial} \left(2,\frac{1}{2} \right) with range R_X = \{0,1,2\} and a probability mass function given by: P_X(0) = P(X=0) = \frac{1}{4}, \quad P_X(1) = P(X=1) = \frac{1}{2}, \quad P_X(2) = P(X=2) = \frac{1}{4} To find the CDF, we first note that if x<0, then F_X(x) = P(X \leq x) = 0. We can also see that for x \geq 2, F_X(x) = P(X \leq x) = 1. Now we need to consider the values that lie between 0 and 2. If 0 \leq x < 1, then X \leq x implies that X can only take the value 0. Therefore: F_X(x) = P(X \leq x) = P(X=0) = \frac{1}{4} for 0 \leq x < 1.

Similarly, for 1 \leq x < 2, F_X(x) = P(X \leq x) = P(X=0) + P(X=1) = \frac{3}{4}. Putting all of these together we get

F_X(x) = \begin{cases}
0 & x < 0\\[10pt]
\dfrac{1}{4} & 0 \leq x < 1\\[10pt]
\dfrac{3}{4} & 1 \leq x < 2 \\[10pt]
1 & x \geq 2
\end{cases}

 

Notice that we’ll always have \lim_{x \rightarrow -\infty} F_X(x) = 0 and \lim_{x \rightarrow \infty} F_X(x) = 1
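For readers who like to verify such results numerically, here is a minimal Python sketch (the `pmf` dictionary and helper name are my own) that builds F_X from the PMF of Example 3.9 and evaluates it at a few points:

```python
# Sketch: the CDF of X from Example 3.9 (X ~ Binomial(2, 1/2)), built from its PMF.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def cdf(x):
    # F_X(x) = P(X <= x): sum the PMF over all values not exceeding x
    return sum(p for k, p in pmf.items() if k <= x)

for x in [-1, 0, 0.5, 1, 1.5, 2, 3]:
    print(x, cdf(x))   # 0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0
```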

Theorem: Given that X is a random variable with probability mass function P_X(x) and cumulative distribution function F_X(x),



(a) For all a \leq b, we have P(a < X \leq b) = F_X(b) - F_X(a)

(b) For any x, we have P(X<x) = P(X \leq x) - P(X=x) = F_X(x) - P_X(x)

Example 3.10: Let X be a discrete random variable with range R_X = \{1,2,3,...\}. Suppose the probability mass function of X is given by P_X(k) = \frac{1}{2^k}, \quad k=1,2,3,... (a) Find and plot the cumulative distribution function of X



(b) Find P(2 < X \leq 5)



(c) Find P(X>4)



Solution: First let’s verify that P_X(x) is indeed a probability mass function. \sum_{k=1}^{\infty}P_X(k) = \sum_{k=1}^{\infty}\frac{1}{2^k} = \frac{1}{2} \left(\frac{1}{1-\frac{1}{2}} \right) = 1 Now onto the problem:



(a) Since the range begins at 1, we have F_X(x) = 0 for x < 1.



Now, let k \leq x < k+1 for any k in \{1,2,3,...\}. F_X(x) = \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + ... + \frac{1}{2^k} = \frac{1}{2}\left(1 + \frac{1}{2} + \frac{1}{2^2} + ... + \frac{1}{2^{k-1}} \right) = \frac{1}{2} \left( \frac{1- \frac{1}{2^k}}{1-\frac{1}{2}} \right) = 1 - \frac{1}{2^k} = \frac{2^k-1}{2^k}



So we have: F_X(x) = \begin{cases}
0 & x < 1\\[10pt]
\dfrac{2^k-1}{2^k} & k \leq x < k+1, \quad k=1,2,3,...
\end{cases}

(b) Using the above theorem, P(2 < X \leq 5) = F_X(5) - F_X(2) = \frac{2^5-1}{2^5} - \frac{2^2-1}{2^2} = \frac{31}{32} - \frac{3}{4} = \frac{7}{32}

(c) P(X>4) = 1 - F_X(4) = 1 - \frac{15}{16} = \frac{1}{16}
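As a quick sanity check of parts (b) and (c), here is a small Python sketch (the closed form for F_X comes from the cases above; the function name is my own):

```python
import math

# Sketch: the CDF from Example 3.10, F_X(x) = 1 - 1/2^k for k <= x < k+1.
def cdf(x):
    if x < 1:
        return 0.0
    k = math.floor(x)        # the integer k with k <= x < k + 1
    return 1 - 1 / 2**k      # equals (2^k - 1) / 2^k

print(cdf(5) - cdf(2))       # P(2 < X <= 5) = 7/32 = 0.21875
print(1 - cdf(4))            # P(X > 4) = 1/16 = 0.0625
```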

3.2.2 Expectation

Definition 3.11: Let X be a discrete random variable with range R_X = \{x_1, x_2, x_3,...\}. The expected value of X, denoted EX, is defined as EX = \sum_{x_k \in R_X} x_k P(X=x_k) = \sum_{x_k \in R_X} x_k P_X(x_k) The expected value may be written using several different notations which are all equivalent: EX = E[X] = E(X) = \mu_X
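The definition translates directly into code. Below is a minimal Python sketch (the helper name and example PMF are my own) that computes EX for any finite PMF stored as a dictionary:

```python
# Sketch: expected value of a discrete random variable from a finite PMF.
def expectation(pmf):
    # EX = sum over the range of x_k * P_X(x_k)
    return sum(x * p for x, p in pmf.items())

# The coin-toss example, X ~ Binomial(2, 1/2):
print(expectation({0: 0.25, 1: 0.5, 2: 0.25}))   # 1.0
```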

Example 3.11: Let X \sim \text{Bernoulli}(p). Find EX.



Solution: EX = 0 \cdot P_X(0) + 1 \cdot P_X(1) = 0 \cdot (1-p) + 1 \cdot p = p Hence, EX = p.



Example 3.12: Let X \sim \text{Geometric}(p). Find EX.



Solution: Writing q = 1-p, EX = \sum_{k=1}^{\infty} k P_X(k) = \sum_{k=1}^{\infty} k q^{k-1}p = p \sum_{k=1}^{\infty} k q^{k-1} \stackrel{(*)}{=} p \frac{1}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p} Hence, EX = \dfrac{1}{p}.



(*) We know that \sum_{k=0}^{\infty} x^k = \frac{1}{1-x} if |x| < 1. Differentiating both sides, we get \frac{d}{dx} \sum_{k=0}^{\infty} x^k = \frac{d}{dx} \frac{1}{1-x}, which gives \sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2}
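Because the series converges quickly, a truncated sum gives a fast numerical check that EX = 1/p; here is a short Python sketch with an arbitrarily chosen p and cutoff:

```python
# Sketch: truncated-series check of EX = 1/p for X ~ Geometric(p), with an arbitrary p.
p = 0.3
q = 1 - p
approx = sum(k * q**(k - 1) * p for k in range(1, 200))
print(approx, 1 / p)   # both approximately 3.3333
```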

Example 3.13: Let X \sim \text{Poisson}(\lambda). Find EX.



Solution: EX = \sum_{x_k \in R_X} x_k P_X(x_k) = \sum_{k=0}^{\infty} k \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} \frac{ \lambda^k}{(k-1)!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{ \lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{ \lambda^{j}}{j!} = \lambda e^{-\lambda} e^{\lambda} = \lambda Hence, EX = \lambda.



Theorem 3.2: The expected value has the following properties:



(a) For any random variable X and a, b \in \mathbb{R}, E[aX+b] = aEX + b

(b) Given any number of random variables X_1, X_2, \ldots, X_n, which may or may not be independent, E[X_1 + X_2 + ... + X_n] = EX_1 + EX_2 + ... + EX_n

Example 3.14: Let X \sim \text{Binomial}(n,p). Find EX.



Solution: We know that for a binomial distribution, X = X_1 + X_2 + ... + X_n where X_i \sim \text{Bernoulli}(p) are independent random variables. So we can write: EX = E[X_1 + X_2 + ... + X_n] = EX_1 + EX_2 + ... + EX_n = p + p + ... + p = np Hence, EX = np.
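Here is a brief Python sketch (with an arbitrarily chosen n and p) that confirms EX = np by summing k P_X(k) directly over the binomial PMF:

```python
from math import comb

# Sketch: check EX = np for X ~ Binomial(n, p) by summing k * P_X(k) over the PMF.
n, p = 10, 0.3
ex = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(ex, n * p)   # both 3.0, up to floating-point rounding
```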



Example 3.15: Let X \sim \text{Pascal}(m,p). Find EX.



Solution: In this case, we also have a sum X = X_1 + X_2 + ... + X_m where X_i \sim \text{Geometric}(p). So we have: EX = \sum_{k=1}^m EX_k = \sum_{k=1}^m \frac{1}{p} = \frac{m}{p} Hence EX = \dfrac{m}{p}.

3.2.3 Functions of Random Variables

Let X be a random variable and define Y=g(X) to be a function of that random variable. Now, Y itself is also a random variable. So it makes sense to discuss things like the probability mass function, cumulative distribution function, and expected value of this function.



To start off, the range of Y will be R_Y = \{g(x) : x \in R_X\} and we can write P_Y(y) = P(Y=y) = P(g(X) = y) = \sum_{x : g(x) = y} P_X(x)

Example 3.16: Let X be a discrete random variable with P_X(k) = \dfrac{1}{5}, \quad k = -1, 0, 1, 2, 3. Let Y = 2|X|. Find R_Y and the probability mass function of Y.



Solution: R_Y = \{2|x| : x \in R_X \} = \{0,2,4,6\} Now, to find the PMF: P_Y(0) = P(Y=0) = P(2|X| = 0) = P(X=0) = \frac{1}{5} P_Y(2) = P(Y=2) = P(2|X| = 2) = P(X=-1) + P(X=1) = \frac{2}{5} P_Y(4) = P(2|X| = 4) = P(X=2) = \frac{1}{5} P_Y(6) = P(2|X| = 6) = P(X=3) = \frac{1}{5}

So we have P_Y(k) = \begin{cases}
\dfrac{1}{5} & k = 0,4,6\\[10pt]
\dfrac{2}{5} & k = 2 \\[10pt]
0 & \text{otherwise}
\end{cases}
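The grouping step P_Y(y) = \sum_{x : g(x) = y} P_X(x) is easy to mechanize. Below is a minimal Python sketch (my own variable names) that recovers the PMF of Y = 2|X| from the PMF of X:

```python
from collections import defaultdict

# Sketch: PMF of Y = 2|X| from Example 3.16, grouping the x-values that map to the same y.
pmf_x = {-1: 0.2, 0: 0.2, 1: 0.2, 2: 0.2, 3: 0.2}

def g(x):
    return 2 * abs(x)

pmf_y = defaultdict(float)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p      # P_Y(y) = sum of P_X(x) over all x with g(x) = y

print(dict(pmf_y))        # {2: 0.4, 0: 0.2, 4: 0.2, 6: 0.2}
```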

The Expected Value of a Function of a Random Variable

The law of the unconscious statistician (or LOTUS) is a theorem which states that the expected value of a function of a random variable, E[g(X)], can be expressed using the probability mass function of X (without needing to find the PMF of g(X)!). For a discrete random variable, this is expressed as E[g(X)] = \sum_{x_k \in R_X} g(x_k) P_X(x_k)

Example 3.17: Let X be a discrete random variable with R_X = \{0, \frac{\pi}{4}, \frac{\pi}{2}, \frac{3\pi}{4}, \pi \} where P(0) = P(\frac{\pi}{4}) = P(\frac{\pi}{2}) = P(\frac{3\pi}{4}) = P(\pi) = \frac{1}{5}. Find E[\sin(X)].



Solution: Using LOTUS, we have E[\sin(X)] = \sum_{x_k \in R_X} \sin(x_k) P_X(x_k) = \sin(0) \cdot \frac{1}{5} + \sin\left(\frac{\pi}{4}\right) \cdot \frac{1}{5} + \sin\left(\frac{\pi}{2}\right) \cdot \frac{1}{5} + \sin\left(\frac{3\pi}{4}\right) \cdot \frac{1}{5} + \sin(\pi) \cdot \frac{1}{5} = \frac{\sqrt{2}+1}{5}

Example 3.18: Prove E[aX+b] = aEX + b.



Solution: Here, g(X) = aX + b, so we can use LOTUS to get: E[aX+b] = \sum_{x_k \in R_X} (ax_k + b) P_X(x_k) = a \sum_{x_k \in R_X} x_k P_X(x_k) + b \sum_{x_k \in R_X} P_X(x_k) = aEX + b, since \sum_{x_k \in R_X} P_X(x_k) = 1. \blacksquare

3.2.4 Variance

The variance of a random variable X with mean EX = \mu_X is defined as \text{Var}(X) = E[(X-\mu_X)^2]=\sum_{x_k \in R_X}(x_k - \mu_X)^2 P_X(x_k) The standard deviation, in turn, is defined as \text{SD}(X) = \sigma_X = \sqrt{\text{Var}(X)}

Theorem: Given a random variable X, \text{Var}(X) = E[X^2] - [EX]^2

Proof: We know, by the previous definition, that \text{Var}(X) = E[(X-\mu_X)^2] = E[X^2 - 2\mu_X X + \mu_X^2] Expanding this expression using Theorem 3.2, we get E[X^2] - 2\mu_X EX + \mu_X^2 = E[X^2] - 2 \mu_X^2 + \mu_X^2 = E[X^2] - \mu_X^2 = E[X^2] - [EX]^2 \blacksquare



Example 3.19: Say we roll a fair, 6-sided die, and let X be the resulting number. Find EX, \text{Var}(X), and \sigma_X.



Solution: First of all, we know that R_X = \{1,2,3,4,5,6\} with P_X(k) = \frac{1}{6}, \quad k=1,2,3,4,5,6 Therefore, we have EX = \sum_{i=1}^6 i \cdot \frac{1}{6} = \frac{1+2+3+4+5+6}{6} = \frac{7}{2} Now we can calculate variance using \text{Var}(X) = E[X^2] - [EX]^2. First, we need to find E[X^2]. E[X^2] = \sum_{x_k \in R_X} x_k^2 P_X(x_k) = \sum_{i=1}^6 \frac{i^2}{6} = \frac{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2}{6} = \frac{91}{6} Now we can calculate E[X^2] - [EX]^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12} \approx 2.92 Finally, we have \sigma_X = \sqrt{\text{Var}(X)} \approx \sqrt{2.92} \approx 1.71
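The same computation in code, as a sanity check (a minimal Python sketch, with my own variable names):

```python
# Sketch: EX, Var(X), and sigma_X for a fair six-sided die, computed from the PMF.
pmf = {k: 1/6 for k in range(1, 7)}

ex  = sum(x * p for x, p in pmf.items())        # 7/2 = 3.5
ex2 = sum(x**2 * p for x, p in pmf.items())     # 91/6
var = ex2 - ex**2                               # 35/12, approximately 2.92
print(ex, var, var**0.5)                        # 3.5  2.9167  1.7078
```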

Theorem 3.3: Given a random variable X and a,b \in \mathbb{R}, \text{Var}(aX+b) = a^2 \text{Var}(X)

Proof: Let Y = aX + b. From the previous section, we know that EY = aEX + b Therefore, using our original definition of variance, \text{Var}(Y) = E[(Y-EY)^2] = E[(aX + b - aEX - b)^2] = E[a^2 (X-\mu_X)^2] = a^2 E[(X-\mu_X)^2] = a^2\text{Var}(X) \blacksquare



Theorem 3.4: If X_1, X_2, ..., X_n are independent random variables and X = X_1 + X_2 + ... + X_n then \text{Var}(X) = \text{Var}(X_1) + \text{Var}(X_2) + ... + \text{Var}(X_n)

Proof: \text{Var}(X) = \text{Var}\left(\sum_{k=1}^n X_k \right) = E \left[ \left( \sum_{k=1}^n X_k - E\left[ \sum_{k=1}^n X_k\right]\right)^2\right] = E \left[ \left( \sum_{k=1}^n (X_k - \mu_{X_k})\right)^2\right] = \sum_{k=1}^n \sum_{l=1}^n E\left[(X_k - \mu_{X_k})(X_l - \mu_{X_l})\right] Since the X_k are independent, the cross terms vanish: for k \neq l, E\left[(X_k - \mu_{X_k})(X_l - \mu_{X_l})\right] = E[X_k - \mu_{X_k}]\, E[X_l - \mu_{X_l}] = 0. Only the diagonal terms remain, so \text{Var}(X) = \sum_{k=1}^n E\left[(X_k - \mu_{X_k})^2\right] = \sum_{k=1}^n \text{Var}(X_k) \blacksquare



Example 3.20: Let X \sim \text{Binomial}(n,p). Find Var(X).



Solution: Once again, we know that X = \sum_{k=1}^{n} X_k where the X_k \sim \text{Bernoulli}(p) are independent.



For each X_k, \text{Var}(X_k) = E[X^2_k] - [EX_k]^2 = 1^2 p + 0^2 (1-p) - p^2 = p(1-p) And so we have that \text{Var}(X) = \sum_{k=1}^n \text{Var}(X_k) = \sum_{k=1}^n p(1-p) = np(1-p) Hence, \text{Var}(X) = np(1-p).
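As a numerical check (with an arbitrarily chosen n and p), the following Python sketch computes Var(X) directly from the binomial PMF and compares it with np(1-p):

```python
from math import comb

# Sketch: check Var(X) = np(1-p) for X ~ Binomial(n, p) directly from the PMF.
n, p = 10, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

ex  = sum(k * prob for k, prob in pmf.items())
ex2 = sum(k**2 * prob for k, prob in pmf.items())
print(ex2 - ex**2, n * p * (1 - p))   # both 2.1, up to floating-point rounding
```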

3.2.5 Solved Problems

Problem 1: Let X be a discrete random variable with the following probability mass function:

P_X(x) = \begin{cases}
0.3 & x=3, 8\\[5pt]
0.2 & x=5, 10\\[5pt]
0 & \text{otherwise}
\end{cases}

Find the cumulative distribution function of X.



Solution: The cumulative distribution function is defined by F_X(x) = P(X \leq x). So we have:

F_X(x) = \begin{cases}
0 & x<3 \\[5pt]
P_X(3) = 0.3 & 3 \leq x < 5\\[5pt]
P_X(3) + P_X(5) = 0.5 & 5 \leq x < 8\\[5pt]
P_X(3) + P_X(5) + P_X(8) = 0.8 & 8 \leq x < 10\\[5pt]
1 & x \geq 10
\end{cases}

Problem 2: Let X be a discrete random variable with the following probability mass function:

P_X(k) = \begin{cases}
0.1 & k=0\\[5pt]
0.4 & k=1\\[5pt]
0.3 & k=2\\[5pt]
0.2 & k=3\\[5pt]
0 & \text{otherwise}
\end{cases}

(a) Find EX.

(b) Find Var(X).

 (c) Let Y = (X-2)^2 and find EY.



Solution:



(a) EX = \displaystyle \sum_{x_k \in R_X} x_k P_X(x_k) = 0(0.1) + 1(0.4) + 2(0.3) + 3(0.2) = 1.6



(b) First we need to find E[X^2]: E[X^2] = 0^2(0.1) + 1^2(0.4) + 2^2(0.3) + 3^2(0.2) = 3.4 Now we have \text{Var}(X) = 3.4 - (1.6)^2 = 0.84

(c) Using LOTUS, we know that E[(X-2)^2] = \sum_{x_k \in R_X} (x_k-2)^2 P_X(x_k) = (0-2)^2 (0.1) + (1-2)^2 (0.4) + (2-2)^2 (0.3) + (3-2)^2 (0.2) = 1
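A short Python sketch (my own variable names) reproduces all three answers from the PMF:

```python
# Sketch: Problem 2 computed numerically from the PMF.
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

ex  = sum(k * p for k, p in pmf.items())             # (a) EX = 1.6
ex2 = sum(k**2 * p for k, p in pmf.items())          # E[X^2] = 3.4
var = ex2 - ex**2                                    # (b) Var(X) = 0.84, up to rounding
ey  = sum((k - 2)**2 * p for k, p in pmf.items())    # (c) EY = 1.0, via LOTUS
print(ex, var, ey)
```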

Problem 3: Let X be a discrete random variable with the following probability mass function:

P_X(k) = \begin{cases}
0.2 & k=0, 1\\[5pt]
0.3 & k=2, 3\\[5pt]
0 & \text{otherwise}
\end{cases}

Let Y = X(X-1)(X-2). Find the probability mass function of Y.



Solution: First, note that R_Y = \{ x(x-1)(x-2) : x=0,1,2,3\} = \{0,6\} Thus, P_Y(0) = P_X(0) + P_X(1) + P_X(2) = 0.7 and P_Y(6) = P_X(3) = 0.3 So our probability mass function is P_Y(k) = \begin{cases}
0.7 & k=0\\[5pt]
0.3 & k=6\\[5pt]
0 & \text{otherwise}
\end{cases}

Problem 4: Let X \sim \text{Geometric}(p). Find E\left[\dfrac{1}{2^X} \right].



Solution: The probability mass function of X is given by

P_X(k) = \begin{cases}
pq^{k-1} & k=1,2,3,...\\[5pt]
0 & \text{otherwise}
\end{cases}

where q = 1-p. So we have E\left[\frac{1}{2^X} \right] = \sum_{k=1}^\infty \frac{1}{2^k} P_X(k) = \sum_{k=1}^\infty \frac{1}{2^k} pq^{k-1} = \frac{p}{2}\sum_{k=1}^\infty \left(\frac{q}{2}\right)^{k-1} = \frac{p}{2} \left( \frac{1}{1-\frac{q}{2}} \right) = \frac{p}{2-q} = \frac{p}{1+p}
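A truncated-series check in Python (with an arbitrarily chosen p) confirms the closed form p/(1+p):

```python
# Sketch: truncated-series check of E[1/2^X] = p/(1+p) for X ~ Geometric(p), arbitrary p.
p = 0.4
q = 1 - p
approx = sum((1 / 2**k) * p * q**(k - 1) for k in range(1, 200))
print(approx, p / (1 + p))   # both approximately 0.2857
```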

Problem 5: Let X \sim \text{Hypergeometric}(b,r,k). Find EX.



Solution: The probability mass function of X is given by

P_X(x) = \begin{cases}
\cfrac{\binom{b}{x} \binom{r}{k-x}}{\binom{b+r}{k}} & x \in R_X\\[15pt]
0 & \text{otherwise}
\end{cases}

where R_X = \{\max(0,k-r),..., \min(k,b)\}.



Define the indicator random variables as X_i = \begin{cases}
1 & \text{if the } i\text{th chosen marble is blue} \\[5pt]
0 & \text{otherwise}
\end{cases} where i = 1,2,...,k



So we can write X = X_1 + X_2 + ... + X_k, which implies that EX = EX_1 + EX_2 + ... + EX_k Now, we have that for each i, P(X_i=1) = \frac{b}{b+r} so we can deduce that EX_i = 0 \cdot P(X_i = 0) + 1 \cdot P(X_i = 1) = \frac{b}{b+r} Finally we have EX = \sum_{i=1}^k \frac{b}{b+r} = \frac{kb}{b+r}

Problem 6: Show that if X \sim \text{Binomial}(n,p), then EX = np.



Solution: Using the identity k \binom{n}{k} = n \binom{n-1}{k-1} and writing q = 1-p, EX = \sum_{k=0}^n k \binom{n}{k} p^k q^{n-k} = \sum_{k=1}^n k \binom{n}{k} p^k q^{n-k} = n \sum_{k=1}^n \binom{n-1}{k-1} p^k q^{n-k} = np \sum_{j=0}^{n-1} \binom{n-1}{j} p^j q^{(n-1)-j} = np(p+q)^{n-1} = np

Problem 7: Let X be a discrete random variable with R_X = \{0,1,2,...\}. Prove that EX = \sum_{k=0}^{\infty} P(X>k)

Solution: First, note that P(X>0) = P_X(1) + P_X(2) + P_X(3) + P_X(4) + ... P(X>1) = P_X(2) + P_X(3) + P_X(4) + ... P(X>2) =  P_X(3) + P_X(4) + ... and so on.



Thus, \sum_{k=0}^{\infty} P(X>k) = P_X(1) + 2P_X(2) + 3P_X(3) + 4P_X(4) + ... = \sum_{k=0}^{\infty} k P_X(k) = EX
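The identity is easy to verify numerically on a distribution whose mean we already know; here is a Python sketch using a Binomial(n, p) example of my own choosing (its range is finite, so the tail probabilities vanish beyond n):

```python
from math import comb

# Sketch: verify EX = sum_{k>=0} P(X > k) on a distribution with a known mean.
# Here X ~ Binomial(n, p), so EX = np; P(X > k) = 0 for k >= n.
n, p = 6, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def tail(k):
    # P(X > k)
    return sum(prob for x, prob in pmf.items() if x > k)

print(sum(tail(k) for k in range(n)), n * p)   # both 3.0
```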

Problem 8: Let X \sim \text{Poisson}(\lambda). Find Var(X).



Solution: In Example 3.13, we showed that EX = \lambda. Therefore \text{Var}(X) = E[X^2] - \lambda^2. Standard wisdom would tell us to next find E[X^2], but let’s instead find E[X(X-1)] for reasons that will be clear in a moment. E[X(X-1)] = \sum_{k=0}^{\infty} k(k-1) P_X(k) = \sum_{k=0}^{\infty} k(k-1) \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \lambda^2 \sum_{k=2}^{\infty} \frac{ \lambda^{k-2}}{(k-2)!} = e^{-\lambda} \lambda^2 \sum_{k=0}^{\infty} \frac{ \lambda^{k}}{k!} = e^{-\lambda} \lambda^2 e^{\lambda} = \lambda^2 So now we have \lambda^2 = E[X(X-1)] = E[X^2] - EX = E[X^2] - \lambda \Rightarrow E[X^2] = \lambda^2 + \lambda Finally, we can plug everything into our formula to get \text{Var}(X) = E[X^2] - [EX]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda
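As a numerical check (with an arbitrarily chosen \lambda and truncation point), the following Python sketch computes Var(X) from the Poisson PMF:

```python
from math import exp, factorial

# Sketch: numerically check Var(X) = lambda for X ~ Poisson(lam), truncating the series
# at k = 60, where the remaining terms are negligible for this lam.
lam = 2.5
pmf = {k: exp(-lam) * lam**k / factorial(k) for k in range(60)}

ex  = sum(k * p for k, p in pmf.items())
ex2 = sum(k**2 * p for k, p in pmf.items())
print(ex2 - ex**2, lam)   # both approximately 2.5
```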

Problem 9: Let X and Y be two independent random variables. Suppose that we know \text{Var}(2X - Y) = 6 and \text{Var}(X + 2Y) = 9. Find Var(X) and Var(Y).



Solution: We have that \text{Var}(2X - Y) = \text{Var}(2X) + \text{Var}(Y) =  4\text{Var}(X) + \text{Var}(Y) = 6 and \text{Var}(X + 2Y) = \text{Var}(X) + 4 \text{Var}(Y) =  9 Setting up a system of equations, we can solve for \text{Var}(X) =1 and \text{Var}(Y) =2