1 Math Fundamentals

1.1 Integrals, factors and techniques
Geometric series: Σ_{n=0}^∞ r^n = 1/(1 − r)   (r² < 1)

Integration by parts: ∫_a^b u dv = uv|_a^b − ∫_a^b v du

∫_0^∞ x^n e^{−ax} dx = n!/a^{n+1}

Γ(n) = (n − 1)!,   Γ(1) = 1,   Γ(n) = ∫_0^∞ x^{n−1} e^{−x} dx

Σ_{n=0}^∞ x^n/n! = e^x

(a + b + c)² = a² + b² + c² + 2ab + 2ac + 2bc

A special case: ∫_a^b x e^{−λx} dx = [−(1/λ) x e^{−λx} − (1/λ²) e^{−λx}]_a^b

∫_0^∞ x e^{−ax} dx = 1/a²   and   ∫_0^∞ x² e^{−ax} dx = 2/a³

Γ(n + 1/2) = (√π / 2^n) [1 · 3 · 5 ⋯ (2n − 1)]

Γ(1/2) = √π,   Γ(3/2) = √π/2

1.2 Probability relations
If A ⊆ B, then Pr(A) ≤ Pr(B).

Pr(A) = Pr(A ∩ B) + Pr(A ∩ B′)

De Morgan's laws: (∪_i A_i)′ = ∩_i A_i′   and   (∩_i A_i)′ = ∪_i A_i′

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C)

Pr(A ∩ B) = Pr(B|A) Pr(A)   ⇔   Pr(B|A) = Pr(A ∩ B)/Pr(A)

Pr(B) = Pr(B|A) Pr(A) + Pr(B|A′) Pr(A′)   (Law of Total Probability)

Pr(A|B) = Pr(B|A) Pr(A)/Pr(B)   (Bayes'; note the "flip-flop" of Pr(A|B) and Pr(B|A))

Pr(A_j|B) = Pr(A_j ∩ B)/Pr(B) = Pr(B|A_j) Pr(A_j) / Σ_{i=1}^n Pr(B|A_i) Pr(A_i)   (Generalized Bayes'; the A_i's form a partition)

A, B are independent iff Pr(A ∩ B) = Pr(A) Pr(B) and Pr(A|B) = Pr(A).

Pr(A′|B) = 1 − Pr(A|B)

Pr((A ∪ B)|C) = Pr(A|C) + Pr(B|C) − Pr((A ∩ B)|C)

If A, B are independent, then (A′, B), (A, B′), and (A′, B′) are also independent (each pair).
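As a quick numerical illustration of the Law of Total Probability and the generalized Bayes' rule, here is a minimal Python sketch. The three-event partition and the conditional probabilities are made-up numbers, chosen only for the example.

```python
# Hypothetical partition A1, A2, A3 with made-up probabilities, used only to
# illustrate the Law of Total Probability and the generalized Bayes' rule.
prior = [0.5, 0.3, 0.2]          # Pr(A_i); must sum to 1
likelihood = [0.10, 0.40, 0.80]  # Pr(B | A_i)

# Law of Total Probability: Pr(B) = sum_i Pr(B|A_i) Pr(A_i)
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Generalized Bayes': Pr(A_j | B) = Pr(B|A_j) Pr(A_j) / Pr(B)
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]

print(p_b)                               # 0.33
print(posterior)                         # posteriors for A1, A2, A3
print(abs(sum(posterior) - 1.0) < 1e-12) # posteriors sum to 1
```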
1.3 Counting

There are C(n, r) = n!/[r!(n − r)!] ways to choose r objects from a collection of n items, denoted nCr. If order is important, then there are nPr = r! · nCr permutations of those objects.

There are n!/(n_1! n_2! ⋯ n_m!) ways to choose n_1 objects of Type 1, n_2 objects of Type 2, etc. Note that Σ_{i=1}^m n_i = n.
Binomial Theorem: When expanding (1 + t)^N, the coefficient of t^k is C(N, k), so that (1 + t)^N = Σ_{k=0}^∞ C(N, k) t^k.

Multinomial Theorem: When expanding (t_1 + t_2 + ⋯ + t_s)^N where N is a positive integer, the coefficient of t_1^{k_1} t_2^{k_2} ⋯ t_s^{k_s} (where Σ k_i = N) is N!/(k_1! k_2! ⋯ k_s!).
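These counting formulas map directly onto Python's standard library (math.comb, math.perm, math.factorial). The specific values of n, r, and the group sizes below are arbitrary illustrations.

```python
from math import comb, factorial, perm

# Choose r of n objects (order irrelevant) and count permutations (order matters)
n, r = 10, 3
print(comb(n, r))          # 120 = 10! / (3! 7!)
print(perm(n, r))          # 720 = 3! * comb(10, 3)

# Multinomial coefficient: split 10 items into groups of sizes 5, 3, 2
ks = [5, 3, 2]
multinomial = factorial(sum(ks)) // (factorial(ks[0]) * factorial(ks[1]) * factorial(ks[2]))
print(multinomial)         # 2520

# Binomial Theorem check: sum_k C(N, k) t^k equals (1 + t)^N
N, t = 6, 0.37
print(abs(sum(comb(N, k) * t**k for k in range(N + 1)) - (1 + t)**N) < 1e-12)
```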
2 Probability Distributions

2.1 Essential definitions
I only list examples for continuous variables. Note that for a discrete distribution, integration is replaced by summation: ∫_a^b (·) dx → Σ_{a<x≤b} (·).
Survival Function: s_X(x) = Pr(X > x) = 1 − F_X(x)

Hazard Rate (failure rate): h(x) = λ(x) = f_X(x)/s_X(x) = −(d/dx) ln[1 − F_X(x)]
2.1.1 Expectation values and other parameters

In general, E(g(X)) = ∫_{−∞}^∞ g(x) f_X(x) dx. In particular, E(X) = ∫_{−∞}^∞ x f_X(x) dx and E(X²) = ∫_{−∞}^∞ x² f_X(x) dx. There are a couple of special cases when X > 0:

• If X is continuous, E(X) = ∫_0^∞ s_X(x) dx
• If X is discrete, E(X) = Σ_{n=1}^∞ Pr(X ≥ n) = Σ_{n=0}^∞ Pr(X > n) = Σ_n s_X(n) = s_X(0) + s_X(1) + …
• If X ≥ a almost surely, then E(X) = a + ∫_a^∞ s_X(x) dx
• If a ≤ X ≤ b, then E(X) = a + ∫_a^b s_X(x) dx

If Pr(X ≥ 0) = 1, then we can write

E[min(X, a)] = ∫_0^a s_X(x) dx   and   E[max(X, a)] = a + ∫_a^∞ s_X(x) dx.
The variance of X is Var(X) = σ_X² = E(X²) − [E(X)]². Since the variance is not a linear operator, it must often be calculated manually after obtaining the first two moments. This is particularly important for mixed distributions! Note that E(aX + bY + c) = aE(X) + bE(Y) + c, and Var(aX + bY + c) = a² Var(X) + b² Var(Y) (X, Y independent). The standard deviation of X is σ_X = √Var(X) and the coefficient of variation is σ_X/μ_X.
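A minimal sketch, assuming an exponential example with an arbitrary rate λ = 2, that checks the survival-function shortcut E(X) = ∫_0^∞ s_X(x) dx and the variance formula by crude numerical integration (the integrate helper is ad hoc, not a library routine).

```python
import math

lam = 2.0                                 # arbitrary illustrative rate
f = lambda x: lam * math.exp(-lam * x)    # pdf of an exponential(lam)
s = lambda x: math.exp(-lam * x)          # survival function

def integrate(g, a, b, n=100_000):
    """Crude midpoint-rule integral of g on [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

upper = 20.0                              # effectively infinity for lam = 2
ex   = integrate(lambda x: x * f(x), 0.0, upper)       # E(X) from the pdf
ex2  = integrate(lambda x: x * x * f(x), 0.0, upper)   # E(X^2)
ex_s = integrate(s, 0.0, upper)                        # E(X) via the survival function

print(round(ex, 4), round(ex_s, 4))       # both ~ 0.5 = 1/lam
print(round(ex2 - ex**2, 4))              # Var(X) ~ 0.25 = 1/lam^2
```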
2.1.2 Moment generating function and friends
If it exists, the moment generating function (MGF) is defined as

M_X(t) = E(e^{Xt}),   with   (d^n/dt^n) M_X(t)|_{t=0} = E(X^n).

For a discrete random variable,

M_X(t) = Σ_{i=1}^n p_i e^{x_i t},

i.e. the sum over the possible values x_i of e^{x_i t} weighted by the probability of each value. The MGF of the sum of independent random variables is the product of their MGFs. Several properties of the MGF are worth remembering:

M_X(0) = 1

M_{aX}(t) = M_X(at)
M_{X+Y}(t) = M_X(t) M_Y(t) if X and Y are independent.

The cumulant generating function is given by Ψ_X(t) = ln M_X(t), which leads to

Ψ^{(k)}(0) = (d^k/dt^k) Ψ_X(t)|_{t=0} = E[(X − E(X))^k]   (valid for k = 2, 3; in general Ψ^{(k)}(0) is the k-th cumulant, and Ψ′(0) = E(X)).
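To see the moment-extraction property in action, the sketch below numerically differentiates the exponential MGF M(t) = λ/(λ − t) at t = 0 and compares with the known moments E(X^n) = n!/λ^n. The rate λ = 1.5 and the finite-difference helper are illustrative choices, not part of the text.

```python
import math

lam = 1.5
M = lambda t: lam / (lam - t)   # MGF of an exponential(lam), valid for t < lam

def derivative(g, t, n, h=1e-3):
    """n-th derivative of g at t via nested central finite differences."""
    if n == 0:
        return g(t)
    return (derivative(g, t + h, n - 1, h) - derivative(g, t - h, n - 1, h)) / (2 * h)

for n in (1, 2, 3):
    approx = derivative(M, 0.0, n)
    exact = math.factorial(n) / lam**n      # E(X^n) for an exponential
    print(n, round(approx, 3), round(exact, 3))
```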
The probability generating function is given by P_X(t) = E(t^X), provided E(t^X) exists.

If 0 < p < 1, then the 100p-th percentile of the distribution of X is the number x_p which satisfies both of the following inequalities:

Pr(X ≤ x_p) ≥ p   and   Pr(X ≥ x_p) ≥ 1 − p.

For a continuous distribution, Pr(X ≤ x_p) = p. The 50th percentile (p = 0.50) is called the median. For discrete distributions, the median and percentiles need not be unique! The mode of a distribution is where f_X(x) is maximized.

If a random variable X has mean μ and standard deviation σ, we can create a new, "standardized" random variable Z = (X − μ)/σ. Then the skewness of X is defined as γ = E(Z³) and its kurtosis is given by κ = E(Z⁴). A positive skewness indicates a long tail to the right. A large kurtosis indicates that the variance is influenced more by a few extreme outliers than by several small deviations.

2.1.3 Important inequalities
Jensen's: Pick h such that h″(x) exists. Then

if h″(x) ≥ 0, then E[h(X)] ≥ h(E(X));   if h″(x) ≤ 0, then E[h(X)] ≤ h(E(X)).

Markov: Pr(X ≥ a) ≤ E(X)/a   (a > 0 and X nonnegative real)

Chebyshev: Pr(|X − μ| ≥ κ) ≤ σ²/κ². Equivalently, Pr(|Z| ≥ r) ≤ 1/r², where Z = (X − μ)/σ.
You must remember the Chebyshev Inequality!
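A small simulation (illustrative parameters only) showing that the empirical tail probabilities of an exponential(1) sample stay below the Chebyshev bound σ²/κ².

```python
import random

random.seed(0)
lam = 1.0
mu, sigma = 1.0 / lam, 1.0 / lam          # mean and std. dev. of an exponential(1)
sample = [random.expovariate(lam) for _ in range(100_000)]

for k in (1.5, 2.0, 3.0):
    empirical = sum(abs(x - mu) >= k for x in sample) / len(sample)
    bound = sigma**2 / k**2               # Chebyshev: Pr(|X - mu| >= k) <= sigma^2 / k^2
    print(k, round(empirical, 4), "<=", round(bound, 4))
```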
2.1.4 Transformation of a random variable
Given a random variable X with known functions f_X and F_X, let Y = Φ(X) be a function of X. We want to find f_Y and F_Y. Note that Y = Φ(X) = Y(X) and X = Φ^{−1}(Y) = X(Y).

X continuous; the following is useful:

F_Y(y) = F_X(x(y)) if y′ > 0,   F_Y(y) = s_X(x(y)) if y′ < 0

f_Y(y) = f_X(x(y)) |dx/dy|

X discrete:

f_Y(y) = Σ_{x ∈ Φ^{−1}({y})} f_X(x)
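A quick sanity check of the monotone-transformation rule: for X exponential with λ = 1 and Y = X², the rule gives F_Y(y) = F_X(√y) and f_Y(y) = e^{−√y}/(2√y). The simulation below (arbitrary sample size and check points) compares the empirical CDF of Y with F_X(√y).

```python
import math, random

random.seed(1)
lam = 1.0
xs = [random.expovariate(lam) for _ in range(200_000)]
ys = [x * x for x in xs]                         # Y = Phi(X) = X^2, increasing on X >= 0

# Since Phi is increasing here, F_Y(y) = F_X(x(y)) with x(y) = sqrt(y)
for y in (0.25, 1.0, 4.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    exact = 1.0 - math.exp(-lam * math.sqrt(y))  # F_X(sqrt(y))
    print(y, round(empirical, 3), round(exact, 3))
```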
2.2 Commonly Used Distributions
The most commonly used distributions for the purposes of this exam are summarized in Tables 1, 2, and 3. The Binomial, Negative Binomial, and Poisson distributions all obey the following recursion relation:

f_X(n) = (a + b/n) f_X(n − 1)

Distribution          a        b
Binomial              −p/q     (n + 1)p/q
Negative Binomial     q        (r − 1)q
Poisson               0        λ
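The recursion is easy to verify directly. The sketch below checks it for a Poisson(λ = 3) and a Binomial(n = 10, p = 0.3); both parameter sets are illustrative.

```python
from math import comb, exp, factorial

def check(pmf, a, b, upper):
    """Verify f(n) = (a + b/n) f(n-1) for n = 1..upper."""
    return all(abs(pmf(n) - (a + b / n) * pmf(n - 1)) < 1e-12 for n in range(1, upper + 1))

# Poisson(lam): a = 0, b = lam
lam = 3.0
poisson = lambda k: exp(-lam) * lam**k / factorial(k)
print(check(poisson, 0.0, lam, 20))

# Binomial(n, p): a = -p/q, b = (n + 1) p / q
n, p = 10, 0.3
q = 1 - p
binom = lambda k: comb(n, k) * p**k * q**(n - k)
print(check(binom, -p / q, (n + 1) * p / q, n))
```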
3 Multivariate Distributions
These are almost always best started by drawing a graph of the region where fX,Y > 0. This is very useful for identifying the proper limits of integration or determining the ratio of areas.
3.1 Joint and marginal distributions
We now concern ourselves with the case when we have two random variables, call them X and Y, and wish to know features of their joint probability. That is, we want to study the probability density function

f_{X,Y}(x, y) = Pr(X = x ∩ Y = y) = Pr(X = x, Y = y)

with cumulative probability

F_{X,Y}(x, y) = Pr(X ≤ x ∩ Y ≤ y) = Pr(X ≤ x, Y ≤ y)

and the two are related as before:

F_{X,Y}(x, y) = ∫_{−∞}^x ∫_{−∞}^y f_{X,Y}(s, t) dt ds

Expectation values are as before: E[h(X, Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x, y) f_{X,Y}(x, y) dx dy.

The pdf can be found from

f_{X,Y}(x, y) = ∂² F_{X,Y}(x, y)/∂x ∂y

As in the single-variable case, for discrete variables replace ∫ → Σ and ∫∫ → ΣΣ.

If one plots the probability distribution as a function of X and Y, then it may be interesting to note how X behaves for a fixed value of Y, or vice-versa. Holding X fixed, we can sum f_{X,Y} over all the allowed y for that X, and record it next to the graph, in the margin. We define the marginal distribution of X as

f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy   and the marginal dist. of Y   f_Y(y) = ∫_{−∞}^∞ f_{X,Y}(x, y) dx

The marginal CDFs are given by

F_X(x) = lim_{y→+∞} F_{X,Y}(x, y)   and   F_Y(y) = lim_{x→+∞} F_{X,Y}(x, y)

If the random variables are independent, then the joint probability can be factored:

f_{X,Y}(x, y) = f_X(x) f_Y(y)   and   F_{X,Y}(x, y) = F_X(x) F_Y(y)

The expectation values can be factored as well: E(XY) = E(X)E(Y) and E(X²Y²) = E(X²)E(Y²). The region where f_{X,Y} > 0 will be a rectangle with sides parallel to the axes if X and Y are independent.

The conditional distribution of Y given X has a direct analog to basic conditional probability. Recall that Pr(B|A) = Pr(A ∩ B)/Pr(A), so that

f_{Y|X=x}(y|X = x) = f_Y(y|x) = f_{X,Y}(x, y)/f_X(x)

The expectation value is found in the usual way: E(Y|X = x) = ∫_{−∞}^∞ y f_Y(y|x) dy. There are two important results:

E(Y) = E[E(Y|X)]   and   Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]
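The double-expectation identities can be checked with a made-up two-stage model: let X be a fair die roll and, given X = x, let Y be Poisson with mean x. Then E(Y) = E(X) = 3.5 and Var(Y) = E[Var(Y|X)] + Var[E(Y|X)] = E(X) + Var(X) = 3.5 + 35/12 ≈ 6.42. The Poisson sampler below is a simple Knuth-style helper written for this illustration.

```python
import math, random
from statistics import fmean, pvariance

random.seed(2)

def poisson(mu):
    """Knuth's Poisson sampler; adequate for small mu (illustration only)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

xs = [random.randint(1, 6) for _ in range(200_000)]   # X ~ discrete uniform {1..6}
ys = [poisson(x) for x in xs]                          # Y | X = x ~ Poisson(x)

print(round(fmean(ys), 2))       # E(Y) = E[E(Y|X)] = E(X) = 3.5
print(round(pvariance(ys), 2))   # Var(Y) = E(X) + Var(X) = 3.5 + 35/12 ~ 6.42
```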
3.2 Covariance
The covariance and correlation coefficient are given by

σ_{XY}² = Cov(X, Y) = E(XY) − E(X)E(Y)

ρ(X, Y) = Cov(X, Y)/√(Var X · Var Y) = σ_{XY}²/(σ_X σ_Y)

Note that ρ = Cov = 0 if X and Y are independent. Covariance is a linear operator:

Cov(aX_1 + bX_2 + c, Y) = a Cov(X_1, Y) + b Cov(X_2, Y)

and we can generalize the variance of a multivariate distribution to

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)

Let X_1, X_2, …, X_n be a random sample from a distribution with variance σ²; then the above rule (with a = b = 1, and all covariance terms vanishing by independence) extends to Var(X_1 + X_2 + ⋯ + X_n) = nσ².
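A simulation illustrating the covariance term in Var(aX + bY): with Y = X + independent noise (a construction chosen only for this example), Cov(X, Y) = Var(X) = 1 and Var(X + Y) = 1 + 2 + 2·1 = 5.

```python
import random
from statistics import fmean, pvariance

random.seed(3)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
es = [random.gauss(0, 1) for _ in range(n)]
ys = [x + e for x, e in zip(xs, es)]              # Cov(X, Y) = Var(X) = 1 by construction

cov = fmean([x * y for x, y in zip(xs, ys)]) - fmean(xs) * fmean(ys)
print(round(cov, 2))                              # ~ 1.0

lhs = pvariance([x + y for x, y in zip(xs, ys)])  # Var(X + Y) directly
rhs = pvariance(xs) + pvariance(ys) + 2 * cov     # a = b = 1 in the rule above
print(round(lhs, 2), round(rhs, 2))               # both ~ 5.0
```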
3.3 Moment generating functions and transformations of a joint distribution
Similar to the single-variable case,

M_{X,Y}(s, t) = E(e^{sX + tY})

The marginal relations are explicitly M_X(s) = M_{X,Y}(s, 0) and M_Y(t) = M_{X,Y}(0, t). If X and Y are independent, then M_{X,Y}(s, t) = M_X(s) · M_Y(t). Expectation values of the moments can be found from the relation

[∂^{m+n}/∂s^m ∂t^n] M_{X,Y}(s, t) |_{s=t=0} = E(X^m Y^n)
3.4 Transformations and Convolution

Let (U, V) = Φ(X, Y) be a differentiable function of two random variables such that X and Y have a known joint probability function. Then the joint pdf of U and V is given by

f_{U,V}(u, v) = f_{X,Y}(x(u, v), y(u, v)) |∂(x, y)/∂(u, v)|

where the Jacobian is

∂(x, y)/∂(u, v) = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ]
Convolution is especially pertinent for the sum of two random variables. It is the weighting of one variable by the other when the two are constrained by a condition such as a sum. The following table summarizes f_{X+Y}(k) = Pr(X + Y = k) for discrete and continuous random variables, for independent and dependent (X, Y) pairs. Note that since X + Y = k, Y = k − X.
                 Discrete                              Continuous
General case     Σ_{x=0}^k f_{X,Y}(x, k − x)           ∫_{−∞}^∞ f_{X,Y}(x, k − x) dx
X, Y indpt       Σ_{x=0}^k f_X(x) f_Y(k − x)           ∫_{−∞}^∞ f_X(x) f_Y(k − x) dx
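For independent discrete variables, the convolution sum is a short loop. The convolve helper below is an ad hoc illustration; it is checked on the sum of two fair four-sided dice taking values 0..3 (an arbitrary example).

```python
def convolve(fx, fy):
    """pmfs given as lists indexed by value 0..len-1; returns the pmf of X + Y."""
    out = [0.0] * (len(fx) + len(fy) - 1)
    for k in range(len(out)):
        out[k] = sum(fx[x] * fy[k - x]
                     for x in range(len(fx)) if 0 <= k - x < len(fy))
    return out

# Two independent fair four-sided dice taking values 0..3 (illustrative)
die = [0.25, 0.25, 0.25, 0.25]
fsum = convolve(die, die)
print([round(p, 4) for p in fsum])   # triangular: 1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16
print(round(sum(fsum), 6))           # 1.0
```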
If we have a collection of several random variables X_i with weights α_i such that Σ α_i = 1, we can construct a new mixed distribution f_X(x) = α_1 f_{X_1}(x) + … + α_n f_{X_n}(x). Then the various moments are weighted averages of the individual moments. For example, to find the variance, you must first find the first two moments for each X_i, then weight them to get the moments of the mixed distribution, and finally compute E(X²) − [E(X)]².
3.5 Central Limit Theorem
For any sufficiently large sample size (e.g. n ≥ 30), any distribution can be approximated by a normal with the same mean and variance as the original. The practical implication is that a sum of independent, identically distributed random variables can be approximated by a normal distribution with the same mean and variance as the sum. This means that several of our earlier distributions become approximately normal:

b(n, p)             →  N(np, npq)
NEGBIN(r, p)        →  N(rq/p, rq/p²)
Poisson(λ)          →  N(λ, λ)
Γ(α, β)             →  N(α/β, α/β²)
Sample Avg. (X̄)     →  N(μ_X, σ_X²/n)
In principle, one must be careful about approximating a discrete distribution with a continuous one. We therefore have the continuity correction: let X be the original discrete distribution and Y the continuous approximation. Then Pr(X ≤ n) = Pr(X ≤ n + 1/2) ≈ Pr(Y ≤ n + 1/2). Also, Pr(X < n) ≈ Pr(Y < n − 1/2) and Pr(X ≥ n) ≈ Pr(Y > n − 1/2). In practice, though, the difference is small and not likely to be a factor in choosing between multiple-choice options on the exam.
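A stdlib-only illustration of the continuity correction, using Pr(X ≤ 55) for X ~ Binomial(100, 0.5) (arbitrary choices): the exact probability is compared with the normal approximation with and without the half-unit shift.

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

def norm_cdf(x):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

k = 55
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
print(round(exact, 4))                 # exact Pr(X <= 55), about 0.864
print(round(norm_cdf(k + 0.5), 4))     # with continuity correction: Pr(Y <= 55.5)
print(round(norm_cdf(k), 4))           # without the correction: noticeably worse
```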
3.6 Order Statistics
Let X_1 through X_n be a collection of independent random variables drawn from a common distribution. Let Y_1 be the smallest of the X_i and Y_n be the largest. The collection of Y_i has the same mean and variance as the collection of X_i, but the Y_i are now dependent. There are two interesting cases: when the smallest Y is larger than a particular value, and when the largest Y is smaller than some limit.

Pr(Y_1 > y) = s_{Y_1}(y) = [s_X(y)]^n
Pr(Y_n < y) = F_{Y_n}(y) = [F_X(y)]^n

If the X_i come from a continuous distribution, then the ordered pdf is

f_{Y_1,Y_2,…,Y_n}(y_1, …, y_n) = n! f_X(y_1) f_X(y_2) ⋯ f_X(y_n)

and the k-th order statistic has density

f_{Y_k}(y) = [n!/((k − 1)!(n − k)!)] [F_X(y)]^{k−1} [1 − F_X(y)]^{n−k} f_X(y)

where the factors represent the probability of k − 1 samples being less than y, the probability of n − k samples being greater than y, and the probability of one sample being in the interval [y, y + dy].
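A simulation check of the extreme-value formulas, assuming n = 5 iid exponential(1) draws and an arbitrary threshold y = 1.2: Pr(Y_n < y) should match [F_X(y)]^n and Pr(Y_1 > y) should match [s_X(y)]^n.

```python
import math, random

random.seed(4)
n, lam, y = 5, 1.0, 1.2
trials = 100_000

count_max, count_min = 0, 0
for _ in range(trials):
    xs = [random.expovariate(lam) for _ in range(n)]
    count_max += max(xs) < y      # event {Y_n < y}
    count_min += min(xs) > y      # event {Y_1 > y}

F = 1 - math.exp(-lam * y)                                    # F_X(y)
print(round(count_max / trials, 3), round(F**n, 3))           # Pr(Y_n < y) vs F^n
print(round(count_min / trials, 3), round((1 - F)**n, 3))     # Pr(Y_1 > y) vs s^n
```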
4 Risk Management

Some general definitions that are common to describing losses:

X = loss; the actual full, "ground up" loss
Y = claim to be paid
E(Y) = net premium or pure premium; the expected claim
σ_Y/μ_Y = unitized risk, or coefficient of variation
4.1 Risk models
The individual risk model considers n policies, where the claim for policy i is a random variable X_i. All the X_i are independent and identically distributed, with finite mean and variance.

S = Σ_{i=1}^n X_i is the aggregate claim random variable.

E(S) = Σ_i E(X_i) = nμ        Var(S) = Σ_i Var(X_i) = nσ²

The coefficient of variation is then √Var(S)/E(S) = σ/(√n μ), which → 0 as n → ∞.

The collective risk model is an extension of the IRM that allows the number of policies to be a random variable N, so that S = Σ_{i=1}^N X_i. Note that S can often be approximated as N(nμ, nσ²). If S is the total loss paid to an individual or group, then E(S) is the pure premium for the policy. The actual premium before expenses and profits is given by Q = (1 + θ)E(S), where θ is the relative security loading.
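A minimal Monte Carlo sketch of the collective risk model under illustrative assumptions: N ~ Poisson(50) claims per period, each claim exponential with mean 2, and relative security loading θ = 0.1, so E(S) = E(N)E(X) = 100 and Q = (1 + θ)E(S) = 110. The Poisson sampler is a simple Knuth-style helper written for the illustration.

```python
import math, random
from statistics import fmean

random.seed(5)

def poisson(mu):
    """Knuth's Poisson sampler (illustration only)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

theta, mean_claim, lam_claims = 0.1, 2.0, 50.0    # illustrative assumptions
sims = 20_000

totals = []
for _ in range(sims):
    n_claims = poisson(lam_claims)                                   # N ~ Poisson(50)
    totals.append(sum(random.expovariate(1 / mean_claim) for _ in range(n_claims)))

pure_premium = fmean(totals)
print(round(pure_premium, 1))                 # ~ E(S) = E(N) E(X) = 100
print(round((1 + theta) * pure_premium, 1))   # loaded premium Q = (1 + theta) E(S) ~ 110
```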
4.2 Deductibles and policy limits
Let X represent the loss amount, d the deductible on the policy, and Y the amount paid on the claim.

Ordinary deductible insurance is sometimes called excess loss insurance and has Y = max(X − d, 0) = (X − d)_+. The pure premium is

E(Y) = ∫_d^∞ s_X(x) dx = ∫_d^∞ (1 − F_X(x)) dx = ∫_d^∞ (x − d) f_X(x) dx.

There are two common variations on the deductible. The franchise deductible pays in full if the loss exceeds the deductible. The disappearing deductible has both upper (d_U) and lower (d_L) deductibles, and the payout increases linearly from zero to full loss between the limits.

The franchise deductible:   Y = 0 if X ≤ d;   Y = X if X > d.

The disappearing deductible:   Y = 0 if X ≤ d_L;   Y = d_U (X − d_L)/(d_U − d_L) if d_L < X ≤ d_U;   Y = X if X > d_U.

A policy may have a limit of u on the maximum payout. Then Y = min(X, u) and E(Y) = ∫_0^∞ s_Y(y) dy = ∫_0^u s_X(x) dx. Note the similarity to the ordinary deductible.

An insurance cap specifies a maximum claim amount m on a policy. If there is no deductible, this is identical to a policy limit. If there is a deductible, then m = u − d is the maximum payout.

Proportional insurance pays a fraction α of the loss X; 80-20 medical plans are an example.

When the amount paid is not the same as the loss, the following random variables are sometimes used:

• Y* is the amount paid, conditional on the event that a payment was made, sometimes termed payment per payment.
• Y is the amount paid per loss.

For example, a policy with deductible d has Y* = (X − d)|(X > d) and Y = max(X − d, 0). The mean excess loss in this case is E(Y*).

The loss elimination ratio is defined as (expected loss eliminated from a claim)/(expected total loss). Note that the loss eliminated from the payment is made up by the insured (customer).
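For an exponential loss, the deductible and policy-limit formulas have closed forms that are easy to check against the survival-function integrals: E[(X − d)_+] = ∫_d^∞ e^{−λx} dx = e^{−λd}/λ and E[min(X, u)] = ∫_0^u e^{−λx} dx = (1 − e^{−λu})/λ. The rate, deductible, and limit below are arbitrary, and the integrate helper is the same kind of crude midpoint rule used in the earlier sketches.

```python
import math

lam, d, u = 0.5, 1.0, 3.0          # illustrative rate, deductible, policy limit
s = lambda x: math.exp(-lam * x)   # survival function of an exponential loss

def integrate(g, a, b, n=100_000):
    """Crude midpoint-rule integral of g on [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Expected payment under an ordinary deductible: E[(X - d)+] = integral of s from d to infinity
pay_deductible = integrate(s, d, 60.0)                     # 60 is effectively infinity here
print(round(pay_deductible, 4), round(s(d) / lam, 4))      # both ~ e^{-lam d}/lam = 1.2131

# Expected payment under a policy limit u: E[min(X, u)] = integral of s from 0 to u
pay_limit = integrate(s, 0.0, u)
print(round(pay_limit, 4), round((1 - s(u)) / lam, 4))     # both ~ 1.5537
```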
Reinsurance is an insurance policy purchased by an insurance company, primarily to limit catastrophic claims or for tax purposes. These policies can have caps, deductibles, and proportional payments.
(Throughout, q = 1 − p.)

Uniform (discrete, x = 1, …, n):
  f_X(x) = 1/n,  E(X) = (n + 1)/2,  Var(X) = (n² − 1)/12,  M(t) = e^t(e^{nt} − 1)/[n(e^t − 1)]

Bernoulli:
  f_X(x) = p^x q^{1−x},  E(X) = p,  Var(X) = pq,  M(t) = q + pe^t
  Notes: Succeed OR fail; p is the chance of success.

Binomial:
  f_X(x) = C(n, x) p^x q^{n−x},  E(X) = np,  Var(X) = npq,  M(t) = (q + pe^t)^n
  Notes: n Bernoulli trials with x successes.

Geometric:
  f_X(x) = p q^{x−1},  E(X) = 1/p,  Var(X) = q/p²,  M(t) = pe^t/(1 − qe^t)
  Notes: Perform Bernoulli trials until success.

Negative Binomial:
  f_X(x) = C(x − 1, n − 1) p^n q^{x−n},  E(X) = n/p,  Var(X) = nq/p²,  M(t) = [pe^t/(1 − qe^t)]^n
  Notes: n-th success on x-th Bernoulli trial.

Hypergeometric:
  f_X(x) = C(K, x) C(M − K, n − x)/C(M, n),  E(X) = nK/M,  Var(X) = nK(M − K)(M − n)/[M²(M − 1)]
  Notes: Choose n objects from a group of M, partitioned into K and M − K.

Poisson:
  f_X(x) = e^{−λ} λ^x/x!,  E(X) = λ,  Var(X) = λ,  M(t) = e^{λ(e^t − 1)}
  Notes: Counting events, or time between events.

Table 1: Common Discrete Distributions
Uniform (continuous on [a, b]):
  f_X(x) = 1/(b − a),  E(X) = (b + a)/2,  Var(X) = (b − a)²/12,  M(t) = (e^{bt} − e^{at})/[(b − a)t]

Normal:
  f_X(x) = [1/(σ√(2π))] e^{−(x−μ)²/(2σ²)},  E(X) = μ,  Var(X) = σ²,  M(t) = e^{μt + σ²t²/2}
  Notes: 95th %ile is 1.645; Pr(|Z| ≤ 1.96) = 0.95.

Exponential:
  f_X(x) = λe^{−λx},  E(X) = 1/λ,  Var(X) = 1/λ²,  M(t) = λ/(λ − t)
  Notes: Waiting time between failures. s_X(t) = e^{−λt}, median = (ln 2)/λ.

Gamma:
  f_X(x) = λ^n x^{n−1} e^{−λx}/Γ(n),  E(X) = n/λ,  Var(X) = n/λ²,  M(t) = [λ/(λ − t)]^n
  Notes: Add n (independent) exponential distributions together. General form has λ → β and n → α.

Lognormal:
  f_X(x) = [1/(xσ√(2π))] e^{−(ln x − μ)²/(2σ²)},  E(X) = e^{μ + σ²/2},  Var(X) = (e^{σ²} − 1) e^{2μ + σ²}
  Notes: ln X ∼ N(μ, σ²).

Weibull:
  f_X(x) = (β/α)(x/α)^{β−1} e^{−(x/α)^β},  E(X) = αΓ(1 + 1/β),  Var(X) = α²[Γ(1 + 2/β) − Γ²(1 + 1/β)]
  Notes: s_X(x) = e^{−(x/α)^β}; Y = (X/α)^β is an exponential with λ = 1.

Beta:
  f_X(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1},  E(X) = α/(α + β),  Var(X) = αβ/[(α + β)²(α + β + 1)]
  Notes: α = β = 1 is U[0,1]. If x out of n items are defective and the prior dist. is Beta(α, β), then the posterior dist. is Beta(α + x, β + n − x).

Pareto:
  f_X(x) = αθ^α/(x + θ)^{α+1},  E(X) = θ/(α − 1),  Var(X) = αθ²/[(α − 1)²(α − 2)]
  Notes: s_X(x) = [θ/(x + θ)]^α.

Chi-Squared (n degrees of freedom):
  f_X(x) = e^{−x/2} x^{n/2−1}/[Γ(n/2) 2^{n/2}],  E(X) = n,  Var(X) = 2n,  M(t) = [1/(1 − 2t)]^{n/2}
  Notes: If Z_1, …, Z_n is a sample from the std. normal dist., then Z_1² + … + Z_n² is chi-sq. with n deg. of freedom. n = 2 is exponential with mean 2 (λ = 0.5).

Table 2: Common Continuous Distributions
Multinomial:
  f_{X_1,…,X_k}(x_1, …, x_k) = [n!/(x_1! ⋯ x_k!)] p_1^{x_1} · … · p_k^{x_k}
  E(X_i) = n p_i,  Var(X_i) = n p_i q_i,  Cov(X_i, X_j) = −n p_i p_j   (q_i = 1 − p_i)
  Notes: An experiment with k possible outcomes performed n times.

Bivariate Normal:
  f_{X_1,X_2}(x_1, x_2) = [1/(2πσ_1σ_2√(1 − ρ²))] e^γ,  where γ = −(z_1² + z_2² − 2ρz_1z_2)/(2(1 − ρ²)) and z_i = (x_i − μ_i)/σ_i
  Notes: X_i ∼ N(μ_i, σ_i²);  (X_1|X_2 = x_2) ∼ N(μ_1 + ρσ_1z_2, (1 − ρ²)σ_1²).

Bivariate Uniform:
  Notes: Joint density must be 1/(area of the region where it is positive).

Table 3: Common Multivariate Distributions