# Cholesky decomposition

## Linear Algebra and probability distributions #

Well, those were two things I definitely thought I wouldn't be needing again after graduating university. Turns out I was very wrong, and I wish I had paid more attention to learning probability and applying linear algebra. In my experience so far, the intuition you can gain from studying probability and linear algebra is actually quite useful in almost anything numerical. If you ever need to look at a large data sample and figure out whether variables are correlated, or how best to classify the information, you will most likely run into some form of probability and linear algebra.

Take a dataset like `mtcars`, which ships with R. We can compute the correlation coefficients between each pair of variables using R's handy `cor` function:

```
df = mtcars
hist(df$hp)
cor_df = cor(df)
```

The result is a correlation matrix: each entry holds the correlation coefficient for one pair of variables, measuring how strongly that pair depends on each other. The default method is the Pearson correlation coefficient, which is a procedure that comes with its own set of assumptions. Using the Pearson method assumes that the data you are looking at is roughly normally distributed and linearly dependent. This means that each variable in the dataset approximately follows a normal distribution, and we can assume that the relationship between each pair of variables is linear.

It's useful to look at the distributions of your data as part of your exploratory analysis. That way you gain insight into outliers if they occur in the future, or can notice when data that starts off looking normal gradually skews towards another distribution over time.
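As a quick sketch of that kind of check, using the `hp` column of `mtcars` as an arbitrary example, a histogram and a normal Q-Q plot give a fast visual read on normality:

```r
df = mtcars

# Visual checks: histogram and normal Q-Q plot of horsepower
hist(df$hp, main = "Horsepower", xlab = "hp")
qqnorm(df$hp)
qqline(df$hp)

# A formal (if blunt) check: the Shapiro-Wilk normality test
shapiro.test(df$hp)
```

If the points on the Q-Q plot hug the reference line, a normality assumption is more defensible; a small Shapiro-Wilk p-value suggests the data departs from normal.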

## Cholesky decomposition #

One of the most important things you learn without realizing it (at least I didn't realize it) in your introductory linear algebra class is how to decompose matrices. LU decomposition is the basic one that everybody learns, because this method is easy to teach and easy to compute by hand. A decomposition factors a matrix, for example `A = LU`, so that a system of equations `Ax = b` can be solved cheaply by substitution. But LU decomposition isn't actually the most efficient choice when your matrix is symmetric and positive definite, as covariance and correlation matrices are; in that case the Cholesky decomposition, `A = LL'` with `L` lower triangular, takes roughly half the work. This is where Cholesky decomposition comes into play. As an example using `mtcars`, say we know that some models of car are correlated with each other, and we want to find out that correlation. A Honda Civic and a Toyota Corolla are pretty similar, right? For the sake of this example, say we had more continuous information in the dataset.

```
library(reshape2)
df = mtcars
df$model = rownames(df)  # dcast needs the model name as a column
df.wide = dcast(df, cyl + disp + hp ~ model, value.var = 'mpg')
```

Now the `df.wide` dataframe holds the same information in wide format: one column per car model, containing its mpg. Using this information, we can ask what kind of mpg we would expect given a car model, a cylinder count, a displacement figure, and horsepower.

## Predicting mpg based on car model features #

Given the wide-format information, we can use it to create our correlation matrix. The correlation matrix is used in the Cholesky decomposition process: the matrix of correlation coefficients is factored into a lower triangular matrix, which is then used to project the correlation structure onto another variable.
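Here is a minimal sketch of that factorization in base R (the choice of columns is my own, just to get a small correlation matrix to work with):

```r
# Build a correlation matrix from a few mtcars columns
R = cor(mtcars[, c("mpg", "hp", "wt")])

# chol() returns the *upper* triangular factor U with R = U'U,
# so the lower triangular L is its transpose
L = t(chol(R))

# Verify the factorization: L %*% t(L) reconstructs R
max(abs(L %*% t(L) - R))  # numerically zero
```

Note that R's `chol()` hands back the upper triangular factor, so you transpose it if, like here, you want to work with the lower triangle.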

These variables, in our case, are going to be error distributions. This way we can make predictions on the mpg a car will get, look at the error distribution of each car, and create a joint distribution that carries the correlations of the mpg errors across car models. While writing this I realize it probably sounds very confusing; that's partly because I'm not too clear about it myself, and I haven't picked the best example to work with.

Where Cholesky decomposition is used widely is in the financial sector: take the correlations between products, use Cholesky decomposition to form a joint distribution with those correlated errors, and then apply that joint distribution when simulating prices over a time frame. This allows simulations of how prices move with the correlation between products built in.
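A small sketch of that idea with made-up numbers (the 0.8 correlation and the two "products" are assumptions for illustration, not from any real data):

```r
set.seed(42)

# Target correlation between two hypothetical products
R = matrix(c(1.0, 0.8,
             0.8, 1.0), nrow = 2)
L = t(chol(R))  # lower triangular factor, R = L %*% t(L)

# Independent standard normal shocks, one row per product
n = 10000
Z = matrix(rnorm(2 * n), nrow = 2)

# Multiplying by L induces the target correlation
X = L %*% Z
cor(X[1, ], X[2, ])  # close to 0.8
```

Feeding the correlated shocks `X` into a price model (rather than independent draws) is what lets the simulated products move together the way the historical data says they should.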