## Linear algebra and probability distributions

Well, those were two things I was sure I wouldn’t need again after graduating from university. It turns out I was very wrong, and I wish I had paid more attention when learning probability and applying linear algebra. In my experience so far, the intuition you gain from studying probability and linear algebra is quite useful in any numerical field. If you ever need to look at a large data sample and figure out whether variables are correlated, or how best to classify the information, you will most likely run into some form of probability and linear algebra.

Given information in a dataset, for example the `mtcars` dataset within R, we can analyze the correlation coefficient of each pair of variables by using the handy `cor` function.
```
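# Work on a copy of the built-in mtcars dataset
df = mtcars
# Plot the distribution of horsepower
hist(df$hp)
# Pairwise correlation matrix of all columns (Pearson by default)
cor_df = cor(df)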
```

The correlation matrix that is returned holds one correlation coefficient for every pair of variables, describing how strongly the two depend on each other. The default method is the Pearson correlation coefficient, a procedure that comes with its own set of assumptions. Using the Pearson method assumes that the data you are looking at is roughly normally distributed and linearly related. In other words, each variable in the dataset roughly follows a normal distribution, and each pair of variables has a roughly linear relationship. The coefficient only captures linear dependence, and the normality assumption matters most when you test whether a correlation is significant.
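One rough way to check how much those assumptions matter is to compare Pearson against a rank-based method. The sketch below is only an illustration: it uses base R's `cor` with `method = "spearman"` and `shapiro.test`, and picking `hp` as the variable to test is my own assumption, not something special about the dataset.

```r
# Pearson (the default) measures linear association.
pearson = cor(mtcars, method = "pearson")
# Spearman works on ranks, so it measures monotonic association instead.
spearman = cor(mtcars, method = "spearman")

# Largest disagreement between the two methods across all variable pairs.
max_gap = max(abs(pearson - spearman))
print(round(max_gap, 3))

# Shapiro-Wilk test: a small p-value suggests hp is not normally distributed.
hp_normality = shapiro.test(mtcars$hp)
print(hp_normality$p.value)
```

If the two matrices disagree a lot for a pair of variables, that is a hint the relationship may not be linear and is worth plotting.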

It’s useful to look at the distributions of your data as part of your exploratory analysis. Doing so gives you a baseline for spotting outliers if they occur in the future, or for noticing that data which started off looking normal is skewing towards another distribution over time.
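As a minimal sketch of that kind of exploratory check, here is one way to look at a single column using only base R. The choice of `hp` and of the 1.5 * IQR outlier rule are assumptions made for illustration.

```r
hp = mtcars$hp

# Quartiles plus the mean; a mean well above the median hints at right skew.
print(summary(hp))

# Points more than 1.5 * IQR beyond the hinges, the same rule boxplot() uses.
hp_outliers = boxplot.stats(hp)$out
print(hp_outliers)

# Visual check against a normal distribution: points should hug the line.
qqnorm(hp)
qqline(hp)
```

Running the same check again on newer data gives you something to compare against when you suspect the distribution is drifting.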

## Cholesky decomposition

One of the most important things you learn without knowing it (at least I didn’t know) in your introductory linear algebra class is how to decompose matrices. LU decomposition is the basic one that everybody learns, because the method is easy to teach and easy to compute by hand. Decomposition helps you solve a linear system of the form:
```
Ax = b
```

Factoring the matrix A into triangular pieces lets you solve for the unknown x with simple substitution. But LU decomposition isn’t actually the most efficient or the fastest option if you are able to use a computer and your matrix is symmetric and positive definite, as a correlation matrix typically is. This is where something called the Cholesky decomposition comes into play. As an example using `mtcars`, say we know that some car models are correlated with each other, and we want to find that correlation. A Honda Civic and a Toyota Corolla are pretty similar, right? For the sake of this example, say we had more continuous information in the dataset.

```
library(reshape2)
# mtcars keeps each car model as a row name; spread the models into columns of mpg values
df.wide = dcast(df, cyl+disp+hp ~ rownames(df), value.var = 'mpg')
```

Now the `df.wide` data frame holds the data in wide format, with one column per car model containing its mpg. Using this information, we can ask what kind of mpg we will get given a car model, a cylinder count, a displacement, and a horsepower.
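Stepping back to the decomposition itself for a moment, here is a small sketch of how a Cholesky factor can be used to solve a linear system. The matrix `A` and vector `b` are made-up values, chosen only so that `A` is symmetric and positive definite. `chol`, `forwardsolve`, `backsolve` and `solve` are all base R.

```r
# Hypothetical 3x3 symmetric positive-definite matrix and right-hand side.
A = matrix(c(4, 2, 2,
             2, 3, 1,
             2, 1, 3), nrow = 3, byrow = TRUE)
b = c(2, 1, 3)

# chol() returns the upper triangular factor R, with t(R) %*% R equal to A.
R = chol(A)

# Solve t(R) y = b by forward substitution, then R x = y by back substitution.
y = forwardsolve(t(R), b)
x = backsolve(R, y)

# Compare with R's general-purpose solver, which uses an LU factorization.
x_lu = solve(A, b)
print(all.equal(x, x_lu))
```

Both routes should agree on the answer. The difference is that the Cholesky route only works when the matrix is symmetric and positive definite, which is exactly the property it exploits to do less work.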

## Predicting mpg based on car model features

Given the wide format information, we can use it to create our correlation matrix. The correlation matrix is the input to the Cholesky decomposition, which transforms a matrix of correlation coefficients into a lower triangular matrix that is then used to project the correlation onto another set of variables.

These variables, in our case, are going to be error distributions. This way, we can make predictions of the mpg a car will get, look at the error distribution for each car, and create a joint distribution that captures the correlations between the mpg errors of each car model. I understand while writing this that it probably sounds very confusing. That’s because I’m not too clear about it myself, and I haven’t picked the best example to work with.
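To make the first step a bit more concrete, here is a sketch of turning a correlation matrix into a lower triangular matrix. Since the wide table is not a great fit for this, the sketch assumes a correlation matrix built from four `mtcars` columns that I picked only for illustration.

```r
# Correlation matrix for a few mtcars columns (chosen only for illustration).
vars = c("mpg", "disp", "hp", "wt")
C = cor(mtcars[, vars])

# chol() returns an upper triangular factor, so transpose it to get lower triangular L.
L = t(chol(C))
print(round(L, 3))

# L %*% t(L) should reproduce the correlation matrix up to rounding error.
print(all.equal(L %*% t(L), C, check.attributes = FALSE))
```

Note that base R's `chol` hands back the upper triangular factor, so the transpose is needed to get the lower triangular form the text talks about.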

Cholesky decomposition is widely used in the financial sector, where the correlation between products is decomposed in order to form a joint distribution of correlated errors. That joint distribution is then applied when simulating prices over a time frame, which lets the simulation show how prices move with the correlation between products built in.
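A rough sketch of that idea for two products is below. Everything numeric here is a made-up assumption: the 0.8 target correlation, the 1% daily volatility, the starting price of 100, and the simple model where prices follow cumulative log returns. It is meant to show the mechanics, not how any particular firm does it.

```r
set.seed(42)

# Hypothetical target correlation between two products' daily errors.
target = matrix(c(1.0, 0.8,
                  0.8, 1.0), nrow = 2, byrow = TRUE)
L = t(chol(target))

# Independent standard normal errors: one row per product, one column per day.
n_days = 5000
z = matrix(rnorm(2 * n_days), nrow = 2)

# Multiplying by L mixes the independent errors into correlated ones.
errors = L %*% z

# Turn the errors into price paths, assuming 1% daily volatility and a start price of 100.
# apply() over rows returns one column per product, one row per day.
prices = 100 * exp(apply(0.01 * errors, 1, cumsum))

# The sample correlation of the errors should land near the 0.8 target.
print(cor(t(errors))[1, 2])
```

The key line is `L %*% z`: the independent errors go in, and errors with the target correlation come out, which is what lets the simulated prices move together.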