Identify noise in Correlation and Covariance matrices
This post describes how to identify and remove noise from a correlation or covariance matrix.
Practical use: Correlation and covariance matrices are the basis for many quant finance algorithms. If these matrices are corrupted by noise, the results could be invalid. With an easy workflow for detecting noise in our dataset, we can then take steps to denoise it.
Physical meaning: We first need to decide what to examine, establish a template of what good looks like, and work out how to compare our covariance matrix against this template to determine whether action is required.
What to examine: We will examine the probability density function of the eigenvalues of our covariance/correlation matrix. This is an excellent video on the meaning of a probability density function: https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/random-variables-continuous/v/probability-density-functions
What good looks like: The Marcenko–Pastur Theorem tells us that for independent and identically distributed random variables, the eigenvalues of the covariance matrix follow a specific distribution. In the large-sample limit, that distribution depends only on the variance of the variables and the ratio of observations to variables, so the theorem lets us generate a custom template profile for any given number of variables and observations. This is the absolute genius of the theorem.
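As a minimal sketch of how such a template could be generated, the function below evaluates the Marcenko–Pastur density f(λ) = q / (2πσ²λ) · sqrt((λ₊ − λ)(λ − λ₋)) on the interval [λ₋, λ₊], where λ± = σ²(1 ± sqrt(1/q))². It assumes q = T/N (observations divided by variables) with q > 1. The function name, argument names and grid size are illustrative choices, not taken from the original post.

```python
import numpy as np

def marchenko_pastur_pdf(var, q, n_points=1000):
    """Theoretical Marcenko-Pastur density.

    var: variance of the underlying i.i.d. variables (1.0 for a correlation matrix)
    q:   ratio T/N of observations to variables (assumed > 1 in this sketch)
    """
    lam_min = var * (1 - np.sqrt(1.0 / q)) ** 2   # lower edge of the spectrum
    lam_max = var * (1 + np.sqrt(1.0 / q)) ** 2   # upper edge of the spectrum
    lam = np.linspace(lam_min, lam_max, n_points)
    # Clip guards against tiny negative values from rounding at the two edges.
    radicand = np.clip((lam_max - lam) * (lam - lam_min), 0.0, None)
    pdf = q / (2 * np.pi * var * lam) * np.sqrt(radicand)
    return lam, pdf

# Example: template for T = 10 * N observations and unit variance.
lam, pdf = marchenko_pastur_pdf(var=1.0, q=10.0)
```

The returned pair can be plotted directly (eigenvalue on the x-axis, density on the y-axis) to draw the template curve.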
How we can compare: We can compute the covariance matrix, find its eigenvalues, and then estimate their probability density function. This is an excellent article that explains how to compute the probability density function by hand using kernel density estimation, and the significance of the bandwidth parameter: https://medium.com/analytics-vidhya/kernel-density-estimation-kernel-construction-and-bandwidth-optimization-using-maximum-b1dfce127073
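The steps just described can be sketched as follows. This is an illustrative version only: it uses synthetic Gaussian data in place of real returns, works on the correlation matrix, and hand-rolls a Gaussian kernel density estimate in NumPy. The function names, the data shape and the bandwidth of 0.05 are assumptions made for the example.

```python
import numpy as np

def correlation_eigenvalues(x):
    """Eigenvalues of the correlation matrix of x (rows = observations), largest first."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)   # symmetric input, returned in ascending order
    return eigvals[::-1]

def gaussian_kde_pdf(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One Gaussian bump per sample; the bandwidth sets the width of each bump.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 200))          # T = 2000 observations, N = 200 variables
eigvals = correlation_eigenvalues(x)
grid = np.linspace(0.0, 3.0, 600)
empirical_pdf = gaussian_kde_pdf(eigvals, grid, bandwidth=0.05)
```

A smaller bandwidth gives a spikier estimate that follows individual eigenvalues, while a larger one smooths the curve and can blur the edges of the spectrum.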
If the probability density function of our eigenvalues does not match the template probability density function, we should consider applying denoising algorithms to our covariance matrix.
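One rough way to flag such a mismatch, offered here as a simple proxy rather than a full comparison of the two curves, is to count the eigenvalues that sit clearly above the upper edge λ₊ = σ²(1 + sqrt(N/T))² of the Marcenko–Pastur distribution. For purely random data we would expect essentially none. The sketch below assumes a correlation matrix (σ² = 1); the `tol` safety margin, the function names and the synthetic one-factor data are hypothetical choices for illustration.

```python
import numpy as np

def mp_upper_edge(var, q):
    """Upper edge of the Marcenko-Pastur spectrum for variance `var` and q = T/N."""
    return var * (1 + np.sqrt(1.0 / q)) ** 2

def eigenvalues_above_mp_edge(x, var=1.0, tol=0.05):
    """Count correlation-matrix eigenvalues exceeding the upper edge by more than `tol`."""
    t, n = x.shape                         # rows = observations, columns = variables
    eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))
    lam_max = mp_upper_edge(var, q=t / n)
    # `tol` is an assumed margin that absorbs finite-sample wobble at the edge.
    return int(np.sum(eigvals > lam_max * (1 + tol)))

rng = np.random.default_rng(1)
t, n = 2000, 200
pure_noise = rng.normal(size=(t, n))

# Hypothetical structured data: one common factor added to every column.
factor = rng.normal(size=(t, 1))
with_factor = pure_noise + 0.5 * factor

print("pure noise :", eigenvalues_above_mp_edge(pure_noise))
print("with factor:", eigenvalues_above_mp_edge(with_factor))
```

A non-zero count suggests the spectrum departs from the random template, which is the situation where denoising is worth considering.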
Results: Figure 1 shows an instance where the actual probability density function (PDF) of a covariance matrix matches well with the template PDF predicted by the Marcenko–Pastur Theorem. The code for generating this is provided below.
Compute using Python: