Identify noise in Correlation and Covariance matrices

August 07, 2020

This post describes how to identify and remove noise from a correlation or covariance matrix.

Practical use: Correlation and covariance matrices are the basis for many quant finance algorithms. If these matrices are corrupted by noise, the results may be invalid. With an easy workflow for detecting noise in our dataset, we can then take steps to denoise it.

Physical meaning: We first need to decide what to examine, what good looks like, and how to compare our covariance matrix against that template to determine whether action is required.

What to examine: We will examine the probability density function of the eigenvalues of our covariance/correlation matrix. This is an excellent video on the meaning of a probability density function: https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/random-variables-continuous/v/probability-density-functions

What good looks like: The Marcenko–Pastur Theorem tells us that, for independent and identically distributed random variables with finite variance, the eigenvalues of the sample covariance matrix follow a known distribution as the number of variables and the number of observations grow large at a fixed ratio. The shape of that distribution depends only on this ratio and on the variance, so the Theorem lets us generate a custom template profile for any given number of variables and observations. This is the absolute genius of this Theorem.
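As a rough illustration of how such a template can be generated, here is a minimal sketch of the Marcenko–Pastur density. The function name `marchenko_pastur_pdf` and the parameters `sigma2` (variance of the underlying variables) and `q` (observations divided by variables) are naming choices made for this example, not something taken from the original code, and the sketch assumes `q > 1`.

```python
import numpy as np

def marchenko_pastur_pdf(sigma2, q, n_points=1000):
    """Marcenko-Pastur density on its support.

    sigma2 : variance of the i.i.d. variables (assumed known here)
    q      : number of observations / number of variables (assumed > 1)
    """
    # Edges of the support: no eigenvalue of a pure-noise matrix is
    # expected outside [lam_min, lam_max] in the large-sample limit.
    lam_min = sigma2 * (1.0 - np.sqrt(1.0 / q)) ** 2
    lam_max = sigma2 * (1.0 + np.sqrt(1.0 / q)) ** 2
    lam = np.linspace(lam_min, lam_max, n_points)
    # Clip guards against tiny negative values from rounding at the edges.
    inside = np.clip((lam_max - lam) * (lam - lam_min), 0.0, None)
    pdf = q / (2.0 * np.pi * sigma2 * lam) * np.sqrt(inside)
    return lam, pdf

# Template for, say, 10 observations per variable and unit variance.
lam, pdf = marchenko_pastur_pdf(sigma2=1.0, q=10.0)
```

Changing `q` changes both the width and the shape of the template, which is why a separate template is needed for each combination of variables and observations.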

How we can compare: We compute the covariance matrix, compute its eigenvalues, and then estimate their probability density function, for example with kernel density estimation. This is an excellent article that explains how to compute the probability density function by hand, and the significance of the bandwidth attribute: https://medium.com/analytics-vidhya/kernel-density-estimation-kernel-construction-and-bandwidth-optimization-using-maximum-b1dfce127073
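The steps above can be sketched with NumPy alone. This is an illustrative example under stated assumptions: the data are simulated standard normal draws, the kernel is Gaussian, the bandwidth of 0.05 is an arbitrary value picked for the demonstration, and the helper names are hypothetical.

```python
import numpy as np

def correlation_eigenvalues(returns):
    """Eigenvalues of the correlation matrix of a (observations x variables) array."""
    corr = np.corrcoef(returns, rowvar=False)
    # eigvalsh is appropriate because a correlation matrix is symmetric.
    return np.linalg.eigvalsh(corr)

def gaussian_kde_pdf(samples, grid, bandwidth):
    """Hand-rolled Gaussian kernel density estimate evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One Gaussian bump per sample, centred on that sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average the bumps and rescale so the estimate integrates to one.
    return kernels.sum(axis=1) / (samples.size * bandwidth)

rng = np.random.default_rng(0)
returns = rng.standard_normal((2000, 200))   # simulated i.i.d. data
eigvals = correlation_eigenvalues(returns)
grid = np.linspace(0.0, 1.2 * eigvals.max(), 500)
empirical_pdf = gaussian_kde_pdf(eigvals, grid, bandwidth=0.05)
```

A smaller bandwidth follows the individual eigenvalues more closely and gives a spikier curve, while a larger one smooths the curve and blurs the edges of the distribution.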

If the probability density function of our eigenvalues does not match the target probability density, we should consider applying denoising algorithms to our covariance matrix.
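One crude way to put a number on "does not match" is to count how many eigenvalues fall outside the range that the Marcenko–Pastur Theorem allows for purely random data. The sketch below is only one possible check, not the method used in the original code. The function names, the 5% `slack` allowance for finite-sample effects, and the simulated common factor are all assumptions made for illustration.

```python
import numpy as np

def mp_support(n_obs, n_vars, sigma2=1.0):
    """Lower and upper edge of the Marcenko-Pastur distribution."""
    ratio = n_vars / n_obs
    return (sigma2 * (1.0 - np.sqrt(ratio)) ** 2,
            sigma2 * (1.0 + np.sqrt(ratio)) ** 2)

def fraction_outside_support(returns, slack=0.05):
    """Share of correlation-matrix eigenvalues outside the template's range."""
    n_obs, n_vars = returns.shape
    eigvals = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    lower, upper = mp_support(n_obs, n_vars)
    # `slack` widens the range slightly, since finite samples never
    # reproduce the theoretical edges exactly.
    outside = (eigvals < lower * (1.0 - slack)) | (eigvals > upper * (1.0 + slack))
    return float(outside.mean())

rng = np.random.default_rng(1)
pure_random = rng.standard_normal((2000, 100))
# Add a shared component so the variables are no longer independent.
common_factor = rng.standard_normal((2000, 1))
with_structure = pure_random + 0.5 * common_factor

frac_random = fraction_outside_support(pure_random)
frac_structure = fraction_outside_support(with_structure)
```

For the purely random data the fraction should be zero or very close to it, while the data with a shared component should show eigenvalues outside the template's range.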

Results: Figure 1 shows an instance where the actual probability density function (PDF) of a covariance matrix matches well with the template PDF predicted by the Marcenko–Pastur Theorem. The code for generating this is provided below.

Compute using Python:
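The following is a minimal end-to-end sketch of the comparison, written for illustration rather than as a reproduction of the original listing. It assumes simulated standard normal data, unit variance, a Gaussian kernel with an arbitrary bandwidth of 0.02, and more observations than variables. All function names are hypothetical.

```python
import numpy as np

def marchenko_pastur_pdf(sigma2, q, n_points=1000):
    """Template PDF; q = observations / variables (assumed > 1)."""
    lam_min = sigma2 * (1.0 - np.sqrt(1.0 / q)) ** 2
    lam_max = sigma2 * (1.0 + np.sqrt(1.0 / q)) ** 2
    lam = np.linspace(lam_min, lam_max, n_points)
    inside = np.clip((lam_max - lam) * (lam - lam_min), 0.0, None)
    return lam, q / (2.0 * np.pi * sigma2 * lam) * np.sqrt(inside)

def kde_on_grid(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (
        samples.size * bandwidth * np.sqrt(2.0 * np.pi))

def compare_to_template(returns, bandwidth=0.02):
    """Return the eigenvalue grid, the template PDF and the empirical PDF."""
    n_obs, n_vars = returns.shape
    eigvals = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    # A correlation matrix has unit variances, hence sigma2 = 1.
    lam, template = marchenko_pastur_pdf(1.0, n_obs / n_vars)
    empirical = kde_on_grid(eigvals, lam, bandwidth)
    return lam, template, empirical

def plot_comparison(lam, template, empirical):
    """Overlay the two curves; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    plt.plot(lam, template, label="Marcenko-Pastur template")
    plt.plot(lam, empirical, linestyle="--", label="Empirical KDE")
    plt.xlabel("eigenvalue")
    plt.ylabel("density")
    plt.legend()
    plt.show()

rng = np.random.default_rng(42)
returns = rng.standard_normal((5000, 500))   # simulated i.i.d. observations
lam, template, empirical = compare_to_template(returns)
mean_abs_gap = float(np.mean(np.abs(template - empirical)))
```

Calling `plot_comparison(lam, template, empirical)` draws the two curves on one chart. `mean_abs_gap` summarises how far apart they are, and for data that really are independent it should be small compared with the height of the curves.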
