An Information-Theoretic Approach to Estimating a Random Variable

This post describes an information-theoretic approach to estimating a random variable. It again highlights how concepts from Electrical & Electronics Engineering are applicable to Capital Markets.

1. Caveats of correlation: Correlation is not a useful metric of co-dependency between two variables if:

  1. There is a non-linear relationship between the variables (see the sketch after this list)
  2. There are significant outliers that skew the correlation estimate
  3. The two variables do not jointly follow a bivariate normal distribution
If correlation is nevertheless judged to be useful, it is worth using it in a modified form, as a distance measure between the two variables rather than as the raw coefficient.
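
As a quick illustration (an added sketch, assuming only NumPy, not part of the original post), a deterministic but non-linear relationship such as y = x**2 produces a correlation close to zero; the last line shows one common modified form, the correlation-based distance d = sqrt((1 - rho) / 2), which maps rho in [-1, 1] onto a distance in [0, 1]:

# correlation_caveat.py -- illustrative sketch, not part of the original post
import numpy as np

x = np.random.normal(0, 1, 10_000)
y = x**2                             # fully determined by x, yet...

rho = np.corrcoef(x, y)[0, 1]
print(abs(rho))                      # ...the correlation is close to 0
print(np.sqrt((1 - rho) / 2))        # correlation-based distance, one possible "modified form"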

2. Shannon's Information-Theoretic approach: We need a metric that captures the degree of randomness in a random variable. E.g. the amount of sunlight at a given location over a 24 hour period is a random variable, but one with a low degree of randomness. Entropy quantifies this. Let X be a discrete random variable that can take values x within a set Sx, with probability Px for each value of x. The entropy of this random variable is measured as H[X] = -Σ (over x in Sx) Px log(Px).
Entropy is 0 when all the probability is concentrated in a single value of x within the set Sx (note log(1) = 0), and entropy is maximum when Px is equal for each value of x (i.e. a uniform distribution), in which case H[X] = log(|Sx|).
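
A minimal sketch (added here; scipy.stats.entropy normalises the counts and works in nats) confirming these two limiting cases:

# entropy_limits.py -- illustrative sketch, not part of the original post
import numpy as np
import scipy.stats as ss

n = 8
p_point = np.zeros(n)
p_point[0] = 1.0                           # all probability on a single value of x
p_uniform = np.full(n, 1.0 / n)            # equal probability for every value of x

print(ss.entropy(p_point))                 # 0.0 -> no randomness
print(ss.entropy(p_uniform), np.log(n))    # both ~2.079 nats -> maximum randomness = log(n)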

3. Physical meaning:

1. Marginal entropy represents the degree of randomness in a random variable. It can be estimated from the histogram of the random variable, say X.
2. If we have another random variable, say Y, we can compute the Mutual Information Score from the joint histogram or joint probability distribution of the two variables.
3. The joint entropy of two random variables is the sum of the marginal entropies of each variable less their Mutual Information Score: H[X,Y] = H[X] + H[Y] - I[X;Y]. If the mutual information score is zero, there is no particular benefit in knowing one random variable when predicting the value of the other.
4. The conditional entropy of a random variable X, H[X|Y] = H[X,Y] - H[Y], is the randomness that remains in X once we are told the specific value of the other random variable Y. At this stage, the randomness of X is as low as observing Y alone can make it.
5. If a random variable follows the standard normal distribution, its entropy is H[X] ≈ 1.42 nats. This is a useful benchmark against which to assess other values of entropy (see the one-line check after this list).
6. Variation of Information: in the context of unsupervised learning, variation of information is useful for comparing outcomes from a partitional (non-hierarchical) clustering algorithm.
7. The mutual information quantifies the amount of information shared by two random variables. The normalized mutual information takes values in the range [0, 1], like the absolute value of the correlation coefficient, so we can consider it the information-theoretic analogue of the correlation coefficient.
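
The 1.42 figure in point 5 is the differential entropy of a standard normal, 0.5 * ln(2πe); a one-line check (added here for reference):

# Differential entropy of N(0, 1) in nats
import numpy as np
print(0.5 * np.log(2 * np.pi * np.e))   # ~1.4189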


4. Compute using Python:
# Shannon.py
# Import Libraries
import numpy as np
import scipy.stats as ss
from sklearn.metrics import mutual_info_score

# Generate data
x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)

# Estimate the joint distribution via the 2-D histogram (raw bin counts)
cXY = np.histogram2d(x, y)[0]

# Compute Marginal Entropy by passing the probability distribution of X, represented as the Histogram of X
# Marginal Entropy represents the degree of randomness in the random variable
hX = ss.entropy(np.histogram(x)[0])
hY = ss.entropy(np.histogram(y)[0])

# Compute the Mutual information score by passing the joint probability distribution
# Mutual information represents the decrease in uncertainty or the increase in the certainty
# of the value of a random variable by knowing the value of another random variable
iXY = mutual_info_score(None, None, contingency=cXY)
iXYn = iXY / min(hX, hY)
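# iXYn rescales the mutual information to [0, 1] using the smaller of the two
# marginal entropies (one common normalisation choice)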

# Joint Entropy
# The degree of randomness of the joint distribution will be the sum of the randomness
# of each variable less the mutual information gained by virtue of knowing the
# joint distribution of the random variables: H[X,Y] = H[X] + H[Y] - I[X;Y]
hXY = hX + hY - iXY

# Conditional Entropy
# The degree of randomness that remains in one random variable when we know the joint
# distribution and the specific value of the other random variable, e.g. H[X|Y] = H[X,Y] - H[Y]
hX_Y = hXY - hY
hY_X = hXY - hX

# Variation of Information
VI = hX + hY - 2*iXY   ## Variation of Information; also equal to hXY - iXY
VIn = VI/hXY           ## Normalised Variation of Information: dividing by the joint entropy hXY
                       ## makes it comparable across different pairs of variables

print(hX, hY, iXY, iXYn, hXY, hX_Y, hY_X, VI, VIn)
#print(cXY)
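
As a follow-up (an added sketch, not part of the original snippet), running the same pipeline on the quadratic relationship from section 1, y = x**2, shows the normalized mutual information picking up a dependence that the correlation coefficient all but misses:

# mi_nonlinear_demo.py -- illustrative sketch; same libraries as Shannon.py
import numpy as np
import scipy.stats as ss
from sklearn.metrics import mutual_info_score

x = np.random.normal(0, 1, 10_000)
y = x**2                                     # deterministic, non-linear dependence

cXY = np.histogram2d(x, y)[0]                # joint histogram (contingency table)
hX = ss.entropy(np.histogram(x)[0])          # marginal entropies
hY = ss.entropy(np.histogram(y)[0])
iXY = mutual_info_score(None, None, contingency=cXY)

print(abs(np.corrcoef(x, y)[0, 1]))          # close to 0: correlation misses the link
print(iXY / min(hX, hY))                     # well above 0: mutual information does not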




