Posts

Showing posts from September, 2020

How to break out of a for loop in Python

 https://www.digitalocean.com/community/tutorials/how-to-use-break-continue-and-pass-statements-when-working-with-loops-in-python-3
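As a quick illustration of the `break` statement covered in the linked tutorial, here is a minimal sketch. The function name `first_negative` and the sample list are hypothetical, chosen only for this example.

```python
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    found = None
    for v in values:
        if v < 0:
            found = v
            break  # leave the loop immediately; later items are never visited
    return found


print(first_negative([3, 1, -4, 1, -5]))  # -4
```

Without the `break`, the loop would keep going and `found` would end up holding the last negative value instead of the first.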

Precision vs Recall

I wanted to link to an excellent tutorial on Precision vs Recall and their harmonic mean, the F1 score: https://machinelearningmastery.com/precision-recall-and-f-measure-for-imbalanced-classification/
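For reference, the three metrics can be computed from scratch in a few lines. This is a sketch using the standard definitions; the helper name `precision_recall_f1` and the toy labels are assumptions made for illustration, not taken from the tutorial.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Imbalanced toy data: 3 positives out of 8 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

Here there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3.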

The secret to labeling financial data

This post describes the secret to labeling financial data. To avoid the curse of dimensionality, it is advisable to reframe a regression problem as a classification problem where possible.  Fixed horizon method: The fixed horizon method is a popular way to do this. It has two components: a. generating the fixed horizon, and b. labeling the data. The fixed horizon is generated using bars: time bars, tick bars, volume bars, or dollar bars. Bars other than time bars are preferred because their return series is closer to a normal distribution. The data is labeled by setting a return threshold, t. If the return, r, over a fixed number of bars is less than -t, label the bar -1; if it is between -t and t, label it 0; and if it is greater than t, label it 1. A very appealing variation of this method replaces the raw return r with the standardised return z, which is the return adjusted for the volatility predicted over the interval of bars we are calculati
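The raw-return labeling rule described above can be sketched as follows. This is a minimal illustration, assuming simple returns over `horizon` bars; the function name `fixed_horizon_labels`, its parameters, and the sample prices are hypothetical. The standardised-return variation is not shown.

```python
def fixed_horizon_labels(prices, horizon, threshold):
    """Label each bar -1, 0 or 1 from its return over the next `horizon` bars.

    The last `horizon` bars have no forward return and are left unlabeled.
    """
    labels = []
    for i in range(len(prices) - horizon):
        r = prices[i + horizon] / prices[i] - 1.0  # simple forward return
        if r < -threshold:
            labels.append(-1)
        elif r > threshold:
            labels.append(1)
        else:
            labels.append(0)
    return labels


prices = [100.0, 103.0, 100.0, 100.5, 97.0]
print(fixed_horizon_labels(prices, horizon=1, threshold=0.01))  # [1, -1, 0, -1]
```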

Normalised Mutual information replaces Correlation

The objective of this post is to introduce normalised Mutual Information as a better metric of co-dependency between two variables. Physical meaning: Correlation is a good measure of co-dependency as it is bounded between -1 and 1 and is simple to understand. However, it has a drawback: it only measures the linear relationship between two variables. If the relationship between two variables is non-linear, as is frequently the case in financial data, correlation will be low even though one variable may be perfectly predictable from the other using a non-linear function.  Normalised Mutual Information is a standardised measure that overcomes this drawback. It is based on Information Theory, whereas correlation comes from Linear Algebra, and it can quantify the extent to which a relationship exists between two variables, whether linear or non-linear. We can then use ML techniques to model the relationship. Experimental results: Using Python code below 1. Zero Linear Relationshi
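To make the contrast with correlation concrete, here is a rough sketch that is not the post's own code. It assumes equal-width histogram binning and normalises mutual information by the smaller of the two marginal entropies, which is one of several common conventions. All function names, the bin count, and the synthetic data are assumptions for illustration.

```python
import math
import random
from collections import Counter


def _bin_index(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def _entropy(counts, n):
    """Shannon entropy (in nats) of a discrete distribution given as counts."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalised_mutual_information(x, y, bins=10):
    """I(X;Y) / min(H(X), H(Y)), estimated from binned data."""
    bx, by = _bin_index(x, bins), _bin_index(y, bins)
    n = len(x)
    hx = _entropy(Counter(bx), n)
    hy = _entropy(Counter(by), n)
    hxy = _entropy(Counter(zip(bx, by)), n)
    mi = hx + hy - hxy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(5000)]
y_nonlinear = [v * v for v in x]                       # fully determined by x, but not linear
y_noise = [random.uniform(-1, 1) for _ in range(5000)]  # unrelated to x

print("corr, nonlinear:", pearson(x, y_nonlinear))
print("NMI,  nonlinear:", normalised_mutual_information(x, y_nonlinear))
print("NMI,  noise:    ", normalised_mutual_information(x, y_noise))
```

In this setup the correlation between `x` and `x**2` should be close to zero, while the normalised mutual information is clearly higher than for the independent noise series. The exact values depend on the bin count and sample size.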