Posts

PyCharm Productivity Hacks

This post will describe some hacks to improve your productivity on Machine Learning projects. It is scoped around Python development in the PyCharm IDE.

Scientific Mode: This feature is only available in the Professional version of PyCharm.

Code blocks: Placing # %% at the start of a code block creates a cell, similar to a cell in Jupyter. You then get a dedicated play button for that cell, allowing you to prototype commands quickly without having to run the entire file. This adds value when your code contains a load function that takes a long time to run (see the sketch below).

View variable: You can view a sample of the data you have just loaded, and see a plot of that data. This feature is useful during the data exploration stage, as you are trying to build hypotheses.

In-line visualisation: You can see the results of cell commands, such as seaborn or matplotlib plots, in line, just by running a cell. Not having to run the entire program is a real productivity gain.
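A minimal sketch of what such a cell-structured file might look like (the file name and column name here are hypothetical):

# %% Load data: slow step, run this cell once, then iterate on the cells below
import pandas as pd
df = pd.read_csv('trades.csv')

# %% Explore: re-run freely without paying the loading cost again
print(df.describe())

# %% Plot in line: rendered by Scientific Mode when the cell is run
import matplotlib.pyplot as plt
df['Price'].plot()
plt.show()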

2.2 Labeling: Triple Barrier Method

This post will describe how to label a financial dataset to support supervised learning. The post is directly based on content from the book "Advances in Financial Machine Learning" by Marcos Lopez de Prado.

Physical meaning:

Algorithm description:

Python code:

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

# Import data
df = pd.read_csv(r'C:\Users\josde\OneDrive\Denny\Deep-learning\Data-sets\Trade-data\ES_Trades.csv')
df = df.iloc[:, 0:5]
df['Dollar'] = df['Price'] * df['Volume']
print(df.columns)

# Generate thresholds
d = pd.DataFrame(pd.pivot_table(df, values='Dollar', aggfunc='sum', index='Date'))
DOLLAR_THRESHOLD = (1 / 50) * np.average(d['Dollar'])

# Generate bars
def bar_gen(df, DOLLAR_THRESHOLD):
    collector, dollarbar_tmp = [], []
    dollar_cusum = 0
    for i, (price, dollar) in enumerate(zip(df['Price'], df['Dollar'])):
        dollar_cusum += dollar               # running total of dollar value traded
        dollarbar_tmp.append(price)          # ticks belonging to the current bar
        if dollar_cusum >= DOLLAR_THRESHOLD: # threshold breached: close the bar
            collector.append(dollarbar_tmp)
            dollarbar_tmp, dollar_cusum = [], 0
    return collector
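The code above covers the bar-generation step. To make the labeling idea itself concrete, here is a minimal sketch of triple-barrier labeling, not the author's code: each observation gets an upper profit-taking barrier, a lower stop-loss barrier, and a vertical time barrier, and the label records which barrier the forward price path touches first. The function and parameter names (triple_barrier_label, pt, sl, horizon) are assumptions for illustration.

import pandas as pd

def triple_barrier_label(close, pt=0.02, sl=0.02, horizon=10):
    # Label: +1 if the profit-take barrier is touched first,
    # -1 if the stop-loss barrier is touched first,
    #  0 if the vertical (time) barrier expires untouched.
    # Assumes a monotonically increasing index (e.g. a time series).
    labels = pd.Series(0, index=close.index)
    for i in range(len(close) - 1):
        path = close.iloc[i + 1 : i + 1 + horizon] / close.iloc[i] - 1  # forward returns
        hit_up = path[path >= pt].index.min()    # first profit-take touch, if any
        hit_dn = path[path <= -sl].index.min()   # first stop-loss touch, if any
        if pd.notna(hit_up) and (pd.isna(hit_dn) or hit_up < hit_dn):
            labels.iloc[i] = 1
        elif pd.notna(hit_dn):
            labels.iloc[i] = -1
    return labels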

2.1 Labeling: Fixed Horizon Method

This post will describe how to assign labels to bars (financial features) to support supervised learning. Upon successful training, the algorithm will be able to predict the probability of a label once a bar (financial feature) has been observed. The post is directly based on content from the book "Advances in Financial Machine Learning" by Marcos Lopez de Prado.

Physical meaning:

Algorithm description:

Python code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import data
df = pd.read_csv(r'C:\Users\josde\OneDrive\Denny\Deep-learning\Data-sets\Trade-data\ES_Trades.csv')
df = df.iloc[:, 0:5]
df['Dollar'] = df['Price'] * df['Volume']

# Generate thresholds
d = pd.DataFrame(pd.pivot_table(df, values='Dollar', aggfunc='sum', index='Date'))
DOLLAR_THRESHOLD = (1 / 50) * np.average(d['Dollar'])

# Generate bars
def bar_gen(df, DOLLAR_THRESHOLD):
    collector, dollarbar_tmp = [], []
    dollar_cusum = 0
    for i, (price, dollar) in enumerate(zip(df['Price'], df['Dollar'])):
        dollar_cusum += dollar               # running total of dollar value traded
        dollarbar_tmp.append(price)          # ticks belonging to the current bar
        if dollar_cusum >= DOLLAR_THRESHOLD: # threshold breached: close the bar
            collector.append(dollarbar_tmp)
            dollarbar_tmp, dollar_cusum = [], 0
    return collector
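To complement the bar-generation code, here is a minimal sketch of the fixed-horizon labeling rule itself, not the author's code: take the return over the next h bars and compare it against a fixed threshold tau, assigning one of {-1, 0, 1}. The names (fixed_horizon_label, h, tau) are assumptions for illustration.

import pandas as pd

def fixed_horizon_label(close, h=10, tau=0.01):
    # Forward return over a fixed horizon of h bars
    ret = close.shift(-h) / close - 1
    # +1 if the return exceeds +tau, -1 if below -tau,
    # 0 otherwise (including the trailing bars without a full horizon)
    labels = pd.Series(0, index=close.index)
    labels[ret > tau] = 1
    labels[ret < -tau] = -1
    return labels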

1.3 Structured Data: Event Driven Bars

This post will describe how to process & store high frequency financial data to support your Machine Learning algorithm, sampling based on events that happen in the data. The post is directly based on content from the book "Advances in Financial Machine Learning" by Marcos Lopez de Prado.

Physical meaning: The inspiration for event driven bars is really Control Theory, where control limits (upper & lower) are placed around the mean outcome of a particular system. The outcome of a system will be variable, but the system is deemed to be in control as long as the outcome remains within these upper & lower control limits. When the outcome moves outside these limits, the system is deemed out of control and demands attention. In event driven bars, we attempt to compress the financial data by sampling only when some valuable financial outcome is achieved, e.g. a target level of return is reached, or a spike in volatility happens. Here, we know that financial data is inherently noisy, so we sample a new bar only when the cumulative movement breaches such control limits, as in the sketch below.
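A minimal sketch of one common event-driven sampler in this spirit, the symmetric CUSUM filter: accumulate returns in two one-sided sums and flag an event whenever either sum breaches the control limit h. The names (cusum_events, h) are assumptions for illustration.

import pandas as pd

def cusum_events(close, h=0.01):
    # Flag timestamps where the cumulative drift since the last event
    # breaches the +/- h control limits (symmetric CUSUM filter).
    events, s_pos, s_neg = [], 0.0, 0.0
    for t, r in close.pct_change().dropna().items():
        s_pos = max(0.0, s_pos + r)   # one-sided upward accumulator
        s_neg = min(0.0, s_neg + r)   # one-sided downward accumulator
        if s_pos >= h:                # upper limit breached: sample and reset
            events.append(t)
            s_pos = 0.0
        elif s_neg <= -h:             # lower limit breached: sample and reset
            events.append(t)
            s_neg = 0.0
    return pd.Index(events)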

1.2 Structured Data: Information Driven Bars

This post will describe how to process & store high frequency financial data to support your Machine Learning algorithm, based on how much information is contained in the data. The post is directly based on content from the book "Advances in Financial Machine Learning" by Marcos Lopez de Prado.

Physical meaning: This approach is inspired by Claude Shannon's Information Theory, which is the basis of most compression algorithms. Shannon argued that only a departure from the norm is new information. A famous example is the statement that the sun has risen this morning: yes, the event has happened, but because the probability of this event was 1.0, it does not represent information. In information driven bars, we attempt to use the same insight to compress the financial data. Here, we know that financial data is inherently noisy, and put guardrails around what level of variance we expect for a particular behaviour. It is only when the variance exceeds these guardrails that we sample a new bar, as in the sketch below.
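A minimal sketch of this idea in its simplest form, a tick imbalance sampler with a fixed threshold (the book's version uses an expectation-based dynamic threshold; the names here are assumptions for illustration):

import numpy as np
import pandas as pd

def tick_imbalance_bars(price, threshold=100):
    # Tick rule: b_t = sign of the price change, carrying the previous
    # sign forward when the price is unchanged. A bar closes once the
    # cumulative signed tick imbalance |theta| reaches the threshold.
    p = price.to_numpy()
    bars, theta, b_prev, start = [], 0, 1, 0
    for i in range(1, len(p)):
        diff = p[i] - p[i - 1]
        b = b_prev if diff == 0 else int(np.sign(diff))
        b_prev = b
        theta += b                       # cumulative signed tick imbalance
        if abs(theta) >= threshold:      # information threshold breached
            bars.append({'open': p[start], 'close': p[i], 'ticks': i - start + 1})
            theta, start = 0, i + 1
    return pd.DataFrame(bars)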

1.1 Structured Data: Standard Bars

Image
This post will describe how to process & store high frequency financial data to support your Machine Learning algorithm, based on statistical properties of the data. The post is directly based on content from the book "Advances in Financial Machine Learning" by Marcos Lopez de Prado.

Physical meaning: High frequency financial data is voluminous and storage hungry. E.g. 20 days' worth of high frequency (tick by tick, or trade by trade) financial data makes up 5 million records (Excel has an internal limit of 1 million records) and a file size of around 300 MB. Unless the data is summarised, we will quickly run into data processing limitations as we process months or years of data. The data that appears on financial charts on popular websites is such summarised data. Visually, we do not lose anything significant by working with such summarised data.

Algorithm description: The basic idea is to slice the raw high frequency data into segments, based on a consistent rule, as in the sketch below.
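A minimal sketch of the simplest such rule, the time bar, which slices ticks onto a fixed clock and summarises each slice as an OHLCV record (the 5-minute interval and the Price/Volume column names are assumptions, matching the trade data used in the later posts):

import pandas as pd

def time_bars(ticks, freq='5min'):
    # `ticks` is a DataFrame with a DatetimeIndex and Price/Volume columns.
    bars = ticks['Price'].resample(freq).ohlc()           # open/high/low/close per slice
    bars['volume'] = ticks['Volume'].resample(freq).sum() # total volume per slice
    return bars.dropna(subset=['open'])                   # drop intervals with no ticks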

The Process

This post will describe the overall process to craft an investment strategy. It is a 10 step process, and each step will be described separately. I will update this page with an overview after I have completed all steps. The posts describe the work of Marcos Lopez de Prado.

Structured Data
- Standard Bars: Time bars, Tick bars, Volume bars, Dollar/Dynamic Dollar bars
- Information Driven Bars: Tick imbalance bars, Volume/Dollar Imbalance bars, Tick run bars
- Event driven bars

Structural breaks
- Cusum Tests
- Explosiveness Tests: Right Tail Unit root tests, Sub/super Martingale Tests

Entropy features

Microstructural features
- Tick rule
- Roll model
- High/Low volatility estimator
- VPIN
- Distribution of Order Sizes
- Cancellation Rates, Limit Orders, Market Orders

Labeling
- Fixed Horizon Method
- Triple Barrier Method
- Trend Scanning Method
- Meta Labeling

Weighting
- Fixed-width Window fracdiff

Ensembles
- Bagging
- Random Forest
- Ada Boost

Cross Validation
- K-Fold Cross Validation

Feature Importance
- Mean Decrease Impurity
- Mean Decrease Accuracy