Two-Sigma-Financial-Modelling-Challenge

Uncover predictive value in an uncertain world

TWO SIGMA FINANCIAL MODELING

Can you uncover predictive value in an uncertain world?

This project was run as a Kaggle competition (https://www.kaggle.com/c/two-sigma-financial-modeling).

How can we use the world’s tools and intelligence to forecast economic outcomes that can never be entirely predictable? This question is at the core of countless economic activities around the world – including at Two Sigma Investments, which has been applying technology and systematic strategies to financial trading since 2001.

For over 15 years, Two Sigma has been at the forefront of applying technology and data science to financial forecasts. Their pioneering advances in big data, AI, and machine learning have been pushing the financial industry forward, and, as in any scientific field, they are driven to keep improving. Through this exclusive partnership, Two Sigma is excited to explore what untapped value Kaggle's diverse data science community can discover in the financial markets.

Economic opportunity depends on the ability to deliver singularly accurate forecasts in a world of uncertainty. By accurately predicting financial movements, Kagglers will learn about scientifically-driven approaches to unlocking significant predictive capability. Two Sigma is excited to find predictive value and gain a better understanding of the skills offered by the global data science crowd.

Dataset

The dataset can be downloaded from this link: data

This dataset contains anonymized features pertaining to a time-varying value for a financial instrument. Each instrument has an id. Time is represented by the 'timestamp' feature and the variable to predict is 'y'. No further information will be provided on the meaning of the features, the transformations that were applied to them, the timescale, or the type of instruments that are included in the data. Moreover, in accordance with competition rules, participants must develop and test their models and submissions using only the data linked from the competition website.
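To make the layout above concrete, here is a minimal sketch that builds a tiny synthetic frame with the described columns and holds out the latest timestamps for validation. The values are random, not competition data. The 'id', 'timestamp', and 'y' columns come from the description; the column name `feature_0`, the helper names, and the choice of a chronological split are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def make_toy_frame(n_ids=3, n_steps=5, seed=0):
    """Build a small synthetic frame: one row per (timestamp, instrument id)."""
    rng = np.random.default_rng(seed)
    rows = [(i, t) for t in range(n_steps) for i in range(n_ids)]
    df = pd.DataFrame(rows, columns=["id", "timestamp"])
    # Hypothetical anonymized feature; the real feature names are not given here.
    df["feature_0"] = rng.normal(size=len(df))
    # Target column to predict, filled with random noise for this sketch.
    df["y"] = rng.normal(scale=0.01, size=len(df))
    return df


def split_by_timestamp(df, cutoff):
    """Rows with timestamp < cutoff form the training part, the rest validation."""
    train = df[df["timestamp"] < cutoff]
    valid = df[df["timestamp"] >= cutoff]
    return train, valid


toy = make_toy_frame()
train, valid = split_by_timestamp(toy, cutoff=3)
print(len(train), len(valid))
```

Splitting on 'timestamp' rather than at random keeps later observations out of the training part, which is one common way to validate models on time-ordered data.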

Data is saved and accessed as a .h5 file in the Kernels environment. The .h5 file format was used instead of the standard .csv format to achieve faster read speeds. The training set file is available for download and offline modeling outside of Kernels. The test set is not available for download because this was a code competition.
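A minimal sketch of reading the downloaded training file offline with pandas is shown below. The file name `train.h5` and the HDF5 key `train` are assumptions, so adjust them to match the file you downloaded. `pandas.read_hdf` needs the optional PyTables package installed. The `summarize` helper is hypothetical and only reports basic facts about the 'id' and 'timestamp' columns.

```python
import os

import pandas as pd


def load_train(path="train.h5", key="train"):
    """Read the training table from an HDF5 file (path and key are assumed)."""
    return pd.read_hdf(path, key=key)


def summarize(df):
    """Return row count, number of instruments, and the timestamp range."""
    return {
        "rows": len(df),
        "instruments": df["id"].nunique(),
        "first_timestamp": df["timestamp"].min(),
        "last_timestamp": df["timestamp"].max(),
    }


# Only try to load when the assumed file is actually present.
if os.path.exists("train.h5"):
    print(summarize(load_train()))
```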