
Trading News NLP

Primary language: Jupyter Notebook

Natural Language Processing for Signal Generation on News Data

Motivation

Stock prices can react sharply to both the content and the presentation of a news story about an industry or company, which makes news a valuable input when deciding whether to invest in a stock. However, far too much news is published for a human to read it all and trade on it systematically, and in finance every second of the decision-making process counts. In this project, we use deep learning to train models that assign sentiment scores to headlines, articles, tweets, and posts. These scores act as numerical signals that can support buy/sell/hold decisions and feed into valuation models.

Frameworks

  • Data Manipulation
    • numpy: manipulation of large, multi-dimensional arrays and matrices
    • pandas: manipulation and analysis of numerical tables and time series
  • Machine Learning
    • sklearn: lightweight machine learning tools for classification, regression, and clustering
    • tensorflow: a widely used deep neural network framework
    • keras: high-level API for building and training deep learning models on top of tensorflow
  • Natural Language Processing
    • nltk: Natural Language Toolkit
    • Word embeddings: pretrained word-vector files that are read in to build the embedding matrix
  • Utility
    • tqdm: progress bars for iterables (e.g. for loops)
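The README does not say which embedding files or format the project uses, so the following is only a minimal sketch of how an embedding matrix is typically built from pretrained word vectors. It assumes a GloVe-style text format (a word followed by its space-separated vector components) and a hypothetical `word_index` dict mapping words to integer ids starting at 1, like the one Keras' `Tokenizer` produces. Row 0 is left as zeros for padding, and words missing from the file also keep a zero row.

```python
import numpy as np


def load_embeddings(lines):
    """Parse GloVe-style text lines: a word followed by its vector components."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        embeddings[word] = np.asarray(values, dtype=np.float32)
    return embeddings


def build_embedding_matrix(word_index, embeddings, dim):
    """Row i holds the vector of the word whose id is i.

    Row 0 is reserved for padding; words absent from the embedding
    file keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix


# Toy in-memory "file" of 3-dimensional vectors (hypothetical values).
glove_lines = [
    "stock 0.1 0.2 0.3",
    "falls -0.5 0.0 0.4",
]
# Hypothetical tokenizer output; "rally" has no pretrained vector.
word_index = {"stock": 1, "falls": 2, "rally": 3}

emb = load_embeddings(glove_lines)
matrix = build_embedding_matrix(word_index, emb, dim=3)
print(matrix.shape)  # (4, 3)
```

A matrix built this way is commonly passed as the initial weights of an embedding layer, so the network starts from pretrained word vectors instead of random ones.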

Data Source

This project uses an open-source dataset from the Figure Eight platform. Link to the dataset: https://www.figure-eight.com/data-for-everyone/

Usage and Examples

Trading_News_NLP_intro.ipynb: a brief introduction to the project and some useful concepts to know before reading the code.
Trading_News_NLP_Model.ipynb: walks through the code for our sentiment analysis model, with a detailed explanation of each part and diagrams showing the structure of the neural network.
Trading_News_NLP_Strategy.ipynb: an example of how to use the sentiment scores generated by our model.
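The strategy notebook's exact rules are not described here, so the following is a hypothetical sketch of one simple way sentiment scores can become a buy/sell/hold signal. It assumes the model outputs probabilities for (negative, neutral, positive), collapses each headline to `P(positive) - P(negative)`, averages one day's headlines, and compares the mean to an arbitrary threshold of 0.2. The function names and the threshold are illustrative, not taken from the project.

```python
from statistics import mean


def sentiment_score(probs):
    """Collapse (negative, neutral, positive) probabilities into a score in [-1, 1]."""
    negative, _neutral, positive = probs
    return positive - negative


def daily_signal(headline_probs, threshold=0.2):
    """Average one day's headline scores and map the mean to buy/sell/hold.

    The 0.2 threshold is a hypothetical choice for illustration.
    """
    if not headline_probs:
        return "hold"  # no news for the day: keep the current position
    avg = mean(sentiment_score(p) for p in headline_probs)
    if avg > threshold:
        return "buy"
    if avg < -threshold:
        return "sell"
    return "hold"


# Hypothetical model outputs for three headlines about one stock on one day.
day = [(0.1, 0.3, 0.6), (0.2, 0.5, 0.3), (0.05, 0.25, 0.7)]
print(daily_signal(day))  # buy
```

A symmetric threshold around zero keeps weak or mixed news from triggering trades; in practice the threshold would be tuned on historical data rather than fixed.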

Results

Our model achieved 57% accuracy on the three-class sentiment classification task on the news dataset.
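An accuracy figure is easier to interpret next to simple baselines: guessing uniformly among three labels gives about 33% on average, and always predicting the most frequent label scores that label's share of the data. The README does not report the dataset's label distribution, so this sketch only shows how both numbers are computed, using made-up toy labels rather than the project's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent label."""
    _label, count = Counter(y_true).most_common(1)[0]
    return count / len(y_true)


# Toy labels for illustration only, not results from this project.
y_true = ["pos", "neg", "neu", "neu", "pos", "neu"]
y_pred = ["pos", "neu", "neu", "pos", "pos", "neu"]
print(accuracy(y_true, y_pred))   # 4 of 6 correct
print(majority_baseline(y_true))  # 0.5 ("neu" is 3 of 6)
```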

Future work

  1. Generate our own dataset for sentiment analysis
  2. Add state-of-the-art NLP techniques (attention, transformers) to our model
  3. Build the model into a system that automatically generates sentiment signals from news scraped from the internet
  4. Try other NLP methods on news data (e.g. topic mining)

License

Not yet decided.