Primary language: Jupyter Notebook. License: MIT.

Project FinanSP

FinanSP is a project that studies how Apache Spark (DataFrames, machine learning, Structured Streaming) can be applied to a finance-related application. The main development tool is PySpark.

Project goals

The primary goals of the project are:

  • Analyze websites that supply financial data to determine what information is available for free through an API.
  • Develop DataFrame-based PySpark applications to compute simple descriptive statistics (mean, median, standard deviation, max, min, etc.) for the stock data of selected companies.
  • Design a study to predict the stock market price of a company using machine learning techniques.
  • Apply clustering algorithms to a group of companies.
  • Develop a streaming application that gathers information from one or more data sources.
  • Use GitHub features: version control, wiki, projects, pull requests.

Project development

The project will be developed in three iterations:

  • Iteration 1 (one week): Website data analysis, data acquisition, simple statistical processing, development of prototypes with simulated data.
  • Iteration 2 (two weeks): Data wrangling, development of prototypes with real data.
  • Iteration 3 (two weeks): Development of the prediction, clustering, and streaming applications with real data.
  • Final reports: Summary of the conducted work.
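
For the simulated-data prototypes planned in Iteration 1, one possible approach is to generate a synthetic price series with a seeded random walk. The sketch below uses only the Python standard library. The function name `simulate_prices`, the row layout `(symbol, date, close)`, the start date, and the volatility value are all assumptions chosen for illustration, and the random walk is a stand-in for real market behavior, not a model the project prescribes.

```python
import random
from datetime import date, timedelta


def simulate_prices(symbol, start_price, days, seed=0, daily_volatility=0.02):
    """Generate (symbol, date, close) rows following a simple random walk."""
    rng = random.Random(seed)   # seeded so repeated runs give the same series
    price = start_price
    day = date(2020, 1, 1)      # arbitrary start date for the sketch
    rows = []
    for _ in range(days):
        # Apply a small random daily return drawn from a normal distribution.
        price *= 1.0 + rng.gauss(0.0, daily_volatility)
        # Keep the simulated price strictly positive.
        price = max(price, 0.01)
        rows.append((symbol, day.isoformat(), round(price, 2)))
        day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    for row in simulate_prices("AAA", 100.0, days=5, seed=42):
        print(row)
```

The resulting list of tuples can be passed to `spark.createDataFrame(rows, ["symbol", "date", "close"])`, which lets a prototype be exercised before any real data source is connected.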