Data-Pipeline-for-Portfolio-Management

Project Overview

This project is a data engineering pipeline that streamlines the collection, transformation, and analysis of financial data for gold as a commodity. It aims to give investment portfolio managers timely, accurate insights for decisions about precious metals investments, specifically gold priced against the US dollar.

What I learned

Handling market data (OHLC)
Web scraping using Python
Sentiment analysis using LLMs
Text summarization

Architecture

Pipelines in detail

The news data pipeline

Scrapes news articles from a website.
Performs sentiment analysis on the articles.
Uploads the resulting datasets to AWS S3.
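
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `<h2>` page layout, the function names, and the bucket and key arguments are all hypothetical, and `keyword_sentiment` is only a stand-in for the LLM-based sentiment step so the example runs without a model.

```python
import json
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2> element (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return the non-empty, stripped headline strings found in the HTML."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines if h.strip()]


def keyword_sentiment(text):
    """Placeholder scorer; the real pipeline would call an LLM here."""
    lowered = text.lower()
    if any(word in lowered for word in ("rally", "surge", "gain")):
        return "positive"
    if any(word in lowered for word in ("drop", "fall", "slump")):
        return "negative"
    return "neutral"


def build_dataset(html, score=keyword_sentiment):
    """Pair each scraped headline with a sentiment label."""
    return [{"headline": h, "sentiment": score(h)} for h in extract_headlines(html)]


def upload_json(records, bucket, key):
    """Upload the records to S3 as JSON (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the steps above run without AWS set up

    body = json.dumps(records).encode("utf-8")
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
```

Passing the scorer as the `score` argument keeps the scraping and upload steps unchanged if the keyword stand-in is swapped for an LLM call.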

The market data pipeline

Fetches OHLC market data via the Twelve Data API.
Adds a new column based on the difference between the open price and close price.
Uploads the resulting datasets to AWS S3.
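
The transformation step above could look something like the sketch below. It assumes each bar arrives as a dict of strings with `datetime`, `open`, `high`, `low`, and `close` keys, which is how the Twelve Data time series response is understood to be shaped. The column name `close_minus_open` and the sign convention (close minus open) are assumptions, and the sample bars are made-up numbers, not real gold prices.

```python
import csv
import io


def add_open_close_diff(values):
    """Return copies of the bars with a 'close_minus_open' column added."""
    rows = []
    for bar in values:
        row = dict(bar)  # copy so the input bars are left untouched
        # Assumed sign convention: positive means the bar closed above its open.
        row["close_minus_open"] = float(bar["close"]) - float(bar["open"])
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, ready to be written or uploaded."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Synthetic bars for illustration only.
sample_values = [
    {"datetime": "2024-01-02", "open": "100.0", "high": "102.0", "low": "99.5", "close": "101.5"},
    {"datetime": "2024-01-03", "open": "101.5", "high": "101.5", "low": "99.0", "close": "99.5"},
]

if __name__ == "__main__":
    print(rows_to_csv(add_open_close_diff(sample_values)))
```

Keeping the price fields as strings and converting only for the new column leaves the original values exactly as the API returned them.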