/NLP-QA

Cryptocurrency Market Analysis and Question Answering System

This repository contains a project that scrapes cryptocurrency price analysis articles from TradingView, normalizes the text with Natural Language Processing (NLP) techniques, and answers questions about cryptocurrency trends with a Question-Answering (QA) system built on Retrieval-Augmented Generation (RAG). The focus is on Bitcoin (BTC) and Ethereum (ETH).

Overview

This project automates the process of:

  1. Scraping crypto market analysis articles from TradingView for BTC and ETH.
  2. Normalizing the scraped text by removing emojis, stop words, and other noise so that it is ready for further processing.
  3. Building a Question-Answering System using Retrieval-Augmented Generation (RAG), where users can ask questions about the future trends and market outlook for Bitcoin and Ethereum, and the system retrieves relevant articles and generates a detailed answer.

Features

  • Real-Time Scraping: Retrieves the latest cryptocurrency analysis articles from TradingView.
  • Text Normalization: Cleans the raw data by removing emojis and stop words and by normalizing the text for consistent results.
  • Question Answering (RAG): Implements a question-answering system that retrieves relevant articles and generates human-like answers.
  • Multiple Models: Uses T5 for text generation, with FAISS providing vector similarity search for document retrieval.

Required Files

  1. data/BTC_normalized.txt and data/ETH_normalized.txt should contain your cleaned and normalized data.
  2. If these files are missing, they are generated by the scraping and normalization steps (see Usage).

Usage

Web Scraping

The scraping component fetches market analysis data from TradingView for both Bitcoin and Ethereum.

Run the scraping script:

python scrapping.py

This will save raw market analysis data to:

  • data/BTC.txt
  • data/ETH.txt
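
As a rough illustration of the parsing half of this step, the sketch below pulls paragraph text out of an HTML string. It is not the code in scrapping.py: the repository lists requests and BeautifulSoup, while this example uses the standard library's html.parser so that it runs without extra dependencies. The sample HTML and the choice of <p> elements are assumptions, since TradingView's actual markup is not described here, and the network fetch is left out.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self._in_p = False     # True while inside a <p> element
        self._chunks = []      # text pieces of the current paragraph
        self.paragraphs = []   # finished, non-empty paragraphs

    def _flush(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # also closes a <p> that was never ended
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Made-up sample page, for illustration only.
sample_html = """
<html><body>
  <h1>BTC analysis</h1>
  <p>Bitcoin is testing <b>resistance</b> near the range high.</p>
  <p>   </p>
  <p>A breakout could follow.</p>
</body></html>
"""

for paragraph in extract_paragraphs(sample_html):
    print(paragraph)
```

In a full pipeline, the extracted paragraphs would then be written to a raw text file such as data/BTC.txt.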

Text Normalization

To normalize the scraped data (remove stop words, emojis, etc.), run the text normalization script:

python textNormalization.py

The cleaned data will be saved in:

  • data/BTC_normalized.txt
  • data/ETH_normalized.txt
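
The normalization described above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the contents of textNormalization.py: the stop-word list is a tiny hand-written stand-in for the much larger lists in nltk or spacy, the emoji ranges are not exhaustive, and the function name normalize is hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption); nltk and spacy ship
# far larger lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "it"}

# Common emoji and pictograph code-point ranges; not exhaustive.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # emoji variation selector
    "]+"
)


def normalize(text):
    """Strip emojis, lowercase, drop punctuation and stop words."""
    text = EMOJI_PATTERN.sub(" ", text)
    text = text.lower()
    # Remove thousands separators so "$70,000" stays a single token.
    text = re.sub(r"(?<=\d),(?=\d)", "", text)
    # Keep letters, digits, whitespace, and the "$", "%", "." characters.
    text = re.sub(r"[^a-z0-9$%.\s]", " ", text)
    # Trim sentence-ending periods but keep decimal points inside numbers.
    tokens = [token.strip(".") for token in text.split()]
    tokens = [token for token in tokens if token and token not in STOP_WORDS]
    return " ".join(tokens)


print(normalize("🚀 Bitcoin is heading to the $70,000 level!"))
print(normalize("ETH is up 2.5% 📈."))
```

Keeping "$" and "%" is a design choice for this domain: price levels and percentage moves carry most of the signal in market commentary, so stripping them along with other punctuation would lose information.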

Question Answering System (RAG)

To use the Question Answering system, which is built on Retrieval-Augmented Generation (RAG), run:

python indexing.py

Then you can query the model with questions like:

  • "What is the market outlook for Bitcoin in 2024?"
  • "What is the expected price of Ethereum after the elections?"

The model will return a detailed, generated answer based on the latest retrieved documents and data.

Example:

Q: What is the market outlook for Ethereum after the elections?
A: Ethereum might correct to the $2800-$3000 range due to market uncertainty following the elections, with potential short-term declines expected.
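
To make the retrieve-then-generate flow concrete, here is a dependency-free sketch of the retrieval step and of prompt assembly. It is not the code in indexing.py. Bag-of-words vectors and a linear scan stand in for Sentence-BERT embeddings and a FAISS index, the "question: ... context: ..." prompt layout is an assumed T5-style convention, the sample documents are made up, and the generation call itself is omitted.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for Sentence-BERT)."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return up to k documents most similar to the question.

    A linear scan stands in for a FAISS index lookup.
    """
    query = embed(question)
    scored = [(cosine(query, embed(doc)), i) for i, doc in enumerate(documents)]
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[i] for score, i in scored[:k] if score > 0]


def build_prompt(question, passages):
    """Assemble the text handed to the generator (assumed T5-style layout)."""
    return f"question: {question} context: {' '.join(passages)}"


# Made-up documents, for illustration only.
docs = [
    "ethereum may correct after elections due to market uncertainty",
    "bitcoin holds support while traders wait for a breakout",
    "ethereum gas fees fall as network activity cools",
]
question = "What is the outlook for Ethereum after the elections?"

passages = retrieve(question, docs, k=2)
print(build_prompt(question, passages))
```

In a real setup, the prompt string would be tokenized and passed to the generative model, and the retrieved passages would come from the normalized BTC and ETH files rather than from an in-memory list.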

Technologies Used

  • Web Scraping: BeautifulSoup, requests
  • Natural Language Processing: nltk, spacy, regex
  • Machine Learning Models: Hugging Face Transformers (T5, BART), Sentence-BERT
  • Vector Search: FAISS (Facebook AI Similarity Search)
  • Text Generation: T5 (via transformers library)
  • Other Tools: pandas, numpy

Future Improvements

  • Fine-Tuning the Model: Fine-tune the generative model on crypto-specific datasets to improve the relevance of generated answers.
  • Domain-Specific Models: Implement financial or crypto-specific models like FinBERT for better performance.
  • Interactive Web Interface: Build a web interface using Streamlit or Flask to allow users to interact with the system in real time.
  • Data Visualization: Add visualizations for the trends and predictions generated by the system.

Contributing

Contributions are welcome! If you'd like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new feature branch (git checkout -b feature/new-feature).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/new-feature).
  5. Open a Pull Request.

Please ensure that your code adheres to the project's style guidelines and is well-tested.

License

This project is licensed under the MIT License. See the LICENSE file for details.