Problem Statement

Fraud in telecommunications is a major problem that costs the industry billions of dollars every year. Fraudsters use various techniques to exploit weaknesses in the telecoms infrastructure, including hacking into phone systems, stealing identities, and exploiting vulnerabilities in billing systems. The challenge for telecoms companies is to detect and prevent fraud in real-time, before it causes significant financial damage. Your task is to develop a real-time data visualization dashboard that monitors Twitter for mentions of telecoms fraud and visualizes the data in an interactive and informative way.

Project Requirements

  1. Connect to Twitter API and collect real-time tweets related to telecoms fraud.
  2. Process the tweets to extract useful information, including the tweet text, user name, location, and date/time.
  3. Analyze the data to identify patterns and trends related to telecoms fraud.
  4. Use Streamlit to create an interactive data visualization dashboard that displays real-time information about telecoms fraud.
  5. The dashboard should include at least one chart or graph that displays the data in a meaningful way, e.g., a bar chart showing the number of fraud mentions by location or a line chart showing the frequency of fraud mentions over time.
  6. The dashboard should be easy to use and visually appealing, with clear and concise labels and instructions.

Project Steps

  1. Set up a Twitter Developer Account and create an app to access the Twitter API.
  2. Install the necessary Python packages, including tweepy, pandas, and Streamlit.
  3. Write a Python script to connect to Twitter API and collect real-time tweets related to telecoms fraud. The script should use the tweepy library to authenticate the API access and collect the tweets using the Twitter Streaming API.
  4. Process the tweets to extract useful information, including the tweet text, user name, location, and date/time. Store the data in a Pandas DataFrame for further analysis.
  5. Analyze the data to identify patterns and trends related to telecoms fraud. Use Pandas and NumPy libraries to perform statistical analysis on the data and visualize the results using Matplotlib or Seaborn libraries.
  6. Use Streamlit to create an interactive data visualization dashboard that displays real-time information about telecoms fraud. The dashboard should include at least one chart or graph that displays the data in a meaningful way, e.g., a bar chart showing the number of fraud mentions by location or a line chart showing the frequency of fraud mentions over time.
  7. Test the dashboard by running it locally and ensure that it updates in real-time as new tweets are collected.
  8. Deploy the dashboard to a cloud-based platform such as Heroku or AWS to make it publicly accessible.

Deliverables

  1. Python script to collect and process real-time tweets from Twitter API.
  2. Interactive data visualization dashboard created using Streamlit.
  3. Deployment of the dashboard to a cloud-based platform.