End-to-end solution for analysing social media content
This is a simple and scalable end-to-end solution that analyses tweets and users of selected topics. Here, the topics are crypto/blockchain related, such as crypto KOLs (key opinion leaders), applications and coins. It deploys:
- A scalable crawler that crawls:
  - Tweets on topics mentioning influencers such as Elon Musk and Vitalik, applications such as DeFi and Metaverse, and coins such as $BTC, $ETH and $ADA
  - Users mentioning the tweets above, snapshotted daily
- Stream ingestion of the crawled data into the data warehouse:
  - Tweets data
  - Users data
- Data cleaning and transformation into dimensional models of:
  - Coin interests, grouped by crypto KOLs
  - Coin interests, grouped by crypto applications
- A BI dashboard showing coin interests across different segments of social media users
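The project lists DBT as a dependency, which suggests the real models are written as DBT SQL models in the warehouse. To illustrate what "coin interests grouped by crypto KOLs" could mean, here is a minimal Python sketch, not the project's code. It assumes each tweet is a plain string, that KOLs are matched by case-insensitive keywords and that coins are matched by cashtags; the KOLS and COINS lists and the coin_interest_by_kol function are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists; the real project defines its own topics.
KOLS = {"elon musk": "Elon Musk", "vitalik": "Vitalik"}
COINS = {"BTC", "ETH", "ADA"}

# Cashtags such as $BTC or $eth (letters only, 2 to 10 characters).
CASHTAG = re.compile(r"\$([A-Za-z]{2,10})\b")


def coin_interest_by_kol(tweets):
    """Count tracked coin cashtags per KOL mentioned in the same tweet."""
    result = defaultdict(Counter)
    for text in tweets:
        # Keep only the coins this sketch tracks, counted once per tweet.
        coins = {tag.upper() for tag in CASHTAG.findall(text)} & COINS
        if not coins:
            continue
        lowered = text.lower()
        for keyword, kol in KOLS.items():
            if keyword in lowered:
                result[kol].update(coins)
    # Convert to plain dicts for easy inspection or serialisation.
    return {kol: dict(counts) for kol, counts in result.items()}
```

Grouping by crypto applications would follow the same pattern, with a keyword list for applications such as DeFi and Metaverse in place of KOLS.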
Currently, the setup only runs on a single machine. Everything runs in containers on docker-compose, so you will need Docker installed.
You will need a Docker engine and the docker-compose CLI. I am on macOS and use colima together with docker-compose.
- docker-compose CLI
$ brew install docker-compose
You will need a Google Cloud Platform account to set up BigQuery and the credentials it requires.
You will need Python and a dependency manager such as pip. I use Poetry:
$ curl -sSL https://install.python-poetry.org | python3 -
Clone the repo
$ git clone https://github.com/koksang/social-media-analysis.git
To install dependencies locally, especially DBT, use either of the following:
- Poetry
$ poetry install
- Or, if you use pip
$ poetry export -f requirements.txt --output requirements.txt --without-hashes
$ pip install -r requirements.txt
Start the services defined in docker-compose.yaml as below. They will start in detached mode:
$ docker-compose -f infra/docker-compose.yaml up -d
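Because up -d returns as soon as the containers are created, the services inside them may still be starting. docker-compose -f infra/docker-compose.yaml ps shows container status; if a script needs to block until a service actually accepts connections, a small TCP polling helper like the sketch below can do it. The wait_for_port function is hypothetical and not part of the repo, and the host and port you pass must come from infra/docker-compose.yaml, which this README does not list.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once something accepts the connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8080) would wait up to 60 seconds for something to listen on port 8080, a placeholder port used only for illustration.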
Set environment variables (Google Cloud specific):
$ export GOOGLE_APPLICATION_CREDENTIALS=<path to your credentials JSON file>
$ export GOOGLE_PROJECT_ID=<your Google project ID>
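Before running anything that talks to BigQuery, it can help to confirm that both variables are set and that the credentials path points at a real file. Below is a minimal sketch using only the Python standard library; the check_gcp_env helper is hypothetical, and it only inspects the two variables named above without validating the key itself.

```python
import os
from pathlib import Path

# The two variables this README asks you to export.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_PROJECT_ID")


def check_gcp_env(environ=os.environ):
    """Return a list of problems with the Google Cloud environment variables."""
    problems = []
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append(f"{name} is not set")
    creds = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    # The credentials variable should point at an existing key file.
    if creds and not Path(creds).is_file():
        problems.append(f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {creds}")
    return problems
```

Calling check_gcp_env() with no arguments checks the current process environment and returns an empty list when everything looks right.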
Todo
- Deploy BI Dashboard for insights visualization
- Finalize docker-compose deployment
- Convert docker-compose deployment into K8s
- Use other BI dashboards (such as apache/superset)
- Crawl more data:
  - Nested tweet replies
  - Top trends
  - User profile snapshots
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.