This project works as a data extractor for social networks. Once you're logged in with the social network you can use either an API or a UI to query topics.
This work has been done in the context of a postgraduate course about data intelligence on big data at the University of La Plata. The idea is to build software that is able to store lot of data (GBs or TBs) with an acceptable performance.
On this first version, the stack is as follows:
Spring Data Portal
: It's the entry point used to log in with Twitter. The session is stored inredis
. It orchestratesSocial Data Extractor
andSocial Data Persistor
.Social Data Extractor
: It's the connection to social networks. It's aspring-boot
microservice that usesspring-social
to interact withTwitter
. It queries tweets and returns them in aTopicResult
format without storing anything - it's all in memory.Social Data Persistor
: It's the persistence layer. It storesTopicResults
in a NoSQL database (Cassandra
in this case). It also stores the data onElasticSearch
, so we can search for results faster.
The portal and the API by default gets entries from ElasticSearch
. If needed, it's possible to pass a parameter to the API so it fetches new results about that topic.
- Run
mvn clean install -DskipTests
-> tests pass only if you have cassandra, redis and elasticsearch running. Spring Data Portal
: Replace changeme app secret for twitter with your app secret.Spring Data Extractor
: Replace changeme app secret for twitter with your app secret.- Launch a
Cassandra
instance Spring Data Persistor
: Point to yourCassandra
instance- Launch a
Redis
instance Run all the services
- Start elasticsearch:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.2.2
- Start cassandra:
docker run --name cassandra-container -p 9042:9042 -p 9160:9160 -e CASSANDRA_START_RPC=true -d cassandra:2.2.12
- Start redis:
docker run -d redis -p 6379:6379
- Replace changeme app secret for twitter with your app secret.
mvn clean install
java -jar social-data-persistor-service/target/social-data-persistor-service.jar
java -jar social-data-portal/target/social-data-portal.jar
java -jar social-data-extractor-service/target/social-data-extractor-service.jar
- In chrome ->
http://localhost:8080/words?searchTerm=lollapalooza
- Enjoy!
https://stackoverflow.com/questions/26909408/export-cassandra-query-result-to-a-csv-file
- The session stored in redis works also with browsers without a session. This means that once someone logged in with a social network, all the users are going to work with that session.
Add streaming support Integration Tests to run cassandra, redis and elasticsearch embedded