SoMe - Social Media Management Platform

You can find the project at SoMe

Contributors

Labs 24

Amin Azad Jacob Padgett Lawrence Kimsey

Labs 23

Andrew Lowe Sarah Xu Jud Taylor

Project Overview

Trello Board

Product Canvas

So-Me is a social media management tool for small businesses and tech professionals. Users of So-Me will be able to post to any of their company's major social media platforms (LinkedIn, Instagram, Facebook, Twitter) from the app, supported by a simple drag-and-drop design. The app will provide users with recommended optimal posting times, keywords they can use to increase engagement, and feedback on drafted posts based on their followers' engagement data.

Deployed Front End

Tech Stack

Languages: Python

Frameworks: FastAPI

Services: AWS, Docker, Jupyter Notebooks, Postman, TablePlus

Below is an annotated breakout of the Cloud Architecture for SoMe.

Models:

Topic Modeling

Topic modeling is a technique for extracting hidden topics from large volumes of text. Our team used a Latent Dirichlet Allocation (LDA) model from the Gensim Python package to generate the most important words drawing engagement from a user's followers. One of our main challenges was extracting good-quality topics that are clear, well separated, and meaningful. This depends heavily on the quality of the text preprocessing and on the strategy for finding the optimal number of topics. To improve the quality of the text we received from the Twitter API, we did extensive data wrangling: cleaning emojis and HTML markup from tweets, combining the spaCy, Gensim, and WordCloud stop word lists into one list, adding our own custom stop words, and lemmatizing all of the text. After generating topics, we used the pyLDAvis package to visualize them, computed coherence scores, and then worked toward the optimal number of topics.
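The cleaning and stop word steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual Data Wrangling class: the stop word sets are tiny hypothetical stand-ins for the lists that spaCy, Gensim, and WordCloud provide, the name clean_tweet is made up for this example, and lemmatization is left out because it needs a language model.

```python
import html
import re
from typing import List

# Hypothetical stand-ins for the spaCy, Gensim and WordCloud stop word lists;
# the real pipeline would import those lists from the packages themselves.
SPACY_STOPS = {"the", "a", "and"}
GENSIM_STOPS = {"the", "of", "to"}
WORDCLOUD_STOPS = {"is", "a", "rt"}
CUSTOM_STOPS = {"amp", "via"}

# One combined set, so each token is checked against every list at once.
STOP_WORDS = SPACY_STOPS | GENSIM_STOPS | WORDCLOUD_STOPS | CUSTOM_STOPS

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Keep only ASCII letters, digits, whitespace, '#' and '@'. This drops emoji
# and punctuation in one pass (and also non-English letters, a simplification).
NON_TEXT_RE = re.compile(r"[^a-z0-9\s#@]")


def clean_tweet(text: str) -> List[str]:
    """Return lowercase tokens with markup, URLs, emoji and stop words removed."""
    text = html.unescape(text)      # decode entities such as "&amp;"
    text = TAG_RE.sub(" ", text)    # strip HTML tags
    text = URL_RE.sub(" ", text)    # strip links
    text = NON_TEXT_RE.sub(" ", text.lower())
    return [token for token in text.split() if token not in STOP_WORDS]
```

Token lists produced this way are the kind of input a Gensim dictionary and LDA model expect, one list of tokens per document.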

Explanatory Variables:

  • The time followers engage with posts
  • Follower engagement data
  • Tweets followers engaged with the most

Data Sources

Python Notebooks

Data Wrangling class

Topic Modeling

Sentiment Analysis

How to connect to the DS API

Route                           Description
GET  /                          Verifies the API is deployed and links to the docs.
POST /recommend                 Given a Twitter handle, returns the optimal post time.
POST /topic_model/schedule      Given a Twitter handle, returns the topic modeling processing time.
POST /topic_model/status        Returns the status of the topic modeling process.
POST /topic_model/get_topics    Returns a dictionary of all topics, each with a list of keywords.
POST /engagement                Returns a dictionary of engagement values calculated from a user's tweets over 30 days.

Go to https://api.so-me.net/docs for more information and to test these endpoints.

POST Request for scheduling topic modeling:

API Request URL:

https://api.so-me.net/topic_model/schedule

API Request Body:

{
  "twitter_handle": "dutchbros",
  "num_followers_to_scan": 500,
  "max_age_of_tweet": 7,
  "words_to_ignore": [
    "shooting",
    "violence"
  ]
}

API Response:

{
  "success": true
}

POST Request for topic modeling status:

API Request URL:

https://api.so-me.net/topic_model/status

API Request Body:

{
  "twitter_handle": "dutchbros"
}

API Response:

{
  "success": true,
  "queued": false,
  "processing": true,
  "model_ready": true
}

POST Request for getting topic modeling results:

API Request URL:

https://api.so-me.net/topic_model/get_topics

API Request Body:

{
  "twitter_handle": "dutchbros"
}

API Response:

{
  "topics": {
    "1": [
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "..."
    ],
    "2": [
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "..."
    ],
    "3": [
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "..."
    ],
    "4": [
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "..."
    ],
    "5": [
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "...",
      "..."
    ]
  },
  "success": true
}
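Taken together, the three requests above suggest a schedule, poll, fetch workflow. The sketch below shows one way a client might chain them using only the standard library. It has not been run against the live API and rests on assumptions: the helper names post_json and fetch_topics, the polling interval, and the retry limit are all hypothetical, and it assumes that "model_ready": true in the status response means the topics can be fetched.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "https://api.so-me.net"


def post_json(path: str, body: Dict) -> Dict:
    """POST a JSON body to the DS API and decode the JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_topics(
    schedule_body: Dict,
    post: Callable[[str, Dict], Dict] = post_json,
    poll_seconds: float = 30.0,   # assumed interval, tune as needed
    max_polls: int = 20,          # assumed limit, tune as needed
) -> Dict:
    """Schedule topic modeling, wait for the model, then return the topics."""
    handle_body = {"twitter_handle": schedule_body["twitter_handle"]}
    if not post("/topic_model/schedule", schedule_body).get("success"):
        raise RuntimeError("scheduling request was not successful")
    for _ in range(max_polls):
        status = post("/topic_model/status", handle_body)
        if status.get("model_ready"):  # assumption: ready means fetchable
            return post("/topic_model/get_topics", handle_body)["topics"]
        time.sleep(poll_seconds)
    raise TimeoutError("topic model was not ready before the poll limit")
```

Passing the transport in as the post argument keeps the polling logic testable without network access. A real call would look like fetch_topics({"twitter_handle": "dutchbros", "num_followers_to_scan": 500, "max_age_of_tweet": 7, "words_to_ignore": []}).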

Deployment and Configuration

This package uses environment variables stored in a .env file to hold secrets. An example of the variables used:

/app/.env file (replace with real variables!)

# Twitter credentials
TWITTER_API_KEY="KEY_HERE"
TWITTER_API_SECRET="SECRET_HERE"

# Credentials for AWS database
DB_NAME="database_name_here"
DB_USER="database_login_here"
DB_PASSWORD="database_password_here"
DB_HOST="database_url_here"
DB_PORT="database_port_here"
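However the app loads this file, a quick check at startup can catch a missing or empty variable before the first Twitter or database call fails. The following is a minimal sketch, assuming the variables have already been loaded into the process environment; the names REQUIRED_VARS and missing_settings are hypothetical and do not come from the repo.

```python
import os
from typing import List, Mapping

# The variable names listed in the example .env file above.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "DB_NAME",
    "DB_USER",
    "DB_PASSWORD",
    "DB_HOST",
    "DB_PORT",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

At startup, the app could call missing_settings() and stop with a clear message if the returned list is not empty.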

Deployment

This app is designed to be deployed using AWS Elastic Beanstalk as a Docker container. Please read the commands.md file for a list of relevant commands on how to do this.

For deployment to work, the following needs to be done in addition to cloning this repo locally:

  • The .env file needs to be downloaded and added to the app directory (the same directory as main.py). See above for the required variables in this file. Contact your team lead or previous team members for this information if you have trouble finding it.

  • The file config.yml needs to be added to the .elasticbeanstalk directory. This can be done by the command eb init -p docker So-Me-DS-API as listed in commands.md. You will need to connect to your AWS account to do this.

Additional Pipenv-related files are included in the repo, but Pipenv is NOT used during deployment; they are included only for development and testing purposes.

Issues

We are documenting outstanding issues on the issues page of this repo: https://github.com/Lambda-School-Labs/social-media-strategy-ds/issues

Future Ideas! (Read this, Labs 25!)

For future teams that contribute to this project, there are a number of different directions they could go. However, here are some ideas that have been floated around:

  • The newly created "analytics" page on the So-Me site is a perfect place to put any future machine learning features.

  • The front end does not currently incorporate the additional inputs for the 'scheduling' endpoint. Work with them to incorporate this feature more fully, including variables like custom stopwords, tweet age, and number of followers to scan.

  • A function already exists that builds a corpus out of posts that a twitter user's followers engage with. This corpus could be used for things other than the existing topic modeling feature.

  • Currently, the only thing being returned is the topic modeling results. However, returning things like the most common words, #hashtags, and @mentions BEFORE topic modeling might be useful information that could easily be added to the topic modeling process.

  • A number of changes could be made to the architecture of the project:

    • The Twitter scanning and machine learning model building could be offloaded from the FastAPI app to a single background worker app, running something like Celery. This app should be given a separate Twitter API key.

    • Security features could be added to the API, preventing unauthorized users from accessing it. The same could be done for the database, which currently uses a very open security group.

    • Models could be pickled and stored on AWS S3 for future use. Currently, the results of the model in JSON format are the only thing being stored.
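As an illustration of the idea above about returning common words, #hashtags, and @mentions before topic modeling, here is a minimal sketch. It assumes the corpus is available as a plain list of post strings; the repo's corpus-building function is not shown here, so the name corpus_summary and its input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PUNCTUATION = ".,!?:;\"'()"


def corpus_summary(
    posts: Iterable[str], top_n: int = 10
) -> Dict[str, List[Tuple[str, int]]]:
    """Count the most frequent hashtags, mentions and plain words."""
    counts = {"hashtags": Counter(), "mentions": Counter(), "words": Counter()}
    for post in posts:
        for raw in post.lower().split():
            token = raw.strip(PUNCTUATION)  # drop surrounding punctuation
            if len(token) < 2:
                continue                    # skip empty or one-character tokens
            if token.startswith("#"):
                counts["hashtags"][token] += 1
            elif token.startswith("@"):
                counts["mentions"][token] += 1
            elif token.isalpha():           # ignore URLs, numbers, mixed tokens
                counts["words"][token] += 1
    return {kind: counter.most_common(top_n) for kind, counter in counts.items()}
```

In practice the word counts would be more useful after the same stop word filtering that the topic modeling step already applies.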

Issue/Bug Request

If you are having an issue with the existing project code, please submit a bug report under the following guidelines:

  • Check first to see if your issue has already been reported.
  • Check to see if the issue has recently been fixed by attempting to reproduce the issue using the latest master branch in the repository.
  • Create a live example of the problem.
  • Submit a detailed bug report including your environment & browser, steps to reproduce the issue, actual and expected outcomes, where you believe the issue is originating from, and any potential solutions you have considered.

Feature Requests

We would love to hear from you about new features which would improve this app and further the aims of our project. Please provide as much detail and information as possible to show us why you think your new feature should be implemented.

Pull Requests

If you have developed a patch, bug fix, or new feature that would improve this app, please submit a pull request. It is best to communicate your ideas with the developers first before investing a great deal of time into a pull request to ensure that it will mesh smoothly with the project.

Remember that this project is licensed under the MIT license, and by submitting a pull request, you agree that your work will be, too.

Pull Request Guidelines

  • Ensure any install or build dependencies are removed before the end of the layer when doing a build.
  • Update the README.md with details of changes to the interface, including new plist variables, exposed ports, useful file locations and container parameters.
  • Ensure that your code conforms to our existing code conventions and test coverage.
  • Include the relevant issue number, if applicable.
  • You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.

Attribution

These contribution guidelines have been adapted from this good-Contributing.md-template.

Sudoku Image Processing was developed with reference to: Sarthak Vajpayee's https://medium.com/swlh/how-to-solve-sudoku-using-artificial-intelligence-8d5d3841b872

Use your own Algorithm with AWS SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html

AWS: Bring Your Own Container: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own/container

Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda: https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/

Help with Sudoku Solver Code: Peter Norvig, https://norvig.com/sudoku.html

Naked Twins Solver Technique Reference: http://hodoku.sourceforge.net/en/tech_naked.php

Documentation

See Backend Documentation for details on the backend of our project.

See Front End Documentation for details on the front end of our project.