Curated List of Practical Deep Learning and Machine Learning Project Ideas
- 25+ ideas
- Relevant to both academia and industry
- Range from beginner-friendly to research projects
Problems are motivated by the ones shared at:
- Text - including natural language under this section
- Forecasting - mostly Time Series and similar forecasting challenges
- Recommender Systems
- Vision - includes image and video processing
- Music and Audio - combines ideas from language and audio to understand music
Autonomous Tagging of Stack Overflow Questions
- Make a multi-label classification system that automatically assigns tags to questions posted on a forum such as Stack Overflow or Quora
- Dataset: StackLite or 10% sample
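In a multi-label setup, each question can carry several tags at once, so targets are usually encoded as 0/1 vectors over a fixed tag vocabulary. The sketch below shows that encoding plus a micro-averaged F1 score for comparing predictions. The tag vocabulary, the example tags, and the choice of micro F1 as the metric are assumptions for illustration, not something prescribed by the dataset.

```python
def multi_hot(tags, vocabulary):
    """Encode a collection of tags as a 0/1 vector over a fixed tag vocabulary."""
    tag_set = set(tags)
    return [1 if tag in tag_set else 0 for tag in vocabulary]


def micro_f1(true_rows, predicted_rows):
    """Micro-averaged F1: count hits and misses over every (question, tag) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(true_rows, predicted_rows):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tag vocabulary and labels, for illustration only.
vocabulary = ["python", "pandas", "java"]
y_true = [multi_hot(["python", "pandas"], vocabulary), multi_hot(["java"], vocabulary)]
y_pred = [multi_hot(["python"], vocabulary), multi_hot(["java"], vocabulary)]
score = micro_f1(y_true, y_pred)
```

A real system would replace `y_pred` with the output of a trained classifier, for example one binary classifier per tag.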
Keyword/Concept Identification
- Identify keywords from millions of text questions
- Dataset: StackOverflow Questions Sample by Facebook
Topics Identification
- Multi-label classification of printed media articles to topics
- Dataset: Greek Media Monitoring Multilabel Classification
Automated Essay Grading
- The purpose of this project is to implement and train machine learning algorithms to automatically assess and grade essay responses.
- Dataset: Essays with human graded scores
Sentence to Sentence Semantic Similarity
- Can you identify question pairs that have the same intent? Or sentences that have the same meaning?
- Dataset: Quora Question Pairs with similar questions marked
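Before training a model, it can help to have a purely lexical baseline to beat. The sketch below scores a sentence pair by the Jaccard overlap of their word sets. It is an assumed starting point, not the intended solution: it cannot detect paraphrases that share few words, which is exactly what a learned model should improve on.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard_similarity(sentence_a, sentence_b):
    """Share of distinct words the two sentences have in common (0.0 to 1.0)."""
    tokens_a, tokens_b = tokens(sentence_a), tokens(sentence_b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


similarity = jaccard_similarity("How do I learn Python?", "How can I learn Python?")
```

Thresholding this score gives a simple duplicate/not-duplicate classifier to compare against.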
Fight online abuse
- Can you confidently and accurately tell whether a particular comment is abusive?
- Dataset: Toxic Comments on Kaggle
Open Domain Question Answering
- Can you build a bot which answers questions according to the student's age or her curriculum?
- Facebook's FAIR built a similar system for Wikipedia
- Dataset: NCERT books for K-12/school students in India, NarrativeQA by Google DeepMind and SQuAD by Stanford
Automatic Text Summarization
- Can you create a summary with the major points of the original document?
- Abstractive (write your own summary) and Extractive (select pieces of text from original) are two popular approaches
- Dataset: CNN and DailyMail News Pieces by Google DeepMind
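The extractive approach can be sketched with a simple word-frequency heuristic: score each sentence by the average corpus frequency of its words and keep the top few in their original order. This is a minimal baseline under assumed simplifications (regex sentence splitting, no stop-word removal); the function name and parameters are hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average document-level frequency of the words in this sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(word_counts[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


document = "Cats sleep a lot. Cats like fish. Dogs bark."
summary = extractive_summary(document, max_sentences=1)
```

An abstractive system, by contrast, would generate new sentences rather than copy existing ones, which typically calls for a sequence-to-sequence model.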
Copy-cat Bot
- Generate plausible new text which looks like some other text
- Obama Speeches? For instance, you can create a bot which writes new speeches in Obama's style
- Trump Bot? Or a Twitter bot which mimics @realDonaldTrump
- Narendra Modi bot saying doston? Start by scraping his Hindi speeches from his personal website
- Example Dataset: English Transcript of Modi speeches
Check mlm/blog for hints
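One of the simplest ways to generate text that looks like a source corpus is a word-level Markov chain: record which words follow each word, then sample a walk through those transitions. The sketch below is an assumed baseline for getting started, not the approach the hints above describe; neural language models produce far more fluent output. The corpus string is a placeholder.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length=10, seed=None):
    """Random walk over the chain, starting from `start`, up to `length` words."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word never had a successor in the corpus
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran"  # placeholder corpus
chain = build_chain(corpus)
generated = generate(chain, start="the", length=8, seed=0)
```

Swapping the placeholder corpus for scraped speeches or tweets gives a first, rough copy-cat bot.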
Sentiment Analysis
- Do Twitter Sentiment Analysis on tweets sorted by geography and timestamp
- Dataset: Tweets sentiment tagged by humans
De-anonymization
- Can you classify the text of an e-mail message to decide who sent it?
- Dataset: 150,000 Enron emails
Univariate Time Series Forecasting
- How much will it rain this year?
- Dataset: 45 years of rainfall data
Multi-variate Time Series Forecasting
- How polluted will your town's air be? Pollution Level Forecasting
- Dataset: Air Quality dataset
Demand/load forecasting
- Produce a short-term forecast of the electricity consumption of a single home
- Dataset: Electricity Consumption of a household
Predict Blood Donation
- We're interested in predicting if a blood donor will donate within a given time window
- More on the problem statement at Driven Data
- Dataset: UCI ML Datasets Repo
Movie Recommender
- Can you predict the rating a user will give to a movie?
- Do this using the movies that user has rated in the past, as well as the ratings similar users have given similar movies.
- Dataset: Netflix Prize and MovieLens Datasets
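The idea of using a user's past ratings together with the ratings of similar users can be sketched as user-based collaborative filtering: weight every other user's rating of the target movie by how similar their taste is to the target user's. The code below is a minimal sketch, assuming cosine similarity over co-rated movies; the user names, movie IDs, and ratings are made up for illustration.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = sqrt(sum(ratings_a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[movie]
        total_weight += weight
    if total_weight == 0:
        return None  # nobody comparable has rated this movie
    return weighted_sum / total_weight


# Made-up ratings on a 1-5 scale, for illustration only.
ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 4},
    "cid": {"A": 1, "B": 5, "C": 2},
}
prediction = predict_rating(ratings, "ann", "C")
```

Here `ann` agrees with `bob` far more than with `cid`, so the prediction lands closer to `bob`'s rating of movie `C`. Matrix factorization is a common next step on datasets the size of MovieLens.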
Search + Recommendation System
- Predict which Xbox game a visitor will be most interested in based on their search query
- Dataset: BestBuy
Can you predict Influencers in the Social Network?
- How can you predict social influencers?
- Dataset: PeerIndex
Image Classification
- Object recognition, or image classification, is the task that drove Deep Learning's present-day resurgence
- Datasets:
- CIFAR-10
- ImageNet
- MS COCO is the modern replacement for the ImageNet challenge
- MNIST Handwritten Digit Classification Challenge is the classic entry point
- Character recognition (digits) is the good old Optical Character Recognition problem
- Bird Species Identification from an Image using the Caltech-UCSD Birds dataset
- Diagnosing and Segmenting Brain Tumors and Phenotypes using MRI Scans
- Dataset: MICCAI Machine Learning Challenge aka MLC 2014
- Identify endangered right whales in aerial photographs
- Dataset: NOAA Right Whale
- Can computer vision spot distracted drivers?
- Dataset: State Farm Distracted Driver Detection on Kaggle
Bone XRay Competition
- Can you automatically identify whether a hand is broken from X-ray radiographs, with better-than-human performance?
- Stanford's Bone XRay Deep Learning Competition with MURA Dataset
Image Captioning
- Can you caption/explain the photo the way a human would?
- Dataset: MS COCO
Image Segmentation/Object Detection
- Can you extract an object of interest from an image?
- Dataset: MS COCO, Carvana Image Masking Challenge on Kaggle
Large-Scale Video Understanding
- Can you produce the best video tag predictions?
- Dataset: YouTube 8M
Video Summarization
- Can you select the semantically relevant/important parts from the video?
- Example: Fast-Forward Video Based on Semantic Extraction
- Dataset: I am unaware of any standard dataset or agreed-upon metrics; YouTube 8M might be a good starting point
Style Transfer
- Can you recompose images in the style of other images?
- Dataset: fzliu on GitHub shared target and source images with results
Face Recognition
- Can you identify whose photo this is? Similar to Facebook's photo tagging or Apple's Face ID
- Dataset: face-rec.org, or facedetection.com
Clinical Diagnostics: Image Identification, classification & segmentation
- Can you help build an open source software for lung cancer detection to help radiologists?
- Link: Concept to Clinic challenge on DrivenData
Satellite Imagery Processing for Socioeconomic Analysis
- Can you estimate the standard of living or energy consumption of a place from nighttime satellite imagery?
- Reference for Project details: Stanford Poverty Estimation Project
Satellite Imagery Processing for Automated Tagging
- Can you automatically tag satellite images with human features such as buildings, roads, waterways and so on?
- Help free the manual effort in tagging satellite imagery: Kaggle Dataset by DSTL, UK
Music/Audio Recommendation Systems
- Can you tell if two songs are similar using their sound or lyrics?
- Dataset: Million Song Dataset and its 1% sample. Example: Anusha et al
Can I use the ideas here for my thesis? Yeah, totally. I'd love to know how it went.
Do you want to share your solution/code for a problem here? Yeah, sure - why not? Go to GitHub issues in the repository and let me know there.
How can I add my ideas here? Just send a pull request and we'll discuss.
Hey @NirantK, something is wrong here! Yikes, I am sorry. Please tell me by raising a GitHub issue. I'll try to fix it as soon as possible.
Nirant Kasliwal compiled the ideas in this repository. Find him on Twitter or LinkedIn.