CMPE 251: Data Science and Social Media Analysis
Course Web Site. I suggest that you watch the online course Analytics in Python.
Topics Covered So Far
- Introduction
- Python and Libraries
- Machine Learning
- Data Science: Getting Data From Twitter
- Compare Basic Machine Learning Algorithms
- LinearRegression
- LogisticRegression
- Decision Tree
- Application: Introduce, test, and compare different algorithms on synthetic data
- Anomaly Detection Kaggle Kernel
- 3 Sigma Rule
- Isolation Forest
- Application: Google Trends Data
- Intro to Text Mining
- Text Summarization
- Algorithm to Extract a Summary
- Application: Summary of a Wikipedia Page
- Web Scraping
- Application: Berkay Öztürk's talk on UKitapScraper
- Fundamentals of Machine Learning From Data
- Midterm Questions
- Predict political party based on votes
- Advanced NLP
- Network Analysis
Projects
As the course advances, we will add more alternative projects. You must do at least one project. You can also propose a new project.
Below you can find the link for forming your project groups.
Use this link to enter the project name, your data source, your teammates, and your team name.
The most critical part of your project is the correctness of your labeled training data. If your data is not good, you will receive a very low grade.
1. Sentiment Analysis On EksiSozluk
Data Collection: You will collect data from EksiSozluk with a web scraper. Each student will label 500 comments from EksiSozluk using 5 labels, i.e. 5 classes:
- 5 Very Positive
- 4 Positive
- 3 Neutral
- 2 Negative
- 1 Very Negative
Each group MUST use a different data source, i.e. different "gundem" (agenda) topics from EksiSozluk.
Machine Learning: Use ML algorithms for sentiment analysis on EksiSozluk. Report your results.
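As a possible starting point for the five-class classifier, here is a minimal sketch of multinomial Naive Bayes written in plain Python. Naive Bayes is only one reasonable baseline, not a required algorithm; the `tokenize` helper, the `NaiveBayesSentiment` class, and the English toy sentences are hypothetical stand-ins for your own scraped and labeled EksiSozluk comments. In practice you would more likely use scikit-learn (e.g. `CountVectorizer` with `MultinomialNB`).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase + whitespace split; real EksiSozluk entries would need
    # Turkish-aware normalization (e.g. "I"/"ı", suffixes, emojis).
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing for labels 1..5."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(word | label); unseen words are skipped
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for labeled EksiSozluk comments.
train = [
    ("absolutely wonderful great", 5),
    ("good nice", 4),
    ("okay average", 3),
    ("bad disappointing", 2),
    ("terrible awful horrible", 1),
]
model = NaiveBayesSentiment().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("wonderful great"))  # → 5
```

Working in log space avoids numeric underflow when multiplying many small word probabilities, and add-one smoothing keeps a word that never appeared with a given label from zeroing out that label entirely.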
2. Fake News Detection
Data Collection: You will collect data from Zaytung (a Turkish satirical news site) and from regular newspaper websites with a web scraper, labeled as follows:
- Zaytung news: 1
- Regular newspapers: 0
Before you start, you may need to look into irony detection.
Machine Learning: Use ML algorithms for fake news detection. Report your results.
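To report results on held-out data, you need a reproducible train/test split and metrics beyond plain accuracy, because accuracy alone can look good when one class dominates. The sketch below assumes the 1 = Zaytung / 0 = regular newspaper labeling above; `split_data` and `binary_metrics` are hypothetical helper names, and scikit-learn offers equivalents (`train_test_split`, `precision_score`, `recall_score`, `f1_score`).

```python
import random


def split_data(items, test_ratio=0.2, seed=42):
    """Shuffle (text, label) pairs reproducibly and split them into train/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating Zaytung (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and classifier predictions (1 = Zaytung, 0 = regular newspaper).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Fixing the random seed makes the split, and therefore the reported numbers, reproducible for anyone re-running your project code.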
3. Create a New Elvis Presley Song Lyric
Data Collection: Use the NRC Emotion Lexicon and the Kaggle song lyrics dataset.
Machine Learning: Use the TextRank algorithm to create a new song lyric for a popular singer. Then use it to create a combined lyric from various singers.
For sentence extraction, TextRank uses a different relation than for keyword extraction: it connects two sentences if there is a “similarity” relation between them, where “similarity” is measured as a function of their content overlap.
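In the original TextRank paper (Mihalcea and Tarau, 2004), this overlap counts the words shared by sentences S_i and S_j and normalizes by the logarithm of their lengths |S_i| and |S_j| (in words). Each sentence V_i is then scored with a weighted PageRank update, where w_{ji} is the similarity between V_j and V_i and the damping factor d is usually set to 0.85. Because the sentence graph is undirected, In(V_i) and Out(V_i) are both simply the neighbours of V_i.

```latex
\mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, w_k \mid w_k \in S_i \land w_k \in S_j \,\}\right|}{\log |S_i| + \log |S_j|}

WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, WS(V_j)
```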
Task: Create new songs for some artists based on 2 algorithms:
- Simple text summarization
- Text summarization based on the TextRank algorithm
Then compare the results.
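The two approaches in the task can be compared side by side with a small pure-Python sketch that treats each lyric line as a "sentence". For the simple method it assumes a common word-frequency baseline (score each line by the average corpus frequency of its words), which may differ from the algorithm shown in class. The TextRank part uses the log-normalized overlap similarity and a weighted PageRank iteration as in the TextRank paper. All function names and the lyric lines are hypothetical placeholders, not real song text.

```python
import math
import re
from collections import Counter


def tokenize(line):
    # Lowercase word tokens; apostrophes kept so "don't" stays one token.
    return re.findall(r"[a-z']+", line.lower())


def frequency_summary(lines, k):
    """Simple baseline: score each line by the average corpus frequency of its words."""
    freq = Counter(w for line in lines for w in tokenize(line))

    def score(line):
        words = tokenize(line)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(lines)), key=lambda i: score(lines[i]), reverse=True)[:k]
    return [lines[i] for i in sorted(top)]  # keep the original line order


def overlap_similarity(a, b):
    """TextRank similarity: shared words normalized by log sentence lengths."""
    wa, wb = tokenize(a), tokenize(b)
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    if denom <= 0:
        return 0.0  # avoid division by zero for one-word or empty lines
    return len(set(wa) & set(wb)) / denom


def textrank_summary(lines, k, d=0.85, iterations=50):
    """Rank lines with weighted PageRank on the line-similarity graph."""
    n = len(lines)
    w = [[overlap_similarity(lines[i], lines[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # new scores are computed entirely from the previous iteration's scores
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [lines[i] for i in sorted(top)]


# Hypothetical placeholder lyric lines.
lyrics = [
    "the night is long and the road is cold",
    "hold my hand and the night is young",
    "dancing alone under neon lights",
    "the road is long but my heart is gold",
    "rain on the window",
]
print(frequency_summary(lyrics, 2))
print(textrank_summary(lyrics, 2))
```

A line that shares no words with any other line (here "dancing alone under neon lights") stays at the minimum TextRank score of 1 - d, whereas the frequency baseline judges each line only by its own words. That difference is one thing worth discussing when you compare the generated lyrics.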
4. Compare Unsupervised Anomaly Detection Algorithms
Use the following 2 datasets:
- https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/OPQMVF/GIPF3O&version=1.0
- https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/OPQMVF/MTUJ5F&version=1.0
Compare the following 3 algorithms:
- Isolation Forest
- Self-organizing maps
- Local Outlier Factor
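Of the three algorithms, Local Outlier Factor is compact enough to sketch from scratch, which helps in interpreting its scores: a point whose local density is much lower than that of its neighbours gets a LOF well above 1. This is a simplified sketch assuming Euclidean distance, exactly k neighbours (ties are ignored), and no duplicate points; the function name and toy points are hypothetical. For the actual comparison, scikit-learn provides `sklearn.neighbors.LocalOutlierFactor` and `sklearn.ensemble.IsolationForest`, while self-organizing maps are not part of scikit-learn and come from third-party packages such as MiniSom.

```python
import math


def local_outlier_factor(points, k=2):
    """Return one LOF score per point; scores well above 1 suggest outliers.

    Simplifications: Euclidean distance, exactly k neighbours (ties at the
    k-distance are ignored), and no duplicate points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # indices of the k nearest neighbours of each point, excluding itself
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(k_distance[j], dist[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = [1.0 / (sum(reach_dist(i, j) for j in neighbours[i]) / k) for i in range(n)]
    # LOF: average neighbour density divided by the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical 2-D data: a tight square of four points plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factor(points, k=2)
print([round(s, 2) for s in scores])  # → [1.0, 1.0, 1.0, 1.0, 13.09]
```

Points inside the dense square score exactly 1, meaning they are as dense as their neighbours, while the isolated point scores about 13. Unlike a global 3 sigma threshold, LOF compares each point only with its local neighbourhood.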
Project Presentations
Dear students,
I have updated the time slots for the CMPE 251 project presentations. Follow the link below to see the updated time slots.
You can also see the updated time slots here.
- Please arrive 10 minutes before your presentation time!
- Bring a printed copy of your project report.
- Submit your project code, data, report, and slides on both a CD-ROM and a USB drive.
- I will keep the CD-ROM but not the USB drive.
Group Name | Project Name | Presentation Time (21 December 2018) |
---|---|---|
SpaceX | Sentiment Classification on EksiSozluk | 09:00-09:10 |
Lord of The Electronics | Sentiment Classification on EksiSozluk | 09:10-09:20 |
Plekumatlar Back..! | Sentiment Classification on EksiSozluk | 09:20-09:30 |
Meşhur Sarıyer Börekçileri | Sentiment Classification on EksiSozluk | 09:30-09:40 |
hatefuleight | Sentiment Classification on EksiSozluk | 09:40-09:45 |
Kumpir | Sentiment Classification on EksiSozluk | 09:50-10:00 |
In Zemberek We Trust | Fake News Detection | 10:00-10:10 |
Placeholder | Fake News Detection | 10:10-10:20 |
Al Gore Rhythms | Sentiment Classification on EksiSozluk | 10:20-10:30 |
the procrastinators | Anomaly Detection | 10:30-10:40 |
NoName | Fake News Detection | 10:40-10:50 |
DSML | Create a New Elvis Presley Song Lyric | 10:50-11:00 |
CilginFurkan | Eksi | 11:00-11:05 |
SarpAlkan | Sentiment Classification on EksiSozluk | 11:05-11:10 |
Genesis | Sentiment Classification on EksiSozluk | 11:10-11:15 |
1789 aksaray | Sentiment Classification on EksiSozluk | 11:15-11:25 |