- Course: Advanced Text Analytics for Business
- Instructor: Dr. Ehsan M. Ardehaly
- Meetings: 6:00 - 8:30 pm F, Burnham Hall 208
- E-mail: ehsan at uic.edu
- Office hours: 4:45 - 5:45 pm F, Business Learning Center (BLC) - L270
- TA: Ramah Al Balawi (ralbal2@uic.edu)
- TA office hours: 4:00 - 5:00 pm W, Business Learning Center (BLC) - L270
Description: This course explores the latest algorithms for text mining. It also covers the fundamentals of text analysis, with an emphasis on applying text mining to sentiment analysis and demographic classification. Topics include tokenization, supervised learning, word embedding, document clustering, dimensionality reduction, and topic modeling. Throughout, the focus is on applying this technology to business problems.
Prerequisites: CS-572 (Data Mining) or equivalent. You should have a good working familiarity with at least one programming language (Python is recommended, though R is also fine). Significant programming experience will help, since it lets you focus on the algorithms being explored rather than on the syntax of a programming language. A solid mathematics background is also required. Since this is a graduate-level course, you are expected to know the important concepts in calculus (e.g., derivatives), probability (e.g., conditional probability), and linear algebra (e.g., vectors, matrices, inner products, and matrix operations). Good knowledge of mathematics will help you gain an in-depth understanding of the methods discussed in the course and develop your own ideas for new solutions.
Recommended books:
- Fundamentals of Predictive Text Mining (2nd Edition), Sholom M. Weiss, Nitin Indurkhya, Tong Zhang, 2015
- Mining Text Data, Charu C. Aggarwal and ChengXiang Zhai, Springer, 2012
- Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman
Grading:
- 40% - Final exam
- 60% - Three assignments
Percent | Grade |
---|---|
100-90 | A |
89-80 | B |
79-70 | C |
69-60 | D |
< 60 | F |
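The weighting and the grade table above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names `course_percent` and `letter_grade` are hypothetical, the three assignments are assumed to be weighted equally (the syllabus only says they total 60%), and each range in the table is treated as a lower-bound cutoff, so a fractional score such as 89.5 would map to B.

```python
def course_percent(assignment_scores, final_exam_score):
    """Combine scores (each on a 0-100 scale) using the 60/40 split."""
    if len(assignment_scores) != 3:
        raise ValueError("expected exactly three assignment scores")
    # Assumption: the three assignments carry equal weight within the 60%.
    assignment_average = sum(assignment_scores) / 3
    return 0.6 * assignment_average + 0.4 * final_exam_score


def letter_grade(percent):
    """Map a course percentage to a letter using the table's lower bounds."""
    # Assumption: each range is read as "at least the lower bound".
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    total = course_percent([90, 80, 100], 85)
    print(round(total, 2), letter_grade(total))
```

For example, assignment scores of 90, 80, and 100 average to 90, which contributes 54 points; a final exam score of 85 contributes 34 points, for a total of 88 and a letter grade of B under these assumptions.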
Assignment policy:
- Please read university regulations: https://grad.uic.edu/university-regulations
- All assignments you turn in must be done by you alone.
- The first violation will result in a failing grade for that assignment.
- The second will result in a failing grade for the course.
- Late submission policy: a 4% penalty per hour late
- Grade disputes: must be raised within 7 days of receiving the grade
Schedule:
Date | Content | Slides | Notebook |
---|---|---|---|
1/19/2018 | Introduction, Scientific Python | L01 | N01 |
1/26/2018 | Tokenization, TF-IDF, Sparse matrix, Sentiment Analysis-1 | L02 | N02 |
2/2/2018 | Supervised learning, k-NN, Naive Bayes, Logistic Regression | L03 | N03 |
2/6/2018 | Assignment 1 due date | -- | -- |
2/9/2018 | No class, winter storm | -- | -- |
2/16/2018 | More classification, Sentiment Analysis-2, Demographic classification | L04 | N04 |
2/23/2018 | Word Embedding, Multilayer Perceptron (MLP), Deep Learning | L05 | N05 |
2/25/2018 | Assignment 2 due date | -- | -- |
3/2/2018 | Latent Semantic Analysis (LSA), Document clustering | L06 | N06 |
3/9/2018 | Singular Value Decomposition (SVD), K-Means, Topic Modeling (LDA) | L07 | N07 |
3/13/2018 | Assignment 3 due date | -- | -- |
3/16/2018 | Final exam | -- | -- |