Advanced Text Analytics for Business

IDS-566: Advanced Text Analytics for Business

  • Course: Advanced Text Analytics for Business
  • Instructor: Dr. Ehsan M. Ardehaly
  • Meetings: 6:00 - 8:30 pm F, Burnham Hall 208
  • E-mail: ehsan at uic.edu
  • Office hours: 4:45 - 5:45 pm F, Business Learning Center (BLC) - L270
  • TA: Ramah Al Balawi ralbal2@uic.edu
  • TA office hours: 4:00 - 5:00 pm W, Business Learning Center (BLC) - L270

Description: This course explores the latest algorithms for text mining. It also covers the fundamentals of text analysis, with an emphasis on applying text mining to sentiment analysis and demographic classification. Topics include tokenization, supervised learning, word embedding, document clustering, dimensionality reduction, and topic modeling. Throughout, the focus is on applying this technology to business problems.

Prerequisites: CS-572 (Data Mining or equivalent). You should have a good working familiarity with at least one programming language (Python is recommended, though R is also fine). Significant programming experience will help, because you can focus on the algorithms being explored rather than on language syntax. A solid mathematics background is also required. Since this is a graduate-level course, you are expected to know important concepts in calculus (e.g., derivatives), probability (e.g., conditional probability), and linear algebra (e.g., vectors, matrices, inner products, and matrix operations). Good knowledge of mathematics will help you gain an in-depth understanding of the methods discussed in the course and develop your own ideas for new solutions.

Recommended books:

  • Fundamentals of Predictive Text Mining (2nd Edition), Sholom M. Weiss, Nitin Indurkhya, Tong Zhang, 2015
  • Mining Text Data, Charu C. Aggarwal and ChengXiang Zhai, Springer, 2012
  • Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman

Grading:

40% - Final exam
60% - Three assignments

| Percent | Grade |
|---------|-------|
| 100-90  | A     |
| 89-80   | B     |
| 79-70   | C     |
| 69-60   | D     |
| < 60    | F     |
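
As a rough illustration of how the weights and the letter scale combine, the Python sketch below computes a course percentage and maps it to a letter. It assumes the three assignments are weighted equally (20% each) and that the cutoffs apply to the unrounded percentage. Neither detail is stated in this syllabus, and the function names are hypothetical.

```python
# Hypothetical helper names; the two weights come from the grading breakdown.
FINAL_WEIGHT = 0.40
ASSIGNMENTS_WEIGHT = 0.60  # assumption: split equally across the three assignments


def course_percentage(assignment_scores, final_exam_score):
    """Combine scores (each on a 0-100 scale) into a course percentage."""
    if len(assignment_scores) != 3:
        raise ValueError("expected exactly three assignment scores")
    assignment_average = sum(assignment_scores) / len(assignment_scores)
    return ASSIGNMENTS_WEIGHT * assignment_average + FINAL_WEIGHT * final_exam_score


def letter_grade(percentage):
    """Map a percentage to a letter; assumes cutoffs apply without rounding."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    total = course_percentage([85, 90, 95], final_exam_score=80)
    print(total, letter_grade(total))
```

How borderline scores (for example, 89.5) are handled is up to the instructor; the sketch simply compares against the lower bound of each band.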

Assignment policy:

  • Please read university regulations: https://grad.uic.edu/university-regulations
  • All assignments you turn in must be done by you alone.
  • The first violation will result in a failing grade for that assignment.
  • The second will result in a failing grade for the course.
  • Late submission policy: 4% per hour late
  • Grade disputes: must be raised within 7 days of receiving the grade
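
To make the late submission arithmetic concrete, here is a minimal Python sketch. The policy only states "4% per hour", so the sketch makes several assumptions that are not in the syllabus: the penalty is in percentage points, every started hour counts as a full hour, and a score never drops below zero. The function name and the 11:59 pm deadline time are hypothetical.

```python
import math
from datetime import datetime

PENALTY_PER_HOUR = 4.0  # from the late submission policy; read here as percentage points


def late_adjusted_score(raw_score, due, submitted):
    """Apply the late penalty to a 0-100 score.

    Assumptions (not stated in the policy): each started hour counts as a
    full hour, and the adjusted score is floored at 0.
    """
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return float(raw_score)  # on time: no penalty
    hours_late = math.ceil(seconds_late / 3600)
    return max(0.0, raw_score - PENALTY_PER_HOUR * hours_late)


if __name__ == "__main__":
    due = datetime(2018, 2, 6, 23, 59)  # hypothetical deadline time on the due date
    print(late_adjusted_score(90, due, datetime(2018, 2, 7, 2, 30)))
```

Under these assumptions, a submission more than 24 hours late scores zero, since 25 started hours already exceed 100 percentage points.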

Schedule:

| Date      | Content                                                               | Slides | Notebook |
|-----------|-----------------------------------------------------------------------|--------|----------|
| 1/19/2018 | Introduction, Scientific Python                                       | L01    | N01      |
| 1/26/2018 | Tokenization, TF-IDF, Sparse matrix, Sentiment Analysis-1             | L02    | N02      |
| 2/2/2018  | Supervised learning, k-NN, Naive Bayes, Logistic Regression           | L03    | N03      |
| 2/6/2018  | Assignment 1 due date                                                 | --     | --       |
| 2/9/2018  | No class, winter storm                                                | --     | --       |
| 2/16/2018 | More classification, Sentiment Analysis-2, Demographic classification | L04    | N04      |
| 2/23/2018 | Word Embedding, Multi-Layer Perceptron (MLP), Deep Learning           | L05    | N05      |
| 2/25/2018 | Assignment 2 due date                                                 | --     | --       |
| 3/2/2018  | Latent Semantic Analysis (LSA), Document clustering                   | L06    | N06      |
| 3/9/2018  | Singular Value Decomposition (SVD), K-Means, Topic Modeling (LDA)     | L07    | N07      |
| 3/13/2018 | Assignment 3 due date                                                 | --     | --       |
| 3/16/2018 | Final exam                                                            | --     | --       |