Information Retrieval Academic Projects in the Computer Science Graduate Program at University at Buffalo
Description:
This course introduces students to text-based information retrieval (IR) techniques, i.e., the techniques underlying search engines. The course begins with the fundamentals of processing large-scale, multilingual text document collections. Various IR models, such as the Boolean model, the vector space model, and probabilistic models, will be studied. Efficient indexing techniques for (i) general document collections, (ii) specialized collections (e.g., Wikipedia, biomedical, patents), and (iii) high-velocity data such as social media will be discussed. Techniques for improving search efficiency and performance, as well as evaluation methodology, will be covered. The latter part of the course will focus on web search, including link analysis techniques such as PageRank and HITS. Word vectors (Word2vec, GloVe) generated through neural models, and their use in IR systems, will be introduced. Students will work on programming projects (implemented on the AWS cloud computing platform) to gain hands-on expertise in building IR systems. This course provides the foundation for the follow-on course (CSE 635), which discusses natural language processing (NLP) and deeper text mining solutions.
Prerequisites: Programming expertise in Python; linear algebra