
CSE 584: Machine Learning - Tools and Algorithms



About

Homework assignments for CSE 584 - Machine Learning, Penn State, Fall 2024.

Built With

  • LaTeX

Homework 1

Active Learning Paper Reviews

Link to HW1

  1. Sandra Ebert, Mario Fritz, and Bernt Schiele. Ralf: A reinforced active learning formulation for object class recognition. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3626–3633. IEEE, 2012.
  2. Ksenia Konyushkova, Raphael Sznitman, and Pascal Fua. Learning active learning from data. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
  3. William Muldrew, Peter Hayes, Mingtian Zhang, and David Barber. Active preference learning for large language models. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.

Mid-Term Project

Machine-Generated Text (MGT) Detection

Link to Mid-term Project

Advancements in text generation technology have made it easy for malicious users to generate large volumes of human-like content without technical expertise, raising concerns about misuse such as fake news and phishing. Detecting AI-generated text is crucial for responsible use and content moderation, leading to increased research in Machine-Generated Text (MGT) Detection. A more specific challenge is attributing MGT to the particular model that generated it, known as MGT Attribution. In this paper, we review existing datasets and methods for model attribution and introduce a new dataset to train a classifier for identifying the source LLM behind a given text. We study the effects of different datasets and of varying generated-text lengths on LLM attribution. We empirically show that a classifier attributes input text to an LLM more easily when the LLMs have been prompted with technical questions than when they have been prompted with only simple text-completion tasks. We also show that longer texts are easier to attribute than shorter texts. Finally, we gain some preliminary insights into the effects of parameter size and pre-training dataset on text generation.
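The attribution task described above can be sketched as a standard multi-class text-classification pipeline. The snippet below is a minimal illustration, not the project's actual model or data: the example texts, the `model_a`/`model_b` labels, and the TF-IDF + logistic-regression choice are all assumptions made for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus: each text is labeled with a hypothetical source LLM.
texts = [
    "The gradient of the loss with respect to the weights is computed via backpropagation.",
    "Backpropagation applies the chain rule to obtain weight gradients efficiently.",
    "Once upon a time, a small fox wandered through the quiet forest.",
    "The fox crossed the meadow as the sun dipped below the hills.",
]
labels = ["model_a", "model_a", "model_b", "model_b"]

# Character n-grams can capture stylistic fingerprints that word features miss.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Attribute an unseen text to one of the candidate models.
prediction = clf.predict(["The chain rule yields the gradient for each layer."])[0]
print(prediction)
```

In a real setup the corpus would contain many generations per LLM, and the length-dependence finding suggests evaluating separately on short and long excerpts.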

Homework 2

Reinforcement Learning code explanation

Link to HW2

Deep Q-Learning using Double DQN to play Atari-like games in OpenAI Gym, based on code by Denny Britz.
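The key idea in Double DQN is to decouple action selection from action evaluation: the online network picks the greedy next action, while the target network scores it. A minimal sketch of that target computation, using toy NumPy Q-tables in place of the two neural networks (the function name and table shapes here are illustrative, not from the assignment code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.99

# Toy Q-tables standing in for the online and target networks.
q_online = rng.normal(size=(n_states, n_actions))
q_target = rng.normal(size=(n_states, n_actions))

def double_dqn_target(reward, next_state, done):
    """Double DQN target: online net selects the action, target net evaluates it."""
    if done:
        return reward
    best_action = int(np.argmax(q_online[next_state]))      # action selection
    return reward + gamma * q_target[next_state, best_action]  # action evaluation

print(double_dqn_target(reward=1.0, next_state=2, done=False))
```

Using the target network only for evaluation reduces the overestimation bias that plain DQN's max operator introduces.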

Final Project

Link to data

Contact

Sinjoy Saha