alessandrom10/OpenAssistant-Guanaco-NLP-Project

Project Description

This repository contains the project work for the "Natural Language Processing" (NLP) course. The main objective of this project was to fine-tune a Large Language Model (LLM) into a chatbot using the OpenAssistant-Guanaco dataset.

Team Members

Tasks Overview

Dataset Analysis: Calculated and visualized basic statistics such as average document length and average vocabulary size.
Word2Vec Embedding: Trained a Word2Vec embedding on the data and analyzed its properties.
Document Clustering: Clustered the documents and visualized the clusters to identify groups or known classes. Indexed the documents for keyword search functionality.
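
As a rough illustration of the dataset analysis and keyword indexing tasks, the sketch below computes the average document length, the average per-document vocabulary size, and a simple inverted index for keyword search. It is not the project's actual code: the whitespace tokenizer, the function names (`tokenize`, `corpus_stats`, `build_index`, `search`), and the sample documents are all assumptions made for illustration, and a real run would load the OpenAssistant-Guanaco conversations instead of the toy list.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, with surrounding
    # punctuation stripped. The project may have used a different tokenizer.
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [token for token in tokens if token]


def corpus_stats(documents):
    """Return average document length (in tokens), average per-document
    vocabulary size, and the vocabulary size of the whole corpus."""
    if not documents:
        return {"avg_length": 0.0, "avg_vocab": 0.0, "total_vocab": 0}
    tokenized = [tokenize(doc) for doc in documents]
    total_vocab = set()
    for tokens in tokenized:
        total_vocab.update(tokens)
    return {
        "avg_length": sum(len(t) for t in tokenized) / len(tokenized),
        "avg_vocab": sum(len(set(t)) for t in tokenized) / len(tokenized),
        "total_vocab": len(total_vocab),
    }


def build_index(documents):
    """Build an inverted index mapping each token to the set of
    document positions (0-based) in which it occurs."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(documents):
        for token in tokenize(doc):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical sample documents, not taken from the dataset.
    sample_docs = [
        "How do I train a chatbot?",
        "Train a model, then evaluate the model.",
    ]
    print(corpus_stats(sample_docs))
    print(search(build_index(sample_docs), "train model"))
```

The index uses AND semantics, so a query matches only documents that contain all of its terms. Ranking the matches (for example by term frequency) would be a natural extension but is left out of this sketch.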