Pinned Repositories
challenge-auditoria
Project that uses AI to detect inventory impairment in any company dataset. It aims to make auditors' work easier, and it was the winning project of the Challenge IA i Auditoria at the Col·legi de Censors Jurats de Comptes de Catalunya.
aguacate
UPC 2023 Datathon Challenge
car-conduction
Use of machine learning to make a car complete a circuit (not great, but it was for kids)
CatGPT-fork
Fork of rogerbaiges' CatGPT, a Catalan-specific language model with 111 million parameters, trained from scratch. Designed for educational purposes, it provides a simple tool to explore Catalan natural language processing.
ChessNezha
cucafera
Catalan LLM with 244M parameters, coded and trained from scratch using PyTorch and Hugging Face
patufet
Patufet is a set of Catalan datasets created to train and fine-tune LLMs in that language.
sentiment-analysis
In-depth exploration of sentiment analysis on movie reviews with two different approaches: supervised learning, using techniques like bag-of-words (BoW), and unsupervised learning, using SentiWordNet and word sense disambiguation.
top-streaming-songs-modeling
This repository contains the PMAAD course project from the Artificial Intelligence Degree at Universitat Politècnica de Catalunya. It models and analyzes Spotify's top 40 weekly streamed songs (2017-2021) using R. Techniques include clustering, textual analysis, and geospatial analysis to uncover music trends and characteristics.
word-embeddings
Word embeddings are useful word representations that capture semantic information. This project trains Word2Vec embeddings, uses RoBERTa (and other embeddings) for semantic text similarity, and also performs text classification.
pauhidalgoo's Repositories
pauhidalgoo/top-streaming-songs-modeling
This repository contains the PMAAD course project from the Artificial Intelligence Degree at Universitat Politècnica de Catalunya. It models and analyzes Spotify's top 40 weekly streamed songs (2017-2021) using R. Techniques include clustering, textual analysis, and geospatial analysis to uncover music trends and characteristics.
pauhidalgoo/aguacate
UPC 2023 Datathon Challenge
pauhidalgoo/sentiment-analysis
In-depth exploration of sentiment analysis on movie reviews with two different approaches: supervised learning, using techniques like bag-of-words (BoW), and unsupervised learning, using SentiWordNet and word sense disambiguation.
pauhidalgoo/word-embeddings
Word embeddings are useful word representations that capture semantic information. This project trains Word2Vec embeddings, uses RoBERTa (and other embeddings) for semantic text similarity, and also performs text classification.
pauhidalgoo/cucafera
Catalan LLM with 244M parameters, coded and trained from scratch using PyTorch and Hugging Face
pauhidalgoo/patufet
Patufet is a set of Catalan datasets created to train and fine-tune LLMs in that language.
pauhidalgoo/car-conduction
Use of machine learning to make a car complete a circuit (not great, but it was for kids)
pauhidalgoo/CatGPT-fork
Fork of rogerbaiges' CatGPT, a Catalan-specific language model with 111 million parameters, trained from scratch. Designed for educational purposes, it provides a simple tool to explore Catalan natural language processing.
pauhidalgoo/ChessNezha
pauhidalgoo/DijkstraMaze
pauhidalgoo/MatchNumbers
pauhidalgoo/pauhidalgoo
pauhidalgoo/pauhidalgoo.github.io
pauhidalgoo/SeleniumSnakePlayer
pauhidalgoo/TDR-MusicNN