/tachiwin_notebooks

Tachiwin notebooks for pretraining, finetuning and inference of Tachiwin models

Primary LanguageJupyter Notebook

Tachiwin: Indigenous Language Model Toolkit

Overview

Tachiwin is an open-source project developing Large Language Models (LLMs) for indigenous languages of Mexico, focusing on Tutunakú linguistic resources.

Notebooks

  • Llama 3.1 8B Instruct pretraining on indigenous language corpora
  • Domain-specific fine-tuning for translation and linguistic tasks
  • Model deployment and inference pipeline

Google Colab Notebooks

Requirements

  • Python 3.10+
  • PyTorch
  • Transformers
  • Unsloth
  • Llama 3.1 8B Instruct weights

Huggingface Collection of Datasets and Models

Android Application "Tachiwin" Snapshot

Fully functional app to demonstrate the translation capabilities offline or online

Features

  • Multilingual support (Tutunakú/Spanish/English)
  • Low-resource language model development
  • Open-source linguistic technology

License

Apache 2.0

Contributors

  • Luis J Camargo
  • Fidencio Hernández Hernández