LLMS-OSX


Introduction

This guide is tailored for users with Apple Silicon devices who want to experiment with Large Language Models (LLMs). It summarizes key repositories and resources, focusing on those optimized for Apple Silicon hardware, and covers Retrieval-Augmented Generation (RAG), training and fine-tuning, vision-language models (VLM), datasets, and user interfaces (UI). Each section links to repositories and tools to help you explore and run LLMs effectively on Apple Silicon.

Retrieval-Augmented Generation (RAG) Repositories:

ChatLLM.cpp

  • Repository: chatllm.cpp
  • Summary: Documentation and code for implementing Retrieval-Augmented Generation (RAG) with locally run LLMs, including guidelines and examples for integrating a retrieval step into the generation pipeline.

ModelScope RAG

  • Repository: ModelScope RAG
  • Summary: This Jupyter notebook is a tutorial on implementing RAG with reranking using the LlamaIndex library, showing how a reranking stage over the retrieved candidates improves retrieval-augmented generation; a minimal sketch of the same pattern follows below.
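
As a rough illustration of what the notebook walks through, a rerank step in LlamaIndex can be sketched in a few lines. This is not the notebook's code: it assumes a recent llama-index release with sentence-transformers installed, a local data/ directory, the BAAI/bge-reranker-base reranker, and whatever default LLM and embedding model you have configured (OpenAI by default, or a local model via Settings).

```python
# A compact RAG-with-rerank pipeline in LlamaIndex: retrieve a wide candidate set,
# then let a cross-encoder reranker pick the best passages before generation.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import SentenceTransformerRerank

documents = SimpleDirectoryReader("data").load_data()       # any local folder of files
index = VectorStoreIndex.from_documents(documents)          # embeds and indexes the chunks

reranker = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=3)
query_engine = index.as_query_engine(
    similarity_top_k=10,                # fetch 10 candidates by vector similarity...
    node_postprocessors=[reranker],     # ...then keep the 3 best after reranking
)

print(query_engine.query("What does the corpus say about Apple Silicon?"))
```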

Quad-AI RAG

  • Repository: Quad-AI RAG
  • Summary: This notebook demonstrates the implementation of RAG using llama.cpp for effective information retrieval and response generation. It includes step-by-step instructions and code snippets.
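
Independent of the notebook itself, the two llama.cpp ingredients a local RAG pipeline needs can be sketched with the llama-cpp-python bindings; the GGUF file names below are placeholders, not files shipped with the repository.

```python
# The two llama.cpp pieces a local RAG pipeline needs, via llama-cpp-python:
# a GGUF embedding model for retrieval and a GGUF instruct model for generation.
from llama_cpp import Llama

embedder = Llama(model_path="nomic-embed-text-v1.5.Q8_0.gguf", embedding=True, verbose=False)
vector = embedder.create_embedding("Apple Silicon unified memory")["data"][0]["embedding"]
print(len(vector))                      # dimensionality of the embedding

llm = Llama(model_path="llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)
out = llm(
    "Context:\n<retrieved passages go here>\n\nQuestion: How much RAM does the model need?\nAnswer:",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```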

MLX RAG GGUF

  • Repository: MLX RAG GGUF
  • Summary: This repository contains code and documentation for implementing RAG with GGUF embeddings using the MLX framework. It focuses on integrating various embedding techniques for enhanced retrieval.

FlagOpen Embedding

  • Repository: FlagOpen Embedding
  • Summary: FlagOpen's FlagEmbedding project (home of the BGE models) provides tools and methods for generating embeddings and integrating them into retrieval-augmented generation systems; a minimal usage sketch follows below.
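
A minimal sketch of how the repository's BGE models are typically used in a retriever, assuming pip install FlagEmbedding; the model name and query instruction are the commonly documented defaults rather than anything specific to this list.

```python
# Encode queries and passages with a BGE model from FlagEmbedding, the usual
# building block for the dense-retrieval half of a RAG system.
from FlagEmbedding import FlagModel

model = FlagModel(
    "BAAI/bge-base-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
)

queries = ["How do I run LLMs on Apple Silicon?"]
passages = ["MLX is an array framework for machine learning research on Apple Silicon."]

q_emb = model.encode_queries(queries)   # the instruction is prepended to each query
p_emb = model.encode(passages)          # passages are encoded as-is
print(q_emb @ p_emb.T)                  # inner-product similarity scores
```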

LM Cocktail

  • Repository: LM Cocktail
  • Summary: This project focuses on combining different embedding techniques and models to create a robust retrieval system for augmented generation tasks. It includes various configurations and examples.

OSX Apple Silicon:

OSX Apple Silicon Gist

  • Repository: OSX Apple Silicon Gist
  • Summary: A gist providing detailed steps and configurations for running MLX models on Apple Silicon devices, including setup instructions and performance tips.
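
Before running any model, it is worth confirming that MLX itself is installed and targeting the Metal GPU. The snippet below is a generic sanity check (assuming pip install mlx), not code taken from the gist.

```python
# Quick sanity check that MLX is installed and using the Metal GPU.
import mlx.core as mx

print(mx.default_device())              # expect Device(gpu, 0) on Apple Silicon
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b                               # computation is lazy...
mx.eval(c)                              # ...so force it to run on the GPU
print(c.shape, c.dtype)
```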

Hugging Face MLX Models

  • Repository: Hugging Face MLX Models
  • Summary: A collection of models optimized for MLX, available on Hugging Face. This repository provides a variety of models suitable for different ML tasks, optimized for performance and efficiency.
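
Most mlx-community checkpoints can be run directly with the mlx-lm package. The sketch below assumes pip install mlx-lm and uses one 4-bit conversion as an example model id; any other MLX-format repository id should work the same way.

```python
# Download and run an MLX-format model from the mlx-community organization.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Explain unified memory on Apple Silicon in one sentence.",
    max_tokens=100,
    verbose=True,            # stream tokens and print generation stats
)
```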

MLX Examples

  • Repository: MLX Examples
  • Summary: This repository contains example implementations and use cases for MLX models, including step-by-step guides and code snippets for various ML tasks.

Phi-3 Vision MLX

  • Repository: Phi-3 Vision MLX
  • Summary: A project showcasing the integration of vision models with the MLX framework, providing examples and implementation details for vision-related tasks.

LILM

  • Repository: LILM
  • Summary: This repository focuses on implementing and optimizing language models for low-resource settings, utilizing the MLX framework for enhanced performance on Apple Silicon devices.

ML Stable Diffusion

  • Repository: ML Stable Diffusion
  • Summary: A project dedicated to implementing and optimizing stable diffusion models for Apple Silicon, addressing performance issues and providing solutions for efficient execution.
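
For comparison with the repository's Core ML conversion path, the sketch below shows the other common way to run Stable Diffusion on Apple Silicon: Hugging Face diffusers on PyTorch's mps backend. It is not this repository's code, and the model id is just one example (assumes pip install torch diffusers transformers accelerate).

```python
# Not this repository's Core ML pipeline -- a diffusers-on-MPS sketch for comparison.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe = pipe.to("mps")                   # run on the Apple GPU via PyTorch's Metal backend
pipe.enable_attention_slicing()         # lowers peak memory on 8-16 GB machines

image = pipe("an astronaut riding a horse on Mars", num_inference_steps=30).images[0]
image.save("astronaut.png")
```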

MLX LoRA Fine-tuning

  • Repository: MLX LoRA Fine-tuning
  • Summary: Documentation and code for fine-tuning models with LoRA (low-rank adaptation) inside the MLX framework, with step-by-step instructions and examples.

MLX Discussions

  • Repository: MLX Discussions
  • Summary: A forum for discussing various aspects of the MLX framework, including usage tips, troubleshooting, and community contributions.

NanoGPT MLX

  • Repository: NanoGPT MLX
  • Summary: A minimal implementation of GPT models optimized for the MLX framework, providing code and documentation for quick setup and execution.

LM Evaluation Harness

  • Repository: LM Evaluation Harness
  • Summary: Tools and benchmarks for evaluating language models, including standardized tests and performance metrics for various tasks.
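
The harness can be driven from its lm_eval command line or from Python. Below is a hedged sketch of the Python entry point; the model, task, limit, and device values are illustrative, and mps support depends on the underlying PyTorch build.

```python
# A small, illustrative run of lm-evaluation-harness from Python; swap in your
# own model and tasks. The CLI (`lm_eval ...`) takes the same arguments.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                          # Hugging Face transformers backend
    model_args="pretrained=gpt2",
    tasks=["hellaswag"],
    num_fewshot=0,
    limit=50,                            # only score 50 examples while experimenting
    device="mps",                        # Apple Silicon GPU through PyTorch
)
print(results["results"]["hellaswag"])
```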

PikaGPT

  • Repository: PikaGPT
  • Summary: A lightweight implementation of GPT models optimized for efficiency and performance on Apple Silicon devices, providing examples and setup instructions.

Local LLM Training Apple Silicon

  • Repository: Local LLM Training Apple Silicon
  • Summary: A project focusing on training local LLMs on Apple Silicon devices, providing code and documentation for setup and execution.

MLX Notes

  • Repository: MLX Notes
  • Summary: A collection of notes and guides for using the MLX framework, covering various aspects from setup to advanced usage.

LoRA Gist

  • Repository: LoRA Gist
  • Summary: A gist with detailed instructions for fine-tuning models using LoRA, including code snippets and setup tips.

Llama3 Mac Silicon Example

  • Repository: Llama3 Mac Silicon Example
  • Summary: An example notebook demonstrating the use of Llama3 models on Mac Silicon, including setup, execution, and performance tips.

Deep Dive into AI with MLX

  • Repository: Deep Dive into AI with MLX
  • Summary: A comprehensive guide and tutorial for diving deep into AI using the MLX framework, covering various models and use cases.

Vision-Language Models (VLM):

MLX Llava Finetuning

  • Repository: MLX Llava Finetuning
  • Summary: A repository focused on fine-tuning vision-language models using the MLX framework, providing examples and detailed instructions.

Bunny

  • Repository: Bunny
  • Summary: A project dedicated to developing and optimizing vision-language models, providing code, documentation, and examples for various tasks.

LoRA/Training/Fine-Tuning:

InternLM Ecosystem

  • Repository: InternLM Ecosystem
  • Summary: Documentation and code for integrating and using InternLM models within various ecosystems, focusing on training and fine-tuning.

InternLM Fine-tuning

  • Repository: InternLM Fine-tuning
  • Summary: A repository providing detailed instructions and code for fine-tuning InternLM models, including setup and execution guidelines.

ModelScope Swift

  • Repository: ModelScope Swift
  • Summary: ModelScope's SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) toolkit for training, fine-tuning, and deploying LLMs and multimodal models, with examples and documentation for a range of tasks.

LitGPT

  • Repository: LitGPT
  • Summary: A repository focused on implementing and optimizing GPT models using the Lightning framework, providing examples and detailed instructions for training and fine-tuning.

Datasets:

The Cauldron

  • Repository: The Cauldron
  • Summary: A dataset repository on Hugging Face providing a collection of datasets for training and evaluating machine learning models, including detailed descriptions and usage guidelines.
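
Individual Cauldron subsets can be pulled straight from the Hub with the datasets library; the sketch below assumes pip install datasets and uses the ai2d subset as one example config (the dataset card lists all available configs).

```python
# Pull one subset of The Cauldron from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/the_cauldron", "ai2d", split="train")
print(ds)                # row count and column names
print(ds[0].keys())      # typically image data plus question/answer style text fields
```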

GUI:

LibreChat

  • Repository: LibreChat
  • Summary: LibreChat is an open-source project for building and managing chat interfaces for language models, providing customizable features and integration options for various platforms.

Docker etc

```sh
brew install llama.cpp
llama-cli --hf-repo reach-vb/Meta-Llama-3.1-8B-Instruct-Q6_K-GGUF \
  --hf-file meta-llama-3.1-8b-instruct-q6_k.gguf \
  -p "Sup?" --ctx-size 8192
```

The model uses roughly 7.0 GB of RAM.

Unsorted