Pinned Repositories
ann-benchmarks
Benchmarks of approximate nearest neighbor libraries in Python
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
flash-attention
Fast and memory-efficient exact attention
ha-mo-ref
TensorFlow and PyTorch Reference models for Gaudi(R)
Habana-LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
intel-extension-for-pytorch
A Python package that extends official PyTorch to improve performance on Intel platforms
Intel_Gaudi3_Software
Intel® Gaudi® Software is an implementation of the runtime and graph compiler for Gaudi3
kaggle-2014-criteo
Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
billishyahao's Repositories
billishyahao/ann-benchmarks
Benchmarks of approximate nearest neighbor libraries in Python
billishyahao/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
billishyahao/flash-attention
Fast and memory-efficient exact attention
billishyahao/ha-mo-ref
TensorFlow and PyTorch Reference models for Gaudi(R)
billishyahao/Habana-LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
billishyahao/intel-extension-for-pytorch
A Python package that extends official PyTorch to improve performance on Intel platforms
billishyahao/Intel_Gaudi3_Software
Intel® Gaudi® Software is an implementation of the runtime and graph compiler for Gaudi3
billishyahao/kaggle-2014-criteo
billishyahao/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
billishyahao/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
billishyahao/keras-mmoe
A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
billishyahao/llm.c
LLM training in simple, raw C/CUDA
billishyahao/long-context-attention
Sequence Parallel Attention for Long Context LLM Model Training and Inference
billishyahao/mmclassification
OpenMMLab Image Classification Toolbox and Benchmark
billishyahao/models
Model Zoo for Intel® Architecture: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors
billishyahao/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
billishyahao/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
billishyahao/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.
billishyahao/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
billishyahao/rocm-Megatron-LM
Ongoing research training transformer models at scale
billishyahao/tensorflow
An Open Source Machine Learning Framework for Everyone
billishyahao/tvm.schedule
Examples for the TVM schedule API
billishyahao/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
billishyahao/yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
billishyahao/YOLOv6
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
billishyahao/yolov8
YOLOv8 🚀 in PyTorch > ONNX > CoreML > TFLite