Issues
numpy llama2 for fun and learning
#450 opened - 0
How to train a chat model
#447 opened - 2
Prefill Processing
#445 opened - 4
Llama transformer walkthrough
#442 opened - 0
How to run inference on GPU
#437 opened - 1
runomp on Mac M1 Max is slower than runfast
#432 opened - 0
tok512.model adds a token at the start.
#430 opened - 0
About runq.c
#428 opened - 5
Is this project still active?
#425 opened - 3
Question: Sliding window attention
#424 opened - 0
Questions about the matmul function in run.c
#423 opened - 0
I found that the dim parameter affects the learning loss and n_layers affects the training speed.
#422 opened - 1
How to convert to huggingface model format?
#421 opened - 0
The trained model will not be saved.
#419 opened - 2
Q: How to finetune?
#418 opened - 0
Is it possible to increase or decrease the size of only some of the layers of the model structure?
#406 opened - 1
How to save checkpoints at each step?
#405 opened - 0
-
#404 opened - 4
How does this part of the Train code work?
#401 opened - 2
Mojo version?
#396 opened - 3
What is a good pretrain dataset for llama2c?
#393 opened - 3
Evolution of tinystories. Open sourced.
#392 opened - 0
[Feature Request] Support InternLM Deploy
#390 opened - 0
Pure JavaScript port of llama2.c
#384 opened - 1
llama2_7b_chat gives no response
#382 opened - 3
Optimized code for matmul() runs 3.5× faster (on Mac M1 Max with ARM NEON) ... and even more...
#377 opened - 4
Interpretability of models
#375 opened - 8
HF candle
#371 opened - 0
Chat functionality requires big 7B model
#357 opened - 2
Code Llama rope_theta parameter
#356 opened - 1
260K Model Parameter count not right?
#354 opened - 1