BlackSamorez/tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
Python · MIT license
Pinned issues
Issues
Compatibility with `transformers > 4.36`: error: `AttributeError: 'tuple' object has no attribute 'to_legacy_cache'`
#137 opened by Dr-Left - 0
Customized generate func support?
#136 opened by MonolithFoundation - 1
RuntimeError: NCCL Error 3: internal error
#121 opened by smallmocha - 26
No output when using tensor_parallel
#128 opened by yyya9 - 0
How to use the model in a scenario where it is stored in the Safetensors format?
#127 opened by yxk9810 - 1
Out of GPU memory for two A10 GPUs
#126 opened by JunyiYe - 0
AttributeError: object has no attribute 'devices'
#125 opened by QiueY514 - 0
ValueError: Model parameters were moved to incorrect devices, did call on model.cuda() or model.to(device)? If so, please avoid doing that
#124 opened by Khyat - 2
Max Recursion Error when using with lora
#122 opened by Ar-Kareem - 1
Can I parallelize just one large layer?
#83 opened by chinmayjog13 - 0
Segmentation fault (core dumped)
#120 opened by jameswu2014 - 1
Support of 8-bit and 4-bit quantization
#119 opened by ludwigflo - 0
2x slowdown using TP
#117 opened by jph00 - 0
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#116 opened by SparkJiao - 5
Why is a CUDA error raised?
#95 opened by YooSungHyun - 2
tensor_parallel method distributed=True
#114 opened by Johnno1011 - 3
model.generate() with inputs_embeds
#112 opened by ZhaoxuanWu - 0
Error loading LLAMA model config
#107 opened by tonywang16 - 6
Issues if GPU > 2
#98 opened by Tom-Ryder - 1
GPT-2 broken starting in v1.2.5
#99 opened by eric-mitchell - 1
Could tensor_parallel add multi-accelerator inference support with torch.distributed?
#97 opened by hijeffwu - 2
Possibility to run on different GPUs
#94 opened by Ch4mpa9ne - 6
Support for PEFT LoRA and 4-bit quantization
#80 opened by morecry - 1
Question on custom models
#88 opened by vince62s - 6
Does not work with 4-bit quant
#79 opened by laoda513 - 1
Does tensor_parallel support concurrent or multi-threaded model inference?
#86 opened by zoubaihan - 0
Does tensor_parallel support data parallel and tensor parallel hybrid training?
#85 opened by liguodongiot - 5
Torch version requirement
#76 opened by treya-lin - 0
Huggingface Accelerate
#74 opened by conceptofmind - 1
How to load lora weights?
#67 opened by Vincent131499 - 7