[Bug]: Fails to start with error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
Nero10578 opened this issue · 2 comments
Your current environment
Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (conda-forge gcc 11.3.0-19) 11.3.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.35
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA TITAN X (Pascal)
GPU 1: NVIDIA TITAN X (Pascal)
GPU 2: NVIDIA TITAN X (Pascal)
GPU 3: NVIDIA TITAN X (Pascal)
Nvidia driver version: 555.85
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 4
BogoMIPS: 7391.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 6 MiB (6 instances)
L3 cache: 8.3 MiB (1 instance)
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0
[pip3] triton==2.3.0
[conda] blas 2.16 mkl conda-forge
[conda] libblas 3.8.0 16_mkl conda-forge
[conda] libcblas 3.8.0 16_mkl conda-forge
[conda] liblapack 3.8.0 16_mkl conda-forge
[conda] liblapacke 3.8.0 16_mkl conda-forge
[conda] mkl 2020.2 256
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pytorch 2.3.0 py3.11_cuda12.1_cudnn8.9.2_0 pytorch
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchtriton 2.3.0 py311 pytorch
ROCM Version: Could not collect
Aphrodite Version: 0.5.3
Aphrodite Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
🐛 Describe the bug
This is running on a 4x Titan X Pascal machine with Windows 10 as the host and Ubuntu 22.04 LTS inside WSL2. A similar setup with only 2 GPUs works fine; could running 4 GPUs be the cause of this issue?
When I try to run aphrodite it fails at this error:
INFO: Extracting config from GGUF...
WARNING: gguf quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO: Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. But it may cause slight accuracy drop without scaling factors. FP8_E5M2 (without scaling) is only supported on cuda version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for common inference criteria.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/awan/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 562, in <module>
run_server(args)
File "/home/awan/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 519, in run_server
engine = AsyncAphrodite.from_engine_args(engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awan/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 349, in from_engine_args
initialize_ray_cluster(engine_config.parallel_config)
File "/home/awan/aphrodite-engine/aphrodite/engine/ray_tools.py", line 100, in initialize_ray_cluster
ray.init(address=ray_address, ignore_reinit_error=True)
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/worker.py", line 1642, in init
_global_node = ray._private.node.Node(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/node.py", line 336, in __init__
self.start_ray_processes()
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/node.py", line 1396, in start_ray_processes
resource_spec = self.get_resource_spec()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/node.py", line 580, in get_resource_spec
).resolve(is_head=self.head, node_ip_address=self.node_ip_address)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/resource_spec.py", line 215, in resolve
accelerator_manager.get_current_node_accelerator_type()
File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/accelerators/nvidia_gpu.py", line 71, in get_current_node_accelerator_type
device_name = device_name.decode("utf-8")
^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
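The crash comes from Ray's strict UTF-8 decode of the GPU device name returned by NVML, which here apparently contains the non-UTF-8 byte 0xf8. A minimal sketch of the failure and a tolerant decode (the `raw` bytes below are a hypothetical stand-in for what NVML returned; patching Ray this way would only mask the underlying hardware problem):

```python
# Hypothetical garbage device name, standing in for what NVML
# returned for a GPU behind the flaky PLX switch.
raw = b"\xf8"

# Strict decode, as in ray/_private/accelerators/nvidia_gpu.py,
# raises UnicodeDecodeError because 0xf8 is an invalid start byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A tolerant decode substitutes U+FFFD instead of crashing.
name = raw.decode("utf-8", errors="replace")
print(strict_failed, name)
```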
Seems like a ray issue. Can you run ray start --head? What does that output?
I pulled my hair out trying to figure this out. It turns out my Gigabyte MW51-HP0 motherboard is the one causing issues. The bottom 4 PCIe slots come from a PLX chip, and when I try to access two GPUs attached to that PLX chip, the commands get lost in the chip and it hangs.
I even tried installing straight Ubuntu 22.04 LTS, and instead of showing that error, ray just hangs. If I use only one GPU, aphrodite can start with any of the 4 GPUs. But as soon as I simultaneously use two of the GPUs on the PLX chip, it craps out.
Time to get a different motherboard, I guess...