PygmalionAI/aphrodite-engine

[Bug]: Fails to start with error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte

Nero10578 opened this issue · 2 comments

Your current environment

Collecting environment information...
PyTorch version: 2.3.0
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (conda-forge gcc 11.3.0-19) 11.3.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.35
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA TITAN X (Pascal)
GPU 1: NVIDIA TITAN X (Pascal)
GPU 2: NVIDIA TITAN X (Pascal)
GPU 3: NVIDIA TITAN X (Pascal)

Nvidia driver version: 555.85
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             12
On-line CPU(s) list:                0-11
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
CPU family:                         6
Model:                              85
Thread(s) per core:                 2
Core(s) per socket:                 6
Socket(s):                          1
Stepping:                           4
BogoMIPS:                           7391.99
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves md_clear flush_l1d arch_capabilities
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          192 KiB (6 instances)
L1i cache:                          192 KiB (6 instances)
L2 cache:                           6 MiB (6 instances)
L3 cache:                           8.3 MiB (1 instance)
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        KVM: Mitigation: VMX unsupported
Vulnerability L1tf:                 Mitigation; PTE Inversion
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT Host state unknown
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0
[pip3] triton==2.3.0
[conda] blas                      2.16                        mkl    conda-forge
[conda] libblas                   3.8.0                    16_mkl    conda-forge
[conda] libcblas                  3.8.0                    16_mkl    conda-forge
[conda] liblapack                 3.8.0                    16_mkl    conda-forge
[conda] liblapacke                3.8.0                    16_mkl    conda-forge
[conda] mkl                       2020.2                      256
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] pytorch                   2.3.0           py3.11_cuda12.1_cudnn8.9.2_0    pytorch
[conda] pytorch-cuda              12.1                 ha16c6d3_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchtriton               2.3.0                     py311    pytorch
ROCM Version: Could not collect
Aphrodite Version: 0.5.3
Aphrodite Build Flags:
CUDA Archs: Not Set; ROCm: Disabled

🐛 Describe the bug

This is running on a 4x Titan X Pascal machine with a Windows 10 host and Ubuntu 22.04 LTS inside WSL2. A similar setup works fine, but that one only has 2 GPUs; could using 4 GPUs be the cause of this issue?
When I try to run aphrodite, it fails with this error:

INFO:     Extracting config from GGUF...
WARNING:  gguf quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO:     Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. But it may cause slight accuracy drop without scaling factors. FP8_E5M2 (without scaling) is only supported on cuda version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for common inference criteria.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/awan/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 562, in <module>
    run_server(args)
  File "/home/awan/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 519, in run_server
    engine = AsyncAphrodite.from_engine_args(engine_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awan/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 349, in from_engine_args
    initialize_ray_cluster(engine_config.parallel_config)
  File "/home/awan/aphrodite-engine/aphrodite/engine/ray_tools.py", line 100, in initialize_ray_cluster
    ray.init(address=ray_address, ignore_reinit_error=True)
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/worker.py", line 1642, in init
    _global_node = ray._private.node.Node(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/node.py", line 336, in __init__
    self.start_ray_processes()
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/node.py", line 1396, in start_ray_processes
    resource_spec = self.get_resource_spec()
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/node.py", line 580, in get_resource_spec
    ).resolve(is_head=self.head, node_ip_address=self.node_ip_address)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/resource_spec.py", line 215, in resolve
    accelerator_manager.get_current_node_accelerator_type()
  File "/home/awan/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/ray/_private/accelerators/nvidia_gpu.py", line 71, in get_current_node_accelerator_type
    device_name = device_name.decode("utf-8")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
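
Since the traceback dies while decoding the device name that the NVIDIA driver returns, one quick check (a sketch, not part of the original report; assumes the pynvml package is installed) is to query the names directly and see what bytes come back for each GPU:

# Sketch: print the raw GPU names the driver reports, since Ray fails
# while decoding exactly this value.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        # Older pynvml releases return bytes, newer ones return str.
        if isinstance(name, bytes):
            print(i, repr(name), "->", name.decode("utf-8", errors="replace"))
        else:
            print(i, name)
finally:
    pynvml.nvmlShutdown()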

Seems like a ray issue. Can you run ray start --head? What does that output?
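
The same startup path can also be exercised directly from Python (a minimal sketch, not from the original thread; run it in the same aphrodite-runtime environment):

# Sketch: if this raises the same UnicodeDecodeError, the problem is in
# Ray's GPU detection rather than in aphrodite itself.
import ray

ray.init(ignore_reinit_error=True)
print(ray.cluster_resources())
ray.shutdown()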

I pulled my hair out trying to figure this out. It turns out my Gigabyte MW51-HP0 motherboard is the one causing issues. The bottom 4 PCIe slots come from a PLX chip, and when I try to access two GPUs attached to that PLX chip the commands get lost in the switch and it hangs.

I even tried installing Ubuntu 22.04 LTS natively, and there ray just hangs instead of showing that error. If I use only one GPU, aphrodite can start with any of the 4 GPUs. But as soon as I use two of the GPUs behind the PLX chip simultaneously, it craps out.

Time to get a different motherboard I guess...
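
For anyone hitting a similar hang: one rough way to test whether two specific GPUs can be driven together (a sketch, assuming PyTorch is installed; the device indices are only an example) is to pin the two suspect cards with CUDA_VISIBLE_DEVICES and run a small cross-device workload, e.g. CUDA_VISIBLE_DEVICES=2,3 python check_pair.py:

# check_pair.py -- sketch: exercise two GPUs at once, e.g. the two behind
# the PLX switch, and see whether the run completes or hangs.
import torch

assert torch.cuda.device_count() >= 2, "need two visible GPUs"
a = torch.randn(1024, 1024, device="cuda:0")
b = torch.randn(1024, 1024, device="cuda:1")
# A matmul on each device plus a cross-device copy; if the PCIe switch is
# dropping transactions, this is where a hang would show up.
(a @ a).sum().item()
(b @ b).sum().item()
c = a.to("cuda:1")
torch.cuda.synchronize()
print("both GPUs responded:", tuple(c.shape))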