Usage instructions for 4090?
Trying nv-isa-solver --arch sm_89. Is sharing a disasm_cache.txt okay? I tried deleting it, but then it didn't find anything.
Hello, bootstrapping a disasm_cache.txt is a bit tricky and might be a little broken. We use it for both caching disassembly and discovering instructions.
I bootstrapped the one for Hopper by running the populate_cache script on a 128-core machine, which took 2 hours.
You can use nv-isa-solver-scan to ingest SASS disassembly files.
Ahh, so for nv-isa-solver-scan I need example SASS files?
I don't see an easy way to run populate_cache (there's no alias):
tiny@tiny19:~/build/nv_isa_solver$ python3 nv_isa_solver/populate_cache.py
Traceback (most recent call last):
  File "/home/tiny/build/nv_isa_solver/nv_isa_solver/populate_cache.py", line 1, in <module>
    from .disasm_utils import Disassembler, set_bit_range
ImportError: attempted relative import with no known parent package
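For anyone hitting the same traceback: it is the usual symptom of running a file that lives inside a package as a plain script. Here is a minimal sketch that reproduces it in a throwaway directory. The package name demo_pkg and its files are made up for the demonstration and only stand in for nv_isa_solver.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_relative_import():
    """Run the same module as a script and with -m; return both results."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical package, standing in for nv_isa_solver.
        pkg = Path(root) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "utils.py").write_text("VALUE = 'ok'\n")
        (pkg / "tool.py").write_text("from .utils import VALUE\nprint(VALUE)\n")

        # Direct execution: the file has no parent package, so the
        # relative import raises ImportError.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "tool.py")],
            capture_output=True, text=True, cwd=root,
        )
        # Module execution from the directory that contains the package:
        # the parent package is known, so the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.tool"],
            capture_output=True, text=True, cwd=root,
        )
    return as_script, as_module
```

For this repo the equivalent would presumably be python3 -m nv_isa_solver.populate_cache, run from ~/build/nv_isa_solver, though I have not checked that the module is meant to be started that way.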
Fixed up the imports, now running: python3 populate_cache.py --arch sm_89 --cache_file 4090_cache.txt
Wait. Can you replace the main loop with this?
inst = []
for i in range(pow(2, 12)):
    array = bytearray(b"\0" * 16)
    set_bit_range(array, 0, 12, i)
    inst.append(array)
    for j in range(13, 8 * 13):
        array_ = bytearray(array)
        flip_bit(array_, j)
        inst.append(array_)
    array = bytearray(b"\0" * 16)
It works better, IIRC. Some instructions don't like the read/write barriers being set.
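For readers following along, here is a self-contained sketch of what that loop enumerates. The set_bit_range and flip_bit helpers below are stand-ins written for this example, assuming little-endian bit numbering and a half-open [start, end) range; the real ones in nv_isa_solver may behave differently. Nesting the bit-flip loop inside the outer loop is also my reading of the snippet, and prefix_bits, flip_start and flip_stop are hypothetical parameter names.

```python
def set_bit_range(buf, start, end, value):
    """Hypothetical stand-in: write `value` into bits [start, end) of buf."""
    for offset, bit in enumerate(range(start, end)):
        byte, shift = divmod(bit, 8)
        if (value >> offset) & 1:
            buf[byte] |= 1 << shift
        else:
            buf[byte] &= ~(1 << shift) & 0xFF


def flip_bit(buf, bit):
    """Hypothetical stand-in: toggle a single bit of buf."""
    buf[bit // 8] ^= 1 << (bit % 8)


def enumerate_candidates(prefix_bits=12, flip_start=13, flip_stop=8 * 13):
    """Build 16-byte candidate encodings: every value of the low
    `prefix_bits` bits, plus one variant per flipped bit in
    [flip_start, flip_stop)."""
    inst = []
    for i in range(2 ** prefix_bits):
        base = bytearray(16)
        set_bit_range(base, 0, prefix_bits, i)
        inst.append(base)
        for j in range(flip_start, flip_stop):
            variant = bytearray(base)  # copy so the base stays untouched
            flip_bit(variant, j)
            inst.append(variant)
    return inst


if __name__ == "__main__":
    small = enumerate_candidates(prefix_bits=4)
    print(len(small), "candidates for a 4-bit prefix")
```

With the defaults this produces 4096 × (1 + 91) = 376,832 candidate encodings, since range(13, 8 * 13) covers 91 bit positions. Every variant is all zeros apart from the prefix and one flipped bit, which fits the remark that some instructions don't like the barrier bits being set.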
It finished as is:
tiny@tiny19:~/build/nv_isa_solver/nv_isa_solver$ python3 populate_cache.py --arch sm_89 --cache_file 4090_cache.txt
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5888/5888 [08:01<00:00, 12.23it/s]
tiny@tiny19:~/build/nv_isa_solver/nv_isa_solver$ ls -l 4090_cache.txt
-rw-rw-r-- 1 tiny tiny 14319616 Jul 8 20:40 4090_cache.txt
Pushed a fix
Kk, retrying
tiny@tiny19:~/build/nv_isa_solver$ nv-isa-solver-populate-cache --arch sm_89 --cache_file 4090_cache_2.txt
1%|█ | 40/5888 [00:03<08:45, 11.13it/s]
You might have issues with the operand interaction analysis, which needs cubin file creation. We currently hard-code 90 for SM90a.
OK, I added an --arch_code option to instruction_solver.py. You probably need to use --arch_code 89, but I'm not 100% sure.
tiny@tiny19:~/build/nv_isa_solver$ ls -l 4090_cache_2.txt
-rw-rw-r-- 1 tiny tiny 14319616 Jul 8 21:11 4090_cache_2.txt
tiny@tiny19:~/build/nv_isa_solver$ nv-isa-solver --arch sm_89 --cache_file 4090_cache_2.txt
No new instruction found, exiting
tiny@tiny19:~/build/nv_isa_solver$ nv-isa-solver --arch sm_89 --cache_file 4090_cache_2.txt --arch_code 89
No new instruction found, exiting
You need to use --arch SM89, not --arch sm_89.
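Since the two spellings are easy to mix up (cuobjdump takes sm_89, the solver takes SM89), here is a small sketch of a helper that converts between them. normalize_arch is hypothetical and not part of nv-isa-solver. The SM90a to 90 mapping comes from the comment above; taking the numeric part as --arch_code for other architectures is an assumption.

```python
import re

# Accepts "sm_89", "sm89", "SM89", "sm_90a", "SM90a", ...
_ARCH_RE = re.compile(r"^sm_?(\d+)([a-z]?)$", re.IGNORECASE)


def normalize_arch(name):
    """Return (solver-style arch name, numeric arch code), e.g. ("SM89", 89).

    Hypothetical helper: the arch code is assumed to be the numeric
    part of the name.
    """
    match = _ARCH_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised architecture name: {name!r}")
    digits, suffix = match.groups()
    return f"SM{digits}{suffix.lower()}", int(digits)
```

For example, normalize_arch("sm_89") gives ("SM89", 89), which matches the --arch SM89 --arch_code 89 combination.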
FYI, I'm analysing SM89 on my 128-core machine. Will get back to you in a few hours.
Thanks for your patience. This repo is still very experimental.
Human Readable ISA Spec For 4090
To reproduce:
cuobjdump --dump-sass --gpu-architecture sm_89 libcublasLt.so.12.5.3.2 > libcublasLt.sass
nv-isa-solver-scan --arch SM89 --cache_file 4090_cache.txt libcublasLt.sass
nv-isa-solver-populate-cache --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver-mutate --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver --arch SM89 --arch_code 89 --cache_file 4090_cache.txt --num_parallel 5
nv-isa-solver-mutate --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver --arch SM89 --arch_code 89 --cache_file 4090_cache.txt --num_parallel 5
I will integrate nv-isa-solver-mutate into the main solver itself tomorrow so that you don't have to run it multiple times like this.
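Until that lands, the repeated mutate/solve steps can be scripted. This is a rough sketch that only assembles the commands listed above; build_pipeline and run_pipeline are hypothetical names, rounds=2 just mirrors the two passes shown, and the cuobjdump step still has to be run first to produce the .sass file.

```python
import shlex
import subprocess


def build_pipeline(arch, arch_code, cache_file, sass_file,
                   rounds=2, num_parallel=5):
    """Return the nv-isa-solver commands from the reproduce steps as argv lists."""
    common = ["--arch", arch, "--cache_file", cache_file]
    cmds = [
        ["nv-isa-solver-scan", *common, sass_file],
        ["nv-isa-solver-populate-cache", *common],
    ]
    for _ in range(rounds):
        cmds.append(["nv-isa-solver-mutate", *common])
        cmds.append([
            "nv-isa-solver", "--arch", arch, "--arch_code", str(arch_code),
            "--cache_file", cache_file, "--num_parallel", str(num_parallel),
        ])
    return cmds


def run_pipeline(cmds, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_pipeline(build_pipeline("SM89", 89, "4090_cache.txt", "libcublasLt.sass"))
```

The dry run is the default so the command list can be checked before committing to a long run.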