New tuning results

Question

CNugteren opened this issue 10 years ago · 142 comments

(See the README for details)

This is the place to post new tuning results. If you compiled with -DTUNERS=ON, ran one of the tuners on your device (or all perhaps?), and feel that these results should be included in the next release of CLBlast, please post them here.

You can do this by attaching the JSON files to this issue (archived in a .ZIP file).
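For anyone who prefers to script the packaging step, here is a minimal sketch in Python. It assumes the tuner output files sit in one directory and follow the `clblast_*.json` naming seen elsewhere in this thread; the function name `archive_tuning_results` is made up for illustration and is not part of CLBlast.

```python
import zipfile
from pathlib import Path


def archive_tuning_results(results_dir, archive_path):
    """Pack every clblast_*.json file in results_dir into a ZIP archive.

    Returns the sorted list of file names that were added.
    """
    results_dir = Path(results_dir)
    # Assumption: tuner output files are named clblast_<kernel>_<precision>.json.
    json_files = sorted(results_dir.glob("clblast_*.json"))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for json_file in json_files:
            # Store only the file name so the archive has a flat layout.
            archive.write(json_file, arcname=json_file.name)
    return [f.name for f in json_files]
```

Calling `archive_tuning_results(".", "my_device.zip")` from the directory where the tuners ran should produce a single archive that can be attached to this issue.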

Answer 1 · 2016-04-08T00:17:46.000Z

Here are some tuning results from an NVIDIA Titan Black, AMD Radeon HD 7970 and an ARM Mali T-628.

Just to let you know about JSON files, GitHub says "Unfortunately, we don’t support that file type. Choose Files Try again with a PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, PDF, or ZIP."
Archive.zip

Answer 2 · 2016-04-12T03:10:33.000Z

Thanks for the tuning results! However, they seem to have been run with non-default settings (using specific values for alpha and beta). Could you perhaps run them again with the default settings?

By the way, the latest version already includes results for Tahiti (the HD 7970) and the ARM Mali T-628, so perhaps those are superfluous.

(I've updated the post regarding JSON-files and GitHub)

Answer 3 · 2016-04-30T14:56:58.000Z

Here are the results for AMD's Pitcairn (R9 270X). I'll also upload the results for Hawaii (R9 290X), but I am getting an error during Xgemm. I'll open another issue for that.
pitcairn.zip

Answer 4 · 2016-05-01T17:33:12.000Z

Thanks! The results for Pitcairn are added to the development branch.

Answer 5 · 2016-05-01T19:01:14.000Z

Hawaii (AMD R9 290X):
hawaii.zip

Answer 6 · 2016-05-01T19:32:59.000Z

And i7 4790k:
i7-4790k.zip

Answer 7 · 2016-05-02T18:12:51.000Z

The results for Hawaii will be added. As for the i7 results: the zip archive seems to include only a Makefile?

Answer 8 · 2016-05-02T20:13:19.000Z

Sorry, I messed up that zip. As I do not have those files any more, I'll send them when I manage to do that tuning.

Answer 9 · 2016-05-31T18:34:37.000Z

nvidia-grid-k520-aws-g2.zip

See details in #61

Answer 10 · 2016-06-01T07:43:39.000Z

@fonghou Thanks! The tuning results are added to the database. They are currently in the development branch but will be automatically included in the next release.

Answer 11 · 2016-06-18T12:01:14.000Z

Here are the results for the Intel i5-4210U iGPU:
Device name: 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (OpenCL 1.2 beignet 1.2 (git-1b076ec))
i5-4210U_GPU.zip

Answer 12 · 2016-06-19T13:04:13.000Z

@OursDesCavernes Added, thanks!

Answer 13 · 2016-07-01T16:24:40.000Z

GTX 670, GTX 750 (non-Ti), and GTX 1070 tunings attached. One of the GEMV tunings took ages (or hung) on the latter two, but curiously enough not on the (older) first card. Luckily, it looks like GEMV is the last one to be tuned so these are fairly complete anyway.

gtx670.tar.gz
gtx1070.tar.gz
gtx750.tar.gz

Answer 14 · 2016-07-03T18:31:42.000Z

@gcp Thanks for running all the tuners on those devices! The results are added to CLBlast, currently in the development branch but they will be automatically included in the next release. Indeed, I saw long compilation times for GEMV kernels on NVIDIA as well - it is the last one to be tuned for exactly this reason. NVIDIA promises to reduce compilation times significantly with CUDA 8.0, so hopefully that also fixes these kernels.

Answer 15 · 2016-07-05T11:19:55.000Z

Intel HD530 (desktop Skylake iGPU)
IntelHD530.zip

Answer 16 · 2016-07-10T09:50:14.000Z

@gcp Thanks, they are added.

Answer 17 · 2016-07-26T18:11:17.000Z

Issue #83 caused a complete re-write of the third GEMV kernel (XgemvFastRot), so I had to throw away the corresponding tuning results. If it's not too much effort, I welcome updated clblast_xgemv_fast_rot_*.json tuning results based on the development branch. The other GEMV tuning results are still valid and included in CLBlast. Thanks!

Answer 18 · 2016-08-23T20:18:20.000Z

Intel(R) HD Graphics 5500 BroadWell U-Processor GT2:
hd5500.zip
Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile:
hd4400.zip

Answer 19 · 2016-09-03T16:30:11.000Z

@OursDesCavernes Thanks, HD5500 is added and HD4400 is updated.

Answer 20 · 2016-10-11T18:06:53.000Z

Intel(R) HD Graphics 4000
intel-hd4000.zip

Answer 21 · 2016-10-13T10:20:55.000Z

@yingted Thanks! The tuning results for the IvyBridge GPU are added.

Answer 22 · 2016-10-22T00:19:49.000Z

Radeon R9 380 (Tonga) tuning results:
Tobago_TuningResults.zip

Answer 23 · 2016-10-22T00:27:14.000Z

Of course, the device is called Tonga, just a spelling mistake of the zip-file name.

Answer 24 · 2016-10-22T14:42:41.000Z

@MigMuc The results for Tonga are added, thanks!

Answer 25 · 2016-10-24T08:51:58.000Z

Here are the results for the GTX Titan Black. Unfortunately, I had the same problem as @gcp on the last run. But again, should be fairly complete.

gtx-titan-black.tar.gz

Answer 26 · 2016-10-24T17:56:14.000Z

@matze Thanks a lot for your contribution. The tuning results are added.

Answer 27 · 2017-01-03T00:55:15.000Z

Hi,
since I'm having problems with attaching files, here are the links for:

Amd Radeon HD6770m (Turks) https://www.dropbox.com/s/wabso93trny8fae/amd%20hd6770m%20%28turks%29.zip?dl=0

Intel Core i7-2670qm
https://www.dropbox.com/s/3as860nlbshmdvo/i7-2670qm.zip?dl=0

from my laptop.
In a few days I will be able to test an MSI Nvidia GTX 970.

Answer 28 · 2017-01-03T19:32:16.000Z

Thanks for the tuning data! The results are added to CLBlast, currently in the development branch but they will be automatically included in the next release.

Answer 29 · 2017-01-18T20:27:18.000Z

Tuning results for Nvidia GTX 1080
nvidia_gtx_1080.zip

Answer 30 · 2017-01-18T22:06:20.000Z

Results for i7-4790k:
i7-4790k.zip

Answer 31 · 2017-01-19T18:47:02.000Z

Thanks a lot! Both the GTX 1080 and i7 results are added.

Answer 32 · 2017-02-16T21:27:58.000Z

Tuning results for AMD RX480 (with amdgpu driver and amdgpu-pro opencl stack)
amd-rx480.zip

Answer 33 · 2017-02-18T13:01:04.000Z

@OursDesCavernes Added, thanks! Nice to see FP16 support from AMD's side as well.

Answer 34 · 2017-03-01T19:36:28.000Z

My two cents: tuning results of an AMD Radeon HD 6750M (unfortunately no support for 16 or 64 bits)

AMD Radeon HD 6750M.zip

Answer 35 · 2017-03-04T14:25:54.000Z

The HD 6750M results are added, thanks!

Answer 36 · 2017-05-11T23:22:56.000Z

See the attachment for some tuning results on AMD Radeon FuryX (using driver 1800.8).
amd-radeon-furyx.zip

Answer 37 · 2017-05-12T05:57:13.000Z

@bramveenboer The Fiji results are added, thanks!

Answer 38 · 2017-06-15T15:35:56.000Z

Hi!
Here are the results for an intel i7-920 on linux using Intel's OpenCL Driver dev-util/intel-ocl-sdk-4.4.0.117-r1

Thanks for your work!

i7-920.zip

Answer 39 · 2017-06-18T18:56:25.000Z

Thanks, the Core i7-920 tuning data is added to CLBlast!

Answer 40 · 2017-07-03T21:10:54.000Z

I have got results for the Radeon R9 390. There were already Hawaii results there, but the direct gemm kernel was missing. Overall, this improved things for me.

Hawaii.zip

(Note: is it possible to get a tuning for GemmBatched? I think it makes a difference whether a single matrix or a set of matrices is computed. Also, I have the feeling that gemm is tuned too aggressively towards m=n=k=1024: performance drops by 30-40% at m=n=k=2048 and stays there for larger matrices.)

Answer 41 · 2017-07-04T01:48:10.000Z

@Ulfgard I tuned those Hawaii results on an R9 290X. In my case, a 30-40% performance drop for larger matrices would be impossible, since I get (if memory serves me well) around 3.7 TFLOPS on 8192x8192, with the theoretical limit being 5.4 TFLOPS. If such a performance drop had happened, it would mean that Cedric programmed gemm to run at the maximum theoretical performance regardless of memory access, which seems impossible to me.

OTOH, maybe the drop in performance is simply because these cards are identified as the same (hawaii), but have some internal hardware difference that influences the optimal settings?

Answer 42 · 2017-07-04T05:45:20.000Z

Hi,
It is impossible to mix up the tunings, because you have to remove the old tuning to be able to add the new one in the database script. Otherwise it will fail. While I agree that the kernel itself gives okay performance according to the tuner, for some reason, the whole gemm call seems to die after exceeding some matrix size. I did some benchmarking of the whole procedure to see real world performance.

The numbers reported are the wall-clock times between enqueuing several trials of the gemm routine and clFinish (disregarding the first trial for possible kernel setup, of course). Thus they are a lower bound on performance. The numbers are roughly in line with the timings reported by clGetEventProfilingInfo for the event supplied to gemm, but this does not necessarily make sense because I do not know which kernel this actually measures.

(columns are row/column major for A/B, and column C indicates whether C is row/column major. m=n=k=size. Numbers are GFlops)

size  C  A/B:  r/r      c/r      r/c      c/c
256   r        781.738  794.147  781.738  781.738
256   c        806.956  820.184  820.184  806.956
512   r        1005     1005     1005     440.789
512   c        1168.6   1116.67  1142.05  1142.05
1024  r        2080     2080     2080     712.329
1024  c        2363.64  2363.64  2363.64  2363.64  // this number fits quite closely with what the tuner reports
2048  r        1523.81  1523.81  1523.81  1523.81  // something here dies
2048  c        1523.81  1523.81  1488.37  1523.81
4096  r        1523.81  1542.17  1542.17  1580.25
4096  c        1542.17  1542.17  1560.98  1560.98

Beforehand, i.e. with your tuning, the larger matrices were another 50% worse. So even if the gemm kernel is okay, maybe one of the other kernels is at fault here.

For completeness: the same results with the timings returned by clGetEventProfilingInfo for the event passed to the gemm routine (modulo possible errors, because I quickly hacked this together):

size  C  A/B:  r/r      c/r      r/c      c/c
256   r        608.416  606.492  607.536  607.19
256   c        622.785  621.137  620.953  619.315
512   r        24963.3  25338    24816.4  25607.1  // an indication that this measures the wrong thing?
512   c        813.057  811.845  813.214  812.822
1024  r        62673.7  63890.4  63529.4  63143.3  // an indication that this measures the wrong thing?
1024  c        2559.24  2560.99  2557.5   2557.88
2048  r        1487.71  1492.89  1526.47  1530.32
2048  c        1488.77  1490.06  1524.83  1537.17
4096  r        1518.29  1522.16  1557.49  1567.47
4096  c        1524.63  1528.65  1563.42  1567.73
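The wall-clock measurement described in this comment (enqueue several trials, wait for completion, and leave out the first trial) can be sketched roughly as follows. This is not the code that produced the numbers above: `enqueue` and `finish` are hypothetical callables standing in for a non-blocking gemm call and for clFinish, and the function name is invented for illustration.

```python
import time


def average_wallclock_seconds(enqueue, finish, trials):
    """Average wall-clock time per call over `trials` enqueued calls.

    `enqueue` submits one GEMM call without waiting for it; `finish`
    blocks until all submitted work is done (the role clFinish plays).
    Both are hypothetical callables supplied by the caller.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    # Warm-up call: absorbs one-time costs such as kernel setup.
    enqueue()
    finish()
    start = time.perf_counter()
    for _ in range(trials):
        enqueue()
    finish()
    return (time.perf_counter() - start) / trials
```

Because the timer spans everything between the first enqueue and the final wait, host-side overhead is included, which is why such figures are a lower bound on kernel performance.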

Answer 43 · 2017-07-04T07:57:17.000Z

I was also talking about wall clock time in my (Clojure on the JVM) program, not ClTune results. 8192x8192 sgemm runs in 293 milliseconds on R9 290X (5.4 TFLOPS max).

GTX 1080 (8.2 TFLOPS) runs in 220 ms, which makes the numbers pretty consistent in my case.
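To relate these run times to the TFLOPS figures, a quick conversion helps. The sketch below assumes the usual convention of counting 2·m·n·k floating-point operations per GEMM; the thread does not say which convention was used, so treat that as an assumption.

```python
def gemm_gflops(m, n, k, seconds):
    """GFLOPS for one GEMM, counting 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


# Timings quoted above for an 8192x8192 SGEMM.
r9_290x = gemm_gflops(8192, 8192, 8192, 0.293)
gtx_1080 = gemm_gflops(8192, 8192, 8192, 0.220)
print(f"R9 290X:  {r9_290x:.0f} GFLOPS")
print(f"GTX 1080: {gtx_1080:.0f} GFLOPS")
```

Under that assumption, 293 ms works out to roughly 3.75 TFLOPS and 220 ms to roughly 5.0 TFLOPS, which matches the "around 3.7 TFLOPS" quoted earlier for the R9 290X.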

Answer 44 · 2017-07-05T07:03:31.000Z

@blueberry @Ulfgard I've opened issue #169 to have a more detailed discussion on the future of the tuner in CLBlast.

I'll add your tuning data soon to the database, thanks.

Answer 45 · 2017-09-07T00:13:39.000Z

Here is my ubuntu16.04 with intel cpu driver:
i7-6770hq.zip
Tuned for 1.0.1 release.
Impressive tool! Let me know if I included the wrong files.

Answer 46 · 2017-09-16T19:45:58.000Z

Thanks @theoden8. It took a bit longer than normal since I was in the middle of some database changes, but the results are now added!

Answer 47 · 2017-10-12T21:59:20.000Z

Here are the tuning results for an i5-4570 and a GTX580
GTX580.zip
i5-4570.zip

Answer 48 · 2017-10-20T16:22:25.000Z

Thanks @fzimmermann89, they are both added.

Answer 49 · 2018-04-30T01:35:36.000Z

Some more results. Note that beignet (which I used) is 10-20% slower than Intel NEO.

Intel(R) HD Graphics 6000 BroadWell U-Processor GT3.zip

Answer 50 · 2018-06-23T04:33:11.000Z

Thank you for your great work! Here are some tuning results for NVidia GeForce GTX 1070 Ti.

GeForce_GTX_1070_Ti.zip

Answer 51 · 2018-06-28T04:48:21.000Z

Here are some tuning results using POCL (1.2-pre/master) on an Intel i5-4590S. The other tuners segfaulted (#293).
i5_4590S_POCL.zip

Answer 52 · 2018-07-13T19:29:35.000Z

A little late, but I've added the HD Graphics 6000, GTX 1070 Ti, and i5-4590S results. Thanks all!

Answer 53 · 2018-08-06T13:15:21.000Z

Here are some tuning results from Intel Xeon E5-2630 v3 and v4, as well as Nvidia Tesla P100 PCI-E 16 GB.
CLBlast_tuners.zip

Answer 54 · 2018-10-10T07:04:30.000Z

Tuning results from Hikey 970 with a Mali-G72 GPU
Do not use these results: when I run with them, any Gemm call with a size greater than 8 causes an error in the library.
Mali-G72.zip

Answer 55 · 2019-01-08T00:34:18.000Z

I tuned CLBlast on an FT-2000plus CPU (2.3 GHz, 64 cores), which is an ARMv8-based many-core CPU.
tuned-FT-2000Plus-CPU.tar.gz

Answer 56 · 2019-02-09T15:40:59.000Z

Sorry I had overlooked this issue for a while. I've just added tuning results for:

Intel Xeon E5-2630 v3
Intel Xeon E5-2630 v4
NVIDIA Tesla P100

I've not added the results for the ARMv8 machine, since it shows the CPU as device '0x662' from vendor '0x70' in PoCL, perhaps that is not so meaningful. If anyone else is interested they can always take the results from here.

Thanks all for sharing!

Answer 57 · 2019-02-11T09:21:02.000Z

I ran tuning using CLBlast 1.5.0 on a NVIDIA Titan RTX (using driver 415.125): titanrtx-415.125.tar.gz

Answer 58 · 2020-10-07T15:25:21.000Z

Results for AMD Radeon RX Vega
Radeon RX Vega.zip

Answer 59 · 2020-10-10T11:04:12.000Z

Thanks for sharing the tuning results! I've just added both the RX Vega and also the Titan RTX (sorry I forgot about it) to CLBlast.

Answer 60 · 2021-08-19T04:32:18.000Z

i9-9980HK.zip
T2000.zip
T4.zip
a100.zip
v100.zip

Answer 61 · 2021-08-19T12:50:25.000Z

QuadroGV100.zip

Answer 62 · 2022-01-08T02:25:54.000Z

AMD RX 6800 XT (Navi21): amd_rx_6800_xt.tar.gz

Answer 63 · 2022-04-13T07:40:51.000Z

My latest results, for the RX6500XT (on Windows 11 with the 22.3 driver; performance on Linux may be a bit better) and for the Qualcomm Adreno 540 on an SD835 phone.

I got several compilation error messages on the Adreno under Android. The return value -6 means out of host memory, so I would look into the memory management for clues.

RX6500Adreno540.tar.gz
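For readers decoding similar failures, the -6 mentioned here is the OpenCL status CL_OUT_OF_HOST_MEMORY. The lookup below is a small sketch covering only a handful of codes from the OpenCL headers; the helper name is made up for illustration, and the table is far from exhaustive.

```python
# A few status codes from the OpenCL headers (cl.h); not an exhaustive list.
OPENCL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
}


def describe_opencl_status(code):
    """Return the symbolic name for an OpenCL status code, if known."""
    return OPENCL_STATUS_NAMES.get(code, f"unknown OpenCL status {code}")


print(describe_opencl_status(-6))
```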

Answer 64 · 2022-11-02T16:48:09.000Z

Intel(R) FPGA Emulation Device.
Intel_FPGA_Emulation_Device.zip

Answer 65 · 2022-11-02T17:18:21.000Z

Some MacBook-Pros are equipped with an AMD Radeon Pro 450 Compute Engine
AMD_Radeon_Pro_450_Compute_Engine.zip

Answer 66 · 2023-02-15T14:58:22.000Z

Attached are tuning results from two devices I don't think have been submitted yet (please correct me if mistaken):

NVIDIA GeForce RTX 2080 Ti
NVIDIA GeForce RTX 3090

tuning-results.tar.gz

Answer 67 · 2023-05-15T06:42:54.000Z

AMD RX 5700XT tuning results:
5700XT_tuning.tar.gz

Answer 68 · 2023-05-20T10:03:07.000Z

Intel(R) UHD Graphics 770 tuning results:
Intel(R) UHD Graphics 770.zip

Answer 69 · 2023-05-20T22:32:32.000Z

AMD Radeon RX 6600 XT tuning results:
AMD Radeon RX 6600 XT.zip

Answer 70 · 2023-05-23T06:26:13.000Z

AMD Radeon RX 6700 XT tuning results:

AMD.Radeon.RX6700.XT.tar.gz

Answer 71 · 2023-05-23T14:44:59.000Z

Intel UHD 620 tuning results (the CPU is a i7-8565U) on linux using the intel opencl package.

reesults_intel_uhd620.tar.gz

Answer 72 · 2023-05-23T23:56:36.000Z

AMD Radeon 680M on linux with rocm opencl driver. (The CPU is a Ryzen 7 Pro 6850U)
results_radeon_680M.tar.gz

Answer 73 · 2023-05-25T14:39:44.000Z

@CNugteren
The ROCm thread has a reply from an AMD employee. Could you please go and answer?

Answer 74 · 2023-05-25T14:57:10.000Z

The ROCm thread has a reply from an AMD employee. Could you please go and answer?

I think you are referring to ROCm/ROCm#2161, right? I think the AMD person is just pointing you to the existence of ROCm BLAS as an alternative to CLBlast. Since ROCm didn't exist at CLBlast creation time, I do not have a clear view of its strengths/weaknesses. So I think it is up to you (or other people) to react to that thread, not me.

In any case, let's keep this thread for tuning results.

I'll add the recently contributed results soon, thanks everyone 👍

Answer 75 · 2023-05-29T17:05:53.000Z

AMD Radeon RX580 2048SP.zip
Tried my best to use the latest AMD driver, but the driver will fail on 4/4 cases while running clblast_tuner_xgemm.exe -precision 6464 on Windows. The rest of them are fine.
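The `-precision` codes that appear in these reports map to data types as follows: 16 is half, 32 is single, 64 is double, 3232 is complex single and 6464 is complex double. The sketch below only builds the argument list for one tuner run; the helper name `tuner_command` is hypothetical, and the executable name would need an `.exe` suffix on Windows.

```python
# Meaning of the -precision codes used by the CLBlast tuners.
PRECISIONS = {
    16: "half (FP16)",
    32: "single (FP32)",
    64: "double (FP64)",
    3232: "complex single",
    6464: "complex double",
}


def tuner_command(tuner, precision):
    """Build the argument list for one tuner run at one precision."""
    if precision not in PRECISIONS:
        raise ValueError(f"unsupported precision code: {precision}")
    return [tuner, "-precision", str(precision)]


print(tuner_command("clblast_tuner_xgemm", 6464))
```

The resulting list can be handed to `subprocess.run` to launch the tuner, which makes it easy to skip a precision that is known to freeze a particular driver.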

Answer 76 · 2023-05-30T00:02:22.000Z

Intel(R) Iris(R) Xe Graphics.zip

Answer 77 · 2023-05-30T00:06:31.000Z

Apple M1 16GB.zip

Answer 78 · 2023-05-30T00:46:11.000Z

NVIDIA 920A.zip

Answer 79 · 2023-05-30T01:06:23.000Z

AMD Ryzen 5700G APU.zip
clblast_tuner_xgemm.exe -precision 3232 could not produce all results due to a driver freeze.

Answer 80 · 2023-05-30T01:14:13.000Z

By the way, I have reported the problem I encountered to the AMD community, where Dipak has always been very helpful. https://community.amd.com/t5/opencl/driver-freezing-and-produce-wrong-results-while-using-clblast/m-p/609755#M40354

Answer 81 · 2023-05-30T05:52:48.000Z

AMD RX5700.zip
RX5700 (not RX5700xt) has no driver issue.

Answer 82 · 2023-05-30T12:28:41.000Z

NVIDIA RTX3080.zip
RTX 3080 Laptop.zip

Answer 83 · 2023-05-30T14:25:10.000Z

RTX4090.zip
Interestingly, it does not support FP16. I paid someone the price of a cup of coffee for access to the 4090 machine.

Answer 84 · 2023-05-31T06:13:14.000Z

3060 LAPTOP.zip
6800XT.zip

Answer 85 · 2023-05-31T10:57:58.000Z

AMD 4600G APU.zip

Answer 86 · 2023-05-31T13:15:19.000Z

AMD 6900xt.zip
Another cup of coffee.

Answer 87 · 2023-05-31T13:37:01.000Z

Windows users, please join this effort to make CLBlast better. If you have a single GPU, all you need to do is download the following bin archive, which is based on CLBlast version 1.6, and double-click the "all.bat" file. It will run all the cases, which may take an hour or two depending on your PC. During this process, please do not play games or do anything else that relies heavily on the GPU. When all cases have finished, compress all the .json files in the folder into a zip file, name it after your GPU, and upload it here. I firmly believe that CLBlast will benefit humanity in terms of scientific research, medical care and even creating new jobs.
bin (2).zip

Answer 88 · 2023-05-31T18:22:24.000Z

Thanks for all your efforts! I've added the results from @CaptainSifff and @tangjinchuan in #483.

Note that CLBlast doesn't necessarily require tuning on each device: it computes sensible defaults based on other tuning data for similar devices. E.g. if tuned for an AMD Radeon 5700 and a 5900 XT, it will probably get 99% of the performance on a 5800 XT as well.
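To make the idea of defaults derived from similar devices concrete, here is a minimal sketch of one way such a fallback could be chosen. It is an illustration only, not CLBlast's actual database logic: the function name, the input layout and the scoring rule (sum of each device's best time divided by the configuration's time) are all assumptions.

```python
def pick_default(results_per_device):
    """Pick one parameter configuration to reuse on an untuned device.

    `results_per_device` maps device name -> {configuration: time in ms}.
    Only configurations measured on every device are considered; the one
    with the highest summed relative performance (best time / time) wins.
    """
    devices = list(results_per_device.values())
    if not devices:
        raise ValueError("no tuning results given")
    common = set(devices[0])
    for timings in devices[1:]:
        common &= set(timings)
    if not common:
        raise ValueError("no configuration was measured on every device")

    def score(config):
        # 1.0 means "as fast as this device's best configuration".
        return sum(min(t.values()) / t[config] for t in devices)

    # sorted() makes tie-breaking deterministic.
    return max(sorted(common), key=score)
```

With made-up timings such as `{"device A": {"WGS=64": 1.0, "WGS=128": 1.2, "WGS=256": 2.0}, "device B": {"WGS=64": 1.5, "WGS=128": 1.1, "WGS=256": 1.0}}`, the sketch picks `WGS=128`: it is not the fastest on either device, but it is close to the best on both.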

Answer 89 · 2023-05-31T18:46:18.000Z

I've added the results from @CaptainSifff and @tangjinchuan in #483.

Did you leave out these Radeon 6700XT results on purpose? #1 (comment)

Note that CLBlast doesn't necessarily require tuning on each device: it computes sensible defaults based on other tuning data for similar devices. E.g. if tuned for an AMD Radeon 5700 and a 5900 XT, it will probably get 99% of the performance on a 5800 XT as well.

I know there is already a 6600XT and 6800XT, but my 6700XT got 15-20% higher gemm performance after tuning.

Answer 90 · 2023-05-31T19:45:52.000Z

My apologies, I missed yours. So many new tuning results submitted these last weeks. I'll add it soon 👍

Answer 91 · 2023-06-03T06:54:50.000Z

radeon vii.zip
Radeon VII, although I am not sure if these .jsons are all you need...

Answer 92 · 2023-06-03T08:05:21.000Z

After I shared this information with my students in Artificial Intelligence at the College of Computer Science and Technology, Guizhou University, some of them sent me results for new devices. They are listed below by device and student name/ID:
NVIDIA GeForce GTX 1650Ti_赵梓衡_2000210037.zip
NVIDIA GeForce RTX 2070 with Max-Q Design_郑怡宁_2000170391.zip
NVIDIA GTX 1650-郭子润-1987000213.zip
GeForce RTX 2070 Super_朱道远_2000170359.zip
NVIDIA GeForce RTX 3060 Laptop GPU_李梅燕_2000170390.zip
NVIDIA GeForce RTX 2060_吴晨阳_2000170358.zip

Answer 93 · 2023-06-03T14:30:03.000Z

RTX2060_黄前顶_2000170354.zip
nvidia1650Ti_沈军豪_2000170356.zip

Answer 94 · 2023-06-03T19:50:39.000Z

Sorry for the spam, but I couldn't help myself.

What an absolutely incredible effort and help you have been providing, @tangjinchuan ! I applaud your unprecedented output and I thank you for your amazing, positive, and altruistic contributions!

Oh, and while at it, thank you @CNugteren for making this program in the first place. 😄

Glory to both of you! 🥇 🎆

Answer 95 · 2023-06-04T07:21:12.000Z

Dear @mikkovedru ,
Thank you very much for your kind words. My students and I are very happy to contribute to the open-source community and to make this world a better place. I would like to thank @CNugteren, you, and many others for making this happen.
By the way, for anyone interested in big models, there is a project called llama.cpp which (as of 19 May last month) uses CLBlast to speed up prompt processing and found performance comparable to cuBLAS in some test cases. It is a C++ based project, and now, thanks to CLBlast, we can also have very good token performance on non-CUDA GPUs.
NVIDIA_GeForce_RTX_2080_with_Max-Q_Design_李傲_1910020001.zip
NVIDIA_GeForce_MX150_简发顺_1917000242.zip

Answer 96 · 2023-06-04T20:28:56.000Z

Imagination Technologies GPUs - PowerVR B-Series BXE-4-32
These results are not 100% complete, as clblast_tuner_xgemm -precision 32, clblast_tuner_xgemm -precision 3232 and clblast_tuner_routine_xtrsv -precision 16 only partially ran before core dumping.
tuned.zip

Answer 97 · 2023-06-05T11:58:21.000Z

NVIDIA RTX4080.zip
RTX2070S_黄俊杰_1915000523.zip
GeForceRTX2060_唐杰嵘_2000170367.zip

Answer 98 · 2023-06-06T02:39:00.000Z

AMD Firepro W8100

I hope this will eventually make Llama.cpp faster 🚀 Note that 6464 xgemm froze midway and did not complete fully.

fireprow8100.zip

Answer 99 · 2023-06-06T05:58:47.000Z

AMD RX Vega 10 iGPU
I know there is already a tuning available for an RX Vega, but this is the integrated version, and while the tuning numbers aren't wildly different, every little bit helps. I did skip a few of the tunings because they took a very long time, sometimes up to ten minutes per iteration, but I think I got most of it.
vega10.zip

Answer 100 · 2023-06-06T11:59:54.000Z

AMD radeon（TM） Graphics_周洪江_2000170360.zip
gfx902