Running speed on a 3090 is much slower than on a 2060
LiUzHiAn opened this issue · 19 comments
Hi,
I installed tsnecuda on two machines: the first has an RTX 2060 GPU with CUDA 10.0 and Python 3.6, the second an RTX 3090 GPU with CUDA 11.0 and Python 3.6. I tested the performance via tsnecuda.test(), and the result was far from what I expected. BTW, I built tsnecuda from source on the dev branch, just as I posted in #95, and I also built faiss-1.6.5 manually.
The 3090 is actually much slower (~20x) than the 2060. Here is the output:
On my 2060 GPU:
>>> tsnecuda.test()
Initializing cuda handles... done.
KNN Computation... done.
Computing Pij matrix... done.
Initializing low dim points... done.
Initializing CUDA memory... done.
[Step 0] Avg. Gradient Norm: 5.63507e-05
[Step 10] Avg. Gradient Norm: 2.63374e-06
[Step 20] Avg. Gradient Norm: 1.35403e-07
[Step 30] Avg. Gradient Norm: 6.75821e-09
[Step 40] Avg. Gradient Norm: 3.32646e-10
[Step 50] Avg. Gradient Norm: 1.76006e-11
[Step 60] Avg. Gradient Norm: 3.43814e-11
[Step 70] Avg. Gradient Norm: 9.26503e-12
[Step 80] Avg. Gradient Norm: 1.28685e-11
[Step 90] Avg. Gradient Norm: 2.21792e-10
[Step 100] Avg. Gradient Norm: 8.70341e-12
[Step 110] Avg. Gradient Norm: 2.06272e-10
[Step 120] Avg. Gradient Norm: 2.0891e-11
[Step 130] Avg. Gradient Norm: 1.84418e-11
[Step 140] Avg. Gradient Norm: 3.2101e-11
[Step 150] Avg. Gradient Norm: 2.1261e-11
[Step 160] Avg. Gradient Norm: 2.1119e-11
[Step 170] Avg. Gradient Norm: 2.31917e-11
[Step 180] Avg. Gradient Norm: 3.68962e-11
[Step 190] Avg. Gradient Norm: 1.20611e-11
[Step 200] Avg. Gradient Norm: 1.93969e-11
[Step 210] Avg. Gradient Norm: 2.00171e-10
[Step 220] Avg. Gradient Norm: 4.73396e-11
[Step 230] Avg. Gradient Norm: 7.42355e-12
[Step 240] Avg. Gradient Norm: 5.25284e-11
[Step 250] Avg. Gradient Norm: 1.23045e-10
[Step 260] Avg. Gradient Norm: 7.37647e-12
[Step 270] Avg. Gradient Norm: 2.08948e-11
[Step 280] Avg. Gradient Norm: 2.85766e-11
[Step 290] Avg. Gradient Norm: 2.77974e-10
[Step 300] Avg. Gradient Norm: 5.04055e-09
[Step 310] Avg. Gradient Norm: 1.57394e-07
[Step 320] Avg. Gradient Norm: 7.9907e-06
[Step 330] Avg. Gradient Norm: 0.000629361
[Step 340] Avg. Gradient Norm: 0.0377374
[Step 350] Avg. Gradient Norm: 0.0864466
[Step 360] Avg. Gradient Norm: 0.0287763
[Step 370] Avg. Gradient Norm: 0.0137643
[Step 380] Avg. Gradient Norm: 0.0101956
[Step 390] Avg. Gradient Norm: 0.0087624
[Step 400] Avg. Gradient Norm: 0.00821406
[Step 410] Avg. Gradient Norm: 0.00798948
[Step 420] Avg. Gradient Norm: 0.00782703
[Step 430] Avg. Gradient Norm: 0.00759733
[Step 440] Avg. Gradient Norm: 0.00723665
[Step 450] Avg. Gradient Norm: 0.00683833
[Step 460] Avg. Gradient Norm: 0.00651148
[Step 470] Avg. Gradient Norm: 0.00610025
[Step 480] Avg. Gradient Norm: 0.00571612
[Step 490] Avg. Gradient Norm: 0.00551033
[Step 500] Avg. Gradient Norm: 0.00537704
[Step 510] Avg. Gradient Norm: 0.00524043
[Step 520] Avg. Gradient Norm: 0.00504568
[Step 530] Avg. Gradient Norm: 0.00492819
[Step 540] Avg. Gradient Norm: 0.00478753
[Step 550] Avg. Gradient Norm: 0.00470573
[Step 560] Avg. Gradient Norm: 0.00459653
[Step 570] Avg. Gradient Norm: 0.00436502
[Step 580] Avg. Gradient Norm: 0.00405163
[Step 590] Avg. Gradient Norm: 0.00387847
[Step 600] Avg. Gradient Norm: 0.00363504
[Step 610] Avg. Gradient Norm: 0.00345075
[Step 620] Avg. Gradient Norm: 0.00329351
[Step 630] Avg. Gradient Norm: 0.00311661
[Step 640] Avg. Gradient Norm: 0.00300835
[Step 650] Avg. Gradient Norm: 0.00292016
[Step 660] Avg. Gradient Norm: 0.00294263
[Step 670] Avg. Gradient Norm: 0.00279009
[Step 680] Avg. Gradient Norm: 0.00259829
[Step 690] Avg. Gradient Norm: 0.00243013
[Step 700] Avg. Gradient Norm: 0.00230396
[Step 710] Avg. Gradient Norm: 0.00233775
[Step 720] Avg. Gradient Norm: 0.00243892
[Step 730] Avg. Gradient Norm: 0.00235893
[Step 740] Avg. Gradient Norm: 0.00226121
[Step 750] Avg. Gradient Norm: 0.00221478
[Step 760] Avg. Gradient Norm: 0.00214333
[Step 770] Avg. Gradient Norm: 0.00206614
[Step 780] Avg. Gradient Norm: 0.00189938
[Step 790] Avg. Gradient Norm: 0.00182071
[Step 800] Avg. Gradient Norm: 0.00183494
[Step 810] Avg. Gradient Norm: 0.00193397
[Step 820] Avg. Gradient Norm: 0.00196122
[Step 830] Avg. Gradient Norm: 0.00184061
[Step 840] Avg. Gradient Norm: 0.00170407
[Step 850] Avg. Gradient Norm: 0.00157969
[Step 860] Avg. Gradient Norm: 0.00138117
[Step 870] Avg. Gradient Norm: 0.00128773
[Step 880] Avg. Gradient Norm: 0.00123935
[Step 890] Avg. Gradient Norm: 0.00125743
[Step 900] Avg. Gradient Norm: 0.00112275
[Step 910] Avg. Gradient Norm: 0.00101219
[Step 920] Avg. Gradient Norm: 0.00107188
[Step 930] Avg. Gradient Norm: 0.00108749
[Step 940] Avg. Gradient Norm: 0.0011048
[Step 950] Avg. Gradient Norm: 0.00110982
[Step 960] Avg. Gradient Norm: 0.0010239
[Step 970] Avg. Gradient Norm: 0.00101843
[Step 980] Avg. Gradient Norm: 0.00103544
[Step 990] Avg. Gradient Norm: 0.00103231
_time_initialization: 0.0004s
_time_knn: 0.114729s
_time_symmetry: 0.035447s
_time_init_low_dim: 0.000495s
_time_init_fft: 0.001357s
_time_compute_charges: 0.002109s
_time_precompute_2d: 0.139967s
_time_nbodyfft: 0.172617s
_time_norm: 0.027826s
_time_attr: 0.07049s
_time_apply_forces: 0.064584s
_time_other: 0.005371s
total_time: 0.635392s
>>>
while on my 3090 GPU:
>>> tsnecuda.test()
Initializing cuda handles... done.
KNN Computation... done.
Computing Pij matrix... done.
Initializing low dim points... done.
Initializing CUDA memory... done.
[Step 0] Avg. Gradient Norm: 0.00316841
[Step 10] Avg. Gradient Norm: 0.00020235
[Step 20] Avg. Gradient Norm: 1.17639e-05
[Step 30] Avg. Gradient Norm: 6.08587e-07
[Step 40] Avg. Gradient Norm: 3.33705e-08
[Step 50] Avg. Gradient Norm: 2.13425e-09
[Step 60] Avg. Gradient Norm: 1.352e-09
[Step 70] Avg. Gradient Norm: 7.24622e-09
[Step 80] Avg. Gradient Norm: 3.61282e-09
[Step 90] Avg. Gradient Norm: 8.77663e-10
[Step 100] Avg. Gradient Norm: 8.04638e-10
[Step 110] Avg. Gradient Norm: 8.40385e-10
[Step 120] Avg. Gradient Norm: 1.27527e-09
[Step 130] Avg. Gradient Norm: 4.91896e-09
[Step 140] Avg. Gradient Norm: 8.725e-10
[Step 150] Avg. Gradient Norm: 1.09649e-08
[Step 160] Avg. Gradient Norm: 1.72673e-08
[Step 170] Avg. Gradient Norm: 4.03674e-09
[Step 180] Avg. Gradient Norm: 1.41993e-09
[Step 190] Avg. Gradient Norm: 3.52255e-09
[Step 200] Avg. Gradient Norm: 8.46993e-09
[Step 210] Avg. Gradient Norm: 6.05243e-09
[Step 220] Avg. Gradient Norm: 5.35294e-09
[Step 230] Avg. Gradient Norm: 4.24595e-09
[Step 240] Avg. Gradient Norm: 2.89514e-09
[Step 250] Avg. Gradient Norm: 9.4303e-09
[Step 260] Avg. Gradient Norm: 4.8251e-09
[Step 270] Avg. Gradient Norm: 3.36828e-09
[Step 280] Avg. Gradient Norm: 1.96735e-08
[Step 290] Avg. Gradient Norm: 2.80626e-07
[Step 300] Avg. Gradient Norm: 6.88397e-06
[Step 310] Avg. Gradient Norm: 0.000276592
[Step 320] Avg. Gradient Norm: 0.0173417
[Step 330] Avg. Gradient Norm: 1.10052
[Step 340] Avg. Gradient Norm: 8.72256
[Step 350] Avg. Gradient Norm: 3.15913
[Step 360] Avg. Gradient Norm: 1.69793
[Step 370] Avg. Gradient Norm: 3.2298
[Step 380] Avg. Gradient Norm: 9.03506
[Step 390] Avg. Gradient Norm: 4.04239
[Step 400] Avg. Gradient Norm: 2.67969
[Step 410] Avg. Gradient Norm: 2.43034
[Step 420] Avg. Gradient Norm: 2.35848
[Step 430] Avg. Gradient Norm: 2.36371
[Step 440] Avg. Gradient Norm: 2.40016
[Step 450] Avg. Gradient Norm: 2.45785
[Step 460] Avg. Gradient Norm: 2.52751
[Step 470] Avg. Gradient Norm: 2.61023
[Step 480] Avg. Gradient Norm: 2.71442
[Step 490] Avg. Gradient Norm: 2.78857
[Step 500] Avg. Gradient Norm: 2.84582
[Step 510] Avg. Gradient Norm: 2.88569
[Step 520] Avg. Gradient Norm: 2.94037
[Step 530] Avg. Gradient Norm: 3.00796
[Step 540] Avg. Gradient Norm: 3.0736
[Step 550] Avg. Gradient Norm: 3.10501
[Step 560] Avg. Gradient Norm: 3.13196
[Step 570] Avg. Gradient Norm: 3.13702
[Step 580] Avg. Gradient Norm: 3.16519
[Step 590] Avg. Gradient Norm: 3.19187
[Step 600] Avg. Gradient Norm: 3.21807
[Step 610] Avg. Gradient Norm: 3.24802
[Step 620] Avg. Gradient Norm: 3.26183
[Step 630] Avg. Gradient Norm: 3.26732
[Step 640] Avg. Gradient Norm: 3.26586
[Step 650] Avg. Gradient Norm: 3.2619
[Step 660] Avg. Gradient Norm: 3.25324
[Step 670] Avg. Gradient Norm: 3.25705
[Step 680] Avg. Gradient Norm: 3.27121
[Step 690] Avg. Gradient Norm: 3.28956
[Step 700] Avg. Gradient Norm: 3.2974
[Step 710] Avg. Gradient Norm: 3.30126
[Step 720] Avg. Gradient Norm: 3.31402
[Step 730] Avg. Gradient Norm: 3.3251
[Step 740] Avg. Gradient Norm: 3.32682
[Step 750] Avg. Gradient Norm: 3.33112
[Step 760] Avg. Gradient Norm: 3.34129
[Step 770] Avg. Gradient Norm: 3.35972
[Step 780] Avg. Gradient Norm: 3.37386
[Step 790] Avg. Gradient Norm: 3.38687
[Step 800] Avg. Gradient Norm: 3.39692
[Step 810] Avg. Gradient Norm: 3.3988
[Step 820] Avg. Gradient Norm: 3.40635
[Step 830] Avg. Gradient Norm: 3.41817
[Step 840] Avg. Gradient Norm: 3.43308
[Step 850] Avg. Gradient Norm: 3.44227
[Step 860] Avg. Gradient Norm: 3.44587
[Step 870] Avg. Gradient Norm: 3.45133
[Step 880] Avg. Gradient Norm: 3.4623
[Step 890] Avg. Gradient Norm: 3.47085
[Step 900] Avg. Gradient Norm: 3.48198
[Step 910] Avg. Gradient Norm: 3.48978
[Step 920] Avg. Gradient Norm: 3.49834
[Step 930] Avg. Gradient Norm: 3.51401
[Step 940] Avg. Gradient Norm: 3.52248
[Step 950] Avg. Gradient Norm: 3.52983
[Step 960] Avg. Gradient Norm: 3.53699
[Step 970] Avg. Gradient Norm: 3.52919
[Step 980] Avg. Gradient Norm: 3.51682
[Step 990] Avg. Gradient Norm: 3.49306
_time_initialization: 0.002144s
_time_knn: 0.411477s
_time_symmetry: 0.619355s
_time_init_low_dim: 0.000649s
_time_init_fft: 0.00359s
_time_compute_charges: 0.002199s
_time_precompute_2d: 1.98696s
_time_nbodyfft: 5.07265s
_time_norm: 0.565026s
_time_attr: 1.47092s
_time_apply_forces: 1.42147s
_time_other: 0.904033s
total_time: 12.4605s
I also found many ptxas warnings when building faiss:
[ 69%] Building CUDA object faiss/CMakeFiles/faiss.dir/gpu/impl/IVFUtilsSelect1.cu.o
ptxas /tmp/tmpxft_00007eae_00000000-6_IVFPQ.ptx, line 4118; warning : ld
ptxas /tmp/tmpxft_00007eae_00000000-6_IVFPQ.ptx, line 4278; warning : ld
ptxas /tmp/tmpxft_00007eae_00000000-6_IVFPQ.ptx, line 4370; warning : ld
[... dozens of similar ptxas warnings elided ...]
ptxas /tmp/tmpxft_00007eae_00000000-6_IVFPQ.ptx, line 11515; warning : ld
The reason is probably the kernel-level optimizations here: https://github.com/CannyLab/tsne-cuda/blob/master/src/include/options.h, or a suboptimal configuration upstream in FAISS. Every GPU architecture is different, so these settings need to be tuned per architecture to get the best possible performance.
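For illustration, per-architecture tuning of this kind often boils down to a lookup from compute capability to launch parameters. The sketch below is hypothetical: the names and block sizes are invented for illustration and are not the actual constants in options.h.

```python
# Hypothetical sketch of per-architecture kernel launch tuning.
# The table entries are invented; tsne-cuda's real constants live in
# src/include/options.h and differ from these.

# Map CUDA compute capability (major, minor) -> launch parameters.
LAUNCH_CONFIGS = {
    (7, 5): {"block_size": 1024, "items_per_thread": 4},  # Turing (e.g. RTX 2060)
    (8, 6): {"block_size": 512,  "items_per_thread": 8},  # Ampere (e.g. RTX 3090)
}

# Conservative fallback for architectures with no tuned entry.
DEFAULT_CONFIG = {"block_size": 256, "items_per_thread": 1}

def pick_config(compute_capability):
    """Return tuned launch parameters, falling back to a safe default."""
    return LAUNCH_CONFIGS.get(compute_capability, DEFAULT_CONFIG)

print(pick_config((8, 6))["block_size"])  # -> 512
print(pick_config((9, 0)))                # untuned arch: falls back to DEFAULT_CONFIG
```

The point of the fallback is exactly the failure mode described here: an architecture that has no tuned entry still runs, just with generic (and possibly slow) parameters.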
That being said, I'm not actually seeing this discrepancy on my 3090 GPU machine with CUDA 11.2:
Initializing cuda handles... done.
KNN Computation... done.
Computing Pij matrix... done.
Initializing low dim points... done.
Initializing CUDA memory... done.
[Step 0] Avg. Gradient Norm: 0.00314772
[Step 10] Avg. Gradient Norm: 0.00020423
[Step 20] Avg. Gradient Norm: 1.20728e-05
[Step 30] Avg. Gradient Norm: 6.27223e-07
[Step 40] Avg. Gradient Norm: 3.38021e-08
[Step 50] Avg. Gradient Norm: 1.8473e-09
[Step 60] Avg. Gradient Norm: 1.51767e-09
[Step 70] Avg. Gradient Norm: 6.54819e-10
[Step 80] Avg. Gradient Norm: 5.92469e-09
[Step 90] Avg. Gradient Norm: 1.02277e-08
[Step 100] Avg. Gradient Norm: 1.43477e-09
[Step 110] Avg. Gradient Norm: 1.42235e-08
[Step 120] Avg. Gradient Norm: 5.23869e-09
[Step 130] Avg. Gradient Norm: 8.56736e-09
[Step 140] Avg. Gradient Norm: 1.28851e-08
[Step 150] Avg. Gradient Norm: 6.21431e-10
[Step 160] Avg. Gradient Norm: 8.96805e-09
[Step 170] Avg. Gradient Norm: 6.02116e-09
[Step 180] Avg. Gradient Norm: 1.71678e-08
[Step 190] Avg. Gradient Norm: 1.91975e-09
[Step 200] Avg. Gradient Norm: 1.03975e-09
[Step 210] Avg. Gradient Norm: 1.77325e-09
[Step 220] Avg. Gradient Norm: 1.10725e-08
[Step 230] Avg. Gradient Norm: 5.40395e-09
[Step 240] Avg. Gradient Norm: 2.82379e-09
[Step 250] Avg. Gradient Norm: 1.95103e-09
[Step 260] Avg. Gradient Norm: 1.54212e-09
[Step 270] Avg. Gradient Norm: 1.50517e-09
[Step 280] Avg. Gradient Norm: 1.38435e-09
[Step 290] Avg. Gradient Norm: 8.95915e-09
[Step 300] Avg. Gradient Norm: 1.34948e-07
[Step 310] Avg. Gradient Norm: 3.5555e-06
[Step 320] Avg. Gradient Norm: 0.000156078
[Step 330] Avg. Gradient Norm: 0.0110618
[Step 340] Avg. Gradient Norm: 1.16453
[Step 350] Avg. Gradient Norm: 9.47729
[Step 360] Avg. Gradient Norm: 7.91269
[Step 370] Avg. Gradient Norm: 6.11474
[Step 380] Avg. Gradient Norm: 3.39935
[Step 390] Avg. Gradient Norm: 2.74329
[Step 400] Avg. Gradient Norm: 2.58521
[Step 410] Avg. Gradient Norm: 2.55392
[Step 420] Avg. Gradient Norm: 2.61195
[Step 430] Avg. Gradient Norm: 2.70697
[Step 440] Avg. Gradient Norm: 2.82576
[Step 450] Avg. Gradient Norm: 2.90847
[Step 460] Avg. Gradient Norm: 2.98076
[Step 470] Avg. Gradient Norm: 3.03994
[Step 480] Avg. Gradient Norm: 3.0804
[Step 490] Avg. Gradient Norm: 3.11504
[Step 500] Avg. Gradient Norm: 3.14148
[Step 510] Avg. Gradient Norm: 3.1645
[Step 520] Avg. Gradient Norm: 3.19068
[Step 530] Avg. Gradient Norm: 3.21949
[Step 540] Avg. Gradient Norm: 3.24235
[Step 550] Avg. Gradient Norm: 3.27168
[Step 560] Avg. Gradient Norm: 3.29401
[Step 570] Avg. Gradient Norm: 3.30527
[Step 580] Avg. Gradient Norm: 3.31214
[Step 590] Avg. Gradient Norm: 3.29672
[Step 600] Avg. Gradient Norm: 3.29398
[Step 610] Avg. Gradient Norm: 3.29224
[Step 620] Avg. Gradient Norm: 3.28888
[Step 630] Avg. Gradient Norm: 3.29391
[Step 640] Avg. Gradient Norm: 3.30172
[Step 650] Avg. Gradient Norm: 3.3106
[Step 660] Avg. Gradient Norm: 3.31849
[Step 670] Avg. Gradient Norm: 3.33584
[Step 680] Avg. Gradient Norm: 3.33329
[Step 690] Avg. Gradient Norm: 3.34695
[Step 700] Avg. Gradient Norm: 3.37151
[Step 710] Avg. Gradient Norm: 3.40357
[Step 720] Avg. Gradient Norm: 3.42249
[Step 730] Avg. Gradient Norm: 3.43473
[Step 740] Avg. Gradient Norm: 3.44604
[Step 750] Avg. Gradient Norm: 3.45211
[Step 760] Avg. Gradient Norm: 3.46605
[Step 770] Avg. Gradient Norm: 3.48111
[Step 780] Avg. Gradient Norm: 3.48422
[Step 790] Avg. Gradient Norm: 3.48821
[Step 800] Avg. Gradient Norm: 3.50229
[Step 810] Avg. Gradient Norm: 3.50501
[Step 820] Avg. Gradient Norm: 3.51092
[Step 830] Avg. Gradient Norm: 3.52402
[Step 840] Avg. Gradient Norm: 3.53734
[Step 850] Avg. Gradient Norm: 3.53823
[Step 860] Avg. Gradient Norm: 3.54102
[Step 870] Avg. Gradient Norm: 3.54682
[Step 880] Avg. Gradient Norm: 3.55585
[Step 890] Avg. Gradient Norm: 3.56497
[Step 900] Avg. Gradient Norm: 3.56971
[Step 910] Avg. Gradient Norm: 3.56913
[Step 920] Avg. Gradient Norm: 3.56259
[Step 930] Avg. Gradient Norm: 3.55956
[Step 940] Avg. Gradient Norm: 3.57004
[Step 950] Avg. Gradient Norm: 3.57873
[Step 960] Avg. Gradient Norm: 3.5813
[Step 970] Avg. Gradient Norm: 3.58095
[Step 980] Avg. Gradient Norm: 3.57936
[Step 990] Avg. Gradient Norm: 3.56902
_time_initialization: 0.000592s
_time_knn: 0.448997s
_time_symmetry: 0.043185s
_time_init_low_dim: 0.000269s
_time_init_fft: 0.0022s
_time_compute_charges: 0.001074s
_time_precompute_2d: 0.226982s
_time_nbodyfft: 0.646268s
_time_norm: 0.022039s
_time_attr: 0.039366s
_time_apply_forces: 0.035467s
_time_other: 0.01199s
total_time: 1.47843s
In terms of the PTXAS warnings - they're upstream issues that come from thrust, a low-level computing library that we use for simple GPU operations.
I've tried CUDA 11.2, but the result is pretty much the same.
BTW, on your 3090 machine, did you build as we discussed in #95? I would really appreciate it if you could share your environment settings.
I'm using the official conda build (3.0.0) with MKL that I released yesterday; it's built with the Dockerfile here: https://github.com/CannyLab/tsne-cuda/blob/master/packaging/Dockerfile.cuda11.2.
I'm really impressed that the 2060 was able to achieve 0.6s on the test - I don't think I've ever seen times that good on any machine. I wonder if this has anything to do with variation in the cards, or maybe a thermal load issue? I know some 3090s had issues (https://www.reddit.com/r/nvidia/comments/k4hqgk/performance_improvement_by_repasting_and_adding/), but the difference between 1.5 and 5 seconds is pretty big.
My machine:
Ubuntu 20.04, Python 3.7.10
CUDA 11.2, NVIDIA Driver 460.8
AMD Ryzen 5950x, 128 GB RAM, 2x 3090 FE (Restricted to 1 for testing with CUDA_VISIBLE_DEVICES)
@LiUzHiAn - do you have any updates on this?
I didn't install from the conda binaries; I built from source instead, and the performance on the RTX 3090 was still poor. I will try the conda package later - I am a little busy with other stuff these days...
Just found this issue - I maintain the conda-forge packaging for faiss, which does pre-compile for sm_86 (the GPU architecture of the RTX 3090, cf. here). You could try rerunning your test after doing conda install -c conda-forge faiss-gpu.
Hello - I think I am having a similar issue with an NVIDIA RTX A5000 (presumably because of the Ampere architecture). tsnecuda.test() finishes after around 40-50s.
My full installation instructions used were:
pip install faiss-gpu==1.6.5
pip install tsnecuda==3.0.0+cu112 -f https://tsnecuda.isx.ai/tsnecuda_stable --no-deps
conda install -c anaconda mkl
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/hpc/group/tdunn/joshwu/miniconda3/envs/capture/lib/
Also tried conda install -c conda-forge faiss-gpu
as @h-vetinari suggested, but it did not increase the speed.
Does anyone have any updates on this? Thanks!
Here is the timing printout:
KNN Computation... done.
Computing Pij matrix... done.
Initializing low dim points... done.
Initializing CUDA memory... done.
[Step 0] Avg. Gradient Norm: 0.00317704
[Step 10] Avg. Gradient Norm: 0.000205807
[Step 20] Avg. Gradient Norm: 1.21534e-05
[Step 30] Avg. Gradient Norm: 6.33296e-07
[Step 40] Avg. Gradient Norm: 3.39917e-08
[Step 50] Avg. Gradient Norm: 2.27478e-09
[Step 60] Avg. Gradient Norm: 1.59272e-09
[Step 70] Avg. Gradient Norm: 2.56448e-09
[Step 80] Avg. Gradient Norm: 1.24919e-09
[Step 90] Avg. Gradient Norm: 6.48788e-09
[Step 100] Avg. Gradient Norm: 3.22374e-09
[Step 110] Avg. Gradient Norm: 7.76638e-10
[Step 120] Avg. Gradient Norm: 1.09507e-09
[Step 130] Avg. Gradient Norm: 8.15053e-09
[Step 140] Avg. Gradient Norm: 4.91252e-09
[Step 150] Avg. Gradient Norm: 1.64459e-09
[Step 160] Avg. Gradient Norm: 5.4541e-09
[Step 170] Avg. Gradient Norm: 1.98115e-09
[Step 180] Avg. Gradient Norm: 1.57105e-09
[Step 190] Avg. Gradient Norm: 1.95012e-09
[Step 200] Avg. Gradient Norm: 1.76395e-09
[Step 210] Avg. Gradient Norm: 4.69671e-09
[Step 220] Avg. Gradient Norm: 2.64756e-09
[Step 230] Avg. Gradient Norm: 6.0917e-09
[Step 240] Avg. Gradient Norm: 1.73392e-09
[Step 250] Avg. Gradient Norm: 2.97268e-09
[Step 260] Avg. Gradient Norm: 9.64914e-10
[Step 270] Avg. Gradient Norm: 6.8756e-10
[Step 280] Avg. Gradient Norm: 2.66208e-09
[Step 290] Avg. Gradient Norm: 2.41901e-08
[Step 300] Avg. Gradient Norm: 4.05108e-07
[Step 310] Avg. Gradient Norm: 1.16529e-05
[Step 320] Avg. Gradient Norm: 0.000545325
[Step 330] Avg. Gradient Norm: 0.039768
[Step 340] Avg. Gradient Norm: 2.93634
[Step 350] Avg. Gradient Norm: 8.43551
[Step 360] Avg. Gradient Norm: 8.35451
[Step 370] Avg. Gradient Norm: 4.38604
[Step 380] Avg. Gradient Norm: 3.29575
[Step 390] Avg. Gradient Norm: 2.98643
[Step 400] Avg. Gradient Norm: 2.89365
[Step 410] Avg. Gradient Norm: 2.89646
[Step 420] Avg. Gradient Norm: 2.94314
[Step 430] Avg. Gradient Norm: 2.9699
[Step 440] Avg. Gradient Norm: 3.0068
[Step 450] Avg. Gradient Norm: 3.07336
[Step 460] Avg. Gradient Norm: 3.1301
[Step 470] Avg. Gradient Norm: 3.15443
[Step 480] Avg. Gradient Norm: 3.17626
[Step 490] Avg. Gradient Norm: 3.22365
[Step 500] Avg. Gradient Norm: 3.23884
[Step 510] Avg. Gradient Norm: 3.25859
[Step 520] Avg. Gradient Norm: 3.27849
[Step 530] Avg. Gradient Norm: 3.2787
[Step 540] Avg. Gradient Norm: 3.29459
[Step 550] Avg. Gradient Norm: 3.32802
[Step 560] Avg. Gradient Norm: 3.34371
[Step 570] Avg. Gradient Norm: 3.35071
[Step 580] Avg. Gradient Norm: 3.37802
[Step 590] Avg. Gradient Norm: 3.38962
[Step 600] Avg. Gradient Norm: 3.41323
[Step 610] Avg. Gradient Norm: 3.43185
[Step 620] Avg. Gradient Norm: 3.43688
[Step 630] Avg. Gradient Norm: 3.43925
[Step 640] Avg. Gradient Norm: 3.4438
[Step 650] Avg. Gradient Norm: 3.45464
[Step 660] Avg. Gradient Norm: 3.45484
[Step 670] Avg. Gradient Norm: 3.45409
[Step 680] Avg. Gradient Norm: 3.46934
[Step 690] Avg. Gradient Norm: 3.47985
[Step 700] Avg. Gradient Norm: 3.50077
[Step 710] Avg. Gradient Norm: 3.50544
[Step 720] Avg. Gradient Norm: 3.50302
[Step 730] Avg. Gradient Norm: 3.50576
[Step 740] Avg. Gradient Norm: 3.51903
[Step 750] Avg. Gradient Norm: 3.54009
[Step 760] Avg. Gradient Norm: 3.5458
[Step 770] Avg. Gradient Norm: 3.55806
[Step 780] Avg. Gradient Norm: 3.57719
[Step 790] Avg. Gradient Norm: 3.58843
[Step 800] Avg. Gradient Norm: 3.59292
[Step 810] Avg. Gradient Norm: 3.59815
[Step 820] Avg. Gradient Norm: 3.60438
[Step 830] Avg. Gradient Norm: 3.60921
[Step 840] Avg. Gradient Norm: 3.61636
[Step 850] Avg. Gradient Norm: 3.63057
[Step 860] Avg. Gradient Norm: 3.63681
[Step 870] Avg. Gradient Norm: 3.64818
[Step 880] Avg. Gradient Norm: 3.65464
[Step 890] Avg. Gradient Norm: 3.66827
[Step 900] Avg. Gradient Norm: 3.68641
[Step 910] Avg. Gradient Norm: 3.70724
[Step 920] Avg. Gradient Norm: 3.71669
[Step 930] Avg. Gradient Norm: 3.71864
[Step 940] Avg. Gradient Norm: 3.71161
[Step 950] Avg. Gradient Norm: 3.71052
[Step 960] Avg. Gradient Norm: 3.71033
[Step 970] Avg. Gradient Norm: 3.7095
[Step 980] Avg. Gradient Norm: 3.70967
[Step 990] Avg. Gradient Norm: 3.6966
_time_initialization: 45.1235s
_time_knn: 0.575852s
_time_symmetry: 0.029863s
_time_init_low_dim: 0.000533s
_time_init_fft: 1.18662s
_time_compute_charges: 0.002367s
_time_precompute_2d: 0.291774s
_time_nbodyfft: 0.752047s
_time_norm: 0.027871s
_time_attr: 0.042888s
_time_apply_forces: 0.042094s
_time_other: 0.013827s
total_time: 48.0892s
Also, an update - I ran conda install -c anaconda mkl and it seems to be working now; the last run finished in under 3s. Not sure why this helped, but I'm seeing if I can reproduce it.
Update - it seems to work for me with these installation instructions from scratch:
conda create -n capture python=3.8 cudatoolkit=11.2 -c nvidia
conda install -c conda-forge faiss-gpu
pip install tsnecuda==3.0.0+cu112 -f https://tsnecuda.isx.ai/tsnecuda_stable --no-deps
pip install mkl
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/hpc/group/tdunn/joshwu/miniconda3/envs/capture/lib/
conda install -c anaconda mkl
also seems to work sometimes, while pip install mkl does not. The first time I got it to work, I had run conda install -c anaconda mkl. Sometimes it only works when both are run. Not sure what is causing this inconsistency - it might be a mistake on my end.
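One quick sanity check for this kind of MKL confusion - a standard-library-only sketch that just asks whether the dynamic loader can resolve an MKL runtime on its default search path (which is what LD_LIBRARY_PATH influences); the library names tried here are common conventions, not guaranteed to match every install:

```python
# Check whether the dynamic loader can resolve common BLAS/MKL runtimes.
# find_library returns a soname string (e.g. "libmkl_rt.so") if found, else None.
from ctypes.util import find_library

for candidate in ("mkl_rt", "blas"):
    resolved = find_library(candidate)
    print(f"{candidate}: {resolved or 'not found on loader search path'}")
```

If `mkl_rt` resolves only after one of the two install commands, that would explain why the commands sometimes need to be combined.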
It does look like it's a driver/build/compatibility issue - since the vast majority of the time in the printout above is in _time_initialization: 45.1235s, which only tracks how long our code takes to initialize a CUDA handle, connect to the GPU drivers, and allocate memory. Anaconda really does seem to struggle with managing these requirements.
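As an aside, the `_time_*` lines can be compared mechanically to spot the dominant phase; a small pure-Python sketch, with sample values copied from the printout above:

```python
# Parse tsnecuda.test()'s timing printout and report the dominant phase.
sample = """\
_time_initialization: 45.1235s
_time_knn: 0.575852s
_time_symmetry: 0.029863s
_time_nbodyfft: 0.752047s
total_time: 48.0892s
"""

def dominant_phase(report):
    """Return the _time_* key with the largest value (total_time is skipped)."""
    timings = {}
    for line in report.splitlines():
        name, _, value = line.partition(":")
        if name.startswith("_time_"):
            timings[name.strip()] = float(value.strip().rstrip("s"))
    return max(timings, key=timings.get)

print(dominant_phase(sample))  # -> _time_initialization
```

Here initialization alone accounts for roughly 45 of the 48 seconds, which is consistent with a driver/build compatibility problem rather than slow kernels.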
Also tried
conda install -c conda-forge faiss-gpu
as @h-vetinari suggested, but it did not increase the speed.
The installation instructions you posted are not good, in the sense that you're mixing pip & conda packages, which can lead to suboptimal, fragile & non-performant results.
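A rough way to audit an environment for this kind of mixing - a heuristic sketch using only the standard library. The INSTALLER metadata file is written by pip into each dist-info directory; conda-installed packages typically lack it or record something else, so grouping by it gives a (fallible) picture of who installed what:

```python
# Heuristic: group installed distributions by the tool that installed them.
import importlib.metadata as md
from collections import Counter

def installer_counts():
    counts = Counter()
    for dist in md.distributions():
        # read_text returns None when the INSTALLER file is absent.
        text = dist.read_text("INSTALLER")
        counts[(text or "unknown").strip()] += 1
    return counts

print(installer_counts())  # e.g. Counter({'pip': 41, 'unknown': 3})
```

A healthy conda-forge-first environment would show few pip-installed packages beyond tsnecuda itself.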
I'm gonna take a stab at packaging tsne-cuda for conda-forge, then it should be as easy as
conda create -n my_env -c conda-forge tsne-cuda python=3.8
or, respectively,
conda install -c conda-forge tsne-cuda
In the meantime, you should try to install everything but tsne-cuda from conda-forge, e.g.
# don't add '-c nvidia', no need to add cudatoolkit (will be picked up automatically)
# don't add '-c anaconda', don't try installing 'mkl', it's coming from conda-forge already
conda create -n capture -c conda-forge python=3.8 faiss-gpu=1.6.5 libblas=*=*mkl
pip install tsnecuda==3.0.0+cu112 -f https://tsnecuda.isx.ai/tsnecuda_stable --no-deps
OK, this was a longer slog than expected, but there's now a "staged recipe" for tsnecuda in conda-forge: conda-forge/staged-recipes#17029
Feedback (and recipe co-maintainership) welcome! 🙃
Awesome work on this! I've been a bit busy - but if we need to release a minor update to smooth out packaging for conda-forge, let me know.
Cool, happy to hear!
but if we need to release a minor update to smooth out packaging for conda-forge, let me know.
Could you look at the PR, and in particular patches 2 & 4? They're all pretty small (and can be improved, especially the 4th one), but it would help if those could eventually be upstreamed. The first patch is not applicable to your workflow, and the 3rd can either be avoided if I build within build/ as well, or there may be a magic CMake variable corresponding to the project root - not CMAKE_CURRENT_BINARY_DIR - that would allow building from both the root and build/.
Ah, and removing setuptools as a runtime dependency would be a great improvement! 😅
Closed as stale.