hidet-org/hidet

[Tracking Issue] Benchmarks

This issue tracks the performance of hidet compared with other TorchDynamo backends in PyTorch.

The benchmark scripts that produce these reports are in hidet/scripts/bench.
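Each report below has one row per model and input configuration, with one value per backend column. The inputs field can itself contain spaces (for example `f32, bs=1, seq=128`), so a row is easiest to split from the right. The minimal Python sketch below parses a row and compares hidet(2) with eager and with the lower of the reduce-overhead and max-autotune values. It assumes the values are latencies, so lower is better, and `Row`, `parse_row` and `speedups` are hypothetical helpers, not part of hidet/scripts/bench.

```python
from dataclasses import dataclass
from typing import Dict

# Column order as printed in the report header (assumed fixed for every report).
BACKENDS = ("eager", "reduce-overhead", "max-autotune", "hidet(2)")


@dataclass
class Row:
    model: str
    inputs: str
    values: Dict[str, float]  # backend column name -> reported value


def parse_row(line: str) -> Row:
    """Parse one report row. The inputs field may contain spaces,
    so the numeric backend columns are taken from the right-hand end."""
    tokens = line.split()
    n = len(BACKENDS)
    if len(tokens) < n + 2:
        raise ValueError(f"not a report row: {line!r}")
    model = tokens[0]
    inputs = " ".join(tokens[1:-n])
    values = {name: float(tok) for name, tok in zip(BACKENDS, tokens[-n:])}
    return Row(model, inputs, values)


def speedups(row: Row) -> Dict[str, float]:
    """Ratio baseline / hidet(2); above 1 means hidet(2) reported the lower
    value, assuming the values are latencies (lower is better)."""
    hidet = row.values["hidet(2)"]
    best_compile = min(row.values["reduce-overhead"], row.values["max-autotune"])
    return {
        "vs_eager": row.values["eager"] / hidet,
        "vs_best_compile": best_compile / hidet,
    }


if __name__ == "__main__":
    # Two rows copied from the 2023-04-05 A10G report.
    for line in (
        "resnet50 f32[1,3,224,224] 4.656 3.261 3.302 1.481",
        "model/bert-base-uncased f16, bs=1, seq=128 6.444 1.425 1.099 1.923",
    ):
        row = parse_row(line)
        s = speedups(row)
        print(f"{row.model} [{row.inputs}]: {s['vs_eager']:.2f}x vs eager, "
              f"{s['vs_best_compile']:.2f}x vs best compile mode")
```

Because `str.split()` collapses runs of whitespace, extra column padding does not affect parsing. For the BERT f16 row, `vs_best_compile` comes out below 1, meaning max-autotune reported the lower value there.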

2023-04-05

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 63d75a7...ab69a97
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.656  3.261            3.302         1.481
resnet50                 f16[1,3,224,224]    5.663  3.395            3.460         1.217
model/bert-base-uncased  f32, bs=1, seq=128  6.060  3.095            2.920         2.335
model/bert-base-uncased  f16, bs=1, seq=128  6.444  1.425            1.099         1.923

Time: 3.35 hours

2023-04-07

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 172166955d6b608f06394a23a37f99fa93009023...6289f46d21169c01c7b4a00cfb62e89484306998
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.542  1.408            1.407         1.291
resnet50                 f16[1,3,224,224]    1.557  1.250            1.253         1.089
model/bert-base-uncased  f32, bs=1, seq=128  3.037  2.727            2.631         2.012
model/bert-base-uncased  f16, bs=1, seq=128  1.865  1.288            1.017         1.715

Time: 2.47 hours

2023-04-08

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 6289f46...68faaa5
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.533  1.402            1.399         1.303
resnet50                 f16[1,3,224,224]    1.591  1.250            1.294         1.122
model/bert-base-uncased  f32, bs=1, seq=128  3.033  2.807            2.634         2.008
model/bert-base-uncased  f16, bs=1, seq=128  1.830  1.289            1.014         1.683

Time: 2.45 hours
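Because each report records the commit range it covers (the Git diff line), consecutive reports from the same GPU can be compared to spot changes in hidet's own numbers. The minimal sketch below applies this to the hidet(2) column of the 2023-04-07 and 2023-04-08 RTX 3090 reports. It assumes lower values are better, and both the 2% threshold and the helper name `flag_regressions` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# hidet(2) column from the RTX 3090 report dated 2023-04-07.
BEFORE: Dict[str, float] = {
    "resnet50 f32[1,3,224,224]": 1.291,
    "resnet50 f16[1,3,224,224]": 1.089,
    "model/bert-base-uncased f32, bs=1, seq=128": 2.012,
    "model/bert-base-uncased f16, bs=1, seq=128": 1.715,
}
# hidet(2) column from the RTX 3090 report dated 2023-04-08.
AFTER: Dict[str, float] = {
    "resnet50 f32[1,3,224,224]": 1.303,
    "resnet50 f16[1,3,224,224]": 1.122,
    "model/bert-base-uncased f32, bs=1, seq=128": 2.008,
    "model/bert-base-uncased f16, bs=1, seq=128": 1.683,
}


def flag_regressions(
    before: Dict[str, float],
    after: Dict[str, float],
    threshold: float = 0.02,  # hypothetical 2% tolerance
) -> List[Tuple[str, float]]:
    """Return (row, relative change) for rows whose value grew by more than
    `threshold`. Assumes lower values are better (latency-like)."""
    flagged = []
    for key, old in before.items():
        new = after.get(key)
        if new is None:  # row missing from the newer report
            continue
        change = (new - old) / old
        if change > threshold:
            flagged.append((key, change))
    return flagged


if __name__ == "__main__":
    for key, change in flag_regressions(BEFORE, AFTER):
        print(f"{key}: {change:+.1%}")
```

With these values only the resnet50 f16 row is flagged (about +3%). The 2023-04-09 RTX 3090 report, whose Git diff is 68faaa5...68faaa5 (the same commit that ends the 2023-04-08 range), lists 1.055 for that row, so a single-run threshold this tight would probably also catch run-to-run noise; comparing several runs per commit would be more robust.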

2023-04-09

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 68faaa5...68faaa5
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.537  1.404            1.399         1.293
resnet50                 f16[1,3,224,224]    1.578  1.234            1.258         1.055
model/bert-base-uncased  f32, bs=1, seq=128  2.867  2.738            2.515         2.014
model/bert-base-uncased  f16, bs=1, seq=128  1.863  1.287            1.014         1.688

Time: 2.45 hours

2023-04-10

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 68faaa5...da54417
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.541  1.406            1.404         1.280
resnet50                 f16[1,3,224,224]    1.577  1.243            1.288         1.116
model/bert-base-uncased  f32, bs=1, seq=128  2.958  2.809            2.612         2.015
model/bert-base-uncased  f16, bs=1, seq=128  1.871  1.289            1.015         1.683

Time: 2.18 hours

2023-04-11

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: da54417...3a7b972
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.534  1.404            1.401         1.288
resnet50                 f16[1,3,224,224]    1.614  1.261            1.280         1.040
model/bert-base-uncased  f32, bs=1, seq=128  2.911  2.811            2.631         2.031
model/bert-base-uncased  f16, bs=1, seq=128  1.843  1.289            1.013         1.574

Time: 2.18 hours

2023-04-11

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git commit: 634a3a2
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.584  3.263            3.316         1.512
resnet50                 f16[1,3,224,224]    5.466  3.376            3.420         1.130
model/bert-base-uncased  f32, bs=1, seq=128  6.085  3.093            2.898         2.354
model/bert-base-uncased  f16, bs=1, seq=128  6.288  1.425            1.095         1.863

Time: 8.44 hours

2023-04-12

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 634a3a2...7f634d8
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.604  3.222            3.265         1.483
resnet50                 f16[1,3,224,224]    5.401  3.365            3.412         1.141
model/bert-base-uncased  f32, bs=1, seq=128  6.085  3.082            2.896         2.357
model/bert-base-uncased  f16, bs=1, seq=128  6.224  1.426            1.097         1.829

Time: 7.91 hours

2023-04-14

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 3a7b972...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.534  1.400            1.319         1.279
resnet50                 f16[1,3,224,224]    1.623  1.207            1.226         1.006
model/bert-base-uncased  f32, bs=1, seq=128  2.941  2.716            2.593         2.006
model/bert-base-uncased  f16, bs=1, seq=128  1.909  1.299            0.963         1.573

Time: 2.82 hours

2023-04-13

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 7f634d8...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.651  3.271            3.282         1.509
resnet50                 f16[1,3,224,224]    5.466  3.403            3.446         1.143
model/bert-base-uncased  f32, bs=1, seq=128  6.155  3.080            2.896         2.350
model/bert-base-uncased  f16, bs=1, seq=128  6.249  1.426            1.097         1.831

Time: 7.95 hours

2023-04-15

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: ef81b2a...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.539  1.409            1.404         1.291
resnet50                 f16[1,3,224,224]    1.580  1.175            1.200         1.008
model/bert-base-uncased  f32, bs=1, seq=128  3.045  2.720            2.677         2.003
model/bert-base-uncased  f16, bs=1, seq=128  1.949  1.297            1.014         1.587

Time: 2.68 hours

2023-04-14

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ef81b2a...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.722  3.330            3.370         1.483
resnet50                 f16[1,3,224,224]    5.482  3.421            3.434         1.141
model/bert-base-uncased  f32, bs=1, seq=128  6.154  3.085            2.901         2.346
model/bert-base-uncased  f16, bs=1, seq=128  6.346  1.426            1.097         1.828

Time: 8.10 hours

2023-04-16

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: ef81b2a...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.532  1.401            1.396         1.289
resnet50                 f16[1,3,224,224]    1.625  1.199            1.231         1.009
model/bert-base-uncased  f32, bs=1, seq=128  3.078  2.861            2.682         2.005
model/bert-base-uncased  f16, bs=1, seq=128  1.959  1.299            1.013         1.592

Time: 2.75 hours

2023-04-15

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ef81b2a...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.817  3.375            3.422         1.482
resnet50                 f16[1,3,224,224]    5.511  3.415            3.452         1.142
model/bert-base-uncased  f32, bs=1, seq=128  7.157  3.092            2.903         2.351
model/bert-base-uncased  f16, bs=1, seq=128  6.398  1.426            1.098         1.829

Time: 8.09 hours

2023-04-17

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: ef81b2a...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.542  1.402            1.404         1.280
resnet50                 f16[1,3,224,224]    1.610  1.203            1.215         1.009
model/bert-base-uncased  f32, bs=1, seq=128  2.954  2.710            2.678         1.962
model/bert-base-uncased  f16, bs=1, seq=128  1.926  1.299            1.014         1.622

Time: 2.74 hours

2023-04-16

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ef81b2a...ef81b2a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.629  3.305            3.331         1.481
resnet50                 f16[1,3,224,224]    5.719  3.485            3.522         1.144
model/bert-base-uncased  f32, bs=1, seq=128  6.184  3.084            2.897         2.337
model/bert-base-uncased  f16, bs=1, seq=128  6.372  1.426            1.096         1.838

Time: 8.08 hours

2023-04-18

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: ef81b2a...48e57cb
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.543  1.410            1.406         1.281
resnet50                 f16[1,3,224,224]    1.639  1.207            1.216         1.009
model/bert-base-uncased  f32, bs=1, seq=128  3.082  2.855            2.556         2.021
model/bert-base-uncased  f16, bs=1, seq=128  1.943  1.298            1.015         1.610

Time: 2.76 hours

2023-04-17

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ef81b2a...3e7d959
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.643  3.228            3.296         1.516
resnet50                 f16[1,3,224,224]    5.600  3.431            3.448         1.146
model/bert-base-uncased  f32, bs=1, seq=128  6.233  3.083            2.896         2.346
model/bert-base-uncased  f16, bs=1, seq=128  6.154  1.424            1.095         1.828

Time: 8.54 hours

2023-04-18

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 3e7d959...f5afc42
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.707  3.267            3.308         1.506
resnet50                 f16[1,3,224,224]    5.583  3.431            3.498         1.141
model/bert-base-uncased  f32, bs=1, seq=128  6.188  3.084            2.898         2.339
model/bert-base-uncased  f16, bs=1, seq=128  6.277  1.426            1.095         1.828

Time: 8.06 hours

2023-04-20

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 48e57cb...67cd640
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.517  1.390            1.293         1.272
resnet50                 f16[1,3,224,224]    1.623  1.197            1.228         1.042
model/bert-base-uncased  f32, bs=1, seq=128  3.090  2.849            2.597         2.012
model/bert-base-uncased  f16, bs=1, seq=128  1.922  1.297            0.990         1.572

Time: 2.55 hours

2023-04-21

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 67cd640...2165662
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.535  1.409            1.403         1.282
resnet50                 f16[1,3,224,224]    1.609  1.160            1.192         1.042
model/bert-base-uncased  f32, bs=1, seq=128  2.907  2.809            2.650         2.046
model/bert-base-uncased  f16, bs=1, seq=128  1.960  1.297            1.046         1.572

Time: 2.50 hours

2023-04-22

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 2165662...f361211
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.537  1.411            1.398         1.267
resnet50                 f16[1,3,224,224]    1.645  1.211            1.233         1.044
model/bert-base-uncased  f32, bs=1, seq=128  3.097  2.859            2.675         2.006
model/bert-base-uncased  f16, bs=1, seq=128  1.951  1.298            1.046         1.577

Time: 2.56 hours

2023-04-22

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: f5afc42...f361211
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.780  3.355            3.410         1.506
resnet50                 f16[1,3,224,224]    5.658  3.445            3.513         1.146
model/bert-base-uncased  f32, bs=1, seq=128  6.238  3.083            2.895         2.350
model/bert-base-uncased  f16, bs=1, seq=128  6.477  1.426            1.095         1.829

Time: 8.19 hours

2023-04-23

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: f361211...9a65fa2
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.543  1.409            1.403         1.289
resnet50                 f16[1,3,224,224]    1.620  1.192            1.218         1.045
model/bert-base-uncased  f32, bs=1, seq=128  3.092  2.853            2.651         1.995
model/bert-base-uncased  f16, bs=1, seq=128  1.939  1.299            1.046         1.545

Time: 2.61 hours

2023-04-23

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: f361211...9a65fa2
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.851  3.350            3.398         1.480
resnet50                 f16[1,3,224,224]    5.689  3.520            3.547         1.147
model/bert-base-uncased  f32, bs=1, seq=128  6.228  3.082            2.898         2.334
model/bert-base-uncased  f16, bs=1, seq=128  6.454  1.426            1.097         1.820

Time: 8.35 hours

2023-04-24

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 9a65fa2...9a65fa2
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.530  1.403            1.398         1.312
resnet50                 f16[1,3,224,224]    1.605  1.191            1.231         1.045
model/bert-base-uncased  f32, bs=1, seq=128  3.100  2.860            2.598         2.013
model/bert-base-uncased  f16, bs=1, seq=128  1.926  1.298            1.047         1.546

Time: 2.47 hours

2023-04-24

  • Hidet version: 0.2.3.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 9a65fa2...9a65fa2
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.863  3.351            3.393         1.505
resnet50                 f16[1,3,224,224]    5.575  3.427            3.452         1.148
model/bert-base-uncased  f32, bs=1, seq=128  6.187  3.083            2.897         2.341
model/bert-base-uncased  f16, bs=1, seq=128  6.351  1.426            1.098         1.823

Time: 8.26 hours

2023-04-25

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 9a65fa2...30ae787
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.534  1.408            1.401         1.281
resnet50                 f16[1,3,224,224]    1.622  1.195            1.214         1.045
model/bert-base-uncased  f32, bs=1, seq=128  2.929  2.711            2.684         2.030
model/bert-base-uncased  f16, bs=1, seq=128  1.928  1.298            1.045         1.550

Time: 2.48 hours

2023-04-25

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 9a65fa2...30ae787
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.791  3.366            3.390         1.508
resnet50                 f16[1,3,224,224]    5.684  3.526            3.549         1.148
model/bert-base-uncased  f32, bs=1, seq=128  6.243  3.083            2.898         2.355
model/bert-base-uncased  f16, bs=1, seq=128  6.370  1.428            1.098         1.823

Time: 8.40 hours

2023-04-26

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 30ae787...30ae787
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.536  1.401            1.400         1.275
resnet50                 f16[1,3,224,224]    1.662  1.218            1.236         1.043
model/bert-base-uncased  f32, bs=1, seq=128  3.043  2.828            2.604         2.037
model/bert-base-uncased  f16, bs=1, seq=128  1.915  1.298            1.046         1.544

Time: 2.49 hours

2023-04-26

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 30ae787...30ae787
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.791  3.338            3.374         1.479
resnet50                 f16[1,3,224,224]    5.680  3.509            3.522         1.143
model/bert-base-uncased  f32, bs=1, seq=128  6.283  3.082            2.898         2.337
model/bert-base-uncased  f16, bs=1, seq=128  6.449  1.426            1.097         1.828

Time: 8.35 hours

2023-04-27

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 30ae787...af3d412
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.531  1.400            1.396         1.295
resnet50                 f16[1,3,224,224]    1.635  1.194            1.228         1.046
model/bert-base-uncased  f32, bs=1, seq=128  2.909  2.800            2.561         1.982
model/bert-base-uncased  f16, bs=1, seq=128  1.917  1.298            1.048         1.547

Time: 2.48 hours

2023-04-27

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 30ae787...af3d412
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.768  3.347            3.379         1.501
resnet50                 f16[1,3,224,224]    5.737  3.504            3.555         1.154
model/bert-base-uncased  f32, bs=1, seq=128  6.356  3.086            2.899         2.341
model/bert-base-uncased  f16, bs=1, seq=128  6.370  1.426            1.096         1.820

Time: 8.33 hours

2023-04-28

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: af3d412...f8b839c
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.548  1.409            1.404         1.314
resnet50                 f16[1,3,224,224]    1.616  1.200            1.222         1.046
model/bert-base-uncased  f32, bs=1, seq=128  3.095  2.855            2.680         2.015
model/bert-base-uncased  f16, bs=1, seq=128  1.946  1.297            1.046         1.544

Time: 2.48 hours

2023-04-28

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: af3d412...f8b839c
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.797  3.364            3.390         1.507
resnet50                 f16[1,3,224,224]    5.691  3.506            3.550         1.146
model/bert-base-uncased  f32, bs=1, seq=128  6.270  3.083            2.900         2.340
model/bert-base-uncased  f16, bs=1, seq=128  6.388  1.426            1.096         1.822

Time: 8.31 hours

2023-04-29

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: f8b839c...3142734
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.536  1.412            1.405         1.298
resnet50                 f16[1,3,224,224]    1.614  1.185            1.272         1.008
model/bert-base-uncased  f32, bs=1, seq=128  2.918  2.784            2.583         1.944
model/bert-base-uncased  f16, bs=1, seq=128  1.960  1.297            1.045         1.616

Time: 2.39 hours

2023-04-29

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: f8b839c...3142734
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.706  3.335            3.351         1.479
resnet50                 f16[1,3,224,224]    5.636  3.436            3.463         1.147
model/bert-base-uncased  f32, bs=1, seq=128  6.115  3.082            2.897         2.348
model/bert-base-uncased  f16, bs=1, seq=128  6.329  1.426            1.097         1.891

Time: 7.73 hours

2023-04-30

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 3142734...dbfc57d
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.543  1.418            1.412         1.294
resnet50                 f16[1,3,224,224]    1.702  1.335            1.366         1.044
model/bert-base-uncased  f32, bs=1, seq=128  2.916  2.854            2.672         1.964
model/bert-base-uncased  f16, bs=1, seq=128  1.946  1.299            1.047         1.607

Time: 2.66 hours

2023-04-30

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 3142734...dbfc57d
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.709  3.314            3.365         1.477
resnet50                 f16[1,3,224,224]    5.622  3.416            3.485         1.143
model/bert-base-uncased  f32, bs=1, seq=128  6.157  3.082            2.897         2.356
model/bert-base-uncased  f16, bs=1, seq=128  6.333  1.425            1.099         1.896

Time: 8.77 hours

2023-05-01

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: dbfc57d...dbfc57d
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.545  1.417            1.414         1.291
resnet50                 f16[1,3,224,224]    1.710  1.331            1.383         1.041
model/bert-base-uncased  f32, bs=1, seq=128  3.086  2.850            2.577         2.025
model/bert-base-uncased  f16, bs=1, seq=128  1.953  1.300            1.048         1.608

Time: 2.65 hours

2023-05-01

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: dbfc57d...dbfc57d
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.820  3.326            3.360         1.479
resnet50                 f16[1,3,224,224]    5.640  3.442            3.487         1.143
model/bert-base-uncased  f32, bs=1, seq=128  6.292  3.085            2.901         2.366
model/bert-base-uncased  f16, bs=1, seq=128  6.426  1.429            1.098         1.894

Time: 8.86 hours

2023-05-02

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: dbfc57d...ec3bc79
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.552  1.423            1.419         1.288
resnet50                 f16[1,3,224,224]    1.744  1.337            1.357         1.010
model/bert-base-uncased  f32, bs=1, seq=128  3.088  2.819            2.677         2.021
model/bert-base-uncased  f16, bs=1, seq=128  1.948  1.300            1.046         1.605

Time: 2.69 hours

2023-05-02

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: dbfc57d...ec3bc79
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.733  3.347            3.367         1.481
resnet50                 f16[1,3,224,224]    5.506  3.400            3.450         1.144
model/bert-base-uncased  f32, bs=1, seq=128  6.109  3.079            2.895         2.352
model/bert-base-uncased  f16, bs=1, seq=128  6.251  1.426            1.097         1.889

Time: 8.77 hours

2023-05-03

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec3bc79...ec3bc79
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.541  1.420            1.415         1.286
resnet50                 f16[1,3,224,224]    1.718  1.346            1.397         1.042
model/bert-base-uncased  f32, bs=1, seq=128  3.084  2.792            2.668         2.032
model/bert-base-uncased  f16, bs=1, seq=128  1.958  1.299            1.047         1.621

Time: 2.68 hours

2023-05-03

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec3bc79...ec3bc79
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.667  3.340            3.375         1.479
resnet50                 f16[1,3,224,224]    5.519  3.411            3.450         1.139
model/bert-base-uncased  f32, bs=1, seq=128  6.129  3.081            2.897         2.354
model/bert-base-uncased  f16, bs=1, seq=128  6.299  1.426            1.098         1.889

Time: 8.73 hours

2023-05-04

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec3bc79...fd2d80a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.547  1.415            1.415         1.300
resnet50                 f16[1,3,224,224]    1.702  1.332            1.366         1.042
model/bert-base-uncased  f32, bs=1, seq=128  3.018  2.859            2.617         1.996
model/bert-base-uncased  f16, bs=1, seq=128  1.971  1.298            1.046         1.610

Time: 2.62 hours

2023-05-04

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec3bc79...fd2d80a
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.619  3.263            3.301         1.479
resnet50                 f16[1,3,224,224]    5.596  3.412            3.487         1.148
model/bert-base-uncased  f32, bs=1, seq=128  6.120  3.095            2.900         2.356
model/bert-base-uncased  f16, bs=1, seq=128  6.278  1.427            1.097         1.890

Time: 8.99 hours

2023-05-05

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fd2d80a...daae22e
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.548  1.419            1.419         1.309
resnet50                 f16[1,3,224,224]    1.729  1.341            1.356         1.007
model/bert-base-uncased  f32, bs=1, seq=128  3.013  2.860            2.679         1.982
model/bert-base-uncased  f16, bs=1, seq=128  2.002  1.299            1.046         1.551

Time: 2.15 hours

2023-05-05

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fd2d80a...daae22e
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.607  3.284            3.301         1.504
resnet50                 f16[1,3,224,224]    5.514  3.397            3.425         1.138
model/bert-base-uncased  f32, bs=1, seq=128  6.078  3.083            2.900         2.329
model/bert-base-uncased  f16, bs=1, seq=128  6.322  1.426            1.097         1.812

Time: 7.34 hours

2023-05-06

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: daae22e...4c51fe9
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.553  1.419            1.409         1.309
resnet50                 f16[1,3,224,224]    1.718  1.345            1.362         1.039
model/bert-base-uncased  f32, bs=1, seq=128  2.936  2.850            2.678         2.017
model/bert-base-uncased  f16, bs=1, seq=128  2.013  1.300            1.048         1.550

Time: 2.16 hours

2023-05-06

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: daae22e...4c51fe9
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.693  3.271            3.315         1.507
resnet50                 f16[1,3,224,224]    5.555  3.422            3.447         1.140
model/bert-base-uncased  f32, bs=1, seq=128  6.170  3.082            2.898         2.335
model/bert-base-uncased  f16, bs=1, seq=128  6.266  1.427            1.096         1.812

Time: 7.36 hours

2023-05-07

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 4c51fe9...49b832f
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.551  1.423            1.413         1.309
resnet50                 f16[1,3,224,224]    1.725  1.337            1.391         1.006
model/bert-base-uncased  f32, bs=1, seq=128  3.013  2.727            2.673         1.988
model/bert-base-uncased  f16, bs=1, seq=128  1.989  1.299            1.048         1.551

Time: 2.16 hours

2023-05-07

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 4c51fe9...49b832f
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.686  3.293            3.312         1.506
resnet50                 f16[1,3,224,224]    5.593  3.453            3.504         1.145
model/bert-base-uncased  f32, bs=1, seq=128  6.155  3.082            2.898         2.334
model/bert-base-uncased  f16, bs=1, seq=128  6.250  1.426            1.096         1.816

Time: 7.37 hours

2023-05-08

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 49b832f...49b832f
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.546  1.418            1.413         1.309
resnet50                 f16[1,3,224,224]    1.742  1.327            1.366         1.006
model/bert-base-uncased  f32, bs=1, seq=128  3.015  2.721            2.677         1.979
model/bert-base-uncased  f16, bs=1, seq=128  1.984  1.298            1.046         1.541

Time: 2.15 hours

2023-05-08

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 49b832f...49b832f
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.690  3.300            3.345         1.508
resnet50                 f16[1,3,224,224]    5.573  3.403            3.462         1.146
model/bert-base-uncased  f32, bs=1, seq=128  6.110  3.084            2.898         2.335
model/bert-base-uncased  f16, bs=1, seq=128  6.390  1.426            1.098         1.817

Time: 7.36 hours

2023-05-09

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 49b832f...e6d89a7
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.552  1.426            1.419         1.329
resnet50                 f16[1,3,224,224]    1.828  1.405            1.440         1.012
model/bert-base-uncased  f32, bs=1, seq=128  3.097  2.857            2.675         1.989
model/bert-base-uncased  f16, bs=1, seq=128  2.092  1.298            1.047         1.551

Time: 2.29 hours

2023-05-09

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 49b832f...e6d89a7
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.676  3.263            3.297         1.509
resnet50                 f16[1,3,224,224]    5.515  3.375            3.416         1.150
model/bert-base-uncased  f32, bs=1, seq=128  6.127  3.082            2.896         2.336
model/bert-base-uncased  f16, bs=1, seq=128  6.263  1.426            1.098         1.811

Time: 7.34 hours

2023-05-10

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: e6d89a7...00e91dd
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.546  1.419            1.410         1.374
resnet50                 f16[1,3,224,224]    1.814  1.404            1.613         1.012
model/bert-base-uncased  f32, bs=1, seq=128  3.089  2.707            2.563         2.075
model/bert-base-uncased  f16, bs=1, seq=128  2.090  1.300            1.047         1.551

Time: 2.78 hours

2023-05-10

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: e6d89a7...00e91dd
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.649  3.284            3.326         1.568
resnet50                 f16[1,3,224,224]    5.637  3.424            3.465         1.155
model/bert-base-uncased  f32, bs=1, seq=128  6.180  3.081            2.896         2.407
model/bert-base-uncased  f16, bs=1, seq=128  6.355  1.425            1.096         1.812

Time: 9.10 hours

2023-05-11

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 00e91dd...5db7810
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.554  1.418            1.499         1.375
resnet50                 f16[1,3,224,224]    1.805  1.418            1.456         1.044
model/bert-base-uncased  f32, bs=1, seq=128  3.085  2.852            2.560         2.100
model/bert-base-uncased  f16, bs=1, seq=128  2.074  1.301            1.049         1.554

Time: 2.78 hours

2023-05-11

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 00e91dd...5db7810
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.698  3.272            3.309         1.569
resnet50                 f16[1,3,224,224]    5.657  3.446            3.498         1.155
model/bert-base-uncased  f32, bs=1, seq=128  6.232  3.084            2.898         2.404
model/bert-base-uncased  f16, bs=1, seq=128  6.422  1.426            1.097         1.827

Time: 9.12 hours

2023-05-12

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 5db7810...fe8b65f
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    1.548  1.417            1.411         1.374
resnet50                 f16[1,3,224,224]    1.805  1.646            1.452         1.009
model/bert-base-uncased  f32, bs=1, seq=128  2.955  2.857            2.681         2.073
model/bert-base-uncased  f16, bs=1, seq=128  2.088  1.299            1.048         1.539

Time: 2.83 hours

2023-05-12

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 5db7810...fe8b65f
model                    inputs              eager  reduce-overhead  max-autotune  hidet(2)
resnet50                 f32[1,3,224,224]    4.724  3.302            3.378         1.570
resnet50                 f16[1,3,224,224]    5.570  3.464            3.495         1.140
model/bert-base-uncased  f32, bs=1, seq=128  6.191  3.087            2.903         2.400
model/bert-base-uncased  f16, bs=1, seq=128  6.410  1.426            1.096         1.823

Time: 9.08 hours

2023-05-13

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fe8b65f...fe8b65f
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.552 1.418 1.413 1.374
resnet50 f16[1,3,224,224] 1.825 1.426 1.479 1.011
model/bert-base-uncased f32, bs=1, seq=128 3.098 2.862 2.573 2.081
model/bert-base-uncased f16, bs=1, seq=128 2.096 1.300 1.049 1.543

Time: 2.84 hours

2023-05-13

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fe8b65f...fe8b65f
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.629 3.284 3.329 1.565
resnet50 f16[1,3,224,224] 5.548 3.437 3.461 1.141
model/bert-base-uncased f32, bs=1, seq=128 6.112 3.085 2.904 2.400
model/bert-base-uncased f16, bs=1, seq=128 6.283 1.426 1.097 1.824

Time: 9.04 hours

2023-05-14

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fe8b65f...fe8b65f
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.550 1.418 1.414 1.372
resnet50 f16[1,3,224,224] 1.800 1.546 1.441 1.015
model/bert-base-uncased f32, bs=1, seq=128 2.974 2.710 2.684 2.073
model/bert-base-uncased f16, bs=1, seq=128 2.117 1.298 1.046 1.541

Time: 2.83 hours

2023-05-14

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fe8b65f...fe8b65f
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.672 3.240 3.320 1.568
resnet50 f16[1,3,224,224] 5.716 3.492 3.524 1.140
model/bert-base-uncased f32, bs=1, seq=128 6.138 3.082 2.899 2.400
model/bert-base-uncased f16, bs=1, seq=128 6.344 1.426 1.097 1.824

Time: 9.09 hours

2023-05-15

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 3090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fe8b65f...fe8b65f
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.551 1.421 1.413 1.376
resnet50 f16[1,3,224,224] 1.825 1.412 1.430 1.007
model/bert-base-uncased f32, bs=1, seq=128 3.091 2.762 2.679 2.077
model/bert-base-uncased f16, bs=1, seq=128 2.094 1.299 1.047 1.544

Time: 2.79 hours
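The RTX 3090 reports from 2023-05-13 to 2023-05-15 all ran at the same commit (`fe8b65f...fe8b65f`). Differences between them therefore give a rough sense of measurement noise. Below is a minimal Python sketch that computes the coefficient of variation for two cells copied from those reports. Three samples is a very small sample, so treat the result as indicative only.

```python
import statistics

# Same commit (fe8b65f...fe8b65f) on the RTX 3090, runs from 2023-05-13/14/15.
SAME_COMMIT_RUNS = {
    "hidet(2) resnet50 f32": [1.374, 1.372, 1.376],
    "reduce-overhead resnet50 f16": [1.426, 1.546, 1.412],
}


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (relative run-to-run noise)."""
    return statistics.stdev(samples) / statistics.mean(samples)


for name, samples in SAME_COMMIT_RUNS.items():
    print(f"{name:30s} mean={statistics.mean(samples):.3f} "
          f"cv={100 * coefficient_of_variation(samples):.2f}%")
```

The hidet(2) cell varies by well under 1%, while reduce-overhead on resnet50 f16 varies by about 5%. On their own, day-to-day changes smaller than a cell's spread are probably not meaningful.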

2023-05-15

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 20.04.6 LTS
  • GPU: NVIDIA A10G
  • GPU driver: 530.30.02 (12.1)
  • Git diff: fe8b65f...fe8b65f
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.725 3.309 3.350 1.570
resnet50 f16[1,3,224,224] 5.526 3.458 3.475 1.155
model/bert-base-uncased f32, bs=1, seq=128 6.104 3.094 2.900 2.413
model/bert-base-uncased f16, bs=1, seq=128 6.352 1.431 1.098 1.867

Time: 9.26 hours

2023-05-31

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: fe8b65f...5e69cce
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.244 1.066 1.090 1.053
resnet50 f16[1,3,224,224] 1.475 1.104 1.134 0.647
model/bert-base-uncased f32, bs=1, seq=128 2.055 1.864 1.576 1.196
model/bert-base-uncased f16, bs=1, seq=128 1.791 0.711 0.738 0.957

Time: 2.32 hours
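In the first RTX 4090 report, hidet(2) has the lowest time in three of the four rows, while reduce-overhead is fastest for bert-base-uncased f16. Picking the fastest backend in each row is a single `min` over the columns. Below is a minimal sketch using values copied from the table above.

```python
BACKENDS = ("eager", "reduce-overhead", "max-autotune", "hidet(2)")

# Rows from the 2023-05-31 RTX 4090 report, in the table's column order.
RTX4090_2023_05_31 = {
    "resnet50 f32[1,3,224,224]": (1.244, 1.066, 1.090, 1.053),
    "resnet50 f16[1,3,224,224]": (1.475, 1.104, 1.134, 0.647),
    "bert-base-uncased f32": (2.055, 1.864, 1.576, 1.196),
    "bert-base-uncased f16": (1.791, 0.711, 0.738, 0.957),
}


def fastest_backend(times):
    """Return (backend, time) for the lowest reported time in one row."""
    return min(zip(BACKENDS, times), key=lambda pair: pair[1])


for row, times in RTX4090_2023_05_31.items():
    name, t = fastest_backend(times)
    print(f"{row:28s} fastest: {name} ({t})")
```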

2023-06-01

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.0)
  • Git diff: 5e69cce...5e69cce
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.230 1.061 1.085 1.066
resnet50 f16[1,3,224,224] 1.447 1.057 1.099 0.641
model/bert-base-uncased f32, bs=1, seq=128 2.049 1.866 1.670 1.219
model/bert-base-uncased f16, bs=1, seq=128 1.769 0.714 0.801 0.958

Time: 2.26 hours

2023-06-02

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 5e69cce...c1cfef8
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.939 1.852 1.864 0.869
resnet50 f16[1,3,224,224] 3.917 3.642 3.887 0.641
model/bert-base-uncased f32, bs=1, seq=128 2.046 1.749 1.691 1.280
model/bert-base-uncased f16, bs=1, seq=128 1.847 0.712 0.800 0.957

Time: 2.13 hours

2023-06-03

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.0+cu117
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: c1cfef8...870a702
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.941 1.852 1.863 0.819
resnet50 f16[1,3,224,224] 4.252 3.580 3.571 0.623
model/bert-base-uncased f32, bs=1, seq=128 1.965 1.859 1.695 1.276
model/bert-base-uncased f16, bs=1, seq=128 1.828 0.713 0.801 0.939

Time: 2.10 hours

2023-06-04

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 870a702...ca607f9
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.239 1.064 1.079 0.808
resnet50 f16[1,3,224,224] 1.404 1.111 1.096 0.621
model/bert-base-uncased f32, bs=1, seq=128 1.957 1.828 1.701 1.293
model/bert-base-uncased f16, bs=1, seq=128 1.933 0.714 0.801 0.940

Time: 2.09 hours

2023-06-05

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ca607f9...a1bd18c
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.237 1.056 1.080 0.822
resnet50 f16[1,3,224,224] 1.418 1.079 1.091 0.567
model/bert-base-uncased f32, bs=1, seq=128 1.972 1.862 1.701 1.161
model/bert-base-uncased f16, bs=1, seq=128 1.906 0.710 0.800 0.941

Time: 2.14 hours

2023-06-05

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git commit: a1bd18c
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 6.145 4.233 4.502 1.384
resnet50 f16[1,3,224,224] 7.176 4.303 4.437 1.029
model/bert-base-uncased f32, bs=1, seq=128 9.116 3.619 2.946 2.088
model/bert-base-uncased f16, bs=1, seq=128 9.784 1.239 1.189 1.563

Time: 8.29 hours

2023-06-06

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: a1bd18c...a1bd18c
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.220 1.048 1.097 0.812
resnet50 f16[1,3,224,224] 1.415 1.075 1.095 0.570
model/bert-base-uncased f32, bs=1, seq=128 1.984 1.864 1.697 1.293
model/bert-base-uncased f16, bs=1, seq=128 1.897 0.712 0.800 0.941

Time: 2.13 hours

2023-06-06

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git diff: a1bd18c...a1bd18c
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 5.802 4.281 4.373 1.384
resnet50 f16[1,3,224,224] 7.292 4.379 4.484 1.025
model/bert-base-uncased f32, bs=1, seq=128 9.202 4.066 3.127 2.161
model/bert-base-uncased f16, bs=1, seq=128 9.881 1.241 1.184 1.598

Time: 8.29 hours

2023-06-07

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: a1bd18c...260f0ee
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.235 1.055 1.091 0.799
resnet50 f16[1,3,224,224] 1.407 1.084 1.095 0.564
model/bert-base-uncased f32, bs=1, seq=128 2.046 1.859 1.699 1.163
model/bert-base-uncased f16, bs=1, seq=128 1.910 0.710 0.796 0.941

Time: 2.13 hours

2023-06-07

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git diff: a1bd18c...260f0ee
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 5.943 4.209 4.313 1.384
resnet50 f16[1,3,224,224] 7.108 4.282 4.423 1.025
model/bert-base-uncased f32, bs=1, seq=128 9.152 3.643 3.135 2.130
model/bert-base-uncased f16, bs=1, seq=128 9.861 1.242 1.184 1.571

Time: 8.26 hours

2023-06-08

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 260f0ee...ec23670
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.225 1.059 1.079 0.794
resnet50 f16[1,3,224,224] 1.417 1.092 1.099 0.566
model/bert-base-uncased f32, bs=1, seq=128 2.055 1.870 1.701 1.166
model/bert-base-uncased f16, bs=1, seq=128 1.928 0.715 0.801 0.943

Time: 2.13 hours

2023-06-08

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 260f0ee...ec23670
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 5.936 4.205 4.322 1.380
resnet50 f16[1,3,224,224] 7.039 4.214 4.362 1.027
model/bert-base-uncased f32, bs=1, seq=128 9.276 3.626 2.942 2.116
model/bert-base-uncased f16, bs=1, seq=128 9.887 1.242 1.183 1.589

Time: 8.25 hours

2023-06-09

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec23670...ec23670
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.232 1.059 1.083 0.793
resnet50 f16[1,3,224,224] 1.422 1.076 1.101 0.565
model/bert-base-uncased f32, bs=1, seq=128 2.048 1.869 1.705 1.169
model/bert-base-uncased f16, bs=1, seq=128 1.920 0.710 0.795 0.894

Time: 2.13 hours

2023-06-09

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec23670...ec23670
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 5.883 4.250 4.434 1.386
resnet50 f16[1,3,224,224] 7.112 4.319 4.485 1.029
model/bert-base-uncased f32, bs=1, seq=128 9.146 3.630 3.024 2.117
model/bert-base-uncased f16, bs=1, seq=128 9.886 1.239 1.188 1.578

Time: 8.30 hours

2023-06-10

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec23670...ec23670
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.220 1.050 1.092 0.801
resnet50 f16[1,3,224,224] 1.432 1.078 1.098 0.563
model/bert-base-uncased f32, bs=1, seq=128 2.045 1.869 1.666 1.184
model/bert-base-uncased f16, bs=1, seq=128 1.945 0.712 0.799 0.897

Time: 2.13 hours

2023-06-10

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec23670...ec23670
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 5.922 4.268 4.383 1.382
resnet50 f16[1,3,224,224] 7.081 4.330 4.420 1.027
model/bert-base-uncased f32, bs=1, seq=128 9.007 4.007 2.756 2.096
model/bert-base-uncased f16, bs=1, seq=128 9.866 1.241 1.186 1.561

Time: 8.30 hours

2023-06-11

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec23670...09463e8
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.218 1.069 1.079 0.733
resnet50 f16[1,3,224,224] 1.434 1.082 1.116 0.563
model/bert-base-uncased f32, bs=1, seq=128 2.051 1.868 1.701 1.164
model/bert-base-uncased f16, bs=1, seq=128 1.894 0.711 0.799 0.893

Time: 2.13 hours

2023-06-11

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA A100-SXM4-40GB
  • GPU driver: 530.30.02 (12.1)
  • Git diff: ec23670...09463e8
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 5.924 4.218 4.355 1.319
resnet50 f16[1,3,224,224] 7.594 4.344 4.442 1.037
model/bert-base-uncased f32, bs=1, seq=128 8.988 3.617 2.727 2.084
model/bert-base-uncased f16, bs=1, seq=128 9.930 1.239 1.184 1.565

Time: 8.16 hours

2023-06-12

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 09463e8...09463e8
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.249 1.071 1.094 0.734
resnet50 f16[1,3,224,224] 1.434 1.106 1.131 0.565
model/bert-base-uncased f32, bs=1, seq=128 2.050 1.865 1.702 1.165
model/bert-base-uncased f16, bs=1, seq=128 1.908 0.712 0.801 0.900

Time: 2.12 hours

2023-06-13

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 09463e8...09463e8
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.205 1.051 1.074 0.729
resnet50 f16[1,3,224,224] 1.440 1.081 1.124 0.568
model/bert-base-uncased f32, bs=1, seq=128 1.930 1.866 1.698 1.182
model/bert-base-uncased f16, bs=1, seq=128 1.924 0.710 0.798 0.890

Time: 2.12 hours

2023-06-14

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 09463e8...cdd75ec
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.225 1.043 1.077 0.762
resnet50 f16[1,3,224,224] 1.414 1.080 1.094 0.568
model/bert-base-uncased f32, bs=1, seq=128 2.053 1.867 1.698 1.290
model/bert-base-uncased f16, bs=1, seq=128 1.908 0.711 0.797 0.936

Time: 2.11 hours

2023-06-15

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: cdd75ec...cdd75ec
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.208 1.052 1.072 0.770
resnet50 f16[1,3,224,224] 1.416 1.087 1.137 0.566
model/bert-base-uncased f32, bs=1, seq=128 2.051 1.868 1.650 1.283
model/bert-base-uncased f16, bs=1, seq=128 1.918 0.712 0.799 0.944

Time: 2.11 hours

2023-06-16

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: cdd75ec...cdd75ec
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.224 1.054 1.076 0.765
resnet50 f16[1,3,224,224] 1.427 1.102 1.109 0.567
model/bert-base-uncased f32, bs=1, seq=128 2.052 1.864 1.703 1.293
model/bert-base-uncased f16, bs=1, seq=128 1.919 0.711 0.797 0.896

Time: 2.11 hours

2023-06-17

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: cdd75ec...d6e431e
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.220 1.077 1.069 0.762
resnet50 f16[1,3,224,224] 1.402 1.085 1.097 0.661
model/bert-base-uncased f32, bs=1, seq=128 2.039 1.856 1.633 1.291
model/bert-base-uncased f16, bs=1, seq=128 1.871 0.712 0.800 0.897

Time: 3.41 hours

2023-06-18

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: d6e431e...d6e431e
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.216 1.049 1.065 0.762
resnet50 f16[1,3,224,224] 1.410 1.080 1.089 0.659
model/bert-base-uncased f32, bs=1, seq=128 2.051 1.865 1.705 1.167
model/bert-base-uncased f16, bs=1, seq=128 1.894 0.714 0.801 0.894

Time: 3.42 hours

2023-06-19

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: d6e431e...bb1612e
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.211 1.059 1.074 0.766
resnet50 f16[1,3,224,224] 1.399 1.066 1.100 0.661
model/bert-base-uncased f32, bs=1, seq=128 2.052 1.866 1.623 1.304
model/bert-base-uncased f16, bs=1, seq=128 1.883 0.710 0.796 0.941

Time: 3.42 hours

2023-06-20

  • Hidet version: 0.2.4.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: bb1612e...289377a
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.229 1.062 1.091 0.767
resnet50 f16[1,3,224,224] 1.412 1.091 1.106 0.479
model/bert-base-uncased f32, bs=1, seq=128 1.972 1.866 1.701 1.166
model/bert-base-uncased f16, bs=1, seq=128 1.950 0.711 0.798 0.943

Time: 2.17 hours
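Between the 2023-06-19 and 2023-06-20 RTX 4090 reports, the hidet(2) time for resnet50 f16 drops from 0.661 to 0.479. Below is a minimal sketch that compares two consecutive reports cell by cell and flags large relative changes. The 5% `threshold` is a hypothetical choice, not a value used by the benchmark scripts.

```python
# hidet(2) column from the RTX 4090 reports of 2023-06-19 and 2023-06-20.
BEFORE = {"resnet50 f32": 0.766, "resnet50 f16": 0.661,
          "bert-base-uncased f32": 1.304, "bert-base-uncased f16": 0.941}
AFTER = {"resnet50 f32": 0.767, "resnet50 f16": 0.479,
         "bert-base-uncased f32": 1.166, "bert-base-uncased f16": 0.943}


def relative_changes(before, after, threshold=0.05):
    """Return {row: relative change} for rows whose time moved by more than `threshold`.

    Negative values mean the newer run is faster; positive values are regressions.
    """
    flagged = {}
    for row, old in before.items():
        change = (after[row] - old) / old
        if abs(change) > threshold:
            flagged[row] = change
    return flagged


for row, change in relative_changes(BEFORE, AFTER).items():
    print(f"{row:24s} {100 * change:+.1f}%")
```

This flags resnet50 f16 (about -27.5%) and bert-base-uncased f32 (about -10.6%). Treat the second flag with caution. The hidet(2) value for bert-base-uncased f32 has alternated between roughly 1.16 and 1.30 across these reports, including between 2023-06-17 and 2023-06-18, which ran the same commit.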

2023-06-21

  • Hidet version: 0.3.0.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: 289377a...eb7e55b
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.253 1.063 1.081 0.733
resnet50 f16[1,3,224,224] 1.424 1.080 1.110 0.478
model/bert-base-uncased f32, bs=1, seq=128 2.052 1.830 1.702 1.166
model/bert-base-uncased f16, bs=1, seq=128 1.902 0.712 0.799 0.901

Time: 2.17 hours

2023-06-22

  • Hidet version: 0.3.0.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: eb7e55b...eb7e55b
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.217 1.059 1.074 0.730
resnet50 f16[1,3,224,224] 1.405 1.089 1.101 0.477
model/bert-base-uncased f32, bs=1, seq=128 2.050 1.870 1.699 1.164
model/bert-base-uncased f16, bs=1, seq=128 1.917 0.711 0.799 0.894

Time: 2.17 hours

2023-06-23

  • Hidet version: 0.3.0.dev
  • PyTorch version: 2.0.1+cu118
  • OS: Ubuntu 22.04.2 LTS
  • GPU: NVIDIA GeForce RTX 4090
  • GPU driver: 530.30.02 (12.1)
  • Git diff: eb7e55b...64b9f03
model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.240 1.055 1.084 0.737
resnet50 f16[1,3,224,224] 1.415 1.099 1.115 0.475
model/bert-base-uncased f32, bs=1, seq=128 2.022 1.867 1.677 1.167
model/bert-base-uncased f16, bs=1, seq=128 1.897 0.713 0.796 0.892

Time: 2.17 hours
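A single summary number per report can make long-term trends easier to scan. The geometric mean of per-row speedups is a common choice for ratios because it treats a 2x speedup and a 2x slowdown symmetrically. Below is a minimal sketch over the 2023-06-23 RTX 4090 report that compares hidet(2) with max-autotune. This summary metric is an illustration and is not computed by the reports themselves.

```python
import math

# (max-autotune, hidet(2)) times from the 2023-06-23 RTX 4090 report.
RTX4090_2023_06_23 = {
    "resnet50 f32": (1.084, 0.737),
    "resnet50 f16": (1.115, 0.475),
    "bert-base-uncased f32": (1.677, 1.167),
    "bert-base-uncased f16": (0.796, 0.892),
}


def geomean_speedup(pairs):
    """Geometric mean of baseline/candidate ratios across rows."""
    ratios = [baseline / candidate for baseline, candidate in pairs]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


print(f"hidet(2) vs max-autotune (geomean): "
      f"{geomean_speedup(RTX4090_2023_06_23.values()):.2f}x")
```

For this report the result is about 1.45x, even though hidet(2) is slower than max-autotune on bert-base-uncased f16.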