improved wrapper #5
@@ -205,7 +205,7 @@ def __init__(self):
 # However, this is eliminated above while filtering out images without
 # annotations. Thus, we fake it here
 mock_dataset = SimpleNamespace(ids=["invalid"])
-wrapper = wrapper_factory(mock_dataset)
+wrapper = wrapper_factory(mock_dataset, target_keys=None)
This gives v2 an "unfair advantage" due to pytorch/vision#7489. If we want a fairer comparison, we can do
- wrapper = wrapper_factory(mock_dataset, target_keys=None)
+ wrapper = wrapper_factory(mock_dataset, target_keys={"boxes", "masks", "labels"})
However, this means we shouldn't report these numbers for v2 performance, since we are artificially throttling it here.
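To illustrate why the choice of `target_keys` changes the measured cost, here is a minimal sketch. It is not the torchvision wrapper: `make_wrapper`, `polygons_to_masks`, `DEFAULT_KEYS`, and the sample layout are hypothetical stand-ins. The only idea carried over from the discussion is that a wrapper which skips unrequested keys also skips the work needed to build them, which is consistent with the median for WrapCocoSampleForTransforms reported in this thread (110 µs versus 2399 µs once masks are requested).

```python
from typing import Callable, Optional

# Hypothetical default used when target_keys is None (an assumption of this sketch).
DEFAULT_KEYS = frozenset({"boxes", "labels"})

# Counts how often the expensive conversion runs.
calls = {"masks": 0}


def polygons_to_masks(polygons):
    """Stand-in for an expensive polygon -> mask conversion."""
    calls["masks"] += 1
    return [f"mask({p})" for p in polygons]


def make_wrapper(target_keys: Optional[set] = None) -> Callable[[dict], dict]:
    """Return a function that converts only the requested target keys."""
    keys = DEFAULT_KEYS if target_keys is None else frozenset(target_keys)
    converters = {
        "boxes": lambda s: list(s["bbox"]),
        "labels": lambda s: list(s["category_id"]),
        "masks": lambda s: polygons_to_masks(s["segmentation"]),
    }
    unknown = keys - set(converters)
    if unknown:
        raise ValueError(f"unsupported target keys: {sorted(unknown)}")

    def wrap(sample: dict) -> dict:
        # Only the converters for requested keys are ever called.
        return {key: converters[key](sample) for key in sorted(keys)}

    return wrap


sample = {"bbox": [[0, 0, 10, 10]], "category_id": [3], "segmentation": ["poly0"]}

cheap = make_wrapper(target_keys=None)(sample)
full = make_wrapper(target_keys={"boxes", "masks", "labels"})(sample)
```

In this sketch the mask conversion runs once, for `full` only. A v1 pipeline that always converts masks therefore does strictly more work than a v2 pipeline wrapped with the default keys, which is the imbalance the suggestion above removes.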
Thanks Philip, I can reproduce the results on the cluster:
############################################################
detection-ssdlite
############################################################
loading annotations into memory...
Done (t=14.51s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='Tensor', api_version='v1'
transform                 min    25% quantile    median    75% quantile     max
----------------------  -----  --------------  --------  --------------  ------
ConvertCocoPolysToMask    728            2249      4591            9432   59412
PILToTensor               236             708       774             846    1740
RandomIoUCrop              34             480       772            9171  104918
RandomHorizontalFlip       16              21        30             458    3786
ConvertImageDtype         103             385       570             777    4690
----------------------  -----  --------------  --------  --------------  ------
Total                    2319            5576     11605           19896  111309
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.04s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='Tensor', api_version='v2'
transform                      min    25% quantile    median    75% quantile     max
---------------------------  -----  --------------  --------  --------------  ------
WrapCocoSampleForTransforms     79              99       110             125     653
PILToTensor                    351             469       500             535    1480
RandomIoUCrop                   69             558       689            9138  122733
RandomHorizontalFlip            36              41       242             335    2470
ConvertDtype                   104             246       398             633    2355
SanitizeBoundingBox            279             307       326             353     517
---------------------------  -----  --------------  --------  --------------  ------
Total                         1286            2004      2602           10835  126345
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.18s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='PIL', api_version='v1'
transform                 min    25% quantile    median    75% quantile     max
----------------------  -----  --------------  --------  --------------  ------
ConvertCocoPolysToMask    741            2264      4581            9549   57323
RandomIoUCrop              33             700      1048            9503  106169
RandomHorizontalFlip       15              35        46             481    3819
PILToTensor               107             269       357             488    2742
ConvertImageDtype          86             358       566             974    3076
----------------------  -----  --------------  --------  --------------  ------
Total                    2285            5427     11638           19999  113150
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.96s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='PIL', api_version='v2'
transform                      min    25% quantile    median    75% quantile     max
---------------------------  -----  --------------  --------  --------------  ------
WrapCocoSampleForTransforms     78              99       110             123     516
RandomIoUCrop                   59             698       922            9208  124672
RandomHorizontalFlip            32              43       232             335    1940
PILToTensor                    139             287       384             495    3926
ConvertDtype                   108             282       457             656    2265
SanitizeBoundingBox            275             306       326             348     518
---------------------------  -----  --------------  --------  --------------  ------
Total                         1198            2075      2693           10938  126231
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=12.91s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='Datapoint', api_version='v2'
transform                      min    25% quantile    median    75% quantile     max
---------------------------  -----  --------------  --------  --------------  ------
WrapCocoSampleForTransforms     83             106       119             139     325
ToImageTensor                  283             500       533             569    1478
RandomIoUCrop                   67             570       696            9153  121969
RandomHorizontalFlip            35              42       225             340    2471
ConvertDtype                   108             291       461             658    2428
SanitizeBoundingBox            269             303       322             345     405
---------------------------  -----  --------------  --------  --------------  ------
Total                         1188            2093      2644           10924  125690
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
Summary
                      [a]    [b]    [c]    [d]    [e]
------------------  -----  -----  -----  -----  -----
Tensor, v1     [a]   1.00   4.46   1.00   4.31   4.39
Tensor, v2     [b]   0.22   1.00   0.22   0.97   0.98
PIL, v1        [c]   1.00   4.47   1.00   4.32   4.40
PIL, v2        [d]   0.23   1.04   0.23   1.00   1.02
Datapoint, v2  [e]   0.23   1.02   0.23   0.98   1.00
Slowdown computed as row / column
############################################################
Collecting environment information...
PyTorch version: 2.1.0.dev20230403+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (conda-forge gcc 9.5.0-16) 9.5.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.25.2
Libc version: glibc-2.31
Python version: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-1019-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
Nvidia driver version: 525.85.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping: 7
CPU MHz: 2999.998
BogoMIPS: 5999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 1.5 MiB
L1i cache: 1.5 MiB
L2 cache: 48 MiB
L3 cache: 71.5 MiB
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] pytorch-pfn-extras==0.5.8
[pip3] pytorch-triton==2.1.0+46672772b4
[pip3] torch==2.1.0.dev20230403+cu117
[pip3] torchdata==0.5.0a0+25c6180
[pip3] torchvision==0.16.0a0+781f512
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] libblas 3.9.0 14_linux64_mkl conda-forge
[conda] libcblas 3.9.0 14_linux64_mkl conda-forge
[conda] liblapack 3.9.0 14_linux64_mkl conda-forge
[conda] liblapacke 3.9.0 14_linux64_mkl conda-forge
[conda] mkl 2022.0.1 h06a4308_117
[conda] numpy 1.22.3 pypi_0 pypi
[conda] pytorch-cuda 11.7 h778d358_3 pytorch-nightly
[conda] pytorch-pfn-extras 0.5.8 pypi_0 pypi
[conda] pytorch-triton 2.1.0+46672772b4 pypi_0 pypi
[conda] torch 2.1.0.dev20230403+cu117 pypi_0 pypi
[conda] torchdata 0.5.0a0+25c6180 dev_0 <develop>
[conda] torchvision 0.16.0a0+781f512 dev_0 <develop>
Using target_keys={"boxes", "masks", "labels"}:
############################################################
detection-ssdlite
############################################################
loading annotations into memory...
Done (t=14.37s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='Tensor', api_version='v1'
transform                 min    25% quantile    median    75% quantile     max
----------------------  -----  --------------  --------  --------------  ------
ConvertCocoPolysToMask    747            2220      4601            9411   58741
PILToTensor               251             717       787             853    1921
RandomIoUCrop              34             485       772            9208  105959
RandomHorizontalFlip       16              21        32             455    7729
ConvertImageDtype          87             386       555             743    2750
----------------------  -----  --------------  --------  --------------  ------
Total                    2228            5594     11608           19783  112173
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.04s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='Tensor', api_version='v2'
transform                      min    25% quantile    median    75% quantile     max
---------------------------  -----  --------------  --------  --------------  ------
WrapCocoSampleForTransforms    476            1289      2399            4918   29185
PILToTensor                    280             476       518             570    1410
RandomIoUCrop                   77             590       718            9146  121106
RandomHorizontalFlip            37              44       319             548    5096
ConvertDtype                   108             278       451             690    2469
SanitizeBoundingBox            338             536       754            1269   10103
---------------------------  -----  --------------  --------  --------------  ------
Total                         2361            4666      8570           16152  126087
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.31s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='PIL', api_version='v1'
transform                 min    25% quantile    median    75% quantile     max
----------------------  -----  --------------  --------  --------------  ------
ConvertCocoPolysToMask    722            2186      4523            9381   55098
RandomIoUCrop              31             783      1064            9416  105125
RandomHorizontalFlip       16              34        46             484    4472
PILToTensor                95             250       346             476    2694
ConvertImageDtype          79             347       555             959    3332
----------------------  -----  --------------  --------  --------------  ------
Total                    2147            5515     11561           20034  112346
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.98s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='PIL', api_version='v2'
transform                      min    25% quantile    median    75% quantile     max
---------------------------  -----  --------------  --------  --------------  ------
WrapCocoSampleForTransforms    475            1291      2429            4851   26084
RandomIoUCrop                   80             814      1106            9368  124367
RandomHorizontalFlip            37              46       304             575    5201
PILToTensor                    119             273       376             501    1787
ConvertDtype                   106             293       456             688    2173
SanitizeBoundingBox            346             528       747            1304   11644
---------------------------  -----  --------------  --------  --------------  ------
Total                         2401            4846      8653           16361  126969
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
loading annotations into memory...
Done (t=13.15s)
creating index...
index created!
Caching 1000 ([89444, 73295, 101719] ... [31395, 96727, 47807]) COCO samples
input_type='Datapoint', api_version='v2'
transform                      min    25% quantile    median    75% quantile     max
---------------------------  -----  --------------  --------  --------------  ------
WrapCocoSampleForTransforms    506            1292      2413            4823   25880
ToImageTensor                  236             561       702             857    1611
RandomIoUCrop                   73             598       731            9110  121680
RandomHorizontalFlip            37              45       318             562    3139
ConvertDtype                   120             321       475             709    2060
SanitizeBoundingBox            333             525       748            1285    9874
---------------------------  -----  --------------  --------  --------------  ------
Total                         2428            4968      8628           16411  126754
Results computed for 1_000 samples and reported in µs
------------------------------------------------------------
Summary
                      [a]    [b]    [c]    [d]    [e]
------------------  -----  -----  -----  -----  -----
Tensor, v1     [a]   1.00   1.35   1.00   1.34   1.35
Tensor, v2     [b]   0.74   1.00   0.74   0.99   0.99
PIL, v1        [c]   1.00   1.35   1.00   1.34   1.34
PIL, v2        [d]   0.75   1.01   0.75   1.00   1.00
Datapoint, v2  [e]   0.74   1.01   0.75   1.00   1.00
Slowdown computed as row / column
############################################################
This adds some results after pytorch/vision#7488. @NicolasHug could you run this on the cluster as well and push the results so we have them available for later?
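The Summary tables in the benchmark output report slowdown as row / column. As a rough check, the sketch below recomputes those ratios from the median "Total" of each configuration. That the summary is derived from the median totals is an assumption; it matches the printed tables to within rounding, since the logged medians are already rounded to whole µs (for example, PIL v2 over Tensor v2 comes out as 1.03 here versus 1.04 in the first table). The function and variable names are made up for this example.

```python
# Median "Total" per configuration in µs, copied from the benchmark tables.
MEDIAN_TOTAL_FIRST_RUN = {
    "Tensor, v1": 11605,
    "Tensor, v2": 2602,
    "PIL, v1": 11638,
    "PIL, v2": 2693,
    "Datapoint, v2": 2644,
}
# Second run, with target_keys={"boxes", "masks", "labels"}.
MEDIAN_TOTAL_MASKS_RUN = {
    "Tensor, v1": 11608,
    "Tensor, v2": 8570,
    "PIL, v1": 11561,
    "PIL, v2": 8653,
    "Datapoint, v2": 8628,
}


def slowdown_matrix(medians):
    """Return {row: {column: row_median / column_median}}."""
    return {
        row: {col: row_time / col_time for col, col_time in medians.items()}
        for row, row_time in medians.items()
    }


def print_matrix(title, matrix):
    """Print one ratio per cell, rounded to two decimals."""
    print(title)
    for row, ratios in matrix.items():
        cells = "  ".join(f"{ratio:5.2f}" for ratio in ratios.values())
        print(f"{row:<15}{cells}")


first_run = slowdown_matrix(MEDIAN_TOTAL_FIRST_RUN)
masks_run = slowdown_matrix(MEDIAN_TOTAL_MASKS_RUN)
print_matrix("first run", first_run)
print_matrix("run with masks requested", masks_run)
```

Read this way, v1 is about 4.5x slower than v2 in the first run and about 1.35x slower once the v2 wrapper also has to produce masks.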