Building PyTorch release 2.1 + GLake failed
SolenoidWGT opened this issue · 5 comments
SolenoidWGT commented
Very cool work! We really hope to use GLake in our LLM training. However, I failed when trying to compile GLake against PyTorch release 2.1. My system information and the error messages are below. Hope to get some help : )
Env & Sys info
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (conda-forge gcc 13.1.0-0) 13.1.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.17
Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Nvidia driver version: 535.104.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz
Stepping: 6
CPU MHz: 3199.853
CPU max MHz: 3400.0000
CPU min MHz: 800.0000
BogoMIPS: 5200.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq spec_ctrl intel_stibp flush_l1d arch_capabilities
Failure message
FAILED: c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o
ccache /mnt/petrelfs/share_data/llm_env/dep/gcc-10.2.0/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dc10_cuda_EXPORTS -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/aten/src -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/aten/src -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/benchmark/include -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/onnx -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/third_party/onnx -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/foxi -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/third_party/foxi -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/../.. -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/.. -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/third_party/gloo -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/gloo -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/googletest/googletest/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/protobuf/src -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/gemmlowp -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/neon2sse -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/XNNPACK/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/ittapi/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/eigen -isystem /mnt/petrelfs/share_data/llm_env/dep/cuda-11.8/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -DC10_CUDA_BUILD_MAIN_LIB -fvisibility=hidden -DPYTORCH_C10_DRIVER_API_SUPPORTED -MD -MT c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -MF c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o.d -o c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -c 
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp
In file included from /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:27:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h:40:12: warning: ‘gmlakeInfoLevel’ initialized and declared ‘extern’
40 | extern int gmlakeInfoLevel = -1;
| ^~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h: In function ‘size_t getGranularitySize()’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h:157:20: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
157 | if(granularity == -1) {
| ~~~~~~~~~~~~^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h:207:8: warning: ‘BlockSegment’ has a field ‘BlockSegment::block’ whose type uses the anonymous namespace [-Wsubobject-linkage]
207 | struct BlockSegment
| ^~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘uint64_t c10::cuda::CUDACachingAllocator::Native::{anonymous}::EventIDCounter::next_id()’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:194:5: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
194 | else
| ^~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:197:7: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘else’
197 | return current_event_id;
| ^~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:335:3: error: ‘History’ does not name a type
335 | History h;
| ^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In constructor ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::Block(int, cudaStream_t, size_t, c10::cuda::CUDACachingAllocator::Native::{anonymous}::BlockPool*, void*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:346:10: warning: ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::actual_size’ will be initialized after [-Wreorder]
346 | size_t actual_size;
| ^~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:345:10: warning: ‘size_t c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::requested_size’ [-Wreorder]
345 | size_t requested_size; // memory originally requested
| ^~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:360:3: warning: when initialized here [-Wreorder]
360 | Block(
| ^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:358:31: warning: ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::self_last_event’ will be initialized after [-Wreorder]
358 | std::shared_ptr<BlockEvent> self_last_event;
| ^~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:348:9: warning: ‘void* c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::ptr’ [-Wreorder]
348 | void* ptr{nullptr}; // memory address
| ^~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:360:3: warning: when initialized here [-Wreorder]
360 | Block(
| ^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In constructor ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::Block(int, cudaStream_t, size_t)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:358:31: warning: ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::self_last_event’ will be initialized after [-Wreorder]
358 | std::shared_ptr<BlockEvent> self_last_event;
| ^~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:345:10: warning: ‘size_t c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::requested_size’ [-Wreorder]
345 | size_t requested_size; // memory originally requested
| ^~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:377:3: warning: when initialized here [-Wreorder]
377 | Block(int device, cudaStream_t stream, size_t size)
| ^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In function ‘int c10::cuda::CUDACachingAllocator::Native::{anonymous}::trimHistoryBefore(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*, void*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:511:44: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
511 | while (block->history && block->history->h.addr < point) {
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3663:23: error: ‘Context’ was not declared in this scope; did you mean ‘CUcontext’?
3663 | std::shared_ptr<Context> context) {
| ^~~~~~~
| CUcontext
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3663:30: error: template argument 1 is invalid
3663 | std::shared_ptr<Context> context) {
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block* c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::malloc(int, size_t, cudaStream_t)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1062:21: error: ‘Context’ was not declared in this scope; did you mean ‘CUcontext’?
1062 | std::shared_ptr<Context> context =
| ^~~~~~~
| CUcontext
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1062:28: error: template argument 1 is invalid
1062 | std::shared_ptr<Context> context =
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1063:26: error: cannot convert ‘std::shared_ptr<c10::GatheredContext>’ to ‘int’ in initialization
1063 | context_recorder ? context_recorder() : nullptr;
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| std::shared_ptr<c10::GatheredContext>
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1266:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
1266 | for(int i=1; i<phy_block->mapped_blocks.size(); i++) {
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1490:11: error: ‘History’ was not declared in this scope
1490 | History{block->ptr, orig_size, std::move(context)},
| ^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1489:67: error: expected primary-expression before ‘{’ token
1489 | block->history = std::make_unique<HistoryChain>(HistoryChain{
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1500:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
1500 | block->history->h.context);
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::free(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1562:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
1562 | block->history->h.real_size,
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1564:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
1564 | block->history->h.context);
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::update_block(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1685:30: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
1685 | for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘std::vector<c10::cuda::CUDACachingAllocator::SegmentInfo> c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::snapshot()’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1868:22: error: ‘struct c10::cuda::CUDACachingAllocator::BlockInfo’ has no member named ‘history’
1868 | block_info.history.push_back(h->h);
| ^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1868:43: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
1868 | block_info.history.push_back(h->h);
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1884:62: error: cannot convert ‘std::nullptr_t’ to ‘int’
1884 | record_trace(TraceEntry::SNAPSHOT, 0, total_active, 0, nullptr);
| ^~~~~~~
| |
| std::nullptr_t
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3663:32: note: initializing argument 5 of ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::record_trace(c10::cuda::CUDACachingAllocator::TraceEntry::Action, int64_t, size_t, cudaStream_t, int)’
3663 | std::shared_ptr<Context> context) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::free_block(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*, bool)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2059:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
2059 | block->history->h.real_size,
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2061:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
2061 | block->history->h.context);
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘size_t c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::garbage_collect_fused_blocks(int, size_t)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2571:30: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
2571 | for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2597:73: warning: comparison of integer expressions of different signedness: ‘long int’ and ‘std::vector<std::shared_ptr<VirBlock> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
2597 | if(block->vmm_segment->vir_blocks[0]->vir_dev_ptr.use_count() != block->vmm_segment->vir_blocks.size()) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2646:32: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
2646 | for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2674:75: warning: comparison of integer expressions of different signedness: ‘long int’ and ‘std::vector<std::shared_ptr<VirBlock> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
2674 | if(block->vmm_segment->vir_blocks[0]->vir_dev_ptr.use_count() != block->vmm_segment->vir_blocks.size()) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘bool c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::get_fused_fragmented_blocks(c10::cuda::CUDACachingAllocator::Native::{anonymous}::AllocParams&, int)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2814:19: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
2814 | if (index == blocks2fuse.size() - 1 && (fuse_size - p.search_key.size) >= kGranularity) continue;
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::release_block(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3439:34: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
3439 | for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
| ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3499:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
3499 | block->history->h.context);
| ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::record_trace(c10::cuda::CUDACachingAllocator::TraceEntry::Action, int64_t, size_t, cudaStream_t, int)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3669:37: error: operands to ‘?:’ have different types ‘std::remove_reference<int&>::type’ {aka ‘int’} and ‘std::nullptr_t’
3669 | alloc_trace_record_context_ ? std::move(context) : nullptr);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3802:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::recordHistory(bool, c10::cuda::CUDACachingAllocator::CreateContextFn, size_t, bool)’ marked ‘override’, but does not override
3802 | void recordHistory(
| ^~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3925:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureBegin(int, c10::cuda::CaptureId_t, c10::cuda::MempoolId_t)’ marked ‘override’, but does not override
3925 | void notifyCaptureBegin(
| ^~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3934:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureAboutToEnd(int, c10::cuda::CaptureId_t)’ marked ‘override’, but does not override
3934 | void notifyCaptureAboutToEnd(int device, CaptureId_t graph_id) override {
| ^~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3939:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureEnded(int, c10::cuda::CaptureId_t)’ marked ‘override’, but does not override
3939 | void notifyCaptureEnded(int device, CaptureId_t graph_id) override {} // no-op
| ^~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3941:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureDestroy(int, c10::cuda::MempoolId_t)’ marked ‘override’, but does not override
3941 | void notifyCaptureDestroy(int device, MempoolId_t mempool_id) override {
| ^~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3967:8: error: ‘bool c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::needsPoolSpecificPeerAccess()’ marked ‘override’, but does not override
3967 | bool needsPoolSpecificPeerAccess() override {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:4031:24: error: cannot declare variable ‘c10::cuda::CUDACachingAllocator::Native::allocator’ to be of abstract type ‘c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator’
4031 | NativeCachingAllocator allocator;
| ^~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3700:7: note: because the following virtual functions are pure within ‘c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator’:
3700 | class NativeCachingAllocator : public CUDAAllocator {
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:14:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:218:16: note: ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::beginAllocateStreamToPool(int, cudaStream_t, c10::cuda::MempoolId_t)’
218 | virtual void beginAllocateStreamToPool(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:222:16: note: ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::endAllocateStreamToPool(int, cudaStream_t)’
222 | virtual void endAllocateStreamToPool(int device, cudaStream_t stream) = 0;
| ^~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:223:16: note: ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::releasePool(int, c10::cuda::MempoolId_t)’
223 | virtual void releasePool(int device, MempoolId_t mempool_id) = 0;
| ^~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:243:16: note: ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::recordHistory(bool, c10::cuda::CUDACachingAllocator::CreateContextFn, size_t, c10::cuda::CUDACachingAllocator::RecordContext)’
243 | virtual void recordHistory(
| ^~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:250:16: note: ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::enablePeerAccess(int, int)’
250 | virtual void enablePeerAccess(int dev, int dev_to_access) = 0;
| ^~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:263:23: note: ‘virtual cudaError_t c10::cuda::CUDACachingAllocator::CUDAAllocator::memcpyAsync(void*, int, const void*, int, size_t, cudaStream_t, bool)’
263 | virtual cudaError_t memcpyAsync(
| ^~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:271:43: note: ‘virtual std::shared_ptr<c10::cuda::CUDACachingAllocator::AllocatorState> c10::cuda::CUDACachingAllocator::CUDAAllocator::getCheckpointState(int, c10::cuda::MempoolId_t)’
271 | virtual std::shared_ptr<AllocatorState> getCheckpointState(
| ^~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:274:27: note: ‘virtual c10::cuda::CUDACachingAllocator::CheckpointDelta c10::cuda::CUDACachingAllocator::CUDAAllocator::setCheckpointPoolState(int, std::shared_ptr<c10::cuda::CUDACachingAllocator::AllocatorState>)’
274 | virtual CheckpointDelta setCheckpointPoolState(
| ^~~~~~~~~~~~~~~~~~~~~~
cc1plus: note: unrecognized command-line option ‘-Wno-aligned-allocation-unavailable’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-unused-private-field’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-invalid-partial-specialization’ may have been intended to silence earlier diagnostics
[4418/6619] Linking CXX executable bin/c10_string_view_test
[4419/6619] Building CXX object c10/test/CMakeFiles/c10_typeid_test.dir/util/typeid_test.cpp.o
[4420/6619] Linking C static library sleef/lib/libsleef.a
[4421/6619] Building CXX object c10/test/CMakeFiles/c10_ordered_preserving_dict_test.dir/util/ordered_preserving_dict_test.cpp.o
[4422/6619] Generating /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/Functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_3.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_3.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_4.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/RegisterLazy.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/Functions.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/variable_factories.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyIr.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyNonNativeIr.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_3.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_4.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_torch_functions_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_torch_functions_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_nn_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_fft_functions.cpp, 
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_linalg_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_nested_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_sparse_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_special_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_return_types.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_enum_tag.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/testing/_internal/generated/annotated_fn_args.py
[4423/6619] Generating /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/_C/__init__.pyi, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/_C/_VariableFunctions.pyi, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/nn/functional.pyi
[4424/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_from_2_processes.dir/impl/CUDAAssertionsTest_from_2_processes.cu.o
[4425/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device.dir/impl/CUDAAssertionsTest_catches_thread_and_block_and_device.cu.o
[4426/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o
[4427/6619] Building CXX object c10/test/CMakeFiles/c10_optional_test.dir/util/optional_test.cpp.o
[4428/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_catches_stream.dir/impl/CUDAAssertionsTest_catches_stream.cu.o
[4429/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads.dir/impl/CUDAAssertionsTest_multiple_writes_from_blocks_and_threads.cu.o
[4430/6619] Building CXX object caffe2/CMakeFiles/vec_test_all_types_AVX512.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[4431/6619] Building CXX object caffe2/CMakeFiles/vec_test_all_types_DEFAULT.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[4432/6619] Building CXX object caffe2/CMakeFiles/vec_test_all_types_AVX2.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[4433/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block.dir/impl/CUDAAssertionsTest_multiple_writes_from_same_block.cu.o
[4434/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks.dir/impl/CUDAAssertionsTest_multiple_writes_from_multiple_blocks.cu.o
[4435/6619] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/torch.pb.cc.o
[4436/6619] Building CXX object c10/test/CMakeFiles/c10_either_test.dir/util/either_test.cpp.o
[4437/6619] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/caffe2.pb.cc.o
[4438/6619] Building CXX object c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/util/intrusive_ptr_test.cpp.o
[4439/6619] Building CXX object c10/test/CMakeFiles/c10_small_vector_test.dir/util/small_vector_test.cpp.o
[4440/6619] Performing build step for 'nccl_external'
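From the log, the hard errors ('History' does not name a type, recordHistory marked 'override' but does not override, and NativeCachingAllocator being abstract) look like an interface mismatch between the GMLake-patched CUDACachingAllocator.cpp and the CUDAAllocator base class shipped in the release/2.1 headers, which added new pure virtual methods and changed some signatures. Below is a minimal, self-contained C++ sketch of that failure mode, using hypothetical class and method names rather than the actual PyTorch types:

// Minimal sketch (hypothetical names, not the real PyTorch classes) of the
// "marked 'override', but does not override" / abstract-type errors above:
// when a base interface gains new pure virtuals or changes a signature
// between versions, a subclass written against the old interface no longer
// compiles, and any un-overridden pure virtual makes it abstract.
#include <cstddef>
#include <new>

// Stand-in for the older allocator interface the patch was written against.
struct AllocatorV20 {
  virtual ~AllocatorV20() = default;
  virtual void* raw_alloc(std::size_t nbytes) = 0;
};

// Stand-in for the newer interface, with an extra pure virtual playing the
// role of the added methods reported in the log (beginAllocateStreamToPool,
// releasePool, getCheckpointState, ...).
struct AllocatorV21 : AllocatorV20 {
  virtual void releasePool(int device) = 0;
};

// A concrete allocator must override every pure virtual of the interface it
// actually builds against; otherwise "cannot declare variable ... to be of
// abstract type" is emitted where the global instance is defined.
struct PatchedAllocator final : AllocatorV21 {
  void* raw_alloc(std::size_t nbytes) override { return ::operator new(nbytes); }
  void releasePool(int /*device*/) override {}  // newly required in this sketch
};

PatchedAllocator allocator;  // analogous to the failing `NativeCachingAllocator allocator;`

int main() {
  void* p = allocator.raw_alloc(64);
  ::operator delete(p);
  return 0;
}

Compiled with g++ -std=c++17, this builds only because PatchedAllocator overrides every pure virtual; dropping releasePool reproduces the same "abstract type" diagnostic seen in the log, which is why I suspect the released GMLake sources target the older allocator interface.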
ruizhang1230 commented
Thanks for your interest. The version of GMLake we released is compatible with PyTorch 2.0. The official version of PyTorch 2.1 was released in October, and we are adapting our work to it. We will release it as soon as possible. Thanks.
SolenoidWGT commented
Looking forward to torch 2.1 support!
johnzlli commented
Any progress or roadmap?
ruizhang1230 commented
We have posted our roadmap; please check the link: #14