intelligent-machine-learning/glake

Building PyTorch release 2.1 + GLake failed

SolenoidWGT opened this issue · 5 comments

Very cool work, and we really hope to use GLake in our LLM training. However, I failed when trying to compile GLake against PyTorch release 2.1. My system information and the error messages are below. Hope to get some help : )
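
For reference, the environment report below matches the output of PyTorch's standard diagnostic utility, python -m torch.utils.collect_env.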

Env & Sys info

Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (conda-forge gcc 13.1.0-0) 13.1.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.17

Python version: 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB

Nvidia driver version: 535.104.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    32
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 106
Model name:            Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz
Stepping:              6
CPU MHz:               3199.853
CPU max MHz:           3400.0000
CPU min MHz:           800.0000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             48K
L1i cache:             32K
L2 cache:              1280K
L3 cache:              49152K
NUMA node0 CPU(s):     0-31,64-95
NUMA node1 CPU(s):     32-63,96-127
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq spec_ctrl intel_stibp flush_l1d arch_capabilities

Failure log

FAILED: c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o 
ccache /mnt/petrelfs/share_data/llm_env/dep/gcc-10.2.0/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dc10_cuda_EXPORTS -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/aten/src -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/aten/src -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/benchmark/include -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/onnx -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/third_party/onnx -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/foxi -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/third_party/foxi -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/../.. -I/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/.. -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/build/third_party/gloo -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/gloo -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/googletest/googletest/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/protobuf/src -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/gemmlowp -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/neon2sse -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/XNNPACK/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/third_party/ittapi/include -isystem /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/cmake/../third_party/eigen -isystem /mnt/petrelfs/share_data/llm_env/dep/cuda-11.8/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -DC10_CUDA_BUILD_MAIN_LIB -fvisibility=hidden -DPYTORCH_C10_DRIVER_API_SUPPORTED -MD -MT c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -MF c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o.d -o c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -c 
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp
In file included from /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:27:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h:40:12: warning: ‘gmlakeInfoLevel’ initialized and declared ‘extern’
   40 | extern int gmlakeInfoLevel = -1;
      |            ^~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h: In function ‘size_t getGranularitySize()’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h:157:20: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
  157 |     if(granularity == -1) {
      |        ~~~~~~~~~~~~^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/cuda_vmm_allocator.h:207:8: warning: ‘BlockSegment’ has a field ‘BlockSegment::block’ whose type uses the anonymous namespace [-Wsubobject-linkage]
  207 | struct BlockSegment
      |        ^~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘uint64_t c10::cuda::CUDACachingAllocator::Native::{anonymous}::EventIDCounter::next_id()’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:194:5: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  194 |     else
      |     ^~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:197:7: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘else’
  197 |       return current_event_id;
      |       ^~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:335:3: error: ‘History’ does not name a type
  335 |   History h;
      |   ^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In constructor ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::Block(int, cudaStream_t, size_t, c10::cuda::CUDACachingAllocator::Native::{anonymous}::BlockPool*, void*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:346:10: warning: ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::actual_size’ will be initialized after [-Wreorder]
  346 |   size_t actual_size;
      |          ^~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:345:10: warning:   ‘size_t c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::requested_size’ [-Wreorder]
  345 |   size_t requested_size; // memory originally requested
      |          ^~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:360:3: warning:   when initialized here [-Wreorder]
  360 |   Block(
      |   ^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:358:31: warning: ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::self_last_event’ will be initialized after [-Wreorder]
  358 |   std::shared_ptr<BlockEvent> self_last_event;
      |                               ^~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:348:9: warning:   ‘void* c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::ptr’ [-Wreorder]
  348 |   void* ptr{nullptr}; // memory address
      |         ^~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:360:3: warning:   when initialized here [-Wreorder]
  360 |   Block(
      |   ^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In constructor ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::Block(int, cudaStream_t, size_t)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:358:31: warning: ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::self_last_event’ will be initialized after [-Wreorder]
  358 |   std::shared_ptr<BlockEvent> self_last_event;
      |                               ^~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:345:10: warning:   ‘size_t c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block::requested_size’ [-Wreorder]
  345 |   size_t requested_size; // memory originally requested
      |          ^~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:377:3: warning:   when initialized here [-Wreorder]
  377 |   Block(int device, cudaStream_t stream, size_t size)
      |   ^~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In function ‘int c10::cuda::CUDACachingAllocator::Native::{anonymous}::trimHistoryBefore(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*, void*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:511:44: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
  511 |   while (block->history && block->history->h.addr < point) {
      |                                            ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3663:23: error: ‘Context’ was not declared in this scope; did you mean ‘CUcontext’?
 3663 |       std::shared_ptr<Context> context) {
      |                       ^~~~~~~
      |                       CUcontext
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3663:30: error: template argument 1 is invalid
 3663 |       std::shared_ptr<Context> context) {
      |                              ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block* c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::malloc(int, size_t, cudaStream_t)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1062:21: error: ‘Context’ was not declared in this scope; did you mean ‘CUcontext’?
 1062 |     std::shared_ptr<Context> context =
      |                     ^~~~~~~
      |                     CUcontext
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1062:28: error: template argument 1 is invalid
 1062 |     std::shared_ptr<Context> context =
      |                            ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1063:26: error: cannot convert ‘std::shared_ptr<c10::GatheredContext>’ to ‘int’ in initialization
 1063 |         context_recorder ? context_recorder() : nullptr;
      |         ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                          |
      |                          std::shared_ptr<c10::GatheredContext>
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1266:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 1266 |             for(int i=1; i<phy_block->mapped_blocks.size(); i++) {
      |                          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1490:11: error: ‘History’ was not declared in this scope
 1490 |           History{block->ptr, orig_size, std::move(context)},
      |           ^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1489:67: error: expected primary-expression before ‘{’ token
 1489 |       block->history = std::make_unique<HistoryChain>(HistoryChain{
      |                                                                   ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1500:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 1500 |           block->history->h.context);
      |                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::free(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1562:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 1562 |           block->history->h.real_size,
      |                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1564:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 1564 |           block->history->h.context);
      |                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::update_block(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1685:30: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 1685 |             for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
      |                            ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘std::vector<c10::cuda::CUDACachingAllocator::SegmentInfo> c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::snapshot()’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1868:22: error: ‘struct c10::cuda::CUDACachingAllocator::BlockInfo’ has no member named ‘history’
 1868 |           block_info.history.push_back(h->h);
      |                      ^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1868:43: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 1868 |           block_info.history.push_back(h->h);
      |                                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:1884:62: error: cannot convert ‘std::nullptr_t’ to ‘int’
 1884 |       record_trace(TraceEntry::SNAPSHOT, 0, total_active, 0, nullptr);
      |                                                              ^~~~~~~
      |                                                              |
      |                                                              std::nullptr_t
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3663:32: note:   initializing argument 5 of ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::record_trace(c10::cuda::CUDACachingAllocator::TraceEntry::Action, int64_t, size_t, cudaStream_t, int)’
 3663 |       std::shared_ptr<Context> context) {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::free_block(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*, bool)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2059:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 2059 |           block->history->h.real_size,
      |                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2061:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 2061 |           block->history->h.context);
      |                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘size_t c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::garbage_collect_fused_blocks(int, size_t)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2571:30: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 2571 |             for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
      |                            ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2597:73: warning: comparison of integer expressions of different signedness: ‘long int’ and ‘std::vector<std::shared_ptr<VirBlock> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 2597 |           if(block->vmm_segment->vir_blocks[0]->vir_dev_ptr.use_count() != block->vmm_segment->vir_blocks.size()) {
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2646:32: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 2646 |               for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
      |                              ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2674:75: warning: comparison of integer expressions of different signedness: ‘long int’ and ‘std::vector<std::shared_ptr<VirBlock> >::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 2674 |             if(block->vmm_segment->vir_blocks[0]->vir_dev_ptr.use_count() != block->vmm_segment->vir_blocks.size()) {
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘bool c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::get_fused_fragmented_blocks(c10::cuda::CUDACachingAllocator::Native::{anonymous}::AllocParams&, int)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:2814:19: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 2814 |         if (index == blocks2fuse.size() - 1 && (fuse_size - p.search_key.size) >= kGranularity) continue;
      |             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::release_block(c10::cuda::CUDACachingAllocator::Native::{anonymous}::Block*)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3439:34: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<BlockSegment>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
 3439 |                 for(int j = 0; j < phy_block->mapped_blocks.size(); j++) {
      |                                ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3499:27: error: ‘struct c10::cuda::CUDACachingAllocator::Native::{anonymous}::HistoryChain’ has no member named ‘h’
 3499 |           block->history->h.context);
      |                           ^
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: In member function ‘void c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::record_trace(c10::cuda::CUDACachingAllocator::TraceEntry::Action, int64_t, size_t, cudaStream_t, int)’:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3669:37: error: operands to ‘?:’ have different types ‘std::remove_reference<int&>::type’ {aka ‘int’} and ‘std::nullptr_t’
 3669 |         alloc_trace_record_context_ ? std::move(context) : nullptr);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp: At global scope:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3802:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::recordHistory(bool, c10::cuda::CUDACachingAllocator::CreateContextFn, size_t, bool)’ marked ‘override’, but does not override
 3802 |   void recordHistory(
      |        ^~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3925:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureBegin(int, c10::cuda::CaptureId_t, c10::cuda::MempoolId_t)’ marked ‘override’, but does not override
 3925 |   void notifyCaptureBegin(
      |        ^~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3934:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureAboutToEnd(int, c10::cuda::CaptureId_t)’ marked ‘override’, but does not override
 3934 |   void notifyCaptureAboutToEnd(int device, CaptureId_t graph_id) override {
      |        ^~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3939:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureEnded(int, c10::cuda::CaptureId_t)’ marked ‘override’, but does not override
 3939 |   void notifyCaptureEnded(int device, CaptureId_t graph_id) override {} // no-op
      |        ^~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3941:8: error: ‘void c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::notifyCaptureDestroy(int, c10::cuda::MempoolId_t)’ marked ‘override’, but does not override
 3941 |   void notifyCaptureDestroy(int device, MempoolId_t mempool_id) override {
      |        ^~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3967:8: error: ‘bool c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::needsPoolSpecificPeerAccess()’ marked ‘override’, but does not override
 3967 |   bool needsPoolSpecificPeerAccess() override {
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:4031:24: error: cannot declare variable ‘c10::cuda::CUDACachingAllocator::Native::allocator’ to be of abstract type ‘c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator’
 4031 | NativeCachingAllocator allocator;
      |                        ^~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:3700:7: note:   because the following virtual functions are pure within ‘c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator’:
 3700 | class NativeCachingAllocator : public CUDAAllocator {
      |       ^~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.cpp:14:
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:218:16: note:     ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::beginAllocateStreamToPool(int, cudaStream_t, c10::cuda::MempoolId_t)’
  218 |   virtual void beginAllocateStreamToPool(
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:222:16: note:     ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::endAllocateStreamToPool(int, cudaStream_t)’
  222 |   virtual void endAllocateStreamToPool(int device, cudaStream_t stream) = 0;
      |                ^~~~~~~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:223:16: note:     ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::releasePool(int, c10::cuda::MempoolId_t)’
  223 |   virtual void releasePool(int device, MempoolId_t mempool_id) = 0;
      |                ^~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:243:16: note:     ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::recordHistory(bool, c10::cuda::CUDACachingAllocator::CreateContextFn, size_t, c10::cuda::CUDACachingAllocator::RecordContext)’
  243 |   virtual void recordHistory(
      |                ^~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:250:16: note:     ‘virtual void c10::cuda::CUDACachingAllocator::CUDAAllocator::enablePeerAccess(int, int)’
  250 |   virtual void enablePeerAccess(int dev, int dev_to_access) = 0;
      |                ^~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:263:23: note:     ‘virtual cudaError_t c10::cuda::CUDACachingAllocator::CUDAAllocator::memcpyAsync(void*, int, const void*, int, size_t, cudaStream_t, bool)’
  263 |   virtual cudaError_t memcpyAsync(
      |                       ^~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:271:43: note:     ‘virtual std::shared_ptr<c10::cuda::CUDACachingAllocator::AllocatorState> c10::cuda::CUDACachingAllocator::CUDAAllocator::getCheckpointState(int, c10::cuda::MempoolId_t)’
  271 |   virtual std::shared_ptr<AllocatorState> getCheckpointState(
      |                                           ^~~~~~~~~~~~~~~~~~
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/c10/cuda/CUDACachingAllocator.h:274:27: note:     ‘virtual c10::cuda::CUDACachingAllocator::CheckpointDelta c10::cuda::CUDACachingAllocator::CUDAAllocator::setCheckpointPoolState(int, std::shared_ptr<c10::cuda::CUDACachingAllocator::AllocatorState>)’
  274 |   virtual CheckpointDelta setCheckpointPoolState(
      |                           ^~~~~~~~~~~~~~~~~~~~~~
cc1plus: note: unrecognized command-line option ‘-Wno-aligned-allocation-unavailable’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-unused-private-field’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-invalid-partial-specialization’ may have been intended to silence earlier diagnostics
[4418/6619] Linking CXX executable bin/c10_string_view_test
[4419/6619] Building CXX object c10/test/CMakeFiles/c10_typeid_test.dir/util/typeid_test.cpp.o
[4420/6619] Linking C static library sleef/lib/libsleef.a
[4421/6619] Building CXX object c10/test/CMakeFiles/c10_ordered_preserving_dict_test.dir/util/ordered_preserving_dict_test.cpp.o
[4422/6619] Generating /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/Functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_3.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_3.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/TraceType_4.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/RegisterLazy.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/Functions.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/variable_factories.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/VariableType.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyIr.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyNonNativeIr.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_3.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions_4.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_torch_functions_1.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_torch_functions_2.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_nn_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_fft_functions.cpp, 
/mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_linalg_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_nested_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_sparse_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_special_functions.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_return_types.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_enum_tag.cpp, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/csrc/autograd/generated/python_functions.h, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/testing/_internal/generated/annotated_fn_args.py
[4423/6619] Generating /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/_C/__init__.pyi, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/_C/_VariableFunctions.pyi, /mnt/hwfile/share_data/wangguoteng.p/pytorch21/pytorch/torch/nn/functional.pyi
[4424/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_from_2_processes.dir/impl/CUDAAssertionsTest_from_2_processes.cu.o
[4425/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device.dir/impl/CUDAAssertionsTest_catches_thread_and_block_and_device.cu.o
[4426/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o
[4427/6619] Building CXX object c10/test/CMakeFiles/c10_optional_test.dir/util/optional_test.cpp.o
[4428/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_catches_stream.dir/impl/CUDAAssertionsTest_catches_stream.cu.o
[4429/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads.dir/impl/CUDAAssertionsTest_multiple_writes_from_blocks_and_threads.cu.o
[4430/6619] Building CXX object caffe2/CMakeFiles/vec_test_all_types_AVX512.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[4431/6619] Building CXX object caffe2/CMakeFiles/vec_test_all_types_DEFAULT.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[4432/6619] Building CXX object caffe2/CMakeFiles/vec_test_all_types_AVX2.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[4433/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block.dir/impl/CUDAAssertionsTest_multiple_writes_from_same_block.cu.o
[4434/6619] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks.dir/impl/CUDAAssertionsTest_multiple_writes_from_multiple_blocks.cu.o
[4435/6619] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/torch.pb.cc.o
[4436/6619] Building CXX object c10/test/CMakeFiles/c10_either_test.dir/util/either_test.cpp.o
[4437/6619] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/caffe2.pb.cc.o
[4438/6619] Building CXX object c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/util/intrusive_ptr_test.cpp.o
[4439/6619] Building CXX object c10/test/CMakeFiles/c10_small_vector_test.dir/util/small_vector_test.cpp.o
[4440/6619] Performing build step for 'nccl_external'
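
For anyone triaging the same failure: the notes at the end of the log list the PyTorch 2.1 CUDAAllocator entry points that the 2.0-era GMLake allocator no longer overrides. The old notifyCapture* hooks appear to have been replaced by the AllocateStreamToPool/releasePool family, and the History/Context types used by the allocator tracing were reworked around c10::GatheredContext. Below is a minimal sketch of the missing overrides, with signatures copied from the compiler diagnostics above; the empty bodies are illustrative placeholders, not GLake's actual port.

// Sketch only: signatures taken from the pure-virtual notes in the log above.
// These are the overrides GMLake's NativeCachingAllocator would need to grow
// to satisfy the PyTorch 2.1 CUDAAllocator interface; bodies are placeholders.
#include <c10/cuda/CUDACachingAllocator.h>

namespace c10::cuda::CUDACachingAllocator::Native {

class NativeCachingAllocator : public CUDAAllocator {
 public:
  // ... GMLake's existing overrides (malloc, free, snapshot, ...) elided ...

  // 2.1 replaces the 2.0 notifyCaptureBegin/AboutToEnd/Ended/Destroy hooks:
  void beginAllocateStreamToPool(
      int device,
      cudaStream_t stream,
      MempoolId_t mempool_id) override {}
  void endAllocateStreamToPool(int device, cudaStream_t stream) override {}
  void releasePool(int device, MempoolId_t mempool_id) override {}

  // recordHistory now takes a RecordContext enum instead of a trailing bool:
  void recordHistory(
      bool enabled,
      CreateContextFn context_recorder,
      size_t alloc_trace_max_entries,
      RecordContext when) override {}

  void enablePeerAccess(int dev, int dev_to_access) override {}

  cudaError_t memcpyAsync(
      void* dst,
      int dstDevice,
      const void* src,
      int srcDevice,
      size_t count,
      cudaStream_t stream,
      bool p2p_enabled) override {
    return cudaSuccess; // placeholder
  }

  // Pool-state checkpointing entry points that are pure virtual in 2.1:
  std::shared_ptr<AllocatorState> getCheckpointState(
      int device,
      MempoolId_t id) override {
    return nullptr; // placeholder
  }
  CheckpointDelta setCheckpointPoolState(
      int device,
      std::shared_ptr<AllocatorState> pps) override {
    return CheckpointDelta{}; // placeholder
  }
};

} // namespace c10::cuda::CUDACachingAllocator::Native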

Thanks for your interest. The version of GMLake we released is compatible with PyTorch 2.0. The official PyTorch 2.1 was released in October, and we are adapting our work to it. We will release the port as soon as possible. Thanks.

Looking forward to torch 2.1 support!

Any progress or roadmap?

We have posted our roadmap; please see #14.