facebookresearch/HolisticTraceAnalysis

Dictionary lookups fail when runtime kernels are absent in the trace

anupambhatnagar opened this issue · 2 comments

๐Ÿ› Describe the bug

The lookups below fail when any of these runtime kernels is missing from the trace.

cudaLaunchKernel_id = sym_index["cudaLaunchKernel"]
cudaMemcpyAsync_id = sym_index["cudaMemcpyAsync"]
cudaMemsetAsync_id = sym_index["cudaMemsetAsync"]

Steps to reproduce

Use a trace without the cudaMemsetAsync kernel.

Expected behavior

The lookups should work in all cases. The fix is to use .get with a default value of None.
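
A minimal sketch of the proposed fix, assuming sym_index is a plain Python dict that maps symbol names to integer ids. The dictionary contents and the runtime_ids list are made up for illustration and are not taken from the HTA source:

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the symbol table built from a trace.
# This trace has no cudaMemsetAsync entry.
sym_index: Dict[str, int] = {"cudaLaunchKernel": 0, "cudaMemcpyAsync": 1}

# dict.get returns None for a missing key instead of raising KeyError.
cudaLaunchKernel_id: Optional[int] = sym_index.get("cudaLaunchKernel")
cudaMemcpyAsync_id: Optional[int] = sym_index.get("cudaMemcpyAsync")
cudaMemsetAsync_id: Optional[int] = sym_index.get("cudaMemsetAsync")

# Downstream code then has to skip the ids that are absent.
runtime_ids: List[int] = [
    i
    for i in (cudaLaunchKernel_id, cudaMemcpyAsync_id, cudaMemsetAsync_id)
    if i is not None
]
```

Any code that later filters events by these ids would need to handle the None case, for example by dropping missing ids as above.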

Environment

Fails on both macOS and Linux with HTA 0.1.2 and Python >= 3.8.

Additional Info

No response

+1 on this issue. I am not sure whether the profiler has to be configured a certain way.

For reference, I am trying to analyze an inference trace that has no ranks. I manually added "distributedInfo": {"rank": 0} to the JSON trace. It would be nice to have a mode that enables single-file analysis.

I found that I can't run get_queue_length_time_series or get_queue_length_summary.

Also, after installing from pip, calling get_cuda_launch_kernel_info raises:

AttributeError: 'TraceAnalysis' object has no attribute 'get_cuda_launch_kernel_info'

Hi @drisspg, thanks for the feedback.

  1. There is already a mode that allows the user to specify a single file. See the trace_files option in the API. You will need to pass a dictionary to the trace_files argument whose key is the rank and whose value is the full path to the trace file.

  2. The README will be updated soon with the corrected name. Please use get_cuda_kernel_launch_stats instead of get_cuda_launch_kernel_info.

  3. With respect to get_queue_length_*, please check that the trace file has rank 0. If it has a different rank, use the ranks argument to pass that value. If the error persists, please open a bug issue and provide us with the trace file, if possible.
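
For point 1, the mapping could be built roughly as follows. This is only a sketch: the helper single_trace_files and the path traces/inference_trace.json are hypothetical, and the commented-out call assumes HTA is installed and the file exists.

```python
import os
from typing import Dict


def single_trace_files(trace_path: str, rank: int = 0) -> Dict[int, str]:
    """Build the {rank: full_path} mapping for a single trace file."""
    return {rank: os.path.abspath(os.path.expanduser(trace_path))}


# Hypothetical path; replace it with the location of your own trace.
trace_files = single_trace_files("traces/inference_trace.json")

# Passing the mapping to HTA (not executed here):
# from hta.trace_analysis import TraceAnalysis
# analyzer = TraceAnalysis(trace_files=trace_files)
```

If the trace records a rank other than 0, pass that rank to the helper so the dictionary key matches it.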

Hope this helps!