grimoire/mmdetection-to-tensorrt

MemoryError on Jetson TX2

Closed this issue · 7 comments

I am trying to convert a model with mmdetection-to-tensorrt on a TX2 machine using the provided Dockerfile, but I am running into memory errors.

mmdet2trt configs/retinanet_r50_fpn_2x_coco.py weights/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth weights/model.trt --min-scale 1 3 800 600 --max-scale 1 3 800 600 --opt-scale 1 3 800 600
INFO:mmdet2trt:Model warmup
INFO:mmdet2trt:Converting model
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
INFO:mmdet2trt:Conversion took 80.97697949409485 s
INFO:mmdet2trt:Saving TRT model to: weights/model.trt
Killed
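
For reference, a bare "Killed" with no Python traceback usually means the Linux OOM killer stopped the process, here presumably during the torch.save of the converted model. Two quick checks on the TX2 (standard L4T tooling, run outside the container):

dmesg | grep -iE "killed process|out of memory"   # confirm an OOM kill happened
tegrastats                                        # watch RAM usage while reproducing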

environment:

  • OS: Ubuntu 18.04 LTS
  • python_version: 3.6.9
  • pytorch_version: 1.6.0
  • cuda_version: 10.2
  • cudnn_version: [e.g. 8.0.2.39]
  • mmdetection_version: [e.g. 2.7.0]

We made several changes to the Dockerfile to get it running on the Jetson TX2 device:

FROM nvcr.io/nvidia/l4t-base:r32.4.4

### update apt and install libs
RUN apt-get update &&\
    apt-get install -y vim cmake libsm6 libxext6 libxrender-dev libgl1-mesa-glx git

### torch install 
RUN wget https://nvidia.box.com/shared/static/9eptse6jyly1ggt9axbja2yrmj6pbarc.whl -O torch-1.6.0-cp36-cp36m-linux_aarch64.whl &&\
    apt-get install -y python3-pip libopenblas-base libopenmpi-dev &&\
    pip3 install Cython &&\
    pip3 install numpy torch-1.6.0-cp36-cp36m-linux_aarch64.whl
### python
RUN pip3 install --upgrade pip

### install mmcv

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3-opencv

### scikit image
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
  && apt-get install -y --no-install-recommends apt-utils \
  && apt-get install -y \
    python3-dev libpython3-dev python-pil python3-tk python-imaging-tk \
    build-essential wget locales liblapack-dev

RUN sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8
ENV LANG en_US.UTF-8



RUN wget -q -O /tmp/get-pip.py --no-check-certificate https://bootstrap.pypa.io/get-pip.py \
  && python3 /tmp/get-pip.py \
  && pip3 install -U pip
RUN pip3 install -U testresources setuptools

RUN pip3 install -U numpy
#####

RUN git clone https://github.com/open-mmlab/mmcv.git /root/space/mmcv &&\
    cd /root/space/mmcv &&\
    MMCV_WITH_OPS=1 pip3 install -e .

### git mmdetection
RUN git clone --depth=1 https://github.com/open-mmlab/mmdetection.git /root/space/mmdetection

### install mmdetection
RUN cd /root/space/mmdetection &&\ 
    pip3 install -r requirements.txt &&\
    python3 setup.py develop

## install cmake - amirstan plugin below requires cmake version > 3.13
RUN cd /root/space/ &&\
    wget https://github.com/Kitware/CMake/releases/download/v3.19.1/cmake-3.19.1.tar.gz &&\
    tar -xf cmake-3.19.1.tar.gz &&\
    cd cmake-3.19.1 &&\
    apt-get install -y libssl-dev &&\
    ./configure &&\
    make &&\
    make install


### git amirstan plugin
RUN git clone --depth=1 https://github.com/grimoire/amirstan_plugin.git /root/space/amirstan_plugin &&\ 
    cd /root/space/amirstan_plugin &&\ 
    git submodule update --init --progress --depth=1

### install amirstan plugin
RUN cd /root/space/amirstan_plugin &&\ 
    mkdir build &&\
    cd build &&\
    cmake .. &&\
    make -j10 &&\
    echo "export AMIRSTAN_LIBRARY_PATH=/root/space/amirstan_plugin/build/lib" >> /root/.bashrc

### git torch2trt_dynamic
RUN git clone --depth=1 https://github.com/grimoire/torch2trt_dynamic.git /root/space/torch2trt_dynamic

### install torch2trt_dynamic
RUN cd /root/space/torch2trt_dynamic &&\
    python3 setup.py develop

### git mmdetection-to-tensorrt
RUN git clone --depth=1 https://github.com/grimoire/mmdetection-to-tensorrt.git /root/space/mmdetection-to-tensorrt

### install mmdetection-to-tensorrt
RUN cd /root/space/mmdetection-to-tensorrt &&\
    python3 setup.py develop

## setuptools for python3
RUN apt-get install -y python3-setuptools

### install torchvision
RUN  apt-get install -y libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev &&\
     git clone --branch v0.7.0 https://github.com/pytorch/vision torchvision &&\
     cd torchvision &&\
     export BUILD_VERSION=0.7.0 &&\  
     python3 setup.py install

WORKDIR /root/space
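
For completeness, a typical build-and-run sequence for this Dockerfile on the TX2 would be (the image tag is ours; l4t-base containers need the NVIDIA container runtime to see the GPU):

docker build -t mmdet2trt:tx2 .
docker run --runtime nvidia -it mmdet2trt:tx2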

Thanks for the report. I am trying to fix it.
It might take some days; please be patient.

Thanks @grimoire. I am able to execute it successfully after commenting out this line (the torch.save call). Not sure whether the .engine file will suffice to deploy the model on DeepStream; still testing it.

@grimoire any update?

I found that models with an hourglass backbone (2 stacks, such as CornerNet) also have this problem, but I still don't know the reason. Sorry.
There is a new PR that adds a C++ example; I plan to test the engine with it.

Model saving failed on a 2070S but succeeded on a 2080 Ti, so it might be related to GPU memory size, but I still don't know why.
Have you tried converting without Docker?

  • Conversion succeeds on a 1080 Ti with 64 GB of CPU memory. The problem appears only when we run the conversion on the TX2, where the shared (GPU+CPU) memory is just 8 GB. We resolved it by commenting out torch.save and setting --save-engine true (see the sketch below), and we were able to run the app using DeepStream too.
  • With the above changes we were able to run both with and without Docker.
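
For anyone hitting the same thing: the workaround amounts to skipping the torch.save call in the mmdet2trt tool (apparently where the OOM kill happens, since it serializes the whole TRTModule on the host) and keeping only the raw engine written by --save-engine. A sketch of the invocation, reusing the command from the top of the thread:

mmdet2trt configs/retinanet_r50_fpn_2x_coco.py \
    weights/retinanet_r50_fpn_2x_coco_20200131-fdb43119.pth \
    weights/model.trt \
    --min-scale 1 3 800 600 --max-scale 1 3 800 600 --opt-scale 1 3 800 600 \
    --save-engine true

The serialized engine is what DeepStream's nvinfer plugin consumes (through its model-engine-file config key), so the torch.save output is not needed for that deployment path.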

Closing this issue for now.

I have a similar issue. @prakashjayy, where did you set the --save-engine true setting?