deeplearning4j/deeplearning4j

Low GPU utilization when both CPU and CUDA modules loaded.

baedaron opened this issue · 1 comment

[Screenshot: Low_GPU_Usage]

LOG

.
.
.
[2023-05-06 21:45:59] [main] INFO Nd4jBackend - Loaded [JCublasBackend] backend
[2023-05-06 21:46:01] [main] INFO NativeOpsHolder - Number of threads used for linear algebra: 32
[2023-05-06 21:46:01] [main] INFO DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 11]
[2023-05-06 21:46:01] [main] INFO DefaultOpExecutioner - Cores: [12]; Memory: [1.0GB];
[2023-05-06 21:46:01] [main] INFO DefaultOpExecutioner - Blas vendor: [CUBLAS]
[2023-05-06 21:46:01] [main] INFO JCublasBackend - ND4J CUDA build version: 11.6.55
[2023-05-06 21:46:01] [main] INFO JCublasBackend - CUDA device 0: [NVIDIA GeForce RTX 3070]; cc: [8.6]; Total memory: [8589279232]
[2023-05-06 21:46:01] [main] INFO JCublasBackend - Backend build information:
MSVC: 192930146
STD version: 201402L
DEFAULT_ENGINE: samediff::ENGINE_CUDA
HAVE_FLATBUFFERS
HAVE_CUDNN
[2023-05-06 21:46:03] [main] INFO MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [DEVICE]
.
.
.

Issue Description

  • Low GPU utilization when both the CPU and CUDA modules are loaded.
  • Two Java processes were launched at the same time with the same command line.
  • The CPU appears to be doing the work, even though BACKEND_PRIORITY_GPU=10 and BACKEND_PRIORITY_CPU=1 are set in the environment.
  • See the attached screenshot and the information below.
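As far as we understand ND4J, BACKEND_PRIORITY_GPU and BACKEND_PRIORITY_CPU only influence which single backend is picked when the JVM starts; they do not divide work between CPU and GPU. The log line `Loaded [JCublasBackend] backend` indicates that the CUDA backend was the one chosen. The following is a minimal, hypothetical model of that priority-based choice, assuming the highest-priority usable backend wins. `BackendPrioritySketch`, its `Candidate` type, and the default priorities of 0 are illustrative only and are not ND4J's actual API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of priority-based backend selection (not ND4J's real classes).
// Assumption: an environment variable overrides a backend's default priority,
// and the usable backend with the highest priority is selected.
final class BackendPrioritySketch {

    // One candidate backend; names and default priorities are illustrative only.
    static final class Candidate {
        final String name;
        final String envVar;
        final int defaultPriority;
        final boolean usable; // e.g. false for CUDA when no GPU/driver is available

        Candidate(String name, String envVar, int defaultPriority, boolean usable) {
            this.name = name;
            this.envVar = envVar;
            this.defaultPriority = defaultPriority;
            this.usable = usable;
        }
    }

    // Reads the override from the environment map, falling back to the default.
    static int priorityOf(Candidate c, Map<String, String> env) {
        String raw = env.get(c.envVar);
        if (raw == null) {
            return c.defaultPriority;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return c.defaultPriority; // ignore malformed values
        }
    }

    // Returns the name of the usable backend with the highest priority, if any.
    static Optional<String> select(List<Candidate> candidates, Map<String, String> env) {
        return candidates.stream()
                .filter(c -> c.usable)
                .max(Comparator.comparingInt((Candidate c) -> priorityOf(c, env)))
                .map(c -> c.name);
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("CPU", "BACKEND_PRIORITY_CPU", 0, true),
                new Candidate("CUDA", "BACKEND_PRIORITY_GPU", 0, true));
        System.out.println("Selected backend: "
                + select(candidates, System.getenv()).orElse("none"));
    }
}
```

With the environment from this report (GPU=10, CPU=1), the model picks CUDA for each JVM, which matches the log; CPU load would then have to come from something other than backend selection.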

Version Information

  • CPU: AMD Ryzen 5 5600X 6-Core Processor
  • CPU RAM: 32 GB
  • SSD: 512 GB
  • GPU: GeForce RTX 3070
  • GPU RAM: 8 GB
  • OS: Windows 11 Pro (Korean)
  • NVIDIA driver: 531.29-desktop-win10-win11-64bit-international-dch-whql.exe
  • CUDA : cuda_11.8.0_522.06_windows.exe
  • CUDNN: cudnn-windows-x86_64-8.9.0.131_cuda11-archive.zip
  • Deeplearning4j: 1.0.0-M2.1

Additional Information

  • NETWORK: LSTM
  • BATCH SIZE: 32768
  • COMMAND LINE:
    C:/X64_jdk-11.0.18/bin/java.exe -server -Xverify:none -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+AggressiveOpts -Dfile.encoding=UTF-8 -Djava.library.path=C:/Users/Administrator/Downloads/AlPreDo -Xmx1G -Dorg.bytedeco.javacpp.maxbytes=4G -Dorg.bytedeco.javacpp.maxphysicalbytes=5G -Dorg.deeplearning4j.ui.port=9001 org.alpredo.realtime.NetCreatorRtUpDetection48 true
  • ENVIRONMENT:
    BACKEND_PRIORITY_GPU=10
    BACKEND_PRIORITY_CPU=1
    JAVA_HOME=C:/X64_jdk-11.0.18
    CLASSPATH=C:/lib/classes;C:/lib/nd4j-native-1.0.0-M2.1.jar;C:/lib/nd4j-native-1.0.0-M2.1-windows-x86_64.jar;C:/lib/javacpp-1.5.7.jar;C:/lib/nd4j-api-1.0.0-M2.1.jar;C:/lib/byteunits-0.9.1.jar;C:/lib/commons-math3-3.5.jar;C:/lib/commons-lang3-3.11.jar;C:/lib/commons-collections4-4.1.jar;C:/lib/flatbuffers-java-1.12.0.jar;C:/lib/protobuf-1.0.0-M2.1.jar;C:/lib/oshi-core-3.4.2.jar;C:/lib/jna-platform-4.3.0.jar;C:/lib/jna-4.3.0.jar;C:/lib/threetenbp-1.3.3.jar;C:/lib/jackson-1.0.0-M2.1.jar;C:/lib/commons-net-3.1.jar;C:/lib/neoitertools-1.0.0.jar;C:/lib/nd4j-common-1.0.0-M2.1.jar;C:/lib/guava-1.0.0-M2.1.jar;C:/lib/commons-compress-1.21.jar;C:/lib/nd4j-native-api-1.0.0-M2.1.jar;C:/lib/nd4j-native-preset-1.0.0-M2.1.jar;C:/lib/nd4j-presets-common-1.0.0-M2.1.jar;C:/lib/nd4j-native-preset-1.0.0-M2.1-windows-x86_64.jar;C:/lib/openblas-0.3.19-1.5.7.jar;C:/lib/openblas-0.3.19-1.5.7-windows-x86_64.jar;C:/lib/javacpp-1.5.7-windows-x86_64.jar;C:/lib/nd4j-cuda-11.6-1.0.0-M2.1.jar;C:/lib/nd4j-cuda-11.6-preset-1.0.0-M2.1.jar;C:/lib/nd4j-cuda-11.6-preset-1.0.0-M2.1-windows-x86_64.jar;C:/lib/cuda-11.6-8.3-1.5.7.jar;C:/lib/cuda-11.6-8.3-1.5.7-windows-x86_64.jar;C:/lib/nd4j-cuda-11.6-1.0.0-M2.1-windows-x86_64.jar;C:/lib/nd4j-cuda-11.6-1.0.0-M2.1-windows-x86_64-cudnn.jar;C:/lib/deeplearning4j-ui-1.0.0-M2.1.jar;C:/lib/deeplearning4j-vertx-1.0.0-M2.1.jar;C:/lib/vertx-core-3.9.0.jar;C:/lib/netty-handler-4.1.48.Final.jar;C:/lib/netty-handler-proxy-4.1.48.Final.jar;C:/lib/netty-codec-http2-4.1.48.Final.jar;C:/lib/netty-resolver-4.1.48.Final.jar;C:/lib/netty-resolver-dns-4.1.48.Final.jar;C:/lib/netty-codec-dns-4.1.48.Final.jar;C:/lib/jackson-core-2.10.2.jar;C:/lib/jackson-databind-2.10.2.jar;C:/lib/jackson-annotations-2.10.2.jar;C:/lib/vertx-web-3.9.0.jar;C:/lib/vertx-web-common-3.9.0.jar;C:/lib/vertx-auth-common-3.9.0.jar;C:/lib/vertx-bridge-common-3.9.0.jar;C:/lib/deeplearning4j-core-1.0.0-M2.1.jar;C:/lib/javax.activation-1.2.0.jar;C:/lib/deeplearning4j-datasets-1.0.0-M2.1.jar;C:/lib/resources-1.0.0-M2.1.jar;C:/lib/deeplearning4j-datavec-iterators-1.0.0-M2.1.jar;C:/lib/deeplearning4j-modelimport-1.0.0-M2.1.jar;C:/lib/hdf5-platform-1.12.1-1.5.7.jar;C:/lib/javacpp-platform-1.5.7.jar;C:/lib/javacpp-1.5.7-android-arm.jar;C:/lib/javacpp-1.5.7-android-arm64.jar;C:/lib/javacpp-1.5.7-android-x86.jar;C:/lib/javacpp-1.5.7-android-x86_64.jar;C:/lib/javacpp-1.5.7-ios-arm64.jar;C:/lib/javacpp-1.5.7-ios-x86_64.jar;C:/lib/javacpp-1.5.7-linux-armhf.jar;C:/lib/javacpp-1.5.7-linux-arm64.jar;C:/lib/javacpp-1.5.7-linux-ppc64le.jar;C:/lib/javacpp-1.5.7-linux-x86.jar;C:/lib/javacpp-1.5.7-linux-x86_64.jar;C:/lib/javacpp-1.5.7-macosx-arm64.jar;C:/lib/javacpp-1.5.7-macosx-x86_64.jar;C:/lib/javacpp-1.5.7-windows-x86.jar;C:/lib/hdf5-1.12.1-1.5.7.jar;C:/lib/hdf5-1.12.1-1.5.7-linux-x86.jar;C:/lib/hdf5-1.12.1-1.5.7-linux-x86_64.jar;C:/lib/hdf5-1.12.1-1.5.7-linux-armhf.jar;C:/lib/hdf5-1.12.1-1.5.7-linux-arm64.jar;C:/lib/hdf5-1.12.1-1.5.7-linux-ppc64le.jar;C:/lib/hdf5-1.12.1-1.5.7-macosx-x86_64.jar;C:/lib/hdf5-1.12.1-1.5.7-windows-x86.jar;C:/lib/hdf5-1.12.1-1.5.7-windows-x86_64.jar;C:/lib/deeplearning4j-nn-1.0.0-M2.1.jar;C:/lib/deeplearning4j-utility-iterators-1.0.0-M2.1.jar;C:/lib/fastutil-6.5.7.jar;C:/lib/datavec-api-1.0.0-M2.1.jar;C:/lib/joda-time-2.2.jar;C:/lib/stream-2.9.8.jar;C:/lib/opencsv-2.3.jar;C:/lib/t-digest-3.2.jar;C:/lib/datavec-data-image-1.0.0-M2.1.jar;C:/lib/jai-imageio-core-1.3.0.jar;C:/lib/imageio-jpeg-3.1.1.jar;C:/lib/imageio-core-3.1.1.jar;C:/lib/imageio-metadata-3.1.1.jar;C:/lib/common-lang-3.1.1.jar;C:/lib/common-io-3.1.1.jar;C:/lib/common-image-3.1.1.jar;C:/lib/imageio-tiff-3.1.1.jar;C:/lib/imageio-psd-3.1.1.jar;C:/lib/imageio-bmp-3.1.1.jar;C:/lib/javacv-1.5.7.jar;C:/lib/opencv-4.5.5-1.5.7.jar;C:/lib/ffmpeg-5.0-1.5.7.jar;C:/lib/flycapture-2.13.3.31-1.5.7.jar;C:/lib/libdc1394-2.2.6-1.5.7.jar;C:/lib/libfreenect-0.5.7-1.5.7.jar;C:/lib/libfreenect2-0.2.0-1.5.7.jar;C:/lib/librealsense-1.12.4-1.5.7.jar;C:/lib/librealsense2-2.50.0-1.5.7.jar;C:/lib/videoinput-0.200-1.5.7.jar;C:/lib/artoolkitplus-2.3.1-1.5.7.jar;C:/lib/flandmark-1.07-1.5.7.jar;C:/lib/leptonica-1.82.0-1.5.7.jar;C:/lib/tesseract-5.0.1-1.5.7.jar;C:/lib/opencv-platform-4.5.5-1.5.7.jar;C:/lib/openblas-platform-0.3.19-1.5.7.jar;C:/lib/openblas-0.3.19-1.5.7-android-arm.jar;C:/lib/openblas-0.3.19-1.5.7-android-arm64.jar;C:/lib/openblas-0.3.19-1.5.7-android-x86.jar;C:/lib/openblas-0.3.19-1.5.7-android-x86_64.jar;C:/lib/openblas-0.3.19-1.5.7-ios-arm64.jar;C:/lib/openblas-0.3.19-1.5.7-ios-x86_64.jar;C:/lib/openblas-0.3.19-1.5.7-linux-x86.jar;C:/lib/openblas-0.3.19-1.5.7-linux-x86_64.jar;C:/lib/openblas-0.3.19-1.5.7-linux-armhf.jar;C:/lib/openblas-0.3.19-1.5.7-linux-arm64.jar;C:/lib/openblas-0.3.19-1.5.7-linux-ppc64le.jar;C:/lib/openblas-0.3.19-1.5.7-macosx-arm64.jar;C:/lib/openblas-0.3.19-1.5.7-macosx-x86_64.jar;C:/lib/openblas-0.3.19-1.5.7-windows-x86.jar;C:/lib/opencv-4.5.5-1.5.7-android-arm.jar;C:/lib/opencv-4.5.5-1.5.7-android-arm64.jar;C:/lib/opencv-4.5.5-1.5.7-android-x86.jar;C:/lib/opencv-4.5.5-1.5.7-android-x86_64.jar;C:/lib/opencv-4.5.5-1.5.7-ios-arm64.jar;C:/lib/opencv-4.5.5-1.5.7-ios-x86_64.jar;C:/lib/opencv-4.5.5-1.5.7-linux-x86.jar;C:/lib/opencv-4.5.5-1.5.7-linux-x86_64.jar;C:/lib/opencv-4.5.5-1.5.7-linux-armhf.jar;C:/lib/opencv-4.5.5-1.5.7-linux-arm64.jar;C:/lib/opencv-4.5.5-1.5.7-linux-ppc64le.jar;C:/lib/opencv-4.5.5-1.5.7-macosx-arm64.jar;C:/lib/opencv-4.5.5-1.5.7-macosx-x86_64.jar;C:/lib/opencv-4.5.5-1.5.7-windows-x86.jar;C:/lib/opencv-4.5.5-1.5.7-windows-x86_64.jar;C:/lib/leptonica-platform-1.82.0-1.5.7.jar;C:/lib/leptonica-1.82.0-1.5.7-android-arm.jar;C:/lib/leptonica-1.82.0-1.5.7-android-arm64.jar;C:/lib/leptonica-1.82.0-1.5.7-android-x86.jar;C:/lib/leptonica-1.82.0-1.5.7-android-x86_64.jar;C:/lib/leptonica-1.82.0-1.5.7-linux-x86.jar;C:/lib/leptonica-1.82.0-1.5.7-linux-x86_64.jar;C:/lib/leptonica-1.82.0-1.5.7-linux-armhf.jar;C:/lib/leptonica-1.82.0-1.5.7-linux-arm64.jar;C:/lib/leptonica-1.82.0-1.5.7-linux-ppc64le.jar;C:/lib/leptonica-1.82.0-1.5.7-macosx-x86_64.jar;C:/lib/leptonica-1.82.0-1.5.7-windows-x86.jar;C:/lib/leptonica-1.82.0-1.5.7-windows-x86_64.jar;C:/lib/ffmpeg-platform-5.0-1.5.7.jar;C:/lib/ffmpeg-5.0-1.5.7-android-arm.jar;C:/lib/ffmpeg-5.0-1.5.7-android-arm64.jar;C:/lib/ffmpeg-5.0-1.5.7-android-x86.jar;C:/lib/ffmpeg-5.0-1.5.7-android-x86_64.jar;C:/lib/ffmpeg-5.0-1.5.7-linux-x86.jar;C:/lib/ffmpeg-5.0-1.5.7-linux-x86_64.jar;C:/lib/ffmpeg-5.0-1.5.7-linux-armhf.jar;C:/lib/ffmpeg-5.0-1.5.7-linux-arm64.jar;C:/lib/ffmpeg-5.0-1.5.7-linux-ppc64le.jar;C:/lib/ffmpeg-5.0-1.5.7-macosx-arm64.jar;C:/lib/ffmpeg-5.0-1.5.7-macosx-x86_64.jar;C:/lib/ffmpeg-5.0-1.5.7-windows-x86.jar;C:/lib/ffmpeg-5.0-1.5.7-windows-x86_64.jar;C:/lib/deeplearning4j-ui-components-1.0.0-M2.1.jar;C:/lib/oshi-json-3.4.2.jar;C:/lib/javax.json-1.0.4.jar;C:/lib/freemarker-2.3.23.jar;C:/lib/jcommander-1.27.jar;C:/lib/jakarta.xml.bind-api-2.3.2.jar;C:/lib/jakarta.activation-api-1.2.1.jar;C:/lib/babel__polyfill-7.4.4.jar;C:/lib/core-js-3.0.0-beta.9.jar;C:/lib/regenerator-runtime-0.13.11.jar;C:/lib/coreui__coreui-2.1.9.jar;C:/lib/coreui__icons-0.3.0.jar;C:/lib/jquery-3.4.1.jar;C:/lib/popper.js-1.12.9.jar;C:/lib/bootstrap-4.3.1.jar;C:/lib/jquery-2.2.0.jar;C:/lib/jquery-migrate-1.2.1.jar;C:/lib/jquery-ui-1.10.2.jar;C:/lib/modernizr-2.8.3-1.jar;C:/lib/jquery-cookie-1.4.1-1.jar;C:/lib/fullcalendar-1.6.4.jar;C:/lib/excanvas-3.jar;C:/lib/cytoscape-3.3.3.jar;C:/lib/heap-0.2.7.jar;C:/lib/lodash.debounce-4.0.8.jar;C:/lib/cytoscape-dagre-2.1.0.jar;C:/lib/cytoscape-3.2.5.jar;C:/lib/dagre-0.7.4.jar;C:/lib/graphlib-1.0.7.jar;C:/lib/lodash-3.10.1-amd.jar;C:/lib/dagre-0.8.4.jar;C:/lib/graphlib-2.1.8.jar;C:/lib/lodash-4.17.21.jar;C:/lib/cytoscape-cola-2.3.0.jar;C:/lib/webcola-3.3.8.jar;C:/lib/d3-dispatch-2.0.0-rc.1.jar;C:/lib/d3-drag-2.0.0-rc.1.jar;C:/lib/d3-selection-3.0.0.jar;C:/lib/d3-timer-2.0.0-rc.1.jar;C:/lib/cytoscape-cose-bilkent-4.0.0.jar;C:/lib/cytoscape-euler-1.2.1.jar;C:/lib/cytoscape-klay-3.1.2.jar;C:/lib/klayjs-0.4.1.jar;C:/lib/cytoscape-spread-3.0.0.jar;C:/lib/weaverjs-1.2.0.jar;C:/lib/retinajs-0.0.2.jar;C:/lib/flot-0.8.3.jar;C:/lib/explorercanvas-r3-1.jar;C:/lib/chosen-0.9.8.jar;C:/lib/uniform-2.1.2-1.jar;C:/lib/noty-2.2.2.jar;C:/lib/jquery-raty-2.5.2.jar;C:/lib/imagesloaded-2.1.1.jar;C:/lib/masonry-3.1.5.jar;C:/lib/jquery.sparkline-2.1.2.jar;C:/lib/jquery-knob-1.2.2.jar;C:/lib/datatables-1.9.4.jar;C:/lib/jquery-ui-touch-punch-0.2.2.jar;C:/lib/d3js-3.3.5.jar;C:/lib/bootstrap-notify-3.1.3-1.jar;C:/lib/github-com-jboesch-Gritter-1.7.4.jar;C:/lib/open-sans-0.1.3.jar;C:/lib/font-awesome-3.0.2.jar;C:/lib/bootstrap-2.2.2-1.jar;C:/lib/bootstrap-glyphicons-bdd2cbfba0.jar;C:/lib/flatbuffers-1.9.0.jar;C:/lib/commons-io-2.7.jar;C:/lib/deeplearning4j-nlp-1.0.0-M2.1.jar;C:/lib/commons-lang-2.6.jar;C:/lib/threadly-4.10.0.jar;C:/lib/jfasttext-0.4.jar;C:/lib/deeplearning4j-ui-model-1.0.0-M2.1.jar;C:/lib/mapdb-3.0.5.jar;C:/lib/kotlin-stdlib-1.0.7.jar;C:/lib/kotlin-runtime-1.0.7.jar;C:/lib/eclipse-collections-api-7.1.2.jar;C:/lib/jcip-annotations-1.0.jar;C:/lib/eclipse-collections-7.1.2.jar;C:/lib/eclipse-collections-forkjoin-7.1.2.jar;C:/lib/guava-19.0.jar;C:/lib/lz4-1.3.0.jar;C:/lib/elsa-3.0.0-M5.jar;C:/lib/sqlite-jdbc-3.15.1.jar;C:/lib/aeron-all-1.39.0.jar;C:/lib/jfreechart-1.5.3.jar;C:/lib/lombok-1.18.24.jar;C:/lib/mariadb-java-client-2.7.4.jar;C:/lib/jsoup-1.14.3.jar;C:/lib/gson-2.8.9.jar;C:/lib/reflections-0.10.2.jar;C:/lib/javassist-3.28.0-GA.jar;C:/lib/jsr305-3.0.2.jar;C:/lib/slf4j-api-1.7.32.jar;C:/lib/jsch-0.1.55.jar;C:/lib/slf4j-simple-1.7.32.jar;C:/lib/httpclient-4.5.13.jar;C:/lib/httpcore-4.4.13.jar;C:/lib/commons-logging-1.2.jar;C:/lib/commons-codec-1.11.jar;C:/lib/playwright-1.26.0.jar;C:/lib/opentest4j-1.2.0.jar;C:/lib/driver-1.26.0.jar;C:/lib/driver-bundle-1.26.0.jar;C:/lib/selenium-edge-driver-4.4.0.jar;C:/lib/auto-service-annotations-1.0.1.jar;C:/lib/auto-service-1.0.1.jar;C:/lib/auto-common-1.2.jar;C:/lib/selenium-api-4.4.0.jar;C:/lib/selenium-chromium-driver-4.4.0.jar;C:/lib/selenium-json-4.4.0.jar;C:/lib/selenium-remote-driver-4.4.0.jar;C:/lib/netty-buffer-4.1.78.Final.jar;C:/lib/netty-codec-http-4.1.78.Final.jar;C:/lib/netty-codec-4.1.78.Final.jar;C:/lib/netty-common-4.1.78.Final.jar;C:/lib/netty-transport-classes-epoll-4.1.78.Final.jar;C:/lib/netty-transport-classes-kqueue-4.1.78.Final.jar;C:/lib/netty-transport-native-epoll-4.1.78.Final.jar;C:/lib/netty-transport-native-kqueue-4.1.78.Final.jar;C:/lib/netty-transport-native-unix-common-4.1.78.Final.jar;C:/lib/netty-transport-4.1.78.Final.jar;C:/lib/opentelemetry-api-1.16.0.jar;C:/lib/opentelemetry-context-1.16.0.jar;C:/lib/opentelemetry-exporter-logging-1.16.0.jar;C:/lib/opentelemetry-sdk-metrics-1.16.0.jar;C:/lib/opentelemetry-sdk-logs-1.16.0-alpha.jar;C:/lib/opentelemetry-sdk-common-1.16.0.jar;C:/lib/opentelemetry-sdk-extension-autoconfigure-spi-1.16.0.jar;C:/lib/opentelemetry-sdk-extension-autoconfigure-1.16.0-alpha.jar;C:/lib/opentelemetry-sdk-trace-1.16.0.jar;C:/lib/opentelemetry-sdk-1.16.0.jar;C:/lib/opentelemetry-semconv-1.16.0-alpha.jar;C:/lib/jtoml-2.0.0.jar;C:/lib/byte-buddy-1.12.10.jar;C:/lib/commons-exec-1.3.jar;C:/lib/async-http-client-2.12.3.jar;C:/lib/async-http-client-netty-utils-2.12.3.jar;C:/lib/netty-codec-socks-4.1.60.Final.jar;C:/lib/netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar;C:/lib/netty-transport-native-kqueue-4.1.60.Final-osx-x86_64.jar;C:/lib/reactive-streams-1.0.3.jar;C:/lib/netty-reactive-streams-2.0.4.jar;C:/lib/jakarta.activation-1.2.2.jar;C:/lib/selenium-http-4.4.0.jar;C:/lib/failsafe-3.2.4.jar
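The CLASSPATH above carries both the CPU backend (`nd4j-native-1.0.0-M2.1.jar`, `nd4j-native-preset-*`, `openblas-*`) and the CUDA backend (`nd4j-cuda-11.6-*`). Below is a minimal sketch in plain Java, with no ND4J dependency, that flags a classpath containing both backend families. The class name `BackendJarCheck` is hypothetical. The name matching assumes the jar naming used in this report, and treating `nd4j-native-api` as shared by both backends (so it is not counted as the CPU backend) is also an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper: reports whether a classpath string carries both the ND4J CPU
// backend (nd4j-native-*) and the CUDA backend (nd4j-cuda-*), judged by jar file names only.
final class BackendJarCheck {

    // Assumption: nd4j-native-api is shared by both backends, so it is not counted as CPU.
    static boolean isCpuBackendJar(String fileName) {
        return fileName.startsWith("nd4j-native-") && !fileName.startsWith("nd4j-native-api");
    }

    static boolean isCudaBackendJar(String fileName) {
        return fileName.startsWith("nd4j-cuda-");
    }

    // Strips the directory part, accepting both '/' and '\' as path separators.
    static String fileName(String entry) {
        int cut = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
        return entry.substring(cut + 1);
    }

    // Splits the classpath on the given separator (';' on Windows) and checks each entry.
    static boolean hasBothBackends(String classpath, String separator) {
        boolean cpu = false;
        boolean cuda = false;
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String name = fileName(entry.trim());
            cpu |= isCpuBackendJar(name);
            cuda |= isCudaBackendJar(name);
        }
        return cpu && cuda;
    }

    public static void main(String[] args) {
        // Inspects the classpath of the running JVM; File.pathSeparator is ';' on Windows.
        String cp = System.getProperty("java.class.path", "");
        if (hasBothBackends(cp, java.io.File.pathSeparator)) {
            System.err.println("Warning: both nd4j-native and nd4j-cuda jars are on the classpath.");
        } else {
            System.out.println("At most one ND4J backend family found on the classpath.");
        }
    }
}
```

This only inspects jar names. It says nothing about which backend ND4J actually loaded; the startup log is the authority for that.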

@baedaron Do you have a reproducer for me? This doesn't sound right. From your logs, it looks like the GPU backend is running. I would need to see the behavior myself to tell whether that is even the root cause.

Please consider running a profiler yourself to see what else could be contributing. Low GPU utilization is a pretty common issue. Most of the time it turns out that the batch size is set too low or that training is bottlenecked on data transformations. I'm going to close this by default and ask you to post on our forums at https://community.konduit.ai/ for further support. I don't think this is a bug. Thanks!
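One rough way to check for a data-transformation bottleneck, before reaching for a full profiler, is to time batch preparation separately from the training step. This is a minimal sketch assuming a loop where each step first obtains a batch and then trains on it; `StepTimer`, `fetchBatch`, and `trainStep` are hypothetical names, not DL4J API.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: splits wall-clock time between producing a batch and
// running a training step, to see whether the loop is data-bound.
final class StepTimer {
    private long dataNanos;
    private long computeNanos;

    // Times one step: fetchBatch supplies the batch, trainStep consumes it.
    <T> void runStep(Supplier<T> fetchBatch, Consumer<T> trainStep) {
        long t0 = System.nanoTime();
        T batch = fetchBatch.get();
        long t1 = System.nanoTime();
        trainStep.accept(batch);
        long t2 = System.nanoTime();
        dataNanos += t1 - t0;
        computeNanos += t2 - t1;
    }

    // Share of measured time spent producing batches (0.0 if nothing was measured yet).
    double dataFraction() {
        long total = dataNanos + computeNanos;
        return total == 0 ? 0.0 : (double) dataNanos / total;
    }

    long dataNanos() {
        return dataNanos;
    }

    long computeNanos() {
        return computeNanos;
    }
}
```

In a DL4J loop, `fetchBatch` might wrap `iterator.next()` on a `DataSetIterator` and `trainStep` might wrap `net.fit(batch)`. GPU work can run asynchronously, so if a step returns before the device finishes, the compute share is understated. Treat the fraction as a rough signal and confirm it with a real profiler.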