scanner-research/scanner

Walkthrough fails on CPU Docker instance

Background

I'm attempting to run the walkthrough on a Docker CPU instance. I haven't made any changes or updates to the image. I also tried passing the input file as a positional argument and got the same result.

Steps to reproduce:

  1. wget https://raw.githubusercontent.com/scanner-research/scanner/master/docker/docker-compose.yml
  2. docker-compose run --service-ports cpu /bin/bash
  3. cd /opt/scanner/examples/apps/walkthroughs
  4. wget https://storage.googleapis.com/scanner-data/public/sample-clip.mp4
  5. python3 grayscale_conversion.py

The error

root@abe0ef57dfa3:/opt/scanner/examples/apps/walkthroughs# python3 grayscale_conversion.py 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '(null)':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.71.100
  Duration: 00:01:00.02, start: 0.000000, bitrate: 2037 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1902 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
1 worker failed at 10:35AM +0000 on Apr 30, 2019                                                             
  0%|                                                      | 0/3 [01:37<?, ?it/s, jobs=2, tasks=3, workers=0]E0430 10:36:46.240423  1134 master.cpp:1739] No workers but have unfinished work after 30 seconds
100%|██████████████████████████████████████████████| 3/3 [01:38<00:00, 32.95s/it, jobs=2, tasks=3, workers=0]
Traceback (most recent call last):
  File "grayscale_conversion.py", line 37, in <module>
    main()
  File "grayscale_conversion.py", line 27, in main
    sc.run(output, sp.PerfParams.manual(50, 250), cache_mode=sp.CacheMode.Overwrite)
  File "/usr/local/lib/python3.5/dist-packages/scannerpy/client.py", line 1586, in run
    raise ScannerException(job_status.result.msg)
scannerpy.common.ScannerException: No workers but have unfinished work after 30 seconds
E0430 10:45:08.041167  1189 worker.cpp:662] Worker did not receive heartbeat in 300000ms. Shutting down.

Additional Information

I ran ctest from the build directory at /opt/scanner/build and received an error on the Python tests. Running pytest in the tests directory shows a number of failing tests.

Test Results:

============================================ test session starts ============================================
platform linux -- Python 3.5.2, pytest-3.0.6, py-1.8.0, pluggy-0.4.0
rootdir: /opt/scanner/tests, inifile: pytest.ini
collected 37 items
py_test.py s.FF..FFF.FFFFFFFFFFsFFFFFFFFFFF.FFFAborted (core dumped)

Thanks for the error report. I just double checked and this all worked for me, so I'm guessing it's either your machine (maybe not enough memory or something?) or an environment setup issue.

First, to double-check, run docker-compose pull cpu to make sure you have the latest image.

Next, can you run the tests with pytest tests -vvs and paste the log output into a gist?

Yep! I'll do that first thing tomorrow. Thanks! The host machine should have plenty of resources (64 GB RAM and 12 cores). I confirmed that the container's resource limits were set to "unlimited".

@willcrichton: I've uploaded the results to a gist. I made sure the Docker image was up to date. Here's the image that the container was built from:

scannerresearch/scannertools:cpu-latest@sha256:6ff1e13a3e2a3877ab3543eff07c71c97d61a335a1901816113e68e227efc416

Thanks @spaulaus. Does this machine have any kind of proxy or firewall?

@willcrichton: It does; it's behind a proxy. I verified that I'm able to connect through the proxy, and I can run wget and git pull without issues. I'll verify that there's not a certificate issue.

We use gRPC to connect the various processes, and we've seen issues with it around proxies in the past. If I remember correctly, it should help to run unset http_proxy and then re-run the example.
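If you'd rather not change your shell session, here is a minimal sketch of the same idea in Python: it launches a script in a child process whose environment has the proxy variables stripped out. The helper names env_without_proxies and run_without_proxies are hypothetical and not part of scannerpy. The variable list is also an assumption: only http_proxy is named above, and https_proxy and grpc_proxy are included because they are commonly set alongside it.

```python
import os
import subprocess
import sys

# Assumed list: only http_proxy is named in this thread; the other two are
# proxy variables that are commonly set alongside it.
PROXY_VARS = ("http_proxy", "https_proxy", "grpc_proxy")


def env_without_proxies(env):
    """Return a copy of env with the proxy variables (any letter case) removed."""
    blocked = {name.lower() for name in PROXY_VARS}
    return {key: value for key, value in env.items() if key.lower() not in blocked}


def run_without_proxies(script):
    """Run a Python script in a child process that sees no proxy settings."""
    # The parent process environment is left untouched; only the child is affected.
    return subprocess.call([sys.executable, script], env=env_without_proxies(os.environ))
```

From the walkthroughs directory, calling run_without_proxies("grayscale_conversion.py") would be roughly equivalent to unsetting the variables and re-running the example, while leaving the proxy settings in place for tools such as wget and git in your own shell.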

@willcrichton: I confirmed that clearing the proxy environment variables results in a successful conversion to grayscale. It has had the side effect of causing all the unit tests to fail, but that is outside the scope of the original problem statement.

Thanks for your assistance!