Question about Latency Limit
zhipengChen opened this issue · 22 comments
Dear MRQA group,
We tested our model (a single model) on the out-of-domain dataset (the official data on CodaLab) with the official predict_server.py on CodaLab with one GPU (Tesla K80), and got the results we expected. However, the run took 3 hours (about 1.12 s per question). I'd like to confirm that our model meets your latency limit.
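As a sanity check on those figures (the question count below is inferred purely from the numbers quoted above, not an official dataset size):

```shell
# 3 hours of wall-clock time at ~1.12 s per question implies roughly
# this many questions were answered (back-of-envelope arithmetic only):
awk 'BEGIN { printf "%.0f\n", 3 * 3600 / 1.12 }'
```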
Best,
Zhipeng
Hi Zhipeng,
That should be fine, because we will test on better hardware. If you want to double check, you can submit this model now and I can test the latency on my end. This doesn't have to be your final submission, but you should follow the submission instructions so that it is easy for me to run it.
OK, thank you so much. This is our bundle id: 0xbbe65de9855b4a058e1d333f28c46dad (currently it can only be read by the MRQA group).
The bundle containing run-predictions/predictions.json is predictions-LatencyTest (bundle id 0xefa76771a566486096af8f39c241793e).
Hello Robin,
Have you tested my submission's latency on your end yet?
Hi Zhipeng,
I encountered the following error when running your code:
```
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcecda0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcece48>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcecc88>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcecf98>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcec7b8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
Could not find a version that satisfies the requirement Werkzeug>=0.7 (from Flask==0.12.1) (from versions: )
No matching distribution found for Werkzeug>=0.7 (from Flask==0.12.1)
WARNING: Logging before flag parsing goes to stderr.
W0726 19:23:41.477045 140036318504768 deprecation_wrapper.py:119] From /0x05a84515b6bf45f088b35976d5fdc2a0_dependencies/src-sub_v6/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
Traceback (most recent call last):
  File "src-sub_v6/run_mrqa_sub1.py", line 1338, in <module>
    import flask
ModuleNotFoundError: No module named 'flask'
```
It seems that you are trying to install flask but it fails. Note that when we run the submissions, we run them in a container that does not have network access. I recommend that you have everything pre-installed on the docker image you are using, so that you don't have to install anything as part of your code.
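One way to follow that advice is to bake the dependencies into a custom image at build time, when network access is still available. A minimal sketch (the derived image tag and the exact package list here are illustrative assumptions, not the participant's actual setup):

```shell
# Extend the image used for the run and install Flask (which pulls in
# Werkzeug) at build time, so nothing needs pip at evaluation time,
# when the container has no network access.
cat > Dockerfile <<'EOF'
FROM kevin898y/tensorflow_py36
RUN pip3 install Flask==0.12.1
EOF

# Then build and push it, and reference the new tag in the cl run command:
#   docker build -t <your-dockerhub-user>/tensorflow_py36_flask .
#   docker push <your-dockerhub-user>/tensorflow_py36_flask
```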
Hello Robin,
Thank you for testing our model.
We don't access the network when installing flask. The script we use to run our model is below:
```shell
cl run :src-sub_v6 :tools :predict_server.py data_dir:mrqa-dev-data allennlp:src-sub_v6/allennlp mrqa_model:saved_model1/1563934954 :run_mrqa.sh 'sh run_mrqa.sh & pip3 install tools/overrides-1.9.tar.gz; python3 predict_server.py <(cat data_dir/*.jsonl) predictions.json 8888' --request-docker-image kevin898y/tensorflow_py36 --request-memory 12g --request-gpus 1
```
All the packages needed to install Flask are already downloaded inside the docker image kevin898y/tensorflow_py36.
This is our runtime environment on CodaLab. Did you use the docker image kevin898y/tensorflow_py36?
I see. Yes I'm using the same docker image. This is very strange, let me try to get some help from the codalab people to understand what happened.
Comparing your bundle with my run, the difference is that yours says:

```
Requirement already satisfied: Werkzeug>=0.7 in /usr/local/lib/python3.6/dist-packages (from Flask==0.12.1)
```

whereas mine says it's not satisfied, tries to download it, and fails.
Thanks.
If you still can't run it, I can download Werkzeug into my directory and install it from there.
Hello Robin,
Can you run our model on your end now?
I am still debugging this. Just to help me out: is the docker image kevin898y/tensorflow_py36 something you created? Have you changed the docker image recently?
Hi Zhipeng,
Could you try submitting another version that uses the --no-deps flag everywhere you pip install things (especially inside run_mrqa.sh)? The dependencies that are needed seem to already be in the docker image, but pip tries to re-install them anyway, which causes it to access the network, and that leads to failed bundles.
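Concretely, the change would look something like the following inside run_mrqa.sh (the tarball name is taken from the earlier cl run command; the last line is just a self-contained demonstration of the flag's effect, not part of the submission script):

```shell
# In run_mrqa.sh, every install line gets --no-deps, e.g.:
#   pip3 install --no-deps tools/overrides-1.9.tar.gz
#
# Demonstration: installing an already-present package with --no-deps and
# --no-index performs no dependency resolution and never contacts the
# package index, so it succeeds even in an offline container.
python3 -m pip install --no-deps --no-index pip
```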
Actually, we think we have found the issue inside codalab that was causing the problem. Once it is fixed I will try again and hopefully it will work. So you do not need to take further action at this time.
Ok, thank you.
Hello Robin,
If you still can't run our model on your end, you can run our final submission on the public CodaLab instance. It should work there.
Hi Zhipeng,
Yes, in the worst case we can just run on the public codalab instance. If it's not too much trouble, could you try submitting another bundle with the --no-deps flag everywhere, as I suggested? I want to see if this workaround succeeds in case the codalab issue does not get fixed in time (the issue is that the running job does not have root on the non-public codalab workers, which usually doesn't matter but predictably causes some installation-related things to behave differently).
Ok. I will submit another bundle right now.
Hello Robin,
I have already uploaded a new script; it is running now.
The bundle id is 0x9650972932c742b19f3be50de3de54f0 (run-predictions).
I am trying it now, initially it looks like it works! I will keep you updated. Please still finish the normal submission procedure.
OK, thanks for the reminder. Our final submission is already prepared, but I didn't add --no-deps to the script. Should I upload a new one with --no-deps on all the pip install commands?
Yes, that would be great if you can have your final submission use --no-deps.
OK, I'm uploading a new one now.
Hello Robin,
We have submitted our final system with --no-deps added to every pip install (bundle id 0x75a635dc423f495f8016f1eb5b02bbb2; we have also sent it to your group by email). If any problem happens when you run it on the test set, you can send me a message here or by email. Thanks.