Wikipedia: ConnectionClosedError('code = 1006 (connection closed abnormally [internal]), no reason')

Question

alexcg1 opened this issue 4 years ago · 12 comments

I'm testing the Wikipedia example with different datasets. Each time I:

1. Delete the workspace
2. Create a new folder like data.products containing data.txt
3. ln -s data.products data
4. Set the path in app.py to data/data.txt
5. Run python app.py -t index

It worked the first few times. But now, after indexing 100 Documents (out of ~3000), I'm getting the error: `Got following error while streaming requests via websocket: ConnectionClosedError('code = 1006 (connection closed abnormally [internal]), no reason')`
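The reset procedure above can be scripted so it is repeated identically for every dataset. This is a minimal sketch, assuming the index lives in a folder called `workspace` next to `app.py` (a hypothetical name); `reset_dataset` is an illustrative helper, not part of Jina or the examples repo, and steps 4 and 5 are left manual:

```python
import shutil
from pathlib import Path


def reset_dataset(project_dir: Path, source_file: Path, name: str = "data.products") -> Path:
    """Hypothetical helper repeating steps 1-3 of the manual reset."""
    # Step 1: delete the previous index ("workspace" is an assumed folder name).
    workspace = project_dir / "workspace"
    if workspace.exists():
        shutil.rmtree(workspace)

    # Step 2: create a dataset folder such as data.products containing data.txt.
    dataset_dir = project_dir / name
    dataset_dir.mkdir(exist_ok=True)
    shutil.copyfile(source_file, dataset_dir / "data.txt")

    # Step 3: equivalent of `ln -s data.products data`; replace any old link first.
    link = project_dir / "data"
    if link.is_symlink():
        link.unlink()
    link.symlink_to(name, target_is_directory=True)

    # Steps 4 and 5 stay manual: point app.py at data/data.txt, then run
    # `python app.py -t index`.
    return link / "data.txt"
```

Calling it again with a different source file swaps the dataset behind the `data` link without touching `app.py`.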

I've tried several different datasets, with the same result each time. The only thing I changed was the dataset.

I mean, I'm pretty sure that if I cloned the examples repo from scratch and created a new venv, this problem would go away. But I thought it was important to bring it up.

Attached is the sample dataset I used (adapted from https://www.kaggle.com/PromptCloudHQ/toy-products-on-amazon):
data.txt

Answer 1 · 2021-04-23T09:32:57.000Z

Using

Jina 1.1.0
Python 3.8.8
Manjaro 21

Answer 2 · 2021-04-23T09:41:46.000Z

So I re-cloned the repo, started with a clean venv, and everything worked okay after that. Closing for now.

Correction: the error popped up again.

It worked fine indexing the Amazon toy dataset with JINA_MAX_DOCS unset (thus indexing only 50 Documents, as specified in app.py). But using export JINA_MAX_DOCS=30000 caused the error.
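A minimal sketch of the behaviour described above, assuming `app.py` reads `JINA_MAX_DOCS` from the environment, falls back to 50 when it is unset, and treats each non-empty line of `data.txt` as one Document. Both helper names are hypothetical and the real `app.py` may differ:

```python
import os
from itertools import islice


def max_docs_from_env(default: int = 50) -> int:
    """Return JINA_MAX_DOCS as an int, or `default` when it is unset or empty."""
    raw = os.environ.get("JINA_MAX_DOCS", "").strip()
    return int(raw) if raw else default


def read_docs(path: str, limit: int):
    """Yield at most `limit` non-empty lines (one Document each, by assumption)."""
    with open(path, encoding="utf-8") as f:
        non_empty = (line.strip() for line in f if line.strip())
        yield from islice(non_empty, limit)
```

With `export JINA_MAX_DOCS=30000` in place, `max_docs_from_env()` returns 30000, so a run streams far more requests than the default 50.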

I'll try changing:

  restful: True

to

  restful: False

in flows/index.yml to see if that fixes things, as @deepankarm suggested.
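If you switch between the two settings often, the edit to `flows/index.yml` can be scripted. A plain-text sketch, assuming the flag appears as a `restful: True` or `restful: False` line like the one above; `set_restful` is a hypothetical helper, and a YAML parser would be safer for more complex files:

```python
import re
from pathlib import Path

# Matches an (optionally indented) `restful: True/False` line; [ \t] avoids
# swallowing newlines so neighbouring lines are left untouched.
RESTFUL_LINE = re.compile(r"^([ \t]*restful:[ \t]*)(?:True|False|true|false)[ \t]*$", re.MULTILINE)


def set_restful(flow_yaml: Path, enabled: bool) -> None:
    """Rewrite every `restful:` line in the file, keeping its indentation."""
    text = flow_yaml.read_text(encoding="utf-8")
    value = "True" if enabled else "False"
    new_text, count = RESTFUL_LINE.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise ValueError(f"no restful: line found in {flow_yaml}")
    flow_yaml.write_text(new_text, encoding="utf-8")
```

For example, `set_restful(Path("flows/index.yml"), False)` applies the change suggested above.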

Answer 3 · 2021-04-23T09:48:46.000Z

After making this change, I'm now at 300 Documents indexed and it's proceeding smoothly. I'll update as I go along.

I suggest someone test RESTful indexing using the full Wikipedia dataset (check the README for how to do this), with JINA_MAX_DOCS set pretty high. That way we can see whether it's the dataset itself (it shouldn't be, since the Docker image is pre-indexed with 30k docs) or whether indexing chokes after a certain number of Documents. @rutujasurve94?
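The suggested scale test could be run as a sweep over increasing `JINA_MAX_DOCS` values that stops at the first failing size. A sketch, assuming `app.py` exits with a non-zero status when indexing fails (unverified) and that the workspace is cleared between runs; `index_command` and `sweep` are hypothetical helpers:

```python
import os
import subprocess
import sys


def index_command(max_docs: int, app: str = "app.py"):
    """Build the command and environment for one `python app.py -t index` run."""
    env = dict(os.environ, JINA_MAX_DOCS=str(max_docs))
    return [sys.executable, app, "-t", "index"], env


def sweep(sizes, run=subprocess.run):
    """Index with each size in turn; return the first size that fails, else None.

    Assumes a failed run shows up as a non-zero exit code. Clearing the
    workspace between runs is not done here.
    """
    for n in sizes:
        cmd, env = index_command(n)
        result = run(cmd, env=env)
        if result.returncode != 0:
            return n
    return None
```

From the example directory, `sweep([100, 1_000, 3_000, 30_000])` would report the first size at which indexing breaks, or `None` if every size completes.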

Answer 4 · 2021-04-23T10:00:40.000Z

@alexcg1 @rutujasurve94 With restful: true and f.index(...), we use websockets for streaming. This path is not widely used or tested (as the frontend still doesn't stream requests to Jina). If you face issues with it, feel free to create issues in core.
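The `code = 1006` in the reported error comes from the WebSocket protocol itself. RFC 6455 reserves 1006 for a connection that closed without a Close frame, so an endpoint reports it locally and it never travels on the wire; it says the socket dropped, not why. A small sketch for pulling the code out of such a message and looking it up (the message format is taken from the error above; the helper names are hypothetical):

```python
import re

# Subset of the status codes defined in RFC 6455, section 7.4.1.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "endpoint going away",
    1002: "protocol error",
    1003: "unsupported data type",
    1005: "no status code present (reserved, never sent in a Close frame)",
    1006: "closed abnormally without a Close frame (reserved, never sent in a Close frame)",
    1007: "invalid frame payload data",
    1008: "policy violation",
    1009: "message too big",
    1010: "missing mandatory extension",
    1011: "unexpected server error",
}


def parse_close_code(message: str):
    """Extract the numeric close code from a 'code = NNNN' style message."""
    match = re.search(r"code\s*=\s*(\d{4})", message)
    return int(match.group(1)) if match else None


def describe_close(message: str) -> str:
    """Return 'CODE: meaning' for the close code found in the message."""
    code = parse_close_code(message)
    if code is None:
        return "no close code found"
    return f"{code}: {CLOSE_CODES.get(code, 'unknown or application-defined code')}"
```

Because 1006 carries no reason from the peer, the actual cause has to be found in the server-side logs rather than in the client error.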

Answer 5 · 2021-04-30T08:35:38.000Z

Hey @alexcg1

If Core issue #2343 is closed, does that mean this issue is closed too? I'm in a burn & clean-up mood this morning.

Answer 6 · 2021-04-30T08:59:38.000Z

Alas no, @FionnD. This issue is about index_restful crapping out after 100 or so Documents and Jina crashing. Core issue #2343 was a whole other bug, where Jina kept spitting out terminal output even after the Flow said it was complete.

Answer 7 · 2021-04-30T09:06:31.000Z

ok... let me see if I can get someone to try and reproduce it

Answer 8 · 2021-04-30T09:20:17.000Z

I had it occur multiple times: in multiple virtual environments, with multiple Python versions, multiple datasets, and multiple clones of the repo (to ensure I hadn't accidentally polluted it with my own prior changes).

Answer 9 · 2021-05-15T16:19:19.000Z

I've fixed the tests and code in #559. Except for query_restful, all the other CLI arguments are tested during CI. This should now be fixed.

Answer 10 · 2021-05-17T10:59:58.000Z

Confirmed it works with 1,000 docs. Trying now with more, just in case.

Answer 11 · 2021-05-17T11:05:13.000Z

3,000 docs works. Tested on AWS ec2

Answer 12 · 2021-05-17T11:45:54.000Z

Please close if you're happy that it works :)