CUNY-CL/wikipron

CLI hangs for some languages

msonderegger opened this issue · 2 comments

Hi Jackson and Kyle,

Thanks for this amazing resource! I've run into a pretty basic issue, and wonder if there's a problem on my end.

I am trying to re-extract lexicons with stress information for some languages (like Ukrainian), because the current scrape TSVs for them don't include stress.

After installing wikipron, the CLI works for some languages:

(base) morgan@Morgans-MBP-2 ~ % wikipron abk

INFO: Language: 'Abkhaz'
INFO: No cut-off date specified
ƶ	ʐ
ааба	a a p a
абӷьы	a ˈb ʁʲ ə
абельльи	a b e lʲ lʲ ə j
абна	a b n a

(this continues, outputting a couple of entries per second)

But for Ukrainian (ukr), it hangs after one entry:

(base) morgan@Morgans-MBP-2 ~ % wikipron ukr
INFO: Language: 'Ukrainian'
INFO: No cut-off date specified
ś	ɕ

(here I waited 5 minutes)

And for Russian (rus), it hangs before any entries are output.

Here is the trace if I cancel the Ukrainian run:

(base) morgan@Morgans-MBP-2 ~ % wikipron ukr
INFO: Language: 'Ukrainian'
INFO: No cut-off date specified
ś	ɕ
^CTraceback (most recent call last):
  File "/Users/morgan/opt/anaconda3/bin/wikipron", line 8, in <module>
    sys.exit(main())
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/cli.py", line 133, in main
    _scrape_and_write(config)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/cli.py", line 123, in _scrape_and_write
    for i, (word, pron) in enumerate(scrape(config), 1):
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/scrape.py", line 104, in scrape
    yield from _scrape_once(data, config)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/scrape.py", line 59, in _scrape_once
    request = session.get(
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/Users/morgan/opt/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/Users/morgan/opt/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/Users/morgan/opt/anaconda3/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Users/morgan/opt/anaconda3/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt
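
Reading the trace, the process seems to be blocked in a socket read underneath `session.get(...)` in `scrape.py`. That would be consistent with a request that stalls and never times out, since `requests` does not apply a timeout unless one is passed. I have not checked whether wikipron sets one. Below is a minimal sketch of the kind of wrapper I could try locally to turn a silent hang into a visible error. `get_with_timeout` and its parameters are hypothetical names of mine, not part of wikipron or `requests`; the only real API assumed is that the session's `get` accepts `params` and `timeout`, as `requests.Session.get` does.

```python
import time


def get_with_timeout(session, url, params=None, timeout=10.0, retries=3,
                     backoff=1.0, sleep=time.sleep):
    """Call session.get with an explicit timeout, retrying on I/O errors.

    Hypothetical helper for illustration only. `session` is any object whose
    get() accepts `params` and `timeout` (e.g. a requests.Session).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # With a timeout set, a stalled read raises instead of blocking.
            return session.get(url, params=params, timeout=timeout)
        except OSError as err:
            # requests' exceptions derive from IOError (an alias of OSError),
            # as do the standard library's socket timeouts.
            last_error = err
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                sleep(backoff * attempt)
    # Every attempt failed: surface the last error rather than hanging.
    raise last_error
```

With `requests` installed, this could be called as `get_with_timeout(requests.Session(), url, params=...)`, using whatever URL and parameters the scraper sends. If the call raises a timeout error for Ukrainian but not for Abkhaz, that would at least narrow the hang down to the server response rather than my setup.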

Thanks!