CLI hangs for some languages
msonderegger opened this issue · 2 comments
msonderegger commented
Hi Jackson and Kyle,
Thanks for this amazing resource! I've run into a pretty basic issue, and wonder if there's a problem on my end.
I am trying to re-extract lexicons with stress information for some languages (like Ukrainian), because the current scraped TSVs for them don't include stress.
After installing wikipron, the CLI works for some languages:
(base) morgan@Morgans-MBP-2 ~ % wikipron abk
INFO: Language: 'Abkhaz'
INFO: No cut-off date specified
ƶ ʐ
ааба a a p a
абӷьы a ˈb ʁʲ ə
абельльи a b e lʲ lʲ ə j
абна a b n a
(this continues, outputting a couple of entries per second)
But for Ukrainian (ukr), it hangs after one entry:
(base) morgan@Morgans-MBP-2 ~ % wikipron ukr
INFO: Language: 'Ukrainian'
INFO: No cut-off date specified
ś ɕ
(here I waited 5 minutes)
And for Russian (rus), it hangs before any entries are output.
Here is the trace if I cancel the Ukrainian run:
(base) morgan@Morgans-MBP-2 ~ % wikipron ukr
INFO: Language: 'Ukrainian'
INFO: No cut-off date specified
ś ɕ
^CTraceback (most recent call last):
  File "/Users/morgan/opt/anaconda3/bin/wikipron", line 8, in <module>
    sys.exit(main())
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/cli.py", line 133, in main
    _scrape_and_write(config)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/cli.py", line 123, in _scrape_and_write
    for i, (word, pron) in enumerate(scrape(config), 1):
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/scrape.py", line 104, in scrape
    yield from _scrape_once(data, config)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/wikipron/scrape.py", line 59, in _scrape_once
    request = session.get(
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/morgan/opt/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/Users/morgan/opt/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/Users/morgan/opt/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/Users/morgan/opt/anaconda3/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Users/morgan/opt/anaconda3/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/morgan/opt/anaconda3/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt
Thanks!
kylebgorman commented
Not in front of a computer, but try specifying that you want narrow transcriptions (--narrow). From experience I know that Russian only has [] transcriptions, IIRC, but the only way for us to know that is to download all the relevant pages, none of which have // transcriptions, so the scrape seems to hang even though it is not really stuck. I bet Ukrainian is the same story. LMK if this works.
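To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not wikipron's actual code: `scrape_sketch`, the two regular expressions, and the `PAGES` data are all hypothetical stand-ins made up for illustration. It assumes only that broad transcriptions are written /like this/ and narrow ones [like this]. When every page carries a narrow transcription only, a broad-mode scrape still has to visit every page and yields nothing along the way, which from the terminal looks like a hang.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical patterns: broad transcriptions sit between slashes,
# narrow transcriptions between square brackets.
BROAD = re.compile(r"/(.+?)/")
NARROW = re.compile(r"\[(.+?)\]")


def scrape_sketch(
    pages: Iterable[Tuple[str, str]],
    narrow: bool = False,
    visited: Optional[List[int]] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (word, pron) pairs whose transcription matches the chosen type.

    `visited`, if given, records the index of every page examined, so the
    caller can see that work is happening even when nothing is yielded.
    """
    pattern = NARROW if narrow else BROAD
    for index, (word, raw) in enumerate(pages, 1):
        if visited is not None:
            visited.append(index)
        match = pattern.search(raw)
        if match:
            yield word, match.group(1)


# Toy data, invented for this example: every page has a narrow
# transcription only, as is reported above for Russian.
PAGES = [
    ("alpha", "[a l f a]"),
    ("beta", "[b e t a]"),
    ("gamma", "[g a m a]"),
]

if __name__ == "__main__":
    seen: List[int] = []
    broad_results = list(scrape_sketch(PAGES, narrow=False, visited=seen))
    print("broad:", broad_results, "pages visited:", len(seen))
    print("narrow:", list(scrape_sketch(PAGES, narrow=True)))
```

In broad mode the generator walks all three toy pages and produces no output; in narrow mode it yields an entry for each page. With the real scraper each page visit is a network request, which is consistent with the traceback ending inside `session.get`.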
msonderegger commented
Specifying narrow transcription works. Thank you!
Morgan