Circular origin reference
alxwrd opened this issue · 3 comments
alxwrd commented
There's currently at least one circular reference in etymwn-relety.json. Calling ety.origins("software", recursive=True) will eventually fail with a recursion error because software -> soft, -ware and -ware -> software.
alxwrd commented
I'm going to run a job to try and discover all these problems with the data.
Hidden as no longer relevant
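import sys
import ety

# Keep the recursion limit low so a runaway lookup fails quickly.
sys.setrecursionlimit(20)

total = len(ety.data.etyms)


def find():
    results = []  # entries whose lookup raised RecursionError
    errors = []   # entries that failed for any other reason
    for count, word in enumerate(ety.data.etyms):
        try:
            ety.Word(word["a_word"], word["a_lang"]).origins()
            print("{}/{}".format(count, total), end="\r")
        except RecursionError:
            # Record the entry itself; no result exists when the call fails.
            results.append(word)
        except Exception as e:
            errors.append({"error": e, "word": word})
    return results, errors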
>>> import find_circulars
>>> results, errors = find_circulars.find()
4094/473433
I'll update here once it's done.
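For reference, the same cycles could probably be found without calling .origins() on every entry, by searching the data as a graph. This is only a sketch under assumptions: it treats the data as a plain dict mapping each word to its direct origins (TOY_ORIGINS is made-up toy data, not the real file), and find_cycles is a hypothetical helper, not part of ety.

```python
# Toy data, NOT the real etymwn-relety.json: word -> direct origins.
TOY_ORIGINS = {
    "software": ["soft", "-ware"],
    "-ware": ["software"],  # the circular entry
    "soft": [],
}


def find_cycles(origins):
    """Depth-first search that records a cycle whenever a word reappears
    on the path currently being explored."""
    cycles = []
    done = set()  # words whose origins have been fully explored

    def visit(word, path):
        if word in path:
            # Word is already on this path, so slice out the loop.
            cycles.append(path[path.index(word):] + [word])
            return
        if word in done:
            return
        for origin in origins.get(word, []):
            visit(origin, path + [word])
        done.add(word)

    for word in origins:
        visit(word, [])
    return cycles


print(find_cycles(TOY_ORIGINS))
```

The helper is itself recursive, so its depth is bounded by the longest chain that has no cycle; on the real data that is an assumption that would need checking.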
alxwrd commented
I've just realised this isn't actually recursion, because it's the child's .origins() that's being called. The issue is that the results are appended to the result list, so the chain "software" -> "-ware" -> "software" just keeps growing that list.
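To illustrate the kind of growth described here, a minimal sketch follows. It is not ety's actual code: it assumes the lookup walks the same list it appends to, uses made-up toy data (TOY_ORIGINS), and adds a hypothetical max_len cap purely so the demo terminates.

```python
# Toy data, NOT the real etymwn-relety.json: word -> direct origins.
TOY_ORIGINS = {
    "software": ["soft", "-ware"],
    "-ware": ["software"],  # the circular entry
    "soft": [],
}


def origins_naive(word, max_len=12):
    """Collect origins by extending the list that is being iterated."""
    result = list(TOY_ORIGINS.get(word, []))
    for origin in result:  # the list grows while this loop walks it
        if len(result) >= max_len:  # safety cap, only for the demo
            break
        result.extend(TOY_ORIGINS.get(origin, []))
    return result


print(origins_naive("software"))
```

Without the cap the loop never finishes, because every "software" in the list adds another "-ware", which in turn adds another "software".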
I have a fix in mind, I'll submit a PR later.
jmsv commented
Hmm, could the solution be as simple as only adding a Word to a branch if it hasn't already appeared in that branch?
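Something like the following might be enough. It is a rough sketch on made-up toy data rather than on ety's Word class: TOY_ORIGINS and origins_recursive are hypothetical names, and the branch is tracked as a tuple of the words seen so far on the current path.

```python
# Toy data, NOT the real etymwn-relety.json: word -> direct origins.
TOY_ORIGINS = {
    "software": ["soft", "-ware"],
    "-ware": ["software"],  # the circular entry
    "soft": [],
}


def origins_recursive(word, branch=()):
    """Return all origins of `word`, skipping any word that has
    already appeared on the current branch."""
    branch = branch + (word,)  # new tuple, so sibling branches stay separate
    result = []
    for origin in TOY_ORIGINS.get(word, []):
        if origin in branch:
            continue  # already on this branch, so adding it would loop
        result.append(origin)
        result.extend(origins_recursive(origin, branch))
    return result


print(origins_recursive("software"))  # ['soft', '-ware']
```

Tracking the branch rather than one global "seen" set means a word can still show up under two different parents; only a repeat along the same chain is dropped.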