Script to remove duplicate quotes
mubaris opened this issue · 15 comments
Add a small script to automatically delete all the duplicate quotes in the data folder.
Is it to remove duplicated quotes within a single file or across all files? Is the script allowed to edit the files in the data folder?
We want to remove duplicate quotes from all the json files in the data folder. We already have a script to find duplicate entries; that script will be useful.
Yes, the script is allowed to edit the files in the data folder. Whenever there's a duplicate entry in a json file, it should be removed.
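Roughly, the deduplication could look something like this. Just a sketch, assuming each json file in data holds a flat list of objects with a "quote" key; if the files wrap the list in another object, the load/dump steps would need adjusting:

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # assumed location of the quote files

seen = set()
for path in sorted(DATA_DIR.glob("*.json")):
    with open(path, encoding="utf-8") as f:
        quotes = json.load(f)

    # keep only the first occurrence of each quote across all files
    unique = []
    for entry in quotes:
        if entry["quote"] not in seen:
            seen.add(entry["quote"])
            unique.append(entry)

    # rewrite the file in place only if something was removed
    if len(unique) != len(quotes):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(unique, f, indent=4, ensure_ascii=False)
```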
Ok. I'll take this issue.
Can't run $ motivate
TypeError: join() argument must be str or bytes, not 'PosixPath'
@michellymenezes I think I have fixed this issue. Clone and install again, and let me know if there's any problem.
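For anyone hitting the same error: it usually means a pathlib.Path was passed to an API that only accepts strings (os.path.join only accepts os.PathLike objects from Python 3.6 onward). A minimal illustration with a made-up path; the actual variable names in motivate may differ:

```python
import os
from pathlib import Path

data_dir = Path.home() / ".motivate" / "data"  # hypothetical path, for illustration only

# On Python < 3.6 this raises:
#   TypeError: join() argument must be str or bytes, not 'PosixPath'
# quote_file = os.path.join(data_dir, "001.json")

# Converting the Path explicitly works on every version:
quote_file = os.path.join(str(data_dir), "001.json")
print(quote_file)
```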
@michellymenezes Instead of creating a new json file, it would be very useful if we could remove the duplicate entries from the existing json files.
Can I take this up?
There's already a PR, but there's a small problem with it. If you can find a way to eliminate that issue, let me know.
What's the problem?
Hey, in your find_dupes.py file, can you explain this line of code:
dupes = sorted([x for x in quotes if x['quote'] in seen or seen.add(x['quote'])], key=lambda x:x['quote'])
What does the key=lambda x: x['quote'] part do?
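For reference, key tells sorted() which value to compare for each element; here it orders the duplicate entries alphabetically by their quote text. A small standalone example:

```python
quotes = [
    {"quote": "Stay hungry", "author": "Jobs"},
    {"quote": "Be yourself", "author": "Wilde"},
]

# key= extracts the value to compare; without it, sorting dicts raises a TypeError.
ordered = sorted(quotes, key=lambda x: x["quote"])
print([q["quote"] for q in ordered])  # ['Be yourself', 'Stay hungry']
```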
@mubaris I think removing repeated quotes from each file will cost more than computing the unique quotes once and serving everything from a single file.
The issue was still open, and you mentioned there were problems with the previous implementation. In this case it's better not to worry about cost, but about how easy the data is to maintain. Maintaining multiple json files only makes sense if there is a meaningful difference between them (001 = science quotes, 002 = history quotes, 007 = james bond quotes). If you want randomization, it's easier when you don't have to pick a random file in addition to a random quote.
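To illustrate the randomization point: with a single merged file one uniform pick is enough, while with many files you have to pick a file first and then a quote, which also skews the odds toward quotes in smaller files. A rough sketch; the file names are assumptions:

```python
import json
import random
from pathlib import Path

# Single merged file: one uniform pick over all quotes.
with open("data/quotes.json", encoding="utf-8") as f:  # assumed merged file
    quotes = json.load(f)
print(random.choice(quotes)["quote"])

# Multiple files: two picks, and quotes in small files become
# more likely to be chosen than quotes in large files.
path = random.choice(list(Path("data").glob("*.json")))
with open(path, encoding="utf-8") as f:
    quotes = json.load(f)
print(random.choice(quotes)["quote"])
```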
If you'd like, I can apply the same refactoring to your quote data as well; it ends up improving readability.