mubaris/motivate

Script to remove duplicate quotes

mubaris opened this issue · 15 comments

Add a small script to automatically delete all the duplicate quotes in the data folder.

Should it remove duplicate quotes within a single file or across all files? Is the script allowed to edit the files in the data folder?

We want to remove duplicate quotes from all the JSON files in the data folder. We already have a script that finds duplicate entries, which should be useful here.

Yes, the script is allowed to edit the files in the data folder. Whenever there's a duplicate entry in a JSON file, it should be removed.
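A minimal sketch of what such an in-place cleanup could look like. It assumes each file in the data folder has the shape {"data": [{"quote": "...", ...}, ...]}; that layout and the function name remove_duplicate_quotes are assumptions for illustration, not something confirmed in this thread.

```python
import json
from pathlib import Path


def remove_duplicate_quotes(data_dir):
    """Rewrite every *.json file in data_dir, dropping quotes already seen.

    Assumes each file holds {"data": [{"quote": "...", ...}, ...]}.
    Adjust the keys if the real files are laid out differently.
    Returns the number of entries removed.
    """
    seen = set()   # quote texts encountered so far, across all files
    removed = 0
    for path in sorted(Path(data_dir).glob("*.json")):
        with open(str(path), encoding="utf-8") as f:
            content = json.load(f)
        unique = []
        for entry in content["data"]:
            if entry["quote"] in seen:
                removed += 1          # duplicate: skip it
            else:
                seen.add(entry["quote"])
                unique.append(entry)  # first occurrence: keep it
        content["data"] = unique
        with open(str(path), "w", encoding="utf-8") as f:
            json.dump(content, f, indent=4, ensure_ascii=False)
    return removed
```

Because the seen set is shared across files, a quote that appears in two different files is kept only in the first file processed (files are visited in sorted name order).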

Ok. I'll take this issue.

Can't run $ motivate

TypeError: join() argument must be str or bytes, not 'PosixPath'

@michellymenezes I think I have fixed this issue. Let me know if there's any problem. Clone and install again
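For anyone hitting the same TypeError: it is the message os.path.join raises when handed a pathlib.Path on Python versions older than 3.6, which do not accept path objects there. The actual fix applied in the repository is not shown in this thread; the sketch below is only one common workaround, and the helper name data_file is hypothetical.

```python
import os
from pathlib import Path


def data_file(base_dir, name):
    """Join a directory and a file name, accepting str or Path for base_dir.

    Converting to str first keeps os.path.join working on Python versions
    that predate path-object support in os.path functions.
    """
    return os.path.join(str(base_dir), name)
```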

@mubaris I've made a script that creates a new JSON file containing all the unique quotes: #25

@michellymenezes Instead of creating a new JSON file, it would be very useful if we could remove the duplicate entries from the existing JSON files.

Can I take this up?

There's already a PR. There's a small problem with it. If you can find a method to eliminate that issue, let me know.

What's the problem?

Check #25

Hey, in your find_dupes.py file, can you explain this line of code to me?
dupes = sorted([x for x in quotes if x['quote'] in seen or seen.add(x['quote'])], key=lambda x:x['quote'])
What does the 'key=lambda x:x['quote']' part do?
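A small worked example may help here. The key argument tells sorted() what value to compare for each element; since the elements are dicts, which Python 3 cannot order with <, the lambda pulls out each entry's 'quote' string so the duplicates come back sorted alphabetically by quote text. The sample quotes list below is made up for illustration.

```python
# Made-up sample data; the real quotes come from the JSON files.
quotes = [
    {"quote": "b", "author": "X"},
    {"quote": "a", "author": "Y"},
    {"quote": "b", "author": "Z"},
    {"quote": "a", "author": "Y"},
]

seen = set()
# seen.add(...) returns None (falsy), so the first time a quote appears it is
# recorded and filtered out; any later occurrence is already in `seen` and is
# kept as a duplicate.
dupes = sorted(
    [x for x in quotes if x['quote'] in seen or seen.add(x['quote'])],
    key=lambda x: x['quote'],  # compare entries by their quote text
)

print(dupes)
# [{'quote': 'a', 'author': 'Y'}, {'quote': 'b', 'author': 'Z'}]
```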

@mubaris I think removing repeated quotes from each file will cost more than computing the unique quotes once and reading everything from a single file.

@mubaris Hey, is the issue still open? I made a fix for what you want to achieve.

#40

The issue was still open, and you mentioned there were problems with the previous implementation. In this case it's better not to worry about cost, but about how easy the database is to maintain. Keeping multiple JSON files only makes sense if there is a difference between them (001 = science quotes, 002 = history quotes, 007 = james bond quotes). If you want randomization, it'll be easier if you don't have to pick a random file in addition to a random quote.
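To make the randomization point concrete, here is a rough sketch of the two approaches. Both function names are hypothetical, and the quotes are assumed to be already loaded into plain Python lists.

```python
import random


def pick_from_single_file(quotes, rng=random):
    """One list of quotes: a single random choice is enough."""
    return rng.choice(quotes)


def pick_from_many_files(files, rng=random):
    """A list of per-file quote lists: pick a file first, then a quote.

    Note that this is not uniform over quotes: a quote in a short file
    is more likely to be picked than a quote in a long one.
    """
    return rng.choice(rng.choice(files))
```

With a single file every quote is equally likely; with two-step selection the odds depend on how the quotes happen to be split across files.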

If you'd like I can do the same refactoring to your same quote as this one. It ends up improving readability.