cegme/cs5293sp22

Duplicated Lines

Opened this issue · 10 comments

I'm not sure how much you care, but I found quite a few duplicated contexts. I'm about to add a pull request removing one I found in mine.

Line 1008: @vudu0001 training Esperanza Here ███ visits her parent for the birthday of her...
Line 1194: @pa1vasanth training Jimmy Stewart █████████████ is a special favorite of mine....
Line 1207: @pa1vasanth training Jack Nicholson The alien with the most tolerable voice also happe...
Line 1345: @JDewberry testing Safran Foer The Hollywood industry and writers such as ███████...
Line 1386: @sanojdoddapaneni validation Siegel There are films that are not released in theaters ...
Line 1417: @sanojdoddapaneni training Tom Cruise Great cast Great acting Unpredictable story line f...
Line 1485: @deepikamettu training Jalal Merhi's I was JUST about to turn the movie off because of ...
Line 1492: @deepikamettu training Jalal Merhi's I was JUST about to turn the movie off because of ...
Line 1493: @deepikamettu training Werner Pochath Like I said, a very strange movie that is dark and...
Line 1739: @adityakasturi8 training John This is just a bad movie. With what seemed to be q...
Line 1743: @adityakasturi8 training La It is finally coming out. The first season will be...
Line 2010: @mat-crick training Dilbert spinoff As others have pointed out, it is more than a ████...
Line 2089: @satanu22ou training Martin Niemoller ████████████████, 1945When faced with intolerance ...
Line 2132: @satanu22ou testing Wonder Showzen My girlfriend was gonna get an abortion until we b...
Line 2437: @CurSpace training Holly For my humanities quarter project for school, i ch...
Line 2438: @CurSpace training Patric For my humanities quarter project for school, i ch...
Line 2459: @CurSpace training Pauline Like some of the other folks who have reviewed thi...
Line 2472: @CurSpace training Morbius Shakespeare's "The Tempest" is a model for this ex...
Line 2474: @CurSpace training Altaira Shakespeare's "The Tempest" is a model for this ex...
Line 2488: @CurSpace validation Linda █████ Lovelace was the victim of a sadistic woman ...
Line 2489: @CurSpace validation Lovelace Linda ████████ was the victim of a sadistic woman ...
Line 2508: @CurSpace validation Drew ████ Barrymore gets second chance at high school, ...
Line 2509: @CurSpace validation Mr. Wells Worst. Movie. Ever. I can't believe they had to hi...
Line 2753: @manishamandava01 validation Bambi █████'s facial expressions were superb....
Line 2803: @Vnarra558 training Bill Paxton After all it's not everyday that someone comes in ...
Line 2904: @Gnan58 training Mel Gibson I mean, it was not like ██████████ or Bruce Willis...
Line 2947: @Gnan58 validation Edgar Kennedy The comic moments that follow are generated with t...
Line 2977: @pinn0002 training Howard Hughes Was this an early variation of beefcake courtesy o...
Line 3222: @LasyaSudha testing Jason Connery Also starring █████████████....
Line 3305: @kaustubhpande73 validation Joe Dallesandro ███████████████ is outstanding as the easy-going, ...
Line 3306: @kaustubhpande73 validation Laura Linney ████████████ of all people is along for this bumpi...
Line 3308: @kaustubhpande73 validation Nick Hammond I didn't think ████████████ was Peter Parker... an...
Line 3309: @kaustubhpande73 validation Peter Parker I didn't think Nick Hammond was ████████████... an...
Line 3310: @kaustubhpande73 validation Seth McFarlane Reason 2, The jokes are just generally hilarious, ...
Line 3311: @kaustubhpande73 validation Stephen Dorff █████████████ does the best job of the whole cast ...
Line 3312: @kaustubhpande73 validation Tobey Macguire Granted, I can also spot in the modern Spider-Man ...
Line 3313: @kaustubhpande73 validation Marilyn Monroe She recreated ██████████████'s poses for the magaz...
Line 3314: @kaustubhpande73 validation Marilyn You know him, he's the guy who claimed to have bee...
Line 3315: @kaustubhpande73 validation Miss Monroe All his claims were never proved as a matter of fa...
Line 3316: @kaustubhpande73 validation Fu Manchu For example, there is the scene where █████████ is...
Line 3317: @kaustubhpande73 validation Mencia Otherwise, you're better off settling on chewing a...
Line 3318: @kaustubhpande73 validation Carlos Mencia From his stand up specials to this train wreck of ...
Line 3319: @kaustubhpande73 validation Ashton Kutcher I would liked that the protagonist male character ...
Line 3320: @kaustubhpande73 validation Carlos Mencia Comedy stands no chance of evolving with █████████...
Line 3321: @kaustubhpande73 validation Carlos Mencia Perhaps people, especially viewers and Comedy Cent...
Line 3322: @kaustubhpande73 validation Daffy The short starts with █████ getting frustrated at ...
Line 3323: @kaustubhpande73 validation Kannathil Muthamittal The Director of █████████████████████ directed the...
Line 3324: @kaustubhpande73 validation Geoffrey Land █████████████ is okay as her surly doctor boyfrien...
Line 3325: @kaustubhpande73 validation Paul Lukas he movie has some fascinating villains in ████████...
Line 3340: @siddhardha-maguluri training Peter This movie is a great way for the series to finall...
Line 3341: @siddhardha-maguluri training Alfred Molina After some of the negative reviews i heard on this...
Line 3344: @siddhardha-maguluri training Udo Kier More eeriness and dark secrets released in the fin...
Line 3719: @Rachana137 training Tom Kiesche First-time director ███████████ turns in a winning...
Line 3786: @Rachana137 testing Kutcher This is a pale imitation of 'Officer and a Gentlem...
Line 3828: @simurgh9 training Boris Karloff Look fast to spot a very young █████████████ as th...
Line 3830: @simurgh9 training Howard Hughes Was this an early variation of beefcake courtesy o...
Line 3841: @simurgh9 validation Sadako Is ██████ somehow connected to these events?...
Line 4134: @Infinite-Zero validation Bat Masterson The larger-than-life figures of Wyatt Earp and ███...

*Updated to not include the first instance

Hello @nathanscain,

All my records were unique when I added them. However, when the user @kaustubhpande73 added his set of records, he mistakenly added the first 5 records of mine. Attaching the screenshot of the same. Hence it is showing duplicate records for my name.

image

Changed the duplicate files and created a pull request to approve the same. Thanks.

> All my records were unique when I added them. However, when the user @kaustubhpande73 added his set of records, he mistakenly added the first 5 records of mine. Attaching the screenshot of the same. Hence it is showing duplicate records for my name.

Okay - someone just needs to remove them whenever possible. I am curious as to why my script only flagged those 3 and not the other 2. Possibly a whitespace issue.
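For anyone who wants to re-check their own rows, here is a minimal sketch of a duplicate check that ignores whitespace differences. It is not the script used to build the list above. It assumes the file is tab-separated with four columns (GitHub username, split, name, redacted context); the function names and the sample rows are made up for illustration.

```python
import csv
from collections import defaultdict


def normalize(context):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(context.split())


def find_duplicate_contexts(lines):
    """Map each repeated (normalized) context to the line numbers where it occurs.

    Assumes four tab-separated columns: user, split, name, context.
    """
    seen = defaultdict(list)
    # QUOTE_NONE keeps quote characters inside reviews from confusing the parser.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if len(row) < 4:
            continue  # skip malformed or blank rows
        seen[normalize(row[3])].append(lineno)
    return {ctx: nums for ctx, nums in seen.items() if len(nums) > 1}


# Hypothetical rows: the first two differ only in spacing.
sample = [
    "userA\ttraining\tJane Doe\t████████ was great in this film.",
    "userB\ttraining\tJane Doe\t████████  was great in this film. ",
    "userC\ttesting\tJohn Roe\tI liked ████████ here.",
]
dups = find_duplicate_contexts(sample)
for context, line_numbers in dups.items():
    print(line_numbers, context)
```

To run it on the real file, pass an open file handle instead of `sample`. An exact string comparison would treat the first two sample rows as different, which is one way a script could flag some copies and miss others.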

Changed the duplicates and created a pull request for the same.

Changes are made in pull request #151

Hello @nathanscain,

I checked the unredactor.tsv file. You mentioned above that one of my records is a duplicate, but I couldn't find any duplicate of my record. Can you verify?

cegme commented

Hello Professor, One of the lines in my training dataset was similar to one from kaustubhpande73, but he has now updated the unredacted.tsv. Do I still need to update the unredacted.tsv and create a pull request? Thanks & Regards, Lasya Sudha Narkimelli

@LasyaSudha No, thank you

> Hello @nathanscain,
>
> I checked the unredactor.tsv file. You mentioned above that one of my records is a duplicate, but I couldn't find any duplicate of my record. Can you verify?

Sorry for the delay @vudu0001, still working on the project lol

The context matches your context for line 1004. My guess is that you just forgot to copy the proper context, because "Esperanza" is neither in the string nor 3 characters long, which is the length of the redaction block. No biggie, just needs an update.
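That sanity check can be sketched roughly as below. It assumes redactions are written as runs of the █ character with one block per character of the name, spaces included, which is what the rows listed above appear to do. The function name is made up, and the contexts are shortened from the list.

```python
import re

BLOCK_RUN = re.compile("\u2588+")  # one or more consecutive block characters


def name_fits_context(name, context):
    """Return True if the name could belong to this context.

    A row looks consistent when the name appears unredacted in the context,
    or when some run of block characters is exactly as long as the name.
    """
    if name in context:
        return True
    return any(len(run) == len(name) for run in BLOCK_RUN.findall(context))


# "Esperanza" has 9 characters but the block is only 3 long, so this row is suspect.
print(name_fits_context("Esperanza", "Here ███ visits her parent"))
# "Linda" has 5 characters and matches the 5-character block.
print(name_fits_context("Linda", "█████ Lovelace was the victim"))
```

A row that fails this check is not necessarily a duplicate, but it is a good hint that the name and the context were copied from different lines.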