How to compare every row with every other row in the same column and delete matching rows with ratio > 90
nithinreddyy opened this issue
For example, I have a DataFrame like this:
Pdf                Content  Page no
July 20, 2017.PDF  Hello    24.0
July 20, 2017.PDF  Hi       20.0
July 2, 2018.PDF   Hey      21.0
July 2, 2018.PDF   Helloo   10.0
July 2, 2018.PDF   Hii      11.0
I expect that whenever a row's Content matches another row's Content with a ratio above 90, the later (duplicate) row is removed. The expected output is:
Pdf                Content  Page no
July 20, 2017.PDF  Hello    24.0
July 20, 2017.PDF  Hi       20.0
July 2, 2018.PDF   Hey      21.0
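A minimal sketch of the rule described above, using only the standard library. Here `difflib.SequenceMatcher` stands in for `fuzz.ratio` (fuzzywuzzy itself falls back to difflib when python-Levenshtein is not installed), and the helper names `similarity` and `keep_mask` are hypothetical. Each value is compared with the values kept so far, so the first occurrence of a group of near-duplicates survives, which is what the expected output does with Hello and Helloo.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """0-100 integer score, in the spirit of fuzz.ratio (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def keep_mask(values, threshold=90):
    """Return one bool per value: False if it scores above `threshold`
    against any earlier value that was kept, True otherwise."""
    kept = []  # values already accepted as "originals"
    mask = []
    for value in values:
        is_duplicate = any(similarity(value, k) > threshold for k in kept)
        mask.append(not is_duplicate)
        if not is_duplicate:
            kept.append(value)
    return mask


contents = ["Hello", "Hi", "Hey", "Helloo", "Hii"]
print(keep_mask(contents, threshold=90))  # [True, True, True, False, True]
print(keep_mask(contents, threshold=75))  # [True, True, True, False, False]
```

Note that "Hello" vs "Helloo" scores 91, but "Hi" vs "Hii" only scores 80, so a strict > 90 cutoff keeps Hii. Reproducing the expected output above needs a lower threshold (75 in this sketch) or a different scorer.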
I'm trying the code below, but it only returns the match ratios; it doesn't remove any rows:
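import pandas as pd
from fuzzywuzzy import fuzz

# Pair every 'Content' value with every 'Content' value; each element is a (left, right) tuple
compare = pd.MultiIndex.from_product([data['Content'], data['Content']],
                                     names=['left', 'right']).to_series()

def metrics(tup):
    # Score one (left, right) pair two ways
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare = compare.apply(metrics)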
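One way to turn the pairwise scores into dropped rows is to score each (earlier, later) pair of rows once and drop every later row whose score exceeds the threshold. Filtering the full product from the code above directly would not work, because it pairs each row with itself (always a perfect score) and lists every pair twice. This is a sketch assuming pandas is available; the helper names `ratio` and `drop_near_duplicates` are hypothetical, and difflib again stands in for `fuzz.ratio` (swap in `fuzz.ratio` if fuzzywuzzy is installed).

```python
from difflib import SequenceMatcher
from itertools import combinations

import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    "Pdf": ["July 20, 2017.PDF", "July 20, 2017.PDF", "July 2, 2018.PDF",
            "July 2, 2018.PDF", "July 2, 2018.PDF"],
    "Content": ["Hello", "Hi", "Hey", "Helloo", "Hii"],
    "Page no": [24.0, 20.0, 21.0, 10.0, 11.0],
})


def ratio(a, b):
    # Stand-in for fuzz.ratio: 0-100 integer score computed with difflib
    return round(100 * SequenceMatcher(None, a, b).ratio())


def drop_near_duplicates(df, column, threshold=90):
    values = df[column].tolist()
    # (earlier, later) row positions: no self-pairs, each pair only once
    pairs = combinations(range(len(values)), 2)
    drop_positions = {later for earlier, later in pairs
                      if ratio(values[earlier], values[later]) > threshold}
    keep_positions = [pos for pos in range(len(values)) if pos not in drop_positions]
    return df.iloc[keep_positions]


print(drop_near_duplicates(data, "Content", threshold=90))  # drops Helloo only
print(drop_near_duplicates(data, "Content", threshold=75))  # drops Helloo and Hii
```

Unlike comparing only against rows that were kept, this drops a row if it matches any earlier row, even one that was itself dropped, so a chain where A matches B and B matches C removes both B and C.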