tech-conferences/conference-data

Duplicate Conferences - Test should have failed

Closed this issue · 4 comments

The test didn't fail and didn't flag multiple duplicate conferences:

Was it because the URL and the name are not exactly the same in some cases?

I filled out the form with a duplicate conference and the test did fail. It did indeed flag it as

Error: [name] Found almost identical conference...

Hi @JuanPabloDiaz

I've created a PR which is using string similarity to recognize duplicates. It looks promising:
#7074

Best regards,
Christian

Ok. Found a solution which worked for your test cases:
https://github.com/tech-conferences/conference-data/actions/runs/10147310567/job/28057389678
https://github.com/tech-conferences/conference-data/actions/runs/10147207581/job/28057047257

I'm using string comparison in two places: on the URL alone, and on the URL path combined with the conference name. It only worked when both checks were used together:

const confOfYearSimpleUrl = createSimpleUrl(confOfYear);
const urlSimilarity = stringSimilarity(confSimpleUrl, confOfYearSimpleUrl);
if (urlSimilarity > 0.91) {
    console.log(`URL similarity of ${confSimpleUrl} and ${confOfYearSimpleUrl} is ${urlSimilarity}`);
    return confOfYear;
}
const similarity = stringSimilarity(confKey, confOfYearKey);
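For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of the two-step check described above. It is not the code from the PR. The bodies of `stringSimilarity`, `createSimpleUrl` and `createKey`, the `findDuplicate` wrapper, and the `KEY_THRESHOLD` value are all assumptions made for illustration; only the `0.91` URL threshold and the variable names come from the snippet above. The similarity measure used here is a Dice coefficient over character bigrams, which may differ from what the project actually uses.

```javascript
// Hypothetical stand-in for stringSimilarity: Dice coefficient over
// character bigrams, returning a value between 0 and 1.
function stringSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const counts = new Map();
  for (let i = 0; i < a.length - 1; i++) {
    const gram = a.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  let overlap = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const gram = b.slice(i, i + 2);
    const left = counts.get(gram) || 0;
    if (left > 0) {
      counts.set(gram, left - 1);
      overlap++;
    }
  }
  return (2 * overlap) / (a.length - 1 + (b.length - 1));
}

// Hypothetical stand-in for createSimpleUrl: lower-case the URL and drop
// the protocol, a leading "www.", any query or hash, and trailing slashes.
function createSimpleUrl(conf) {
  return conf.url
    .toLowerCase()
    .replace(/^https?:\/\//, '')
    .replace(/^www\./, '')
    .replace(/[?#].*$/, '')
    .replace(/\/+$/, '');
}

// Hypothetical key: URL path plus lower-cased conference name.
function createKey(conf) {
  const simpleUrl = createSimpleUrl(conf);
  const slash = simpleUrl.indexOf('/');
  const path = slash === -1 ? '' : simpleUrl.slice(slash);
  return `${path} ${conf.name.toLowerCase()}`;
}

const URL_THRESHOLD = 0.91; // value taken from the snippet in this thread
const KEY_THRESHOLD = 0.9; // assumption: the thread does not show this value

// Returns the first existing conference that looks like a duplicate of
// conf, or undefined when none is similar enough.
function findDuplicate(conf, confsOfYear) {
  const confSimpleUrl = createSimpleUrl(conf);
  const confKey = createKey(conf);
  for (const confOfYear of confsOfYear) {
    const confOfYearSimpleUrl = createSimpleUrl(confOfYear);
    // Check 1: the simplified URLs are nearly identical.
    if (stringSimilarity(confSimpleUrl, confOfYearSimpleUrl) > URL_THRESHOLD) {
      return confOfYear;
    }
    // Check 2: the URL path plus name are nearly identical.
    if (stringSimilarity(confKey, createKey(confOfYear)) > KEY_THRESHOLD) {
      return confOfYear;
    }
  }
  return undefined;
}
```

Running both checks means an entry can be caught either when only the protocol, `www.` prefix or trailing slash of the URL differs, or when the domain differs but the path and name match. The thresholds would need tuning against real conference data to avoid false positives.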