Query: Methods to improve similarity score.
I've been contemplating the introduction of an intermediate API call to look up user search queries on platforms like Amazon. By doing so, we could collect the first search result and extract its detailed product description, brand name, associated company, and more. These details can then be used to establish the similarity of the product description embedding to the industry product descriptions. Furthermore, the brand or company details can be matched to their respective NAICS industry associations. This dual-validation method, I believe, could significantly refine and enhance the match scores.
I'd be very interested to hear your thoughts on this. Is this a direction you've considered or possibly experimented with? If not, do you see any potential challenges or advantages in implementing this method? Your feedback and insights would be invaluable to me as I embark on extending your work.
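To make the dual-validation idea concrete, here is a minimal sketch of how the two signals could be combined. It assumes the description similarity has already been computed by an embedding model and that the brand lookup returns a set of NAICS codes. The names `Candidate`, `dual_validation_score`, `rank`, and `brand_bonus`, the additive bonus rule, and the similarity values are all hypothetical placeholders for illustration, not part of the project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    naics_code: str
    description_similarity: float  # assumed cosine similarity in [0, 1], computed elsewhere


def dual_validation_score(candidate, brand_naics_codes, brand_bonus=0.2):
    """Boost the description similarity when the brand lookup agrees on the NAICS code.

    The additive bonus is an assumption; a weighted average or a learned
    combination would be equally plausible.
    """
    score = candidate.description_similarity
    if candidate.naics_code in brand_naics_codes:
        score = min(1.0, score + brand_bonus)
    return score


def rank(candidates, brand_naics_codes):
    """Sort candidates from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda c: dual_validation_score(c, brand_naics_codes),
        reverse=True,
    )


# Made-up similarity values: the second code narrowly wins on description alone.
candidates = [Candidate("312111", 0.62), Candidate("311930", 0.65)]
# Hypothetical result of mapping the brand "Coca-Cola" to NAICS codes.
brand_codes = {"312111"}

ranked = rank(candidates, brand_codes)
print([c.naics_code for c in ranked])
```

In this toy case the brand match moves 312111 ahead of the other candidate, which is the behavior I would hope for from the second validation step.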
I have attached a screenshot of my experiments with your project, in which I think the intent of the user's search query did not match the industry mapping.
I'm hoping to get your opinion on the pros and cons of replacing the input product string and industry string with their respective descriptions. For example, instead of the product input "coca cola" and the industry "Soft drink manufacturing", we would use preprocessed and cleaned embeddings of the product description and the industry description, shown below as input-lhs and input-rhs.
input-lhs:
Coca-Cola Soda Soft Drink, 16.9 fl oz, 6 Pack
Product Description
Soda. Pop. Soft drink. Sparkling beverage. Whatever you call it, nothing compares to the refreshing, crisp taste of Coca-Cola Original Taste, the delicious soda you know and love. Enjoy with friends, on the go or with a meal. Whatever the occasion, wherever you are, Coca-Cola Original Taste makes life’s special moments a little bit better. Carefully crafted in 1886, its great taste has stood the test of time. Something so delicious, so unique and so familiar, it’s what makes you think “Coca-Cola” whenever you hear “soft drink.” Between that perfect taste and refreshing fizz, it’s sure to give you that “ahhh” moment whenever you want it. Coca-Cola is available in many different options in addition to Original Taste, including a variety of all-time favorite flavors like Coca-Cola Cherry and Coca-Cola Vanilla. Looking for something zero sugar or caffeine free? Then look no further than Coca-Cola Zero Sugar and Coca-Cola Caffeine Free. Whatever you’re looking for in a soda, there’s a Coca-Cola to satisfy your taste buds. Every sip, every “ahhh,” every smile—find that feeling with Coca-Cola Original Taste. Best enjoyed ice-cold for maximum refreshment. Grab a Coca-Cola Original Taste, take a sip and find your “ahhh” moment. Enjoy Coca-Cola Original Taste.
input-rhs:
Common types of business activities within NAICS Code 312111 - Soft Drink Manufacturing are:
Flavored water manufacturing
Coffee, iced, manufacturing
Iced coffee manufacturing
Soda carbonated, manufacturing
Pop, soda, manufacturing
Carbonated soda manufacturing
Artificially carbonated waters manufacturing
Fruit drinks (except juice), manufacturing
Soda pop manufacturing
Beverages, soft drink (including artificially carbonated waters), manufacturing
Water, flavored, manufacturing
Water, artificially carbonated, manufacturing
Soft drinks manufacturing
Beverages, fruit and vegetable drinks, cocktails, and ades, manufacturing
Carbonated soft drinks manufacturing
Iced tea manufacturing
Tea, iced, manufacturing
Drinks, fruit (except juice), manufacturing
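A rough sketch of the comparison I have in mind, using shortened excerpts of the two inputs above. To keep it runnable without downloading a model, it swaps the embedding model for a plain word-count cosine similarity, which is only a stand-in: a real sentence-embedding model would also capture meaning beyond shared words. The helper names `tokenize` and `cosine_similarity` are my own.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between word-count vectors (stand-in for embeddings)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Short inputs: product name vs. industry title.
product_name = "coca cola"
industry_title = "Soft drink manufacturing"

# Long inputs: excerpts of input-lhs and input-rhs.
product_description = (
    "Coca-Cola Soda Soft Drink, 16.9 fl oz, 6 Pack. "
    "Soda. Pop. Soft drink. Sparkling beverage."
)
industry_description = (
    "Soda pop manufacturing. Carbonated soft drinks manufacturing. "
    "Flavored water manufacturing."
)

short_score = cosine_similarity(product_name, industry_title)
long_score = cosine_similarity(product_description, industry_description)
print(f"short inputs: {short_score:.3f}, long inputs: {long_score:.3f}")
```

With word counts the short pair shares no tokens at all, while the descriptions overlap on terms such as "soda", "pop", and "soft". An embedding model would not score the short pair at zero, but the same intuition applies: the descriptions give the matcher far more signal to work with.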
Hi Sachin, agree that filling out the full product description will give you better matches to NAICS codes. That's actually what we did in our experiments documented in this paper: https://www.amazon.science/publications/caml-carbon-footprinting-of-household-products-with-zero-shot-semantic-text-similarity
The dataset of products we used can be found in the code base here: https://github.com/amazon-science/carbon-assessment-with-ml/tree/main/notebooks/eio/data
I think this answers the questions you asked, so I'm closing the ticket. Feel free to re-open it if you have more questions.