grandnode/grandnode2

Bots index every URL, even if it's a duplicate.

Macka323 opened this issue · 4 comments

By duplicate I mean that a product category can be viewed with different sort orders, but the content of the page is the same.

One way to fix it would be to disallow bots in robots.txt from crawling links with those parameters. For example:

Disallow: /*?viewmode=
Disallow: /*?orderby=
Disallow: /*?pagesize=

Another option is to use canonical tags on the pages. We can specify the canonical URL without query parameters to indicate which version should be indexed. For example:

<link rel="canonical" href="https://www.example.com/page">

Yes. I like this solution:
<link rel="canonical" href="https://www.example.com/page">

@nguyendev Using a canonical tag is okay, but if you have product specifications that are used for filtering, then addresses such as 'https://demo.grandnode.com/computers?cpu-type=intel-core-i7' will not be indexed.

You're right, I forgot about that 😿. So why don't we set up a dynamic canonical? For example, something like a rule table: which parameters are kept in the canonical URL, and which are disallowed. Likewise with robots.txt.
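A dynamic canonical along those lines could be sketched as below. This is an illustrative assumption, not GrandNode code: presentation-only parameters (`viewmode`, `orderby`, `pagesize`, the ones named in this thread) are dropped from the canonical URL, while filter parameters such as `cpu-type` are kept so filtered pages remain indexable.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical rule table: parameters that only change presentation.
# Anything not listed here (e.g. a specification filter like cpu-type)
# survives into the canonical URL.
PRESENTATION_PARAMS = {"viewmode", "orderby", "pagesize"}

def canonical_url(url: str) -> str:
    """Return the canonical form of `url` per the rule table above."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in PRESENTATION_PARAMS]
    # Rebuild the URL with only the kept parameters and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

print(canonical_url(
    "https://demo.grandnode.com/computers?orderby=price&cpu-type=intel-core-i7"))
# drops the sort parameter but keeps the filter parameter
```

The page template would then emit `<link rel="canonical" href="...">` with the result of this function instead of a fixed parameter-less URL.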

It will be easier to add the changes to robots.txt:
Disallow: /*?viewmode=...
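Putting the thread's suggestions together, the robots.txt additions would look roughly like this (the parameter names are the ones mentioned above; verify them against your own URL scheme before deploying):

```
User-agent: *
Disallow: /*?viewmode=
Disallow: /*?orderby=
Disallow: /*?pagesize=
```

Note that `/*?viewmode=` only matches URLs where the parameter comes first; if it can also appear after other parameters (e.g. `?orderby=price&viewmode=grid`), a broader pattern such as `Disallow: /*viewmode=` may be needed for crawlers that support the `*` wildcard.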