bartholomej/svelte-sitemap

[New Feature Request] Automatically submit sitemap to search engine APIs

Glench opened this issue · 5 comments

Please excuse me if this is out of the scope of this project, but it would be really cool if this library, in addition to generating the sitemap, also had an option to ping search engine APIs that the sitemap had been updated (e.g. Google's Search Console API). Right now I'm doing this manually.

Hi Glen,
I understand your need. But this library does not aspire to be the ultimate solution for the entire sitemap flow.
It's just a single-purpose small helper ;)

I recommend using the available examples or ready-made libraries that ping the Google Search Console once the application is deployed.
It's pretty simple ;)

Maybe I can prepare a small example in the readme of this library, but it will only be an example and the implementation will be up to you.

@Glench @crazycto
This is how you can easily do it with a GitHub Actions workflow. Run this step after your deploy step.

Create a file in your repository: .github/workflows/sitemap.yml

name: Ping Google about updated sitemap

on:
  push:
    branches: [ master ]

jobs:
  ping:
    name: Ping Google
    runs-on: ubuntu-latest

    steps:
      - name: Send HTTP GET request
        # Example: `curl "https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml"`
        # Quote the URL so the shell does not treat `?` as a glob character.
        run: curl "https://www.google.com/ping?sitemap=FULL_URL_OF_YOUR/sitemap.xml"

Let me know if this is enough for you.

Hello @bartholomej,

Thanks, this is a nice solution.

Just for the record, this is a Netlify plugin that does just that.

I guess that for those of us who are not using Netlify, this is the best way to automate the process.

A couple of GitHub Actions tips:

  • This waits for Cloudflare Pages deployment
  • This waits for Vercel deployment

Hope this helps.

I extended this to notify Google only when the sitemap has changed, using a hash of the sitemap stored in the GitHub Actions cache for comparison:

# - `actions/cache@v3` expires cache after 7 days and is not configurable,
#   so this will also not find a hit if >=7 days have passed since last cache,
#   which is fine for this use case.
# - `actions/cache@v3` requires a valid file path or it will say cache hit
#   false every time. But we don't need our sitemap contents for this to
#   work; to minimize cache consumption, we can use an empty dummy file.
- run: echo '' > dummy.txt
  if: github.ref == 'refs/heads/main'
- name: Configure cache
  id: sitemap-cache
  uses: actions/cache@v3
  with:
    path: dummy.txt
    key: sitemap-cache-${{ hashFiles('.svelte-kit/output/prerendered/pages/sitemap.xml') }}
    lookup-only: true
  if: github.ref == 'refs/heads/main'
- name: Notify Google about sitemap update
  run: curl "https://google.com/ping?sitemap=https://example.com/sitemap.xml"
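  # `cache-hit` can be an empty string on a miss, so compare against 'true'.
  if: github.ref == 'refs/heads/main' && steps.sitemap-cache.outputs.cache-hit != 'true'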
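
For anyone not using `actions/cache`, the same compare-the-hash idea can be sketched as a plain shell function. This is only an illustration: the function name `sitemap_changed` and the file paths are hypothetical, and it assumes GNU `sha256sum` is available (as it is on `ubuntu-latest` runners). The stored hash file has to persist between runs somehow, which is the job the cache does in the workflow above.

```shell
# Sketch only: `sitemap_changed` and both paths are hypothetical names.
#
# sitemap_changed SITEMAP HASH_FILE
# Returns 0 (and updates HASH_FILE) when the SHA-256 of SITEMAP differs from
# the stored hash, or when no hash has been stored yet.
# Returns 1 when the sitemap is unchanged.
sitemap_changed() {
  local sitemap="$1"
  local hash_file="$2"
  local new_hash
  local old_hash=""

  # sha256sum prints "<hash>  <file>"; keep only the hash.
  new_hash="$(sha256sum "$sitemap" | cut -d' ' -f1)"

  if [ -f "$hash_file" ]; then
    old_hash="$(cat "$hash_file")"
  fi

  if [ "$new_hash" = "$old_hash" ]; then
    return 1
  fi

  printf '%s\n' "$new_hash" > "$hash_file"
  return 0
}

# Example usage (paths are placeholders):
#   if sitemap_changed build/sitemap.xml .sitemap.sha256; then
#     echo "sitemap changed, notify search engines here"
#   fi
```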

Google has now deprecated the sitemap ping endpoint:
https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping
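
As I read the linked post, sitemaps can still be submitted through robots.txt and Search Console. A minimal sketch of the robots.txt route, assuming your build emits a robots.txt you can edit: the helper name `ensure_sitemap_line` and the paths in the usage comment are hypothetical. It appends a `Sitemap:` line only when that exact line is not already present.

```shell
# Sketch only: `ensure_sitemap_line` is a hypothetical helper name.
#
# ensure_sitemap_line ROBOTS_FILE SITEMAP_URL
# Appends "Sitemap: <url>" to ROBOTS_FILE unless that exact line exists.
ensure_sitemap_line() {
  robots="$1"
  url="$2"

  # Create the file if it does not exist yet.
  touch "$robots"

  # -x matches whole lines, -F treats the pattern as a fixed string.
  if grep -qxF "Sitemap: $url" "$robots"; then
    return 0
  fi

  # Add a newline first if the file does not already end with one.
  if [ -s "$robots" ] && [ -n "$(tail -c 1 "$robots")" ]; then
    printf '\n' >> "$robots"
  fi

  printf 'Sitemap: %s\n' "$url" >> "$robots"
}

# Example usage (path and URL are placeholders):
#   ensure_sitemap_line build/robots.txt "https://example.com/sitemap.xml"
```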