getSeoSitemap

PHP library to generate a sitemap. It crawls a whole website, checking all internal and external links, and runs Search Engine Optimization checks.


getSeoSitemap v3.9.5 (2019-10-04)

PHP library to generate a sitemap.
It crawls a whole domain, checking all URLs.
It runs full Search Engine Optimization checks only on the URLs included in the sitemap.

Please support this project by making a donation via PayPal, or via Bitcoin (BTC) to the address 19928gKpqdyN6CHUh4Tae1GW9NAMT6SfQH

It requires PHP 5.5 and MySQL 5.5.

This script creates a single gzip sitemap, or multiple gzip sitemaps plus a gzip sitemap index.
Each entry includes change frequency, last modification date and priority, set according to your own rules.
Change frequency is selected automatically between daily, weekly, monthly and yearly.
The maximum URL length is 767 characters; longer URLs will make the script fail.
URLs with an HTTP response code other than 200, or with size = 0, will not be included in the sitemap.
It checks all internal and external links inside HTML pages and JS sources (href URLs in 'a' tags, plus form action URLs when the method is GET).
It checks all internal and external sources.
Mailto URLs will not be included in the sitemap.
URLs inside PDF files will not be scanned and will not be included in the sitemap.
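
The inclusion rules above can be sketched as a small filter. This is only an illustration under stated assumptions, not the code getSeoSitemap actually runs: the function name isSitemapCandidate, the constant MAX_URL_LENGTH and the example URLs are hypothetical, and the sketch simply rejects over-long URLs instead of failing.

```php
<?php
// Hypothetical helper, not part of getSeoSitemap: decides whether a crawled
// URL would qualify for the sitemap under the rules listed in this README.

const MAX_URL_LENGTH = 767; // limit stated in this README

/**
 * @param string $url       absolute URL found by the crawler
 * @param int    $httpCode  HTTP response code returned for the URL
 * @param int    $sizeBytes size of the response body in bytes
 * @return bool  true when the URL may be listed in the sitemap
 */
function isSitemapCandidate($url, $httpCode, $sizeBytes)
{
    // mailto URLs are never listed in the sitemap
    if (stripos($url, 'mailto:') === 0) {
        return false;
    }
    // strlen() counts bytes, which equals characters for ASCII-encoded URLs
    if (strlen($url) > MAX_URL_LENGTH) {
        return false;
    }
    // only successful, non-empty responses are included
    return $httpCode === 200 && $sizeBytes > 0;
}

// Usage with made-up values
var_dump(isSitemapCandidate('https://example.com/page.html', 200, 5120));
var_dump(isSitemapCandidate('mailto:info@example.com', 200, 10));
```

The first call accepts the URL and the second rejects it, because mailto URLs are excluded regardless of response code.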

To improve SEO, following the robots.txt rules for "User-agent: *", it checks:

  • HTTP response code of all internal and external sources in the domain (images, scripts, links, iframes, videos, audios)
  • malformed URLs in the domain
  • page title of URLs in the domain
  • page description of URLs in the domain
  • page h1/h2/h3 of URLs in the domain
  • page size of URLs in the sitemap
  • image alt of URLs in the domain
  • image title of URLs in the domain.
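
As a rough idea of what some of the page-level checks in the list above involve, here is a minimal sketch using PHP's DOM extension. It assumes the checks amount to simple presence tests (title, description, h1, image alt and title). The function name findSeoIssues and the issue messages are hypothetical, and the checks getSeoSitemap really performs may be stricter or different.

```php
<?php
// Illustrative only: not getSeoSitemap's own code.

/**
 * Returns a list of human-readable SEO issues found in an HTML page.
 *
 * @param string $html raw HTML of the page
 * @return array list of issue descriptions (empty when nothing was found)
 */
function findSeoIssues($html)
{
    if (trim($html) === '') {
        return array('empty page');
    }

    $doc = new DOMDocument();
    // real-world HTML is often not well formed; keep libxml warnings quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $issues = array();

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0 || trim($titles->item(0)->textContent) === '') {
        $issues[] = 'missing title';
    }

    $hasDescription = false;
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description'
            && trim($meta->getAttribute('content')) !== '') {
            $hasDescription = true;
        }
    }
    if (!$hasDescription) {
        $issues[] = 'missing description';
    }

    if ($doc->getElementsByTagName('h1')->length === 0) {
        $issues[] = 'missing h1';
    }

    foreach ($doc->getElementsByTagName('img') as $img) {
        if (trim($img->getAttribute('alt')) === '') {
            $issues[] = 'image without alt: ' . $img->getAttribute('src');
        }
        if (trim($img->getAttribute('title')) === '') {
            $issues[] = 'image without title: ' . $img->getAttribute('src');
        }
    }

    return $issues;
}

// Usage with a made-up page
print_r(findSeoIssues('<html><head></head><body><img src="b.png"></body></html>'));
```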

You can use absolute or relative URLs inside the site.
This script automatically determines which URLs to skip and which to allow in the sitemap, following the robots.txt rules for "User-agent: *" and the robots tag in the page head.
There is no automatic function to submit the updated sitemap to search engines.
The sitemap will be saved in the main directory of the domain.
It rewrites robots.txt, adding the updated sitemap information.
The maximum number of URLs that can be inserted into the sitemap is 2.5T.
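
To give an idea of what adding sitemap information to robots.txt can look like, the sketch below replaces any existing "Sitemap:" line with a fresh one. It is an assumption about the general technique, not getSeoSitemap's own code: the function name withSitemapLine and the sitemap URL are hypothetical, and the sketch works on a string rather than on the file itself.

```php
<?php
// Hypothetical helper, not getSeoSitemap's own code.

/**
 * Returns robots.txt content with exactly one "Sitemap:" line,
 * pointing at $sitemapUrl and placed at the end.
 *
 * @param string $robotsTxt  current robots.txt content
 * @param string $sitemapUrl absolute URL of the sitemap or sitemap index
 * @return string updated robots.txt content
 */
function withSitemapLine($robotsTxt, $sitemapUrl)
{
    $kept = array();
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // drop any previous Sitemap line (the field name is matched case-insensitively)
        if (preg_match('/^\s*sitemap\s*:/i', $line)) {
            continue;
        }
        $kept[] = $line;
    }
    // remove trailing blank lines so the output stays tidy
    while (!empty($kept) && trim(end($kept)) === '') {
        array_pop($kept);
    }
    $kept[] = 'Sitemap: ' . $sitemapUrl;

    return implode("\n", $kept) . "\n";
}

// Usage with made-up content
$old = "User-agent: *\nDisallow: /private/\nSitemap: https://example.com/old.xml\n";
echo withSitemapLine($old, 'https://example.com/sitemap.xml.gz');
```

All other rules are left untouched, so the "User-agent: *" section that the script relies on stays as it was.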

Using getSeoSitemap, you will be able to give your clients a better browsing experience.

Instructions
1 - copy the getSeoSitemap folder into a protected zone of your server.
2 - set all user parameters in config.php.
3 - in your server's crontab, schedule the script to run once a day, preferably when your server is not too busy.
An example crontab line that runs the script every day at 7:45:00 AM is:
45 7 * * * php /example/example/example/example/example/getSeoSitemap/getSeoSitemap.php
Once you know how long a full run of the script takes, you could add a timeout to the crontab entry.

Warning
From release v3.9.4, the execution time of the script has increased considerably in order to run all the new functions.
To run getSeoSitemap faster, if you use a script like Geoplugin, you should exclude the getSeoSitemap user-agent from it.
Before upgrading from a release lower than 3.0 to 3.0 or higher, you must drop the getSeoSitemap and getSeoSitemapExec tables from your database.
Do not save any file whose name starts with sitemap in the main directory, otherwise the getSeoSitemap script could delete it.
The robots.txt file must be present in the main directory of the site, otherwise getSeoSitemap will fail.