The Hostway Blog

Sitemap XML Feeds

Search engine experts have been hounding Google for years to accept XML feeds from Webmasters who want to update their site information. When Google recently introduced the beta version of Google Sitemaps, they showed the experts that they have been listening—at least in part.

Google Sitemaps is a very useful feature for large shopping sites and other sites with frequently updated or changing content. It helps keep the search engine listings for these sites fresh and up to date, a must for search engine optimization.

The initial results of Google Sitemaps seem encouraging. In a recent test, pages were crawled by Google within 14 hours of submitting a sitemap file. Without a sitemap file, it had taken weeks for updates to the pages to be detected and applied.

Some search engine experts are disappointed that Google didn't go far enough: it didn't implement a real-time ping server that would let sitemap submissions take effect within seconds, and feeds still need to be resubmitted whenever changes occur. Nevertheless, offering Google Sitemaps is a step in the right direction.

It's also an empowering one.

Submitting the XML feed. The Google Sitemaps protocol uses very simple XML tags. Four tags are used to define individual pages:

  • Location: The full URL of the page, beginning with http://. For example, http://www.mywebsite.com
  • Priority: The priority of the page relative to the other pages on your site, rated on a scale of zero to one. The least important pages on your site should be assigned a priority of 0.0, the most important pages 1.0, and ordinary pages 0.5. Priority only tells Google which of your own pages you consider most important; it does not raise your pages' ranking against other sites.
  • Last modified: The date the page was last modified, for example 2005-07-06. This timestamp allows Google to avoid re-indexing pages that haven't changed.
  • Change frequency: Specifies how often a page is likely to change, for example never, weekly, daily or hourly. This hint helps Google decide which pages need to be re-crawled most often.

These tags are wrapped inside tags used to define the sitemap. The result is an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.iconinteractive.com/</loc>
<lastmod>2005-07-06</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
</urlset>
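
For sites with many pages, writing this file by hand quickly becomes impractical. The following is a minimal Python sketch of how a file like the one above could be generated with the standard library. The function name build_sitemap and the dictionary layout of each page are assumptions made for illustration; they are not part of Google Sitemaps itself.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the example above (Google Sitemaps 0.84 schema).
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

# Tags in the same order as in the example file.
URL_TAGS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Return the sitemap as UTF-8 encoded bytes.

    `pages` is a list of dicts (a hypothetical layout). Each dict must
    have a "loc" key; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in URL_TAGS:
            if tag in page:
                # ElementTree escapes characters such as "&" on output.
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="utf-8")
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    example = [{
        "loc": "http://www.iconinteractive.com/",
        "lastmod": "2005-07-06",
        "changefreq": "weekly",
        "priority": "1.0",
    }]
    print(build_sitemap(example).decode("utf-8"))
```

Building the file through an XML library rather than by joining strings means special characters in URLs, such as ampersands, are escaped automatically.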

The restrictions on sitemap files are modest. URLs must not include embedded newlines. You must fully specify URLs because Google tries to crawl the URLs exactly as you provide them. Your sitemap files must use UTF-8 encoding. And each sitemap file is limited to 50,000 URLs and 10 megabytes when uncompressed.
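
These restrictions are easy to check before a file is submitted. The sketch below, again in Python, validates URLs and splits a long list into batches that each fit in one sitemap file. The helper names are hypothetical, accepting https:// as well as http:// is an assumption, and so is treating a megabyte as 1,048,576 bytes.

```python
MAX_URLS = 50_000              # per-file URL limit stated above
MAX_BYTES = 10 * 1024 * 1024   # 10 megabytes uncompressed (assumed 2**20 bytes each)


def check_url(url):
    """Raise ValueError if a URL breaks the restrictions described above."""
    if "\n" in url or "\r" in url:
        raise ValueError("URL contains an embedded newline: %r" % url)
    # Assumption: https:// URLs are accepted alongside http://.
    if not url.startswith(("http://", "https://")):
        raise ValueError("URL is not fully specified: %r" % url)


def split_urls(urls, limit=MAX_URLS):
    """Yield lists of at most `limit` validated URLs, one per sitemap file."""
    batch = []
    for url in urls:
        check_url(url)
        batch.append(url)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        yield batch


def fits_size_limit(xml_bytes):
    """Return True if the uncompressed sitemap is within the size limit."""
    return len(xml_bytes) <= MAX_BYTES
```

A site with 120,000 pages would come out of split_urls as three batches, each of which would then be written to its own sitemap file.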

When you submit a sitemap file to Google, you're telling Google which URLs on your Web site are ready to be crawled. Each time a change occurs, such as the addition of a new page, you need to resubmit your sitemap file for the changes to be picked up.
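
Because every change calls for a resubmission, it can help to detect automatically whether a freshly generated sitemap differs from the last one submitted. The following Python sketch is one possible approach, not a feature of Google Sitemaps: it compares a SHA-256 digest of the new file with a digest saved in a local file. The function name and the digest file are hypothetical.

```python
import hashlib
from pathlib import Path


def sitemap_changed(new_xml, digest_path):
    """Return True if `new_xml` (bytes) differs from the last recorded sitemap.

    `digest_path` is a hypothetical local file holding the SHA-256 digest
    of the previously recorded sitemap. It is updated whenever a change
    is detected.
    """
    new_digest = hashlib.sha256(new_xml).hexdigest()
    path = Path(digest_path)
    old_digest = path.read_text().strip() if path.exists() else None
    if new_digest == old_digest:
        return False
    path.write_text(new_digest)
    return True
```

A scheduled job could regenerate the sitemap, call sitemap_changed, and resubmit only when it returns True.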

You can submit the sitemap file in two ways: by entering its URL on the Google Sitemaps home page, or by installing Google's Sitemap Generator on your Web server and scheduling it to run automatically.