I have a site that allows people to look up words that 'Start with' or 'End with' a certain set of characters. I am trying to figure out how to get off on the right foot with search engines and I was wondering:

Is it better to have URLs that appear to be unique pages? For example: 

instead of 

Both of these would return the same data, but I am wondering if the first is better because it appears to be a unique page to a crawler?

At the end of the day the source code will only contain about 6 pages in it, but with all of the StartsWith/EndsWith letter sets, there are probably millions of possible combinations that people could get to. How would I (and should I) create a site map for pages that do not physically exist, but have unique URLs with unique content? Are there any other steps I should take to make sure that crawlers can find all of these different combinations?

Update: There will be no duplicate content on the site.

Both of the URLs will appear to be unique pages to the search engines. Just because one has a query string doesn't mean the search engines don't see it as a new page.

You definitely want to use an XML sitemap to let the search engines know about your pages. You can also do things like have the most recently searched words appear on your home page to help spiders find and crawl those pages.

  • Should I even attempt to create a sitemap that has every possible combination? It seems like it would have millions of entries. I've seen some sites that come up with pages in search results when you search for 'Words that start with ***'. Are they only able to do this because they have a site map that points to that page?
  • I can't say for sure that's how the other sites do it, but it is a good possibility. Sitemaps do support millions of pages, although there is a special format for sitemaps that are very large. Fortunately that's easy to do, so I wouldn't let it be a showstopper.
  • Would that mean using a sitemap index with multiple sitemap files?
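
For what it's worth, here is a rough sketch of how a sitemap index with multiple sitemap files could be generated. It relies on the sitemaps.org limit of 50,000 URLs per sitemap file; the file names (`sitemap-1.xml`, `sitemap-index.xml`), the `build_all` helper, and the example.com URLs are made up for illustration and are not anything the site actually uses.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_sitemap(urls):
    """Return the XML text of one sitemap file listing `urls`."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{NS}">\n{entries}</urlset>\n'
    )


def build_sitemap_index(sitemap_urls):
    """Return the XML text of an index that points at each sitemap file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">\n{entries}</sitemapindex>\n'
    )


def build_all(urls, base_url, max_urls=MAX_URLS_PER_SITEMAP):
    """Split `urls` across sitemap files and add one index file.

    Returns a dict mapping hypothetical file names to XML text.
    """
    files = {}
    locations = []
    for n, part in enumerate(chunk(urls, max_urls), start=1):
        name = f"sitemap-{n}.xml"
        files[name] = build_sitemap(part)
        locations.append(f"{base_url}/{name}")
    files["sitemap-index.xml"] = build_sitemap_index(locations)
    return files
```

You would then write each entry of the returned dict to disk and submit only the index file to the search engines. Since the set of letter combinations doesn't change, this could be a one-off job.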

Instead of generating sitemap.xml on a regular basis, which can lead to complex file management:

  1. Record each search on your site in your DB
  2. Each day, take the 50 most popular searches (the right number depends on your website) and add them to a dedicated 'Trending searches' page, linked from the footer of your website.
  3. Set a reasonable limit on the number of links on this page (50 links seems OK)
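
The steps above could look something like this. It is only a sketch using SQLite; the `searches` table, its columns, and the function names are assumptions for illustration, and your real schema will differ.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) searches table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches ("
        " mode TEXT NOT NULL,"      # 'starts' or 'ends'
        " letters TEXT NOT NULL,"   # the letters that were searched for
        " searched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_search(conn, mode, letters):
    """Step 1: record each search made on the site."""
    conn.execute(
        "INSERT INTO searches (mode, letters) VALUES (?, ?)",
        (mode, letters.lower()),
    )
    conn.commit()


def trending(conn, limit=50):
    """Steps 2 and 3: most frequent searches of the last day, capped at `limit`."""
    rows = conn.execute(
        "SELECT mode, letters, COUNT(*) AS hits FROM searches"
        " WHERE searched_at >= datetime('now', '-1 day')"
        " GROUP BY mode, letters"
        " ORDER BY hits DESC, mode ASC, letters ASC"
        " LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Each row returned by `trending` can be rendered as one link on the 'Trending searches' page, so the cap on links is simply the `limit` argument.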

A sitemap with millions of similar URIs (duplicate content) isn't useful.

  • There is no duplicate content. I will add a note to the original question. Also, the site map would not change once it was created.
  • You will have some effectively duplicate content: I'm pretty sure the pages listing words starting with, say, 'eith' and 'eithe' will be pretty similar. Of course, you could fix this particular issue pretty easily with redirects or rel=canonical links (e.g. to the longest common prefix) if you wanted.
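
To illustrate the longest-common-prefix idea from the comment above, here is a small sketch. The tiny word list, the `/starts-with/` URL pattern, and the function names are all hypothetical; the point is only that 'eith' and 'eithe' end up with the same canonical URL.

```python
import os
from html import escape
from urllib.parse import quote


def canonical_prefix(prefix, words):
    """Return the longest prefix shared by every word starting with `prefix`.

    If nothing matches, the prefix is returned unchanged.
    """
    matches = [w for w in words if w.startswith(prefix)]
    if not matches:
        return prefix
    # os.path.commonprefix compares character by character, so it works
    # on plain strings as well as on paths.
    return os.path.commonprefix(matches)


def canonical_link_tag(prefix, words, base_url="https://example.com"):
    """Build a rel=canonical tag for a 'starts with' page (assumed URL scheme)."""
    target = f"{base_url}/starts-with/{quote(canonical_prefix(prefix, words))}"
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'


WORDS = ["eight", "eighth", "either"]  # stand-in for the real dictionary

print(canonical_prefix("eith", WORDS))   # either
print(canonical_prefix("eithe", WORDS))  # either
print(canonical_prefix("ei", WORDS))     # ei
```

The same `canonical_prefix` result could drive a 301 redirect instead of a canonical tag, if you prefer to collapse the URLs outright.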

