A Spidering We Will Go
So I’m sitting here this morning working on a site spider tool that can be used to generate Google Sitemaps, and I find the more I spider, the more I must tweak my spider! It seems that every new site I test this against throws another link format at me.
One would think that a simple href could be written consistently, so that my simple spider could process pages most efficiently. For those of you who think in simple terms like me, here are a few things to look out for when writing your own spider…
- hrefs that do not use opening and closing quotes (either single or double)
- URLs with spaces in them (e.g., href="ANNUAL REPORT.pdf")
- pages that use redirects, whether temporary or permanent
- URLs that contain '../'
I realize that browsers happily accept all of these things. However, for my own selfish reasons, I implore you to use simple links with no embedded spaces (escape them as %20, please) and no ../ segments when you are writing your HTML. You will make my life much easier. And we all know that making my life easier is the most important thing to do! ;)
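In the meantime, here is a rough sketch of how a spider might cope with the cases listed above. It is not the code from my tool, just a minimal Python illustration using only the standard library. The names LinkCollector, normalize, extract_links and final_url are made up for this example. It assumes that html.parser is forgiving enough for the pages you crawl; an unquoted href that also contains a space will still be cut off at the space, because there is no way to tell where the value ends.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit, quote
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags, quoted or unquoted."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize(base_url, href):
    """Resolve href against base_url and escape spaces in the path."""
    # urljoin makes the link absolute and collapses ../ segments.
    absolute = urljoin(base_url, href.strip())
    parts = urlsplit(absolute)
    # Keep "%" safe so links that are already escaped are not escaped twice.
    path = quote(parts.path, safe="/%")
    # Drop the fragment: it points inside a page, not at a separate page.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def extract_links(base_url, html):
    """Return the normalized absolute URL of every link in html."""
    collector = LinkCollector()
    collector.feed(html)
    return [normalize(base_url, h) for h in collector.hrefs]


def final_url(url):
    """Return the URL we end up at after urlopen follows any redirects."""
    with urlopen(url) as response:
        return response.geturl()
```

For instance, with a base of http://example.com/docs/reports/index.html, an href of ANNUAL REPORT.pdf comes out as http://example.com/docs/reports/ANNUAL%20REPORT.pdf, and ../up/index.html comes out as http://example.com/docs/up/index.html. For redirects, recording the result of final_url rather than the link as written should keep the redirecting address out of the sitemap, at the cost of one extra request per link.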
