Get Bots to Crawl Your Site More Effectively
Benu Aggarwal, Milestone’s Founder and President, caught up with SES Chicago keynote speaker Maile Ohye, Senior Developer Programs Engineer for Google, and asked her for her top five tips on getting better results when Googlebot crawls a website.
Five tips to make your site easier to crawl and index:
- Strong site architecture
- Content should be indexable
- Check crawl errors in Google Webmaster Tools and resolve them
- Use “Fetch as Googlebot” to verify redirects and URL rewrites
- Disallow content that isn’t useful to searchers, and use URL parameter handling
Here’s a more in-depth look at each one of those tips:
- Make sure you have a strong site architecture. Every URL should be reachable by following links from the home page and its child pages.
- Make all of your content indexable. Textual content should be plain text, not hidden inside images or video. For image content, use alt attributes so the images can be indexed. Use a video sitemap to make sure your videos are indexable.
- Check crawl errors in Webmaster Tools. Find and fix URLs that are unreachable or not found (404 errors). Make sure that a user who clicks a link, whether on your own site or on an external site, lands on the page they were meant to reach.
Webmaster Tools also shows the sources of crawl errors, telling you which page a broken link came from. You can ask that site’s webmaster to correct the link, or set up a 301 redirect to resolve typo errors.
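The first tip, that every URL should be reachable from the home page, can be checked with a breadth-first search over your site’s internal link graph. The sketch below is a minimal illustration that assumes you already have that graph as a dictionary mapping each page path to the paths it links to (for instance, exported from a site crawler); the `find_unreachable` name and the sample paths are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def find_unreachable(links: Dict[str, List[str]], home: str = "/") -> Set[str]:
    """Return pages in `links` that can't be reached by following links from `home`.

    `links` is a hypothetical in-memory link graph: page path -> linked paths.
    """
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Any known page the breadth-first search never visited is an orphan.
    return set(links) - seen


# Hypothetical site: /old-promo exists, but nothing links to it.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/blue-widget"],
    "/products/blue-widget": ["/"],
    "/about": [],
    "/old-promo": ["/products"],
}
print(find_unreachable(site))  # {'/old-promo'}
```

Pages reported this way need a link from somewhere in the navigation, or removal if they are obsolete.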
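For the second tip, one quick way to find images that search engines can’t describe is to scan your HTML for `<img>` tags whose `alt` attribute is missing or blank. This is a minimal sketch using Python’s standard `html.parser` module; the helper names and sample markup are hypothetical. Purely decorative images sometimes carry an intentionally empty `alt=""`, so treat the output as a review list rather than a list of errors.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MissingAltFinder(HTMLParser):
    """Collects the src of each <img> whose alt attribute is missing or blank."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "img":  # HTMLParser already lowercases tag names
            return
        attributes = dict(attrs)
        alt_text = attributes.get("alt") or ""
        if not alt_text.strip():
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html: str) -> List[str]:
    """Hypothetical helper: return the src of every image lacking alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Hypothetical page: the product photo has alt text, the banner does not.
page = (
    "<p>Blue widget</p>"
    '<img src="/img/widget.jpg" alt="Blue widget, front view">'
    '<img src="/img/banner.png">'
)
print(images_missing_alt(page))  # ['/img/banner.png']
```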
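For the third tip, a mistyped URL that shows up in crawl errors can be fixed with a 301 (moved permanently) redirect to the correct page. How you configure redirects depends on your web server; the sketch below only shows the idea as a tiny WSGI application, with a hypothetical `REDIRECTS` map and page set standing in for your real site.

```python
# Hypothetical map from mistyped URLs (as reported in crawl errors) to the correct pages.
REDIRECTS = {
    "/prodcts/blue-widget": "/products/blue-widget",
    "/abuot": "/about",
}
# Hypothetical set of pages that really exist on the site.
KNOWN_PAGES = {"/", "/about", "/products/blue-widget"}


def app(environ, start_response):
    """Minimal WSGI app: 301 for known typos, 200 for real pages, 404 otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # A permanent (301) redirect tells crawlers the old URL has moved for good.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    if path in KNOWN_PAGES:
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [f"Page: {path}".encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found"]
```

To try it locally, you could serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request `http://localhost:8000/abuot`, which answers with a 301 pointing to `/about`.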
- Use “Fetch as Googlebot.” Fetch as Googlebot helps you verify that 301 redirects and URL rewrites are working properly. It is in Webmaster Tools under the Labs section.
To use “Fetch as Googlebot”:
- Log in to Webmaster Tools
- Select your site
- Go to Labs -> Fetch as Googlebot
- Fine-tune your crawling by disallowing content that is not helpful to searchers. For example, it’s better to use product names in URLs rather than “shopping cart,” because people search for the product name, not for “shopping cart.”
Use URL parameter handling in Google Webmaster Tools. You can tell Google which parameters on your site are important and which do not change the content and can therefore be ignored.
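“Fetch as Googlebot” runs from Google’s side, so nothing local replaces it for seeing what Google actually receives. As a rough complement for the redirect check in the fourth tip, you can request a URL once, without following redirects, and inspect the status code and `Location` header. The sketch below uses Python’s standard `http.client`, which does not follow redirects; the `fetch_once` name is hypothetical, and sending Googlebot’s user-agent string only shows how your server responds to that string, not how Google crawls the page.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: Googlebot's commonly published desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_once(url: str, user_agent: str = GOOGLEBOT_UA) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: GET `url` once, without following redirects.

    Returns the HTTP status code and the Location header (None if absent).
    """
    parts = urlsplit(url)
    connection_class = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    conn = connection_class(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # http.client never follows redirects, so a 301 is reported as-is.
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For a URL you have redirected, you would expect a result such as `(301, "https://www.example.com/new-page")` (hypothetical URL); a `200` or `404` there suggests the redirect is not in place.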
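For the fifth tip, disallowing pages that aren’t useful to searchers is usually done with `Disallow` rules in robots.txt. The sketch below checks a hypothetical robots.txt that blocks cart and checkout paths, using Python’s standard `urllib.robotparser`; the rules and URLs are assumptions for illustration. That parser performs simple prefix matching and does not implement Google’s `*` and `$` path wildcards, so verify wildcard rules with Google’s own tools.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of cart and checkout pages,
# which searchers are unlikely to be looking for.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/cart/add?item=42",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(url, "->", verdict)
```

Running it reports the product page as allowed and the cart URL as disallowed.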
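URL parameter handling itself is configured inside Webmaster Tools, but the idea behind it can be sketched in code: URLs that differ only in parameters that don’t change the content are duplicates of one page. The function below is a minimal illustration with a hypothetical `IGNORED_PARAMS` set; it drops those parameters and sorts the rest so the duplicates collapse to a single URL. Sorting assumes parameter order never matters on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change page content on this site.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url: str) -> str:
    """Drop content-neutral parameters and sort the rest so duplicates collapse."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(sorted(kept))))


# Three hypothetical URLs that all show the same blue products.
urls = [
    "https://www.example.com/products?color=blue&sessionid=a1b2",
    "https://www.example.com/products?sort=price&color=blue",
    "https://www.example.com/products?color=blue",
]
print({canonicalize(u) for u in urls})  # {'https://www.example.com/products?color=blue'}
```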