8 SEO Reasons to Crawl Your Sites

My latest article is at Ecommerce Developer; read it in full here.

The first thing I do when working with a new site is set my favorite crawler on it. This gets me acquainted with all the URLs, site sections, interlinkings, forgotten pockets, scars and warts. A good crawler offers a wealth of data useful not just to search engine optimization, but also to site maintenance in general.

Luckily, some great crawlers are free. You’ll find pages of options just by Googling “web crawler” or a similar term. Xenu Link Sleuth is my favorite for the price — it’s free — and for the broad assortment of data collected on every URL it crawls. GSite Crawler is another good, free alternative. It’s focused mainly on creating XML sitemaps and feeds, but it’s good for other uses as well.

XML Sitemaps: Dexy’s Midnight Runners of SEO

Yesterday on the train, Brian R. Brown and I were chatting about orphaned pages, XML sitemaps and indexation without benefits. Brian referred to XML sitemaps as the “one hit wonder of SEO.” Brilliant! XML sitemaps, like Dexy’s Midnight Runners, are one hit wonders.

Dexy’s Midnight Runners, for those of you who missed the 80s, are famous for their one hit “Come on, Eileen.” XML sitemaps are famous for inviting the crawl. And just like Dexy’s Midnight Runners don’t have any other great songs, XML sitemaps really don’t provide anything other than a way to request that search engine spiders crawl your site. This comparison just begs for a Weird Al-style lyrics mod:

Come on Crawl Me,
I swear (well he means)
At this sitemap
You’ll find everything…

Actually Blondie’s “Call Me” was screaming for a “Crawl Me” spoof, but you can hardly call Blondie a one-hit wonder. Anyway, back to XML sitemaps.

What XML Sitemaps Do

  • Invite search engines to crawl specific URLs

What XML Sitemaps Do Not Do

  • Guarantee crawling of URLs included in the XML sitemap
  • Block crawling of URLs not included in the XML sitemap
  • Guarantee indexation
  • Improve rankings
  • Drive traffic or sales

It reminds me of the horseshoe nail proverb:

For want of the crawl indexation was lost.
For want of indexation rankings were lost.
For want of rankings the visitors were lost.
For want of visitors the site was lost.
And all for the want of a crawl.

I’m taking a few liberties, but the premise is the same. No crawl, no organic search visitors. End of story. In this regard, XML sitemaps play a role in the initial discovery of your URLs.

The XML sitemap rolls out the red carpet and invites search engines to crawl and index the URLs you’ve so thoughtfully included. This, in turn, can increase indexation for large, complex sites that contain thousands of pages. On such sites it could take even a committed bot (like Googlebot) many visits to crawl the whole site, especially if it keeps encountering duplicate content. Less thorough bots (I’m looking at you, Bingbot) might take even longer to discover new content. A conscientiously updated and autodiscoverable XML sitemap helps bots find new URLs, which should speed time to indexation and rankings if the content is valuable.
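
For anyone who hasn't looked inside one, here is a minimal sketch of how an XML sitemap could be generated with Python's standard library. The `urlset`, `url`, `loc` and `lastmod` elements and the namespace come from the sitemaps.org protocol; the function name `build_sitemap` and the example.com URLs are my own placeholders, not anything a particular platform provides.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    entries is an iterable of (url, lastmod) pairs, where lastmod is an
    ISO date string such as "2010-10-04", or None to leave it out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(node, "loc").text = url
        if lastmod:
            ET.SubElement(node, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(build_sitemap([
        ("https://www.example.com/", "2010-10-04"),
        ("https://www.example.com/laptops?sort=price&page=2", None),
    ]))
```

To make the file autodiscoverable, the usual approach is a `Sitemap:` line in robots.txt that points at the sitemap's full URL.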

Learn more about XML sitemaps at Google Webmaster Tools.

PS: “Come on, Eileen” makes me involuntarily dance like Elaine. It’s not pretty but I love the song anyway.

Cache in Hand

Google’s cache view is a valuable window into how Googlebot “sees” a site. I find myself stalking the cache every couple of days in an effort to untangle architectural challenges to my SEO objectives. When I’m having difficulty helping colleagues understand why content or links are or aren’t crawlable, I often take them to the cache view as a quick and easy visual. Once they see what the bots see, from Googlebot itself, the conversation around how to resolve the issue is usually much easier. I wanted to include it in my article on advanced search operators at Practical eCommerce last week, but I hit the word count cap. So here’s the scoop on cache.

A site’s architecture and the technology choices its development team makes can make or break the bots’ ability to crawl a site. The cache view offers a quick window into the bots’-eye view. For example, most humans surfing with a modern browser that runs JavaScript and accepts cookies will see Dell’s homepage like this:

[Screenshot: Dell.com homepage]

As a human, I am able to use the drop-down menus to navigate to the main areas of the site, quickly consume many of Dell’s priority messages from the static feature boxes and the Flash carousel, and browse the basic HTML links toward the bottom of the page. Dell makes its marketing priorities very clear and easy to understand… for humans with modern browsers. But what about the bots? What content can they consume? Let’s take a look at the cache view [cache:www.dell.com]:

With the cache view the page looks remarkably similar. There’s a gray header at the top of the page indicating that Google last cached this page on Oct 4, 2010 18:22:07 GMT, one hour and one minute ago at the time of this article. So any changes that Dell made to the site in the last 61 minutes will not be reflected in this cache view. That’s a very important note when you’re trying to confirm the crawlability of some new architectural change — make sure the change has been cached before you start analyzing the cache view.

The second thing to consider is that the cache view shows a far more human-centric view of the page than I’d expect. That’s because in the initial cache view your modern browser is still running the JavaScript, applying the CSS and accepting the cookies that the cached page calls. To see the bots’-eye view more realistically, we need to disable those by clicking on the “Text-only version” link in the upper right corner of the gray box. Now we see:

Now we’re seeing the textual version of the site, stripped of its technical finery. The rollover navigation in the header no longer functions. The links to main categories are still crawlable as plain text links, but without the rollover menus the homepage doesn’t pass link popularity down to the subcategory pages. Depending on the marketing value of those pages, the lack of link juice flowing there could be an issue. The next thing we see is that the big lovely Flash carousel, so front-and-center for human consumption, doesn’t exist without JavaScript enabled. Assuming the pages displayed in the Flash piece are valuable landing pages, which they likely are to warrant homepage coverage and development time, this again is a missed opportunity to flow link juice to important pages. Both of these issues, the navigation and the Flash carousel, could be coded to degrade gracefully using CSS to provide the same crawlable text and links to bots as well as humans.
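
To make the text-only idea concrete, here is a rough sketch that lists the links present in a page's raw HTML, which approximates what a crawler that doesn't run JavaScript could follow. It is not how Google builds its text-only view; the class and function names (`StaticLinkParser`, `static_links`) and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class StaticLinkParser(HTMLParser):
    """Collect href values from <a> tags that exist in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs, in-page anchors and javascript: pseudo-links,
        # none of which lead a script-less crawler to another URL.
        if href and not href.lower().startswith(("javascript:", "#")):
            self.links.append(href)


def static_links(html):
    """Return the plain links found in html, in document order."""
    parser = StaticLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # Hypothetical navigation: one plain link, one script-driven menu item,
    # and one link that only exists after JavaScript writes it into the page.
    sample = (
        '<ul><li><a href="/desktops">Desktops</a></li>'
        '<li><a href="javascript:openMenu()">More</a></li></ul>'
        "<script>document.write('<a href=\"/laptops\">Laptops</a>');</script>"
    )
    print(static_links(sample))
```

Anything written into the page by a script never shows up in the list, because the parser treats the contents of script tags as plain data. If an important landing page is missing from the output, that's a hint the link needs a plain HTML fallback.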

Just to be safe, I double-check any issue I see in cache view (or anything I expected to see but don’t) by manually disabling JavaScript, CSS and cookies in my browser and setting my user agent to Googlebot. For more detailed information on the Firefox plugins I use to do this, see Surfing Like a Search Engine Spider on Practical Ecom. Cache view is a quick way to decide whether a deeper analysis is required.
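
The same double-check can be roughed out in a script instead of a browser. This sketch requests a page with Googlebot's user agent string using Python's standard library, which doesn't run JavaScript, apply CSS or keep cookies unless you add that yourself. The helper names `build_bot_request` and `fetch_as_bot` are my own, and a site may still treat the request differently from the real Googlebot.

```python
import urllib.request

# User agent string that Googlebot identifies itself with.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_bot_request(url, user_agent=GOOGLEBOT_UA):
    """Build a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as_bot(url, timeout=10):
    """Fetch the raw HTML of url; no scripts run and no cookies are kept."""
    with urllib.request.urlopen(build_bot_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder URL; swap in the page you are auditing.
    html = fetch_as_bot("https://www.example.com/")
    print(html[:500])
```

The raw HTML that comes back is the thing to inspect: if the text or links you care about aren't in it, they depend on something the bots may not execute.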

Note: The cache: operator only works on Google, but Yahoo Site Explorer offers a cache link on each page in its report as well. Bing does not support the cache: operator.

Redesign with SEO in Mind


A site redesign or switch to a new platform is kind of like a rebirth – it’s one of the most exciting and nerve-wracking times for the entire Internet marketing team. With everyone caught up in the branding, design, usability and technology, the impact on SEO can sometimes be forgotten until the last minute.

I wrote this article on redesigning a site with SEO in mind back in July for MultichannelMerchant.com and gave up looking for it to be published… so I missed its publish date in September. Maybe you did too. Here’s a redux of the original article.

While it’s difficult to determine what the natural search impact will be until working code hits a development server, keeping several mantras in mind and repeating them liberally will keep the team focused on the most critical elements to plan for SEO success. I love these mantras — I actually say them to myself as I’m auditing sites.

SEO Development Mantras

  1. Links must be crawlable with JavaScript, CSS and cookies disabled.
  2. Plain text must be indexable on the page with JavaScript & CSS disabled.
  3. Every page must send a unique keyword signal.
  4. One URL for one page of content.
  5. We’re going to 301 that, right?

When the site is stable on the development environment and the URLs are ironed out, draw up a 301 redirect plan, build the new XML sitemap and put a measurement plan in place to gauge the impact of the relaunch.
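
A 301 redirect plan is, at its simplest, a map of old URLs to new ones. As a sketch of the kind of sanity check worth running before launch, the function below walks such a map and reports where each old URL finally lands and how many hops it takes, flagging loops. The name `resolve_redirects` and the sample paths are hypothetical, and this checks only the plan on paper, not what the server actually returns.

```python
def resolve_redirects(redirect_map):
    """Follow every old URL through the map.

    Returns {old_url: (final_url, hops)}; final_url is None when the
    chain loops back on itself and never reaches a real page.
    """
    report = {}
    for start in redirect_map:
        seen = [start]
        current = start
        while current in redirect_map:
            current = redirect_map[current]
            if current in seen:
                # We have been here before, so this chain is a loop.
                report[start] = (None, len(seen))
                break
            seen.append(current)
        else:
            # The chain ended on a URL that is not redirected further.
            report[start] = (current, len(seen) - 1)
    return report


if __name__ == "__main__":
    # Hypothetical plan: one clean redirect, one chain, one loop.
    plan = {
        "/old-a": "/new-a",
        "/old-b": "/old-a",
        "/x": "/y",
        "/y": "/x",
    }
    for old, (final, hops) in resolve_redirects(plan).items():
        print(old, "->", final, "in", hops, "hop(s)")
```

Anything that takes more than one hop is a candidate for pointing straight at its final URL, and anything that loops needs fixing before the relaunch.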

All of this is covered in more detail in the original article at Multichannel Merchant: http://multichannelmerchant.com/ecommerce/0901-using-seo-redesign/index.html
