SEO Metrics: A Programmer's Guide

Posted by Miles Evans

My team took a small break on youmash, and while I wait for batteries to recharge I thought I would start enhancing the SEO toolset we started building towards the middle of last year. Although I got sidetracked with more pressing projects, the tools are quite usable and I think we should be able to make a formal release soonish.

Since I have been researching this again, I figured I may as well publish a comprehensive list of the most common and useful SEO metrics along with how to retrieve each one. Might be handy if you are building your own SEO tools or are just curious how some of this is gathered.

Google

Backlinks (link:www.profitpapers.com) – Infamously annoying: Google does not reveal much of its backlink data. What you get instead is a small sampling (roughly 5-10%) of what seems to be a fairly random link set. People still want to see it, of course.

Gareth Davies has a video via YouTube on gathering backlink data with Google if you’re new to SEO.

Google recently deprecated their SOAP Search API and no longer hands out API keys. This is an odd thing to do, as there were volumes written about interfacing with this API and it was quite flexible. Google is now urging developers to use its AJAX Search API to obtain search data. Of course, there is always good old regular expression parsing in PHP to collect this data if the new API changes scared you off. I talk about regex briefly at the end of this article.

site:www.profitpapers.com shows how many pages are indexed. This will also reveal if your website has landed in the supplemental results. Check out Gareth’s video on the site: command.

Different from link count, a search for your domain will give you an idea how much chatter there is about your website. Some people call this domain visibility. To see how many times your website address appears on other people's pages use this syntax: "www.profitpapers.com" -site:www.profitpapers.com

Google results for allin commands are fairly useless but here is how they are done:

allinanchor:organic seo will show you how many pages are using the keywords ‘organic seo’ in links within their pages. Check out Gareth’s video on allinanchor.

allintitle:organic seo will show you the pages using the keywords ‘organic seo’ in the titles of their pages. Check out Gareth’s video on allintitle.
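
As a rough sketch, the Google operators above can be assembled into search strings programmatically. This is written in Python rather than the PHP mentioned elsewhere in this article, and the function names and dictionary keys are made up for illustration. It only builds and URL-encodes the query text; how you submit it (API or otherwise) is left open.

```python
from urllib.parse import quote_plus


def google_queries(domain, keywords):
    """Return the search strings described above, keyed by a metric name.

    The key names are illustrative assumptions, not part of any Google API.
    """
    return {
        "backlinks": f"link:{domain}",
        "indexed_pages": f"site:{domain}",
        # Mentions of the address on other people's pages (domain visibility).
        "visibility": f'"{domain}" -site:{domain}',
        "allinanchor": f"allinanchor:{keywords}",
        "allintitle": f"allintitle:{keywords}",
    }


def as_query_param(query):
    """URL-encode a search string so it can be used as a query parameter value."""
    return quote_plus(query)


if __name__ == "__main__":
    for name, query in google_queries("www.profitpapers.com", "organic seo").items():
        print(name, "->", as_query_param(query))
```

Keeping the operator strings in one place means that a change in search syntax only needs to be fixed once.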

Yahoo

Yahoo has recently started redirecting all backlink requests to its Site Explorer app. You should also be sure ‘except from this domain’ is selected when viewing your results or sending API calls.

Yahoo returns a strong sampling of most of your backlinks (perhaps 80-90%). The results also seem to be improving in relevance, with more important pages appearing first. The data is accessible to developers via the Yahoo Site Explorer API, which uses a very convenient REST-based system and overall seems to have been widely adopted.

Gathering inbound link numbers with the API is fairly simple. Start here for the programming parameters available. For a quick example pulling backlinks you can try this rough code.

To see live Yahoo backlink data:

linkdomain:www.profitpapers.com - shows all pages linking to anywhere in the domain www.profitpapers.com.

link:www.profitpapers.com/papers/performance-tuning-mysql-for-load.php - shows all pages linking to the URL provided.

What’s more, you can also use search modifiers with Yahoo to see how many links you have from a particular kind of domain, like .edu, .gov etc. For example: linkdomain:www.profitpapers.com site:.edu

site:www.profitpapers.com reveals how many pages are indexed in Yahoo. The results are shown in Site Explorer.
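
To break Yahoo backlinks down by top-level domain, one query per TLD can be generated from the linkdomain: and site: modifiers shown above. This is a minimal Python sketch. The function name and the `exclude_self` flag are hypothetical, and using `-site:` to mimic the ‘except from this domain’ option is an assumption rather than documented Site Explorer behavior.

```python
def yahoo_link_queries(domain, tlds=(".edu", ".gov"), exclude_self=False):
    """Build one linkdomain: query for the whole web plus one per TLD.

    exclude_self appends -site:<domain>, which is an assumed stand-in for
    the 'except from this domain' option mentioned above.
    """
    base = f"linkdomain:{domain}"
    if exclude_self:
        base += f" -site:{domain}"
    queries = {"all": base}
    for tld in tlds:
        # Restrict the same backlink query to pages hosted under this TLD.
        queries[tld] = f"{base} site:{tld}"
    return queries


if __name__ == "__main__":
    for label, query in yahoo_link_queries("www.profitpapers.com").items():
        print(label, "->", query)
```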

MSN Live!

MSN counts nearly as many backlinks as Yahoo in most cases. Like Yahoo, MSN has recently shown interest in providing link data, and for that reason their results are really quite useful. The data can be retrieved via the MSN Search API together with nuSOAP.

linkdomain:www.profitpapers.com - shows pages linking to anywhere in the domain www.profitpapers.com.

link:www.profitpapers.com/papers/performance-tuning-mysql-for-load.php - shows pages linking to the URL provided.

MSN does provide allin commands but the results are so far off they seem almost useless to me. They are also a bit wonky to use so you will want to check out the official syntax.

site:www.profitpapers.com - reveals how many pages are indexed in MSN live.

Domain Age

Older domains carry more weight in the eyes of search engines, particularly Google. There are a few different ways to get this data, but I would just scrape the Wayback Machine like everyone else does. For example: http://web.archive.org/web/*/www.profitpapers.com
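
Here is a hedged Python sketch of the Wayback Machine approach. It assumes the listing page links to snapshots with paths of the form /web/<14-digit timestamp>/, which is how archived URLs are written. The function names are made up for illustration, and fetching the page is left out. Keep in mind that the first snapshot only tells you when the archive first saw the site, so it is a proxy for domain age, not a registration date.

```python
import re
from datetime import datetime

# Snapshot links carry a YYYYMMDDhhmmss timestamp in the path.
SNAPSHOT_RE = re.compile(r"/web/(\d{14})/")


def wayback_listing_url(domain):
    """URL of the snapshot listing for a domain, as in the example above."""
    return f"http://web.archive.org/web/*/{domain}"


def earliest_snapshot(html):
    """Return the oldest snapshot time found in the page, or None if there is none."""
    stamps = SNAPSHOT_RE.findall(html)
    if not stamps:
        return None
    # Fixed-width digit strings sort chronologically, so min() is the oldest.
    return datetime.strptime(min(stamps), "%Y%m%d%H%M%S")


def domain_age_years(first_seen, today):
    """Approximate age in years between the first snapshot and a given date."""
    return (today - first_seen).days / 365.25
```

Passing `today` in explicitly, rather than calling `datetime.now()` inside the function, keeps the calculation easy to test.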

Class C IPs

These are useful to track because links coming from the same IP range of servers hold little to no weight for SEO. People have A/B tested this in the past and it is still relevant today, I suppose. Collect the data through DNS queries. This is pretty straightforward to do in PHP.
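
The idea can be sketched as follows, in Python here rather than PHP: resolve each linking host via DNS, take the first three octets of its IPv4 address as the class C block, and group hosts that share a block. The function names are hypothetical, and the IP addresses in the usage example come from ranges reserved for documentation.

```python
import socket
from collections import defaultdict


def class_c(ip):
    """Return the first three octets of a dotted IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(octets[:3])


def group_by_class_c(host_ips):
    """Group hostnames by the class C block of their IP address."""
    groups = defaultdict(list)
    for host, ip in host_ips.items():
        groups[class_c(ip)].append(host)
    return dict(groups)


def resolve(hosts):
    """Look up each hostname via DNS; hosts that fail to resolve are skipped."""
    resolved = {}
    for host in hosts:
        try:
            resolved[host] = socket.gethostbyname(host)
        except OSError:
            continue
    return resolved


if __name__ == "__main__":
    sample = {
        "a.example": "192.0.2.10",
        "b.example": "192.0.2.77",
        "c.example": "198.51.100.5",
    }
    print(group_by_class_c(sample))
```

Any block that holds more than one host is a group of links likely coming from the same server range.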

Alexa

Alexa’s toolbar data has long been a talking point for SEOs. Is it useless? Not completely. The data is based off of the surfing habits of the small subset of users who have installed the Alexa toolbar. I won’t get into it too much here but the common metric SEOs concern themselves with is Site Ranking.

Another cool feature with Alexa is dynamic graphs of one or more websites with their rankings. The graph is highly configurable via simple query string commands. For an idea of how this works check out Iconico’s Alexa graph tool.

Alexa also provides thumbnails of websites, search, and other widgets. They really have a robust and well implemented set of APIs for developers, and they seem to always be releasing new features. The cost is $0.15 per 1000 queries, which is also quite affordable and scalable.

Technorati

Recently people have become more and more interested in Technorati numbers, me included. Technorati provides blog rankings based on inlinks to various blogs. The relevant SEO data from them includes: blog ranking, inbound links, and inbound blogs. For example: here are ProfitPapers' Technorati numbers.

Technorati makes it ultra simple to collect their data via a REST-based API that returns formatted XML for easy management.

You may also want to check out Ducksoup, which is an easy to use and straightforward API library for Technorati written in PHP.

DMOZ

Getting into DMOZ is notoriously difficult, and for all intents and purposes the directory is really just a discarded relic from Google’s oldskool reliance on it. Data from the ODP is downloadable in RDF format. In most cases I think a lot of people scrape their data. I wouldn’t expect a usable API or any future developments coming from DMOZ.

Del.icio.us

Many SEOs are watching social bookmarking sites like del.icio.us and digg for SEO metrics, as these types of sites are now directing huge swaths of organic traffic and link love. del.icio.us will display how many users have saved your website to their bookmarks, e.g. http://del.icio.us/url/6b59b61d5221b3699496d9f5c1a40d9e. They also have a newish API in development with a few nifty features.
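
The 32-character hex string in that address looks like an MD5 digest. Assuming del.icio.us keys these pages on the MD5 hash of the bookmarked URL (an assumption worth checking against a page you know), the address for any URL could be built with a small Python sketch like this. The function name is made up for illustration.

```python
import hashlib


def delicious_url_page(url):
    """Build the del.icio.us bookmark page address for a URL.

    Assumes the path segment is the MD5 hex digest of the exact URL string.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"http://del.icio.us/url/{digest}"


if __name__ == "__main__":
    print(delicious_url_page("http://www.profitpapers.com/"))
```

Since the hash covers the exact string, variations such as a trailing slash or a missing www. produce different addresses, so it may be worth checking each common form of your URL.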

Digg

Digg is my favorite social news type of site. There are others, but digg is definitely the 800lb gorilla of news promotion. They are always working on tools and developments, but sadly nothing in the way of story metrics. It might be cool to display the number of dugg stories for a particular URL. For that you could scrape away at something like this, I guess.

A Note on Screen Scraping

Okay so we all know that if you can see it on a website you can data mine it. One problem with screen scraping HTML is that if the website changes at all, your code may break. So there is some maintenance involved in many cases. Another issue is that most companies do not take kindly to your plundering of their hard work. Use a little diplomacy. You will want to use an API over some kludgy parse and scrape, so always try to keep that mindset. The 90’s are over.

However, if there is no other option, the easiest way to do some complex and maintainable screen scraping is with PHP using regular expressions. Ruby might be another fine option. For a primer on scraping data of all sorts see the wiki.
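
As a small illustration of the regex approach, here is a sketch in Python (PHP's preg functions would work along the same lines). The HTML snippet is a made-up example of a result-count line, not the real markup of any search engine, so the pattern would need adjusting to whatever page you actually scrape.

```python
import re

# Matches text such as: of about <b>1,230</b>
# The markup is hypothetical; adapt the pattern to the target page.
COUNT_RE = re.compile(r"of\s+about\s+<b>([\d,]+)</b>", re.IGNORECASE)


def parse_result_count(html):
    """Extract the total result count from a results page, or None if not found."""
    match = COUNT_RE.search(html)
    if match is None:
        return None
    # Drop thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    sample = "Results <b>1</b> - <b>10</b> of about <b>1,230</b> linking to the site"
    print(parse_result_count(sample))
```

Returning None when the pattern fails to match, instead of raising an error, makes it easy to detect that the site has changed its layout and the pattern needs maintenance.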

Well I think this covers most of them! This article was a bit of a brain fart and I may have missed some useful things to gather, so feel free to chime in ;)

You may also want to sign up to be invited to use the current beta build of our own SEO tools, which employ many of the metrics outlined above.

Posted Jan 29, 2007 at 07:47 PM

Comments

Okay that was a decent list! Dugg. I can't really think of anything you missed.

Niiiiice list =) I learned a great deal reading through that.

Dugg the data digging techniques! Informative read.

Wow. Very informative. Digg it....C

ha! never knew it existed
thanks.

Useful list to gather information of different resources, thanks.

Thanks for putting together this great information and passing it on to all of us.

Wow this is a stellar list Miles. This stuff changes a lot and I was actually scraping with PHP up until now. I am going to take a hard look at the APIs and see what the differences are going legit...

One comment.... I use Google Blog Search to check on buzz and new backlinks to my site. I wish there was an API available but too bad there is not.... I would not expect Google to release one because with the Blog Search tool you can sort of get a peek at Google's own list of relevant backlinks to your site.... To that end I am scraping this data in my own private tools ;)

Hope that helps.

a good informative list

I wish there was an API available.

This is an OLD SEO article but still relevant for today's market.

Copyright © 2004-2007 ProfitPapers.com. All rights reserved. All other logos and trademarks herein are © their respective owners.
You may not use any of this content without written permission from ProfitPapers.com.
For contact inquiries please see the contact us page.