Which are the best blog search engines?




Who Really Are the Elite Bloggers? And What are the Best Ways to Measure a Blog’s Reach?

 

Our NYU college intern Chris Duncan and I (much more Chris than I) have been researching the efficacy of the various blog ranking engines:

 

As part of the Virtual Handshake marketing plan, I was charged with the task of identifying the top influencers in the blogosphere. We are in the process of reaching out to them, mentioning our book, and encouraging them to consider writing about it. The easiest way to identify these thought leaders is to review the ranking lists created by the blog directories or search engines, including PubSub, Technorati, Bloglines, Blogrolling, Feedster, and IceRocket. Influencers use these lists as leverage in ad rate negotiation, to establish their level of influence, and simply as a bragging point. More generally, as in most areas of life, success is sticky. The more visible sites attract more visitors, and therefore become even higher ranked.

 

These sites profess to show the top blogs. But how do they measure popularity, and how accurate are they?

 

PUBSUB

Top 5 sites on each PubSub list

By Inlinking Sites (for one day) | By Outlinking Sites | By Link Rank (based on number of incoming links, plus other factors such as recency)
photos1.blogger.com | well978area.seesaa.net | bbc.co.uk
purl.org | wantmedi777.seesaa.net | nytimes.com
livejournal.com | flickr.com | google.com
blogthings.com | thebostonblog.com | news.bbc.co.uk
static.flickr.com | if669nobe.seesaa.net | washingtonpost.com

Site ranks and link numbers as of September 18, 2005

PubSub has two different types of top lists, "in-links" and "out-links". Both have limited relevance. The in-link list counts how many sites link to a site (each linking site counts only once), and the out-link list counts how many sites a blog links to. The problem is that neither list contains many actual quality blogs. The in-link list is cluttered with news sites, blog resource sites, and random spam. More than 20 of the sites that make the list are adult web sites, and they’re not even good adult sites. The majority of the sites on the out-link list are spam, since all you need to do to appear there is post large numbers of outgoing links on your site. No site appears on both lists. Add all of these factors together, and it is extremely difficult to find any influencers on these lists.
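To make the two counting rules concrete, here is a minimal Python sketch of "distinct sites linking in" and "distinct sites linked out to." It is my own illustration, not PubSub's code: the sample URLs are hypothetical, treating the host name as the "site" is an assumption, and so is the choice to ignore a site linking to itself.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return the lowercase host name of a URL, used here as the 'site' (an assumption)."""
    return urlparse(url).netloc.lower()


def link_counts(links):
    """Count distinct in-linking and out-linked sites.

    `links` is an iterable of (source_url, target_url) pairs.
    """
    inlinking = defaultdict(set)   # target site -> set of sites linking to it
    outlinking = defaultdict(set)  # source site -> set of sites it links to
    for source_url, target_url in links:
        source, target = site_of(source_url), site_of(target_url)
        if source == target:
            continue  # assumption: a site linking to itself is not counted
        inlinking[target].add(source)
        outlinking[source].add(target)
    in_counts = {site: len(sources) for site, sources in inlinking.items()}
    out_counts = {site: len(targets) for site, targets in outlinking.items()}
    return in_counts, out_counts


# Hypothetical sample data, not taken from PubSub.
sample_links = [
    ("http://blog-a.example/post1", "http://news.example/story"),
    ("http://blog-a.example/post2", "http://news.example/other"),  # same site again
    ("http://blog-b.example/post1", "http://news.example/story"),
    ("http://spam.example/page", "http://blog-a.example/"),
    ("http://spam.example/page", "http://blog-b.example/"),
    ("http://spam.example/page", "http://news.example/"),
]

in_counts, out_counts = link_counts(sample_links)
print(in_counts)
print(out_counts)
```

In this sample, news.example gets three in-linking sites even though blog-a.example links to it twice, and spam.example tops the out-link count simply by posting three outgoing links. That is the weakness of the out-link list in miniature.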

 

PubSub does have an interesting feature called “Site Stats,” which tracks the in-links, out-links, and entries (new and modified entries discovered in a site’s feeds). It also creates graphs of these statistics. PubSub claims that “the intent of this system is not to measure the strength of any particular domain, but rather the relative likelihood that you’d find and follow a link to that domain. As such, the links are what’s really important, not the pages themselves.” I interpret this to mean that they are calculating the odds of reaching a certain site instead of measuring in-links, visitors, or hits. Regardless of their claims, the lists are of limited accuracy, as shown by the .info spam domains that turn up on both of them. No list is ever going to be perfect, but PubSub doesn’t appear to incorporate any criteria more sophisticated than the raw number of links. A better system would incorporate other factors, such as how credible the sources of the links are, and weight them appropriately.

PubSub’s LinkRank system is much more effective than the other two lists they maintain. Initially, I was skeptical of their new system because it had a spam site listed number one the day after they implemented a new ranking formula. Since then, the rankings have fallen in line with my expectations and it seems to have evolved into a useful list.

 

TECHNORATI

 

Technorati’s top list has many more relevant entries. Its primary criterion is the same as PubSub’s: the number of sites that link to a particular site. The sites in the top 10 of PubSub’s in-link list approach 2,000 links; on Technorati, they are all above 8,000 links.

 

# of Sites Linking to Top 5 Blogs (Technorati)

Blog | # of sites linking
www.boingboing.net | 16,913
dailykos.com | 11,245
fark.com | 10,837
www.engadget.com | 10,803
www.gizmodo.com | 10,719

Site ranks and link numbers as of September 18, 2005

 

Alexa, a site commonly used to evaluate a site’s traffic, ranks the top five sites above differently. BoingBoing is not first among them in Alexa traffic rank; Fark is now number one. By checking more relevant sites, Technorati does seem to capture the “buzz” better than Google or Alexa. Technorati is clearly better than PubSub in terms of analyzing in-link lists, although, as Jason Calacanis notes, they still have some less-than-relevant entries such as Yahoo Messenger on the list. I’m not sure how they’re reducing the number of spam entries, but they’re probably doing it by weighting the source pages by the number of links that go into them. That’s essentially what Google does for the Web as a whole. Jason Kottke says that Technorati also has trouble counting links. A post of his received 159 trackbacks (!), but Technorati listed only 93 sites as linking to it.
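For readers who want to see what "weighting the source pages by the links that go into them" looks like, here is a minimal Python sketch of the textbook, simplified PageRank iteration. To be clear about the assumptions: I am only guessing that Technorati does something along these lines, this is not Technorati's or Google's actual formula, and the site names, the damping value of 0.85, and the iteration count are all illustrative choices.

```python
def rank_sites(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict {site: [sites it links to]}."""
    sites = set(links)
    for targets in links.values():
        sites.update(targets)
    n = len(sites)
    rank = {site: 1.0 / n for site in sites}
    for _ in range(iterations):
        # Every site starts each round with a small baseline share.
        new_rank = {site: (1.0 - damping) / n for site in sites}
        for site in sites:
            targets = links.get(site, [])
            if targets:
                # A site passes its weight on, split evenly among its links.
                share = damping * rank[site] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A site with no outgoing links spreads its weight evenly.
                share = damping * rank[site] / n
                for other in sites:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: good.example and spammy.example each have
# exactly two in-linking sites, but only good.example's sources are
# themselves linked to by anyone.
links = {
    "hub.example": ["blog-a.example", "blog-b.example"],
    "blog-a.example": ["good.example", "hub.example"],
    "blog-b.example": ["good.example", "hub.example"],
    "good.example": ["hub.example"],
    "spam1.example": ["spammy.example"],
    "spam2.example": ["spammy.example"],
}

rank = rank_sites(links)
for site, score in sorted(rank.items(), key=lambda item: item[1], reverse=True):
    print(f"{site:18s} {score:.3f}")
```

A raw in-link count would put good.example and spammy.example in a tie at two links each. With the weighting, good.example comes out clearly ahead, because the two pages pointing at spammy.example have no incoming links of their own.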

 

BLOGLINES

Top 5 Sites by Subscription

www.bloglines.com

www.wired.com

www.dilbert.com

www.boingboing.net

www.engadget.com

 

Bloglines takes a different approach by looking at feed subscribers instead of links to the web site. Their list looks like a good mix of PubSub and Technorati. It has sites such as BoingBoing, Gizmodo, and Engadget, which also appear on Technorati’s list, as well as the New York Times and BBC, which show up on PubSub. Bloglines also provides a simple description beneath each link, which helps you to scan a list of bloggers very rapidly.

 

The Bloglines system has a clear advantage over both of the other lists in that it is an “opt-in” list. The advantage of using feeds is that it more accurately measures the interests of the average blog reader (who doesn’t have an active blog), as opposed to the opinions of fellow influencers (who run blogs that can carry link weight). Bloglines is comparable to using TV’s Nielsen Ratings to evaluate a show’s popularity, as opposed to asking the commentators who speak on PBS what TV shows they most like. Of course, the downside is that feeds are very susceptible to manipulation. I’m assuming that Bloglines has already run into the problem of fake subscribers used to pump up perceived traffic for certain blogs. As of now, I would rank Bloglines as more useful for our purposes than PubSub and Technorati.

 

BLOGROLLING

Top 5 by Site Linking

instapundit.com

www.boingboing.net

dooce.com

slashdot.org

dailykos.com

 

 

Another approach can be seen with the Blogrolling Hot 500. Blogrolling is a service that manages the link lists on blogs to make trading traffic easier. The fact that they count permanent links instead of looking at individual blog posts makes the list unique. When a blogger blogrolls a site, he has indicated his belief that that site is worth monitoring on an ongoing basis. Blogrolling also provides a reverse look-up feature that shows which blogs are linking to a specific site. Some of the same sites that top the other lists appear in this top five, including Instapundit, Boing Boing, Dooce, and Daily Kos. The only new site to appear in this top five is Slashdot. It really is quite a battle to move up this list; #500 is linked to by 125 users, while Mark Cuban and The Guardian are tied at 200 links (which puts them at #228). This is a testament to the impressive reach that the top ten sites have. Michelle Malkin comes up on just over a thousand Blogrolling lists, which puts her in the tenth spot. I would say that the power of this list lies in the users of Blogrolling rather than in its ranking system. The ranking is simplistic and imperfect: some sites are listed twice under different domains. For example, www.wonkette.com is 20th and wonkette.com is 21st; their link totals combined would put Ana Marie Cox easily in the top five.
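The duplicate-domain problem is easy to illustrate. The Python sketch below merges entries that differ only by a leading "www." before ranking. It is only an illustration under stated assumptions: the site names and link totals are hypothetical (I don't have the real figures for the duplicated entries), and stripping "www." is a simplification, since the two host names are not guaranteed to be the same site.

```python
from collections import Counter


def canonical_host(host: str) -> str:
    """Lowercase a host name and drop a leading 'www.' (a simplifying assumption)."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def merge_duplicates(link_totals):
    """Sum link totals for entries that share a canonical host name.

    Returns (host, total) pairs sorted from most to fewest links.
    """
    merged = Counter()
    for host, total in link_totals.items():
        merged[canonical_host(host)] += total
    return merged.most_common()


# Hypothetical totals for illustration only, not Blogrolling's data.
hot_list = {
    "blog-two.example": 1000,
    "www.blog-three.example": 700,
    "www.blog-one.example": 600,
    "blog-one.example": 590,
}

for host, total in merge_duplicates(hot_list):
    print(host, total)
```

In this made-up list, the two blog-one.example entries sit at the bottom when counted separately, but together they total 1,190 links and move to first place.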

 

FEEDSTER

Top 5 on Feedster 100 | Top 5 on Feedster 500
www.wired.com | www.engadget.com
slashdot.org | www.deviantart.com
google.blogspace.com | www.boingboing.net
www.boingboing.net | www.albinoblacksheep.com
www.dilbert.com | dailykos.com

 

Feedster provides both an RSS feed aggregator and a search tool. They maintain two different top lists which use different analytic techniques: the Feedster Top 500 and the Feedster Top 100. The Feedster Top 500 is much like the in-link lists that Technorati and PubSub have. I’m assuming (they don’t say) that the in-links counted by the Feedster Top 500 are taken from individual blog posts. This puts the top five sites at well over 20,000 links each; it takes 809 links to make the top 500 list. While the Top 500 is more of the same, the Top 100 is more useful. The Feedster Top 100 is like Bloglines; it looks at feed subscriptions to form its list. Boing Boing makes this top five, yet again. Otherwise, we see different sites, but ones that we would expect to make the top five: Wired, Slashdot, and Dilbert. I’m not sure how the fairly special-interest Google Weblog ended up on this list; perhaps Aaron Swartz has a lot of friends who use Feedster.

 

ICEROCKET

 

The last site I looked at was IceRocket, backed by Mark Cuban. They don’t have any ranking lists, but it is a useful resource. At IceRocket, you can search for blog posts by topic or by URL (to find out who is linking to it). You can also compare trends for certain topics to see how much they are being posted about over a period of about two months.

Link tracking is the other useful tool provided by IceRocket. They provide a service that lets bloggers track the links to their own posts with a short line of code. This can also be done through the search feature by entering a URL. The aforementioned post by Jason Kottke comes up with 59 sites linking to it. You can really see whose posts are creating buzz around the internet.

 

GOOGLE BLOG SEARCH

Google has just released their version of the blog search engine. Its advanced functions and speed are consistent with other Google products. It can search by blog title, author, and date (but not any posts before March of this year). Unfortunately, they do not provide a general ranking list of the top X number of blogs. However, we can use the "link:" feature to see how many sites are linking to some of the more popular sites we’ve seen. I’ll also compare this to their Google "link:" numbers so we can see how they differ.

 

Site | Google Blog Search (# blogs linking to this blog) | Google (# sites linking to this site)
www.boingboing.net | 18,042 | 53,000
www.fark.com | 1,462 | 52,100
slashdot.org | 10,297 | 185,000
dailykos.com | 1,794 | 7,120
www.engadget.com | 1,170 | 62,800

Obviously, the numbers vary greatly. This can be attributed to the fact that Google Blog Search draws its results from a much smaller pool of sites, and possibly in part to the fact that its results cover only the past six months. In any case, Google Blog Search rankings and in-links will probably become important benchmarks.
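One quick way to quantify how much the numbers vary is to express each Google Blog Search count as a percentage of the corresponding Google count. The short Python sketch below does that with the figures from the table above. Treat the ratio as a rough indicator only: the two indexes cover different pools of sites and different time windows, so the counts are not strictly comparable.

```python
# Link counts copied from the table above:
# (Google Blog Search count, Google count)
counts = {
    "www.boingboing.net": (18042, 53000),
    "www.fark.com": (1462, 52100),
    "slashdot.org": (10297, 185000),
    "dailykos.com": (1794, 7120),
    "www.engadget.com": (1170, 62800),
}


def blog_share(blog_links: int, web_links: int) -> float:
    """Blog Search links as a percentage of Google links."""
    return 100.0 * blog_links / web_links


shares = {site: blog_share(*pair) for site, pair in counts.items()}

# Print the sites from the highest blog share to the lowest.
for site, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{site:20s} {share:5.1f}%")
```

The share ranges from roughly 2% for Engadget to about 34% for Boing Boing, so the gap between the two counts is far from uniform across sites.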

 

CONCLUSIONS

 

There is obviously no perfect ranking system on the internet, either for blogs or for sites in general, and there probably never will be. Our recommendation is that PubSub try to focus their results on real blogs; too many of their results are commercial sites or blog utilities. Technorati needs to become more accurate in their counts of links. Bloglines and Feedster could be substantially improved if they took the next step and separated the blogs into categories. Blogrolling is always going to be an imperfect system, but they should really remove from their ranking list any sites that haven’t been updated in months.

The ideal system would probably incorporate elements from all of the services that I’ve discussed (link tracking, in-linking, feed subscriptions, tags) and then find the best way to weight them. David Teten observed to me that it’s a (much-disputed) truism of biology that “ontogeny recapitulates phylogeny,” but it’s happening here. Blog search engines and ranking tools are recapitulating the evolution of Web search engines such as Google.

Two more possible elements that might allow for a more comprehensive grade (and probably more controversy) would be visitor and expert rating systems. The latter would only come into play if a site has enough strength in the other categories to make it to the top list. With these tools, there would be a human element of judgment added to these rankings.
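As a rough illustration of what "incorporate several elements and weight them" could mean in practice, here is a minimal Python sketch of a composite score. Everything in it is an assumption made for the example: the blog names, the raw figures, the choice of three signals, the min-max scaling, and the weights themselves. Finding good weights is exactly the hard part that none of the services has solved.

```python
def normalize(values):
    """Min-max scale a {site: raw value} dict to the 0..1 range."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {
        site: (value - low) / span if span else 0.0
        for site, value in values.items()
    }


def composite_scores(signals, weights):
    """Weighted sum of normalized signals for each site."""
    scores = {}
    for name, values in signals.items():
        for site, scaled in normalize(values).items():
            scores[site] = scores.get(site, 0.0) + weights[name] * scaled
    return scores


# Hypothetical figures for three made-up blogs.
signals = {
    "inlinks": {"blog-a.example": 12000, "blog-b.example": 8000, "blog-c.example": 2000},
    "subscribers": {"blog-a.example": 3000, "blog-b.example": 9000, "blog-c.example": 1000},
    "blogroll_links": {"blog-a.example": 400, "blog-b.example": 900, "blog-c.example": 100},
}

# Illustrative weights only; they sum to 1 so the score stays in 0..1.
weights = {"inlinks": 0.4, "subscribers": 0.4, "blogroll_links": 0.2}

scores = composite_scores(signals, weights)
for site, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{site:16s} {score:.3f}")
```

With these made-up numbers, blog-a.example leads on in-links alone, but blog-b.example comes out on top overall because it is stronger on subscribers and blogroll links. A visitor or expert rating could be added as one more entry in `signals` with its own weight.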

Blogs need a tool tailored to their type of content. It is clear that traditional search engines have trouble handling blogs because they are slow to react to new content and topics and do not have a reasonably comprehensive database of all the blogs that exist. These new search tools take the nuances of the blogosphere into account. They strive to correctly identify blogs and posts by their relevance, timeliness, and popularity. Eventually, more criteria will be added to their equations. As more and more websites incorporate blog-type functionality (frequent updating) and technology (RSS), figuring out how to search blogs will be more and more important.