Gathering Content Through Tags in Technorati

Technorati is a search engine focused primarily on weblogs, though it also covers “tagged social media” (specifically, photos in Flickr and videos in YouTube). Technorati is an excellent case study of how a web site crawls the Web for tags and then uses those tags to organize digital content. (Think of Technorati as a big tag-based mashup.) Let’s now look in detail at how Technorati presents tags to users and how it finds the tags in the first place.

Searching Technorati with Tags

The primary emphasis in the Technorati user interface is on searching by tag. In fact, the default search is a tag search. For instance, a search for the term mashup brings you to this page:

http://technorati.com/tag/mashup

Generally, items for a given tag are at the following URL:

http://technorati.com/tag/{tag}

where {tag} is the URL-encoded version of the UTF-8 encoding of the tag. The items are broken down as follows:

  • Blog posts (http://technorati.com/posts/tag/{tag})

  • Videos (http://technorati.com/videos/tag/{tag})

  • Photos (http://technorati.com/photos/tag/{tag})

  • Weblogs (http://technorati.com/blogs/tag/{tag})
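
The URL patterns above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Technorati library: the names tag_url, BASE, and SECTIONS are hypothetical, and writing a space as + is an assumption based on the san+francisco example later in this section.

```python
from urllib.parse import quote_plus

BASE = "http://technorati.com"  # base URL taken from the patterns above
SECTIONS = ("posts", "videos", "photos", "blogs")


def tag_url(tag, section=None):
    """Build the Technorati URL for a tag, optionally limited to one section.

    tag_url is a hypothetical helper written for this example.
    """
    # quote_plus encodes the string as UTF-8, percent-encodes the bytes,
    # and writes a space as "+" (assumed to match Technorati's URLs).
    encoded = quote_plus(tag)
    if section is None:
        return f"{BASE}/tag/{encoded}"
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    return f"{BASE}/{section}/tag/{encoded}"


if __name__ == "__main__":
    print(tag_url("mashup"))
    print(tag_url("café", "photos"))
```

Calling tag_url("mashup") reproduces the URL shown earlier for the mashup search, and a non-ASCII tag such as café is encoded byte by byte from its UTF-8 form.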

Note that you can string tags together with OR to search for multiple tags.

A quick way to get a feel for Technorati is to look at the “most popular” search:

http://technorati.com/pop/

How Technorati Finds Tags on the Web

Technorati derives its tags from a variety of sources, as documented at http://technorati.com/help/tags.html:

  • Categories embedded in Atom and RSS 2.0 feeds. (See Chapter 4 for more on feeds.)

  • Tags in links using the rel-tag microformat, such as <a href="http://technorati.com/tag/{tagname}" rel="tag">tagname</a>. (See Chapter 18 for a complete description.)

  • Tags from public photos in Flickr.

  • Tags from public videos in YouTube.
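
To make the rel-tag source more concrete, here is a minimal sketch of how a crawler might pull rel-tag tags out of a page using only the Python standard library. It is not Technorati's actual crawler. The names RelTagParser and extract_rel_tags are hypothetical, and the sketch assumes the rel-tag convention that the tag is the last path segment of the link's href rather than the link text.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="tag nofollow".
        rel_values = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" not in rel_values or not href:
            return
        # Assumption: the tag is the last path segment of the href.
        path = urlparse(href).path.rstrip("/")
        last_segment = path.rsplit("/", 1)[-1]
        if last_segment:
            # Undo URL encoding, treating "+" and %20 as spaces.
            self.tags.append(unquote_plus(last_segment))


def extract_rel_tags(html):
    """Return the rel-tag tags found in an HTML string, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    sample = '<a href="http://technorati.com/tag/mashup" rel="tag">mashup</a>'
    print(extract_rel_tags(sample))
```

Links without rel="tag" are ignored, so ordinary navigation links on a page do not contribute tags.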

Word Inflections and Syntactic Constraints in Technorati Tags

As with Flickr and del.icio.us, singular and plural nouns in tags are not conflated. For example, the following:

http://technorati.com/tag/mouse

and the following:

http://technorati.com/tag/mice

return different results. Technorati is, however, able to recognize that mouse and mice are related tags, as are peripherals and animals. Unlike Flickr, but like del.icio.us, punctuation in Technorati tags is significant in tag-based searches. For example, the following:

http://technorati.com/tag/san+francisco

returns different results from the following:

http://technorati.com/tag/san-francisco

Tag searches are not case sensitive in Technorati, though other applications that use the rel-tag microformat may be case sensitive. Through rel-tag, you should be able to pass in the full range of non-ASCII words as tags. (See the “Representation of Latin-8 and Unicode Characters” sidebar on representing non-ASCII characters in tags to learn more.)
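
If your own mashup needs to compare tags the way this section describes (ignoring case, keeping punctuation significant, and not conflating singular and plural forms), a comparison key along the following lines may help. This is a sketch that mimics the described behavior, not Technorati's implementation. The name tag_key is hypothetical, and the Unicode NFC normalization step is an added assumption so that visually identical non-ASCII tags compare equal.

```python
import unicodedata


def tag_key(tag):
    """Return a key for comparing tags (hypothetical helper).

    Case is ignored; punctuation, spaces, and word inflections are kept.
    """
    # NFC normalization is an assumption of this sketch, not documented
    # Technorati behavior.
    return unicodedata.normalize("NFC", tag).casefold()


if __name__ == "__main__":
    print(tag_key("Mashup") == tag_key("mashup"))
    print(tag_key("san francisco") == tag_key("san-francisco"))
```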

The next time you want to build a mashup of digital content based on tags, you can use Technorati as a model for making tags from different web sites interoperate. Moreover, you can leverage its work by linking directly to Technorati (through its URL language) or by using its API (http://technorati.com/developers/api/).