Technorati is a search engine, focused primarily on searching weblogs but also “tagged social media” (specifically, photos in Flickr and videos in YouTube). Technorati is an excellent case study of how a web site crawls for tags on the Web and then uses those tags to organize digital content. (Think of Technorati as a big tag-based mashup.) Let’s now look in detail at how Technorati presents tags to users and how it finds the tags in the first place.
The primary emphasis in the Technorati user interface is on searching by tag. In fact, the default search is a tag search. For instance, a search for the term mashup brings you to this page:
http://technorati.com/tag/mashup
Generally, items for a given tag are at the following URL:
http://technorati.com/tag/{tag}
where {tag} is the URL-encoded version of the UTF-8 encoding of the tag. The items are broken down as follows:
Blog posts (http://technorati.com/posts/tag/{tag})
Videos (http://technorati.com/videos/tag/{tag})
Photos (http://technorati.com/photos/tag/{tag})
Weblogs (http://technorati.com/blogs/tag/{tag})
Note that you can string tags together with OR to search for multiple tags.
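The URL patterns above can be generated programmatically. The following Python function is a minimal sketch, not Technorati's own code: the names tag_url and SECTIONS are hypothetical, and it assumes that a space in a tag is written as + (as in the san+francisco URL shown in this section). It handles a single tag only and does not attempt the OR syntax.

```python
from urllib.parse import quote_plus

BASE = "http://technorati.com"

# Hypothetical lookup table mapping a kind of item to its URL prefix,
# following the patterns listed in the text.
SECTIONS = {
    "all": "",
    "posts": "/posts",
    "videos": "/videos",
    "photos": "/photos",
    "blogs": "/blogs",
}


def tag_url(tag, section="all"):
    """Build a Technorati tag URL for one tag.

    The tag is encoded as UTF-8 and then URL-encoded; quote_plus
    writes a space as '+' (an assumption about the expected form).
    """
    encoded = quote_plus(tag, encoding="utf-8")
    return f"{BASE}{SECTIONS[section]}/tag/{encoded}"


if __name__ == "__main__":
    print(tag_url("mashup"))
    print(tag_url("san francisco", "photos"))
    print(tag_url("café", "posts"))
```

Keeping the encoding step in one place means that non-ASCII tags and tags with spaces are treated the same way for every kind of item.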
A quick way to get a feel for Technorati is to look at the “most popular” search:
Technorati derives its tags from a variety of sources, as documented at http://technorati.com/help/tags.html:
Categories embedded in Atom and RSS 2.0 feeds. (See Chapter 4 for more on feeds.)
Tags in links using the rel-tag microformat, such as <a href="http://technorati.com/tag/{tagname}" rel="tag">tagname</a>. (See Chapter 18 for a complete description.)
Tags from public photos in Flickr.
Tags from public videos in YouTube.
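To make the first two sources concrete, here is a rough Python sketch of how a crawler might pull tags out of a page and a feed. It is not Technorati's implementation. The names RelTagParser, rel_tags, and feed_categories are hypothetical, and the sketch assumes that a rel-tag link carries its tag in the last path segment of the href (with + or %20 standing for a space), that RSS 2.0 categories are the text of category elements, and that Atom categories are the term attribute of category elements in the Atom namespace.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote_plus, urlsplit

ATOM_NS = "{http://www.w3.org/2005/Atom}"


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" in rel_values and href:
            # Assumption: the tag is the last segment of the URL path.
            path = urlsplit(href).path.rstrip("/")
            last_segment = path.rsplit("/", 1)[-1]
            if last_segment:
                self.tags.append(unquote_plus(last_segment))


def rel_tags(html_text):
    """Return the rel-tag tags found in an HTML string, in order."""
    parser = RelTagParser()
    parser.feed(html_text)
    return parser.tags


def feed_categories(xml_text):
    """Return categories from an RSS 2.0 or Atom feed string."""
    categories = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag == "category" and element.text:
            categories.append(element.text.strip())  # RSS 2.0 style
        elif element.tag == ATOM_NS + "category" and element.get("term"):
            categories.append(element.get("term"))  # Atom style
    return categories


if __name__ == "__main__":
    page = '<a href="http://technorati.com/tag/mashup" rel="tag">mashup</a>'
    print(rel_tags(page))
```

Note that the sketch reads the tag from the href rather than from the link text, so a link whose visible text differs from its URL is still recorded under the tag named in the URL.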
As with Flickr and del.icio.us, singular and plural nouns in tags are not conflated. For example, the following:
http://technorati.com/tag/mouse
and the following:
http://technorati.com/tag/mice
return different results. Technorati is, however, able to recognize that mouse and mice are related tags, as are peripherals and animals. Unlike Flickr, but like del.icio.us, punctuation in Technorati tags is significant in tag-based searches. For example, the following:
http://technorati.com/tag/san+francisco
returns different results from the following:
http://technorati.com/tag/san-francisco
Tag searches are not case sensitive in Technorati, though other applications that use the rel-tag microformat may be case sensitive. Through rel-tag, you should be able to pass in the full range of non-ASCII words as tags. (See the “Representation of Latin-8 and Unicode Characters” sidebar on representing non-ASCII characters in tags to learn more.)
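If you are combining tags from several sites, it helps to write down these matching rules in code. The sketch below is a hypothetical model of the behavior described in this section, not Technorati's actual algorithm: the names tag_key and same_tag are invented for illustration, and lowercasing is assumed to be an adequate stand-in for case-insensitive matching.

```python
def tag_key(tag):
    """Return a comparison key for a tag.

    Case is folded, but punctuation is kept and no stemming is done,
    so singular and plural forms stay distinct.
    """
    return tag.strip().lower()


def same_tag(first, second):
    """Return True if two tags would match under the rules above."""
    return tag_key(first) == tag_key(second)


if __name__ == "__main__":
    print(same_tag("Mashup", "mashup"))                # case is ignored
    print(same_tag("san francisco", "san-francisco"))  # punctuation matters
    print(same_tag("mouse", "mice"))                   # no singular/plural conflation
```

A site with different rules, such as one that ignores punctuation, would need its own key function, which is why it is worth keeping the normalization step separate from the comparison.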
The next time you want to make a mashup of digital content based on tags, you can model your approach on how Technorati makes tags from different web sites interoperate. Moreover, you can leverage its work by linking directly to Technorati (through its URL language) or by using its API (http://technorati.com/developers/api/).