A major challenge of dealing with digital content (our own and others’) is organizing it. We want to be able to find the piece of content we want, and we want to be able to see its relationship to the whole and to other digital content. We might want to be able to reuse this content. Most important, we want other people to be able to understand the organization of our digital content so that they can find and reuse it.
Tags are one of the most popular mechanisms used in contemporary web sites for letting users organize digital content. A tag is a label, typically a word or short phrase, that a user can add to a piece of digital content, such as a photo, a URL, a video, or an e-mail (don’t confuse these tags with the tags used to mark up pages, especially an HTML page’s metatags). You can then search for digital content with those tags. As you saw in Chapter 2, when tags are embedded in URLs, you can link and embed content related by tags through those URLs.
The term folksonomy was coined to contrast tags with taxonomies, which are formal schemes typically created by communities with strict practices of classifying items. In other words, a folksonomy uses an informal collection of tags provided by the community to build up a collaborative description of an item. There are few restrictions on the tags you can come up with to associate with your content. In fact, there are no preset categories or controlled vocabularies from which you must choose. Still, tags have proliferated; users have taken to them en masse, generating collections (or clouds) of tags that help order their own content as well as content throughout the Web. You can use these tags to relate content in your mashups, provided you keep in mind that tags are often idiosyncratic, ambiguous, and irregular.
For now at least, tags have not led to the anarchy predicted by some taxonomists, and there is more order to how people tag than you might think, an order that comes from personal and social conventions and from the syntax of tags. On the other hand, the proliferation of tagging has certainly not obviated the need for formal classification schemes. There are rich opportunities to bring together user-generated, bottom-up folksonomic tags and controlled vocabularies and taxa.
This chapter will show you how to connect content by mashing things up, with tagging as the glue. Tags allow the aggregation of resources within a system (say, pictures in Flickr, both your own and others’) and across web sites (Technorati).
This chapter covers the following:
It illustrates how tags are used in Flickr, del.icio.us, and Technorati.
It shows how people are using tags to create interesting apps.
It discusses how people are hacking the tagging system to put more information into Flickr and other web sites, specifically geotagging, and now, more generally, machine tags.
It covers some issues around the interoperability of tags across systems, specifically through a study of Technorati.
It briefly shows how tagging relates to formal classification systems, using books as an example.
According to the Flickr FAQ,[48]
Here are some practical skills related to tags in Flickr you will learn in the following sections:
You’ll see how tags are used in the Flickr community—by individuals and by subgroups—right across Flickr to bind photos together. (It’s useful to study tags before creating your own.)
You’ll see how to tag a picture and what issues you run into when you sit down to tag your own pictures or those of others.
You’ll see how to deal with the syntax of tags in Flickr, how to use multiword tags, and how multiword tags get boiled down to canonical tags.
In Chapter 2, I presented an overview of how tags are used in Flickr, specifically how they manifest in the web site’s URL language. Here, you’ll look deeper at Flickr tags, specifically at the social context of tags in Flickr, the syntax and semantics of tags in Flickr, hacks of Flickr tags, and some remixes and mashups that build upon the Flickr tags.
Before I jump to those topics, let me present parts of the URL language concerning tags. For instance, you can see a list of popular tags in Flickr here:
http://www.flickr.com/photos/tags/
The URL for the most recent photos in Flickr associated with a tag is as follows:
http://www.flickr.com/photos/tags/{tag}/
For example:
http://www.flickr.com/photos/tags/flower/
Instead of sorting photos by the date uploaded, you can sort them by descending “interestingness” (a quantitative measure calculated by Flickr of how “interesting” a photo is):
http://www.flickr.com/photos/tags/{tag}/interesting/
Finally, for some tags, Flickr identifies distinct clusters of photos, which you can access here:
http://www.flickr.com/photos/tags/{tag}/clusters/
For example:
http://www.flickr.com/photos/tags/flower/clusters/
You can display the popular tags used by a specific user here:
http://www.flickr.com/photos/{user-id}/tags/
You can list all the user’s tags here:
http://www.flickr.com/photos/{user-id}/alltags/
You can show all photos with a given tag for a specific user here:
http://www.flickr.com/photos/{user-id}/tags/{tag}/
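The URL patterns above are regular enough to generate programmatically. The following is a minimal sketch in Python, not anything Flickr provides; the function names tag_url and all_tags_url are hypothetical, and the sketch assumes that the interesting and clusters views exist only for site-wide tag pages, since those are the only forms listed above.

```python
from urllib.parse import quote

BASE = "http://www.flickr.com/photos"


def tag_url(tag=None, user_id=None, view=None):
    """Build a Flickr tag URL from the patterns listed in the text.

    tag     -- a tag such as "flower"; None gives the popular-tags page
    user_id -- restrict to one user's tags; None means all of Flickr
    view    -- None, "interesting", or "clusters" (site-wide tag pages only,
               an assumption based on the URL forms shown in the text)
    """
    if view is not None and (tag is None or user_id is not None):
        raise ValueError("view is only supported for site-wide tag pages")
    parts = [BASE]
    if user_id is not None:
        parts.append(quote(user_id, safe="@"))  # Flickr user IDs may contain "@"
    parts.append("tags")
    if tag is not None:
        parts.append(quote(tag, safe=""))
    if view is not None:
        parts.append(view)
    return "/".join(parts) + "/"


def all_tags_url(user_id):
    """URL listing every tag a given user has applied."""
    return "/".join([BASE, quote(user_id, safe="@"), "alltags"]) + "/"
```

For example, tag_url("flower", view="clusters") produces the clusters URL shown earlier, and tag_url("flower", user_id="someuser") produces the per-user form (someuser is a placeholder, not a real account).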
So, how do people actually use tags in Flickr? Look around to get a feel for how people have been tagging their photos. It is also helpful to draw upon the observations of seasoned Flickr users with respect to general trends for how tags are used—or should be used.[50]
The issue of how tags are used is complicated. To get a feel for the issues involved, let’s look at how people tag photos for July 4. You can probably imagine a number of different ways of tagging, including the following:
july4 (for example, http://www.flickr.com/photos/tags/july4/)
fourthofjuly (for example, http://www.flickr.com/photos/tags/fourthofjuly)
july4th (for example, http://www.flickr.com/photos/tags/july4th)
july04 (for example, http://www.flickr.com/photos/tags/july04)
july4th2007 (for example, http://www.flickr.com/photos/tags/july4th2007)
As an end user, which tag should you use? It depends. Are you trying to use the most popular one? Flickr offers no guidance about which specific tag to use but attempts to make pictures related to July 4 all findable regardless of the exact tag used. The Flickr clustering algorithm, when applied to some of these specific tags (for example, http://www.flickr.com/photos/tags/july4th/clusters/), groups pictures with tags aimed at describing the same phenomenon.
It is significant that you can set a default permission that allows other people (which you can limit to your family, friends, contacts, or any registered Flickr user in general) to add tags and notes to your photos—but there is no provision for letting other people change the title or description of your photo. This suggests it might be a good idea to let other people tag your photos. Think of scenarios when it would be helpful to let others tag your photos. Consider why it might not be a good idea to let other people change the title or description of a photo.
To add a tag to a photo for which you have permission, follow these steps:
Go to the Flickr page of the photo.
Click the Add a Tag link. A text box will open, and you can enter a single tag or a series of tags separated by spaces. You can also enter phrases by using double quote marks around the phrase.
You can also choose to add tags by selecting from tags you already use by clicking the Choose from Your Tags link instead of entering tags in the text box.
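If you collect tags in your own mashup using the same entry convention (tags separated by spaces, phrases wrapped in double quotes), you need to split the input the same way. Here is a minimal sketch of such a splitter in Python; the name split_tag_input is hypothetical, and the handling of edge cases such as an unclosed quote is my own choice, not documented Flickr behavior.

```python
import re

# Either a double-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_tag_input(text):
    """Split a tag-entry string into individual raw tags.

    Unquoted words become separate tags; a phrase in double quotes
    becomes a single multiword tag. Empty tags are dropped.
    """
    tags = []
    for quoted, bare in _TOKEN.findall(text):
        tag = (quoted or bare).strip()
        if tag:
            tags.append(tag)
    return tags
```

Calling split_tag_input('flower "San Francisco" bridge') yields three tags: flower, San Francisco, and bridge.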
The Flickr tagging system is sufficiently well designed that you may never have occasion to think about the syntactical limitations of tags in Flickr. However, let’s look at a simple case study. As noted earlier, you can add phrases as tags using double quotes, such as "San Francisco". The tag is displayed as "San Francisco", but internally it is represented with spaces and punctuation removed and letters turned to lowercase, that is, as sanfrancisco. You can verify this by going to a picture and trying to enter both "San Francisco" and sanfrancisco as tags. Flickr will take only one of the tags since it considers them to be the same tag.[51]
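To compare tags in a mashup the way Flickr appears to, you can approximate this reduction from the raw tag to the clean tag yourself. The sketch below is an approximation based only on the behavior described here (drop spaces and punctuation, lowercase the rest); Flickr’s exact rules, particularly for non-ASCII characters, may differ, and the names clean_tag and dedupe_tags are hypothetical.

```python
def clean_tag(raw):
    """Approximate the canonical form of a raw tag.

    Keeps only letters and digits and lowercases them, so that
    '"San Francisco"' and 'sanfrancisco' compare equal. This mirrors
    the behavior described in the text, not an official algorithm.
    """
    return "".join(ch for ch in raw if ch.isalnum()).lower()


def dedupe_tags(raw_tags):
    """Keep the first raw tag for each canonical form, preserving order."""
    seen = set()
    kept = []
    for raw in raw_tags:
        key = clean_tag(raw)
        if key and key not in seen:
            seen.add(key)
            kept.append(raw)
    return kept
```

With this sketch, dedupe_tags(["San Francisco", "sanfrancisco", "flower"]) keeps only San Francisco and flower, which matches Flickr accepting just one of the two equivalent tags.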
Anyone who has spent much time using tags runs into the idiosyncrasies, inaccuracies, and irregularities often present in tagging. Drawing on an analysis in Wikipedia, I list some possible causes for these problems:[52]
Polysemy: Since words often have multiple meanings, which meaning is supposed to be associated with a tag? (For example, does the tag apple refer to the fruit or to a computer?)
Synonymy: When multiple words can have the same or similar meaning, which tag should you use, and how do you find all the tags that mean the same? (For example, are "Independence Day" in the United States and "July 4th" the same?)
Word inflections: Since words are modified for specific grammatical contexts, which variation do you use for a tag? (For example, you might see mouse and mice.)
Syntactic constraints: How should you create tags out of phrases when spaces are not allowed? How should you deal with punctuation? How do you deal with non-ASCII words?
In this chapter, I cover the issue of word inflections (specifically the handling of singular versus plural forms) and the syntax of tags, a topic that is not explicitly mentioned in this list but that presents practical difficulties in making mashups based on tags.
Web sites often leave it ambiguous whether users should use the singular or plural form for tags. When you use these tags, it’s helpful to know whether tags created with the singular and plural forms are treated as the same tag.
Here I describe a small experiment to figure out how Flickr deals with this issue, one you can adapt for other web sites. I tagged one of my photos with the tag mouse and did a full-text search and a tag search for mouse, mouses, and mice. Table 3-1 records whether the photo is returned in the search.
Search Term | Full-Text or Tag Search? | Was the Picture Found? |
---|---|---|
mouse | Full text | Yes |
mouse | Tag | Yes |
mouses | Full text | Yes |
mouses | Tag | No |
mice | Full text | Yes |
mice | Tag | No |
Based on these limited observations, I can make the following tentative conclusions about how Flickr handles singular and plural English nouns in tags:
Singular and plural forms of English nouns used as tags are considered to be different tags.
In full-text searches, Flickr uses some form of stemming to match singular and plural forms of English nouns. The Flickr stemming process is at least sophisticated enough to recognize that mouse and mice are related words.
Obviously, you would have to either find official documentation from Flickr or test with many more tags to validate these conclusions.[53]
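If you want to repeat the experiment with many more tags, it helps to generate the searches rather than type them by hand. The sketch below builds request URLs for the Flickr API method flickr.photos.search, using its text parameter for a full-text search and its tags parameter for a tag search. The endpoint URL, the helper names, and the decision to restrict the search to one user with user_id are my assumptions for this sketch; the code only builds URLs and leaves sending the requests, supplying a real API key, and reading the responses to you.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for the Flickr API.
API_ENDPOINT = "https://api.flickr.com/services/rest/"


def search_url(term, mode, api_key, user_id):
    """Build a flickr.photos.search URL.

    mode is "text" for a full-text search or "tags" for a tag search;
    it is used directly as the name of the query parameter.
    """
    if mode not in ("text", "tags"):
        raise ValueError("mode must be 'text' or 'tags'")
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,   # placeholder; use your own key
        "user_id": user_id,   # limit results to the test account
        mode: term,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def experiment_urls(terms, api_key, user_id):
    """One (term, mode, url) row per cell of the experiment table."""
    return [
        (term, mode, search_url(term, mode, api_key, user_id))
        for term in terms
        for mode in ("text", "tags")
    ]
```

Calling experiment_urls(["mouse", "mouses", "mice"], api_key, user_id) gives six rows, one for each row of Table 3-1.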
The Flickr map (http://www.flickr.com/map/), which displays Flickr photos on a map, is the official implementation of what started as a hack. Before the map, there was no official way to store the location information of a picture and display that location information on a map.
The ad hoc solution that became widely adopted was to insert geo-related information into the Flickr tags, specifically the geotagged tag along with geo:lat and geo:lon, to indicate the latitude and longitude of a photo.
This convention of geotagging worked well in many ways. Hundreds of thousands of Flickr photos were geotagged according to this convention. Tools such as Google Maps in Flickr arose to use the geotagging data. On the downside, the Flickr user interface became cluttered with tags that were meant for programmatic consumption. Support for such tags in the Flickr API was also less than ideal (for instance, the only reason for the geotagged tag to be there was that the API did not allow you to look for tags that began with geo:lat).
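A mashup that consumes the geotagging convention has to pull the coordinates back out of a photo’s tag list. Here is a minimal sketch, assuming that the tags are written as geo:lat=VALUE and geo:lon=VALUE and that the geotagged marker tag is present; the function name extract_location and the sample longitude are illustrative only.

```python
def extract_location(tags):
    """Return (latitude, longitude) from a list of tags, or None.

    Follows the ad hoc convention described in the text: the photo
    must carry the marker tag "geotagged" plus geo:lat and geo:lon
    tags. Values that do not parse as numbers are ignored.
    """
    if "geotagged" not in tags:
        return None
    coords = {}
    for tag in tags:
        for key in ("lat", "lon"):
            prefix = "geo:" + key + "="
            if tag.startswith(prefix):
                try:
                    coords[key] = float(tag[len(prefix):])
                except ValueError:
                    pass  # malformed value; skip it
    if "lat" in coords and "lon" in coords:
        return coords["lat"], coords["lon"]
    return None


# Illustrative tag list; the longitude is a made-up sample value.
sample = ["berkeley", "geotagged", "geo:lat=37.866276", "geo:lon=-122.25"]
location = extract_location(sample)
```

Notice that the code first looks for geotagged before it scans for coordinates, which is the same role the marker tag played for searches on Flickr itself.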
It was to fix these problems that Flickr introduced machine tags, also known as triple tags. Machine tags are tags with a specific syntax intended primarily for programmatic consumption and not directly for display to the typical end user. You can use machine tags to store extra data elements for a given photo. The most important example of such data has so far been the latitude and longitude associated with a photo; it’s so important that Flickr ultimately introduced specialized functionality to handle this data to prevent people from shoehorning it into tags.
Machine tags are meant to support new types of applications along the lines of geotagging by adding functionality to the API that recognizes that machine tags have a different use pattern than standard tags. Also, the UI of Flickr has changed to hide the default machine tags from users.
The syntax of machine tags, which relates the triplets of namespace, predicate, and value, is as follows:
namespace:predicate=value
So, for example, geo:lat=37.866276 is a machine tag, where geo is a namespace, lat is a predicate, and 37.866276 is a value.
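Because the syntax is so regular, splitting a machine tag into its three parts takes only a few lines. The following sketch uses a regular expression; the rule that a namespace and a predicate start with a letter and contain only letters, digits, and underscores is my assumption for this sketch, and the names MachineTag and parse_machine_tag are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# namespace:predicate=value; the allowed characters are an assumption.
_MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$"
)


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Split a tag into namespace, predicate, and value.

    Returns None for ordinary tags that do not follow the
    namespace:predicate=value pattern. The value is kept as a string.
    """
    match = _MACHINE_TAG.match(tag)
    if match is None:
        return None
    return MachineTag(*match.groups())
```

For instance, parse_machine_tag("geo:lat=37.866276") returns a MachineTag with namespace geo, predicate lat, and value 37.866276 (as a string), while an ordinary tag such as flower returns None.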
Since machine tags are still in the early stages of uptake in Flickr, which is a pioneer in letting people stick arbitrary data into its systems, I would be surprised to find other web applications that are further along. There are some nascent developments along these fronts in Google Base (which has attributes).[54][55]
A good way to understand how tags are used in Flickr is to study how others have built on top of the tagging system. Here are several to study:
Flickr Related Tag Browser (http://www.airtightinteractive.com/projects/related_tag_browser/app/) lets you browse relationships among related tags.
findr (http://www.forestandthetrees.com/findr/findr.html) lets you display related tags and photos that have been tagged by a combination of related tags.
fastr (http://randomchaos.com/games/fastr/) is a game in which you guess a tag based on the photo presented to you.
ZoneTag (http://zonetag.research.yahoo.com/) is an example of Flickr tag hacking to insert location data of photos taken by cell phones.
TagMaps (http://tagmaps.research.yahoo.com/) shows on a map popular tags correlated with geotagged Flickr photos for a region.
These examples show how Flickr calculates relationships among tags by mining information about how tags are being used. You can get a sense of how people use tags.
[49] http://riya.com and http://www.riya.com/riyaAPI (for the Riya API)
[50] http://www.flickr.com/groups/central/discuss/2026/ and http://www.flickr.com/groups/central/discuss/2730/
[51] http://www.flickr.com/services/api/misc.tags.html draws the distinction between the “clean” version of a tag and the “raw” version of the tag.
[52] http://en.wikipedia.org/wiki/Folksonomy as http://en.wikipedia.org/w/index.php?title=Folksonomy&oldid=145985651
[53] The thread at http://www.flickr.com/forums/bugs/31668/ includes a Flickr staff member confirming the use of stemming in titles and descriptions. http://tech.groups.yahoo.com/group/yws-flickr/message/1913 mentions stemming in the context of tags. http://www5.flickr.mud.yahoo.com/help/forum/37259/#reply211324 shows why these things happen.
[55] http://docs.amazonwebservices.com/AmazonS3/2006-03-01/BasicsObjects.html and http://docs.amazonwebservices.com/AmazonS3/2006-03-01/RESTObjectPUT.html, where you can stick in user metadata (name/value pair).