Folksonomies and tagging

Definition

A folksonomy is a way of organizing human knowledge through the free use of tags by a community of peers.

Discussion

Categorization is both strongly influenced by and a powerful reinforcer of ideology; it follows that revolutions (political or scientific) must change the way things are sorted in order to throw over the old system. (quote from Rob Lightner)

Three Ways to Organize Human Knowledge

There are three broad ways to organize human knowledge, namely taxonomies, faceted systems, and folksonomies:

"Taxonomies are suitable for classifying corpora of homogeneous, stable, restricted entities with a central authority and expert or trained users, but are also expensive to build and maintain. Faceted systems (a sort of polyhierarchy) are useful with a wide range of users with different mental models and vocabularies. They are also more scalable because new items (for users) and new concepts (for cataloguers) can be added with a limited impact and with no need to start a new classification from scratch.

Folksonomies require people to do the work by themselves for personal or social reasons. They are flat and ambiguous and cannot support a targeted search approach. However, they are also inexpensive, scalable and near to the language and mental model of users." (http://www.iskoi.org/doc/folksonomies.htm )
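To make the contrast concrete, the following minimal Python sketch describes one item under each of the three schemes. The item, the category path, the facet names, the users and the tags are all hypothetical illustrations and do not come from the overview quoted above.

```python
from collections import Counter

# Hypothetical item: one photograph, described three ways.

# Taxonomy: exactly one path through a single, centrally maintained hierarchy.
taxonomy_path = ("Arts", "Photography", "Landscape")

# Faceted system: several independent, predefined dimensions per item.
facets = {"subject": "mountain", "medium": "photograph", "region": "Alps"}

# Folksonomy: free-form tags from individual users, with no shared vocabulary.
user_tags = {
    "alice": ["alps", "hiking", "sunrise"],
    "bob": ["mountain", "alps"],
    "carol": ["wallpaper", "sunrise"],
}


def in_taxonomy_branch(path, branch):
    """True if the item's single path starts with the given branch."""
    return path[:len(branch)] == tuple(branch)


def matches_facets(item_facets, **wanted):
    """True if every requested facet has the requested value."""
    return all(item_facets.get(name) == value for name, value in wanted.items())


def tag_counts(tags_by_user):
    """Flatten all users' tags into one flat frequency table."""
    return Counter(tag for tags in tags_by_user.values() for tag in tags)


counts = tag_counts(user_tags)
```

The taxonomy lookup depends on knowing the one correct branch, the faceted lookup on knowing the predefined dimensions, and the tag table on nothing but what users happened to type, which is why it is cheap to build but flat and ambiguous.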

Why Classification is Different in Digital Environments

David Weinberger explains why classification can be different in digital environments:

"In the physical world, a fruit can hang from only one branch. In the digital world, objects can easily be classified in dozens or even hundreds of different categories. In the real world, multiple people use any one tree. In the digital world, there can be a different tree for each person. In the real world, the person who owns the information generally also owns and controls the tree that organizes that information. In the digital world, users can control the organization of information owned by others." (David Weinberger in Release 1.0: http://www.release1-0.com/, reproduced in the JOHO blog)

Clay Shirky has also outlined the different conditions where the use of tagging (i.e. folksonomies) may be better than the old models:

The old ‘ontological’ methods of cataloguing work when the domain to be organized is a small corpus with formal categories, consisting of stable and restricted entities with clear edges, and when the participants are expert, authoritative, and coordinated. Tagging, by contrast, is suited to a domain characterized by a large corpus without formal categories, with unstable and unrestricted entities having no clear edges, and whose participants are uncoordinated amateurs without authority. (paraphrased from http://shirky.com/writings/ontology_overrated.html)

Why Tagging is Better than Metadata

Clay Shirky also explains why tagging is better than controlled metadata:

"This is something the 'well-designed metadata' crowd has never understood -- just because it's better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture." (cited by Cory Doctorow in the Boing Boing blog, January 2005)

"a lot of what we think we know about categorization is wrong. In particular, I want to convince you that many of the ways we're attempting to apply categorization to the electronic world are actually a bad fit, because we've adopted habits of mind that are left over from earlier strategies. I also want to convince you that what we're seeing when we see the Web is actually a radical break with previous categorization strategies, rather than an extension of them. The second part of the talk is more speculative, because it is often the case that old systems get broken before people know what's going to take their place. (Anyone watching the music industry can see this at work today.) That's what I think is happening with categorization. What I think is coming instead are much more organic ways of organizing information than our current categorization schemes allow, based on two units -- the link, which can point to anything, and the tag, which is a way of attaching labels to links. The strategy of tagging -- free-form labeling, without regard to categorical constraints -- seems like a recipe for disaster, but as the Web has shown us, you can extract a surprising amount of value from big messy data sets." (http://shirky.com/writings/ontology_overrated.html)

Narrow vs. Broad Folksonomies

Thomas Vander Wal explains the difference between broad and narrow folksonomies:

"Vander Wal [argues that] there are broad folksonomies and narrow folksonomies, and they are entirely distinct. 'Delicious is a broad folksonomy, where a lot of people are describing one object,' Vander Wal said. 'You might have 200 people giving a set of tags to one object, which really gives a lot of depth.... No matter what you call something, you probably will be able to get back to that object.' In a broad folksonomy, Vander Wal continued, there is the benefit of the network effect and the power curve because so many people are involved. An example is the website of contemporary design magazine Moco Loco, to which 166 Delicious users had applied the tag 'design.' Conversely, Vander Wal explained, Flickr's system is a narrow folksonomy, because rather than many people tagging the same communal items, as with Delicious, small numbers of users tag individual items. Thus many users tag items, but of those, only a small number will tag a particular item." (http://www.wired.com/news/technology/0,1282,66456,00.html?)

Explaining and showing broad and narrow folksonomies / Thomas Vander Wal -- <http://www.personalinfocloud.com/2005/02/explaining_and_.html> : February 21, 2005
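The difference can be sketched in Python by recording tagging as (user, item, tag) events and counting how many distinct people tagged each item. The events, the item names and the `broad_threshold` cutoff are hypothetical. Vander Wal describes whole systems as broad or narrow, so the per-item count below is only an illustrative proxy, not his definition.

```python
from collections import defaultdict

# Hypothetical tagging events: (user, item, tag).
events = [
    ("ann", "mocoloco.com", "design"),
    ("ben", "mocoloco.com", "design"),
    ("cat", "mocoloco.com", "furniture"),
    ("dan", "photo-123", "sunset"),
    ("dan", "photo-123", "beach"),
]


def taggers_per_item(events):
    """Count the distinct users who tagged each item."""
    users = defaultdict(set)
    for user, item, _tag in events:
        users[item].add(user)
    return {item: len(who) for item, who in users.items()}


def classify(events, broad_threshold=3):
    """Label an item 'broad' if many people tagged it, else 'narrow'.

    The threshold is an arbitrary assumption chosen for this example.
    """
    return {
        item: "broad" if count >= broad_threshold else "narrow"
        for item, count in taggers_per_item(events).items()
    }


labels = classify(events)
```

In this toy data the shared bookmark is tagged by three people, as in a Delicious-style system, while the photo carries several tags from a single person, as is typical of a Flickr-style system.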

Historical Context

The Four Characteristics of Traditional Knowledge Organization

David Weinberger, in his book Everything is Miscellaneous:

"NEW PROPERTIES, NEW STRATEGIES, A NEW SHAPE OF KNOWLEDGE

(From Chapter 5, pp. 100-106. The first four chapters have tried to convince the reader that how we order and classify our world has a history, is always the result of our culture and interests, confers power on those who get to do the classifying, and is complex and messy. I've also introduced the idea that there are three "orders of order": (1) Organizing the things themselves (books, photos...not Dinge an sich!), (2) physically separating the metadata and organizing them (e.g., catalog cards), and (3) digitizing both the content and the metadata. The third order requires us to invent new principles of organization.)

College students' silverware drawers, Delicious, Flickr, the BBC and Wikipedia are miscellaneous in different ways, except for one thing: How their content is actually arranged does not determine how that content can and will be arranged by their users. In some cases - Wikipedia, for example - no one even knows exactly where the raw contents are. These examples are miscellaneous _because_ users don't need to know the inner organization, _because_ that inner order doesn't result in a preferred order of use, and _because_ users have wide flexibility to order the pieces as they want, even and especially in unanticipated ways. This means that the miscellaneous enables _all_ of the information contained in the set to be discovered over time.

But this also means the miscellaneous doesn't much resemble our traditional view of knowledge. Knowledge, we've thought, has four characteristics, two of them modeled on properties of reality and two on properties of political regimes.

As we've seen, the first characteristic of traditional knowledge is that just as there is one reality, there is one knowledge, the same for all. If two people have contradictory ideas about something factual, we think they can't both be right. This is because we've assumed knowledge is an accurate representation of reality, and the real world cannot be self-contradictory. We treat ideas that dispute this view of knowledge with disdain. We label them "relativism" and imagine them to be the devil's work, we sneer at them as "postmodern" and assume that it's just a bunch of French pseudo-intellectual gibberish, or we say "whatever" as a license to stop thinking.

Second, we've assumed that just as reality is not ambiguous, neither is knowledge. If something isn't clear to us, then we haven't understood it. We may not be 100% certain whether the Nile or the Amazon is the longest river, but we're confident one is. Conversely, if there's no possibility of certainty - "Which tastes better, beets or radishes?" - we say it isn't a matter of knowledge at all.

Third, because knowledge is as big as reality, no one person can comprehend it. So we need people who will act as filters, based on education, experience and clear thinking. We call them experts and we give them clipboards. They keep bad information away from us and provide us with the very best information.

Fourth, experts achieve their position by working their way up through social institutions. The people in these institutions are doing their best to be honest and helpful, but, until humans achieve divinity, our organizations will inevitably be subject to corrupting influences. Which groups get funded can determine what a society believes, and funding is often granted by people who know less than the experts: The fate of a DNA research center may rest with Congresspeople who couldn't tell a ribosome from a trombone.

The way we've organized knowledge has been largely determined by these four properties of knowledge. We've tried to settle on a single, comprehensive framework for knowledge, with categories so clear and comprehensive that experts can put each thing in its proper place. Institutions grew to maintain the knowledge framework. Their ability to certify experts and to vouch for knowledge made them powerful and sometimes rich. So, when the miscellaneous shakes our certainty in the nature of knowledge, more than the future of the card catalog is at stake. Because a third order miscellany is digital, not physical, we no longer have to agree on a single framework. Things have their _places_, not a single place. We get to create our own categories, ones that suit our way of thinking. Experts can be helpful, but in the age of the miscellaneous they and their institutions are no longer in charge of our ideas.

Changes in the "Third Order" Digital Era

David Weinberger continues his explanation, focusing on the present changes:

These are big changes, but perhaps the most urgent one is this: Over the course of the millennia, we've developed sophisticated methods and processes for developing, communicating and preserving knowledge. We have major institutions - serious contributors to our culture and our economy - devoted to those tasks. We're good at it. Now we have to invent new ways appropriate to the new shape of knowledge. We are doing so at a pace unparalleled in our history.

Four new strategic principles are emerging, severing the ties between the way we organize physical objects and ideas.

FILTER ON THE WAY OUT, NOT ON THE WAY IN. A friend of mine who worked at the Harvard Business Review tells amusing stories about the "slush pile," the unsolicited manuscripts that arrive every day. Harvard Business Review is a sober journal of research and ideas, yet people submit poetry, short stories, and arty photographs. My friend's job was to go through the slush pile to see what, if anything, was worth passing along for serious consideration. She was a gatekeeper, a filterer, a job that makes sense when the economics and physics of paper force us to make decisions about what knowledge we will publish and thus preserve. We rely on experts such as my friend to spare us from having to wade through the slush pile on our own.

But, when anyone can publish at the press of a button, the social role of gatekeepers changes. For example, from the outside, the "blogosphere" looks like a self-indulgent pool of slush that wouldn't get past the usual publishing filters. While the economics of publishing ensure that most blogs indeed wouldn't be let through the gates, the aggregate value of all the blogs in the "long tail" (to use the term Chris Anderson made popular in his book of that name) - each perhaps of interest only to a few people - is incalculable. This is an inversion of the old model. In a world of parsimonious access to paper, filters increase the value of what's available by excluding the slush. But in the third order, where there's an abundance of access to an abundance of resources, filtering on the way in _decreases_ the value of that abundance by ruling out items that might be of great value to a few people. Filtering on the way out, on the other hand, increases the value of the abundance by locating what's of value to a particular person at a particular moment. For example, a young physics professor at McGill University, Bob Rutledge, started an electronic bulletin board that posts new findings for any research as soon as it can be summarized. Rutledge doesn't apply criteria to decide for the reader whether the research is important enough to be included (though only active, professional astronomers can register to post to the site). It's up to each reader to be the filterer. Similarly, the Public Library of Science's biology journal, a peer-reviewed but free online resource, started PLoS One in November 2006. "The idea is to take the editorializing out of the peer review process," says Hemai Parthasarathy, the managing editor. So long as a paper is "sound," it will be published. If it's good science, _someone_ may find it useful. 

So long as the user has good tools for finding what she needs - and this is a task many are working on - filtering on the way out vastly increases our shared potential for knowledge.

PUT EACH LEAF ON AS MANY BRANCHES AS POSSIBLE. In the real world, a leaf can only hang from one branch. In the first order of organization, there's no way around that limitation. In the second order, most cataloging systems have provisions for listing books under more than one heading, but the physicality of the second order still usually demands that one branch be picked as the primary one and there is a limit on the number of secondary listings.

In the third order, however, it's to our advantage to hang information from as many branches as possible. If you get a new Casio digital camera to sell in your online store, you'll want to list it under as many categories as you can think of, including cameras, travel gear, Casio products, graduation gifts, new items, sale items, and perhaps even sports equipment. Hanging a leaf on multiple branches makes it more findable by customers. Unlike in the second order, this doesn't make your e-store disorganized or messy. It makes it more usable…and more profitable.

EVERYTHING IS METADATA AND EVERYTHING CAN BE A LABEL. In a store, it's easy to tell the labels from the goods they label, and in a library the books and their metadata are kept in separate rooms. But it's not so clear online. If you can't remember the name of one of Shakespeare's plays, go to the search box at Google Book, type "Shakespeare tragedy," and you'll see a list of all of them. Click on, say, _King Lear_ and you can read the full text, including the famous line, "How sharper than a serpent's tooth it is to have a thankless child!" Now suppose you want to know where the quotation "How sharper than a serpent's tooth" comes from. Type the phrase into the search box and Google will list _King Lear_. Simple, but in the first case you used Shakespeare's name as metadata to find the contents of a book and in the second you used some of the contents of the book as metadata to find the author and title. In the miscellaneous order, the only distinction between metadata and data is that metadata is what you already know and data is what you're trying to find out.

In the first two orders of order, we've had to think carefully about which metadata we'll capture because the physical world limits the amount of metadata we can make available: A book's catalog card has to hold far less information than does the book itself. In the third order, not only can every word in a book count as metadata, so can any of the sources that link to the book. If we want to help our customers or users find information, we'll try to make as much of it usable as metadata as we can.

This not only makes sites easier to use, it vastly increases the leverage of knowledge. Think of what we can do with just the few words that fit on a second-order card or label. Now that everything in the connected world can serve as metadata, knowledge is empowered beyond fathoming. We not only can find what we need based on whatever slight traces we have in our hand, we can see connections that would have escaped notice in the first two orders. The power of the miscellaneous comes directly from the fact that in the third order, everything is connected and therefore everything is metadata.

GIVE UP CONTROL. Build a tree and you surface information that might otherwise be hidden, just as Lamarck exposed information left hidden in Linnaeus' miscellaneous category of worms. But, a big pile of miscellaneous information contains relationships beyond reckoning. No one person or group is going to be able to organize it in all the useful ways, hanging all the leaves on all the branches where they might be hung. For example, iTunes shows users a branch that pulls together albums by a particular artist, but the millions of playlists that users have made there find relationships that the organizers of iTunes could not possibly have foreseen, from techno versions of children's songs to tracks played at someone's third wedding. iTunes simply cannot predict what people are going to be interested in, what a song is going to mean to them, and what connections they're going to see. Some of the combinations will be of passing value only to one person, but other people may find their world changed by how a stranger has pulled together a set of songs to express a mood, an outlook, or an idea.

That's why it's so powerful to let users mix it up for themselves. Go into a real world clothing store and try pulling everything in your size off the racks and into a shopping cart so you can go through it in an orderly fashion. After all, that's the rational way to proceed. Everything that's not your size is just noise, a distraction. Yet, within ninety seconds you'll be thrown out of the store and firmly asked not to return. On line, on the other hand, we just naturally expect to organize digital information our way, through tags, bookmarks, playlists, and weblogs. And then we add to the information a site provides us by disagreeing with it in our own reviews. Users are now in charge of the organization of the information they browse. Of course, the owners of that information may still want to offer a prebuilt categorization, but that is no longer the only - or best - one available. Put simply, the owners of information no longer own the organization of that information.

Control has already changed hands. The new rules of the information jungle are in effect, transforming the landscape in which we work, buy, learn, vote and play." (from Everything is Miscellaneous)
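Two of the principles in the excerpt above, hanging each leaf on many branches and filtering on the way out, can be illustrated with a small label index in Python. This is only a sketch under stated assumptions: the category names echo the online-store example in the excerpt, while the item names and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical store catalog: every item hangs from several "branches" (labels).
catalog = {
    "casio-camera": {"cameras", "travel gear", "graduation gifts", "sale items"},
    "tripod": {"cameras", "travel gear"},
    "backpack": {"travel gear", "graduation gifts"},
}


def build_index(catalog):
    """Map each label to the set of items carrying it (an inverted index)."""
    index = defaultdict(set)
    for item, labels in catalog.items():
        for label in labels:
            index[label].add(item)
    return index


def filter_on_the_way_out(index, *labels):
    """Return the items that carry every requested label.

    Nothing is excluded when the catalog is built; the narrowing happens
    only when someone asks a question.
    """
    matches = [index.get(label, set()) for label in labels]
    return set.intersection(*matches) if matches else set()


index = build_index(catalog)
gift_cameras = filter_on_the_way_out(index, "cameras", "graduation gifts")
```

Adding another label to an item only adds one more entry to the index, so listing an item under many categories costs little and never displaces its other listings.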

Examples

Some of the sites that pioneered tagging are del.icio.us <http://del.icio.us/>; Flickr <http://www.flickr.com/>; and Furl <http://www.furl.net/>.

More Information

The Wikipedia article is at http://en.wikipedia.org/wiki/Folksonomy

A good overview of folksonomies is at http://www.iskoi.org/doc/folksonomies.htm

A specialized site on tagging is at http://www.tagsonomy.com/

Folksonomy / Alex Wright -- <http://www.agwright.com/blog/archives/000900.html> : January 5, 200

Social bookmarking tools / T Hammond, T Hannay, B Lund, J Scott -- <http://www.dlib.org/dlib/april05/hammond/04hammond.html/> : April 2005

David Weinberger. "Tagging and why it matters." Retrieved November 10, 2006 from <http://cyber.law.harvard.edu/home/2005-07>

Tagging explained by Business Week at http://www.businessweek.com/magazine/content/05_15/b3928112_mz063.htm

For a philosophical analysis and critique of tagging, linking it to philosophical relativism, see Beneath the Metadata, by Elaine Peterson.

Podcasts/Webcasts

Listen to David Weinberger on Everything is Miscellaneous and to David Weinberger on Tagging and Folksonomies

Key Books to Read

A book on the power of categorization:

Sorting Things Out, by communications theorists Geoffrey C. Bowker and Susan Leigh Star (The MIT Press, 2000), covers a lot of conceptual ground in this context: "After arguing that categorization is both strongly influenced by and a powerful reinforcer of ideology, it follows that revolutions (political or scientific) must change the way things are sorted in order to throw over the old system. Who knew that such simple, basic elements of thought could have such far-reaching consequences?" (Rob Lightner in an Amazon.com review)
