
software times™  Files...
September 30, 2007

Tag Clouds, The New Human Interface

It's not often that a truly useful new human interface makes an appearance. The Tag Cloud, as popularized by Flickr, Del.icio.us and Technorati, is a recent one that I have finally found a use for, as a graphic substitute for a limited search facility.

The emphasis here is on "limited."

I have a file upload script for a small group of forum members, around 100 to 150 people, who wanted to share charts, spreadsheets and similar files. Without some kind of search feature, it was very difficult for a newcomer to find the files he was looking for. I had the choice of building a traditional search feature or experimenting with a tag cloud based on tags assigned to each file by its author.

One of the drawbacks of traditional searches is that a misspelling, for example, can result in an empty set. The person searching has to use words that exist in the searched universe, but usually he has no visibility into that universe; it is like searching for a needle in a haystack while blindfolded. One solution is to provide the search feature with drop-down boxes listing the key search words. If there are only a few words, that works very well, but if there are 100 or more items on the list, the scrolling is a nuisance at best. In any case, the search words have been extracted from the universe and displayed in a not too pleasant manner. Why not create a more pleasing and informative display, i.e. a Tag Cloud?

Bucketizing by Frequency:

In the interest of not reinventing the wheel, I searched the web for free tag cloud code. The first script I tested does a poor job of mapping frequency to font size: it simply makes the font size proportional to the frequency. The problem is that most browsers have a hard time displaying any difference between a 110% and a 115% rendition of a font; they come out the same.
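To see how small the differences get, here is a minimal sketch of proportional (linear) scaling. The function name `linear_sizes()`, the 100% to 300% range and the tag frequencies are all illustrative assumptions; this is not the code of any particular script.

```php
<?php
// Hypothetical helper: scale font size linearly with frequency.
// $tags maps tag => frequency; the returned sizes are percentages.
function linear_sizes(array $tags, $min_size = 100, $max_size = 300)
{
    $lo = min($tags);
    $hi = max($tags);
    $spread = $hi - $lo;
    $sizes = array();
    foreach ($tags as $tag => $frequency) {
        // guard against division by zero when all frequencies are equal
        $ratio = ($spread > 0) ? ($frequency - $lo) / $spread : 0;
        $sizes[$tag] = $min_size + $ratio * ($max_size - $min_size);
    }
    return $sizes;
}

// made-up frequencies for illustration
$tags = array('charts' => 1, 'excel' => 2, 'forex' => 3, 'stocks' => 50);
foreach (linear_sizes($tags) as $tag => $size) {
    printf("%-7s %6.1f%%\n", $tag, $size);
}
```

With these numbers the three least-used tags land within about 8 percentage points of each other (100%, roughly 104% and roughly 108%), while the single popular tag takes 300%. Most of the font size range is spent on the gap between 3 and 50.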

Next I discovered some scripts that create a fixed number of "buckets" and distribute the tag frequencies among them. While this is an improvement, it is still not a good solution: some buckets are left empty, and this wastes the scarce font size resources that browsers have available to them.
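A minimal sketch of the fixed-bucket idea shows where the waste comes from. It assumes equal-width buckets over the frequency range, which is only one of several ways such scripts might distribute tags; `fixed_bucket()`, the bucket count of ten and the list of frequencies are hypothetical.

```php
<?php
// Hypothetical helper: assign a frequency to one of $n equal-width buckets
// spanning [$lo, $hi]. Returns a bucket index from 0 to $n - 1.
function fixed_bucket($frequency, $lo, $hi, $n)
{
    if ($hi == $lo) {
        return 0; // all frequencies equal: everything goes in bucket 0
    }
    $index = (int) floor(($frequency - $lo) / ($hi - $lo) * $n);
    return min($index, $n - 1); // the maximum goes in the last bucket
}

// illustrative frequencies
$frequencies = array(1, 2, 3, 10, 12, 15, 18, 20, 45, 50);
$n = 10;
$used = array();
foreach ($frequencies as $f) {
    $used[fixed_bucket($f, 1, 50, $n)] = true;
}
echo count($used) . " of $n buckets are used\n";
```

With these frequencies only six of the ten buckets receive a tag. Buckets 4 through 7 (counting from zero) stay empty, so four of the ten font sizes never appear in the cloud.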

There is no need to map the font size directly to frequency. A better solution is to map font size to relative frequency, that is, to the rank of each frequency among the frequencies actually present. Let us suppose there are ten different frequencies in the tag universe (1, 2, 3, 10, 12, 15, 18, 20, 45, 50). Instead of making the font size proportional to these values, we create ten buckets, each with a slightly larger font size than its predecessor (100%, 122%, 144%, etc.), so as to fill our available font size universe, say 100% to 300%. The font size will not show the magnitude of a frequency, only that it is larger or smaller than some other frequency. This is no loss because the actual frequency can be shown exactly, for example in a tooltip via the title attribute.
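The mapping for these ten frequencies can be sketched as follows. The helper name `rank_sizes()` and the rounding to whole percentages are assumptions made for illustration; with ten buckets spread across 100% to 300%, the step works out to about 22 percentage points.

```php
<?php
// Hypothetical helper: one bucket per distinct frequency, with font sizes
// spaced evenly between $min_size and $max_size (in %).
function rank_sizes(array $frequencies, $min_size = 100, $max_size = 300)
{
    $distinct = array_values(array_unique($frequencies));
    sort($distinct);
    $last = count($distinct) - 1;
    // with a single distinct frequency there is no step
    $step = ($last > 0) ? ($max_size - $min_size) / $last : 0;
    $sizes = array();
    foreach ($distinct as $rank => $frequency) {
        $sizes[$frequency] = round($min_size + $rank * $step);
    }
    return $sizes;
}

$sizes = rank_sizes(array(1, 2, 3, 10, 12, 15, 18, 20, 45, 50));
foreach ($sizes as $frequency => $size) {
    echo "frequency $frequency: $size%\n";
}
```

Every font size in the range is used exactly once, and frequencies 1, 2 and 3 are now a full step apart even though their values are nearly equal.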

There is no need to have a fixed number of buckets; it is better to let the data drive the number of buckets. This will work until the tag population grows so large that the number of distinct frequencies also becomes large. The number of buckets only grows when a new frequency is added to the list; a new tag with an existing frequency will not increase it. One thousand tags with a frequency of one each use only one bucket. While I have no distribution statistics, in a recent application 96 instances of 18 tags arranged themselves into 6 buckets because the 18 tags had only 6 different frequencies. The examples I have seen on the web using a fixed number of buckets tend to have 25 to 30 of them. The Pareto Distribution should see to it that the number of buckets does not grow large.
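A small sketch makes the bucket arithmetic concrete. The helper `bucket_count()` and the tag names are hypothetical; the point is only that the bucket count equals the number of distinct frequencies.

```php
<?php
// Hypothetical helper: the number of buckets is the number of distinct
// frequencies, regardless of how many tags there are.
// $tags maps tag => frequency.
function bucket_count(array $tags)
{
    return count(array_unique(array_values($tags)));
}

// one thousand made-up tags, each used exactly once
$tags = array();
for ($i = 1; $i <= 1000; $i++) {
    $tags["tag$i"] = 1;
}
echo bucket_count($tags) . "\n"; // 1

// adding a tag with an existing frequency does not add a bucket
$tags['charts'] = 1;
echo bucket_count($tags) . "\n"; // 1

// a tag with a new frequency does
$tags['excel'] = 5;
echo bucket_count($tags) . "\n"; // 2
```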

The code:
<?php
# Script         tag cloud
# Copyright      2007 OEM Scripts
# Author         Denny Schlesinger
# OEM Scripts    http://oemscripts.com
# $Id:$

// you need a data source
// this MySQL query returns
// data pairs: tag, frequency 
// ordered by: tag
if (!$result = get_cloud ($filer_db)) {
    echo error_message ("No tags <br />\n");
} else {

    // populate tags and create buckets
    $tags    = array ();
    $buckets = array ();
    while ($row = mysql_fetch_array($result)) {
        $tags[$row['tag']] = $row['frequency'];
        $buckets[$row['frequency']] = "";
    }

    // order buckets by frequency
    ksort ($buckets);
    $count = count ($buckets) - 1;

    // min-max font sizes
    $max_size = 250; // max font size in %
    $min_size = 100; // min font size in %
    $range = $max_size - $min_size;
    // step is the difference in font size from one bucket to the next;
    // with a single bucket there is no step (avoids division by zero)
    $step  = ($count > 0) ? $range / $count : 0;

    // populate buckets with font sizes
    $i = $min_size;
    foreach ($buckets AS $key=>$value) {
        $buckets[$key] = $i;
        $i += $step;
    }

    // create tag cloud
    foreach ($tags AS $key=>$value) {
        // select CSS font-size
        $size = $buckets[$value];
        // uncomment if you want sizes in whole %:
        // $size = ceil($size);

        // leave PHP to emit the link, escaping the tag for URL and HTML
        ?>
<!-- link to a page that displays the files tagged with the selected tag -->

<a href="tags.php?tag=<?php echo urlencode($key); ?>"
   style="font-size: <?php echo $size; ?>%"
   title="<?php echo $value . " '" . htmlspecialchars($key, ENT_QUOTES); ?>' files">
   <?php echo htmlspecialchars($key); ?></a>

        <?php
    } // foreach ($tags AS $key=>$value)

} // if ($result = get_cloud ($filer_db))


Denny Schlesinger

Here you can see it working:
BMW Downloads - Search by Tag

I found a very interesting discussion of tag cloud markup:
Marking Up a Tag Cloud by Mark Norman Francis

Pareto Distribution


Copyright © Software Times, 2000, 2001, 2003. All rights reserved
Last updated June 22, 2003