Nik's Technology Blog

Travels through programming, networks, and computers

Create a jQuery Tag Cloud from RSS XML Feed

I previously created a jQuery Blogger Template Category List Widget to retrieve blog categories from a Blogger.com RSS feed and create a list of links which click through to Blogger label pages.

I've now taken this code a step further and modified it to count the number of times each category/tag occurs, which lets me create a tag cloud from the data, like the one below.

 

Before I explain the code I wrote to make the tag cloud, I'll go through the solution to a bug I found in the original categories code.

You may recall this snippet of code, where I iterate through each post and then through each category of each post. Once all the categories have been added to the array, I sort them prior to de-duping them.

$.get('/blog/rss.xml', function(data) {
    //Find each post
    $(data).find('item').each(function() {
        //Get all the associated categories/tags for the post
        $($(this)).find('category').each(function() {
            categories[categories.length] = $(this).text();
        });
    });
    categories.sort();

I later refactored the code, removing the $(data).find('item').each iteration, which wasn't required since find('category') will find all the categories anyway.

I then discovered that the JavaScript .sort() function is case-sensitive by default: it places every tag that starts with an upper case letter ahead of those that start with a lower case letter. This resulted in lower case categories being placed at the end of the list, which caused problems when I de-duped them.

So the rewritten snippet of code became:

$.get('blog/rss.xml', function(data) {
     //Find each tag and add to an array
     $(data).find('category').each(function() {
         categories[categories.length] = $(this).text();
     });
     categories.sort(caseInsensitiveCompare);

where caseInsensitiveCompare refers to a JavaScript compare function:

function caseInsensitiveCompare(a, b) {
    var anew = a.toLowerCase();
    var bnew = b.toLowerCase();
    if (anew < bnew) return -1;
    if (anew > bnew) return 1;
    return 0;
}
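As a quick illustration of the difference, here is a small sketch in plain JavaScript. The tag list is made up for the example and is not taken from the real feed. The default sort compares character codes, so every upper case letter sorts before every lower case one.

```javascript
// Same compare function as above, repeated so the snippet runs on its own
function caseInsensitiveCompare(a, b) {
    var anew = a.toLowerCase();
    var bnew = b.toLowerCase();
    if (anew < bnew) return -1;
    if (anew > bnew) return 1;
    return 0;
}

// Hypothetical tag list, for illustration only
var tags = ['jQuery', 'ASP.NET', 'Blogging', 'asp.net', 'Accessibility'];

// Default sort compares UTF-16 code units, so 'Blogging' comes before 'asp.net'
var defaultSorted = tags.slice().sort();

// With the compare function, 'ASP.NET' and 'asp.net' end up next to each other
var insensitiveSorted = tags.slice().sort(caseInsensitiveCompare);

console.log(defaultSorted.join(', '));
console.log(insensitiveSorted.join(', '));
```

Note that the compare function only changes the ordering. 'ASP.NET' and 'asp.net' are still different strings, so a de-dup loop that compares them with != will keep both unless it also lower-cases the values before comparing.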

Creating the Tag Cloud jQuery Code

I start off as before, fetching the XML, adding all the categories/tags from the RSS feed to a JavaScript array, then sorting them.

But I needed a way to store not only the tag name, but also the number of times that tag is used on the blog (the number of times the category appears in the feed).  For this I decided to use a multi-dimensional array, which essentially stores the data in a grid, e.g.

Tag Name        Count
ASP.NET         5
Accessibility   2
Blogging        15
jQuery          2

 

The de-dup loop from my previous categories script now performs two jobs: it removes the duplicate tags and keeps a count of how often each tag occurs.
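The two jobs of that loop can be sketched in plain JavaScript, without jQuery, roughly as follows. The function name countSortedTags and the sample data are my own, for illustration only, and the sketch assumes the input is already sorted so that duplicates sit next to each other.

```javascript
// Collapse a sorted list of tags into [tagName, count] rows.
// Assumes duplicates are adjacent, i.e. the list has already been sorted.
function countSortedTags(sortedTags) {
    var rows = [];
    var previous = null;
    for (var i = 0; i < sortedTags.length; i++) {
        var tag = sortedTags[i];
        if (tag !== previous) {
            // First time we see this tag: start a new row with a count of 1
            rows.push([tag, 1]);
        } else {
            // Same tag as the previous entry: bump the count on the last row
            rows[rows.length - 1][1] += 1;
        }
        previous = tag;
    }
    return rows;
}

// Hypothetical, already sorted input
var counts = countSortedTags(['ASP.NET', 'ASP.NET', 'Blogging', 'jQuery', 'jQuery', 'jQuery']);
console.log(JSON.stringify(counts));
```

Each row has the tag name at index 0 and its count at index 1, which is the same grid layout as the table above.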

Once the multi-dimensional array has been populated, all that's left to do is iterate through the array creating the HTML necessary to build the tag cloud, followed by appending it to a DIV tag with an ID="bloggerCloud" on the page.

Note the calculation I perform to give each tag a reasonable font size in pixels: (tagCount * 3) + 12. A tag used once is rendered at 15px, and each further use adds 3px.
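To get a feel for the sizes this formula produces, here is a small sketch. The function names are hypothetical, and the capped variant is not part of my script; it is only one possible way to stop a very popular tag from dominating the cloud.

```javascript
// The sizing formula from the script: 12px base plus 3px per use
function tagFontSize(tagCount) {
    return (tagCount * 3) + 12;
}

// Hypothetical variant (an assumption, not in the original script):
// never let a tag grow beyond maxPx
function cappedTagFontSize(tagCount, maxPx) {
    return Math.min(tagFontSize(tagCount), maxPx);
}

// Counts taken from the example table: 2, 5 and 15 uses
var sizes = [2, 5, 15].map(function (count) {
    return tagFontSize(count);
});
console.log(sizes.join('px, ') + 'px');
```

With the counts from the example table, a tag used 15 times is drawn at 57px, which is why a cap might be worth considering on a blog with a few heavily used tags.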

$(document).ready(function() {
    var categories = new Array();
    var dedupedCategories = [];
    $.get('blog/rss.xml', function(data) {
        //Find each tag and add to an array
        $(data).find('category').each(function() {
            categories[categories.length] = $(this).text();
        });
        categories.sort(caseInsensitiveCompare);
        //Dedup tag list and create a multi-dimensional array to store 'tag' and 'tag count'
        var oldCategory = '';
        var x = 0;
        $(categories).each(function() {
            if (this.toString() != oldCategory) {
                //Create a new array to put inside the array row 
                dedupedCategories[x] = [];
                //Store the tag name first 
                dedupedCategories[x][0] = this.toString();
                //Start the tag count 
                dedupedCategories[x][1] = 1;
                x++;
            } else {
                //Increment tag count
                dedupedCategories[x - 1][1] = dedupedCategories[x - 1][1] + 1;
            }
            oldCategory = this.toString();
        });
        // Loop through all unique tags and write the cloud
        var cloudHtml = "";
        $(dedupedCategories).each(function(i) {
            cloudHtml += "<a href=\"/blog/labels/";
            cloudHtml += dedupedCategories[i][0] + ".html\"><span style=\"font-size:" + ((dedupedCategories[i][1] * 3) + 12) + "px;\">";
            cloudHtml += dedupedCategories[i][0] + "</span></a> \n";
        });
        $('#bloggerCloud').append(cloudHtml);
    });
    return false;
});
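One thing the script above does not do is encode the tag name before writing it into the link and the page. The sketch below shows one way that could be handled. The helper names escapeHtml and buildCloudHtml and the 'Q&A' tag are made up for the example, and I am assuming that the label page file names match the URL-encoded tag name, which you should check against your own published blog.

```javascript
// Escape the characters that have special meaning in HTML text and attributes
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Build the cloud markup from [tagName, count] rows.
// Assumption: label pages live at /blog/labels/<encoded tag name>.html
function buildCloudHtml(tagCounts) {
    var html = '';
    for (var i = 0; i < tagCounts.length; i++) {
        var name = tagCounts[i][0];
        var size = (tagCounts[i][1] * 3) + 12;
        html += '<a href="/blog/labels/' + encodeURIComponent(name) + '.html">';
        html += '<span style="font-size:' + size + 'px;">' + escapeHtml(name) + '</span></a> \n';
    }
    return html;
}

// Hypothetical rows, for illustration only
var sample = buildCloudHtml([['C Sharp', 2], ['Q&A', 1]]);
console.log(sample);
```

The resulting string can be appended to the bloggerCloud DIV in the same way as cloudHtml in the script above.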

Since building this script I've now gone one step further and created a jQuery plug-in based on this code.  For more details and the source code see my jQuery Blogger.com Tag Cloud Plugin page.

jQuery Blogger Template Category List Widget

Blogger is a hosted blogging service which allows you to publish your blog to your own URL and create your own custom HTML templates to match your website design. 
I have been using Blogger for this blog for several years, and have been trying to find a good way of displaying a list of categories on each blog page.

As yet I haven't found an official way of creating a category list using the Blogger mark-up code, so I decided to write my own widget to do the job for me.

When I say category list I mean a list of all the blog tags/labels in your blog, each linking to a page with posts categorised using that particular tag, just like the examples below.

Blog Categories

Because Blogger is a hosted blogging service, you can't use a server-side language to create the category list for your HTML template; instead you must rely on client-side JavaScript.

Thankfully the Blogger service publishes XML files to your website along with the post, archive and category HTML pages.  These are in Atom and RSS formats and are there primarily for syndication, but XML files are also fairly straightforward to parse using most programming languages, and they contain all the category data we need to build a categories list.

I chose to use the jQuery library because it makes the process even easier.

The Blogger XML Format

From the Blogger RSS XML snippet below (which also uses some elements from the Atom namespace) you can see that each blog item can have multiple category nodes.  This means that the code must loop through each blog post, then through each category of each post, to create our category list. It also means that we will collect duplicate categories, because more than one post can have the same category.

<item>
  <guid isPermaLink='false'></guid>
  <pubDate>Thu, 14 May 2009 18:30:00 +0000</pubDate>
  <atom:updated>2009-05-15T11:35:03.262+01:00</atom:updated>
  <category domain='http://www.blogger.com/atom/ns#'>C Sharp</category>
  <category domain='http://www.blogger.com/atom/ns#'>ASP.NET</category>
  <category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category>
  <title>Language Interoperability in the .NET Framework</title>
  <atom:summary type='text'>.NET is a powerful framework which was built to allow cross-language support...</atom:summary>
  <link>http://www.nikmakris.com/blog/2009/05/language-interoperability-in-net.html</link>
  <author>Nik</author>
  <thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total>
</item>

The jQuery Code

The jQuery code is fairly easy to follow, but here is a quick explanation.  After the DOM is available for use, I create two JavaScript arrays, one to hold the categories and one to hold our de-duped category list.  Then I load in the Blogger RSS feed and iterate through each blog post adding each category to the categories array.
Once it reaches the end of the RSS feed, I need to sort the array into alphabetical order so that I can de-duplicate the categories list I just populated, which is what the next jQuery .each() function does.
All I have left to do is loop through the de-duped categories list, create the HTML link for each category and then append the HTML unordered list to the page.

$(document).ready(function() {
    var categories = new Array();
    var dedupedCategories = new Array();
    $.get('/blog/rss.xml', function(data) {
        //Find each post
        $(data).find('item').each(function() {
            //Get all the associated categories/tags for the post
            $($(this)).find('category').each(function() {
                categories[categories.length] = $(this).text();
            });
        });
        categories.sort();
        //Dedup category/tag list
        var oldCategory = '';
        $(categories).each(function() {
            if (this.toString() != oldCategory) {
                //Add new category/tag
                dedupedCategories[dedupedCategories.length] = this.toString();
            }
            oldCategory = this.toString();
        });
        // Loop through all unique categories/tags and write a link for each
        var html = "<h3>Categories</h3>";
        html += "<ul class=\"niceList\">";
        $(dedupedCategories).each(function() {
            html += "<li><a href=\"/blog/labels/";
            html += this.toString() + ".html\">";
            html += this.toString() + "</a></li>\n";

        });
        html += "</ul>";
        $('#bloggerCategories').append(html);
    });
    return false;
});
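The sort-then-compare approach above relies on duplicates ending up next to each other. As an alternative, here is a sketch in plain JavaScript that de-duplicates with a lookup object instead, so it does not depend on the sort order. The function name uniqueCategories is hypothetical, and treating tags that differ only in case as the same tag is an assumption on my part, not something the script above does.

```javascript
// Return each category once, sorted alphabetically ignoring case.
// Assumption: 'ASP.NET' and 'asp.net' should count as the same category;
// the first spelling encountered is the one that is kept.
function uniqueCategories(categories) {
    var seen = Object.create(null); // plain lookup table with no inherited keys
    var unique = [];
    for (var i = 0; i < categories.length; i++) {
        var key = categories[i].toLowerCase();
        if (!(key in seen)) {
            seen[key] = true;
            unique.push(categories[i]);
        }
    }
    unique.sort(function (a, b) {
        var x = a.toLowerCase();
        var y = b.toLowerCase();
        if (x < y) return -1;
        if (x > y) return 1;
        return 0;
    });
    return unique;
}

// Hypothetical categories, as they might be collected from several posts
var list = uniqueCategories(['Visual Studio', 'ASP.NET', 'C Sharp', 'asp.net', 'ASP.NET']);
console.log(list.join(', '));
```

The returned array can be looped over to build the list items in the same way as dedupedCategories in the script above.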

 

Update your Blogger Template HTML to Show Categories

The only HTML you need to add to your Blogger template is a reference to jQuery and to this script in the head of your page, plus an empty HTML DIV tag in the place where you want your categories list to appear.

<script type="text/javascript" src="/scripts/jquery.js"></script>
<script type="text/javascript" src="/scripts/blogcategories.js"></script>

<div id="bloggerCategories"></div>

You can see the script in action on my blog, or see this code rewritten to create a tag cloud.

Blogger.com has changed their feed syndication

It seems that Blogger has changed the type of syndication feed they use during the last month (January to February 2008). I discovered this when my Atom feed XSLT transformation broke after I published my last post.
I originally wrote the XSLT to transform the previous feed type for the blog updates on my homepage, and it assumed the following hierarchy with the Atom 0.3 namespace:

<feed>...<entry>...</entry></feed>

Whereas the latest feed has changed to use both Atom and openSearch namespaces and the following structure:

<rss>...<channel>...<item>...</item></channel></rss>

The root node suggests that the feed follows the RSS 2.0 standard while also declaring the Atom namespace, which is peculiar. Notice the openSearch namespace too:

<rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'>
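If you want code to notice this kind of change rather than silently break, one option is to look at the root element before processing the feed. The sketch below does this on the raw XML text in plain JavaScript. The function names and the returned object shape are hypothetical, and the regular expressions are only a rough check that assumes a well-formed feed; it is not a substitute for a real XML parser.

```javascript
// Find the name of the first element in an XML string (rough check only)
function rootElementName(xml) {
    var stripped = xml
        .replace(/<\?[\s\S]*?\?>/g, '')      // XML declaration and processing instructions
        .replace(/<!--[\s\S]*?-->/g, '')     // comments
        .replace(/<!DOCTYPE[^>]*>/gi, '');   // simple doctype declarations
    var match = stripped.match(/<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)/);
    return match ? match[1] : null;
}

// Map the root element to the two structures described above
function feedLayout(xml) {
    var root = rootElementName(xml);
    if (root === 'rss') return { format: 'rss', itemPath: 'rss/channel/item' };
    if (root === 'feed') return { format: 'atom', itemPath: 'feed/entry' };
    return { format: 'unknown', itemPath: null };
}

// Minimal made-up documents, for illustration only
var newStyle = feedLayout("<?xml version='1.0'?><rss version='2.0'><channel><item/></channel></rss>");
var oldStyle = feedLayout("<feed><entry/></feed>");
console.log(newStyle.format + ' -> ' + newStyle.itemPath);
console.log(oldStyle.format + ' -> ' + oldStyle.itemPath);
```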

Here's my updated XSLT to convert the new Blogger.com format.



<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">

<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="channel">
<div id="FeedSnippets">
<xsl:apply-templates select="item" />
</div>
</xsl:template>


<xsl:template match="item" name="item">
<xsl:if test="position() &lt; 6">
<h4>
<xsl:value-of select="title"/>
</h4>
<p>
<xsl:choose>
<xsl:when test="string-length(substring-before(atom:summary,'. ')) > 0">
<xsl:value-of select="substring-before(atom:summary,'. ')" />...<br />
</xsl:when>
<xsl:when test="string-length(substring-before(atom:summary,'.')) > 0">
<xsl:value-of select="substring-before(atom:summary,'.')" />...<br />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring(atom:summary,1,200)" />...<br />
</xsl:otherwise>
</xsl:choose>
<strong>Read full post: </strong>
<a href="{link}">
<xsl:value-of select="title"/>
</a>
</p>
<hr />
</xsl:if>
</xsl:template>
</xsl:stylesheet>
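The xsl:choose block is the least obvious part of the stylesheet, so here is the same decision logic sketched in plain JavaScript. The function names are hypothetical and this is only an illustration of the three branches, not code used on the site.

```javascript
// Mimics XPath substring-before(): returns '' when the delimiter is missing
// or when the text starts with the delimiter
function substringBefore(text, delimiter) {
    var index = text.indexOf(delimiter);
    return index === -1 ? '' : text.substring(0, index);
}

// Pick a short teaser from a post summary:
// 1. everything before the first '. ' (end of the first sentence)
// 2. otherwise everything before the first '.'
// 3. otherwise up to the first 200 characters
function feedSnippet(summary) {
    var sentence = substringBefore(summary, '. ');
    if (sentence.length > 0) return sentence + '...';
    sentence = substringBefore(summary, '.');
    if (sentence.length > 0) return sentence + '...';
    return summary.substring(0, 200) + '...';
}

console.log(feedSnippet('First sentence. Second sentence.'));
console.log(feedSnippet('.NET rocks'));
```

The second call shows why the final fallback matters: a summary that begins with a full stop, such as one starting with ".NET", gives an empty result for the first two tests and so falls through to the 200 character cut.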