More Cloudy Days

If you have the eye of a good copy editor, you might have noticed some volatility in the ole Tag Cloud on the right. I made some changes at the request of people on the SubText developers list and reworked some stuff, most of which is completely invisible. The biggest visible change comes from a realization: assuming an even dispersion around the mean might work for natural statistics, but blog post tags tend to follow more of a declining curve.

In practice, that means most tag cloud algorithms use a formula that allocates about half of their categories to styles that never show up. Most tag clouds I looked at had slots for up to seven visual styles but only ever displayed three or four.

Well, that's just a waste of a good idea, I say! So I mucked with the weighting formula to get more visual variation in my clouds. We'll have to see whether my theory holds up under scrutiny and on blogs with more traffic.
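To make the "never visible" slots concrete, here is a Java sketch comparing two ways of assigning one of seven styles to each tag. Both formulas are assumptions for illustration. The first is a typical mean-and-standard-deviation scheme. The second is a log-scale alternative. Neither is claimed to be SubText's code or the formula this post settled on, which isn't spelled out here. The tag counts in `main` are hypothetical, shaped like a declining curve.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative tag-cloud weightings; neither is claimed to be SubText's actual code.
class TagCloudWeights {
    static final int STYLES = 7;                // hypothetical: styles 1 (smallest) to 7 (largest)
    static final int MIDDLE = (STYLES + 1) / 2; // style 4 means "about average"

    /** "Even dispersion around the mean": style = 4 + round(distance from the mean in std devs). */
    static int[] meanStyles(int[] counts) {
        double mean = Arrays.stream(counts).average().orElse(0);
        double variance = Arrays.stream(counts)
                .mapToDouble(c -> (c - mean) * (c - mean))
                .average().orElse(0);
        double stdDev = Math.sqrt(variance);
        int[] result = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            long offset = stdDev == 0 ? 0 : Math.round((counts[i] - mean) / stdDev);
            result[i] = (int) Math.max(1, Math.min(STYLES, MIDDLE + offset)); // clamp to 1..7
        }
        return result;
    }

    /** Log scale: spread log(count) between log(min) and log(max) across all styles. */
    static int[] logStyles(int[] counts) {
        double low = Math.log(Arrays.stream(counts).min().orElse(1));  // assumes every count >= 1
        double high = Math.log(Arrays.stream(counts).max().orElse(1));
        int[] result = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            if (high == low) {
                result[i] = MIDDLE; // every tag has the same count
                continue;
            }
            double fraction = (Math.log(counts[i]) - low) / (high - low); // 0.0 rarest .. 1.0 most used
            result[i] = 1 + (int) Math.round(fraction * (STYLES - 1));
        }
        return result;
    }

    /** The distinct styles a cloud would actually display. */
    static TreeSet<Integer> used(int[] styles) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int s : styles) set.add(s);
        return set;
    }

    public static void main(String[] args) {
        // Hypothetical post counts per tag, shaped like a declining curve.
        int[] counts = {20, 10, 7, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1};
        System.out.println("mean/std dev: " + used(meanStyles(counts))); // [3, 4, 5, 7]
        System.out.println("log scale:    " + used(logStyles(counts)));  // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

With this data, styles 1 and 2 can never appear under the first scheme, because a tag would need a negative post count to sit that far below the mean. The log scale stretches the crowded low end of the curve, so all seven styles get used.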

15. April 2007 15:48 by Jacob

Cloudy Day

I just completed some SubText hacking, and the results should be visible now. In addition to upgrading to version 1.9.5 (which should be officially released shortly), I implemented a new feature: Tag Clouds. I'd had enough feature envy of all the cool kids who had them, so I went and rolled my own. If you're at my actual site (as opposed to a feed reader), the Tag Cloud is off on the right.

This was a non-trivial feature to add. In an email conversation on the SubText developers' list a couple of months ago, Phil Haack (semi-benevolent project dictator) indicated that he planned on making tags first-class objects, meaning that they'd have their own table and a post cross-reference. They'd also be allowed to extend into other public interfaces (as they'd pretty much have to). The consensus among most developers in that discussion was that tags should be done well and completely, with support for all the different tag providers out there.
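As a rough picture of what "their own table and post cross-reference" means, here is a hypothetical in-memory Java stand-in. One map plays the tags table (name to id), and another plays the cross-reference (post id to tag ids). The class and method names are made up for illustration. SubText's actual schema and data layer aren't described in this post.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a tags table plus a post/tag cross-reference table.
class TagStore {
    private final Map<String, Integer> tagIds = new HashMap<>();        // "tags table": name -> id
    private final Map<Integer, Set<Integer>> postTags = new HashMap<>(); // "cross-reference": post id -> tag ids
    private int nextId = 1;

    /** Looks up a tag's id, adding a new row to the tags table if the name is new. */
    int tagId(String name) {
        return tagIds.computeIfAbsent(name, n -> nextId++);
    }

    /** On post insert or update: replace the post's cross-reference rows with the given tag names. */
    void setPostTags(int postId, Collection<String> names) {
        Set<Integer> ids = new HashSet<>();
        for (String name : names) ids.add(tagId(name));
        postTags.put(postId, ids);
    }

    /** Number of posts that reference each tag (tags with no posts are left out). */
    Map<String, Integer> counts() {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> tag : tagIds.entrySet()) {
            int n = 0;
            for (Set<Integer> ids : postTags.values()) {
                if (ids.contains(tag.getValue())) n++;
            }
            if (n > 0) result.put(tag.getKey(), n);
        }
        return result;
    }

    public static void main(String[] args) {
        TagStore store = new TagStore();
        store.setPostTags(1, List.of("competence", "subtext"));
        store.setPostTags(2, List.of("subtext"));
        store.setPostTags(1, List.of("subtext")); // post 1 edited: "competence" removed
        System.out.println(store.counts());       // {subtext=2}
    }
}
```

Re-running `setPostTags` when a post is updated replaces that post's cross-reference entries, so a tag dropped from the post stops counting toward the cloud.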

Like RSS and other Internet wonders, tags are fundamentally simple in concept. A tag is simply any hyperlink with a "rel" attribute of "tag", something like this:

<a href="http://technorati.com/tags/competence" rel="tag">competence</a>

The biggest gotcha in this setup is that, officially, the "name" of the tag is determined by the link's URL, not by the link's text. So if your link looked like this:

<a href="http://technorati.com/tags/competence" rel="tag">incompetence</a>

The tag is officially "competence" even though it displays "incompetence".
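Here is a minimal Java sketch of that rule. It assumes the rel-tag convention that the tag is the last segment of the link's path, and it never looks at the link text. `fromHref` is a hypothetical helper name, not part of SubText.

```java
import java.net.URI;

// Hypothetical helper: derive a tag's name from its href, ignoring the link text.
class TagName {
    /** Returns the last non-empty path segment of the URL, with %xx escapes decoded. */
    static String fromHref(String href) {
        String path = URI.create(href).getPath(); // getPath() drops any query or fragment and decodes %xx
        if (path == null) return "";               // opaque URIs such as mailto: have no path
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Same href as the "incompetence" link above: the tag is still "competence".
        System.out.println(fromHref("http://technorati.com/tags/competence"));
    }
}
```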

That means the core of the feature is scanning posts on insert and update to catch any tags they contain, then adding or updating the links needed for the post's tags. All I can say is thank heavens for RegEx. It's a pain in the pinky toe to understand, learn, or debug, but you just can't beat it for parsing text. For the curious, and to open myself to ridicule, here are the expressions I used (both case-insensitive):

To find a link: <a(?<element>.*?href=[\"'](?<url>.*?)[\"'].*?)>.*?</a>

To figure if the link is a tag: rel=[\"']tag[\"']

Oh, and be careful if you use or copy those expressions: this was done in C#, so the double-quote characters are escaped with the \ character.
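Here is a sketch of how the two expressions fit together, ported to Java. Java's `Pattern` class uses the same `(?<name>...)` named-group syntax as .NET, and the same `\"` escaping inside string literals. The first expression finds each link and captures its attributes as `element` and its address as `url`. The second then checks `element` for `rel="tag"`. The `TagScanner` class and `tagUrls` method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The post's two expressions as Java string literals (quote escaping matches C#).
class TagScanner {
    static final Pattern LINK = Pattern.compile(
            "<a(?<element>.*?href=[\"'](?<url>.*?)[\"'].*?)>.*?</a>", Pattern.CASE_INSENSITIVE);
    static final Pattern REL_TAG = Pattern.compile("rel=[\"']tag[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the href of every link in the post body whose attributes include rel="tag". */
    static List<String> tagUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher link = LINK.matcher(html);
        while (link.find()) {
            // "element" holds everything between "<a" and ">", so rel may come before or after href.
            if (REL_TAG.matcher(link.group("element")).find()) {
                urls.add(link.group("url"));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String post = "<p>See <a href=\"http://technorati.com/tags/competence\" rel=\"tag\">incompetence</a>"
                + " and <a href='http://example.com/'>a plain link</a>.</p>";
        System.out.println(tagUrls(post)); // [http://technorati.com/tags/competence]
    }
}
```

One thing to watch: `.` doesn't match line breaks by default in either .NET or Java, so a link whose markup spans lines won't be caught unless a single-line (DOTALL) option is added.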

Anyway, it's done now, and I'd appreciate hearing about any trouble you run into. I've already caught one choke: tags containing a "." broke. Fortunately, that one just took a RegEx change in web.config so the tag display handler would pick them up. Once this upgrade is officially added to the SubText project, I'll write another post detailing how to add a tag cloud to any skin.
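The "." problem is easy to reproduce with any URL pattern built on `\w`, which matches letters, digits, and underscores but not dots. The Java sketch below uses two hypothetical handler patterns. SubText's real web.config entry isn't shown in this post, so treat both patterns as illustrations only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tag-page URL patterns (not SubText's actual web.config values).
class TagRoute {
    // \w covers letters, digits, and underscore, so "asp.net" cannot match.
    static final Pattern WORD_ONLY = Pattern.compile("/tags/(?<tag>\\w+)/?$", Pattern.CASE_INSENSITIVE);
    // Anything except '/' lets dotted tag names through.
    static final Pattern NO_SLASH = Pattern.compile("/tags/(?<tag>[^/]+)/?$", Pattern.CASE_INSENSITIVE);

    /** Returns the tag captured from a request path, or null if the pattern doesn't match. */
    static String tagFrom(Pattern route, String path) {
        Matcher m = route.matcher(path);
        return m.find() ? m.group("tag") : null;
    }

    public static void main(String[] args) {
        System.out.println(tagFrom(WORD_ONLY, "/tags/asp.net")); // null
        System.out.println(tagFrom(NO_SLASH, "/tags/asp.net"));  // asp.net
    }
}
```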

13. April 2007 18:15 by Jacob
