Jeff Atwood has a recent post on why he finally gave up and disabled Trackbacks on his blog. My blog is the tiniest fraction of his and I had to disable trackbacks just for sheer spam volume back in October (inspiring an anti-spam rant of my own).
Jeff lays the blame for Trackbacks' demise on Six Apart--the outfit that created the standard in 2002. Ah, those heady glory days when you still had to explain to people what a blog was. Trackbacks were a great idea. They still are a great idea. But Jeff is right, the simplicity of the standard has left it wide open to abuse and that abuse has killed them dead.
So my question is (to Jeff or anyone else), how would you alter the design to make the standard more robust?
My initial take on it was to alter the standard to incorporate a public key exchange and a signature. But then I realized that, hey, spammers can create asymmetric keys as well as anyone else can. In other words, the problem isn't being able to authenticate the link--it's being able to evaluate the linked post.
Jeff's current stop-gap of finding links to his posts through Technorati seems like a reasonable short-term solution. Introducing a third party is problematic, though, because it leads to inevitable issues in finding a trustworthy third party that will carry the authentication burden for you (as well as traffic and processing costs as people ping them for link verification). Indeed, Akismet (a popular stab at trackback filtering) has those same third-party screening issues and isn't substantively different from Jeff's use of Technorati.
I suspect that the problem might not even be in bad design by Six Apart, however. The thing that makes Trackbacks so popular and led them to be so widely adopted is that it allows the creation of inter-post linkages from unaffiliated sources with very little effort. I'm afraid that any solution to the trackback problem is going to necessarily involve increasing the effort of unaffiliated linking to a point where it becomes much less attractive.
A Stab in the Dark
That said, here's two thoughts about potentially rewarding avenues for solving the problem picked up from spam solutions in other realms.
First from email spam, and recognizing that I'm pretty thoroughly ignorant of the underlying mechanisms involved in Bayesian content analysis, I wonder if there might be some useful application for content analysis here. Spammers are increasingly sophisticated in overcoming content analysis, though. Trackbacks may be easier to analyze, though, because they have an easily available comparison text (the originating post). It may be easier to compare your post with that of the linker and come up with a tougher analysis than you can in, say, a lone email. I don't know about that.
Second, the key to the success of spamming is that they have such a very low cost per "signal" (email, comment, trackback, what have you). Their only incremental cost is bandwidth to find blog posts and to send trackback signals. Raising those costs can have a significant effect on spam. This is essentially the key to Captcha's success in curbing comment spam. If a trackback request prompted a user-interactive Captcha-like query, that alone may well be enough to stem the vast majority of trackback spam. Perhaps a design amendment that included a short interaction on a trackback ping would be successful in cutting spam back to manageable levels.
In looking at those two ideas, it occurs to me that the main problem with a Bayesian solution is that it places the burden (both in implementing the Bayesian algorithms and in processing the incoming links) squarely on the target of the spam. This can lead to an unwanted side-effect by leaving your blog much more open to another Internet dirty trick--denial of service attacks. Frankly, you don't even have to deny service to affect a lot of private bloggers--attacks that increase their bandwidth usage would be as unwelcome to many as a full-on denial might be. After all, it doesn't cost you extra hosting fees when your blog goes down.
So maybe that means I only really have one thought/solution/suggestion. Bayesian analysis would be cool for the AI geeks, but not terribly practical in the constrained environment confronting most bloggers. I wonder what it'd take to create a Captcha mechanism in trackback notification?