Identifiers
Before you can share metadata about something, you need to be able to reliably identify it. As regular readers may know, I am interested in the challenges of identifying common things like web pages and XML namespaces (which ought to be easy to identify unambiguously).
Even more interesting is the challenge of identifying something that its creator did not necessarily intend to be easily identified. Spam and virus e-mails are a great example. A plain MD5 hash of an e-mail message is unlikely to work, since the sender will change subtle things between messages, and even a one-character change results in a completely different hash.
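To make the hashing problem concrete, here is a minimal Python sketch. The two sample messages and the helper name `md5_id` are made up for illustration; the only thing taken from the text above is the idea of using an MD5 hash of the message as its identifier.

```python
import hashlib


def md5_id(message: str) -> str:
    """Return the MD5 hex digest of a message, used here as a naive identifier."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()


# Two hypothetical spam messages that differ by a single character.
original = "Buy cheap watches now! Offer code 1041."
variant = "Buy cheap watches now! Offer code 1042."

id_original = md5_id(original)
id_variant = md5_id(variant)

print(id_original)
print(id_variant)
print("same identifier:", id_original == id_variant)
```

A human would call these the same message, but the two digests have nothing in common, so a naive hash cannot serve as a shared identifier.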
In addressing this problem, the Razor/SpamNet project has done some very cool work. Think of the project as something like an RBL (real-time blackhole list), but one that filters individual messages rather than just hosts, and that is open to end users rather than just ISPs. They have selected an algorithm for producing identifiers from an e-mail’s text in a way that allows accurate identification across variations of the same message. The fact that reliable identifiers exist makes it possible for thousands of users to collaborate and share the work of filtering spam.
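The general idea of an identifier that survives small variations can be sketched in a few lines of Python. To be clear, this is not Razor's actual algorithm, which I have not reproduced here; it is a toy stand-in that assumes the variations are limited to case, punctuation, digits, and whitespace. The function name `fuzzy_id` and the sample messages are hypothetical.

```python
import hashlib
import re


def fuzzy_id(message: str) -> str:
    """Toy variation-tolerant identifier (an assumption, not Razor's method).

    Normalizes away the parts of the text a sender is likely to vary,
    then hashes what remains.
    """
    text = message.lower()                # ignore case differences
    text = re.sub(r"\d+", "", text)       # drop digits such as tracking codes
    text = re.sub(r"[^a-z\s]", "", text)  # drop punctuation and symbols
    text = " ".join(text.split())         # collapse runs of whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical variants of one spam message, plus an unrelated message.
variant_a = "Buy CHEAP watches now!!! Offer code 1041."
variant_b = "buy cheap   watches now! Offer code 99873"
unrelated = "Meeting moved to Thursday at noon."

print(fuzzy_id(variant_a) == fuzzy_id(variant_b))
print(fuzzy_id(variant_a) == fuzzy_id(unrelated))
```

Once every participant computes the same identifier for the same underlying message, a shared list of reported identifiers is enough for one user's report to protect everyone else. A real system has to cope with far more aggressive variation than this sketch does.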
This is a real-world example of what the semantic web is about.
In other news, I found another picture of Carter lobbying for the peace prize. This one is pretty cool; it comes from 1994, and shows Carter handing over $30 million to the North Korean dictator to help him buy nuclear bombs.