Tweaks to my RSS feed
I've changed my RSS feed to use version 1.0, and have the titles in the right place. There's no standard place for dates in RSS, so I used the Dublin Core module to specify the dates, e.g.:
<dc:date>2003-04-03T20:00Z</dc:date>
The dates have to be in a format described in a W3C note, which is just a subset of the ISO 8601 format. For now all my dates are in GMT/UTC (which is what the Z at the end means), but really some of them should carry a timezone offset (+01:00) to specify BST.
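As a rough illustration of the date format, here is a minimal Python sketch that writes a timestamp at minute precision with either Z or a numeric offset. The function name w3c_datetime and the fixed-offset BST constant are assumptions made for this example; they are not part of the templating system used for the feed, and a real implementation would need proper daylight-saving rules rather than a fixed +01:00.

```python
from datetime import datetime, timedelta, timezone


def w3c_datetime(dt):
    """Format an aware datetime as YYYY-MM-DDThh:mmTZD (minute precision)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("need a timezone-aware datetime")
    if offset == timedelta(0):
        tzd = "Z"  # UTC is written as a bare Z
    else:
        total_minutes = int(offset.total_seconds()) // 60
        sign = "+" if total_minutes >= 0 else "-"
        hours, minutes = divmod(abs(total_minutes), 60)
        tzd = f"{sign}{hours:02d}:{minutes:02d}"
    return dt.strftime("%Y-%m-%dT%H:%M") + tzd


# Assumption for the example: BST modelled as a fixed UTC+1 offset.
BST = timezone(timedelta(hours=1))

utc_example = w3c_datetime(datetime(2003, 4, 3, 20, 0, tzinfo=timezone.utc))
bst_example = w3c_datetime(datetime(2003, 4, 3, 21, 0, tzinfo=BST))
print(utc_example)
print(bst_example)
```

Both example values describe the same instant; only the way the timezone is written differs.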
Here's the template I'm using to generate my feed, which uses my own little templating language:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">

 <channel rdf:about="http://ungwe.org/blog/qefsblog.rss">
  <title>Qef's Blog</title>
  <description>A diary of random things Geoff Richards has been playing with recently</description>
  <link>http://ungwe.org/blog/</link>
  <dc:creator>Geoff Richards (mailto:qef@ungwe.org)</dc:creator>
  <dc:language>en</dc:language>
  <items>
   <rdf:Seq>$[foreach $entries:$[if $recent:
    <rdf:li rdf:resource="$url" />]]
   </rdf:Seq>
  </items>
 </channel>

 $[foreach $entries:$[if $recent:
 <!-- $datetime -->
 <item rdf:about="$url">
  <title>$title</title>
  <link>$url</link>
  <dc:date>$w3c_datetime</dc:date>
 </item>
 ]]
</rdf:RDF>
A brief and probably inaccurate history of RSS
It's rather annoying and confusing that there are so many different and incompatible versions of RSS. As far as I can work it out, the history goes something like this:
Userland came up with a simple XML syndication format called Scripting News.
RSS 0.9 was defined by Netscape, and was a very simple vocabulary which used bits of RDF. It may have been called Really Simple Syndication.
RSS 0.91 (Netscape spec, UserLand spec) up to at least RSS 0.94 added more elements for things like dates (<pubDate>), borrowing features from Scripting News, and gave up the idea of using RDF for it. At some point it may also have been redubbed Rich Site Summary. These versions, I think, allow HTML content to be included, so that titles and things can be marked up. I've seen this done both with embedded XML markup (which I think is probably wrong) and with fully escaped XML text included as PCDATA in the RSS.
RSS 1.0 (spec) was based on version 0.9, but put the focus back on RDF, and also allowed modules (identified by namespace URL) to be plugged in, allowing things like Dublin Core metadata to be used. It doesn't incorporate the things which were added in the later 0.9x series, making it incompatible with those. It also changes the root element from <rss> to <rdf:RDF>, and the name was again changed, this time to RDF Site Summary.
RSS 2.0 (spec) is derived from the 0.9x series, and seems to have nothing to do with version 1.0. It mainly tidies up the previous versions and clarifies things.
Well, it's a bit convoluted, but that's XML for you :-)
Ideas for an aggregator
Having discussed making a web-based RSS aggregator with Aaron, we think it would be useful to have a full-text search which searches an archive of collected news items from all the feeds being monitored, and also searches the content they refer to. That would make it easier to find articles you remember reading some time ago, when you can't remember which site they were on or what was in the title.
Collecting old items does present a problem: how do you merge a new copy of an RSS file with the items already read from previous versions? I think you just have to take everything in the new version of the file as it is, and delete any previously-collected copies of the information unless they are older than the oldest item in the new file (because that probably means they've dropped off the end of the feed, although that makes assumptions about what the items in the feed actually mean).
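The merge rule described above could be sketched in Python roughly like this. It is only a sketch under some assumptions: every item carries a date (for example from dc:date), items are plain dictionaries with hypothetical keys url, title and date, and an archived item whose URL reappears in the new file is always replaced by the new copy.

```python
from datetime import datetime, timezone


def merge_feed(archive, new_items):
    """Merge a freshly fetched feed into the archive of collected items.

    Everything in the new file is taken as it is. An archived item is
    kept only if it is older than the oldest item in the new file,
    which probably means it has dropped off the end of the feed.
    """
    if not new_items:
        return list(archive)  # nothing fetched, so nothing to compare against
    oldest_new = min(item["date"] for item in new_items)
    new_urls = {item["url"] for item in new_items}
    kept = [
        item for item in archive
        if item["date"] < oldest_new and item["url"] not in new_urls
    ]
    return kept + list(new_items)


def _day(n):
    # Helper for the example data only: day n of April 2003, in UTC.
    return datetime(2003, 4, n, tzinfo=timezone.utc)


archive = [
    {"url": "http://example.org/a", "title": "A", "date": _day(1)},
    {"url": "http://example.org/b", "title": "B", "date": _day(2)},
    {"url": "http://example.org/c", "title": "C", "date": _day(3)},
]
new_items = [
    {"url": "http://example.org/b", "title": "B (edited)", "date": _day(2)},
    {"url": "http://example.org/d", "title": "D", "date": _day(4)},
]
merged = merge_feed(archive, new_items)
print([item["title"] for item in merged])
```

In this example item A survives because it predates everything in the new file, B is replaced by its edited copy, and C is dropped because it falls inside the date range the new file covers but no longer appears in it.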
Prune Juice
I've just watched the Star Trek: TNG episode Yesterday's Enterprise. It's one of at least three episodes where Tasha Yar comes back from beyond the grave, and also the one where Worf gets introduced to prune juice (“a warrior's drink”). Good stuff.