More thoughts on RSS

Tweaks to my RSS feed

I've changed my RSS feed to use version 1.0, and put the titles in the right place. RSS 1.0 itself has no standard place for dates, so I used the Dublin Core module to specify them, e.g.:

<dc:date>2003-04-03T20:00Z</dc:date>

The dates have to be in a format described in a W3C note, which is just a subset of the ISO 8601 format. For now all my dates are in GMT/UTC (which is what the Z at the end means), but really some of them should carry a timezone offset (+01:00 in place of the Z) to specify BST.
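To make the timezone point concrete, here's a rough Python sketch of how both forms could be produced. The function name `w3c_datetime` and the fixed +01:00 offset standing in for BST are assumptions made for illustration; this isn't the code that generates the feed.

```python
from datetime import datetime, timedelta, timezone

def w3c_datetime(dt):
    """Format a timezone-aware datetime in the W3C note's profile,
    to minute precision: YYYY-MM-DDThh:mm followed by Z or +hh:mm."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    stamp = dt.strftime("%Y-%m-%dT%H:%M")
    if offset == timedelta(0):
        return stamp + "Z"  # Z means UTC
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}:{minutes:02d}"

# Hypothetical fixed offset for British Summer Time (UTC+1).
BST = timezone(timedelta(hours=1), "BST")

# The same instant, written first in UTC and then in BST.
utc_example = w3c_datetime(datetime(2003, 4, 3, 20, 0, tzinfo=timezone.utc))
bst_example = w3c_datetime(datetime(2003, 4, 3, 21, 0, tzinfo=BST))
print(utc_example)  # 2003-04-03T20:00Z
print(bst_example)  # 2003-04-03T21:00+01:00
```

A fixed offset keeps the sketch self-contained. A real feed would want a proper timezone database so that the switch between GMT and BST happens on the right dates.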

Here's the template I'm using to generate my feed, which uses my own little templating language:

<?xml version="1.0" encoding="utf-8"?>

<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">

 <channel rdf:about="http://ungwe.org/blog/qefsblog.rss">
  <title>Qef's Blog</title>
  <description>A diary of random things Geoff Richards has been playing with
   recently</description>
  <link>http://ungwe.org/blog/</link>
  <dc:creator>Geoff Richards (mailto:qef@ungwe.org)</dc:creator>
  <dc:language>en</dc:language>

  <items>
   <rdf:Seq>$[foreach $entries:$[if $recent:
    <rdf:li rdf:resource="$url" />]]
   </rdf:Seq>
  </items>

 </channel>

$[foreach $entries:$[if $recent:
 <!-- $datetime -->
 <item rdf:about="$url">
  <title>$title</title>
  <link>$url</link>
  <dc:date>$w3c_datetime</dc:date>
 </item>
]]

</rdf:RDF>

A brief and probably inaccurate history of RSS

It's rather annoying and confusing that there are so many different and incompatible versions of RSS. As far as I can work it out, the history goes something like this:

UserLand came up with a simple XML syndication format called Scripting News.

RSS 0.9 was defined by Netscape, and was a very simple vocabulary which used bits of RDF. It was called RDF Site Summary.

RSS 0.91 (Netscape spec, UserLand spec) up to at least RSS 0.94 added more elements for things like dates (<pubDate>), borrowing features from Scripting News, and gave up the idea of using RDF for it. At some point it may also have been redubbed Rich Site Summary. These versions, I think, allow HTML content to be included, so that titles and things can be marked up. I've seen this done both with embedded XML markup (which I think is probably wrong) and with fully escaped XML text included as PCDATA in the RSS.

RSS 1.0 (spec) was based on version 0.9, but put the focus back on RDF, and also allowed modules (identified by namespace URL) to be plugged in, allowing things like Dublin Core metadata to be used. It doesn't incorporate the things which were added in the later 0.9x series, making it incompatible with those. It also changes the root element from <rss> to <rdf:RDF>, and the name was again changed, this time to RDF Site Summary.

RSS 2.0 (spec) is derived from the 0.9x series, and seems to have nothing to do with version 1.0. It mainly tidies up the previous versions and clarifies things.

Well, it's a bit convoluted, but that's XML for you :-)

Ideas for an aggregator

Having discussed making a web-based RSS aggregator with Aaron, we think it would be useful to have a full-text search which covers an archive of collected news items from all the feeds being monitored, and also the content they refer to. That would make it easier to find articles you remember reading some time ago, when you can't remember which site they were on or what was in the title.

Collecting old items does present a problem: how do you merge a new copy of an RSS file with the items already read from previous versions? I think you just have to take everything in the new version of the file as it is, and delete any previously collected copies of the information unless they are older than the oldest item in the new file. Older items have probably just dropped off the end of the feed, although that makes assumptions about what the items in the feed actually mean.
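The merge rule above can be sketched in a few lines of Python. This is only one reading of the idea, not a tested aggregator: the item shape (dicts with "url", "date" and "title" keys), the function name `merge_feed` and the example.org URLs are all made up for illustration, and it assumes every item carries a timezone-aware date.

```python
from datetime import datetime, timezone

def merge_feed(archived, fresh):
    """Merge a freshly fetched feed into the archive.

    Both arguments are lists of dicts with "url", "date" and "title"
    keys (a hypothetical item shape; "date" is an aware datetime).
    """
    if not fresh:
        # Assumption: an empty fetch tells us nothing, so keep the archive.
        return list(archived)
    oldest = min(item["date"] for item in fresh)
    # Keep only archived items older than the oldest item in the new
    # file; they have probably dropped off the end of the feed.
    kept = [item for item in archived if item["date"] < oldest]
    # Everything else is replaced by the new file's items, taken as is.
    return kept + list(fresh)

def day(n):
    """Helper for the example: midnight UTC on the nth of April 2003."""
    return datetime(2003, 4, n, tzinfo=timezone.utc)

archive = [
    {"url": "http://example.org/a", "date": day(1), "title": "A"},
    {"url": "http://example.org/b", "date": day(3), "title": "B (old title)"},
    {"url": "http://example.org/c", "date": day(4), "title": "C, since removed"},
]
new_file = [
    {"url": "http://example.org/b", "date": day(3), "title": "B (edited)"},
    {"url": "http://example.org/d", "date": day(5), "title": "D"},
]

merged = merge_feed(archive, new_file)
titles = [item["title"] for item in merged]
print(titles)  # ['A', 'B (edited)', 'D']
```

Note that item C vanishes from the archive because the new file no longer lists it, which is exactly the assumption about what the feed's items mean. An archive meant for searching might prefer to keep such items and flag them instead.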

Prune Juice

I've just watched the Star Trek: TNG episode Yesterday's Enterprise. It's one of at least three episodes where Tasha Yar comes back from beyond the grave, and also the one where Worf gets introduced to prune juice (“a warrior's drink”). Good stuff.

