May 30, 2014

Snownews: Using xmllint to deal with problematic Atom to RSS conversion

I had a couple of feeds that never worked in Snownews. I never investigated this much; I just assumed the XSL I use to convert them from Atom to RSS was failing somewhere. It turns out the Atom feeds in question are not actually well-formed XML: they contain minor parsing errors, enough to scupper the XSL conversion. I wondered whether there was a less strict parser I could use, or an “XMLfix” tool along the lines of HTML Tidy (I couldn’t recall the name of HTML Tidy at the time; if I had, I’d have read that it does have limited support for XML). I then realised I already had something I could use for correcting minor errors: xmllint.

In my urls file, I’d normally have an entry like this for an Atom feed:

exec:curl -o - http://noxmlparsingerrors.com/atom.xml||stuff|xsltproc /path/to/atom2rss.xsl -

For a feed with parsing errors, I can now have:

exec:curl -o - http://xmlparsingerrors.com/atom.xml||stuff|/path/to/lintedatom2rss -

Where lintedatom2rss is a simple script:

#!/bin/sh
xmllint --recover - | xsltproc /path/to/atom2rss.xsl -

A separate script is necessary because Snownews treats the pipe character (“|”) as a field separator rather than as an actual pipe, and I need to pipe from xmllint to xsltproc.
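To get a feel for what --recover does before trusting it with a real feed, a minimal sketch like the following can help. The broken fragment is a made-up example (a bare ampersand in a title), not taken from either of the feeds in question, and recovery may well drop or alter the offending content rather than repair it faithfully.

```shell
#!/bin/sh
# Hypothetical malformed fragment: the bare "&" in the title is not well-formed XML.
broken='<feed xmlns="http://www.w3.org/2005/Atom"><title>Cats & dogs</title></feed>'

# Strict parse: expected to fail on the bare ampersand.
if printf '%s\n' "$broken" | xmllint --noout - 2>/dev/null; then
    echo "strict parse: ok"
else
    echo "strict parse: failed"
fi

# Recovering parse: errors still go to stderr (silenced here), but a document
# is written to stdout anyway.
recovered=$(printf '%s\n' "$broken" | xmllint --recover - 2>/dev/null)
printf '%s\n' "$recovered"

# The recovered output should now pass a strict parse.
if printf '%s\n' "$recovered" | xmllint --noout - 2>/dev/null; then
    echo "recovered output: ok"
else
    echo "recovered output: still broken"
fi
```

Comparing the recovered output against the original is worthwhile, since anything xmllint could not make sense of may simply be missing.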

On a related note, I realised a lot of the Atom feeds did not contain links, which meant I could not quickly open a feed item in elinks. Looking into the atom2rss.xsl file I was using, I saw that it was looking for a link with a type of “text/html”:

<link><xsl:value-of select="atom:link[@type='text/html']/@href" /></link>

A lot of the Atom feeds, my blog included, did not specify the type on the links (mine does now!), so I decided to wrap this in a conditional: if no “text/html” link is found, it tries without the type, while avoiding rel=self links, which should be a link to the feed itself. I’ve captured these changes in this gist.
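The conditional described above might look roughly like this. It is a sketch of the idea rather than the contents of the gist: the file names link.xsl and entry.xml, the example.com URLs and the template matching a standalone entry are all assumptions made to keep the example small and runnable.

```shell
#!/bin/sh
# Sketch only: a cut-down stylesheet showing the fallback, not the real atom2rss.xsl.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Prefer a link typed text/html; otherwise take the first link that is not rel=self.
cat > "$tmp/link.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    exclude-result-prefixes="atom">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/atom:entry">
    <link>
      <xsl:choose>
        <xsl:when test="atom:link[@type='text/html']">
          <xsl:value-of select="atom:link[@type='text/html']/@href"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="atom:link[not(@rel='self')]/@href"/>
        </xsl:otherwise>
      </xsl:choose>
    </link>
  </xsl:template>
</xsl:stylesheet>
EOF

# Hypothetical entry whose links carry no type attribute.
cat > "$tmp/entry.xml" <<'EOF'
<entry xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://example.com/atom.xml"/>
  <link href="http://example.com/post.html"/>
</entry>
EOF

result=$(xsltproc "$tmp/link.xsl" "$tmp/entry.xml")
printf '%s\n' "$result"
```

With the sample entry, the untyped link to the post should be picked rather than the rel=self link. Note that xsl:value-of only takes the first matching node, so an entry with several non-self links would yield whichever comes first.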

Tags:
- program

atomicules

