Just found the ALP RSS feed. It’s excerpts only, but should make generating a full article OzPolFeed significantly easier. Excellent!
Of course – the ALP and Liberals would go and change their websites, breaking the feeds.
The ALP feed is working again, but the Liberals have spoiled the party by only providing their media releases as PDFs, making it impossible even to screen scrape a feed together (their website doesn’t work under Firefox either, but that’s beside the point…). So, until further notice, the Libs feed will be a dead end, I’m afraid.
I did a late night coding session the other night (haven’t done one of those in ages) to update the OzPolFeeds – in part to prepare for playing with Atomflow. Lots of internal re-organisation. The service now attempts to handle common parsing errors, and it logs errors and continues rather than failing completely. The biggest change, though, is that all of the feeds should now validate correctly (including the Greens feed, which uses the messiest and most invalid URLs I’ve seen on a site in a long, long time – a lot of tweaking was required to get them working correctly as GUIDs for the feeds without breaking the actual link to the article).
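For the curious, the "log it and carry on" idea looks roughly like the sketch below. This is not the actual OzPolFeeds code (that runs on .NET); it is a minimal Python illustration, and `parse_release`, `build_entries` and the `title`/`link` field names are all made up for the example.

```python
import logging

logger = logging.getLogger("ozpolfeeds")  # hypothetical logger name


def parse_release(raw):
    """Hypothetical parser: turn one scraped record into a feed entry."""
    title = raw["title"].strip()
    link = raw["link"].strip()
    if not title or not link:
        raise ValueError("missing title or link")
    return {"title": title, "link": link}


def build_entries(raw_items):
    """Parse every item, logging and skipping the ones that fail."""
    entries = []
    for index, raw in enumerate(raw_items):
        try:
            entries.append(parse_release(raw))
        except (KeyError, ValueError, AttributeError) as exc:
            # One bad article should not take the whole feed down.
            logger.warning("skipping item %d: %s", index, exc)
    return entries
```

The point of catching errors per item rather than around the whole loop is that a single malformed release only costs you that one entry.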
I have also moved the service to a server that is online 24/7 so that it can keep running all the time without interruption. In doing so I established a system that assigns a time to the releases as they are added to the corresponding website. For example, the service polls the sites every hour, and when it detects a new entry dated for the current day, it assigns the current time to the entry. The idea is that as things are added during the day they then correctly appear in order in your favourite news aggregator. Unfortunately it appears that most sites are adding their entries retrospectively – the ALP site, for example, adds entries from Sunday on Monday etc. – which means the times are never assigned.
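To make the rule concrete, here is a rough sketch of the timestamp logic in Python. It is an illustration only, not the real service code; the function name `timestamp_for_new_entry` is invented, and falling back to midnight for older releases is my assumption about what "no time assigned" means in practice.

```python
from datetime import date, datetime, time


def timestamp_for_new_entry(release_date, polled_at):
    """Pick a publication timestamp for an entry seen for the first time.

    release_date: the date printed on the media release.
    polled_at: the time of the poll that discovered the entry.
    """
    if release_date == polled_at.date():
        # Dated today: use the poll time so entries sort by arrival.
        return polled_at
    # Added retrospectively: only the date is known, so use midnight
    # (an assumption made for this sketch).
    return datetime.combine(release_date, time.min)
```

A release dated Sunday that first shows up in Monday’s poll takes the second branch, which is exactly why the retrospective entries all end up lumped together.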
I have a question for the people who use these feeds – is it important that the RSS entry date be the date on the media release? Or would it be better/ok for it to be the date/time the release was added to the site? The benefit of the latter is that aggregators will display the entries in the order they were added rather than lumping them all together. TIA.
Looks like Diego, by releasing atomflow, might have made my life a lot easier in building the aggregator for OzPolFeeds. I’ll have to modify my scraper to produce Atom feeds instead of RSS, but (hopefully) that won’t be too much work.
UPDATE: And easier! I hope to post a bit more about my thinking around this later in the week (after I’ve had a read of what others have been saying too).
Seeing the PM’s feed pop up in my RSS aggregator again last night led me to the conclusion that something was amiss with the OzPolFeeds. Obviously it’s corrected itself (probably an awry article somewhere). I must get back to cleaning that up and handling errors on individual articles properly. Now that I have Mono.NET on my Mac I’ll see if I can work on it from home.
Unfortunately the service that creates the OzPolFeeds stopped working for a couple of days. It’s back up and running as of today, so the feeds are back to their usual active selves. Sorry to anyone affected by this.
Seems something strange is going on with the ALP feeds. I’ll try and have a look at this tomorrow. Basically, the links to the articles aren’t working correctly. I suppose this is the problem with screen scraping. At least the problem seems reasonably consistent, so hopefully that will help with tracking it down…
Now that the PM has had something to say in the new year, I have updated the Libs feed to point to the default page (which contains the most up-to-date announcements).