Recently, I’ve been involved in a project to ensure our consultations support RDFa markup, to make them indexable and reusable by third parties, including Directgov. Without duplicating the quite accessible and useful COI guidance, I thought I’d summarise here the process involved from the perspective of implementing the standard with minimal prior knowledge of the whys and wherefores.
Why bother?
Since 1 January 2010, it has been a mandatory requirement for government sites. But more importantly, it’s a Jolly Good Idea to provide a low-maintenance way of enabling other systems and services to grab a list of consultations from your site, and identify the important metadata about them, including the closing date and how to respond. Short term, it will make services like TellThemWhatYouThink and Directgov more useful, but in terms of the bigger picture, it will expose the opportunity to get involved with policymaking to a wider audience, and reduce the hassle for those who are already part of our regular stakeholder group (by making possible new services such as automatic email alerts, RSS feeds, cross-government updates and so on).
What’s involved?
RDFa offers a simple way to add meaningful information to existing web pages, which can be extracted easily by software (as opposed to hit-and-miss ‘scraping’ of regular web pages). As a lay person, I’d say there are three key principles which I can articulate:
- Be unobtrusive and minimalistic: taking this approach lets you add extra items to pages which aren’t seen by regular browsing visitors, but which are accessible to software robots looking for them. It’s also not ‘an extra thing’ to maintain and serve like an RSS feed, so reduces risk, in theory.
- Offer clean data: through being consistent in how data about the consultation is described, the idea is that RDFa helps to extract very clean information about the consultation – for example, an unambiguous closing date, a response email address, an exact postcode, all in formats which can then be used in other ways (plotted on a map, listed on a calendar, turned into a mailform on a website etc)
- Extend existing conventions: the most complicated aspect of implementing this particular specification is that the authors have gone out of their way to find existing wheels rather than reinvent their own. So they use Dublin Core metadata to describe authors and organisations; vCard to describe response contact information; plus nods to DBPedia and FOAF (Friend Of A Friend) to support these major semantic web initiatives. Only for the gaps where specific consultation information needs to be marked up is there a new standard introduced, using the namespace (prefix) argot.
In a nutshell, the process involves tweaking the template for your consultation pages, adding extra metadata elements and attributes. This is only as easy or hard as your CMS makes it. It’s important that it’s right though – even a few ‘broken bits’ could render the page useless to a software robot trying to extract data from it.
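To make the ‘software robot’ idea a bit more concrete, here is a minimal sketch in Python (standard library only) of how a program might pull marked-up values out of a page. The HTML fragment, the consultation title and the choice of dc:valid for the closing date are my own illustrative assumptions, not taken from the COI guidance, and a real consumer would use a proper RDFa parser that also resolves the namespaces.

```python
from html.parser import HTMLParser

# Illustrative fragment only: the title, date and property choices are
# assumptions for this sketch, not copied from the COI guidance.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/terms/">
  <h1 property="dc:title">Consultation on widget safety</h1>
  <p>Closes on <span property="dc:valid" content="2010-03-31">31 March 2010</span></p>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect property/value pairs, preferring @content over element text."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._pending = None  # property whose value is the next piece of text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Machine-readable value supplied explicitly in the attribute.
            self.found[prop] = attrs["content"]
        else:
            # Otherwise the visible text of the element is the value.
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.found[self._pending] = data.strip()
            self._pending = None


def extract_properties(html):
    """Return a dict mapping each property name to its value."""
    collector = PropertyCollector()
    collector.feed(html)
    return collector.found


if __name__ == "__main__":
    print(extract_properties(SAMPLE))
```

The point of the sketch is that the closing date comes out as an unambiguous value from the content attribute, rather than being guessed from the visible text.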
How to do it
Read the COI guidance (and give it to your developer), which is the most comprehensive guide, with useful illustrated examples. There’s also a worked up HTML page showing how this works, and of course you’re welcome to look at ours (which I *think* are right, based on feedback from the gurus).
As an example (but again, you should read the official guidance) I found I needed to work through the following:
- ensure we have a single page per consultation
- amend the DOCTYPE, if you’re using something like the standard XHTML strict/transitional version, so that it tells requesters of the page that it contains RDFa
- add some attributes to the <html> element, highlighting the namespaces (vocabularies) you’re referencing in the document
- add Dublin Core metadata elements/attributes to your page <head> element if they’re not there already
- ensure we have a wrapper <div> around the consultation information which again references the namespaces (vocabularies) you’re using. This also identifies the name of the organisation publishing the document
- add some Dublin Core metadata attributes as <spans> within this <div> identifying this as a consultation
- add some Dublin Core attributes to key bits of the HTML, such as the consultation title, start date, closing date and description, marking these as such – and in the case of dates, ensuring there’s a machine-readable data format value in the attribute. Also add a unique identifier – a reference number – to each consultation (not something we’d done routinely before)
- ensure the contact details for responses are carefully structured using vCard format, with separate ‘Full Name’, ‘Street Address’, ‘Locality’ and ‘Post Code’ elements, suitably marked-up with attributes. Since vCard doesn’t cover the specific case of a consultation with an email reply address, for example, these elements are marked up with the new argot: namespace attributes
- add Dublin Core-based attributes describing the file attachments – the consultation document itself, and any related ones such as appendices or Impact Assessments
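A few of the steps above (the wrapper with namespaces, the title, the reference number and the machine-readable dates) can be roughed out as a template function. This is a sketch under stated assumptions rather than the official markup: the argot namespace URI is a placeholder, the use of dc:issued and dc:valid for the start and closing dates is my guess, and the function names are hypothetical. Check the COI guidance for the real values.

```python
from datetime import date
from html import escape

# The Dublin Core terms URI is the real one; the argot URI is a hypothetical
# placeholder, so substitute the one given in the COI guidance.
NAMESPACES = {
    "dc": "http://purl.org/dc/terms/",
    "argot": "http://example.org/argot#",
}

# Spelled out so the output does not depend on the system locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def date_span(prop, when):
    """Show a friendly date to people and an ISO 8601 date to machines."""
    human = f"{when.day} {MONTHS[when.month - 1]} {when.year}"
    return f'<span property="{prop}" content="{when.isoformat()}">{human}</span>'


def render_consultation(ref, title, opens, closes):
    """Build a simplified wrapper <div> for one consultation (hypothetical helper)."""
    xmlns = " ".join(f'xmlns:{p}="{uri}"' for p, uri in NAMESPACES.items())
    return "\n".join([
        f"<div {xmlns}>",
        f'  <h1 property="dc:title">{escape(title)}</h1>',
        f'  <p>Reference: <span property="dc:identifier">{escape(ref)}</span></p>',
        # dc:issued / dc:valid are assumed choices for start and closing dates.
        f'  <p>Opens: {date_span("dc:issued", opens)}</p>',
        f'  <p>Closes: {date_span("dc:valid", closes)}</p>',
        "</div>",
    ])


if __name__ == "__main__":
    print(render_consultation(
        "REF-001", "Widgets & safety", date(2010, 1, 4), date(2010, 3, 31)
    ))
```

In a CMS, the same idea applies: the values live in ordinary fields, and the template slots them into the right attributes every time, which is what keeps the markup consistent.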
UPDATE: in retrospect, it was foolish to attempt a blog post about code without some code examples. I’ve tried and failed to find a half-decent code syntax highlighter plugin for WordPress, but the following couple of screenshots hopefully illustrate the before and after situations for the contact information part of a consultation:
Before, plain HTML:
After, with RDFa added (and marked up more semantically as a list item within the consultation metadata)
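As a text stand-in for the general idea (this is not the markup from my screenshots), here is a small Python sketch that renders the same contact details both ways. The vCard property names (fn, street-address, locality, postal-code) come from the W3C vCard RDF vocabulary; the address itself is made up, the function names are hypothetical, and I’ve left out the argot: email property rather than guess at its name.

```python
from html import escape

VCARD_NS = "http://www.w3.org/2006/vcard/ns#"

# Maps each key in the contact dict to a vCard property name.
FIELDS = (
    ("name", "v:fn"),
    ("street", "v:street-address"),
    ("locality", "v:locality"),
    ("postcode", "v:postal-code"),
)

# Made-up example data.
CONTACT = {
    "name": "Consultation Team",
    "street": "1 Example Street",
    "locality": "London",
    "postcode": "SW1A 1AA",
}


def contact_before(contact):
    """Plain HTML: readable by people, but opaque to software."""
    parts = [escape(contact[key]) for key, _ in FIELDS]
    return "<p>" + "<br />".join(parts) + "</p>"


def contact_after(contact):
    """Same details as a list item, each part labelled with a vCard property."""
    spans = [
        f'<span property="{prop}">{escape(contact[key])}</span>'
        for key, prop in FIELDS
    ]
    return f'<li xmlns:v="{VCARD_NS}" typeof="v:VCard">' + ", ".join(spans) + "</li>"


if __name__ == "__main__":
    print(contact_before(CONTACT))
    print(contact_after(CONTACT))
```

The visible text is the same in both versions; the only difference is that the second tells software which bit is the name, which is the street and which is the postcode.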
What help is available?
I worked from the examples given in the COI guidance and the pioneers in this at the Ministry of Justice. The COI Digigov team are your allies in helping to implement this, and should be able to answer queries and/or direct you to sources of further implementation advice and support.
In terms of online tools, you can see whether your RDFa is visible to suitably-equipped applications using Mark Birbeck’s tool or bookmarklet, if you prefer (and he should know; he invented RDFa).
Good luck!
P.S. If you Know About This Stuff and feel I’m giving duff advice here, please drop me a line in the comments or via the contact form and I’ll correct. Thanks.
Comments
Good stuph, Steff. I can’t even begin to understand this, but it is clearly important and your post has cleared it up a little bit, which is very useful!
Presumably in WordPress this has all to be hard coded into templates – or have you found a plugin that helps?
As ever, Steph, very useful and may just use this blog to show our developers how they can make our data clean – so important when opening up public data – so that you can get your school on a map, its opening times on a calendar, etc. Is RDFa a mandatory requirement for all public sector sites?
Give it to your developer? If only :)
I wonder if our friends at Limehouse etc are on board with this?
@Dave: you’re very kind, but reading between the lines I’m not sure I’ve explained it much more clearly. In case it helps, I’ve added a couple of code examples illustrating the before and after.
For WordPress, which I suspect is somewhat of an edge case when it comes to government CMSes :) then the approach involved Simon Wheatley’s Custom Post Template plugin to make the post use a special post template, with all the necessary markup ready. In the WordPress admin, all the special metadata is added as a series of custom fields, which are then slotted into the front-end template in the necessary places.
@Noel: you ask a good question – I think it is for central government but not sure about the wider public sector. I’ll ask the COI team.
@Josh: another excellent point, to which I don’t know the answer. Will ask COI too.
The bit I don’t get is how does the guy on the street – who is very interested in policy and responding to consultations but is not a programmer – get access to the consultations through the RDFa?
Are there publicly available apps that will allow them to somehow locate the RDFa tagged data and view it, or do they need to rely on someone else building an interface for them that searches for the RDFa tagged content?
If the latter is the case, then while it is a step forward, for the people interested in the content it is still of little consequence.
[…] be doing our bit towards publishing online consultation information as Linked Data (Steph Gray has recently blogged on a similar […]
@Noel, the RDFa standard for consultations applies to all central government, i.e. all Departments, their agencies and their NDPBs of whatever form. The standard for jobs (which uses identical form) applies to all public sector. LAs mostly deliver this through LGjobs, run by JobsGoPublic. There are a few others in development too.
@BrianHoadley, the man in the street probably won’t use RDFa directly, but will see it used in many different places. For example we did a demo on Yahoo! showing how job information could be displayed directly in the search results, as opposed to ‘on this page you may find what you are looking for’. Google now indexes RDFa, so it won’t be long before we see useful results giving you information directly being displayed in search results. Then it can be used in a number of ways to provide Linked Data – i.e. lists of items that also refer to the same object on which you can surf.
@Josh – good question. We’re seeking to help educate our suppliers about what we’re looking for from them in terms of coding. Perhaps we should offer a coders’ day sometime?
Thanks David, hopefully it will come to local gov too
@understood Try http://consultations.direct.gov.uk and my write up: http://bit.ly/4BbeSP
This comment was originally posted on Twitter