At Barcamp on Saturday, Harry Metcalfe of TellThemWhatYouThink and I presented some work we’ve been doing to build a web application which makes it easier to turn PDF versions of consultation documents into structured XML. Before you click on to something more interesting, give me a chance to explain in plain English why this matters.
Typically, a formal consultation is a pretty tedious process: a department will write up a big PDF document, print it, send it to some people, stick it on its website and wait for people to respond. The whole approach is dated: it doesn’t really take advantage of the web, and is inaccessible to most people.
We’re not going to get away any time soon from the reality that the final, definitive versions of these documents live in PDF (well, InDesign/QuarkXPress, then PDF) formats. Somehow, they need to be turned into something which can ‘live’ online, stimulating a conversation and real two-way interaction about policy. They need to be documents which help people get into the issues, discuss and share them, relate them to their own situation and provide feedback on the aspects that concern them.
There are a couple of nice examples of this in practice right now:
- In Birmingham, a group of Concerned Citizens took what they perceived to be a jargon-heavy consultation document from the local authority, translated it into plain English, and posted it in commentable form on a website, to help ensure that local people had a chance to meaningfully engage with the proposals. I imagine this took some laborious cutting and pasting and some hairy-chested technical skills, but what if the council had published its document in a way which any group could easily dissect and interpret in this way? What if all councils did so? Might the proposals and the ideas become more important than the jargon, and might the quality and extent of the debate improve as a result?
- The Power of Information Taskforce Report, published in beta via a WordPress-based tool on Sunday, also publishes a parallel RSS feed of all the sections of the document, and all the contributed comments. Within 12 hours of appearing on the web, the document had already been converted – not by the publishers, but by interested third parties looking to widen the debate – into a wiki and a special XML dialect for strategy documents known as StratML.
Even without surrendering quite so much control over the words themselves, there is much that becomes easier to do once the core document is structured:
- Convert it easily into HTML which can be loaded into a big enterprise CMS and published on the corporate site, linked to and/or read on mobile devices
- Build a tool to attach a comment box to each paragraph or section, for people to comment in detail on the proposals
- Automatically generate an online response form, which picks up all the questions asked in the document, and sends the results to a database, spreadsheet, analysis package, discussion forum or whatever
- Generate ‘widgets’ or mini-questionnaires based on a few of the questions raised in the document, for bloggers and social networkers to embed in their sites and profiles, like we attempted for the Science and Society consultation
- Publish out information about the consultation, like the closing date, summary and so on, to services like Directgov to aggregate across government, or to third party user-generated sites like TellThemWhatYouThink – like we’re doing at DIUS using a simple Atom feed.
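To make that last point a bit more concrete, here is a rough sketch in Python of what one consultation entry in an Atom feed might look like. It is not the actual DIUS feed: the `cons` namespace, the `closingDate` element, the function name and all the example values are made up for illustration, since Atom itself has no notion of a closing date.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Hypothetical extension namespace for consultation-specific fields (an assumption)
CONS = "http://example.org/ns/consultation"

ET.register_namespace("", ATOM)
ET.register_namespace("cons", CONS)


def consultation_entry(title, url, summary, updated, closing_date):
    """Build a single Atom <entry> describing one consultation."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}id").text = url
    ET.SubElement(entry, f"{{{ATOM}}}link", href=url)
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM}}}summary").text = summary
    # Not part of Atom: an illustrative extension element for the closing date
    ET.SubElement(entry, f"{{{CONS}}}closingDate").text = closing_date
    return entry


# Placeholder values, not a real consultation
entry = consultation_entry(
    title="Example consultation",
    url="http://example.org/consultations/example",
    summary="A one-paragraph summary of what is being asked.",
    updated="2009-02-01T09:00:00Z",
    closing_date="2009-03-31",
)
print(ET.tostring(entry, encoding="unicode"))
```

An aggregator only needs to agree on what the extension element means; everything else is standard Atom that any feed reader already understands.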
The fact is, PDF files of consultations are a big soup of paragraphs, pictures, case studies and questions. To help build the kind of labour-saving tools which might encourage debate around them, we need to describe the content of the documents in ways which machines can work with (“this bit’s a question”, “that bit’s a case study”, “the consultation closes on date X” and so on). And that’s what Harry’s tool sets out to do. It’s a proof of concept for now, designed for a user community of quite knowledgeable web publishing teams, and understandably still has some rough edges. But he’s overcome an impressive array of technical challenges to illustrate how investing some time in marking up a consultation document this way could open up an exciting world of potential applications. The next phase of the project is to start to build some of those applications.
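For the technically minded, here is a minimal sketch of why that kind of markup is useful. The XML below is an invented format, not the schema Harry’s tool produces: the element names (`consultation`, `section`, `casestudy`, `question`), the `closes` attribute and the sample text are all assumptions. Once questions are labelled as questions, pulling them out to build a response form takes a few lines of Python:

```python
import xml.etree.ElementTree as ET

# Invented sample markup; element and attribute names are assumptions
SAMPLE = """
<consultation closes="2009-03-31">
  <title>Example consultation</title>
  <section id="s1">
    <heading>Introduction</heading>
    <para>Background text goes here.</para>
    <casestudy>A short illustrative case study.</casestudy>
    <question id="q1">Do you agree with the proposal?</question>
  </section>
  <section id="s2">
    <heading>Next steps</heading>
    <question id="q2">What should happen next?</question>
  </section>
</consultation>
"""


def extract_questions(xml_text):
    """Return the closing date and a list of questions with their section headings."""
    root = ET.fromstring(xml_text)
    questions = []
    for section in root.iter("section"):
        heading = section.findtext("heading", default="")
        for q in section.iter("question"):
            questions.append({
                "id": q.get("id"),
                "section": heading,
                "text": (q.text or "").strip(),
            })
    return root.get("closes"), questions


closes, questions = extract_questions(SAMPLE)
print(f"Consultation closes on {closes}")
for q in questions:
    # Each of these could become a labelled text box in a response form
    print(f"[{q['section']}] {q['id']}: {q['text']}")
```

The same list of questions could just as easily feed a spreadsheet, a widget or a comment thread, which is the point of doing the markup once.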
Harry’s set up a sandbox environment so you’re welcome to have a look at the tool as it stands, and give us your thoughts and ideas on where we should take it next.
p.s. If you want to pick up some quick tips on how to structure website data from someone who knows what they’re talking about, read Jeni Tennison’s fantastic guide to what government should do to facilitate data reuse.