At Barcamp on Saturday, Harry Metcalfe of TellThemWhatYouThink and I presented some work we’ve been doing to build a web application which makes it easier to turn PDF versions of consultation documents into structured XML. Before you click on to something more interesting, give me a chance to explain in plain English why this matters.

[Image: pile of papers. Credit: lotyloty]

As Harry says in his write-up:

Typically, a formal consultation is a pretty tedious process: a department will write up a big PDF document, print it, send it to some people, stick it on their website and wait for people to respond. The whole process is pretty dated: it doesn’t really take advantage of the web, and is pretty inaccessible to most people.

We’re not going to escape any time soon from the reality that the final, definitive versions of these documents live in PDF format (well, InDesign/QuarkXPress, then PDF). Somehow, they need to be turned into something which can ‘live’ online, stimulating a conversation and real two-way interaction about policy. They need to be documents which help people get into the issues, discuss and share them, relate them to their own situation and provide feedback on the aspects that concern them.

There are a couple of nice examples of this in practice right now:

  • In Birmingham, a group of Concerned Citizens took what they perceived to be a jargon-heavy consultation document from the local authority, translated it into plain English, and posted it in commentable form on a website, to help ensure that local people had a chance to meaningfully engage with the proposals. I imagine this took some laborious cutting and pasting and some hairy-chested technical skills, but what if the council had published its document in a way which any group could easily dissect and interpret in this way? What if all councils did so? Might the proposals and the ideas become more important than the jargon, and might the quality and extent of the debate improve as a result?
  • The Power of Information Taskforce Report, published in beta via a WordPress-based tool on Sunday, also publishes a parallel RSS feed of all the sections of the document, and all the contributed comments. Within 12 hours of appearing on the web, the document had already been converted – not by the publishers, but by interested third parties looking to widen the debate – into a wiki and a special XML dialect for strategy documents known as StratML.

Even without surrendering quite so much control over the words themselves, there is much that becomes easier to do once the core document is structured:

  • Convert it easily into HTML which can be loaded into a big enterprise CMS and published on the corporate site, linked to and/or read on mobile devices
  • Build a tool to attach a comment box to each paragraph or section, for people to comment in detail on the proposals
  • Automatically generate an online response form, which picks up all the questions asked in the document, and sends the results to a database, spreadsheet, analysis package, discussion forum or whatever
  • Generate ‘widgets’ or mini-questionnaires based on a few of the questions raised in the document, for bloggers and social networkers to embed in their sites and profiles, like we attempted for the Science and Society consultation
  • Publish out information about the consultation, like the closing date, summary and so on, to services like Directgov to aggregate across government, or to third party user-generated sites like TellThemWhatYouThink – like we’re doing at DIUS using a simple Atom feed.
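To make the response-form idea above a little more concrete, here is a minimal sketch in Python. It assumes a made-up XML structure: the element names (`consultation`, `section`, `question`, `casestudy`), the `closes` attribute, the sample date and the helper functions are all invented for illustration and are not the format Harry's tool produces. The point is only that once questions are tagged, pulling them out and wrapping them in a form takes a few lines.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structured consultation. Element names, attribute names
# and the closing date are assumptions made up for this sketch.
SAMPLE = """
<consultation closes="2030-01-31">
  <title>Example consultation</title>
  <section id="s1">
    <para>Background text for the first section.</para>
    <question id="q1">Do you agree with the proposal?</question>
  </section>
  <section id="s2">
    <casestudy>A short case study.</casestudy>
    <question id="q2">What would you change?</question>
  </section>
</consultation>
""".strip()


def extract_questions(xml_text):
    """Return (id, text) pairs for every <question>, in document order."""
    root = ET.fromstring(xml_text)
    return [(q.get("id"), (q.text or "").strip()) for q in root.iter("question")]


def build_response_form(xml_text):
    """Build a bare-bones HTML form with one textarea per question."""
    root = ET.fromstring(xml_text)
    lines = ['<form method="post">']
    for qid, text in extract_questions(xml_text):
        lines.append(f'  <label for="{escape(qid)}">{escape(text)}</label>')
        lines.append(f'  <textarea id="{escape(qid)}" name="{escape(qid)}"></textarea>')
    # The closing date comes from the same markup, so it never drifts
    # out of sync with the document itself.
    lines.append(f'  <p>Closes: {escape(root.get("closes", "unknown"))}</p>')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_response_form(SAMPLE))
```

The same list of questions could just as easily feed a spreadsheet export or an embeddable mini-questionnaire; the form is only one possible output.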

The fact is, PDF files of consultations are a big soup of paragraphs, pictures, case studies and questions. To help build the kind of labour-saving tools which might encourage debate around them, we need to describe the content of the documents in ways which machines can work with (“this bit’s a question”, “that bit’s a case study”, “the consultation closes on date X” and so on). And that’s what Harry’s tool sets out to do. It’s a proof of concept for now, designed for a user community of quite knowledgeable web publishing teams, and understandably still has some rough edges. But he’s overcome an impressive array of technical challenges to illustrate how investing some time in marking up a consultation document this way could open up an exciting world of potential applications. The next phase of the project is to start to build some of those applications.
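As a rough illustration of what “describing the content” might mean in practice, the Python sketch below takes blocks of text that a person has already labelled and writes them out as XML. It is not Harry's tool or its schema: the labels (`para`, `question`, `casestudy`), the `closes` attribute and the `mark_up` function are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of labels a publishing team might apply to each block.
ALLOWED_KINDS = {"para", "question", "casestudy"}


def mark_up(title, closes, blocks):
    """Turn labelled (kind, text) blocks into a small XML document.

    `closes` is the closing date as a string; `blocks` is a list of
    (kind, text) pairs in reading order.
    """
    root = ET.Element("consultation", {"closes": closes})
    ET.SubElement(root, "title").text = title
    question_count = 0
    for kind, text in blocks:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown block kind: {kind}")
        element = ET.SubElement(root, kind)
        element.text = text
        if kind == "question":
            # Number questions so responses can be tied back to them.
            question_count += 1
            element.set("id", f"q{question_count}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(mark_up(
        "Example consultation",
        "2030-01-31",  # made-up closing date
        [
            ("para", "Some background."),
            ("casestudy", "How one town approached this."),
            ("question", "Do you agree with the proposal?"),
        ],
    ))
```

The hard part, of course, is the labelling itself, which is where the time invested by a publishing team goes; the XML at the end is the easy bit.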

Harry’s set up a sandbox environment so you’re welcome to have a look at the tool as it stands, and give us your thoughts and ideas on where we should take it next.

p.s. If you want to pick up some quick tips on how to structure website data from someone who knows what they’re talking about, read Jeni Tennison’s fantastic guide to what government should do to facilitate data reuse.


Valuable piece of work, and widgetisation is certainly the way forwards. But as good as it is, it’s still a sticking plaster over the fundamental issue of publication processes in Government. Print versions of documents (and PDFs by default) are considered the first and most important to be finished by policy peeps, when we know that XML versions should be worked on first, allowing a variety of reuse options – PDF being one.

With all the successes we’re having getting policy to engage a little more through consultations, should it be that difficult to get them to sign-off a digital version of the document first? Could this tool reinforce the position of the PDF as primus inter pares in the medium term?

Hopefully we can win round the @quillpushers before too long…

I am so excited about this piece of work. Need to get the website managers using this across govt – heads of ecomms + heads of consultation teams? Though it might need more of DIUS’s excellent hosting provision to make it a reality in the short term?

We may have something in the pipeline. Need more strategy and co-working behind all these things though… Heads of Digital Engagement group needs to start quickly with the secretariat at DIUS. We need help to embed best practice, deal with security, common approach to working with contractors etc.

I think UKGovbarcamp showed that there’s someone in every govt dept ready to work on interesting projects…

Woah, woah there Mr F – secretariat at DIUS? Life in an organisation of 140,000 people has *changed* you, man 🙂

But let’s have a first meetup soon, definitely.

There’s an issue in that by trying to take a strategic, big tent approach at DIUS, we’re reducing our available capacity to do this kind of more technical innovation a bit, so we’re likely to struggle to keep up the pace. But we’ll try.

Everything is happening at your door… consider it the cost of success!

Policy people and private office are referencing your work and are asking us to do the same. The organisation is waaaaay off being nimble enough to do the stuff you’re doing, so we need to collaborate, and crack some of these common issues.

And if your capacity to innovate is decreasing, we may be able to look at ways of increasing it elsewhere. This way we can make sure that innovation doesn’t happen for innovation’s sake, but we can continue it and really embed it in our work. That’s what we all want.

Strategy needn’t be restrictive. It’s just a way of looking at the bigger picture. A wise old man on top of a mountain once said to me: strategy is like life… it’s what you make it! (alright he didn’t, but it sounds good)

So – maybe a bit OTT, but we need a place to discuss 🙂

I am currently looking at the DTD for this great idea. Looking for some spare time to build various conversion modules for CMSs, widgets, a browser add-on, an Office plug-in or a stand-alone app to convert the XML to Word, RTF, PDF, Voice, ODT etc. This way we can get better Comms buy-in.

Not sure how well an Office plug-in and a browser add-on would be accepted by Dept IT people.