Open data can be really challenging, particularly when several organizations are involved and the partners have competing agendas.
Often documents are created with the sole purpose of being read by humans, with no thought to data reuse.
In many cases, this approach makes data analysis impossible. One way to improve matters is to publish the underlying data in appendices to the main report.
If a sector can agree on consistent standards for data reporting, this opens the door to simple comparisons, with little effort needed from data analysts. This data standard needs to be negotiated and tested, and it needs to work for the industry, regulators, data scientists, hackers, civil society, armchair auditors, and governments.
Imagine our data challenge for this task: getting raw data that the hack events could work from. Finding comparable data within one country across a number of years is a challenge in itself. Getting data from a range of countries across a twelve-year period was an even greater challenge.
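To make the point about "simple comparisons" concrete, here is a minimal sketch of what analysis could look like if every report used one agreed record layout. The field names (`country`, `year`, `stream`, `amount_usd`) and all the figures are hypothetical, invented purely for illustration; they are not taken from any EITI report or existing standard.

```python
from collections import defaultdict

# Hypothetical records in an assumed shared format.
# Countries "A" and "B" and every figure below are made up for illustration.
records = [
    {"country": "A", "year": 2010, "stream": "royalties", "amount_usd": 120.0},
    {"country": "A", "year": 2010, "stream": "taxes", "amount_usd": 30.0},
    {"country": "A", "year": 2011, "stream": "royalties", "amount_usd": 150.0},
    {"country": "B", "year": 2010, "stream": "royalties", "amount_usd": 80.0},
    {"country": "B", "year": 2011, "stream": "royalties", "amount_usd": 95.0},
]


def totals_by_country_year(rows):
    """Sum reported amounts for each (country, year) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["country"], row["year"])] += row["amount_usd"]
    return dict(totals)


totals = totals_by_country_year(records)

# Because every row shares the same fields and units,
# a cross-country comparison is a plain lookup.
for (country, year), amount in sorted(totals.items()):
    print(country, year, amount)
```

With a shared layout like this, the comparison is a few lines of code. Without one, most of the effort goes into reconciling definitions before any analysis can start.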
Extracting the extractives data was not what one might call an easy task. Our rather ambitious aim, to machine-read Extractive Industries Transparency Initiative (EITI) reports and put the data into one structure, fell at the first hurdle. We found that common reporting elements were defined differently in different countries. The measurement units also differed. Language and translation added further complexity.
We concluded that using machines to extract the data from the reports was not an option. The simplest way, it transpired, was to type the figures in manually: quite a labour-intensive task.
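Even once figures have been typed in by hand, the differing measurement units mentioned above still have to be reconciled before anything can be compared. The following is a rough sketch of that step, not the process we actually used. It assumes each figure was keyed in together with a scale label such as "thousand" or "million", and the `SCALE_FACTORS` table and `normalise_amount` helper are hypothetical names introduced for this example.

```python
# Hypothetical lookup of scale labels to multipliers (an assumption for
# this sketch; real reports may state units in many other ways).
SCALE_FACTORS = {
    "units": 1,
    "thousand": 1_000,
    "million": 1_000_000,
    "billion": 1_000_000_000,
}


def normalise_amount(value, scale):
    """Convert a figure reported at a given scale into base units."""
    key = scale.strip().lower()
    if key not in SCALE_FACTORS:
        # Fail loudly rather than guess at an unfamiliar unit.
        raise ValueError(f"Unknown scale: {scale!r}")
    return value * SCALE_FACTORS[key]


# Two made-up figures reported at different scales become comparable.
a = normalise_amount(2.5, "Million")
b = normalise_amount(300, "thousand")
print(a > b)
```

Raising an error on an unrecognised label is deliberate: a silently mis-scaled figure is harder to spot later than a row that fails to load.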
To make the most of the EITI reports, we needed a sound level of numeracy and industry knowledge. As the Extractives Transparency sector evolves, it is clear that capacity needs to be built around data: data collection, analysis, and communication.
Rather than letting the work we did (in partnership with the excellent Open Knowledge Foundation) go to waste, today we are making it available online as an open data set, for use by anyone.
Open standards are essential. When adhered to, they make cross-industry, cross-border, cross-machine data comparisons possible.
Elsewhere, standards have proven to be a huge benefit: for developers, data scientists, researchers, business analysts, and for curious people with an interest (!). For publicly funded projects, quite apart from interoperability issues, it is nonsensical not to standardise and not to mandate this as a condition of funding. No adoption, no funding.
We want the applications that were developed to have a future, and to be updated and extended as more data becomes available. If the extractives industry worked towards an open data standard, this task would be achievable.
Data tools need to be contemporary, not historical.
The risk we face is that new efforts go un-noticed, un-questioned, un-praised, un-accountable.
My call to you is to publish those SQL dumps. Publish Excel workbooks. Publish exports from Oracle. Please.