Data is Named the New Oil – guest blog by developer Abiola Afolabi

Data has been named the new oil, and everyone can now scream ‘Drill, Baby, Drill’. In a society such as Nigeria, with wide variation in literacy and a pressing need for better service delivery, it would be valuable to amplify the voices buried deep in the data. However, data can be crude, slimy and messy in its default state, and it needs refining into a format that can be used by all.

The ‘Follow the Data’ Hackathon has been a thrilling experience for the BudgIT team. The hackathon ties into our daily work, which is using creative means to strengthen societal inclusion in public finance. Citizens are mostly lost when they peer into the arcane subject of extractive revenue and find it difficult to join the conversation. Most importantly, citizens cannot empirically value the sheer size of the revenue. Presented as strings of zeroes punctuated by commas, oil revenue figures leave citizens unable to connect with the scale of the fund and its impact on the community. Based on the goal of the hackathon, which is to find the best ways to use data mining to stimulate action from local people, we want to show what oil revenue can really do for a community.
Our goal is to mine the government’s oil revenue over the last fourteen years of democracy and show, retrospectively, what that revenue could have built if spent solely on schools, police stations, clinics, fertiliser and other everyday items. We then link the datasets to the pressing needs of individuals and communities and pass the result to their representatives for urgent attention. For the BudgIT team, data must be clearly meaningful to citizens and must intersect with government institutions.
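
The arithmetic behind that idea is simple enough to sketch. The snippet below is illustrative only: the revenue total and the unit costs are hypothetical placeholders, not official figures.

```python
# BudgIT-style arithmetic: translate a headline oil-revenue figure into
# counts of everyday public goods. All figures are hypothetical placeholders.

OIL_REVENUE_NGN = 30_000_000_000_000  # assumed 14-year oil revenue total

UNIT_COSTS_NGN = {                    # assumed average cost per item
    "primary school": 50_000_000,
    "rural clinic": 30_000_000,
    "police station": 40_000_000,
    "bag of fertiliser": 5_000,
}

for item, cost in UNIT_COSTS_NGN.items():
    print(f"{OIL_REVENUE_NGN // cost:,} x {item}")
```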


Hacking the Extractive Industries for Transparency – Guest blog by Naomi Smith, Earth Security Initiative

As part of the G8 Summit agenda, the ESI was a jury member of an exciting ‘data hacking’ competition hosted by the UK Government’s Department for International Development (DFID), the Extractive Industries Transparency Initiative (EITI), Revenue Watch Institute, and the World Bank Institute. The purpose: to find new ways of mining existing data on the extractive industries and to use technology to increase transparency.
International government support for more ‘open’ data has been growing since the launch of the Open Government Partnership in September 2011. Global efforts to increase transparency in the extractive industries have learned that the availability of more and better data, while a fundamental condition for better governance, does not necessarily guarantee greater accountability. Who uses data, for what purposes, and how, is at the root of the challenge. Could ‘apps’ and technology have a role to play?

The competition: From London to Lagos
The first ‘hack-days’ took place on the 4th and 5th of May at the Hub Westminster, in London. Transparency of data in the extractives sector, ranging from production figures to financial values and tax revenues, has long been a focal point of the open data agenda. The Follow the Data hack days brought together top web developers and designers. In three teams, they competed to create a web-based application that could enable data to be shared with, and used by, a wide range of stakeholders.
The jury was composed of: Justine De Davila, Senior Extractives Adviser, Department for International Development (DFID); Tim Davies, Founder, Practical Participation Ltd; Naomi Smith, Earth Security Initiative; Luke Balleny, Commentary Editor, Thomson Reuters Foundation; and Marinke van Riet, International Director, Publish What You Pay.
The results:
On May 5th, after working for 24 hours (non-stop, for some), the three teams of developers presented their apps to the judging panel. While they displayed great technical ingenuity and potential, what resonated above and beyond the apps themselves was the fact that the three teams chose very different points of engagement to tackle the challenge. These varied approaches highlighted that managing this open data tidal wave will be best met by a process with a series of essential and complementary steps. What the three groups found, at a glance:

  • Team 1: “First things first”: The first group to present highlighted their focus on a crucial first step: achieving synthesis and coherence in the sea of data that is available. They suggested the development of an Application Programming Interface (API), which enables different software components to communicate with each other. With the impending influx of available data, we need to make sure we have the processes and tools in place to synthesise the data arriving en masse and in many different formats, before we expect audiences to engage with it. (A minimal sketch of this idea appears after the list.)

• Team 2: “Users must ‘get it’”: The second team presented the next stage in the process: the importance of user comprehension of, and engagement with, the data. This team focused on creating interactive maps and graphs with a sample dataset to show possible ways in which users can better understand and engage with the data, a necessary step to create demand and action.

• Team 3: “Spread what you find”: The third and final group focused on the effective dissemination of findings. They presented an app that allowed users working with the data to capture their main findings in a flash card form that could then be shared with a much greater audience through social networks like Twitter and Facebook.
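
As a rough illustration of Team 1’s proposal, the sketch below exposes aggregated extractives records through a machine-readable endpoint. Flask is used for brevity, and the store, its fields and the country record are invented for the example.

```python
# A sketch of Team 1's idea: aggregate scattered extractives datasets and
# expose machine-readable endpoints for standard kinds of 'objects'.

from flask import Flask, abort, jsonify

app = Flask(__name__)

# In a real service this would be populated from the many source datasets.
COUNTRIES = {
    "NG": {"name": "Nigeria", "eiti_member": True, "oil_revenue_usd": None},
}

@app.route("/countries/<code>")
def country(code):
    record = COUNTRIES.get(code.upper())
    if record is None:
        abort(404)  # no data yet for this country
    return jsonify(record)

if __name__ == "__main__":
    app.run()
```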

Whether the developers intended this or not, their apps, taken together, presented an overarching point of guidance for this challenge and for how the design process will continue in the coming sessions in the run-up to June. The overall message: there is no silver-bullet answer to the question of how to create the greatest value and impact from the open-data revolution. The best approach will be one that offers a series of steps and processes along the three stages identified. Whether this is done through one single app or a collection of complementary ones remains to be seen, as the competitive juices kick in and the teams of developers take the design challenge to the next level.

Open Data in Extractives: Meeting the Challenges – Guest Post by Tim Davies, Practical Participation

There’s lots of interest building right now around how open data might be a powerful tool for transparency and accountability in the extractive industries sector. Decisions over where extraction should take place have a massive impact on communities and the environment, yet decision making is often opaque, with wealthy private interests driving exploitation of resources in ways that run counter to the public interest. Whilst revenues from oil, gas and mineral resources have the potential to be a powerful tool for development, with a proportion channelled into public funds, massive quantities of revenue frequently ‘go missing’, lost to corruption and fuelling elements of a resource curse.

For the last ten years the Extractive Industries Transparency Initiative has been working to get companies to commit to ‘publish what they pay‘ to government, and for governments to disclose receipts of finance, working to identify missing money through a document-based audit process. Campaigning coalitions, watchdogs and global initiatives have focussed on increasing the transparency of the sector. Now, with a recognition that we need to link together information on different resource flows for development at all levels, potentially through the use of structured open data, and with a “data tsunami” of new information on extractives financials anticipated from the Dodd-Frank Act in the US and similar regulation in Europe, groups working on extractives transparency have been looking at what open data might mean for future work in this area.

Right now, DFID are taking that exploration forward through a series of hack days with Rewired State under the ‘follow the data’ banner, with the first in London last weekend, and one coming up next week in Lagos, Nigeria. The idea of the events is to develop rapid prototypes of tools that might support extractives transparency, putting developers and datasets together over 24 hours to see what emerges. I was one of the judging panel at this weekend’s event, where the three developer teams that formed looked respectively at: making datasets on energy production and prices more accessible for re-use through an API; visualising the relationship between extractives revenues and various development indicators; and designing an interface for ‘nuggets’ of insight discovered through hack days to be published and shared with useful (but minimal) metadata.

In their way, these three projects highlight a range of the challenges ahead for the extractives sector in building capacity to track resource flows through open data:

  • Making data accessible – The APIfy project sought to take a number of available datasets and aggregate them together in a database, before exposing a number of API endpoints that made machine-readable, standardised data available on countries, companies and commodities. By translating the data access challenge from one of rooting around in disparate datasets to one of calling a standard API for key kinds of ‘objects’, the project demonstrated the need developers often have for clear platforms to build upon. However, as I’ve discovered in developing tools for the International Aid Transparency Initiative, building platforms to aggregate together data often turns out to be a non-trivial project: technically (it doesn’t take long to get to millions of data items when you are dealing with financial transactions), economically (as databases serving millions of records to even a small number of users need to be maintained and funded), socially (developers want to be able to trust the APIs they build against to be stable, and outreach and documentation are needed to support developers to engage with an API), and in terms of information architecture (as design choices over a dataset or API can have a powerful effect on downstream re-users).
  • Connecting datasets – None of the applications from the London hack-day were actually able to follow resource flows through the available data. Although visions of a coherent datasphere – in which the challenge is just making the connection between a transaction in one dataset and a transaction in another, to see where money is flowing – are appealing, traceability in practice turns out to be a lot harder. To use the IATI example again, across the 100,000+ aid activities published so far, fewer than 1% include traceability efforts to show how one transaction relates to another, and even here the relationships exist in the data because of conscious efforts by publishers to link transaction and activity identifiers. In following the money there will be many cases where people have an incentive not to make these linkages explicit. One of the issues raised by developers over the hack-day was the scattered nature of data, and the gaps across it. Yet when it comes to financial transaction tracking, we’re likely to often be dealing with partial data, full of gaps, and it won’t be easy to tell at first glance whether a mismatch between incoming and outgoing finances is a case of missing data or corruption. Right now, a lot of developers attack open data problems with tools optimised for complete and accurate data, yet we need to be developing tools, methods and visualisation approaches that deal with partial and uncertain data (a toy sketch of this reconciliation problem appears after this list). This is developed in the next point.
  • Correlation, causation and investigation – The Compare the Map project developed on the hack day uses “scraped data from GapMinder and EITI to create graphical tools” that allow a user to eyeball possible correlations between extractives data and development statistics. But of course, correlation is not causation – and the kinds of analysis that dig deeper into possible relationships are difficult to work through on a hack day. Indeed, many of the relationships mash-ups of this form can show have been written about in papers that control for many more variables, dealing carefully with statistically challenging issues of missing data and imperfectly matched datasets. Rather than simple comparison visualisations that show two datasets side by side, it may be more interesting to look for all the possible statistically significant correlations in datasets with common reference points, and then to look at how human users could be supported in exploring, and giving feedback on, which of those might be meaningful, and which may or may not already be researched. Where research does show a correlation to exist, then using open data to present a visual narrative about it can have a place, though here the theory of change is very different – not about identifying connections, but about communicating them in interactive and engaging ways to those who may be able to act upon them.
  • Sharing and collaborating – The third project at the London hack-day was ‘Fact Cache‘ – a simple concept for sharing nuggets of information discovered in hack-day explorations. Often as developers work through datasets they come across discoveries of interest, yet these are often left aside in the rush to create a prototype app or platform. Fact Cache focussed on making these shareable. However, when it was presented, discussions also explored how it could make these nuggets of information into social objects, open to discussion and sharing. This idea of making open data findings more usable as social objects was also an aspect of the UN Global Pulse HunchWorks project. That project is currently on hold (it would be interesting to know why…), but the idea of supporting collaboration around open data through online tools, rather than seeing apps that present data or initial analysis as the end point, is certainly one to explore more in building capacity for open data to be used in holding actors to account.
  • Developing theories of change – As the judges met to talk about the projects, one of the key themes we looked at was whether each project had a clear theory of change. In some sense, taken together, they represent the complex chain of steps involved in an open data theory of change: from making data more accessible to developers, to creating tools and platforms that let end users explore data, and then allowing findings from data to be communicated and to shape discourses and action. Few datasets or tools are likely to be change-making on their own – but rather can play a key role in shifting the balance of power in existing networks of organisations, activists, companies and governments. Understanding the different theories of change for open data is one of the key themes in the ongoing Open Data in Developing Countries research, where we take existing governance arrangements as a starting point in understanding how open data will bring about impacts.
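
To make the ‘partial data’ point concrete, here is a toy reconciliation pass over invented figures: company-reported payments on one side, government-reported receipts on the other, where an absent record and a genuine mismatch must be treated differently.

```python
# Toy reconciliation of partial data: company-reported payments vs
# government-reported receipts, by licence. All names and figures invented.

payments = {"lic-001": 120_000_000, "lic-002": 95_000_000}   # company side
receipts = {"lic-001": 120_000_000, "lic-003": 40_000_000}   # government side

for licence in sorted(set(payments) | set(receipts)):
    paid, received = payments.get(licence), receipts.get(licence)
    if paid is None or received is None:
        status = "missing on one side - could be a data gap, not corruption"
    elif paid != received:
        status = f"mismatch of {abs(paid - received):,} - worth investigating"
    else:
        status = "reconciled"
    print(licence, status)
```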

In a complex world, access to data, and the capacity to use it effectively, are likely to be essential parts of building more accountable governance across a wide range of areas, including in the extractives industry. Although there are many challenges ahead if we are to secure the maximum benefits from open data for transparent and accountable governance, it’s exciting and encouraging to see so many passionate people putting their minds early to tackling them, and building a community ready to innovate and bring about change.

Note: The usage of ‘follow the data’ in this DFID project is distinct from the usage in the work I’m currently doing to explore ‘follow the data’ research methods. In the former, the focus is really on following financial and resource flows through connecting up datasets; in the latter the focus is on tracing the way in which data artefacts have been generated, deployed, transferred and used in order to understand patterns of open data use and impact.

This entry was originally posted on http://www.timdavies.org.uk. Tim Davies is the co-director of Practical Participation, a Web Science/Social Policy PhD student at the University of Southampton, and open data research coordinator with the Web Foundation.

Geeks try their hand at mining…data – Guest Blog by Luke Balleny, Thomson Reuters Foundation

A man types on a computer keyboard in Warsaw, on February 28, 2013. REUTERS/Kacper Pempel

Stepping into the cavernous room in the heart of London’s West End last Saturday, I felt very much out of my comfort zone. While everyone there spoke fluent English, the room was filled with people who speak a different language from mine – the language of computer code.

I had come to report on and judge an event called “Follow the Data Hack Day: London”, part of a much-needed effort to help extractive transparency campaigners and the media to make sense of the impending tsunami of data that will come out of the oil, gas and mining industry in 2014. (If you’re wondering, there’s more on what a “hack day” is just a few paragraphs down.)

The U.S. has passed a law that forces oil, gas and mining companies listed on U.S. stock exchanges to publish the royalties and taxes that they’ve paid to the governments of the countries in which they operate. The payments will be listed on a project-by-project basis and cover anything over $100,000. The European Union is expected in June to sign off on a similar law that will cover European extractive companies.

While transparency campaigners have lobbied long and hard for these two laws, most would acknowledge that they are not yet ready for what is expected to be an enormous amount of data. In what format will the data arrive? Will it be easily comparable across countries and across projects? Are the campaigners numerically literate enough to be able to interpret the data? “Follow the Data Hack Day”, organised by professional hack day coordinators Rewired State, was set up to address these very issues.

For the uninitiated, the concept of a hack day, or “hackathon”, is to bring together a group of technology developers for an intense 24-hour period (sometimes longer) during which the developers are tasked with creating a piece of software or an app that addresses a particular challenge. The objective of the “Follow the Data Hack Day” was to create something that would either help alleviate poverty in resource-rich countries or highlight discrepancies in the extractive data that could point to possible corruption.

GEEKS LEARN GOVERNANCE

The day started with introductions from Rewired State and the UK’s Department for International Development (DFID) and was quickly followed by short speeches from the judges as to what they were looking for in the apps. It should be noted that the developers were not campaigners or governance experts, so in order to help them understand why their apps would be important, they received a crash course in extractive transparency, corruption and the resource curse from Marinke van Riet, the international director of the Publish What You Pay (PWYP) coalition and a member of the judging panel.

Following the introductions and the setting of the challenge, I left to enjoy the rest of my Saturday while the developers toiled away on their projects.

A little over 24 hours later I returned to find the hackers finishing up their projects. In line with the cliché surrounding late-night computer programming, large quantities of pizza had been consumed as they worked long into the night. And while I wasn’t there to see it for myself, I was assured that some of the developers hadn’t even left the room, simply laying sleeping bags down on the floor to save crucial time as they raced against the clock.

The 10 hackers had initially split themselves into two teams of five, but one of the hackers split off to pursue his ideas by himself.

Perhaps surprisingly, the hackers created three very different products that they presented to the audience of judges, transparency campaigners and fellow hackers.

The first was a website that allowed the user to compare and contrast data from the oil industry across countries, including figures such as oil output, price and estimated barrels of oil still to be explored. While not particularly impressive visually, the technical work behind the site’s front end was what drew the plaudits.

The second team to present their app was the only team to directly attempt to meet the challenge set by the organisers, in that they made an effort to analyse the relationship between resource wealth and poverty via a number of interesting and insightful graphs.

The third presentation was by the individual who had split off from one of the other teams. His skills were in web design as opposed to coding and developing, and this showed in the result. He had designed a website that would allow campaigners to upload pithy facts about the extractive industry and either share the entire website or tweet each fact individually. While there was no specific data component to it, it looked good and was the only project which could have been used right away.

The results of the hackathon may not have solved the resource curse, but they were a fascinating look at what a group of talented hackers could do given 24 hours and a few spreadsheets-worth of data. If their efforts and their creativity are representative of what is possible once extractive companies are forced to publish what they pay in 2014, the extractive transparency movement should have no problem holding governments and extractive companies to account.

Drilling Down into the Data – Harriet Macdonald-Walker, Department for International Development

Last weekend (4-5th May) the Follow the Data journey began. DFID, alongside Rewired State, the World Bank Institute, the Extractive Industries Transparency Initiative (EITI) and the Revenue Watch Institute, hosted our first hackathon.

We welcomed 10 developers to the Hub Westminster in London, and asked them to spend the weekend working alongside extractive industry experts, open data specialists and DFID economists to transform publicly available global extractives data into user-friendly data apps and tools.

A few days before the hackathon, we gave the developers three challenges and access to a database of extractives data, providing them with the time needed to get to grips with the data and to begin thinking about how they could tackle these challenges.

The first day began with a morning of introductions aimed at briefing the developers on the kind of products we were looking for. Presentations included:

  1. ‘Linking the extractives industry to poverty alleviation’ – DFID representatives
  2. Introductions from the judges, including data-user perspectives on the importance of creating data tools – Marinke van Riet, International Director of PWYP; Naomi Smith, Earth Security Initiative; and Luke Balleny, Thomson Reuters Foundation
  3. ‘Using trade to examine the new political economy of resources’ – Jaakko Kooroshy and Felix Preston, Senior Research Fellows, Chatham House

The rest of the first day was spent analysing and sorting through the data available, brainstorming ideas and settling down into teams.
With decisions made and the sun setting over London, fuelled by pizza and Coke, the developers worked into the night…

Sunday 5th May
After just a few hours’ sleep at the Hub, and with time running out, the developers remained focused, heads down, working hard to finalise their products before ‘show and tell’.
At 3.30pm the room began to fill with journalists, extractives and open data experts, and other developers – all eager to see what had been created in the last 24 hours. Among those attending were representatives from Global Witness, PWYP, ONE, Thomson Reuters Foundation, Westminster Foundation for Democracy, and Synergy Global.

The developers did not disappoint, presenting us with three impressive apps:

Team 1: API-fy

The team created an API (Application Programming Interface) that draws data from a variety of datasets and holds it in a central repository. Using the API, the team then created virtual country and company cards, almost like Top Trumps; this enabled clear comparisons to be made across countries and companies. One particular strength of the product was that it identified and highlighted gaps where more data is needed.

Team 2: Comparethemap.com

The team explored the relationship between the extractives industry and poverty alleviation, using data from GapMinder and EITI to create graphs and maps which enabled citizens to compare extractives revenues with poverty indicators.

The concept of a game that would enable similar comparisons was also presented, but the developers were unable to build it over the weekend due to challenges encountered in data manipulation.
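
As a hedged illustration of the comparison Comparethemap.com set out to make, the sketch below correlates an extractives indicator with a poverty indicator across countries. The column names and figures are invented stand-ins for the GapMinder and EITI data, and with data this sparse the number is a prompt for investigation, not a finding.

```python
# Eyeballing a correlation between an extractives indicator and a poverty
# indicator, on deliberately incomplete data.

import pandas as pd

df = pd.DataFrame({
    "oil_revenue_per_capita": [120.0, None, 310.0, 45.0, 88.0],
    "under5_mortality":       [98.0, 110.0, None, 60.0, 75.0],
})

paired = df.dropna()  # keep only countries observed on both sides
print(paired["oil_revenue_per_capita"].corr(paired["under5_mortality"]))
```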

Team 3: Fact Cache
A clever yet simple approach: this ‘team’, or impressive one-man band, designed an online platform that held ‘facts’ taken from extractives and poverty-alleviation data. The idea behind the webpage was to focus on one set of data or one event. Visitors to the site would be encouraged to analyse data and submit their ‘facts’, which could be easily disseminated via social media platforms. There was discussion around the validity of these facts and how the concept could be developed further, perhaps via a fact-quality voting function.
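
The dissemination step is easy to sketch. The snippet below builds a Twitter web-intent link that pre-fills a tweet with a fact; the fact text and source URL are invented for the example.

```python
# Fact Cache's sharing step, sketched: turn a data 'fact' into a link that
# pre-fills a tweet when opened in a browser.

from urllib.parse import urlencode

def tweet_link(fact: str, source_url: str) -> str:
    """Build a Twitter web-intent URL that shares the fact."""
    return "https://twitter.com/intent/tweet?" + urlencode(
        {"text": f"{fact} {source_url}"}
    )

print(tweet_link("Country X reported $1.2bn in oil receipts in 2011.",
                 "http://example.org/facts/42"))
```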

And the prizes went to…
After much deliberation, our judges (now joined by Tim Davies, Practical Participation, and Justine De Davila, DFID) presented the following awards:

Winner of ‘Best Concept’: Fact Cache
Winner of ‘Best Potential Impact’:  API-fy
Winner of ‘Best Response to the Challenge’: Comparethemap.com

Thanks to the enthusiasm and expertise of all those who attended, the weekend was a success. I am looking forward to seeing what is produced next week at our second Follow the Data hackathon in Lagos (May 13-14th).

Transforming lives… by Lizzy Whitehead, UK Department for International Development

I’m inspired by the notion that we’re all surrounded by ‘things’ which could transform into something entirely different, and better.

Extracting oil, gas and minerals is based on the theory that you take one thing (e.g. oil) and turn it into something else (kerosene). Or you take natural resources below the ground and turn them into a profitable resource above ground. Decision makers in resource rich developing countries are faced with many challenges, many transformational choices. To extract or not? Spend profits or save? Encourage foreign investors or not? Choices are made all along the resource chain, from geo-mapping to licenses to investment choices and public spending – all of which will have an impact on poor people living in resource rich countries.

For the last few months I’ve been inspired by a group of people (Extractive Industries Transparency Initiative, Revenue Watch Institute, UK Department for International Development, Rewired State and the World Bank Institute). We’ve come together to work on a challenge – to take some of the complexity described above and find ways of making it meaningful to poor people. We want to take complex ‘data’ on oil, gas and mining and create tools to transform it into stories. Stories which make links between extractives (oil, gas and mining) and poor people living in resource rich countries.

We want to turn the data into something meaningful so that people can start to demand accountability from their governments. We hope to inspire a new generation of ‘story tellers’ who can take messages to a wide range of people in a way which makes sense to them. For some, that might mean transforming data into a programme for a website. Web tools like BudgIT in Nigeria allow people to analyse complex information and turn it into something visual and compelling. Data visualisations like this one by Global Witness and the Open Knowledge Foundation turn complex issues into compelling narratives.

Our goal is to show people how to make the most of the data available on oil, gas and mining. We want to create demonstration tools which allow people to tell a story about the data. We want to bring those tools to the right people – like National Coordinators who lead Extractive Industries Transparency Initiative processes all over the world. We also want to work with journalists and civil society groups in resource rich developing countries. They can start to share data stories and build an interest in transparency and accountability.

Our data tools are being created through a range of #followthedata events in London, Lagos and Sydney between now and June 2013. We’ll be showcasing our journey and our products at the Mining for Development Conference, the Extractive Industries Transparency Initiative Global Conference and the G8, where the UK will shine a light on transparency.

We’re in an era of transformational technology. More and more organisations are investing in technology to raise standards of transparency – like Global Integrity, the Omidyar Network and Random Hacks of Kindness.

Follow our journey, or better still, be part of it. Transform the future of transparency and accountability.

Follow the Money, Follow the Data – Guest blog by Martin Tisne

Some thoughts which I hope may be helpful in advance of the ‘follow the data’ hack day this weekend:

The open data sector has quite successfully focused on socially relevant information: fixing potholes à la http://www.fixmystreet.com/, adopting fire hydrants à la http://adoptahydrant.org/. My sense is that the next frontier will be to free the data that can enable citizens, NGOs and journalists to hold their governments to account. This will likely mean engaging in issues such as data on extractives transparency, government contracting, political finance, budgeting and so on. So far, these are not the bread and butter of the open data movement (which isn’t to say there aren’t great initiatives like http://openspending.org/). But they should be:

At its heart, this agenda revolves around ‘following the money’. Without knowing the ‘total resource flow’:

• Parents’ associations cannot question the lack of textbooks in their schools by interrogating the school’s budget
• Healthcare groups cannot access data on local spending on doctors and nurses
• Great orgs such as the Open Knowledge Foundation or BudgIT cannot get the data they need for their interpretative tools (e.g. budget tracking tools)
• Investigative journalists cannot access the data they need to pursue a story

Our field has sought to ‘follow the money’ for over two decades, but in practice we still lack the fundamental ability to trace funding flows from A to Z, across the revenue chain. We should be able to get to what aid transparency experts call ‘traceability’ (the ability to trace aid funds from the donor down to the project level) for all, or at least most, fiscal flows.
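
In data terms, traceability amounts to walking a chain of linked transaction identifiers from a project’s spending back to the original donor. The sketch below shows that walk over a toy dataset; every identifier is invented for illustration.

```python
# 'Traceability' sketched as a walk up a chain of linked transaction
# identifiers, the way IATI-style data links activities together.

FLOWS = {  # each transaction names the upstream transaction it draws on
    "donor-grant-7": None,
    "ministry-allocation-3": "donor-grant-7",
    "project-spend-9": "ministry-allocation-3",
}

def trace(tx_id: str) -> list:
    """Follow upstream links until the original source of funds."""
    chain = [tx_id]
    while FLOWS.get(chain[-1]) is not None:
        chain.append(FLOWS[chain[-1]])
    return chain

print(" -> ".join(reversed(trace("project-spend-9"))))
# donor-grant-7 -> ministry-allocation-3 -> project-spend-9
```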

Open data enables this to happen. This is exciting: it’s about enabling ‘follow the money’ to happen at scale. Up until now, ‘following the money’ has been the fruit of the hard work of investigative journalists, in isolated cases.

If we can ensure that data on revenues (extractives, aid, tax etc), expenditures (from planning to allocation to spending to auditing), and results (service delivery data) is timely, accessible, comparable and comprehensive, we will have gone a long way to helping ‘follow the money’ efforts reach the scale they deserve.

Follow the Money is a pretty tangible concept (if you disagree, please let me know!) – it helps demonstrate how government funds buy specific outcomes, and how/whether resources are siphoned away. We now need to make it a reality.

Written by Martin Tisne – Original: http://tisne.org/2013/05/01/follow-the-money-follow-the-data/