The way data is optimised is changing. It is no longer enough to know single lines of information: data must be connected and multi-layered to be relevant. That means knowing not one thing, or ten things, or even 100 things about consumers, but tens and hundreds of thousands of things. It is not big data but rather connected data – the confluence of big data and structured data – that matters. Furthermore, with the growth of social tools, applications and services, the data in the spider’s web of social networks will release greater value. In the UK alone, YouGov now knows 120,000 pieces of information about over 190,000 people, and this is being augmented every day. Analysis of this data allows organisations, both public and private, to shape their strategy for the years ahead.
We are also growing a huge data store of over a million people’s opinions and reported behaviours. These are explicitly shared with us by our panellists for commercial use as well as for wider social benefit (indeed, we pay our panellists for most of the data they share).
But many companies exploit data that has been collected without genuine permission; it is used in ways that people do not realise, and might object to if they did. This creates risks and obstacles to optimising the value of all data, and failure to address it will undermine public trust. We all have the right to know what data others hold about us and how they are using it, so effective regulation on transparency and the use of data is needed. Europe is leading the way in this respect.
Governments, however, are the richest sources of data, accounting for the largest proportion of organised human activity (think health, transport, taxation and welfare). Although the principle that publicly funded data belongs to the public remains true, certainly in the UK, we can expect to see more companies working with, through and around governments. Having the largest coherent public-sector datasets gives Britain huge advantages in this new world.
It is clear that encouraging business innovation through open data could transform public services and policy making, increasing efficiency and effectiveness. The recent Shakespeare Review found that data has the potential to deliver a £2bn boost to the economy in the short term, with a further £6-7bn down the line. However, the use of public data becomes limited when it involves private companies. To address this, when companies pitch to work with governments, preference should be given to those that have an open data policy, or at least one covering the relevant parts of their work. Furthermore, where there is a clear public interest in wide access to privately generated data – such as trials of new medicines – there is a strong argument for even greater transparency.
Aside from governments (whose data provision is by no means perfect), access to large, cheap data sets is difficult. The assumption is that everything is available for crunching and that the crunching will be worth the effort. In reality, there are different chunks of big data – scientific, business and consumer – which are collected, stored and managed in multiple ways. Access to relevant information, let alone the crunching of it, will take some doing. On top of this, much corporate and medical data is still locked away, stuck on legacy systems that will take years to unpick. Many would say the sensible thing is to adopt a policy of standardisation, particularly for the medical industry, given the growing number of patients living with complex long-term conditions. And yet competing standards abound. So in addition to regulation around transparency, over the next ten years we can expect to see agreement on standardisation in key areas.
But the potential benefits of this wealth of information are only available if there are the skills to interpret the data. Despite Google’s chief economist, Hal Varian, saying that “the sexy job of the next ten years will be statisticians”, number crunchers are in short supply (or at least not always available in the right locations at the right time). By 2018 there will be a “talent gap” of between 140,000 and 190,000 people, says the McKinsey Global Institute. This shortage of analytical and managerial talent is a pressing challenge, one that companies and policy makers must address.
Separately, it is entirely plausible that the infrastructure required for the storage and transmission of data may struggle to keep pace with the increasing amounts of data being made available. Data generation is expanding at an eye-popping pace: IBM estimates that 2.5 quintillion bytes are being created every day and that 90% of the world’s stock of data is less than two years old. A growing share of this is being kept not on desktops but in data centres such as the one in Prineville, Oregon, which houses huge warehouses containing rack after rack of computers for the likes of Facebook, Apple and Google. These buildings require significant capital investment and even more energy. Locations where electricity generation can be unreliable or where investment is limited may be unable to process data effectively and convert it into useful, actionable knowledge. Yet it is the growing populations in these same areas – parts of Asia and Africa, for example – that will accelerate data creation, as more of their inhabitants develop online activities and exhibit all the expected desires of a newly emerging middle class. How should this be managed?
Shakespeare Review: An Independent Review of Public Sector Information, May 2013