Looking ahead, I believe the best approach to future-proof access to big data is to ensure there is agreement around its use, not its collection. Governments should define a core reference dataset, designed to strategically identify and combine the data that is most effective in driving social and economic gain. This will then become the backbone of public sector information, making it possible for other organisations to discover innovative applications for information that were never considered when it was collected.
This approach has the potential for huge societal benefit. The shorter-term economic advantages of open data clearly outweigh the potential costs. A recent Deloitte analysis quantifies the direct value of public sector information in Britain at around £1.8bn, with wider social and economic benefits taking that up to around £6.8bn. Even though these estimates are undoubtedly conservative, they are quite compelling.
And yet, at the same time, individuals need to be protected. There are instances where, for very good reasons, ‘open’ cannot be applied in its widest context. I therefore suggest we acknowledge a spectrum of uses and degrees of openness.
For example, with health data, access even to pseudonymous case level data should be limited to approved, legitimate parties whose use can be tracked (and against whom penalties for misuse can be applied). Access should also be limited to secure sandbox technologies that give access to researchers in a controlled way, while respecting the privacy of individuals and the confidential nature of data. Under these conditions, we can create access that spans the whole health system, more quickly and to more practitioners, than is currently the case. The result: We gain the benefits of ‘open’ but without a significant increase of risk.
Nor should we consider ‘free’ (that is, at marginal cost) to be the only condition that maximises the value of public information. There may be some particular cases when greater benefits accrue to the public with an appropriate charge. Finally, as big data unquestionably increases the potential of government power to accrue unchecked, rules and regulations should be put in place to restrict data mining for national security purposes.
We will also have to look at how we focus resources within academia. The massive increase in the volume of data generated, its varied structure and the high rate at which it flows have led to the development of a new branch of science – data science. Many existing businesses will have to engage with big data to survive. But unless we improve our base of high-level skills, few will have the capacity to create new approaches and methodologies that are simply orders of magnitude better than what went before. We should invest in developing real-time, scalable machine learning algorithms for the analysis of large data sets, to provide users with the information to understand their behavior and make informed decisions.
We should of course strive for an increased shift in capital allocations by governments and companies to support the development of efficient energy supply and robust infrastructure. These investments can prepare us for serving continued growth in world productivity – and help offset the increasing risk of the massive, destructive disruptions in the system that will inevitably come with our growing dependency on data and data storage.
Innovation in storage capabilities should also be considered. Take legacy innovation, for example. The clever people at CERN use good old-fashioned magnetic tape to store their data, arguing that it has four advantages over hard disks for the long-term preservation of data: speed (extracting data from tape is about four times as fast as reading from a hard disk); reliability (when a tape snaps, it can be spliced back together; when a terabyte hard disk fails, all the data is lost); energy conservation (tapes don’t need power to preserve data held on them); and security (if the 50 petabytes of data in CERN’s data centre were stored on disk, a hacker could delete it all in minutes; to delete the same amount from the organisation’s tapes would take years).
The key thing to remember is that numbers, even lots of numbers, simply cannot speak for themselves. In order to make proper sense of them we need people who understand them and their impact on the world we live in. To do this we need to massively spread academia vertically and horizontally, engaging globally at all levels, from universities to government to places of work. The current semi-fractured structure of academia is actually an advantage; it will help us ensure plurality of ideas and approaches. Remember, we’re not just playing with numbers; we’re dealing with fundamental human behaviors. We need philosophers and artists as well as mathematicians, and we must allow them to collectively develop the consensus.
If we get it right, over the next 10 years I would expect to see individuals being more comfortable with living in the metaphorical glass house, allowing their personal information to be widely accessible in return for the understanding that it will enable them to enjoy a richer, more ‘attuned’ life. I would also expect to see a maturing of our individual data usage, a coming of age with regard to appreciating and integrating data, and less of a fascination with its very existence. We will also perhaps see a new segment appearing: those who elect to reduce their data noise by avoiding needless posts of photos of their lunch and such.
We will also see a structural shift in employment, markets and economies as the focus in maturing economies continues to shift away from manufacturing and production and toward a new tier of data-enabled jobs and businesses. As we demand more from our data, we will need to match it with a skilled workforce that can better exploit the information available.
After all the noise, perhaps it would be wise to remember that big data, like all research, is not a crystal ball and statisticians are not fortune tellers. More information, and the increasing ability to analyse it, simply allows us to be less wrong. I believe that we will have continued growth in world productivity, probably accelerating over the next ten years, even as the risk of massive destructive disruptions in the system increases. There will be huge challenges and even dangers, but I am confident we will be the better for it. Every time humans have faced a bigger crisis, they have emerged stronger. Although we can’t be sure that this will always be the case, now is the time to be bold and ambitious.