Size doesn’t matter

…at least when it comes to information!

As the statistical nerds amongst you may be aware, there is a bit of a backlash going on against “big data”, fueled partly by the discovery of the hubris in Google’s attempts to predict flu outbreaks. Tim Harford is thankfully on the front line, as is the Economist. Health sector colleagues have also focused on the particular limitations of applying the “big data” philosophy to health.

The conclusion these critics have reached is one that anyone with some knowledge of statistics had probably already drawn: except for a few very specific purposes, there is very little to be gained from using “big” datasets. Once you have a good conceptual understanding of what is generating your data, the value of each additional data point drops off steeply after the first 10 or 20.

This can be demonstrated using an example from physics. Start off by talking to your friendly neighbourhood physicist, and she will tell you that there is a law to explain the relationship between the temperature of a strip of copper and its length. Armed with this knowledge you can then perform an experiment to test it, observing a strip of metal at different temperatures. This will give you a graph that might look like this:


With only 10 data points it is trivial to verify the prediction of how copper will expand. Another 10 is not going to change your conclusion, nor make you much more certain about it:
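A small simulation makes the same point. This is only a sketch under stated assumptions: the expansion coefficient is an approximate textbook value for copper, and the strip length, temperature range and measurement noise are made up for illustration. It fits a straight line to 10 simulated measurements, then to 20, and compares the estimated expansion rate and its uncertainty.

```python
import random

# Assumptions for illustration only (not taken from a real experiment):
ALPHA = 17e-6   # approximate linear expansion coefficient of copper, per degree C
L0 = 1.0        # assumed strip length in metres at the reference temperature
T_REF = 20.0    # assumed reference temperature in degrees C
NOISE = 2e-6    # assumed measurement noise (standard deviation) in metres


def measure(temps, rng):
    """Simulate noisy length readings using L = L0 * (1 + ALPHA * (T - T_REF))."""
    return [L0 * (1 + ALPHA * (t - T_REF)) + rng.gauss(0, NOISE) for t in temps]


def fit_slope(xs, ys):
    """Ordinary least squares: return the slope and its standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    residual_var = sum(r * r for r in residuals) / (n - 2)
    return slope, (residual_var / sxx) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# 10 and 20 evenly spaced temperatures over the same 20-110 degree range
temps10 = [20 + 90 * i / 9 for i in range(10)]
temps20 = [20 + 90 * i / 19 for i in range(20)]

slope10, se10 = fit_slope(temps10, measure(temps10, rng))
slope20, se20 = fit_slope(temps20, measure(temps20, rng))

print(f"true expansion rate: {ALPHA * L0:.3e} m per degree")
print(f"10 points: {slope10:.3e} +/- {se10:.1e}")
print(f"20 points: {slope20:.3e} +/- {se20:.1e}")
```

Under these assumptions both fits should land very close to the true rate. Doubling the number of points narrows the uncertainty only modestly, roughly in proportion to the square root of the sample size, which is why the second batch of readings adds so little.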


Much of the noise about big data has come from the IT industry, for which big data does present some non-trivial problems. For example, there is the oft-mentioned claim that Rolls-Royce jet engines generate hundreds of gigabytes of data every second. Getting this data off a plane, storing it somewhere and feeding it through a system for detecting malfunctions is a real feat of computing.

I’m not going to reiterate the flaws in the unconditional claims made by boosters of Big Data, as the others mentioned above have done this much more eloquently. My plea is much more practical and relates to something we all have a personal interest in.

The scandal surrounding  quite reasonably frightened a lot of people. Most people in the UK view their medical records as very personal information and the way the potential re-use of this information was presented left a lot to be desired.

However, the real tragedy of this episode was that it delayed for years an innovation that could be the most powerful force for improving the effectiveness of the NHS and reducing its costs. This innovation is linked data. It is not a complicated idea: it simply means joining up records so that information about how someone is treated in one place is connected to how they are treated in another.
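To make the idea concrete, here is a minimal sketch of linking records from two care settings. Everything in it is hypothetical: the records are made up, and the field names and the shared `patient_id` key are assumptions for illustration, not a description of how any NHS system actually stores or links data.

```python
from collections import defaultdict

# Made-up extracts from two care settings, sharing a hypothetical patient_id key.
hospital_episodes = [
    {"patient_id": "P001", "discharged": "2014-03-02", "ward": "cardiology"},
    {"patient_id": "P002", "discharged": "2014-03-05", "ward": "orthopaedics"},
]
community_visits = [
    {"patient_id": "P001", "visit": "2014-03-20", "service": "district nursing"},
    {"patient_id": "P003", "visit": "2014-03-11", "service": "physiotherapy"},
]


def link_records(episodes, visits):
    """Group records from both settings into one journey per patient."""
    journeys = defaultdict(lambda: {"hospital": [], "community": []})
    for episode in episodes:
        journeys[episode["patient_id"]]["hospital"].append(episode)
    for visit in visits:
        journeys[visit["patient_id"]]["community"].append(visit)
    return dict(journeys)


journeys = link_records(hospital_episodes, community_visits)

# A question neither dataset can answer alone:
# who left hospital but has no recorded community follow-up?
no_follow_up = sorted(
    patient_id
    for patient_id, journey in journeys.items()
    if journey["hospital"] and not journey["community"]
)
print(no_follow_up)
```

The datasets here are tiny. The value comes from the join, which lets a provider see a gap in one patient's journey that is invisible from either setting on its own.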

As described in the article in the Health Service Journal by Axel Heitmueller and Sandy Pentland (linked above, and again), joining up multiple data sets, in particular across different care settings (for example acute hospitals and community providers), brings many more benefits than the ‘big’ data that is commonly shouted about. The aim of was really to facilitate medical research. However, for the NHS at the moment it is much more important that current treatments are delivered more efficiently. Rather than glamorous product innovation to create new treatments, this means process innovation: making things work better.

The primary benefit of linked data is in helping healthcare providers and regulators better understand patterns of healthcare and provide a more seamless journey for us, the patients. Everyone has an incentive for this to happen. Patients obviously have a better time if the people caring for them can talk to each other and provide a smooth journey between services. And, as documented by the vast literature on Lean, providers invariably save money when customers/patients have a better journey.

Rather than being distracted on the one hand by silly market fads about big data, and on the other by arguments over the merits of sharing medical data, let’s all demand something that undeniably benefits patients and also has the potential to save the NHS a lot of money.

Here is a message to go and repeat wherever you can – “link my data!”


