I’ve recently read two great books about Big Data as it’s something we’re thinking about more and more at Nesta when we design our research projects.

I started with The Signal and the Noise by Nate Silver, who is famous for correctly predicting the 2008 US election results in 49 of 50 states. He tells some interesting stories about the failure of prediction, for instance in foreseeing the global financial crisis or many recent major political changes. I found the chapters on predicting baseball outcomes, chess and poker quite tedious, whilst the ones on the weather and earthquakes were much more engaging. But things got really interesting for me about halfway through the book.

This is where Silver explains two very different approaches to understanding the world: Bayes versus Fisher. Fisher developed statistical significance testing and the set of statistical methods I relied on when I used to design, run and analyse large government sample surveys for a living (amongst other things). You collect information from a (large) sample, weight it back to the general population to make sure it's representative, report your confidence intervals (estimates of margins of error) and run statistical significance tests to check that the results (eg differences between men and women) are not just due to chance.
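To make that workflow concrete, here is a minimal sketch in Python with made-up survey numbers (not from any real survey, and with weighting left out for brevity): confidence intervals for each group, then a significance test on the gap between men and women.

```python
# A rough sketch of the Fisher-style survey workflow described above,
# using invented numbers purely for illustration.
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Hypothetical survey result: 220 of 500 men vs 260 of 500 women said "yes".
counts = [220, 260]
samples = [500, 500]

# 95% confidence intervals (the "margins of error") for each group.
for count, n in zip(counts, samples):
    low, high = proportion_confint(count, n, alpha=0.05)
    print(f"{count}/{n}: 95% CI {low:.3f} to {high:.3f}")

# Significance test: could the male/female difference just be sampling noise?
stat, p_value = proportions_ztest(counts, samples)
print(f"p-value: {p_value:.3f}")  # a small p-value suggests the gap is unlikely to be chance alone
```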

But Silver argues that it's better to use Bayes' theorem to think probabilistically about the world and to factor in prior probabilities, rather than assuming that Fisher's methods, used properly, will drive error towards zero. Fisher's methods rest on assumptions (eg that underlying uncertainties in measurements follow a bell curve/normal distribution), whereas Bayesian methods demand that you challenge those assumptions and stop to think about prior probabilities. So Bayes is better at taking account of messy real-world context, allowing us to strive to be less subjective rather than aiming at perfect objectivity. Fascinating stuff.
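As a purely illustrative sketch (my own numbers, not an example from the book), here is what factoring in a prior probability looks like with Bayes' theorem: even fairly strong evidence only moves you so far when the prior is low.

```python
# Bayes' theorem:
# P(hypothesis | evidence) = P(evidence | hypothesis) * P(hypothesis) / P(evidence)

def bayes_update(prior, likelihood, false_positive_rate):
    """Posterior probability of a hypothesis after positive evidence,
    starting from an explicit prior."""
    # Total probability of seeing the evidence, whether or not the hypothesis is true.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Made-up numbers: a test that is right 90% of the time, applied to
# something with a 1-in-100 prior probability of being true.
posterior = bayes_update(prior=0.01, likelihood=0.9, false_positive_rate=0.1)
print(round(posterior, 3))  # ~0.083 - the low prior keeps the posterior modest
```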

I then went on to read Big Data by Kenneth Cukier and Viktor Mayer-Schonberger. This is a great summary of the arguments made by proponents of big data. It uses lots of interesting examples of how big data and algorithms have transformed industries like retail, where the data knows exactly what we want to buy before we even know ourselves. Clearly, we need to get better at using big data to understand the world around us.

But my first caveat to all of this is that in some areas this is easier to do than in others. There is a lot of data out there on things like the price of flights and what books we buy. The power of big data, together with people willing to work for the common good, means we can now all help search medical scans for cancerous cells or search images of the skies for new stars. But big data is only powerful when a lot of data is available about the thing we are interested in. We may not be able to use big data to find out how much time people spend contributing to community projects, and sample surveys may still end up being the only way to do this. My second caveat is that we should make sure we don't throw out the causation baby with the correlation bathwater.

But what does all of this mean for how we should approach research in the big data era? We're now doing a couple of new pieces of research at Nesta to explore the power of big data in helping us understand public and social innovation. The first uses big data in healthcare to see what we can learn about the adoption of innovations by doctors; the second looks at what big data can tell us about the size and nature of the social economy in the UK. So watch this space.