Could Big Data provide alternative measures of poverty and welfare?
This is the fifth in a series of blogs that debate how a post-2015 framework ought to measure poverty.
Emmanuel Letouzé is a PhD Candidate at UC Berkeley and a regular consultant for the UN and the OECD, currently serving as a Non-Resident Adviser at the International Peace Institute and an adviser on Big Data for the OECD-Paris 21 initiative. He previously worked as senior development economist on the UN Global Pulse team where he wrote 'Big Data for Development: Challenges and Opportunities'.
‘Google knows more, or is in a position to know more, about France than INSEE [National Institute of Statistics and Economic Studies]’, two French scientists wrote in an op-ed published in Le Monde in January. In the context of developing countries, the question raised by this bold claim is: could Big Data help us know more about poverty and welfare, including, or perhaps especially, in places where the dearth of traditional data often turns poverty monitoring and forecasting into an exercise in guesstimation? Could the Big Data revolution contribute to fixing part of the ‘statistical tragedy’?
The underlying argument is that these new kinds of data, stemming from individuals and communities as they go about their daily lives, contain insights into their experiences that we can mine to help them in return. This idea can be traced back to a much-cited 2009 paper, which found that light emissions picked up by satellites could track GDP growth.
Since then, widely cited evidence that Internet-based data could be used to monitor inflation in real time, allow digital disease detection, construct economic indicators to forecast the present, and build a ‘real-time growth index’, among many other applications, has given weight to the promise. Cell-phone Call Detail Records (CDRs), which capture the time, location, recipients’ location, etc. of each call, have also helped model malaria spread, unveil reciprocal giving in the aftermath of disasters, and study internal migration.
So it seems only logical, and very appealing, to claim that the same data and tools could be deployed to monitor poverty, and may even be conducive to a leap-frogging of statistical systems. Although the term Big Data is absent from the report of the High-Level Panel on the post-2015 framework, it is hard not to read it between the lines of the development data revolution it sketches.
But conceptual clarity, practical guidance, ethical considerations and innovative foresight have too often been lacking, leaving an open field for sceptics who have long stressed the risks and challenges of Big Data or insisted that the real revolution is small data (or long data). Findings that Google got flu wrong this year in the US have cast additional doubt on Internet-based data’s reliability, representativeness, and thus relevance, to inform policy decisions, while the revelations about PRISM have raised concerns over privacy to a whole new level. But recent publications and debates have shed direct light on some of the specific promise, challenges and requirements of leveraging Big Data to improve current, and perhaps develop alternative, measures of poverty and welfare.
In particular, a paper showed that cell-phone records from a major city in Latin America could help predict socioeconomic levels, poverty’s first cousins. This was done by matching CDR-inferred behavioural data and official statistics on socio-economic levels, using supervised machine learning techniques to unveil how differences in socioeconomic levels typically ‘showed’ in cell-phone data, and vice versa. This example illustrates a key and seemingly purpose-defeating requirement for developing models and algorithms able to translate digital data into indicators of the social world: the availability of ‘ground truth’ indicators of the social world (such as survey data) used to build and validate the models.
But this does not mean that Big Data is useless, or rather superfluous, in such contexts: indeed, assuming a sufficiently high and time-resistant level of accuracy (internal validity), CDR data would then provide some sense of changes in socio-economic levels that would not get captured until the next official survey.
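The matching step described above can be sketched in a few lines. This is only an illustration under loud assumptions: the data are synthetic, the two CDR-derived features (`calls_per_user`, `outside_share`) are hypothetical, and the nearest-centroid classifier is a stand-in chosen for brevity, not the method used in the paper. The point is the workflow: fit on areas where survey ‘ground truth’ exists, validate on held-out areas, then use fresh CDR features to estimate levels between surveys.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the toy example is reproducible

def make_area(level):
    """One synthetic area: ((calls_per_user, outside_share), survey level).

    Both features and their link to the level (0 = low, 1 = high) are
    invented for illustration; real CDR features would differ.
    """
    calls_per_user = random.gauss(4.0 + 3.0 * level, 1.0)
    outside_share = random.gauss(0.2 + 0.3 * level, 0.08)
    return (calls_per_user, outside_share), level

areas = [make_area(i % 2) for i in range(200)]
train, held_out = areas[:150], areas[150:]

def fit(rows):
    """Learn one average feature vector (centroid) per socioeconomic level."""
    columns = list(zip(*(features for features, _ in rows)))
    scales = [pstdev(col) for col in columns]  # put features on a common scale
    centroids = {}
    for level in sorted({lvl for _, lvl in rows}):
        members = [features for features, lvl in rows if lvl == level]
        centroids[level] = [mean(col) for col in zip(*members)]
    return centroids, scales

def predict(model, features):
    """Assign the level whose centroid is closest in scaled distance."""
    centroids, scales = model
    def distance(level):
        return sum(((x - c) / s) ** 2
                   for x, c, s in zip(features, centroids[level], scales))
    return min(centroids, key=distance)

model = fit(train)

# Validation against survey 'ground truth' on areas not used for fitting.
hits = sum(predict(model, features) == lvl for features, lvl in held_out)
accuracy = hits / len(held_out)

# Between surveys: estimate the level of an area from new CDR features alone.
estimate = predict(model, (6.8, 0.47))
print(f"held-out accuracy: {accuracy:.2f}, estimated level: {estimate}")
```

The held-out accuracy is what stands in for internal validity here; without the survey labels there would be nothing to fit or to validate against.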
The problem is evidently more acute in places where no such data exist, i.e. precisely where alternative indicators are most needed. One avenue would be to apply ‘matching’ rules developed elsewhere to local CDRs. But the resulting ‘alternative’ indicators will be highly conjectural because the underlying algorithm may not pass the test of external validity: applying a model matching CDRs and socio-economic levels developed using CDRs and Demographic and Health Surveys (DHS) data from Côte d’Ivoire to a neighbouring country may yield misleading values because of cross-country differences. In such a case, the question is: is any data better than no data at all?
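The external-validity problem can be illustrated with a toy simulation. Everything here is assumed for the sake of the example: the two ‘countries’ are synthetic, the single feature (calls per user) is hypothetical, and the only cross-country difference is a higher baseline call volume in country B (as might happen with cheaper tariffs). A threshold rule learned in country A is then applied unchanged to country B.

```python
import random

random.seed(1)  # fixed seed so the toy example is reproducible

def simulate_country(n_areas, baseline_calls):
    """Synthetic areas as (calls_per_user, level), level 0 = low, 1 = high."""
    rows = []
    for i in range(n_areas):
        level = i % 2
        rows.append((random.gauss(baseline_calls + 3.0 * level, 1.0), level))
    return rows

def fit_threshold(rows):
    """Place the cut-off halfway between the mean call volume of each level."""
    low = [x for x, lvl in rows if lvl == 0]
    high = [x for x, lvl in rows if lvl == 1]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2

def accuracy(threshold, rows):
    """Share of areas whose level is correctly recovered by the cut-off."""
    hits = sum((x > threshold) == bool(lvl) for x, lvl in rows)
    return hits / len(rows)

country_a = simulate_country(400, baseline_calls=4.0)
country_b = simulate_country(400, baseline_calls=7.0)  # same gap, higher baseline

threshold = fit_threshold(country_a[:300])       # learned with country A surveys
acc_home = accuracy(threshold, country_a[300:])  # held-out areas in country A
acc_abroad = accuracy(threshold, country_b)      # transplanted to country B

print(f"accuracy in A: {acc_home:.2f}, accuracy in B: {acc_abroad:.2f}")
```

In this contrived setting the rule works well at home and degrades sharply abroad, because most low-level areas in country B sit above a cut-off calibrated for country A. Real cross-country differences are unlikely to be this simple, which is why such transplanted indicators remain conjectural without local ground truth.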
Another recent paper studying the impact of biases in mobile-phone ownership on estimates of human mobility inferred from CDRs is also worth mentioning for two reasons. One is its key finding: that CDR-based estimates of mobility appeared to be surprisingly robust to substantial biases in phone ownership, which may turn out to be equally true for measures of welfare. The other is its research question and method: asking how accurate a picture of the social world some Big Data streams may paint, given, or in spite of, their inherent biases, drawing (again) on survey data as ‘ground truth data’.
Noteworthy investment and progress are also visible in the critical strand of research (and advocacy) on privacy-preserving analysis. In particular, researchers, also using CDRs for mobility analysis, have developed an algorithm that uses an emerging technique known as ‘differential privacy’, which injects ‘noise’ into the model at certain points in order to reduce the likelihood of individual re-identification.
Although not directly concerned with poverty, these papers are important because they point specifically to the methodological avenues and leads that need to be explored to develop privacy-preserving Big Data capacities that may, in time, help monitor poverty.
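To make the idea of noise injection concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook building block of differential privacy. It is not the algorithm developed by the researchers mentioned above. The region names and counts are made up, and the sketch assumes each subscriber is counted in exactly one region, so that adding or removing one person changes one count by at most 1 (sensitivity 1).

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_counts(true_counts, epsilon, rng, sensitivity=1.0):
    """Return counts with Laplace noise of scale sensitivity / epsilon.

    Smaller epsilon means more noise and stronger privacy protection.
    Assumes each individual contributes to exactly one count.
    """
    scale = sensitivity / epsilon
    return {region: count + laplace_noise(scale, rng)
            for region, count in true_counts.items()}

rng = random.Random(42)  # fixed seed only so the illustration is reproducible
true_counts = {"north": 1200, "centre": 3400, "south": 800}  # invented figures
noisy = private_counts(true_counts, epsilon=0.5, rng=rng)

for region in true_counts:
    print(region, true_counts[region], round(noisy[region], 1))
```

The released figures stay close to the true ones at this aggregate level, while the presence or absence of any single subscriber has only a bounded effect on what is published. In a real deployment the random source would not be seeded, and the privacy budget `epsilon` would have to be tracked across every release.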
It is also crucial to note that Big Data is not only about data production (and analysis), but also about data consumption (and exchange). If we care about adequately monitoring human welfare, we should account for the consumption of free data. Think of the hours spent on social media in cyber-cafés, and increasingly on cellphones, around the world, that provide a ‘consumer surplus’ not captured in any official statistics. The caveat may not apply to the poorest of the poor, but there is no reason to consider that a problem receiving increasing attention in developed countries is irrelevant to developing countries where Internet penetration is growing much faster. In other words: Big Data does not stand apart from the quantities and phenomena to be measured but adds to the measurement problem.
The related, and perhaps even more critical, point here is that the rise of data-driven activities is deemed to render GDP (and GDP per capita) less and less relevant over time as the measure of human welfare it was never intended to be. The argument that monetary poverty and GDP per capita are very crude indicators of human progress is not new, but Big Data may prove instrumental in devising true alternative measures.
In particular, the growing availability of such rich individual data about people’s behaviours and desires will offer new options for communities to capture, monitor and improve their own welfare in ways that may increase local empowerment through Big Data. This is very far from the misleading notion that Big Data is about offering a 30,000-foot view of the world.
A few take-away messages emerge. First, for the purposes of poverty monitoring or development more broadly, ‘Big Data’ is not about size, but about the qualitative nature of these data trails, which some refer to as ‘digital breadcrumbs’. Second, Big Data is not even primarily about the data but about the carefulness of their analysis, which requires even more, not less, contextual and ethnographic grounding. Third, Big Data is also about data consumption, not just production. Lastly, much more conceptual, empirical and methodological work is needed before Big Data can be leveraged concretely and safely for poverty monitoring; but Big Data may in time fundamentally change how we measure, and perhaps even fight, poverty.
Other contributions to our debate on measuring poverty come from Martin Ravallion on two goals for fighting poverty, Lant Pritchett on the case for a high global poverty line, Stephan Klasen's argument for internationally coordinated national poverty measurement, Sabina Alkire's proposal for a multidimensional poverty index post-2015 and Amanda Lenhardt on the need for disaggregated poverty measurement.