TL;DR: In this blog post, we describe a custom clustering algorithm we designed to efficiently cluster grids into enumeration areas for grid-based sampling.
The DSEM team at IDinsight is the technical workhorse for project teams, and nearly every piece of technical work we do involves grouping things by some measure of similarity. Let me explain.
In our previous post, we examined how satellite imagery can be used in the social sector and how the MOSAIKS algorithm enables us to draw out “features” from these images without needing complex image-processing models. But the story doesn’t end with the algorithm.
Satellite imagery has become a valuable tool in global development, from environmental monitoring and disaster response to urban planning and agriculture. With more and more high-resolution satellite imagery available as open datasets, information about land use and populations has become widely accessible. But making sense of this data requires advanced analytical techniques.
In The Open Society and Its Enemies (1945), Karl Popper introduces “piecemeal social engineering,” his framework for building up social institutions incrementally, informed by experimentation and evidence. He contrasts it with the “utopian social engineering” prevalent in his time, which he criticized for pursuing overly lofty, abstract ideals while largely ignoring practicality; indeed, today we might regard such methods as colonial and paternalistic. For Popper, the “piecemeal engineer knows, like Socrates, how little he knows. He knows that we can learn only from our mistakes.”1
In that spirit, I want to begin with the lessons we have learned while applying engineering principles and methods to help our partners increase their social impact. I hope these lessons prove helpful to others in the sector.
As data practitioners, we are separated from the ground truth by vast distances. In one sense, there is the literal physical distance between our laptop screens and the sites of data collection, a distance that can cost us fidelity in both context and empathy. There is also a representational distance (in some cases, an asymmetry of power) between the work of researching and practicing machine learning, whether publishing papers, maintaining open-source repositories, or building commercial applications, and the labor that goes into each row of data: the families represented by vectors, and each interaction distilled into a potential data-quality flag. In this blog, I hope to illustrate those minutiae and bring these two worlds together.
Amahle is pregnant and expecting a new addition to her family. For the past seven months, she has been seeking maternal care through South Africa’s national WhatsApp helpline, where she frequently consults a help-desk team about the pregnancy challenges she has been facing. At one point, she became very worried because she couldn’t feel her baby move, and it took a while for the help desk to get back to her. Soon, she will be able to get instant recommendations from a technology developed by IDinsight that automatically answers her questions.