Beyond the hype: Data Science, democracy and the limits of measurement – Friday’s Five with Professor Simon Munzert
In this interview, Simon Munzert reflects on how computational methods have reshaped social research. He discusses the promises and limits of data-driven approaches, the rise of machine learning and online behavioral data, and the risks of technical solutionism. Munzert also examines the societal role of political forecasting, the challenges of communicating uncertainty, and the feedback effects of models on democratic behavior. Looking ahead, he outlines key research and policy priorities – from climate change and democratic backsliding to AI regulation – and argues for why deep technical skills and critical thinking remain essential in the age of generative AI.

With the rise of data science and computational methods, social research has gained a new set of possibilities. What do you think has changed since we started using these methods more intensively? What can we do now that we couldn’t do before, and are we seeing real benefits from this shift?
That’s a very good question. I think what we are seeing is a mix of continuity and disruption. Social science assisted by computers did not start in the 2000s or 2010s; we need to go back to the 1960s and 1970s, when researchers already began simulating social processes using computers. So computational social science, as a cornerstone of data science, is still rooted in these early ideas: that we can at least partially simulate social dynamics and learn from emergent phenomena.
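To make the idea of "emergent phenomena" concrete, here is a minimal sketch of the kind of simulation this tradition builds on: a voter model, a classic toy of computational social science. All parameters here are illustrative, not taken from any specific study.

```python
import random

def voter_model(n_agents=100, n_steps=20000, seed=42):
    """Minimal voter model on a ring: at each step, a random agent
    copies a random neighbor's binary opinion. Purely local imitation
    drifts the population toward consensus -- a macro-level pattern
    that emerges without any central coordination."""
    rng = random.Random(seed)
    opinions = [rng.randint(0, 1) for _ in range(n_agents)]
    for _ in range(n_steps):
        i = rng.randrange(n_agents)
        neighbor = (i + rng.choice([-1, 1])) % n_agents  # ring lattice
        opinions[i] = opinions[neighbor]
    return opinions

final = voter_model()
print(f"share holding opinion 1: {sum(final) / len(final):.2f}")
```

Even this tiny model illustrates the core promise Munzert describes: macro-level social dynamics can be studied as the output of simple micro-level rules.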
What has changed over the last 10 to 20 years is mainly accessibility and scale. I started studying in 2005, trained by people who already used computers in social science research. I studied political science and public administration at the University of Konstanz, which I later realized was a place strongly focused on empirical and quantitative methods. Back then, software like Stata was dominant; later, R became more widespread. I was among the early adopters in my environment, partly due to the power of the internet and the open-source movement.
Access to computational tools has been democratized considerably. You no longer need proprietary software or expensive machines. Social science research can be relatively cheap, as we don’t need large labs or massive teams to collect data. Over the last 10 to 15 years, large amounts of data have become easily accessible online.
This connects to a second major trend: more and more social life happens online. This doesn’t just create data – it creates new social phenomena that deserve scrutiny. Questions such as how social platforms affect politics, communication, preferences and attitudes are now central.
The third major trend is the rise of machine learning. While the fundamental technologies have existed for decades, their widespread adoption has led to major improvements in measurement, prediction and, to some extent, explanation. Machine learning has become mainstream and part of everyday life – you can literally talk about that with your relatives. That level of visibility is genuinely new.
Do you think this trend comes with the risk of technical solutionism — treating social science as only what is quantifiable and measurable, potentially moving away from deeper qualitative research questions?
I’ll answer from a scientific perspective, ignoring the societal-level risk of, say, ‘tech bros’ going into politics. Yes, there is a risk, but this is not a binary issue. The limits of measurement have always existed. What I primarily see are positive developments: creative uses of data that allow us to observe phenomena, directly or indirectly, that were previously impossible to observe. But I do see a risk when we take measures for granted and don’t think about what they cannot capture. There is a lot we still cannot observe, and that doesn’t mean it’s not there. That has been a continuous challenge for almost every domain. Ask astronomers, ask biologists – you get the same answer.
I’m happy to provide one example from my own research on media consumption and its effects on political behavior. Traditionally, researchers measured media consumption by asking people what they consumed daily or weekly. Even before the internet, this approach had clear limitations, because there's only so much you can do with reported data. Today, we can observe media exposure ‘in the wild,’ including inadvertent exposure such as doom-scrolling or content people do not actively seek out. This is not only interesting from a measurement perspective, but also crucial for understanding how changes in media diets shape political attitudes and perceptions.
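As a rough illustration of what observing media exposure ‘in the wild’ looks like, here is a hedged sketch that aggregates a hypothetical browsing log into per-user shares of time spent on news. The log, domain list and numbers are entirely made up; real tracking studies would use panel or plugin data and a proper news-domain classification.

```python
from collections import defaultdict

# Hypothetical browsing log: (user, domain, seconds on page).
visits = [
    ("u1", "news.example.com", 120),
    ("u1", "video.example.com", 900),
    ("u1", "news.example.com", 30),
    ("u2", "shop.example.com", 600),
    ("u2", "news.example.com", 45),
]
NEWS_DOMAINS = {"news.example.com"}  # illustrative classification

def news_share(visits):
    """Share of observed browsing time each user spends on news domains."""
    total = defaultdict(int)
    news = defaultdict(int)
    for user, domain, secs in visits:
        total[user] += secs
        if domain in NEWS_DOMAINS:
            news[user] += secs
    return {u: news[u] / total[u] for u in total}

print(news_share(visits))
```

Note that u1 visited news twice yet spent only a small fraction of time there: exactly the kind of brief, incidental exposure that self-reported "how often do you read news?" questions tend to miss.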
That said, this does not mean we should abandon qualitative perspectives. I am not the person to do that since I was trained primarily in quantitative methods, but I enjoy qualitative accounts of data-rich phenomena just as much. For example, even with large-scale tracking data, it would be entirely possible and potentially very insightful to focus on individual stories, examining how people spend their digital day, perhaps enriched with qualitative data. To my knowledge, this has rarely been done, but it is clearly feasible.
You have worked with forecasting models, such as electoral and political forecasts. What role do these models play in today’s society, especially in terms of political communication and public perception? Are there risks or limitations we should be aware of when presenting these models to the public?
Yes, there are certainly risks. But forecasting is not unique to politics. Consider weather forecasts. They have become more reliable over time, just as election forecasts have. Despite public narratives about failure, election forecasts have not become less accurate.
With weather forecasts, people have learned to consume them with caution. If rain is predicted and it doesn’t rain, no one storms the weather station in outrage. Election forecasts, however, deal with human behavior, not natural phenomena, which introduces additional complications. One concern is feedback loops: forecasts can become self-fulfilling or self-defeating prophecies. For instance, if voters believe their preferred party will win anyway, they may decide not to vote.
This is something we take seriously in research. I’m involved in an election forecasting project funded by the German Research Foundation, where we not only produce forecasts but also study how people react to them using experimental designs. We know that forecasts can affect behavior under certain conditions.
In Germany, for example, parties must surpass a five-percent threshold to enter parliament. When forecasts suggest a party will fall below that threshold, voters may avoid voting for it, fearing a wasted vote. Small parties have long argued that published polls contribute to this effect, and research provides some evidence supporting that claim.
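A toy simulation can show how this wasted-vote dynamic could make a published forecast self-fulfilling. All parameters (support levels, defection probability, electorate size) are invented for illustration and do not come from the research project mentioned above.

```python
import random

def simulate_election(true_support, forecast, defect_prob,
                      n_voters=100_000, threshold=0.05, seed=1):
    """Toy feedback-loop model: supporters of a small party defect
    (stay home or vote elsewhere) with probability `defect_prob`
    when the published forecast puts the party below the threshold."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_voters):
        if rng.random() < true_support:  # this voter prefers the party
            if forecast < threshold and rng.random() < defect_prob:
                continue  # wasted-vote fear kicks in
            votes += 1
    return votes / n_voters

# Identical true support (6%), different published forecasts:
print(simulate_election(0.06, forecast=0.055, defect_prob=0.3))  # no defection
print(simulate_election(0.06, forecast=0.045, defect_prob=0.3))  # defection
```

With the same underlying 6% support, a forecast above the threshold leaves the result near 6%, while a forecast below it can push the realized vote share under 5% – the forecast helps bring about the outcome it predicted.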
Communication also matters, especially how uncertainty is conveyed. Uncertainty is essential in forecasting, but many people struggle to interpret it, or it may reduce trust in the results. Despite these challenges, I believe election forecasting and scientifically grounded projections are fundamentally valuable. They can counter false narratives, particularly in authoritarian contexts where official results may be manipulated.
The bottom line for me is: election forecasting and any kind of scientifically grounded projection of relevant developments are, in principle, a good thing. Think about an autocratic regime that claims a landslide victory while the polls and the forecasts said something completely different. Are they wrong, or does the regime have an interest in a certain outcome?
So I can totally imagine other cases where scientifically grounded models, when well communicated, can make a difference and are important. At the same time, that doesn't mean that we shouldn't think about negative consequences, because they can be real.
Looking ahead, what research questions or social policy challenges do you think should be prioritized in the next five to 10 years within social science?
There is certainly no shortage of important questions. Many crucial topics are already being studied, though not all of them at Hertie.
One issue that will not go away is climate change, along with its far-reaching consequences. Beyond immediate meteorological effects such as droughts, floods and disasters, climate change will affect all aspects of human life: international security, migration, resource scarcity and conflict. I’m personally involved in the Lancet Countdown on climate change and public health, which examines how climate change affects public health worldwide. Despite established scientific evidence, there is still widespread ignorance and political avoidance of this issue.
A second major concern is the backlash against democracy – against how people want to be governed and how governments act. I’m not sure Europe will necessarily remain stable in democratic terms over the next 10 to 15 years.
The third challenge is technological change, particularly AI (Artificial Intelligence). I believe the honeymoon phase is over. AI will not disappear, but many of the evangelistic claims about its transformative potential are unrealistic. There are clear limitations and significant downstream costs, including environmental ones linked to massive AI infrastructures.
Out of the three challenges that I sketched out, AI may be the easiest to tackle from a policy perspective, but I currently don’t see sufficient action being taken. Climate change requires a massive global effort; we’ve seen this effort in multiple forms and formats for decades now, and it’s still super difficult. Democratic backlash? We’ll see. But the problem is not only the backlash itself but its roots, and those are very difficult to tackle.
The regulation of AI, related technologies and companies is the most narrowly focused of the three. If a few countries or supranational bodies took proper action, I think it would be within reach. But I don’t see that happening at the moment.
In the age of generative AI and ‘vibe coding,’ what skills should students and researchers in social data science prioritize? Is it still worth investing heavily in coding skills?
Absolutely. This is something I think about a lot as a teacher. I have taught my data science course multiple times, and each iteration has required adaptation, especially with the rise of generative AI.
We’ve seen similar debates before. Around 10 to 15 years ago, MOOCs (Massive Open Online Courses) were seen as a potential replacement for universities. People said, why would you want to go to university anymore? You can just take this MIT (Massachusetts Institute of Technology) course online; go on Coursera, you have it all there. And guess what? Universities didn't go away.
When it comes to technical skills – not just using packages, but building code, analyses and products from scratch – this still requires significant intellectual effort that cannot be delegated to automated systems.
Learning these skills involves frustration, failure and constant feedback, much like learning a new language. These experiences are essential for deep understanding, but they can be corrupted if learning is overly outsourced to AI tools.
The takeaway for me is that we must continue teaching technical skills, critical thinking and applied skills together. It’s the combination that makes people valuable and hard to replace. Students should not rely on autopilot, but rather become the pilot, ready to handle emergencies.
This also matters for policymaking. Many people involved in AI regulation do not fully understand the technologies they are regulating, which is problematic. We should aim to be well positioned to train people who can think across disciplines, communicate with different stakeholders and make informed decisions. That is something we should continue to cultivate.
_________________________________________________________________________________
Simon Munzert is professor of Data Science and Public Policy at the Hertie School in Berlin and director of the Data Science Lab. His research focuses on political communication, attitude formation in the digital age and the use of online data in social research. He regularly teaches causal inference, online data collection and general computational social science. He received his doctoral degree in political science from the University of Konstanz.