Spring Meeting of the “AK Demokratie” 2024
Conference Report: “Should we automate democracy (promotion) and its evaluation?” (House of the Leibniz Association, June 3–4, 2024)
On June 3 and 4, 2024, the spring conference of the DeGEval Working Group on Democracy (AK Demokratie) took place in cooperation with the PrEval joint project – Future Workshops, Evaluation, and Quality Assurance in Extremism Prevention, Democracy Promotion, and Civic Education. Held at the House of the Leibniz Association in Berlin, the event brought together around 60 participants, including evaluators, researchers, civil society representatives, officials from federal ministries, and other interested parties, to discuss the role of artificial intelligence (AI) in evaluation. The central questions of the conference were whether AI can strengthen democracy (promotion) and how it can be used for its evaluation. Keynotes and interactive formats explored AI-based data collection methods, approaches to knowledge transfer, political and ethical challenges, and data protection issues. The aim was to identify new opportunities that AI may offer for evaluation and democracy (promotion), while also taking risks into account.
After opening remarks by PD Dr. Rainer Strobl (proVal, on behalf of the AK Demokratie spokespersons) and Dr. J. Olaf Kleist (DeZIM Institute, on behalf of the PrEval Future Workshops), the conference began with a panel discussion on AI’s influence on democracy. Rainer Rehak (Weizenbaum Institute) outlined different types of AI and their logics of application—both cautioning against overestimating their capabilities and emphasizing the need for regulation. Dr. Katja Muñoz (German Council on Foreign Relations) critically examined the use of AI for disinformation and the potential manipulation of democratic elections and processes. Dr. Deborah Schnabel (Anne Frank Educational Center) highlighted new opportunities in civic education—for example, through bots or “holograms” in place of eyewitnesses—while also warning against the amplification of racism and antisemitism by anti-democratic actors. A representative of the Federal Ministry for Family Affairs, Senior Citizens, Women and Youth (BMFSFJ) pointed to the diverse potential applications of AI within the ministry, such as in research and democracy promotion, but also warned of privacy issues and the dangers of hallucinated outputs. The ensuing discussion concluded that AI will inevitably permeate administrative and educational sectors, but that each use case must be critically evaluated—especially in view of unforeseeable consequences. AI applications themselves must be evaluated, and AI literacy, like media literacy, must be taught and learned.
In the second part of the conference, various PrEval Future Workshop working groups presented their initial thoughts. Building on the panel discussion, Marcus Kindlinger (University of Duisburg-Essen/PrEval) presented criteria for monitoring AI literacy, particularly among young people and in the education sector. Next, Svetla Koynova (PrEval/Violence Prevention Network) and Moritz Lorenz (PrEval/i-unito) led an interactive session exploring how ChatGPT might support the creation of evaluation designs. The consensus: while generative AI produced some correct outputs, these lacked the specificity and clarity needed for practical implementation. Without sound evaluation knowledge, general-purpose AI is of limited use—though it can serve as a helpful assistant.
In the evening keynote, moderated by Irina Bohn (ISS/AK Demokratie), Linda Raftree (MERL Tech Initiative) offered a broad perspective. She discussed different types of AI, their commercial data processing, and application examples in evaluation to highlight ethical risks and limitations. The following debate focused on the societal use of AI-generated data and its developmental potential. Raftree emphasized that AI use in evaluation is still in its infancy and that collaborations beyond the major tech companies are needed to unlock the democratic and ethical potential of AI for evaluation purposes.
Day two opened with a welcome from Dr. Mirjam Weiberg (DeZIM Institute/AK Demokratie) and continued with presentations on data collection using AI. Kai Rompczyk (German Institute for Development Evaluation) demonstrated how automatic categorization using AI was implemented transparently in evaluating development projects. Dr. Susanne Friese (Founding Director of Queludra/Max Planck Institute for the Study of Religious and Ethnic Diversity) introduced a tool for assisted analysis of qualitative interviews, illustrating how flexible coding can help address different questions using the same material while avoiding typical pitfalls. Dimitar Dimitrov (GESIS) presented an example of using AI for big data analysis based on tweets during the COVID pandemic and their relation to academic studies, aiming to improve the quality of public discourse. In the panel discussion that followed, moderated by Simon Müller (DeZIM Institute/PrEval), speakers agreed that AI use in social research is still in pilot stages but holds considerable potential for processing data volumes and questions previously out of reach. In a concluding talk on data protection in AI use, Dr. Susanne Friese gave a cautiously optimistic assessment, noting that appropriate tools and settings can protect users—though questions from the audience reflected skepticism about tech companies’ promises and the lack of transparency in AI systems, especially in the gray zone between legality and ethics.
The final keynote, presented by Prof. Dr. Jan Hense and moderated by PD Dr. Rainer Strobl (AK spokesperson team/proVal), examined the intersection of AI and the diverse tasks of evaluators. Hense provided insights into the workings of large language models (LLMs) and reflected on what evaluators actually do, based on studies and surveys. He proposed a matrix matching AI functions with evaluation tasks to determine where and how AI could be useful in the future. He emphasized that AI cannot take over the work and responsibility of evaluation, but can serve as a tool. This sparked discussion not only about evaluators’ competencies but also about whether responsibility should shift to commissioners—who, equipped with the right AI tools, might conduct evaluations themselves. The keynote's title rang true: "Predictions are hard—especially when they concern artificial intelligence."
In summary, the conference provided insight into the current state of AI in the context of democracy (promotion) and evaluation, sparked critical discussions, and cautiously underlined AI’s potential relevance for future evaluation practices.
Report written by Dr. J. Olaf Kleist and Simon Müller (DeZIM).