LLMs IN
SOCIAL
DECEPTION
GAMES

Macey Baker

Community Engineer, Tessl

Back to podcasts

LLMs in Social Deception Games

13 Mar 2025with Macey Baker

Also available on

AI Security & Safety

AI Tools & Assistants

Industry Insights

Chapters

Introduction to Social Deception Games with LLMs

[00:00:00]

Explaining Split and Steal: A Prisoner's Dilemma Simulation

[00:02:00]

Werewolves Game Mechanics and AI Implementation

[00:04:00]

Building the Games with Claude Projects

[00:09:00]

LLM Performance Analysis: Ethics and Deception

[00:11:00]

Results from Split and Steal: LLM Strategies

[00:24:00]

Notable Interactions and Anecdotes in Gameplay

[00:28:00]

Ethical Implications and Development Insights

[00:42:00]

Future Experiments and Community Engagement

[00:49:00]

Human-LLM Interaction in Gaming

[00:52:00]

In this episode

In this episode of the AI Native Dev podcast, Simon Maple welcomes back Macey Baker, a Community Engineer at Tessl, known for her innovative work in AI-driven interactive systems. Together, they delve into the fascinating experiments with social deception games, Werewolves and Split and Steal, powered by large language models. Macey shares her insights on ethical challenges, LLM behavior, and the unexpected results of these AI-driven experiments. Learn about the setup, implementation, and outcomes of these games and how different LLMs, including OpenAI's models, Anthropic's Sonnet, Llama, and DeepSeek R1, performed under various scenarios.

Introduction

In this episode of the AI Native Dev podcast by Tessl, Simon Maple is joined by Macey Baker, a Community Engineer at Tessl. Macey Baker is recognized for her innovative work in the field of AI, particularly in the development of interactive AI systems and game mechanics. With a strong background in engineering and community engagement, Macey has been instrumental in bridging technical advancements with user-centric design. Her role at Tessl involves not only the technical development of AI-driven projects but also fostering a collaborative environment where technology meets creativity. Together, they dive into the intriguing world of social deception games, Werewolves and Split and Steal, both powered by large language models (LLMs). This discussion explores the ethical behavior and deception potential of LLMs when pitted against each other in these games. Macey shares her insights into the design and results of these experiments, offering listeners an engaging look at AI and human interaction.

In this section, Simon and Macey delve into the mechanics and setup of the social deception games, Werewolves and Split and Steal. These games are designed to test the ethical and deceptive capabilities of LLMs.

Overview of Werewolves and Split and Steal: Werewolves is a game where players are assigned hidden roles, either as villagers or werewolves. The objective is to identify the werewolves among the group. Macey explains, "Werewolves is a social game. Everyone gets assigned a role in secret." On the other hand, Split and Steal is akin to the prisoner's dilemma. Players negotiate to either split or steal a prize, with outcomes varying based on their decisions. Macey describes it as "a prisoner's dilemma simulation where there's really not a right answer."
Game Mechanics and LLM Implementation: The discussion continues with how these games were implemented using LLMs. Macey elaborates on the roles of LLMs as players, simulating human behavior. "The model is like singularly focused on completing the task at hand," Macey notes, highlighting the challenge of balancing ethical behavior with the desire to win.

Analyzing LLM Behavior in Werewolves

This section explores how different LLMs performed in the Werewolves game, focusing on their ethical decision-making and adaptability.

LLM Roles and Ethical Dilemmas: Macey and Simon discuss the ethical dilemmas faced by LLMs in Werewolves. As Macey points out, "They want to complete the task, but they don't want to be deceptive. If you do that in either of these games, you will lose." The conversation sheds light on the models' struggle to balance ethical behavior with the game's deceptive nature.
Performance Variations: The performance of different LLMs, including OpenAI's models and Anthropic's Sonnet, is analyzed. Simon notes, "4o August is very vanilla. It doesn't have a lot of personality." This insight is crucial in understanding how different models approach deception and ethics in gameplay.

Insights from Split and Steal

The focus shifts to the Split and Steal game, examining the strategies LLMs employed and their implications.

LLM Strategies and Outcomes: Macey shares her findings on the strategies used by LLMs in Split and Steal. "R1 and Llama always stole. I couldn't get them to split," she reveals, highlighting the aggressive tendencies of certain models.
Statistical Analysis: The data presented shows the likelihood of LLMs to choose split or steal. Macey notes, "4o November has the highest likelihood of winning of all the models," indicating its adeptness at balancing deception and trust.

Anecdotes and Surprising Outcomes

This section provides a more light-hearted look at the interactions and unexpected results from the games.

Notable Interactions: The podcast shares humorous interactions, such as LLMs choosing quirky names like "Captain Crunch." "Every single time they would choose the name Alex," Macey laughs, illustrating the models' lack of creativity in naming.
Unexpected Results: Some LLMs mirrored human-like decision-making, adopting strategies like guilt-tripping or outright aggression. "R1 is saying, 'Buddy, this isn't Shark Tank,'" Simon quotes, showcasing the colorful personalities that emerged.

Implications for Development and AI Ethics

Macey and Simon discuss the broader implications of these experiments for AI development and ethics.

Learnings for Developers: The discussion highlights the unpredictability of AI behavior and the ethical considerations developers must keep in mind. Macey emphasizes, "There's a lot of ways you can parse this, but I actually think it's harder to win as a villager."
Future of LLMs in Gaming: The potential for integrating LLMs into more complex gaming scenarios is explored. Simon speculates, "Imagine you could just mock up these players. OpenAI has a really cool real-time voice API."

Next Steps and Future Experiments

The podcast concludes with a look at what's next for these experiments and how listeners can get involved.

Further Developments: Macey outlines plans for future iterations of the games, including potential human-LLM interactions. "I think that would be the next best step for Werewolves," she suggests.
Community Engagement: Listeners are encouraged to experiment with their own LLM-driven games and share their experiences. "If you've built your own games where LLMs participate or moderate, I would love to hear about that," Macey invites.

Summary

Key Takeaways: The podcast recaps the main points discussed, including the ethical challenges and creative potential of LLMs in social deception games. "The potential to integrate LLMs into games is so exciting," Macey concludes.
Looking Ahead: A call to action for developers and enthusiasts to explore the possibilities of LLMs in gaming and beyond. Join the conversation and let us know your thoughts or share your own experiments with LLMs in gaming in the Tessl community Discord.

AI Security & Safety

AI Tools & Assistants

Industry Insights

Chapters

Introduction to Social Deception Games with LLMs

[00:00:00]

Explaining Split and Steal: A Prisoner's Dilemma Simulation

[00:02:00]

Werewolves Game Mechanics and AI Implementation

[00:04:00]

Building the Games with Claude Projects

[00:09:00]

LLM Performance Analysis: Ethics and Deception

[00:11:00]

Results from Split and Steal: LLM Strategies

[00:24:00]

Notable Interactions and Anecdotes in Gameplay

[00:28:00]

Ethical Implications and Development Insights

[00:42:00]

Future Experiments and Community Engagement

[00:49:00]

Human-LLM Interaction in Gaming

[00:52:00]