research

You can check out my full academic Research Statement. I’ll summarize it here in more casual terms.

My high-level goal is to build more generally capable AI. Lately, I’m interested in using LLMs inside open-ended loops for generating video games. The idea is to generate environments that maximize difficulty/complexity with respect to some AI player agents, and to use V/LLMs to guide the process toward agent-environment dynamics that are interpretable to humans.

the future

Because I find myself generally uncertain about the near future of AI—and in particular how capable large pretrained models like LLMs are or will be—I imagine this research direction leading to three different possible outcomes.

  • Outcome A: LLMs are generally intelligent. We plug them into our various game-generating loops—have them write code in domain-specific languages for describing game mechanics, have them pipe together various tools for symbolic Procedural Content Generation, have them iterate by analyzing playtraces generated by a diverse array of AI players—and we have ourselves a fountain of AAA immersive blockbusters, of endlessly deep sandbox simulations, of fiendishly tricky puzzle games. The question becomes how to control this system’s output, how to hang on to the human element of games as an art form, and how best to use it to augment our own creativity. If nothing else, video games can act as a rich expressive medium for this new intelligence, one through which we can come to better understand it.

  • Outcome B: LLMs are halfway there, but lack a special kind of intelligence that is uniquely afforded by interacting with an environment. We address this by using LLM-driven open-ended loops to bootstrap the LLMs’ own intelligence. The system generates a diverse array of environments that provide tasks that are both challenging for embodied agents and human-interpretable. We use these environments to train or fine-tune large (language/vision/behavior) models (by casting them as players inside these games). This makes them smarter, more worldly, better at generalizing to novel tasks. This means that we can use them to generate a still broader array of diverse environments, which can then be used to train the next generation of more worldly agents, and so on, until our large models—trained at first on entirely static data generated by humans—have made use of all of our tools for generating realistic/complex/puzzling environments (borrowed conveniently from the games industry) to teach themselves how to be in the world.

  • Outcome C: LLMs are fundamentally flawed. The representations they have learned from our mess of data on the internet are irreparably broken, and no amount of playing their own games is going to fix them, because the games they generate will always be broken and wrong on some level, unless we game-AI people manually lay down guardrail after guardrail until we wind up with basically another family of PCG algorithms, one in which a noisy ghost rattles along some more traditional symbolic tracks. So we do the opposite: we remove the guardrails, strip away as many semantic assumptions as possible, and have the system generate games closer to SimCity, The Sims, or Conway’s Game of Life—simulated soups from which novel forms of natively-embodied intelligence might emerge. We make the mechanical primitives of this soup fast and scalable, so that we can run it for a long time, and render it at various spatial resolutions and timescales. VLMs don’t need to understand how this soup works at a low level—maybe we’ll use more naive means of searching over its various configurations—but they can give us a signal when interesting things start to happen at multiple scales, can perhaps recognize when something resembling an “agent” has emerged, or when it has begun to behave in a way that looks intentional. The key to the next generation of Artificial Intelligence might lie in these simulations of Artificial Life.
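To make the “soup” in Outcome C concrete, here is a minimal sketch of its best-known instance, Conway’s Game of Life, using a sparse set-of-live-cells representation. This is a toy illustration of my own, not code from any system described above:

```python
from collections import Counter

def life_step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """One synchronous Game of Life update over a sparse set of live cells."""
    # Count how many live neighbors each candidate cell has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A glider: the kind of persistent, mobile structure an outside observer
# (hypothetically, a VLM watching the soup) might flag as "agent-like".
soup = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
for _ in range(4):
    soup = life_step(soup)
# After one 4-step period the glider has translated one cell down-right.
```

The sparse representation is one way to keep the mechanical primitives cheap; a real system would more likely run a dense, batched version of this update (e.g. as convolutions in JAX) to scale it up.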

Those are, broadly speaking, the ways in which I see my work branching out into the future. In reality, I don’t think they’re mutually exclusive. They will probably happen in parallel, each suffering many interesting complications along the way.

the past

How did I get here? Laterally, crab-like, through the convergent evolution of investigatory branches stretching past-wards. My research thus far can be divided along three primary lines:

  • Thrust 1: Procedural Content Generation via Reinforcement Learning. PCG is tricky business. A lot of good-old-fashioned PCG is constructive in nature, requiring designers to bake their domain-specific knowledge into a low-level recipe for how content should be generated (e.g. “do a random walk to dig corridors in a dungeon, place a room with some probability, place enemies or treasure within this room with some other probability, etc.”). More recent PCG via Machine Learning methods learn from large datasets of human-authored game content, but cannot guarantee that newly-generated content will be functional (e.g. is the level solvable? Is it challenging?) because they are not trained to distinguish aesthetic from functional patterns in their training data. Search-based PCG can optimize directly for such functional metrics, but takes a long time to search for individual valid artifacts, making it infeasible for use at run-time. With PCGRL, we use RL to train content-generating agents that learn to satisfy these functional metrics under various conditions during training, so that they can efficiently generate functional content at run-time while adapting to new constraints. We can train generators to be controllable and diverse, and to operate in 3D. We find that by making these agents local and distributed, we force them to self-organize and learn more general strategies, and make them more scalable to larger and more complex domains.

  • Thrust 2: Open-Ended Learning for Robust Embodied Agents. PCGRL uses fixed, search-based player agents to measure the complexity of generated game content. But this approach will not scale to more complicated game mechanics. Prior work in Unsupervised Environment Design shows how we can evolve a curriculum of levels for training robust player agents. In Autoverse, we push this idea further, and evolve levels and mechanics over a cellular automaton-like substrate and domain-specific language for grid-based game environments, implemented as a collection of neural cellular automata in JAX to make training highly efficient. We find that in this radically unconstrained search-space, UED’s value-based proxies of player regret are unstable. But by warm-starting RL players via imitation learning on search-based trajectories, we see clear gains in terms of generalization. And we imagine that further integrating learned policy and value networks with heuristic-driven tree search could allow us to derive more scalable and general measures of environment complexity in open-ended learning loops.

  • Thrust 3: Semantic Guidance in Environment Generation via Large Pretrained Models. A pitfall of letting go of semantic assumptions and searching through an unconstrained space of possible environments is that it becomes difficult to relate to the agents we’re training. Sometimes what they’re doing looks interesting, almost meaningful, but often it looks alien, even nonsensical. So I’ve also considered ways in which we can guide environment generation with large pretrained multi-modal models that contain useful priors about what might be interpretable to humans. Broadly speaking, we might e.g. use such models as part of our fitness function over environments (as in DreamCraft’s repurposing of a text-guided NeRF with a VLM-based loss function to generate Minecraft layouts), or as part of our mutation operator (as in DreamGarden’s dynamic orchestration of VLMs to generate arbitrary simulations in Unreal Engine). To this end, in PuzzleJAX we extend Autoverse to be interoperable with PuzzleScript, a popular DSL and engine for grid-based games. The space of possible environments is similarly unconstrained, centered around symbolic, CA-like pattern rewrite rules, but we gain the benefit of a dataset of thousands of time-tested, human-authored games and levels, visually interpretable sprites, and some additional primitives and control flow that have proven useful to real designers while at the same time remaining as concise as possible.
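As a heavily simplified illustration of Thrust 1’s core idea, rewarding a generator for improving a functional metric, here is a toy sketch. The shortest-path proxy for difficulty, the toggle-a-tile action, and the random stand-in agent are all illustrative assumptions of mine, not the actual PCGRL interface:

```python
import random
from collections import deque

def path_length(level, start, goal):
    """BFS shortest-path length through empty (0) tiles, or None if unsolvable."""
    h, w = len(level), len(level[0])
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        (r, c), d = queue.popleft()
        if (r, c) == goal:
            return d
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and level[nr][nc] == 0 \
                    and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), d + 1))
    return None

def functional_reward(level, start, goal):
    """Solvable levels score their path length (a crude difficulty proxy);
    broken levels get a large penalty."""
    d = path_length(level, start, goal)
    return -100 if d is None else d

def edit_step(level, start, goal, rng=random):
    """One edit by a (random, stand-in) generator agent: toggle a tile,
    and return the *change* in the functional metric as the RL reward."""
    before = functional_reward(level, start, goal)
    r, c = rng.randrange(len(level)), rng.randrange(len(level[0]))
    if (r, c) not in (start, goal):
        level[r][c] = 1 - level[r][c]  # toggle wall <-> empty
    return functional_reward(level, start, goal) - before
```

A trained policy would replace the random tile choice, observing the level (or a local patch of it, in the local/distributed variants) and learning which edits raise the metric while keeping the level solvable.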
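The CA-like pattern rewrite rules at the heart of Thrust 3 can be sketched with a toy one-dimensional version of a PuzzleScript-style rule. This ignores PuzzleScript’s real semantics (directions, object layers, simultaneous matching), and the cell names are made up:

```python
def apply_rule(row, pattern, replacement):
    """Apply a PuzzleScript-style rewrite rule left-to-right over a 1-D row.

    Everywhere the sequence `pattern` occurs, overwrite it in place with
    `replacement` (same length), in the spirit of PuzzleScript's
    `[ A | B ] -> [ C | D ]` rules.
    """
    out = list(row)
    i = 0
    while i + len(pattern) <= len(out):
        if out[i:i + len(pattern)] == list(pattern):
            out[i:i + len(pattern)] = replacement
            i += len(pattern)  # don't re-match inside the replacement
        else:
            i += 1
    return out

# A toy Sokoban-ish push: a player moving right into a crate with empty
# space beyond shoves the crate one cell over.
row = ["player", "crate", "empty", "wall"]
row = apply_rule(row,
                 ["player", "crate", "empty"],
                 ["empty", "player", "crate"])
# → ["empty", "player", "crate", "wall"]
```

Because rules like this are local and uniform across the grid, the same machinery that batches cellular automata (as in Autoverse’s JAX implementation) can in principle run them massively in parallel.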

At the Game Innovation Lab, we’re experimenting with VLM-guided open-ended learning loops in PuzzleJAX.

At Sakana AI, we’re examining the behavior of VLMs within a pared-down and canonical substrate for open-endedness—Picbreeder—and comparing their output with historical human baselines.