
The AI Picbreeder Experiment

Best Paper nominee, GECCO 2026.

Can AI be creative? It's a question on the lips and fingertips of many. The culture is filled with slop, fine art is fueled by new AI-driven forms, and every day we encounter all manner of strange objects falling somewhere between. All told, humans deploy AI as a material in cultural conversation. But could AI ever meaningfully become the speaker? That is, could agents self-organize to create culture with a trajectory as open-ended as our own, a kind of self-contained living cultural entity?

This is our criterion for considering agents creative. But what does creativity look like at the agent level? One thing we can safely say is that it's distinct from the process of pursuing a well-defined plan or working linearly toward an objectively verifiable goal. In a creative process, the end product is often not conceivable from the outset. Instead, it is discovered serendipitously; its shape emerges from the process itself. It's a trope that bears repeating: in the studio, on the canvas, and on the stage, that which appears brilliant in retrospect is often attributable to some happy accident that lights a spark of inspiration in the moment.

For an AI to be creative, then, it should be capable of discovering novel forms during a kind of semi-aimless wandering. It should be capable of surprising itself, and in response, throwing its plans and preconceptions out the window in pursuit of some unexpected possibility.

To study whether the capacity for such serendipitous discovery exists in AI systems, we turn to a minimal substrate where we know it to be possible.

The original Picbreeder website

That substrate is Picbreeder, an online service where users collaboratively evolved images with no goal beyond following whatever caught their eye. Each image is encoded by a compositional pattern-producing network (CPPN), whose structure is grown over generations by NEAT. Picbreeder became the standard demonstration that dropping a fixed objective — searching for novelty and diversity rather than optimizing toward a target — can reach forms that goal-directed search never would, the thesis later popularized as Why Greatness Cannot Be Planned and now central to work on open-endedness and machine creativity.

Compositional Pattern-Producing Networks (CPPNs)

At every pixel, the x and y coordinates and the radius from center r are fed into the network, which returns hue, saturation and value (HSV), i.e. a color. In this way, each CPPN encodes an image of arbitrary resolution.
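The pixel-wise mapping described above can be sketched in a few lines. This is a minimal illustration, not the networks used on Picbreeder or in our runs: the fixed one-hidden-layer topology, the choice of activations (sine, tanh, Gaussian), the sigmoid on the outputs, and the name `render_cppn` are all assumptions made for the example, whereas real CPPNs have their topology grown by NEAT.

```python
import colorsys
import numpy as np

def render_cppn(weights_hidden, weights_out, size=32):
    """Render a tiny fixed-topology CPPN to an RGB image of shape (size, size, 3).

    weights_hidden: (4, 3) array mapping inputs (x, y, r, bias) to 3 hidden nodes.
    weights_out:    (3, 3) array mapping hidden nodes to (H, S, V).
    Both shapes are hypothetical; evolved CPPNs have arbitrary topologies.
    """
    # Pixel coordinates in [-1, 1]; r is the distance from the image center.
    xs = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(xs, xs)
    r = np.sqrt(x**2 + y**2)
    inputs = np.stack([x, y, r, np.ones_like(x)], axis=-1)   # (size, size, 4)

    # Hidden layer with a different activation per node (assumed choice).
    pre = inputs @ weights_hidden                              # (size, size, 3)
    hidden = np.stack(
        [np.sin(pre[..., 0]), np.tanh(pre[..., 1]), np.exp(-pre[..., 2] ** 2)],
        axis=-1,
    )

    # Squash the three outputs into [0, 1] so they are valid H, S, V values.
    hsv = 1.0 / (1.0 + np.exp(-(hidden @ weights_out)))

    # Convert each pixel from HSV to RGB with the standard library.
    rgb = np.array([colorsys.hsv_to_rgb(*px) for px in hsv.reshape(-1, 3)])
    return rgb.reshape(size, size, 3)
```

Because the network is a function of continuous coordinates, calling `render_cppn` with a larger `size` samples the same pattern more densely rather than upscaling a bitmap.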
 
Explore a CPPN. This is the picture's DNA — the network that paints it, drawn bottom-to-top from its inputs (x, y, d, bias) up to its colour outputs, with the intermediate pattern at every node shown as the node itself. Pick a human-evolved original from Picbreeder or one of our VLM-evolved networks. Click a node to enlarge its output and open a ring of controls (change its activation, add a link, edit, delete); click a connection for its own ring, or just drag a link up/down to scrub its weight and watch every pattern shift. It's the same editor as the full interactive site.
Human archive (9,375 images): a representative grid of the human-bred Picbreeder archive.
AI archives (13 runs · noise, memory & agents): a coverage-maximizing sample of one VLM-driven Picbreeder archive.

Our experiment swaps the human selector for a vision–language model — here Gemini 2.5 — which views a grid of candidate images and chooses which to breed, just as a Picbreeder user would. The idea echoes earlier innovation engines, in which a trained network's own responses drive an open-ended evolutionary search. To quantify what each archive covers, we embed every published image with SigLIP 2 and compare its semantic spread against human similarity judgments from the THINGS dataset.

System overview: a single Picbreeder-VLM agent session and the shared archive it co-evolves with. The archive is sampled into a 100-image branching sample (5 mutually-exclusive subsets), which feeds both Evolution (VLM selection, mutation/crossover, and the generation grid looping 20 times, with the selected image published back to the archive) and Evaluation (a fresh VLM rates archive entries 1-5).
VLMs replace human users in Picbreeder, growing an unbounded archive of novel images.
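The session loop in the overview can be sketched as follows. This is a rough illustration under stated assumptions, not our implementation: genomes are plain weight lists, `mutate` is a Gaussian perturbation standing in for NEAT mutation and crossover, `stub_selector` is a placeholder for the VLM, the grid size of 9 is invented, and the two-stage pick (one finalist per subset, then one parent among finalists) is only one possible way to use the five mutually exclusive subsets.

```python
import random

def mutate(genome, rng, sigma=0.1):
    """Gaussian weight perturbation; a stand-in for NEAT's mutation/crossover."""
    return [w + rng.gauss(0.0, sigma) for w in genome]

def stub_selector(candidates):
    """Placeholder for the VLM: returns the index of the largest-sum genome."""
    return max(range(len(candidates)), key=lambda i: sum(candidates[i]))

def run_session(archive, select, rng, n_generations=20, grid_size=9,
                sample_size=100, n_subsets=5):
    """One agent session: branch from the archive, evolve, publish the result."""
    # Branching sample: up to 100 archive entries, split into disjoint subsets.
    sample = rng.sample(archive, min(sample_size, len(archive)))
    subsets = [sample[i::n_subsets] for i in range(n_subsets)]

    # Assumed two-stage choice: one finalist per subset, then one parent.
    finalists = [s[select([e["genome"] for e in s])] for s in subsets if s]
    parent = finalists[select([e["genome"] for e in finalists])]

    # Evolution: show a grid of mutated children, keep the selected one, repeat.
    current = parent["genome"]
    for _ in range(n_generations):
        grid = [mutate(current, rng) for _ in range(grid_size)]
        current = grid[select(grid)]

    # Publish the final image back to the shared archive, recording its parent.
    entry = {"id": len(archive), "genome": current, "parent": parent["id"]}
    archive.append(entry)
    return entry
```

A run would seed `archive` with one or more root entries such as `{"id": 0, "genome": [0.0, 0.0, 0.0], "parent": None}` and call `run_session` once per agent session; the recorded `parent` ids are what the phylogeny is built from.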
Phylogeny — each node is a published image; an edge connects a parent to the child branched from it, so paths through the tree trace lines of evolutionary descent. Laid out force-directed (sfdp): related lineages cluster apart, tight clumps are oft-rebranched hubs, which swell into legible thumbnails so you can watch the most-branched images in their place in the tree. Most-branched leaderboard — the same images ranked by their number of direct descendants so far. The archive, as it fills — all 3,123 publications in order, each rising in at the bottom as the feed scrolls up.
One archive, three views (long-context run, CL = 10, seed 5). The full run — all 3,123 publications — advancing in step: left, the branching tree grows as related lineages cluster apart and the most-branched hubs swell into view; centre, a running leaderboard of the most-branched images; right, the archive fills publication by publication. Drag the bar to scrub all three together.
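The most-branched leaderboard is simply a count of direct descendants per published image. A minimal sketch, assuming the phylogeny is stored as a hypothetical `parents` mapping from child id to parent id (the storage format and the tie-break by id are assumptions, not taken from our logs):

```python
from collections import Counter

def most_branched(parents, top_k=3):
    """Rank images by number of direct descendants.

    parents maps child id -> parent id (None for roots).
    Returns a list of (image id, number of children), most-branched first;
    ties are broken by the smaller id (an arbitrary choice for this sketch).
    """
    counts = Counter(p for p in parents.values() if p is not None)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```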

The results, with the historical human baseline for reference (green = overall best, grey = default setting, bold = best within a sweep).

The three diversity metrics, each measured in its own embedding space. The archive passes through SigLIP2 image embedding once; that single image embedding is reused by both the Visual Coverage plane and the joint Semantic Recall plane. The archive is also captioned by a VLM and text-embedded for Semantic Coverage, and the THINGS nouns are text-embedded into the same SigLIP2 space for Semantic Recall. Visual and Semantic Coverage are the k-covering radius of the archive in their space; Semantic Recall is the mean nearest-noun distance.
Evaluation metrics. The same archive is embedded into three spaces, drawn as a stack of planes. Visual Coverage and Semantic Coverage are the k-covering radius of the archive in SigLIP2 image space and in the text space of its VLM captions respectively — how much of the space the archive spreads across. Semantic Recall embeds image and text jointly and measures the mean distance from each THINGS noun to its nearest archive image — how close the archive comes to depicting a fixed vocabulary of concepts.
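For concreteness, the two kinds of metric can be sketched on precomputed embeddings. Several things here are assumptions rather than the paper's definitions: cosine distance as the metric, the function names, and in particular the reading of "k-covering radius" as the largest distance from any archive point to its nearest of k greedily chosen (farthest-point) centers. The exact definition in the paper may differ.

```python
import numpy as np

def cosine_distances(a, b):
    """Pairwise cosine distance between rows of a (n, d) and b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def semantic_recall(noun_emb, image_emb):
    """Mean distance from each noun embedding to its nearest archive image."""
    return float(cosine_distances(noun_emb, image_emb).min(axis=1).mean())

def k_covering_radius(emb, k):
    """Assumed reading: radius left uncovered by k greedy farthest-point centers."""
    d = cosine_distances(emb, emb)
    nearest = d[0].copy()               # start from an arbitrary first center
    for _ in range(k - 1):
        nxt = int(nearest.argmax())     # farthest point becomes the next center
        nearest = np.minimum(nearest, d[nxt])
    return float(nearest.max())
```

In this sketch the image and noun embeddings must already live in the same joint space (SigLIP 2 in our setup); the embedding step itself is not shown.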
Sweep        Setting    Semantic Recall  Visual Coverage  Semantic Coverage  Tree Balance (J¹)
Noise (ε)    0.0        0.087            0.614            0.696              0.235
Noise (ε)    0.05       0.086            0.619            0.702              0.246
Noise (ε)    0.25       0.088            0.638            0.717              0.249
Noise (ε)    0.5        0.085            0.633            0.709              0.260
Noise (ε)    0.75       0.084            0.639            0.706              0.303
Noise (ε)    1.0        0.082            0.610            0.700              0.275
Memory (CL)  0          0.082            0.527            0.632              0.305
Memory (CL)  1          0.087            0.614            0.696              0.235
Memory (CL)  2          0.083            0.583            0.675              0.339
Memory (CL)  10         0.079            0.512            0.661              0.331
Memory (CL)  20 (full)  0.083            0.595            0.697              0.350
Agents (NA)  0          0.087            0.614            0.696              0.235
Agents (NA)  10         0.086            0.605            0.698              0.373
Agents (NA)  100        0.089            0.659            0.710              0.473
Agents (NA)  1000       0.088            0.665            0.734              0.476
Baselines    Random     0.080            0.612            0.692              0.540
Baselines    Human      0.089            0.681            0.730              0.363
Mean over 6 seeds; 2,000 sessions each. (Random's high Tree Balance is expected: uniformly random branching produces maximally balanced trees in expectation.)


[Figure: CPPN morph along the human lineage of a car, steps 1 to 6; step 6 is a car.]
Human lineage of a car. Published images along the ancestry of a car in the human Picbreeder archive, earliest at left.
[Figure: CPPN morph along the VLM lineage of a car, captioned with the model's published titles.]
Steps 1 to 9: Chrome Bird, Driver's Seat, Dashboard View, Vintage Chrome, Chrome Insignia, Vintage Hood Ornament, Chrome Roadster, Chrome Speedster, Chrome Streamliner.
VLM lineage of a car. Published images along the ancestry of a VLM-evolved car, earliest at left; each labeled with the title the model gave it on publishing.
Human: visually representative sample. VLM (1,000 agents): visually representative sample.
Watch one agent's life, generation by generation

An interactive player walks through a single VLM session, “Cosmic Cowboy”, showing the grid of candidates it saw each generation, the picks it made, the image morphing along its lineage, and its own reasoning narrated aloud.

Soda-can pull-tabs (emerges at long context): an archive region full of soda-can pull tabs. Foxes: an archive region full of fox-like forms.
Top-rated leaderboard (long-context run, CL = 10). Published images ranked by the mean VLM rating each carried at that point in the run, ties broken by number of ratings — replayed from the agents' branching-snapshot logs. The bar at top fills as publications accrue. Every slot is a top-down soda-can lid (“Aluminum Can Top,” “Pop Top,” “Photorealistic Can Top”), each rated 5.00.
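The ranking rule in this caption (mean rating first, ties broken by number of ratings) can be written as a two-part sort key. A small sketch, assuming ratings are held in a hypothetical `ratings` mapping from image title to a list of 1-5 scores; the log format our agents actually write is not shown here, and the example scores in the usage note are made up.

```python
def top_rated(ratings, top_k=3):
    """Rank titles by mean rating, breaking ties by the number of ratings.

    ratings maps image title -> list of 1-5 scores; unrated images are skipped.
    """
    rated = {title: scores for title, scores in ratings.items() if scores}

    def key(title):
        scores = rated[title]
        # Negate both parts so higher mean, then more ratings, sort first.
        return (-sum(scores) / len(scores), -len(scores))

    return sorted(rated, key=key)[:top_k]
```

With illustrative scores such as `{"Pop Top": [5, 5, 5], "Aluminum Can Top": [5, 5]}`, both images have a mean of 5.00 and "Pop Top" ranks first because it carries more ratings.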
A region of the 1,000-agent archive. High-frequency, adversarial-looking, low-interpretability patterns that recur across many agents.

These recurring high-frequency regions recall the evolved “fooling” images that a network scores with high confidence yet a human finds unrecognizable — a reminder that the VLM's preferences, like any learned objective, can be satisfied by textures as readily as by objects.

SGD-trained CPPN (from Kumar et al. (2025), Fig. 6b). Perturbing individual weights produces chaotic, skull-destroying distortions: the "fractured, entangled" regime.
VLM-evolved CPPN (ours). Perturbations stay skull-like and change relatively smoothly: less fractured than SGD, but without crisp semantic factors.
Human-evolved Picbreeder CPPN (from Kumar et al. (2025), Fig. 6a). Individual weights cleanly factor into semantic controls ("Mouth Opening", "Eye Winking", etc.).
Sweeping individual weights of skull CPPNs across three regimes. Each row varies one weight from δw = −1 to δw = +1 while holding the rest fixed. SGD optimization yields entangled representations whose perturbations are destructive; human-driven Picbreeder yields cleanly factorized ones; VLM-driven Picbreeder sits in between — smooth and skull-preserving, but not yet semantically labelable.
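The sweep procedure itself is simple to state in code: copy the weights, offset one of them by δw, render, and repeat across the range. The sketch below assumes weights are stored as a flat NumPy array and uses `toy_render`, a hypothetical two-weight pattern, in place of a real CPPN renderer; the number of steps is also an arbitrary choice.

```python
import numpy as np

def weight_sweep(render, weights, index, n_steps=5):
    """Render one frame per offset dw in [-1, 1] applied to weights[index]."""
    frames = []
    for dw in np.linspace(-1.0, 1.0, n_steps):
        perturbed = weights.copy()      # leave the original weights untouched
        perturbed[index] += dw          # all other weights stay fixed
        frames.append(render(perturbed))
    return frames

def toy_render(weights, size=8):
    """Hypothetical stand-in for a CPPN renderer: a 2-weight sinusoidal pattern."""
    xs = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(xs, xs)
    return np.sin(weights[0] * x + weights[1] * y)
```

Each row of the figure corresponds to one call with a different `index`; with an odd `n_steps`, the middle frame is the unperturbed image.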

Archive traversal. Sixty-four cells, each morphing a CPPN along a path through one run's branching tree, from one published image to the next. [placeholder — cells will be ordered by visual similarity]

More broadly, using foundation models as the engine of evolutionary search — and as stand-ins for human notions of interestingness — points toward systems that generate and pursue their own goals, much as an agent shapes its own behavior from a reward signal in reinforcement learning.

Citation

For attribution in academic contexts, please cite this work as

Sam Earle, Kai Arulkumaran, Andrew Dai, Akarsh Kumar, Julian Togelius, Sebastian Risi, "In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models", GECCO 2026.

BibTeX citation

@inproceedings{earle2026picbreedervlm,
  title     = {In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models},
  author    = {Earle, Sam and Arulkumaran, Kai and Dai, Andrew and Kumar, Akarsh and Togelius, Julian and Risi, Sebastian},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '26)},
  year      = {2026}
}

Open Source Code

We release our code here. The full paper is on arXiv.