COMIC: Agentic Sketch Comedy Generation

Abstract

We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of agents loosely based on real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is a set of LLM critics that automatically evaluate humor and are aligned with real viewer preferences through analysis of a corpus of comedy videos on YouTube. Our experiments show that our framework produces results approaching the quality of professionally produced sketches while demonstrating state-of-the-art performance in video generation.

Featured Sketch

The Conference T-Shirt Mafia

The Cast

Main Characters

Each character is defined by a reference portrait, a voice sample, and a personality description — the system handles the rest.

Haedol Park

A typical PhD student in computer vision and generative AI. Always drinks lattes, finding Americanos too dull. Enjoys attending conferences and collecting free t-shirts.

Melissa Swift

A devilish fashion magazine editor who wears Prada and treats assistants like disposable coffee cups. Can destroy careers with a whisper.

Eric Cross

An ambitious field operative who treats impossible missions like grocery shopping. Runs everywhere because walking is for quitters. Regards physics as a mere suggestion.

Rex Ironbolt

A beer-powered robot and self-proclaimed greatest research assistant. Spends more time stealing lab equipment than doing research. Hobbies include gambling and bending things.

Pipeline

How COMIC Works

COMIC is loosely modeled on human production studios, with agentic counterparts for each role — writers, critics, and directors. Two core loops drive quality: an island-based writing loop for scripts and a rendering loop for video, each using competition and iteration to produce breadth and depth of output.

Pipeline stages, in order:

Characters: portraits, voices & personas
Humor Critics: preference alignment from YouTube viewer engagement
Writing Loop: island-based script competition & refinement
Scene Director: shot breakdown with continuity
Rendering Loop: iterative video generation & critic refinement
Final Sketch: assembled comedy video

01

YouTube-Aligned Critics

Humor critics are derived by analyzing a corpus of YouTube comedy sketch videos and their viewer engagement, enabling automatic evaluation that correlates with real audience preferences.
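One way to picture "correlates with real audience preferences" is as pairwise agreement: for every pair of videos, does a critic pick the one that viewers engaged with more? The sketch below is only an illustration under that assumption. The engagement measure, the critic interface, and the names `pairwise_agreement` and `select_best_critic` are hypothetical and are not taken from the paper; the toy critics stand in for LLM calls.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical record: (video_id, engagement). The engagement measure COMIC
# actually uses is not stated on this page.
Video = Tuple[str, float]
# Hypothetical critic interface: given two video ids, return the preferred id.
Critic = Callable[[str, str], str]


def pairwise_agreement(critic: Critic, videos: List[Video]) -> float:
    """Fraction of video pairs where the critic picks the higher-engagement video."""
    agree, total = 0, 0
    for (id_a, eng_a), (id_b, eng_b) in combinations(videos, 2):
        if eng_a == eng_b:
            continue  # ties carry no audience preference
        audience_pick = id_a if eng_a > eng_b else id_b
        agree += int(critic(id_a, id_b) == audience_pick)
        total += 1
    return agree / total if total else 0.0


def select_best_critic(critics: Dict[str, Critic], videos: List[Video]):
    """Score every candidate critic and return the best-aligned one."""
    scores = {name: pairwise_agreement(fn, videos) for name, fn in critics.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for LLM critics (assumptions, for illustration only).
toy_videos = [("a", 0.9), ("b", 0.5), ("c", 0.1)]
toy_critics = {
    "alpha": lambda x, y: min(x, y),  # always prefers the alphabetically first id
    "omega": lambda x, y: max(x, y),  # always prefers the alphabetically last id
}
best_name, all_scores = select_best_critic(toy_critics, toy_videos)
```

In this toy setup the "alpha" critic agrees with the audience on every pair, so it would be the one kept.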

02

Writing Loop

Multiple distinct islands of scripts are maintained, each governed by critic committees representing different comedic philosophies. Scripts improve through round-robin tournaments where losers are refined using winner feedback.
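The tournament idea can be sketched in a few lines. This is a minimal illustration, assuming that every pair of scripts on an island is compared once per round and that the loser of each match is immediately rewritten using the winner and the judge's feedback. The `Judge` and `Refiner` interfaces, and the toy implementations, are hypothetical stand-ins for the LLM critic committees and writer agents.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not from the paper):
# a judge returns (0 or 1 for which script won, textual feedback).
Judge = Callable[[str, str], Tuple[int, str]]
# a refiner rewrites the loser given the winner and the feedback.
Refiner = Callable[[str, str, str], str]


def round_robin_round(scripts: List[str], judge: Judge, refine: Refiner) -> List[str]:
    """Play every pair once; after each match the loser is refined in place."""
    scripts = list(scripts)  # do not mutate the caller's population
    for i, j in combinations(range(len(scripts)), 2):
        winner_pos, feedback = judge(scripts[i], scripts[j])
        win, lose = (i, j) if winner_pos == 0 else (j, i)
        scripts[lose] = refine(scripts[lose], scripts[win], feedback)
    return scripts


def evolve_islands(
    islands: Dict[str, List[str]],
    judges: Dict[str, Judge],
    refine: Refiner,
    rounds: int,
) -> Dict[str, List[str]]:
    """Each island keeps its own population and its own judge (comedic philosophy)."""
    for _ in range(rounds):
        islands = {
            name: round_robin_round(population, judges[name], refine)
            for name, population in islands.items()
        }
    return islands


# Toy stand-ins: the judge prefers the longer script, the refiner adds punch.
def toy_judge(a: str, b: str) -> Tuple[int, str]:
    return (0, "be longer") if len(a) >= len(b) else (1, "be longer")


def toy_refine(loser: str, winner: str, feedback: str) -> str:
    return loser + "!"


evolved = evolve_islands(
    {"absurdist": ["a", "bbb", "cc"], "deadpan": ["x", "y"]},
    {"absurdist": toy_judge, "deadpan": toy_judge},
    toy_refine,
    rounds=2,
)
```

Keeping islands separate means a script only competes against others judged by the same committee, which is one plausible way to preserve distinct comedic styles while still improving each one.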

03

Sequential Shot Rendering

Scene directors break each script into shots with specific setups — characters, dialogue, expressions, and backgrounds. Shots are produced consecutively, referencing a memory bank and previous shots for continuity.
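The sequential structure can be illustrated with a small loop. This is a sketch under assumptions: the `Shot` fields mirror the setup items listed above, while `MemoryBank`, `render_sequence`, and the `render` callback signature are hypothetical names chosen for illustration, with a string standing in for a generated clip.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Shot:
    characters: List[str]
    dialogue: str
    expression: str
    background: str


@dataclass
class MemoryBank:
    # Hypothetical store: most recent clip in which each character or
    # background appeared, so later shots can reuse it as a reference.
    entries: Dict[str, str] = field(default_factory=dict)


Renderer = Callable[[Shot, MemoryBank, Optional[str]], str]


def render_sequence(
    shots: List[Shot], render: Renderer, memory: Optional[MemoryBank] = None
) -> List[str]:
    """Render shots one after another, passing the memory bank and previous clip."""
    if memory is None:
        memory = MemoryBank()
    clips: List[str] = []
    for shot in shots:
        previous = clips[-1] if clips else None
        clip = render(shot, memory, previous)
        clips.append(clip)
        # Update the memory only after rendering, so a shot never references itself.
        for name in shot.characters:
            memory.entries[f"character:{name}"] = clip
        memory.entries[f"background:{shot.background}"] = clip
    return clips


# Toy renderer standing in for a video model: it reports which references it had.
def toy_render(shot: Shot, memory: MemoryBank, previous: Optional[str]) -> str:
    known = sorted(n for n in shot.characters if f"character:{n}" in memory.entries)
    return f"{shot.dialogue}|prev={'yes' if previous else 'no'}|known={','.join(known)}"


toy_clips = render_sequence(
    [
        Shot(["Haedol"], "Free shirts?", "excited", "expo hall"),
        Shot(["Haedol", "Rex"], "Mine.", "smug", "expo hall"),
    ],
    toy_render,
)
```

The second shot sees both the previous clip and a stored reference for Haedol, which is the kind of information a renderer would need to keep appearance and setting consistent.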

04

Rendering Loop

Each shot is evaluated by script-conditioned rendering critics that embody diverse interpretations of the narrative, then refined based on their feedback through depth- and breadth-wise competition.
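The page does not spell out how depth- and breadth-wise competition is organized, so the following is one possible reading rather than the paper's procedure: at each of `depth` rounds, `breadth` alternative renders are proposed from the current best, a committee of critics scores them, and the best challenger replaces the incumbent only if it scores higher. All names here are hypothetical, and strings stand in for rendered shots.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical interfaces: a critic maps a candidate render to a score, and
# a proposer produces the k-th variation of the current best candidate.
Critic = Callable[[str], float]
Proposer = Callable[[str, int], str]


def committee_score(candidate: str, critics: Sequence[Critic]) -> float:
    """Average the scores of all critics in the committee."""
    return mean(critic(candidate) for critic in critics)


def refine_shot(
    initial: str,
    propose: Proposer,
    critics: Sequence[Critic],
    breadth: int,
    depth: int,
) -> Tuple[str, float]:
    best, best_score = initial, committee_score(initial, critics)
    for _ in range(depth):  # depth: successive refinement rounds
        candidates = [propose(best, k) for k in range(breadth)]  # breadth: siblings
        challenger = max(candidates, key=lambda c: committee_score(c, critics))
        challenger_score = committee_score(challenger, critics)
        if challenger_score > best_score:  # keep the incumbent unless beaten
            best, best_score = challenger, challenger_score
    return best, best_score


# Toy stand-ins: one critic rewards the letter "a", another penalizes length.
toy_committee = [lambda s: float(s.count("a")), lambda s: -0.1 * len(s)]


def toy_propose(current: str, k: int) -> str:
    return current + "ab"[k % 2]


final_shot, final_score = refine_shot("", toy_propose, toy_committee, breadth=2, depth=3)
```

Averaging over several critics is a simple way to let different interpretations of the script each have a say; other aggregation rules would fit the same loop.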

Gallery

Generated Comedy Sketches

All sketches below were fully generated by COMIC — scripts, voices, visuals, and editing — with zero human intervention.

SKETCH 01

The Free T-Shirt Fashion Week

SKETCH 02

Extreme Grocery Shopping

SKETCH 03

The Great Lab Equipment Heist

SKETCH 04

The Deep Learning Makeover

SKETCH 05

DMV Line Protocol

Evaluation

Quantitative Results

Human evaluation of COMIC and baseline methods across seven criteria (1–7 scale; higher is better).
Method	Funniness ↑	Watch More ↑	vs. Human ↑	Script ↑	Narrative ↑	Realism ↑	Consistency ↑
Veo 3.1	2.32	2.36	2.27	2.18	3.32	4.91	5.05
Sora 2	2.73	2.73	2.32	2.45	3.36	5.73	5.50
VGoT	1.18	1.27	1.14	1.00	1.23	2.00	2.32
MovieAgent	1.27	1.09	1.18	1.09	1.09	1.27	1.14
COMIC (Ours)	3.45	3.09	3.05	3.32	4.50	4.27	4.50

Automated metrics (win rate and diversity scores), computed against middle-ranked professional videos, averaged across all channels. "Single Best" uses a single top critic; "Channel-Wise Best" aggregates across per-channel best critics.
Method	Single Best Win Rate	Single Best Inter-Diversity	Single Best Intra-Diversity	Channel-Wise Best Win Rate	Channel-Wise Best Inter-Diversity	Channel-Wise Best Intra-Diversity
Veo 3.1	0.010	0.308	0.369	0.105	0.263	0.360
Sora 2	0.075	0.531	0.722	0.175	0.310	0.563
VGoT	0.000	0.000	0.000	0.010	0.105	0.189
MovieAgent	0.000	0.000	0.000	0.130	0.088	0.180
COMIC (Ours)	0.440	0.780	0.682	0.390	0.519	0.693
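For intuition, a win rate of this kind can be computed as the fraction of head-to-head comparisons in which a critic prefers a generated sketch over a reference video, then averaged over channels. The sketch below assumes exactly that and nothing more: the comparison protocol, the `Prefers` interface, and the function names are hypothetical, and the diversity scores are not reproduced here because their definition is not given on this page.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical critic interface: True if the first item is preferred.
Prefers = Callable[[str, str], bool]


def win_rate(generated: List[str], references: List[str], prefers: Prefers) -> float:
    """Fraction of (generated, reference) pairs where the generated item wins."""
    outcomes = [prefers(g, r) for g in generated for r in references]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def channel_averaged_win_rate(
    generated: List[str],
    references_by_channel: Dict[str, List[str]],
    critic_by_channel: Dict[str, Prefers],
) -> float:
    """Score against each channel with that channel's critic, then average."""
    return mean(
        win_rate(generated, refs, critic_by_channel[channel])
        for channel, refs in references_by_channel.items()
    )


# Toy data: strings stand in for videos; critics compare lengths.
toy_generated = ["long sketch", "s"]
toy_references = {"channel_a": ["mid"], "channel_b": ["x", "yy"]}
toy_channel_critics = {
    "channel_a": lambda g, r: len(g) > len(r),  # prefers longer items
    "channel_b": lambda g, r: len(g) < len(r),  # prefers shorter items
}
toy_average = channel_averaged_win_rate(toy_generated, toy_references, toy_channel_critics)
```

Using the same critic for every channel would correspond to the single-critic setting, while one critic per channel corresponds to the channel-wise setting.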

Self-Improvement

Multi-Island Evolution

Performance improves as the island-based competition loop iterates. Metrics are computed against the initial scripts.

Ablation Study

Impact of Critics

Removing the critic-guided refinement loop produces noticeably weaker sketches. Compare the outputs below — the same characters and prompts, with and without critic feedback.

OURS With Critics

ABLATED Without Critics

Discussion

Broader Implications

Unlike structured domains such as mathematics or coding, comedy has no fixed reward signal: its criteria keep shifting, which makes it a compelling proxy for many open-ended, real-world problems. COMIC’s improvements emerge without parameter updates, gradient-based optimization, or a fixed reward function, suggesting promising directions for other creative domains. For full details, please see our paper.

Citation

BibTeX

@article{hong2026comic,
  title={COMIC: Agentic Sketch Comedy Generation},
  author={Hong, Susung and Curless, Brian and Kemelmacher-Shlizerman, Ira and Seitz, Steve},
  journal={arXiv preprint arXiv:2603.11048},
  year={2026}
}