A multi-agent system that writes, directs, and produces comedy sketch videos, from character references to final cut, approaching the quality of professionally produced content.
We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of agents loosely based on real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is the introduction of LLM critics aligned with real viewer preferences through the analysis of a corpus of comedy videos on YouTube to automatically evaluate humor. Our experiments show that our framework produces results approaching the quality of professionally produced sketches while demonstrating state-of-the-art performance in video generation.
Each character is defined by a reference portrait, a voice sample, and a personality description — the system handles the rest.
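As an illustration only, the three inputs named above could be bundled into a small record like the one below. The class name, field names, and the extra `name` field are assumptions made for this sketch; they are not COMIC's actual interface.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Character:
    """Hypothetical container for one character's inputs."""

    name: str            # assumed label for the character (not mentioned in the source)
    portrait: Path       # reference portrait image
    voice_sample: Path   # short audio clip of the character's voice
    personality: str     # free-text personality description
```

A frozen dataclass is used here so that a character definition cannot be changed by accident once a run has started.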
COMIC is loosely modeled on human production studios, with agentic counterparts for each role — writers, critics, and directors. Two core loops drive quality: an island-based writing loop for scripts and a rendering loop for video, each using competition and iteration to produce breadth and depth of output.
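The page does not spell out how the island-based writing loop works, so the following is a generic island-model sketch rather than COMIC's implementation. It assumes a hypothetical `revise` callable standing in for a writer agent and a hypothetical `score` callable standing in for a critic; the function name, the parameters, and the ring-migration rule are all illustrative choices.

```python
import random
from typing import Callable, List

Script = str


def island_writing_loop(
    seeds: List[List[Script]],
    revise: Callable[[Script, random.Random], Script],  # assumed "writer" stand-in
    score: Callable[[Script], float],                   # assumed "critic" stand-in
    rounds: int = 4,
    keep: int = 2,
    migrate_every: int = 2,
    rng_seed: int = 0,
) -> List[List[Script]]:
    """Evolve several isolated script populations, with occasional migration."""
    rng = random.Random(rng_seed)
    islands = [list(pop) for pop in seeds]
    for r in range(1, rounds + 1):
        for i, pop in enumerate(islands):
            # Writer step: every surviving script proposes one revision.
            revised = [revise(s, rng) for s in pop]
            # Drop exact duplicates while preserving order.
            candidates = list(dict.fromkeys(pop + revised))
            # Critic step: rank the candidates and keep the strongest few.
            candidates.sort(key=score, reverse=True)
            islands[i] = candidates[:keep]
        if r % migrate_every == 0 and len(islands) > 1:
            # Ring migration: each island receives its neighbour's best script,
            # which replaces the weakest survivor.
            best = [pop[0] for pop in islands]
            for i, pop in enumerate(islands):
                incoming = best[(i - 1) % len(islands)]
                if incoming not in pop:
                    pop[-1] = incoming
                    pop.sort(key=score, reverse=True)
    return islands
```

Keeping the islands separate for most rounds is what preserves breadth in this kind of scheme, while selection within each island and occasional migration supply depth. In a real system `revise` and `score` would be LLM calls, and `score` might be replaced by pairwise comparisons.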
All sketches below were fully generated by COMIC — scripts, voices, visuals, and editing — with zero human intervention.
| Method | Funniness ↑ | Watch More ↑ | vs. Human ↑ | Script ↑ | Narrative ↑ | Realism ↑ | Consistency ↑ |
|---|---|---|---|---|---|---|---|
| Veo 3.1 | 2.32 | 2.36 | 2.27 | 2.18 | 3.32 | 4.91 | 5.05 |
| Sora 2 | 2.73 | 2.73 | 2.32 | 2.45 | 3.36 | 5.73 | 5.50 |
| VGoT | 1.18 | 1.27 | 1.14 | 1.00 | 1.23 | 2.00 | 2.32 |
| MovieAgent | 1.27 | 1.09 | 1.18 | 1.09 | 1.09 | 1.27 | 1.14 |
| COMIC (Ours) | 3.45 | 3.09 | 3.05 | 3.32 | 4.50 | 4.27 | 4.50 |
| Method | Single Best: Win Rate | Single Best: Inter-Diversity | Single Best: Intra-Diversity | Channel-Wise Best: Win Rate | Channel-Wise Best: Inter-Diversity | Channel-Wise Best: Intra-Diversity |
|---|---|---|---|---|---|---|
| Veo 3.1 | 0.010 | 0.308 | 0.369 | 0.105 | 0.263 | 0.360 |
| Sora 2 | 0.075 | 0.531 | 0.722 | 0.175 | 0.310 | 0.563 |
| VGoT | 0.000 | 0.000 | 0.000 | 0.010 | 0.105 | 0.189 |
| MovieAgent | 0.000 | 0.000 | 0.000 | 0.130 | 0.088 | 0.180 |
| COMIC (Ours) | 0.440 | 0.780 | 0.682 | 0.390 | 0.519 | 0.693 |
Performance improves as the island-based competition loop iterates. Metrics are computed against the initial scripts.
Removing the critic-guided refinement loop produces noticeably weaker sketches. Compare the outputs below — the same characters and prompts, with and without critic feedback.
Unlike structured domains such as mathematics or coding, comedy has no fixed reward signal—its criteria are shifting, making it a compelling proxy for many open-ended, real-world problems. COMIC’s improvements emerge without parameter updates, gradient-based optimization, or a fixed reward function, suggesting promising directions for other creative domains. For full details, please see our paper.
@article{hong2026comic,
  title={COMIC: Agentic Sketch Comedy Generation},
  author={Hong, Susung and Curless, Brian and Kemelmacher-Shlizerman, Ira and Seitz, Steve},
  journal={arXiv preprint arXiv:2603.11048},
  year={2026}
}