
Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

Kaleb Newman · Tyler Zhu · Olga Russakovsky

Princeton University

Video diffusion models commit to a high-level motion plan within the first few denoising steps when solving mazes. We exploit this early plan commitment to build ChEaP, which screens diverse candidate trajectories early and chains them together to solve mazes far beyond the single-generation horizon—improving accuracy from 7% to 67% on long mazes and by 2.5× overall on hard tasks.


The Plan Is Decided Early

By decoding intermediate predictions during denoising, we find that the model's motion trajectory is already committed within the first few steps. The remaining steps refine visual details but almost never change the underlying route.
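One way to make "committed within the first few steps" concrete is to look for the earliest decoded step whose route already matches the final route and never changes afterwards. The sketch below is illustrative only: the `(row, col)` path encoding, the `commitment_step` helper, and the toy `decoded` data are assumptions, not the paper's code or measurements.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) grid cell; a hypothetical path encoding


def commitment_step(paths_by_step: Sequence[Tuple[int, List[Cell]]]) -> int:
    """Return the earliest decoded step whose path equals the final path
    and stays equal at every later decoded step.

    `paths_by_step` holds (step index, decoded path) pairs in increasing
    step order; the last entry is treated as the final generation.
    """
    if not paths_by_step:
        raise ValueError("need at least one decoded step")
    final_step, final_path = paths_by_step[-1]
    committed = final_step
    # Walk backwards from the final step while the route is unchanged.
    for step, path in reversed(paths_by_step):
        if path != final_path:
            break
        committed = step
    return committed


# Toy data (made up): the route changes once after step 1, then stays fixed.
decoded = [
    (1, [(0, 0), (0, 1)]),
    (2, [(0, 0), (1, 0), (1, 1)]),
    (5, [(0, 0), (1, 0), (1, 1)]),
    (10, [(0, 0), (1, 0), (1, 1)]),
    (20, [(0, 0), (1, 0), (1, 1)]),
    (40, [(0, 0), (1, 0), (1, 1)]),
]
print(commitment_step(decoded))  # prints 2
```

Exact path equality is a strict criterion; a tolerance-based comparison (for example, overlap between visited cells) may be more appropriate for noisy intermediate decodes.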

[Figure: intermediate predictions decoded at denoising steps 1, 2, 5, 10, 20, and 40 (final) of 40.]

Screening Diverse Candidate Plans

Since early plans predict final outcomes, we screen many candidates cheaply and fully render only the most promising ones. Different noise seeds produce strikingly different trajectories on the same maze. In the examples, failed paths appear as gray silhouettes and the successful trajectory appears in vivid color.
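The screening idea can be sketched as a small selection loop: run only a few denoising steps per seed, score the decoded early plan, and spend full renders on the top few. Everything named here is a hypothetical stand-in: `toy_early_plan` replaces the video model with a seeded random walk, `toy_score` is a made-up distance-to-goal heuristic, and the step counts in `step_cost` are illustrative rather than the paper's settings.

```python
import random
from typing import Callable, Iterable, List, Tuple

Cell = Tuple[int, int]  # (row, col) grid cell; a hypothetical path encoding


def screen_candidates(
    seeds: Iterable[int],
    early_plan: Callable[[int], List[Cell]],
    score: Callable[[List[Cell]], float],
    keep: int,
) -> List[int]:
    """Score each seed's early plan and return the `keep` best seeds."""
    scored = [(score(early_plan(s)), s) for s in seeds]
    # sort() is stable, so tied scores keep their original seed order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:keep]]


def step_cost(n_seeds: int, early_steps: int, keep: int, total_steps: int) -> int:
    """Denoising steps used: early steps for all seeds, the rest for kept ones."""
    return n_seeds * early_steps + keep * (total_steps - early_steps)


def toy_early_plan(seed: int, size: int = 4, length: int = 6) -> List[Cell]:
    """Stand-in for a decoded early plan: a seeded random walk on the grid."""
    rng = random.Random(seed)
    r = c = 0
    path = [(r, c)]
    for _ in range(length):
        dr, dc = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
        r = min(max(r + dr, 0), size - 1)  # clamp to the grid
        c = min(max(c + dc, 0), size - 1)
        path.append((r, c))
    return path


def toy_score(path: List[Cell], goal: Cell = (3, 3)) -> float:
    """Higher is better: negative Manhattan distance from path end to goal."""
    r, c = path[-1]
    return -(abs(goal[0] - r) + abs(goal[1] - c))


kept = screen_candidates(range(16), toy_early_plan, toy_score, keep=2)
print("seeds kept for full rendering:", kept)
print("steps with screening:", step_cost(16, 5, 2, 40))  # 16*5 + 2*35 = 150
print("steps for 16 full generations:", 16 * 40)  # 640
```

The saving comes from paying the full step budget only for the kept seeds, which is worthwhile only to the extent that early plans really do predict final outcomes.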

[Figure: candidate trajectories on 4×4, 6×6, and 8×8 mazes.]

+12% average accuracy gain across all maze sizes over best-of-N

7%→67% accuracy on long-horizon mazes

0.3× the diffusion steps needed to match best-of-N

BibTeX

@article{newman2025videomodelsreason,
  title     = {Video Models Reason Early: Exploiting Plan
               Commitment for Maze Solving},
  author    = {Newman, Kaleb and Zhu, Tyler and
               Russakovsky, Olga},
  journal   = {arXiv preprint},
  year      = {2026}
}