WorldScore
A Unified Evaluation Benchmark for World Generation

Overview

The WorldScore benchmark unifies the evaluation of 3D, 4D, and video models on their ability to generate a world following instructions. Unlike existing benchmarks that focus on single-scene quality, we decompose world generation into a sequence of next-scene generation tasks based on explicit camera trajectories. The WorldScore metric jointly measures controllability, quality, and dynamics.

Here we show how the WorldScore-Static metric scores two models given an initial bedroom scene and a specified camera path: "pan left" → "move left" → "pull out". While existing benchmarks rate Models A and B similarly based on single-scene video quality, our WorldScore benchmark differentiates their world generation capabilities by identifying that Model B fails to generate a new scene or follow the instructed camera movement.
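To make the decomposition concrete, the camera path above can be represented as a list of next-scene tasks. This is only a minimal sketch: the class names `NextSceneTask` and `WorldGenerationTask`, their fields, and the `decompose` helper are hypothetical and are not taken from the WorldScore codebase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NextSceneTask:
    """One step of world generation (hypothetical structure)."""
    camera_instruction: str  # e.g. "pan left"


@dataclass
class WorldGenerationTask:
    """An initial scene plus an ordered list of next-scene steps (hypothetical)."""
    initial_scene: str  # assumed label for the initial scene image
    steps: List[NextSceneTask]


def decompose(initial_scene: str, camera_path: List[str]) -> WorldGenerationTask:
    """Turn a camera path into a sequence of next-scene generation tasks."""
    return WorldGenerationTask(
        initial_scene=initial_scene,
        steps=[NextSceneTask(instruction) for instruction in camera_path],
    )


# The bedroom example: three camera moves become three next-scene tasks.
task = decompose("bedroom", ["pan left", "move left", "pull out"])
for index, step in enumerate(task.steps, start=1):
    print(f"step {index}: {step.camera_instruction}")
```

Under this view, a model is evaluated once per step, so a model that ignores a camera instruction or never leaves the initial scene is penalized at that step.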

Comparison of Benchmarks

Benchmark # Examples Multi-Scene Unified Long Seq. Image Cond. Multi-Style Camera Ctrl. 3D Consist.
TC-Bench 150
EvalCrafter 700
FETV 619
VBench 800
T2V-CompBench 700
Meng et al. 160
Wang et al. 423
ChronoMagic-Bench 1649
WorldModelBench 350
WorldScore (Ours) 3000

Our WorldScore benchmark is designed to evaluate various world generation approaches, including 3D, 4D, I2V, and T2V models. It covers multi-scene generation with varying sequence lengths. Our benchmark also features multiple visual styles, accurate camera control evaluation, and 3D consistency evaluation, all of which are important factors in world generation yet missing from existing benchmarks.

Controllability

Camera controllability evaluates how well the models adhere to the instructed layout. The layout instruction for each showcase is shown below the input image (e.g., "Camera pans right").

Score: 99.96 Score: 0.00
Score: 99.90 Score: 0.00
Score: 99.97 Score: 0.00
Score: 99.95 Score: 0.00

Quality

3D consistency focuses on how the geometry of a scene remains stable across frames, regardless of slight changes in visual textures.

Score: 92.88 Score: 0.00
Score: 94.60 Score: 0.00
Score: 94.41 Score: 0.00

Dynamics

Motion accuracy measures whether the motion specified in the next-scene prompt (shown below the input image) occurs in the specified regions.

Score: 100.00 Score: 0.00
Score: 83.98 Score: 39.11
Score: 100.00 Score: 32.42
Score: 85.12 Score: 35.19
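As a rough illustration of the idea that motion should occur in the specified regions, the sketch below compares the average motion magnitude inside a region mask with the average outside it. This is an assumption made for illustration only, not the benchmark's actual motion accuracy metric; the function name `motion_contrast` and its inputs (a per-pixel motion magnitude map and a boolean region mask) are hypothetical.

```python
from typing import List


def motion_contrast(flow_mag: List[List[float]], region: List[List[bool]]) -> float:
    """Mean motion magnitude inside the region minus the mean outside it.

    A positive value means motion is concentrated in the specified region.
    Illustrative only; not WorldScore's motion accuracy formula.
    """
    inside = []
    outside = []
    for mag_row, mask_row in zip(flow_mag, region):
        for magnitude, in_region in zip(mag_row, mask_row):
            (inside if in_region else outside).append(magnitude)
    mean_inside = sum(inside) / len(inside) if inside else 0.0
    mean_outside = sum(outside) / len(outside) if outside else 0.0
    return mean_inside - mean_outside


# Toy 2x2 example: all motion falls in the right column, which is the region.
flow = [[0.0, 2.0],
        [0.0, 4.0]]
mask = [[False, True],
        [False, True]]
contrast = motion_contrast(flow, mask)
print(contrast)  # 3.0
```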

Abstract

We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metric evaluates generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 19 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models.

Dataset

A showcase of the current-scene images. The entire dataset is available separately.
Dataset Overview

Evaluation Results

WorldScore-Static aggregates controllability and quality. WorldScore-Dynamic aggregates controllability, quality, and dynamics. See our leaderboard for the latest ranking and numerical results.
Models | WorldScore-Static | WorldScore-Dynamic | Camera Ctrl | Object Ctrl | Content Align | 3D Consist | Photo Consist | Style Consist | Subjective Qual | Motion Acc | Motion Mag | Motion Smooth
Gen-3 | 60.71 | 57.58 | 29.47 | 62.92 | 50.49 | 68.31 | 87.09 | 62.82 | 63.85 | 54.53 | 27.48 | 68.87
Hailuo | 57.55 | 56.36 | 22.39 | 69.56 | 73.53 | 67.18 | 62.82 | 54.91 | 52.44 | 63.46 | 27.20 | 70.07
DynamiCrafter | 52.09 | 47.19 | 25.15 | 47.36 | 25.00 | 72.90 | 60.95 | 78.85 | 54.40 | 41.11 | 39.25 | 26.92
VideoCrafter1-T2V | 47.10 | 43.54 | 21.61 | 50.44 | 60.78 | 64.86 | 51.36 | 38.05 | 42.63 | 11.76 | 75.00 | 18.87
VideoCrafter1-I2V | 50.47 | 47.64 | 25.46 | 24.25 | 35.27 | 74.42 | 73.89 | 65.17 | 54.85 | 55.63 | 25.00 | 42.49
VideoCrafter2 | 52.57 | 47.49 | 28.92 | 39.07 | 72.46 | 65.14 | 61.85 | 43.79 | 56.74 | 47.12 | 30.40 | 29.39
T2V-Turbo | 45.65 | 40.20 | 27.80 | 30.68 | 69.14 | 38.72 | 34.84 | 49.65 | 68.74 | 34.87 | 40.09 | 7.48
EasyAnimate | 52.85 | 51.65 | 26.72 | 54.50 | 50.76 | 67.29 | 47.35 | 73.05 | 50.31 | 75.00 | 31.16 | 40.32
CogVideoX-T2V | 54.18 | 48.79 | 40.22 | 51.05 | 68.12 | 68.81 | 64.20 | 42.19 | 44.67 | 25.00 | 47.31 | 36.28
CogVideoX-I2V | 62.15 | 59.12 | 38.27 | 40.07 | 36.73 | 86.21 | 88.12 | 83.22 | 62.44 | 69.56 | 26.42 | 60.15
Allegro | 55.31 | 51.97 | 24.84 | 57.47 | 51.48 | 70.50 | 69.89 | 65.60 | 47.41 | 54.39 | 40.28 | 37.81
Vchitect-2.0 | 42.28 | 38.47 | 26.55 | 49.54 | 65.75 | 41.53 | 42.30 | 25.69 | 44.58 | 33.59 | 33.81 | 21.31
SceneScape | 50.73 | 35.51 | 84.99 | 47.44 | 28.64 | 76.54 | 62.88 | 21.85 | 32.75 | 0.00 | 0.00 | 0.00
Text2Room | 62.10 | 43.47 | 94.01 | 38.93 | 50.79 | 88.71 | 88.36 | 37.23 | 36.69 | 0.00 | 0.00 | 0.00
LucidDreamer | 70.40 | 49.28 | 88.93 | 41.18 | 75.00 | 90.37 | 90.20 | 48.10 | 58.99 | 0.00 | 0.00 | 0.00
WonderJourney | 63.75 | 44.63 | 84.60 | 37.10 | 35.54 | 80.60 | 79.03 | 62.82 | 66.56 | 0.00 | 0.00 | 0.00
InvisibleStitch | 61.12 | 42.78 | 93.20 | 36.51 | 29.53 | 88.51 | 89.19 | 32.37 | 58.50 | 0.00 | 0.00 | 0.00
WonderWorld | 72.69 | 50.88 | 92.98 | 51.76 | 71.25 | 86.87 | 85.56 | 70.57 | 49.81 | 0.00 | 0.00 | 0.00
4D-fy | 27.98 | 32.10 | 69.92 | 55.09 | 0.85 | 35.47 | 1.59 | 32.04 | 0.89 | 22.22 | 22.88 | 80.06
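The reported numbers are consistent with a simple aggregation: WorldScore-Static matches the unweighted mean of the seven controllability and quality columns, and WorldScore-Dynamic matches the unweighted mean of all ten columns. This is inferred from the table rather than stated on this page, so treat the sketch below as an assumption. The metric key names and helper functions are hypothetical; the values are the WonderWorld row.

```python
from statistics import mean

# Column groupings assumed from the table header (key names are hypothetical).
STATIC_METRICS = [
    "camera_ctrl", "object_ctrl", "content_align",        # controllability
    "consist_3d", "photo_consist", "style_consist",       # quality
    "subjective_qual",
]
DYNAMIC_METRICS = ["motion_acc", "motion_mag", "motion_smooth"]  # dynamics


def worldscore_static(metrics: dict) -> float:
    """Unweighted mean of the controllability and quality metrics (assumed)."""
    return mean(metrics[name] for name in STATIC_METRICS)


def worldscore_dynamic(metrics: dict) -> float:
    """Unweighted mean of all ten metrics, including dynamics (assumed)."""
    return mean(metrics[name] for name in STATIC_METRICS + DYNAMIC_METRICS)


# WonderWorld row from the table above.
wonderworld = {
    "camera_ctrl": 92.98, "object_ctrl": 51.76, "content_align": 71.25,
    "consist_3d": 86.87, "photo_consist": 85.56, "style_consist": 70.57,
    "subjective_qual": 49.81,
    "motion_acc": 0.00, "motion_mag": 0.00, "motion_smooth": 0.00,
}

static_score = worldscore_static(wonderworld)
dynamic_score = worldscore_dynamic(wonderworld)
print(round(static_score, 2))   # 72.69, matching the reported WorldScore-Static
print(round(dynamic_score, 2))  # 50.88, matching the reported WorldScore-Dynamic
```

The same check reproduces the Gen-3 row (60.71 and 57.58). It also explains why the 3D scene generation models, which score 0.00 on all three motion columns, drop sharply from WorldScore-Static to WorldScore-Dynamic.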

BibTeX

Coming soon!