MUSTCV 2026 : The 5th Workshop on Computer Vision for Multimedia Spatial Intelligence through Time (@ECCV)

posted by organizer: gianlucama

Link: https://sites.google.com/view/mustcv-2026

When	Sep 8, 2026 - Sep 9, 2026
Where	Malmö, Sweden (ECCV)
Submission Deadline	Jul 6, 2026
Notification Due	Jul 23, 2026
Final Version Due	Aug 7, 2026

Categories	computer vision, spatial intelligence, 3D reconstruction, ECCV

Call For Papers

In Augmented Reality (AR), Virtual Reality (VR), and spatial computing, computer vision connects digital and physical realities. Understanding, generating, and interacting with complex 3D environments pushes immersive technologies forward. Furthermore, integrating the temporal dimension to handle dynamic, evolving scenes (4D) is rapidly emerging as the crucial next frontier.

Past advances in multimodal foundation models have improved image and video processing, creating a solid baseline. The current challenge is translating these successes into robust Multimedia Spatial Intelligence. This involves interpreting and generating rich spatial data (3D) while accounting for its evolution over time (4D). Integrating diverse inputs (text, audio, and video) allows us to seamlessly create, modify, and interact with these spatiotemporal environments.

To advance these frontiers, we invite authors to submit original research to the MUSTCV 2026 workshop across our archival and non-archival tracks.

The fifth edition of the MUSTCV workshop (formerly CV4Metaverse) explores the mechanics of spatial and dynamic computing, emphasizing 3D spatial intelligence, cross-modal multimedia generation, and temporal dynamics.

Topics of interest include, but are not limited to:

- Spatial and Dynamic Scene Understanding: Methods for continuous interaction in static 3D and dynamic 4D environments, including spatiotemporal modeling (e.g., 3D/4D reconstruction, depth estimation, tracking).

- Cross-Modal 3D/4D Generation and Synthesis: Using text, audio, and video to generate, edit, or manipulate spatial scenes (e.g., text-to-3D/4D, audio-driven motion), and bridging 2D foundation models with time-aware generation.

- Immersive Applications and Datasets: Novel ML applications for AR/VR, digital twins, and interactive multimedia across 3D/4D domains. New datasets and benchmarks for spatial intelligence and evolving scenes.

Invited speakers:
- Prof. Angela Dai (TUM)
- Prof. Andrea Vedaldi (University of Oxford)
- Prof. Gerard Pons-Moll (University of Tübingen)
