2nd GenAI Media Generation Challenge Workshop @ CVPR2025
Workshop Overview
This year, we are excited to host the 2nd GenAI Media Generation Challenge Workshop at CVPR 2025. Building on the success of last year's event, which focused on text-to-image and image editing tasks, we are expanding the challenge to include video generation.
We are proud to announce the launch of the 2nd GenAI Media Generation Challenge (MAGIC), featuring a media generation track and an auto-evaluation track:
- Media Generation Festival: For the first time, we are organizing a media generation festival with no restrictions on prompts. We define a set of topics, and participants can submit their best generated videos or images for those topics. For each topic, we run a crowd-sourced voting mechanism to determine the winners.
- Auto Evaluation Challenge: We are introducing an auto-evaluation challenge for both text-to-image and text-to-video tasks. Participants develop and submit auto-evaluation scores for a preselected set of images and videos that we provide and enter into the media generation festival track. Submissions should predict the outcomes of the crowd-sourced voting in the media generation festival; the auto-evaluation method that achieves the best correlation with the final voting results wins this challenge.
Submission for "Challenge A - Media Generation Festival" is Open!
Click the link below to share your generated images or videos with us. See Submission Instruction.
Submit Now

Submission for "Challenge B - Auto Evaluation" is Open!
Click the link below to share your auto eval results with us. See Submission Instruction.
Submit Now

Challenges Overview
Challenge A - Media Generation Festival
Track A.1 - Video (Short)
For this task, there are no restrictions on prompts or models, but submissions must be based on the topics below. Participants are free to use any models, including third-party video generation tools, and are encouraged to design their own creative prompts to produce engaging videos. Submissions may target a single topic or multiple topics. However, the video length is limited to a maximum of 10 seconds.
For this track, we will have the following set of topics:
- People
- Animals
- Landscape
Track A.2 - Video (Long)
Similar to Track A.1, there are no restrictions on prompts or models, but submissions must be based on the topics below. For this track, however, the video length is limited to a maximum of 5 minutes.
For this track, we will have the following set of topics:
- Action
- History
- Sci-Fi
- Fantasy
- Comedy
Track A.3 - Images
For this task, there are no restrictions on prompts or models, but submissions must be based on the topics below. Participants are free to use any models, including third-party image generation tools, and are encouraged to design their own creative prompts to produce engaging images.
For this track, we will have the following set of topics:
- People
- Animals
- Landscape
Evaluation Protocol
All submitted videos/images will be uploaded to our crowdsourcing platform for public voting. The top three submissions with the highest votes for each topic will be declared winners. Beyond the per-topic winners, we will also name joint-submission winners for the best overall-performing solution. We will use the Elo rating system to compute the final ranking between different models.
For automatic evaluation, we will leverage existing automatic metrics to help assess how well the generated outputs match the prompt.
For the voting setup, we will use pairwise comparisons: two videos or images are shown along with the topic, and voters are asked "Which one would you prefer?" on a 3-point win/tie/lose scale.
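To make the Elo step concrete, here is a minimal sketch of rating updates driven by win/tie/lose votes. The K-factor, initial rating, and vote records are illustrative assumptions, not the workshop's official implementation:

```python
# A minimal sketch (not the workshop's implementation) of Elo updates from
# pairwise win/tie/lose votes. Vote records and the K-factor are assumptions.
from collections import defaultdict

K = 32  # update step size; a common default, assumed here

def expected_score(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_rankings(votes, initial=1000.0):
    """votes: iterable of (model_a, model_b, outcome) with outcome
    1.0 = A wins, 0.5 = tie, 0.0 = B wins. Returns models sorted by rating."""
    ratings = defaultdict(lambda: initial)
    for a, b, outcome in votes:
        e_a = expected_score(ratings[a], ratings[b])
        ratings[a] += K * (outcome - e_a)
        ratings[b] += K * ((1.0 - outcome) - (1.0 - e_a))
    return sorted(ratings.items(), key=lambda kv: -kv[1])

# Hypothetical votes: model_x beats model_y, then ties with model_z.
print(elo_rankings([("model_x", "model_y", 1.0), ("model_x", "model_z", 0.5)]))
```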
Submission Instruction
For this challenge, there are few restrictions beyond the topics (see above) and the video length (at most 10 seconds for Track A.1 and 5 minutes for Track A.2). Please upload your videos or images, indicating the topic and track. We do not limit the number of submissions per team.

Challenge B - Results Prediction with Auto Evaluation
In this challenge, we provide (text, image) pairs and (text, video) pairs on which participants run their own auto-evaluation metrics. Participants submit either scores for these media, which can be used to rank them, or simply a ranking of all the media in their submission.

Track B.1 - Video Generation Auto Eval
For this track, we use Movie Gen Video Bench for benchmarking. Each participant will be asked to download the 1003 videos and prompts from Movie Gen Video Bench and run their auto eval models to get a ranking of the 1003 videos.
Track B.2 - Image Generation Auto Eval
For this track, we use Emu_1k for benchmarking. Each participant can download the 1000 emu-generated images and prompts from the benchmark and run their auto eval models to get a ranking of the 1000 images.
Evaluation Protocol
Submitted auto-evaluation results will be judged against the outcomes of the crowd-sourced voting described in Challenge A: we will compare each submitted ranking with the final ranking produced by the human votes, and the auto-evaluation method that achieves the best correlation with the final voting results will be declared the winner of this challenge.
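The exact correlation measure is not specified above; as a hypothetical illustration, a submitted ranking could be compared to a reference ranking with Spearman's rank correlation. File names here are placeholders:

```python
# A hypothetical scoring sketch: compare a submitted ranking file with a
# reference ranking using Spearman's rank correlation (from SciPy). The
# file names and choice of Spearman are assumptions, not the official rule.
from scipy.stats import spearmanr

def load_ranking(path):
    """Read one 0-indexed rank per line, as in the submission format."""
    with open(path) as f:
        return [int(line.strip()) for line in f if line.strip()]

predicted = load_ranking("submission.txt")     # participant's ranking
reference = load_ranking("crowd_ranking.txt")  # hypothetical voting result

rho, p_value = spearmanr(predicted, reference)
print(f"Spearman correlation: {rho:.4f} (p = {p_value:.2g})")
```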
Submission Instruction
To participate in Track B.1, please follow these steps:
- Download the 1003 videos, indexed from 0 to 1002, from the Movie Gen Video Bench.
- Run your auto-evaluation model to generate rankings for these videos. You may also use the prompts and meta information provided in Movie Gen Video Bench to enhance your evaluation.
- To submit your results, prepare a text file with 1003 lines. Each line should represent the ranking of the corresponding video, starting with the first video, "0.mp4".
For example:

217
0
...
115

In this submission:
- The first video (0.mp4) ranks 218th, so the first line is 217.
- The second video (1.mp4) ranks 1st, so the second line is 0.
- The last video (1002.mp4) ranks 116th, so the last line is 115.
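As a minimal sketch (not an official tool), the conversion from per-video auto-eval scores to the required line format could look like the following; the random scores are placeholders for your model's outputs:

```python
# Sketch: convert per-video auto-eval scores into the Track B.1 submission
# file, one 0-indexed rank per line (line i holds the rank of i.mp4).
import random

def write_ranking_file(scores, out_path):
    """scores[i] is the score of i.mp4 (higher = better)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    with open(out_path, "w") as f:
        f.write("\n".join(str(r) for r in ranks) + "\n")

random.seed(0)
placeholder_scores = [random.random() for _ in range(1003)]  # one per video
write_ranking_file(placeholder_scores, "track_b1_submission.txt")
```

The same conversion applies to Track B.2, with 1000 image scores instead of 1003 video scores.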
For Track B.2, please follow these instructions:
- Download the "images.zip" file from the Emu 1k benchmark.
- Inside "images.zip", you will find 1000 images along with their corresponding prompts. For example, "000000.jpg" has its prompt in "000000.txt".
- To submit your results, create a text file with 1000 lines. Each line should give the ranking (0-indexed) of the corresponding image, starting with "000000.jpg".
For example:

217
0
...
115

In this submission:
- The first image (000000.jpg) ranks 218th, so the first line is 217.
- The second image (000001.jpg) ranks 1st, so the second line is 0.
- The last image (000999.jpg) ranks 116th, so the last line is 115.
Leaderboard
Important Dates
Description | Date
--- | ---
Submission opens for all challenges | 3/3/2025
Submission closes for Challenge A | 4/14/2025
Crowd-sourced polling opens | 4/21/2025
Submission closes for Challenge B | 6/2/2025
Crowd-sourced polling ends | 6/2/2025
Workshop date | TBD
Workshop Schedule
Invited Speakers
Björn Ommer Dr. Björn Ommer is a full professor at LMU, where he heads the Computer Vision & Learning Group (previously the Computer Vision Group Heidelberg). Before that, he was a full professor in the Department of Mathematics and Computer Science at Heidelberg University, where he also served as one of the directors of the Interdisciplinary Center for Scientific Computing (IWR) and of the Heidelberg Collaboratory for Image Processing (HCI). He has served as program chair for GCPR, as Senior Area Chair and Area Chair for multiple CVPR, ICCV, ECCV, and NeurIPS conferences, and as a workshop and tutorial organizer at these venues.
Jun-Yan Zhu Dr. Jun-Yan Zhu is an Assistant Professor with The Robotics Institute in the School of Computer Science of Carnegie Mellon University. He also holds affiliated faculty appointments in the Computer Science Department and Machine Learning Department. He studies computer graphics, computer vision, and computational photography. Prior to joining CMU, he was a Research Scientist at Adobe Research. He did a postdoc at MIT CSAIL, working with William T. Freeman, Josh Tenenbaum, and Antonio Torralba. He obtained his Ph.D. from UC Berkeley under the supervision of Alexei A. Efros, and received his B.E. from Tsinghua University, working with Zhuowen Tu, Shi-Min Hu, and Eric Chang.
Sergey Tulyakov Dr. Sergey Tulyakov is a Principal Research Scientist heading the Creative Vision team at Snap Research. His work focuses on creating methods for manipulating the world via computer vision and machine learning, including 2D and 3D methods for photorealistic object manipulation and animation, video synthesis, prediction, and retargeting. His work has been published in 30+ top conference papers, journal articles, and patents, resulting in multiple tech transfers, including Snapchat Pet Tracking and Real-time Neural Lenses (gender swap, baby face, real-time try-on, and many others).
Saining Xie Dr. Saining Xie is an Assistant Professor of Computer Science at NYU Courant and part of the CILVR group. He is also affiliated with the NYU Center for Data Science. Before that, he was a research scientist at Facebook AI Research (FAIR), Menlo Park. He received his Ph.D. and M.S. degrees from the CSE Department at UC San Diego, advised by Zhuowen Tu. During his Ph.D. studies, he also interned at NEC Labs, Adobe, Facebook, Google, and DeepMind. Prior to that, he obtained his bachelor's degree from Shanghai Jiao Tong University. His primary research interests are computer vision and machine learning.
Organizers

The workshop is organized by a team from Meta AI.
Senior Advisors

Senior advisors are from Meta AI and Snap.
Contact
To contact the organizers please use gen_ai_media_generation_challenge_cvpr_workshop@meta.com
Acknowledgments
Thanks to languagefor3dscenes for the webpage format.