Open-source video generators are heating up and giving closed-source behemoths a run for their money.

They're more customizable, less restricted (uncensored, even), and free to use. And they're now producing high-quality videos, with three models (Wan, Mochi, and Hunyuan) ranking among the top 10 of all AI video generators.

The latest breakthrough is in duration, with two new models demonstrating the ability to generate videos lasting minutes rather than the typical few seconds.

In fact, SkyReels-V2, released this week, claims it can generate scenes of potentially infinite duration while maintaining consistency throughout. FramePack, meanwhile, gives users with lower-end hardware the ability to create long videos without burning out their PCs.

SkyReels-V2: Infinite Video Generation

SkyReels-V2 represents a significant advance in video generation technology, tackling four critical challenges that have limited previous models. Its developers describe the system, which combines multiple AI techniques, as an "Infinite-Length Film Generative Model."

The model achieves this through what its developers call a "diffusion forcing framework," which allows seamless extension of video content without explicit length constraints.

It works by conditioning on the last frames of previously generated content to create new segments, preventing quality degradation over extended sequences. In other words, the model looks at the final frames it just created to decide what comes next, ensuring smooth transitions and consistent quality.
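
In rough pseudocode, that loop looks something like the sketch below; generate_chunk is a stand-in for the actual diffusion-model call, and the chunk and context sizes are illustrative rather than SkyReels-V2's real parameters.

```python
# Minimal sketch of autoregressive video extension via frame conditioning.
# `generate_chunk` is a placeholder for a diffusion-model call; the real
# SkyReels-V2 pipeline is far more involved.

def generate_chunk(prompt, context, num_frames):
    """Placeholder: pretend each 'frame' is just a numbered token."""
    start = 0 if context is None else context[-1] + 1
    return list(range(start, start + num_frames))

def generate_long_video(prompt, total_frames, chunk_size=97, context_frames=17):
    video = []
    context = None  # the very first chunk has nothing to condition on
    while len(video) < total_frames:
        # Condition each new chunk on the tail of what was already generated,
        # so transitions stay smooth and quality doesn't reset between segments.
        chunk = generate_chunk(prompt, context, num_frames=chunk_size)
        video.extend(chunk)
        context = video[-context_frames:]  # carry the last frames forward
    return video[:total_frames]

frames = generate_long_video("a slow pan across a harbor at dusk", total_frames=300)
print(len(frames))  # 300 "frames," produced chunk by chunk
```

Because each pass only ever sees a fixed number of context frames, the cost of generating the next chunk stays constant no matter how long the video gets.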

That quality degradation is the main reason video generators tend to stick to short clips of around 10 seconds; anything longer, and the generation tends to lose coherence.

The results are impressive. Videos uploaded to social media by developers and enthusiasts show that the model stays coherent and the footage doesn't visibly lose quality.

Subjects remain identifiable throughout long scenes, and backgrounds don't warp or introduce distracting artifacts.

SkyReels-V2 incorporates several innovative components, including a new captioner that combines knowledge from general-purpose language models with specialized "shot-expert" models to ensure precise alignment with cinematic terminology. This helps the system better understand and execute professional film techniques.

The system uses a multi-stage training pipeline that progressively increases resolution from 256p to 720p, providing high-quality results while maintaining visual coherence. For motion quality—a persistent weakness in AI video generation—the team implemented reinforcement learning specifically designed to improve natural movement patterns.
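
As a rough illustration (not the team's actual recipe), a staged pipeline of that kind can be sketched as a simple schedule that trains longest at low resolution and finishes at 720p; the intermediate stage, step counts, and train_step hook below are assumptions.

```python
# Toy sketch of a staged, progressive-resolution training pipeline.
# The 256p -> 720p progression comes from the SkyReels-V2 description; the
# intermediate 540p stage, the step counts, and train_step are placeholders.

def train_step(batch, resolution):
    """Stand-in for one diffusion training step at the given resolution."""
    pass  # a real step would compute a denoising loss and update model weights

schedule = [
    {"resolution": (256, 448),  "steps": 1000},  # cheap: learn coarse structure
    {"resolution": (540, 960),  "steps": 400},   # assumed intermediate refinement
    {"resolution": (720, 1280), "steps": 200},   # final high-resolution polish
]

for stage in schedule:
    for _ in range(stage["steps"]):
        batch = f"video clips resized to {stage['resolution']}"  # dataloader stand-in
        train_step(batch, stage["resolution"])
```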

The model is available to try at Skyreels.AI. Users get enough credits to generate only one video; anything more requires a subscription, starting at $8 per month.

However, those willing to run it locally will need a God-tier PC. “Generating a 540P video using the 1.3B model requires approximately 14.7GB peak VRAM, while the same resolution video using the 14B model demands around 51.2GB peak VRAM,” the team says on GitHub.

FramePack: Prioritizing Efficiency

Potato PC owners can rejoice: there's something for them, too.

FramePack takes a different approach than SkyReels, focusing on efficiency rather than sheer length. It can generate frames at impressive speeds (as fast as 1.5 seconds per frame when optimized) while requiring only 6 GB of VRAM.

“To generate 1-minute video (60 seconds) at 30fps (1800 frames) using 13B model, the minimal required GPU memory is 6GB. (Yes, 6 GB, not a typo. Laptop GPUs are okay),” the research team said in the project’s official GitHub repo.

This low hardware requirement represents a potential democratization of AI video technology, bringing advanced generation capabilities within reach of consumer-grade GPUs.

Because it can run a 13-billion-parameter video model within such a small memory footprint, FramePack could enable deployment on consumer and edge devices and wider adoption across industries.

FramePack was developed by researchers at Stanford University. The team includes Lvmin Zhang, better known in the generative AI community as lllyasviel, the dev-influencer behind many open-source resources for AI artists, such as ControlNet and IC-Light, which revolutionized image generation during the SD1.5/SDXL era.

FramePack's key innovation is a clever memory compression system that prioritizes frames based on their importance. Rather than treating all previous frames equally, the system assigns more computational resources to recent frames while progressively compressing older ones.
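
A toy version of that scheduling idea, using a geometric compression factor and made-up token counts (not the project's actual settings), looks like this:

```python
# Minimal sketch of the FramePack idea: spend many context tokens on the most
# recent frames and geometrically fewer on older ones, so the total context
# (and therefore memory) stays roughly constant however long the video gets.
# The numbers below are illustrative, not the paper's exact settings.

def token_budgets(num_past_frames, base_tokens=1536, factor=2):
    """Per-frame token budget, newest frame first; very old frames round to zero."""
    return [base_tokens // (factor ** age) for age in range(num_past_frames)]

for n in (8, 32, 128):
    budgets = token_budgets(n)
    print(f"{n:4d} past frames -> {sum(budgets)} context tokens")
```

Because the geometric series converges, the total context the model has to attend over stops growing after the first handful of frames, which is why memory use stays flat even for very long videos.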

Running FramePack nodes under ComfyUI (a popular interface for generating videos locally) produces very good results, especially considering how little hardware it requires. Enthusiasts have generated 120 seconds of consistent video with minimal errors, beating state-of-the-art models that deliver great quality but degrade severely when users push them past a few seconds.

FramePack is available for local installation via its official GitHub repository. The team emphasized that the project has no official website, and that any other URLs using its name are scam sites not affiliated with the project.

“Do not pay money or download files from any of those websites,” the researchers warned.

The practical benefits of FramePack include the possibility of small-scale training, higher-quality outputs due to "less aggressive schedulers with less extreme flow shift timesteps," consistent visual quality maintained throughout long videos, and compatibility with existing video diffusion models like HunyuanVideo and Wan.

Edited by Sebastian Sinclair and Josh Quittner
