Credit where it's due: AI is changing everything.
A 1GB video can now be transmitted using only about 200KB of data, and still look sharp, smooth, and full of detail. That makes the compressed stream roughly 0.02% the size of the original, achieved without sacrificing perceived quality.
Skeptical? Consider this scenario: You're on a cargo ship in the middle of the Pacific with barely one or two bars of satellite signal. Normally, even loading a social media feed feels glacial. But with this new AI-powered approach, you could stream a crystal-clear World Cup match live.

The innovation comes from Generative Video Compression (GVC), developed by China Telecom's Artificial Intelligence Research Institute (TeleAI). As a major state-owned telecom giant with infrastructure spanning sea, land, air, and space, China Telecom is uniquely positioned to fuse cutting-edge AI with real-world networking challenges. This "cloud-network convergence + AI-native" strength is what makes GVC viable beyond the lab—in extreme settings like oceangoing vessels or disaster zones.

Today's video streaming—whether Netflix, Bilibili, or WeChat calls—relies on traditional codecs like HEVC (H.265) or VVC (H.266). These work by meticulously analyzing which pixels stay the same and which move, then squeezing as much pixel data as possible into limited bandwidth.
This approach performs beautifully when bandwidth is plentiful, but in ultra-low-bandwidth situations it falls apart fast: encoders discard high-frequency details first, leaving video blocky, blurry, or frozen.
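For intuition, here is a toy NumPy sketch of the block-matching idea at the heart of these codecs. The function name and parameters are our own for illustration; real encoders layer transforms, quantization, and entropy coding on top of this step:

```python
# Toy motion estimation: for each block of the current frame, find the
# best match in the previous frame, then keep only the displacement
# (motion vector) plus the residual difference. That compact pair, not
# raw pixels, is what gets coded and sent.
import numpy as np

def motion_compensate(prev, curr, block=16, search=4):
    prev = prev.astype(np.float32)
    curr = curr.astype(np.float32)
    h, w = curr.shape
    vectors, residuals = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            target = curr[y:y+block, x:x+block]
            best, best_err = (0, 0), np.inf
            # Exhaustive search in a small window around the block.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        err = np.abs(target - prev[yy:yy+block, xx:xx+block]).sum()
                        if err < best_err:
                            best, best_err = (dy, dx), err
            dy, dx = best
            vectors.append(best)
            residuals.append(target - prev[y+dy:y+dy+block, x+dx:x+dx+block])
    return vectors, residuals
```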

Instead of sending the image itself, GVC transmits compact "instructions" for recreating it. Think of it this way:
- Traditional compression → Take a photo of the Mona Lisa, compress it heavily, and send it. Bad network? It arrives as colorful mush.
- GVC → Send a concise description ("mysterious smiling woman, mountain-and-river backdrop, light from the left") plus precise details like the exact curve of her smile. On the receiving end, a powerful AI generative model acts as an instant painter and recreates the scene from scratch.
In reality, it's far more sophisticated than text prompts—the transmitted data consists of highly compressed Tokens carrying the video's essence.
Inside GVC: What Gets Sent?
The system has two main parts: a Neural Encoder on the sender side and a Generative Video Decoder (powered by a diffusion model) on the receiver side.
What travels over the wire is a tiny packet of compressed Tokens encoding:
- Semantic Information — Scene type, presence of people/objects, overall structure (the "skeleton").
- Motion Dynamics — How things move next: object trajectories, wind, wheel rotation (the "soul").
Tests show these Tokens can be squeezed down to 0.005–0.008 bits per pixel (bpp), versus 0.1+ bpp for typical HD video. That's a reduction of well over an order of magnitude.
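To make those bpp figures concrete, here is a quick back-of-the-envelope in Python (the 1080p/30 fps assumption is ours, not a figure from TeleAI):

```python
# Bitrate implied by a given bits-per-pixel figure at 1080p, 30 fps.
width, height, fps = 1920, 1080, 30

def bitrate_kbps(bpp: float) -> float:
    return width * height * fps * bpp / 1000

print(bitrate_kbps(0.005))  # ~311 kbps, GVC's reported low end
print(bitrate_kbps(0.1))    # ~6221 kbps, typical HD video
```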
At the receiver, the diffusion model draws on its vast pre-trained world knowledge (it already "knows" what waves, soccer balls, etc. look like) and uses the Tokens to generate realistic video.
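Put together, the system sketches out roughly like the skeleton below. To be clear, the class and method names are illustrative inventions, not a published TeleAI interface:

```python
# Conceptual skeleton of the GVC pipeline described above.
from dataclasses import dataclass

@dataclass
class TokenPacket:
    semantic_tokens: bytes  # scene structure: the "skeleton"
    motion_tokens: bytes    # dynamics over time: the "soul"

class NeuralEncoder:
    def encode(self, frames) -> TokenPacket:
        """Compress a clip into a tiny token packet (0.005-0.008 bpp)."""
        ...

class GenerativeVideoDecoder:
    def decode(self, packet: TokenPacket):
        """A pretrained diffusion model regenerates realistic frames,
        filling in texture from world knowledge rather than from
        transmitted pixels."""
        ...

# The sender ships a TokenPacket over the constrained link; the
# receiver runs GenerativeVideoDecoder().decode(packet) to get video.
```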
This marks a major shift in communication theory: from the Shannon-Weaver model's Level A (technical accuracy: did every bit arrive correctly?), past Level B (semantic accuracy: was the intended meaning conveyed?), to Level C (effectiveness: does the message accomplish its goal for humans and machines?).

Hard Numbers: It Actually Works
On the standard MCL-JCV dataset, at ~0.005 bpp:
- Traditional HEVC collapses into mosaic artifacts, with poor LPIPS scores (lower = better perceptual quality).
- GVC delivers clear textures and structure, with dramatically better LPIPS.
Key takeaway: To match GVC's visual quality, traditional methods need over 6× more bandwidth. On a terrible connection, GVC might let you read Ronaldo's expression; HEVC shows a moving blob.
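For readers who want to reproduce this kind of scoring, the reference `lpips` Python package (pip install lpips) computes the metric; the frame variables below are placeholders for decoded and reference frames as HxWx3 uint8 arrays:

```python
# Scoring perceptual quality with LPIPS, which compares images through
# a pretrained network rather than pixel by pixel.
import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, a common default

def to_tensor(img):
    # HxWx3 uint8 -> 1x3xHxW float scaled to [-1, 1], as LPIPS expects
    return (torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

# distance = loss_fn(to_tensor(decoded_frame), to_tensor(reference_frame))
# Lower LPIPS means the decoded frame is perceptually closer to the original.
```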
Importantly, it's not just pretty pictures. Tests on DAVIS2017 video segmentation show GVC preserves accurate semantics—even at 0.01 bpp—outperforming HEVC in J&F metrics. Key objects (people, balls, vehicles) stay precisely located, supporting downstream AI analysis.
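For a sense of what the J in J&F measures, region similarity is plain intersection-over-union between predicted and ground-truth masks (the F half, boundary accuracy, is omitted from this sketch):

```python
# J (region similarity) from the DAVIS J&F metric: mask IoU.
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty counts as a perfect match
    return np.logical_and(pred, gt).sum() / union

# A high J on GVC-decoded video means generated frames keep objects
# where they actually are, not merely where they look plausible.
```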
Practical Enough for Consumer Hardware
Generative models are compute-heavy, but TeleAI cut the cost through model miniaturization and knowledge distillation. On a single RTX 4090, generating 29 frames takes just 0.95–1.35 seconds, fast enough for non-real-time use or slightly delayed live streams.
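Simple arithmetic on those quoted figures shows why "slightly delayed live" is within reach:

```python
# Throughput implied by the quoted RTX 4090 numbers (29 frames/call).
frames = 29
for secs in (0.95, 1.35):
    print(f"{frames / secs:.1f} fps")  # -> 30.5 fps and 21.5 fps
```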
Beyond Watching Sports
The 0.02% figure is eye-catching, but the real impact lies in extreme scenarios:
- Oceangoing vessels and aircraft, where satellite links offer only a trickle of bandwidth.
- Disaster zones and emergency operations, where infrastructure is down and every kilobit counts.
- Remote and edge deployments, where compact semantic streams can feed downstream AI analysis directly.

GVC isn't standalone—it's built on TeleAI's AI Flow framework, which treats communication as intelligent distribution rather than raw data transfer. At WAIC last year, CTO and TeleAI head Prof. Xuelong Li introduced AI Flow's three core laws:
- Information Capacity Law — Measures model knowledge density via compression.
- Homologous Law — Guides efficient creation of model families at different scales.
- Integration Law — Enables emergent intelligence through multi-model collaboration.
Under AI Flow, communication evolves from moving raw data into distributing intelligence. When bandwidth becomes the bottleneck, GVC spends compute at the endpoints to buy that freedom back.
Video compression is undergoing a paradigm shift—from pixel shuttling to semantic generation—much like feature phones gave way to smartphones.
As a flagship AI+telecom fusion from a major state player, GVC delivers practical solutions for remote comms, emergency ops, and edge intelligence. It paves the way for a future internet where networks carry condensed intelligence and instructions, not bloated raw data.