Meteor Lake Architecture Revealed: AI, Tiles And The Future Of Intel Core CPUs


Intel Meteor Lake Architecture Deep Dive: New Media Engine, Display Pipeline, And Graphics

Media And Display Engine Improvements

We will get to the Graphics Tile in a moment, but first there are some key improvements to the Media Engine and Display Engine that we should cover.

The Media Engine is fixed-function hardware for accelerating particular codecs, which allows it to operate very efficiently. Meteor Lake’s implementation should make content creators quite happy, as it supports AVC, HEVC, and the increasingly popular AV1 for both decode and encode. Intel’s slide for this section spells out exactly which formats of each codec are supported. For instance, the AV1 implementation offers 10-bit depth with 4:2:0 chroma subsampling, while HEVC can go up to full 10-bit 4:4:4 chroma if needed.

The Media Engine features MFX blocks, each of which contains a decoder and an encoder. Each MFX block has its own video scaler and color space converter, while the video enhancer, HDR tone mapper, and Bayer processor (which, for example, demosaics a raw sensor image captured as a grid of red, green, and blue subpixels) are shared across the Engine.

The Display Engine also brings some interesting optimizations that shouldn’t be overlooked. It offers four display pipes, two of which are optimized for low power. Intel is building on technologies like Burst Fill, which decreases memory demands, and Panel Self Refresh (PSR), which skips fetch and generation for repeated frames to further reduce resource demands.

The new power optimization trick is Burst Decode, or Selective Update and Hardware Queuing. This allows the Display Engine to look ahead, queue up to 16 frames, decode them all at once, and then distribute those queued frames as needed. Because the cores only need to be awake during the Burst Decode blocks, power management can kick in more often. This is combined with techniques like PSR and Selective Fetch, so repeated frames on refresh don’t consume any resources, and newly presented frames only need to access memory and the display pipe.
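To see why batching helps, here is a rough Python sketch that counts how many times the decode hardware has to wake up for a clip. It is a toy model, not a description of Intel's scheduler: the function name `decode_bursts` and the 240-frame clip are hypothetical, and the only figure taken from the article is the 16-frame queue depth.

```python
import math

MAX_QUEUED_FRAMES = 16  # per the article: up to 16 frames can be queued


def decode_bursts(total_frames: int, batch_size: int = MAX_QUEUED_FRAMES) -> int:
    """Return how many decode wake-ups are needed for total_frames frames.

    Toy model (an assumption, not Intel's scheduler): every burst decodes up
    to batch_size frames, so the count is the ceiling of frames / batch size.
    """
    if total_frames < 0 or batch_size < 1:
        raise ValueError("total_frames must be >= 0 and batch_size >= 1")
    return math.ceil(total_frames / batch_size)


if __name__ == "__main__":
    frames = 240  # hypothetical clip length, e.g. 10 seconds at 24 FPS
    print("frame-by-frame wake-ups:", decode_bursts(frames, batch_size=1))
    print("burst decode wake-ups:", decode_bursts(frames))
```

In this simplified view, fewer wake-ups means longer uninterrupted idle periods, which is where the power savings would come from.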

Full Featured Xe-LPG Engine In The Graphics Tile

Meteor Lake’s Graphics Tile fuses the foundation of Xe-LP with the full feature set of Xe-HPG to yield Xe-LPG. Intel touts a 2x performance-per-watt uplift over Xe-LP, along with roughly 2x overall performance.

Xe-LPG scales beyond Xe-LP through a higher clock frequency, a larger overall GPU configuration, and architectural efficiency improvements, so let’s step through them.

Xe-LPG yields higher clock speeds at every voltage point of the curve. This gives it higher overall clocks along with a lower minimum voltage. Intel used AI to fine-tune timing closure for Meteor Lake, which it says delivered a 20% improvement over traditional human best efforts. Intel also credits TSMC’s N5 process for helping it achieve these speeds.

The GPU configuration is wider, which is to say it can do more in parallel than the Xe-LP design that came before it. It now has 8 Xe-cores, which amount to 128 Vector Engines split between two Render Slices. The number of Samplers has increased from six to eight, and there are now four pixel backends; that works out to one Sampler per Xe-core and two pixel backends per slice. Because these are full Xe-cores, each one also contains a Ray Tracing Unit, making for a total of eight RTUs. That’s right, ray tracing is supported on integrated graphics here.
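The unit counts above all follow from a handful of per-core and per-slice figures. The short Python sketch below tallies them; the class and field names are made up for illustration, and the 16 Vector Engines per Xe-core is inferred from 128 engines across 8 cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeLpgConfig:
    """Illustrative tally of the Xe-LPG layout described in the article."""

    render_slices: int = 2
    xe_cores: int = 8
    vector_engines_per_core: int = 16  # inferred: 128 engines / 8 cores
    samplers_per_core: int = 1
    rt_units_per_core: int = 1
    pixel_backends_per_slice: int = 2

    @property
    def vector_engines(self) -> int:
        return self.xe_cores * self.vector_engines_per_core

    @property
    def samplers(self) -> int:
        return self.xe_cores * self.samplers_per_core

    @property
    def rt_units(self) -> int:
        return self.xe_cores * self.rt_units_per_core

    @property
    def pixel_backends(self) -> int:
        return self.render_slices * self.pixel_backends_per_slice

    @property
    def xe_cores_per_slice(self) -> int:
        return self.xe_cores // self.render_slices


if __name__ == "__main__":
    cfg = XeLpgConfig()
    print("Vector Engines:", cfg.vector_engines)
    print("Samplers:", cfg.samplers)
    print("Ray Tracing Units:", cfg.rt_units)
    print("Pixel backends:", cfg.pixel_backends)
    print("Xe-cores per slice:", cfg.xe_cores_per_slice)
```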

Architecturally, each Xe-core in Meteor Lake contains 16 vector engines, each 256 bits wide, along with 192KB of shared L1 cache.

The vector engines can run dedicated FP execution at 16 FP32 operations per clock or 32 FP16 operations per clock, while the shared execution port can handle 64 INT8 operations per clock, 2 extended math operations per clock, or a single FP64 operation per clock. That’s not blazing fast FP64 support, mind you, but it is there if needed for software compatibility. Since the FP and INT/EM paths are now separate, an engine can also co-issue instructions and use both in parallel. Most importantly, this uses the same pipeline as discrete Arc GPUs to allow for faster software development and better overall compatibility and stability.
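For a sense of scale, the per-clock rates can be multiplied out across the whole GPU. The Python sketch below does that arithmetic; it reads the quoted rates as per-engine figures (an interpretation), and the 2.0 GHz clock in the usage example is a hypothetical placeholder, not a number Intel has given here.

```python
# Per-clock rates for ONE vector engine, as quoted in the article.
OPS_PER_CLOCK = {
    "fp32": 16,
    "fp16": 32,
    "int8": 64,
    "extended_math": 2,
    "fp64": 1,
}

VECTOR_ENGINES = 128  # 8 Xe-cores x 16 vector engines


def gpu_ops_per_clock(kind: str) -> int:
    """Operations per clock across all vector engines for one data type."""
    return OPS_PER_CLOCK[kind] * VECTOR_ENGINES


def peak_tera_ops(kind: str, clock_ghz: float) -> float:
    """Theoretical peak in tera-operations per second at a given clock.

    clock_ghz is supplied by the caller; any value used here is hypothetical.
    """
    return gpu_ops_per_clock(kind) * clock_ghz / 1000.0


if __name__ == "__main__":
    for kind in OPS_PER_CLOCK:
        print(kind, gpu_ops_per_clock(kind), "ops/clock")
    # Hypothetical 2.0 GHz clock, chosen only to make the arithmetic concrete.
    print("FP32 peak at 2.0 GHz:", peak_tera_ops("fp32", 2.0), "TFLOPS")
```

These are theoretical ceilings only; real workloads are bounded by memory bandwidth, power limits, and how often instructions can actually be co-issued.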

The Intel Graphics Software Stack

A large part of the Intel Graphics Team’s focus has been on reducing energy consumption, mainly by cutting unnecessary CPU overhead. For example, the latest DX9 driver optimizations have reduced API overhead from 326 millijoules (mJ) per frame to 226 mJ per frame, a reduction of nearly 31%. Intel also positions its XeSS upscaling feature as a kind of “power saving” mode: in the example shown, energy usage drops by about 39%, from 863 mJ per frame with native rendering to 526 mJ per frame.
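The percentages above are easy to check from the per-frame energy figures. A minimal Python sketch follows; the helper name `percent_reduction` is ours.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Energy per frame in millijoules, as quoted in the article.
    dx9 = percent_reduction(326, 226)
    xess = percent_reduction(863, 526)
    print(f"DX9 driver overhead: {dx9:.1f}% lower")
    print(f"XeSS vs native: {xess:.1f}% lower")
```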

XeSS operates the same way on Meteor Lake as it does elsewhere, but it is worth reviewing. It renders out low-resolution frames with raster, lighting, and post-processing, then applies motion vectors to synthesize a high-resolution image. It also feeds back details from frame history to improve future super sampling, and finally applies additional post-processing to refine the image displayed on-screen.

In practice, this significantly shortens the rendering phase of the pipeline. The anti-aliasing phase becomes AA plus upscaling, which takes a little longer than anti-aliasing alone, but the added cost is far smaller than the time saved by rendering at a lower resolution, and the final post-processing effects are left effectively unaltered.
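A back-of-the-envelope model makes the trade-off concrete. The Python sketch below assumes render time scales linearly with pixel count, which is a simplification; the 2.0x per-axis scale factor and all millisecond figures are hypothetical values chosen for illustration, not Intel's numbers.

```python
def render_resolution(out_w: int, out_h: int, scale: float) -> tuple[int, int]:
    """Internal render resolution for a given per-axis upscale factor."""
    return round(out_w / scale), round(out_h / scale)


def upscaled_frame_ms(native_render_ms: float, scale: float,
                      aa_upscale_ms: float, post_ms: float) -> float:
    """Estimated frame time with upscaling.

    Assumption: render cost is proportional to pixel count, so it shrinks by
    scale squared. The AA+upscale and post-processing costs are added as-is.
    """
    return native_render_ms / (scale * scale) + aa_upscale_ms + post_ms


if __name__ == "__main__":
    # Hypothetical numbers: 20 ms native render, 1 ms AA, 2 ms post-processing.
    native_ms = 20.0 + 1.0 + 2.0
    # With upscaling, the AA phase grows to a hypothetical 2 ms (AA + upscale).
    upscaled_ms = upscaled_frame_ms(20.0, 2.0, aa_upscale_ms=2.0, post_ms=2.0)
    print("render resolution:", render_resolution(1920, 1080, 2.0))
    print("native frame time (ms):", native_ms)
    print("upscaled frame time (ms):", upscaled_ms)
```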

The Intel Arc Control panel has also introduced a feature called Endurance Gaming. Users can toggle it on or off, and there are three presets (Relaxed, Balanced, and MaxBattery) which target 60FPS, 45FPS, and 30FPS, respectively.
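Each frame rate target translates into a per-frame time budget, which is what leaves room for the hardware to idle. The Python sketch below converts the presets; the dictionary is our own representation of the article's figures, not an Arc Control API.

```python
# Preset names and FPS targets as listed in the article.
ENDURANCE_PRESETS = {
    "Relaxed": 60,
    "Balanced": 45,
    "MaxBattery": 30,
}


def frame_budget_ms(preset: str) -> float:
    """Time available per frame, in milliseconds, at the preset's FPS target."""
    return 1000.0 / ENDURANCE_PRESETS[preset]


if __name__ == "__main__":
    for name, fps in ENDURANCE_PRESETS.items():
        print(f"{name}: {fps} FPS -> {frame_budget_ms(name):.1f} ms per frame")
```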

Endurance Gaming reduces total package power consumption. The example shown depicts Rocket League playing at a total SoC power draw of just 10 watts, of which only about 1 watt is being consumed by the GPU. Sure, the framerate won’t be unconstrained, but it gives gamers the option to choose battery life when necessary.
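To put those wattages in perspective, the Python sketch below works out the GPU's share of SoC power and a naive runtime estimate. The 50 Wh battery is a hypothetical capacity, and the estimate ignores the display, memory, storage, and everything else outside the SoC, so treat it as an optimistic upper bound.

```python
def gpu_share_percent(soc_watts: float, gpu_watts: float) -> float:
    """GPU power as a percentage of total SoC power."""
    return gpu_watts / soc_watts * 100.0


def runtime_hours(battery_wh: float, draw_watts: float) -> float:
    """Naive runtime: battery capacity divided by a constant power draw."""
    return battery_wh / draw_watts


if __name__ == "__main__":
    # 10 W SoC and roughly 1 W GPU come from the article's Rocket League example.
    print("GPU share of SoC power (%):", gpu_share_percent(10.0, 1.0))
    # Hypothetical 50 Wh battery; counts SoC power only.
    print("SoC-only runtime (hours):", runtime_hours(50.0, 10.0))
```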
