Realm

Vision
Images and video. Straight to the model.

From raw media to model-ready tensors.

Image Processing
01

Resize, normalize, crop, flip, rotate, and blur — all dispatched in a single pipeline. Batch preprocessing for training datasets runs without leaving the accelerator.

Video Decode
02

Hardware-accelerated decode for H.264, H.265, and AV1. Frames land directly as ML-ready tensors. No staging copies. No intermediate formats the model never asked for.

ML Bridge
03

Every decoded frame converts to a normalised BF16 NCHW tensor in one call. The same source frame feeds model input and preview output simultaneously.

Cross-Vendor
04

Runs on NVIDIA, AMD, Intel, and Qualcomm. Hardware YCbCr conversion is used when available; a compute fallback activates automatically on devices that do not expose it.

Everything between your media and your model.

01

GPU Image Transforms

Resize with bilinear interpolation, channel normalisation, Gaussian blur, crop, flip, and rotation — each available as a standalone call or chained in a pipeline. Inputs and outputs are tensors; no CPU roundtrip required.
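For readers who want to check device results against a known baseline, bilinear resize can be sketched on the CPU as below. This is a reference sketch only: `ResizeBilinear` is a hypothetical helper, not part of the Realm API, and the half-pixel-centre sampling convention and single-channel float layout are assumptions rather than documented runtime behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for bilinear resize of one row-major float channel.
// Hypothetical helper; assumes half-pixel centres and edge clamping.
std::vector<float> ResizeBilinear(const std::vector<float>& src,
                                  int srcW, int srcH, int dstW, int dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);

    std::vector<float> dst(static_cast<std::size_t>(dstW) * dstH);
    const float scaleX = static_cast<float>(srcW) / dstW;
    const float scaleY = static_cast<float>(srcH) / dstH;

    for (int y = 0; y < dstH; ++y) {
        // Map the destination pixel centre back into source coordinates.
        const float fy = std::max(0.0f, (y + 0.5f) * scaleY - 0.5f);
        const int y0 = std::min(static_cast<int>(fy), srcH - 1);
        const int y1 = std::min(y0 + 1, srcH - 1);
        const float wy = fy - y0;

        for (int x = 0; x < dstW; ++x) {
            const float fx = std::max(0.0f, (x + 0.5f) * scaleX - 0.5f);
            const int x0 = std::min(static_cast<int>(fx), srcW - 1);
            const int x1 = std::min(x0 + 1, srcW - 1);
            const float wx = fx - x0;

            // Blend horizontally on both rows, then vertically.
            const float top = src[y0 * srcW + x0] * (1.0f - wx) + src[y0 * srcW + x1] * wx;
            const float bottom = src[y1 * srcW + x0] * (1.0f - wx) + src[y1 * srcW + x1] * wx;
            dst[y * dstW + x] = top * (1.0f - wy) + bottom * wy;
        }
    }
    return dst;
}
```

Resizing to the same dimensions returns the input unchanged, which makes a quick sanity check before comparing against device output.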

02

JPEG & PNG Ingest

Load a file path or in-memory buffer and receive a device tensor in one call. Optional resize and ImageNet normalisation happen on the way in. Batch variants process entire dataset shards in a single submission.

03

Hardware Video Decode

H.264, H.265, and AV1 streams decode through the GPU video engine. The decoded frame never touches system memory on the fast path — it transitions directly to shader-readable layout for conversion.

04

Frame-to-Tensor Conversion

A single call converts any decoded frame to a `[1, 3, H, W]` BF16 tensor with optional ImageNet normalisation. Hardware colour-space conversion is preferred; a shader-based fallback covers the rest.
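To make the target layout concrete, here is a CPU sketch of what a `[1, 3, H, W]` BF16 conversion with ImageNet normalisation computes. The function names are hypothetical and not part of the Realm API. The sketch assumes interleaved 8-bit RGB input, the commonly used ImageNet mean and standard deviation, and round-to-nearest-even BF16 conversion; the runtime's actual rounding mode is not stated in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// float32 -> raw BF16 bits, rounding to nearest even (assumed rounding mode).
inline std::uint16_t FloatToBf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: keep the sign and force a non-zero mantissa.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    const std::uint32_t roundingBias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + roundingBias) >> 16);
}

// Raw BF16 bits -> float32 (exact, BF16 is the top half of a float32).
inline float Bf16ToFloat(std::uint16_t half) {
    const std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Interleaved RGB8 (HWC) -> planar [1, 3, H, W] BF16, optionally normalised.
std::vector<std::uint16_t> RgbToNchwBf16(const std::vector<std::uint8_t>& rgb,
                                         std::size_t width, std::size_t height,
                                         bool imagenetNormalise) {
    assert(rgb.size() == width * height * 3);
    static const float kMean[3] = {0.485f, 0.456f, 0.406f};   // ImageNet mean
    static const float kStd[3] = {0.229f, 0.224f, 0.225f};    // ImageNet std

    const std::size_t plane = width * height;
    std::vector<std::uint16_t> out(3 * plane);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < 3; ++c) {
                float v = rgb[(y * width + x) * 3 + c] / 255.0f;  // scale to [0, 1]
                if (imagenetNormalise) {
                    v = (v - kMean[c]) / kStd[c];
                }
                out[c * plane + y * width + x] = FloatToBf16(v);
            }
        }
    }
    return out;
}
```

BF16 keeps the float32 exponent range and drops mantissa bits, so normalised values keep roughly two to three significant decimal digits.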

05

Capability Queries

Query which codecs the device supports, maximum resolution per codec, and whether hardware colour conversion is available — before opening a session. No surprises at decode time.
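The kind of decision a capability query enables can be sketched as follows. Every type and function here (`DeviceVideoCaps`, `CodecLimits`, `PlanDecode`) is hypothetical and stands in for whatever the real query returns; the page does not show the actual query API, so treat this as an illustration of the check, not its signature.

```cpp
#include <cstdint>
#include <map>

// Hypothetical capability model; names are illustrative only.
enum class Codec { H264, H265, AV1 };

struct CodecLimits {
    std::uint32_t maxWidth;
    std::uint32_t maxHeight;
};

struct DeviceVideoCaps {
    std::map<Codec, CodecLimits> codecs;    // a missing key means "not supported"
    bool hardwareColourConversion = false;  // hardware YCbCr conversion available
};

enum class DecodePlan { Unsupported, HardwareColour, ComputeFallback };

// Decide how a stream would be handled before any session is opened.
DecodePlan PlanDecode(const DeviceVideoCaps& caps, Codec codec,
                      std::uint32_t width, std::uint32_t height) {
    const auto it = caps.codecs.find(codec);
    if (it == caps.codecs.end()) {
        return DecodePlan::Unsupported;  // codec not exposed by the device
    }
    if (width > it->second.maxWidth || height > it->second.maxHeight) {
        return DecodePlan::Unsupported;  // stream exceeds the per-codec limit
    }
    return caps.hardwareColourConversion ? DecodePlan::HardwareColour
                                         : DecodePlan::ComputeFallback;
}
```

Checking limits up front lets an application reject or downscale a stream before it allocates decoder resources.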

06

Preview Output

The same decoded frame that feeds a model can be routed to an RGBA output for display. Aspect-fit, crop-fill, and integer-scale modes are compositor push constants — no decoder restart required.
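The three scaling modes reduce to choosing one scale factor and centring the result. The sketch below shows one plausible way to compute the destination rectangle; `PreviewFit`, `PreviewRect`, and `ComputePreviewRect` are hypothetical names, and how the runtime actually packs these values into push constants is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical names; the real compositor interface is not shown in this page.
enum class PreviewFit { AspectFit, CropFill, IntegerScale };

struct PreviewRect {
    float x, y, width, height;  // destination rectangle in output pixels
};

PreviewRect ComputePreviewRect(int srcW, int srcH, int dstW, int dstH, PreviewFit mode) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    const float sx = static_cast<float>(dstW) / srcW;
    const float sy = static_cast<float>(dstH) / srcH;

    float scale = 1.0f;
    switch (mode) {
    case PreviewFit::AspectFit:     // whole frame visible, bars on one axis
        scale = std::min(sx, sy);
        break;
    case PreviewFit::CropFill:      // output fully covered, frame overflows one axis
        scale = std::max(sx, sy);
        break;
    case PreviewFit::IntegerScale:  // largest whole-number scale that fits, at least 1x
        scale = std::max(1.0f, std::floor(std::min(sx, sy)));
        break;
    }

    const float w = srcW * scale;
    const float h = srcH * scale;
    // Centre the scaled frame; negative offsets mean the frame is cropped.
    return {(dstW - w) * 0.5f, (dstH - h) * 0.5f, w, h};
}
```

Because the mode only changes a handful of scalars, switching it is a matter of recomputing this rectangle, which is consistent with the claim that no decoder restart is needed.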

Clean surface. No ceremony.

Each operation is a direct call. No pipeline objects to configure before you can resize an image. No session lifecycle for a normalisation pass. Stateful resources — decoders, encoders — exist where they earn their keep.

Image_ingest

auto image = OaJpegDecoder::DecodeFileToGpu(rt, path);
auto resized = OaFnVision::Resize(rt, image, 224, 224);
auto input = OaFnVision::Normalize(rt, resized, imagenet);
// input → [1, 3, 224, 224] BF16 — ready for inference
01

Load and preprocess an image

Three direct calls take a file path to a normalised GPU tensor. Resize and ImageNet normalisation happen on the device.

Video_decode

auto decoder = OaVideoDecoder::Create(rt, {
    .Codec = OaVideoCodec::H264,
    .Width = 1920, .Height = 1080,
});
auto input = decoder.DecodeFrameToBf16(accessUnit);
// input → [1, 3, 1080, 1920] BF16
02

Decode video to ML tensor

Open a decoder session, then submit a compressed access unit and receive a normalised tensor from that same call.

Standard formats. Hardware decode where available.

The Vision runtime queries the device at startup and selects the optimal path per format — hardware decode engine, hardware colour conversion, or shader-based fallbacks — without any configuration on your end.

H.264 / AVC

Hardware decode

H.265 / HEVC

Hardware decode

AV1

Hardware decode

JPEG / PNG

CPU decode, GPU upload

Zero-copy decode path

Decoded frames stay on the device from the moment they leave the video engine. The frame transitions directly to shader-readable layout — no read-back, no re-upload.

Hardware colour conversion

YCbCr-to-RGB conversion uses the GPU sampler hardware when available. The compute fallback activates automatically on devices where the hardware path is absent or restricted.
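For reference, the arithmetic a compute fallback has to reproduce looks roughly like this on the CPU. The sketch assumes 8-bit BT.709 limited-range input (luma 16 to 235, chroma 16 to 240); which matrix and range the runtime selects for a given stream is not stated here, and `YCbCrToRgbBt709Limited` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb8 {
    std::uint8_t r, g, b;
};

// Clamp a normalised channel to [0, 1] and quantise to 8 bits.
inline std::uint8_t ToUnorm8(float v) {
    const float clamped = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);
}

// 8-bit limited-range YCbCr -> full-range RGB using BT.709 luma coefficients.
// Assumed matrix and range; a real stream signals these in its metadata.
Rgb8 YCbCrToRgbBt709Limited(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    const float kr = 0.2126f;
    const float kb = 0.0722f;
    const float kg = 1.0f - kr - kb;

    const float luma = (y - 16) / 219.0f;   // [16, 235] -> [0, 1]
    const float pb = (cb - 128) / 224.0f;   // [16, 240] -> [-0.5, 0.5]
    const float pr = (cr - 128) / 224.0f;

    const float r = luma + 2.0f * (1.0f - kr) * pr;
    const float b = luma + 2.0f * (1.0f - kb) * pb;
    const float g = (luma - kr * r - kb * b) / kg;  // solve the luma equation for G

    return {ToUnorm8(r), ToUnorm8(g), ToUnorm8(b)};
}
```

Deriving green from the luma equation keeps the three coefficients consistent with each other, which is easier to audit than a hand-copied 3x3 matrix.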

Simultaneous ML and preview

One decoded frame can produce both a normalised ML input tensor and an RGBA preview frame in the same dispatch sequence. No second decode, no duplicate memory.