
Vision
Images and video. Straight to the model.
A complete image and video runtime for machine learning. Load, decode, transform, and normalise — on the accelerator, without copies. Every frame arrives as a tensor, ready for inference or training.
Image Transforms: 7
Video Codecs: H.264 / H.265 / AV1
Output Format: BF16 NCHW
CPU Copies: zero on the video decode fast path
What it covers
From raw media to model-ready tensors.

Image Processing
Resize, normalise, crop, flip, rotate, and blur, each callable on its own or chained into a single pipeline. Batch preprocessing for training datasets runs without leaving the accelerator.
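To illustrate what chaining means, here is a CPU-side sketch of a two-stage pipeline (crop, then horizontal flip) over a single-channel float image. `Image`, `Transform`, `Crop`, `FlipHorizontal`, and `RunPipeline` are hypothetical names for this sketch, not part of the Vision API. The Vision runtime keeps these stages on the accelerator; the sketch only shows how stages compose.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CPU stand-in for a single-channel, row-major (H x W) image.
struct Image {
    int height = 0;
    int width = 0;
    std::vector<float> pixels;  // size == height * width

    float at(int y, int x) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// A pipeline stage: takes an image, returns a new one.
using Transform = std::function<Image(const Image&)>;

// Crop an (h x w) window whose top-left corner is (y0, x0).
Transform Crop(int y0, int x0, int h, int w) {
    return [=](const Image& in) {
        assert(y0 >= 0 && x0 >= 0 && y0 + h <= in.height && x0 + w <= in.width);
        Image out{h, w, std::vector<float>(static_cast<std::size_t>(h) * w)};
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                out.pixels[static_cast<std::size_t>(y) * w + x] = in.at(y0 + y, x0 + x);
        return out;
    };
}

// Mirror the image left to right.
Transform FlipHorizontal() {
    return [](const Image& in) {
        Image out = in;
        for (int y = 0; y < in.height; ++y)
            for (int x = 0; x < in.width; ++x)
                out.pixels[static_cast<std::size_t>(y) * in.width + x] = in.at(y, in.width - 1 - x);
        return out;
    };
}

// Apply each stage in order, feeding one stage's output into the next.
Image RunPipeline(const Image& input, const std::vector<Transform>& stages) {
    Image current = input;
    for (const Transform& stage : stages) current = stage(current);
    return current;
}
```

Cropping columns 1 and 2 of a 2×3 image `[[1, 2, 3], [4, 5, 6]]` and then flipping gives `[[3, 2], [6, 5]]`.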

Video Decode
Hardware-accelerated decode for H.264, H.265, and AV1. Frames land directly as ML-ready tensors. No staging copies. No intermediate formats the model never asked for.

ML Bridge
Every decoded frame converts to a normalised BF16 NCHW tensor in one call. The same source frame feeds model input and preview output simultaneously.

Cross-Vendor
Runs on NVIDIA, AMD, Intel, and Qualcomm. Hardware YCbCr conversion is used when available; a compute fallback activates automatically on devices that do not expose it.
Capabilities
Everything between your media and your model.
GPU Image Transforms
Resize with bilinear interpolation, channel normalisation, Gaussian blur, crop, flip, and rotation, each available as a standalone call or chained in a pipeline. Inputs and outputs are tensors; no CPU round trip is required.
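Bilinear resize samples each output pixel from the four nearest source pixels. The sketch below is a CPU reference for a single-channel image, assuming half-pixel centres and edge clamping; the page does not say which sampling convention the GPU kernel uses, and `ResizeBilinear` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bilinear resize of a single-channel, row-major image from (inH x inW) to (outH x outW).
// Assumes half-pixel centres: output pixel o maps to source coordinate (o + 0.5) * scale - 0.5,
// clamped to the image so edge pixels are repeated.
std::vector<float> ResizeBilinear(const std::vector<float>& src, int inH, int inW, int outH, int outW) {
    assert(inH > 0 && inW > 0 && outH > 0 && outW > 0);
    assert(src.size() == static_cast<std::size_t>(inH) * inW);
    std::vector<float> dst(static_cast<std::size_t>(outH) * outW);
    const float scaleY = static_cast<float>(inH) / outH;
    const float scaleX = static_cast<float>(inW) / outW;
    auto at = [&](int y, int x) { return src[static_cast<std::size_t>(y) * inW + x]; };

    for (int oy = 0; oy < outH; ++oy) {
        // Source row coordinate for this output row, and its two neighbouring rows.
        const float sy = std::clamp((oy + 0.5f) * scaleY - 0.5f, 0.0f, static_cast<float>(inH - 1));
        const int y0 = static_cast<int>(std::floor(sy));
        const int y1 = std::min(y0 + 1, inH - 1);
        const float wy = sy - y0;
        for (int ox = 0; ox < outW; ++ox) {
            const float sx = std::clamp((ox + 0.5f) * scaleX - 0.5f, 0.0f, static_cast<float>(inW - 1));
            const int x0 = static_cast<int>(std::floor(sx));
            const int x1 = std::min(x0 + 1, inW - 1);
            const float wx = sx - x0;
            // Blend horizontally on both rows, then blend the two rows vertically.
            const float top = at(y0, x0) * (1 - wx) + at(y0, x1) * wx;
            const float bottom = at(y1, x0) * (1 - wx) + at(y1, x1) * wx;
            dst[static_cast<std::size_t>(oy) * outW + ox] = top * (1 - wy) + bottom * wy;
        }
    }
    return dst;
}
```

Upscaling the row `[0, 1]` to four pixels yields `[0, 0.25, 0.75, 1]`, and resizing to the same dimensions returns the input unchanged.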
JPEG & PNG Ingest
Pass a file path or an in-memory buffer and receive a device tensor in one call. Optional resize and ImageNet normalisation happen on the way in. Batch variants process entire dataset shards in a single submission.
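ImageNet normalisation conventionally means per-channel standardisation of RGB values scaled to [0, 1], using mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225). A minimal CPU sketch, assuming interleaved 8-bit RGB input; `NormalizeImageNet` is a hypothetical helper, and the exact constants the Vision runtime applies are not stated on this page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Conventional ImageNet per-channel statistics for RGB values in [0, 1].
constexpr std::array<float, 3> kImageNetMean = {0.485f, 0.456f, 0.406f};
constexpr std::array<float, 3> kImageNetStd = {0.229f, 0.224f, 0.225f};

// Normalise interleaved 8-bit RGB pixels: out = (v / 255 - mean[c]) / std[c].
// The output stays interleaved (HWC); converting to NCHW is a separate step.
std::vector<float> NormalizeImageNet(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<float> out(rgb.size());
    for (std::size_t i = 0; i < rgb.size(); ++i) {
        const std::size_t c = i % 3;  // 0 = R, 1 = G, 2 = B
        out[i] = (rgb[i] / 255.0f - kImageNetMean[c]) / kImageNetStd[c];
    }
    return out;
}
```

A black pixel maps to roughly (-2.12, -2.04, -1.80) and a white one to roughly (2.25, 2.43, 2.64).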
Hardware Video Decode
H.264, H.265, and AV1 streams decode through the GPU video engine. On the fast path the decoded frame never touches system memory; it transitions directly to a shader-readable layout for conversion.
Frame-to-Tensor Conversion
A single call converts any decoded frame to a `[1, 3, H, W]` BF16 tensor with optional ImageNet normalisation. Hardware colour-space conversion is preferred; a shader-based fallback covers the rest.
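Two steps sit behind a `[1, 3, H, W]` BF16 output: repacking interleaved pixels into per-channel planes (NCHW), and rounding each float to bfloat16, which keeps the upper 16 bits of an IEEE float32. The sketch assumes float input already in HWC order and uses round-to-nearest-even; `FloatToBf16` and `HwcToNchwBf16` are hypothetical names, and the runtime's own rounding mode is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Convert a float32 to bfloat16 bits using round-to-nearest-even.
std::uint16_t FloatToBf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    // NaN: truncate and force the quiet bit so the result stays a NaN.
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0) {
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF, plus 1 if the kept LSB is odd, so exact ties round to even.
    const std::uint32_t roundingBias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + roundingBias) >> 16);
}

// Repack an interleaved H x W x 3 float image into a [1, 3, H, W] (NCHW) bf16 buffer.
std::vector<std::uint16_t> HwcToNchwBf16(const std::vector<float>& hwc, int height, int width) {
    assert(height > 0 && width > 0);
    assert(hwc.size() == static_cast<std::size_t>(height) * width * 3);
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<std::uint16_t> nchw(plane * 3);
    for (std::size_t p = 0; p < plane; ++p)        // p = y * width + x
        for (std::size_t c = 0; c < 3; ++c)        // channel c goes to plane c
            nchw[c * plane + p] = FloatToBf16(hwc[p * 3 + c]);
    return nchw;
}
```

For example, `FloatToBf16(1.0f)` returns `0x3F80`. A value exactly halfway between two bfloat16 neighbours rounds to the one whose last bit is even.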
Capability Queries
Query which codecs the device supports, the maximum resolution for each codec, and whether hardware colour conversion is available, all before opening a session. No surprises at decode time.
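A pre-flight check of this kind might look like the sketch below. `Codec`, `CodecLimits`, `DeviceCaps`, `StreamInfo`, and `CheckStream` are hypothetical stand-ins, since this page does not show the real query API; the idea is simply that a stream is rejected up front when its codec is missing or its resolution exceeds the per-codec limit.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical capability report shapes (not the Vision API).
enum class Codec { H264, H265, AV1 };

struct CodecLimits {
    int maxWidth = 0;
    int maxHeight = 0;
};

struct DeviceCaps {
    std::map<Codec, CodecLimits> decode;  // codecs the device can decode, with size limits
    bool hardwareYcbcr = false;           // whether sampler-based colour conversion exists
};

struct StreamInfo {
    Codec codec;
    int width;
    int height;
};

// Returns an empty string if the stream can be opened, otherwise the reason it cannot.
std::string CheckStream(const DeviceCaps& caps, const StreamInfo& s) {
    assert(s.width > 0 && s.height > 0);
    auto it = caps.decode.find(s.codec);
    if (it == caps.decode.end()) return "codec not supported by this device";
    if (s.width > it->second.maxWidth || s.height > it->second.maxHeight)
        return "resolution exceeds the device limit for this codec";
    return {};
}
```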
Preview Output
The same decoded frame that feeds a model can also be routed to an RGBA output for display. Aspect-fit, crop-fill, and integer-scale modes are set through compositor push constants, so switching modes requires no decoder restart.
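Each preview mode reduces to a scale factor and a centred destination rectangle. A sketch under the assumption that the compositor centres the frame; `PreviewMode`, `Rect`, and `PlaceFrame` are hypothetical names, not the compositor's real push-constant layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical mode names mirroring the three modes described above.
enum class PreviewMode { AspectFit, CropFill, IntegerScale };

// Destination rectangle in output pixels; for CropFill it can extend past the output edges.
struct Rect {
    int x, y, width, height;
};

// Place a srcW x srcH frame, centred, inside a dstW x dstH output.
Rect PlaceFrame(PreviewMode mode, int srcW, int srcH, int dstW, int dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    const double sx = static_cast<double>(dstW) / srcW;
    const double sy = static_cast<double>(dstH) / srcH;
    double scale = 1.0;
    switch (mode) {
        case PreviewMode::AspectFit:  // whole frame visible, letterboxed or pillarboxed
            scale = std::min(sx, sy);
            break;
        case PreviewMode::CropFill:   // output fully covered, overflow cropped
            scale = std::max(sx, sy);
            break;
        case PreviewMode::IntegerScale:  // largest whole-number factor that fits, never below 1
            scale = std::max(1.0, std::floor(std::min(sx, sy)));
            break;
    }
    const int w = static_cast<int>(srcW * scale + 0.5);
    const int h = static_cast<int>(srcH * scale + 0.5);
    return {(dstW - w) / 2, (dstH - h) / 2, w, h};
}
```

A 1920×1080 frame aspect-fit into a 1280×1024 output lands at (0, 152) with size 1280×720. Crop-fill into 1080×1080 gives a 1920×1080 rectangle at x = -420, so the sides are cropped.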
API
Clean surface. No ceremony.
Each operation is a direct call. No pipeline objects to configure before you can resize an image. No session lifecycle for a normalisation pass. Stateful resources — decoders, encoders — exist where they earn their keep.
Image_ingest
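```cpp
// Assumes `rt` (the runtime), `path`, and `imagenet` (ImageNet mean/std
// normalisation settings) are defined elsewhere.
auto image   = OaJpegDecoder::DecodeFileToGpu(rt, path);      // decode on the CPU, upload to the device
auto resized = OaFnVision::Resize(rt, image, 224, 224);       // bilinear resize on the device
auto input   = OaFnVision::Normalize(rt, resized, imagenet);  // per-channel normalisation on the device
// input → [1, 3, 224, 224] BF16, ready for inference
```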
Load and preprocess an image
Three calls from file path to normalised GPU tensor. The image is decoded on the CPU and uploaded; resize and ImageNet normalisation run on the device.
Video_decode
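```cpp
// Assumes `rt` (the runtime) and `accessUnit` (one compressed H.264 access
// unit from your demuxer) are defined elsewhere.
auto decoder = OaVideoDecoder::Create(rt, {
    .Codec  = OaVideoCodec::H264,
    .Width  = 1920,
    .Height = 1080,
});
auto input = decoder.DecodeFrameToBf16(accessUnit);  // decode and convert in one call
// input → [1, 3, 1080, 1920] BF16 (NCHW: height before width)
```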
Decode video to ML tensor
Open a decoder session, submit a compressed access unit, and receive a normalised tensor from the same call.
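The page does not say where access units come from. One common source is an H.264 or H.265 Annex B elementary stream, where NAL units are separated by `00 00 01` or `00 00 00 01` start codes. The sketch below splits such a stream into NAL units; grouping NAL units into access units (for example at access unit delimiters) is left out, and `SplitAnnexB` is a hypothetical helper, not part of the Vision API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an Annex B byte stream into NAL unit payloads with start codes removed.
// Emulation-prevention bytes guarantee 00 00 01 never appears inside a payload,
// so every 00 00 01 marks a boundary. Zero bytes just before a start code
// (the 4-byte form or trailing zeros) are stripped from the previous payload.
std::vector<std::vector<std::uint8_t>> SplitAnnexB(const std::vector<std::uint8_t>& stream) {
    std::vector<std::vector<std::uint8_t>> nalUnits;
    const std::size_t n = stream.size();
    std::size_t i = 0;
    std::size_t start = n;  // n means "no NAL unit open yet"
    while (i + 2 < n) {
        if (stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1) {
            if (start != n) {
                std::size_t end = i;
                while (end > start && stream[end - 1] == 0) --end;
                nalUnits.emplace_back(stream.begin() + start, stream.begin() + end);
            }
            i += 3;
            start = i;  // payload begins right after the start code
        } else {
            ++i;
        }
    }
    if (start != n) nalUnits.emplace_back(stream.begin() + start, stream.end());
    return nalUnits;
}
```

Each returned payload begins with the NAL header byte (for example `0x67` for an H.264 SPS), which is what a demuxer inspects when grouping NAL units into access units.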
Media Support
Standard formats. Hardware decode where available.
The Vision runtime queries the device at startup and selects the best available path for each format (hardware decode engine, hardware colour conversion, or shader-based fallback) without any configuration on your end.
| Format | Decode path |
|---|---|
| H.264 / AVC | Hardware decode |
| H.265 / HEVC | Hardware decode |
| AV1 | Hardware decode |
| JPEG / PNG | CPU decode, GPU upload |
Zero-copy decode path
Decoded frames stay on the device from the moment they leave the video engine. Each frame transitions directly to a shader-readable layout, with no read-back and no re-upload.
Hardware colour conversion
YCbCr-to-RGB conversion uses the GPU sampler hardware when available. The compute fallback activates automatically on devices where the hardware path is absent or restricted.
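For reference, a compute fallback typically evaluates the standard YCbCr-to-RGB matrix per pixel. A sketch for one 8-bit limited-range sample, assuming BT.709 coefficients (Kr = 0.2126, Kb = 0.0722); a real stream's matrix and range come from its colour metadata, and `YcbcrToRgbBt709` is a hypothetical name, not the runtime's shader.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Convert one 8-bit limited-range ("video range") YCbCr sample to RGB in [0, 1].
// Assumes BT.709; BT.601 or full-range streams need different constants.
std::array<float, 3> YcbcrToRgbBt709(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    constexpr float kr = 0.2126f, kb = 0.0722f, kg = 1.0f - kr - kb;
    // Expand limited range: luma spans 16..235, chroma spans 16..240 centred on 128.
    const float yn = (y - 16) / 219.0f;
    const float cbn = (cb - 128) / 224.0f;
    const float crn = (cr - 128) / 224.0f;
    const float r = yn + 2.0f * (1.0f - kr) * crn;
    const float b = yn + 2.0f * (1.0f - kb) * cbn;
    const float g = yn - (2.0f * kb * (1.0f - kb) / kg) * cbn
                       - (2.0f * kr * (1.0f - kr) / kg) * crn;
    // Out-of-gamut results from rounding in the source are clamped.
    auto clamp01 = [](float v) { return std::clamp(v, 0.0f, 1.0f); };
    assert(kg > 0.0f);
    return {clamp01(r), clamp01(g), clamp01(b)};
}
```

`YcbcrToRgbBt709(16, 128, 128)` gives black and `YcbcrToRgbBt709(235, 128, 128)` gives white.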
Simultaneous ML and preview
One decoded frame can produce both a normalised ML input tensor and an RGBA preview frame in the same dispatch sequence. No second decode, no duplicate memory.
Built on the same stack

