Product

Vision
Images and video. Straight to the model.

A complete image and video runtime for machine learning. Load, decode, transform, and normalise through one GPU-first media runtime. Every frame arrives as a tensor, ready for inference or training.

What it covers

From raw media to model-ready tensors.

Image Processing

Comprehensive image transformations for machine learning. Resize, normalize, crop, and more — all optimized for GPU workloads.

Video Decode

Vulkan Video decoding for H.264, H.265, AV1, and VP9 when the device exposes the required profiles and formats.

ML Bridge

Seamless integration with machine learning pipelines. Video frames become model-ready tensors in a single operation.

Cross-Vendor

Runs on GPUs from all major vendors. Automatic adaptation to available device capabilities.

Capabilities

Everything between your media and your model.

GPU Image Transforms

Resize, normalize, crop, flip, rotate, and blur — each available as a standalone call or chained in a pipeline. All operations run on the accelerator.

JPEG & PNG Ingest

Load images from files or memory buffers and receive device tensors in one call. Optional preprocessing runs as part of the same call when requested. Batch loading is available.

Hardware Video Decode

H.264, H.265, AV1, and VP9 streams decode through Vulkan Video when the device supports the required codec path.

Frame-to-Tensor Conversion

A single call converts a decoded frame to a tensor, with optional normalisation. The conversion path adapts automatically to device capabilities.

Capability Queries

Query supported codecs, maximum resolutions, and available features before opening a session. Know what your device can do.
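A query-before-open flow could be sketched as follows. This is a minimal illustration, not the Vision API: `Codec`, `CodecCaps`, `DeviceCaps`, `FindCodec`, and `CanDecode` are hypothetical names. The sketch assumes a capability report is a list of codecs with maximum dimensions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical types for illustration only; not the Vision API.
enum class Codec { H264, H265, AV1, VP9 };

struct CodecCaps {
  Codec codec;
  std::uint32_t maxWidth;
  std::uint32_t maxHeight;
};

struct DeviceCaps {
  std::vector<CodecCaps> decode;  // one entry per codec the device can decode
};

// Returns the matching capability entry, or nullopt if the codec is not exposed.
std::optional<CodecCaps> FindCodec(const DeviceCaps& caps, Codec codec) {
  for (const CodecCaps& c : caps.decode) {
    if (c.codec == codec) return c;
  }
  return std::nullopt;
}

// True when the device reports the codec and the stream fits its maximum resolution.
bool CanDecode(const DeviceCaps& caps, Codec codec,
               std::uint32_t width, std::uint32_t height) {
  assert(width > 0 && height > 0);
  const std::optional<CodecCaps> c = FindCodec(caps, codec);
  return c.has_value() && width <= c->maxWidth && height <= c->maxHeight;
}
```

Checking before opening a session lets an application choose another stream or a software path up front instead of handling a failed open.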

Preview Output

The same decoded frame can feed a model and display output simultaneously. Multiple display modes available without restart.

API

Clean surface. No ceremony.

Each operation is a direct call. No pipeline objects to configure before you can resize an image. No session lifecycle for a normalisation pass. Stateful resources — decoders, encoders — exist where they earn their keep.

Image_ingest

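// engine, path, and imageNetParams are assumed to be defined elsewhere.
// Load the file, then convert the decoded planes to a matrix.
auto planes = OaImagePlanes::LoadFile(engine, path).Unwrap();
auto image = planes.ToMatrix();

// Resize to 224 x 224, then apply ImageNet normalisation.
auto resized = OaFnImage::Resize(image, 224, 224);
auto input = OaFnImage::Normalize(resized, imageNetParams);
// input keeps the selected OaMatrix precision route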

IMAGE

Load and preprocess an image

A few direct calls take you from file path to normalised GPU tensor. Resize and ImageNet normalisation happen on the device.
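For reference, the arithmetic behind ImageNet normalisation can be sketched on the CPU. This is not the runtime's device code. The function name `NormalizeToChw`, the interleaved 8-bit RGB input, and the planar CHW output layout are assumptions for illustration. The mean and standard deviation values are the commonly used ImageNet statistics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Commonly used ImageNet per-channel statistics (RGB order, values scaled to [0, 1]).
constexpr std::array<float, 3> kMean = {0.485f, 0.456f, 0.406f};
constexpr std::array<float, 3> kStd = {0.229f, 0.224f, 0.225f};

// Converts interleaved 8-bit RGB (HWC) to planar float (CHW), applying
// out = (pixel / 255 - mean[c]) / std[c] per channel.
std::vector<float> NormalizeToChw(const std::vector<std::uint8_t>& rgb,
                                  std::size_t width, std::size_t height) {
  assert(rgb.size() == width * height * 3);
  std::vector<float> out(rgb.size());
  const std::size_t plane = width * height;  // elements per channel plane
  for (std::size_t i = 0; i < plane; ++i) {
    for (std::size_t c = 0; c < 3; ++c) {
      const float scaled = static_cast<float>(rgb[i * 3 + c]) / 255.0f;
      out[c * plane + i] = (scaled - kMean[c]) / kStd[c];
    }
  }
  return out;
}
```

A CPU reference like this is mainly useful for checking device output on a handful of pixels.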

Video_decode

auto video = OaVideo::Open(engine, {
  .Uri = "clip.mp4",
  .Loop = false,
}).Unwrap();

auto input = video.CurrentFrameToMatrix(true).Unwrap();
// decoded frame → normalized OaMatrix

VIDEO

Decode video to ML tensor

Open a video by URI, then convert the current decoded frame to a normalised tensor in a single call.

Media Support

Standard formats. Hardware decode where available.

The Vision runtime queries the device at startup and selects a path for each format (hardware decode engine, hardware colour conversion, or shader-based fallback) with no configuration on your end.

H.264 / AVC

Hardware decode

H.265 / HEVC

Hardware decode

AV1

Hardware decode

JPEG / PNG

CPU decode, GPU upload

Zero-copy decode path

Decoded frames stay on the device from the moment they leave the video engine. The frame transitions directly to shader-readable layout — no read-back, no re-upload.

Hardware colour conversion

YCbCr-to-RGB conversion uses the GPU sampler hardware when available. The compute fallback activates automatically on devices where the hardware path is absent or restricted.
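Whichever path runs, the conversion computes the same per-sample arithmetic. The sketch below is a CPU reference, not the runtime's shader. It assumes 8-bit limited-range BT.709 input, whereas a real stream signals its own matrix and range. `YCbCrToRgb` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// BT.709 luma coefficients; Kg follows from Kr + Kg + Kb = 1.
constexpr float kKr = 0.2126f;
constexpr float kKb = 0.0722f;
constexpr float kKg = 1.0f - kKr - kKb;

// Converts one 8-bit limited-range BT.709 YCbCr sample to RGB in [0, 1].
std::array<float, 3> YCbCrToRgb(int y, int cb, int cr) {
  assert(y >= 0 && y <= 255 && cb >= 0 && cb <= 255 && cr >= 0 && cr <= 255);

  // Limited range: luma spans 16..235, chroma spans 16..240 centred on 128.
  const float yn = (static_cast<float>(y) - 16.0f) / 219.0f;
  const float cbn = (static_cast<float>(cb) - 128.0f) / 224.0f;
  const float crn = (static_cast<float>(cr) - 128.0f) / 224.0f;

  // Invert the luma/chroma definition: recover R and B first, then G from luma.
  const float r = yn + 2.0f * (1.0f - kKr) * crn;
  const float b = yn + 2.0f * (1.0f - kKb) * cbn;
  const float g = (yn - kKr * r - kKb * b) / kKg;

  return {std::clamp(r, 0.0f, 1.0f), std::clamp(g, 0.0f, 1.0f),
          std::clamp(b, 0.0f, 1.0f)};
}
```

Video black (16, 128, 128) maps to RGB zero and video white (235, 128, 128) maps to RGB one, which makes a quick sanity check for either path.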

Simultaneous ML and preview

One decoded frame can produce both a normalised ML input tensor and an RGBA preview frame in the same dispatch sequence. No second decode, no duplicate memory.
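The idea of one frame feeding two outputs can be sketched on the CPU as a single pass that reads each pixel once and writes both results. This is only an illustration under assumed layouts (interleaved RGB8 in, planar CHW float tensor and interleaved RGBA8 preview out). `FrameOutputs` and `SplitFrame` are hypothetical names, not the Vision API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameOutputs {
  std::vector<float> tensor;          // planar CHW, normalised
  std::vector<std::uint8_t> preview;  // interleaved RGBA8, opaque alpha
};

// Single pass over an interleaved RGB8 frame: each source pixel is read once
// and written to both the ML tensor and the preview buffer.
FrameOutputs SplitFrame(const std::vector<std::uint8_t>& rgb, std::size_t pixels,
                        const std::array<float, 3>& mean,
                        const std::array<float, 3>& stddev) {
  assert(rgb.size() == pixels * 3);
  FrameOutputs out;
  out.tensor.resize(pixels * 3);
  out.preview.resize(pixels * 4);
  for (std::size_t i = 0; i < pixels; ++i) {
    for (std::size_t c = 0; c < 3; ++c) {
      const std::uint8_t v = rgb[i * 3 + c];
      out.tensor[c * pixels + i] = (static_cast<float>(v) / 255.0f - mean[c]) / stddev[c];
      out.preview[i * 4 + c] = v;
    }
    out.preview[i * 4 + 3] = 255;  // fully opaque
  }
  return out;
}
```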

Built on the same stack