Machine Learning

Train, evaluate, and deploy.
One Vulkan runtime.

A GPU-first C++ and Python stack for automatic differentiation, reusable neural modules, model artifacts, and measured end-to-end workloads.

Execution model

From input representation to a reloadable model artifact.

The same matrix type moves through preprocessing, modules, loss functions, gradients, optimizers, validation, and application inference. Boundaries are visible and testable.

DATA

Tokenization and input

Byte, character, and deterministic BPE recipes share one controlled corpus and evaluation contract.
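
One way to read "deterministic" for BPE is that equally frequent pairs are always resolved by the same fixed rule, so the same corpus yields the same merge table on every run. The sketch below assumes that reading and breaks ties toward the smallest pair. `BytesToTokens` and `LearnMerges` are hypothetical names for illustration, not the OA tokenizer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names for illustration; this is not the OA tokenizer API.
using TokenId = std::uint32_t;
using TokenPair = std::pair<TokenId, TokenId>;

// Byte-level starting point: each byte becomes one token id in [0, 255].
std::vector<TokenId> BytesToTokens(const std::string& text) {
    std::vector<TokenId> tokens;
    tokens.reserve(text.size());
    for (unsigned char byte : text) tokens.push_back(byte);
    return tokens;
}

// Learn up to `maxMerges` merges from one token sequence.
// std::map visits pairs in ascending order and only a strictly larger count
// replaces the current best, so ties resolve to the smallest pair every time.
std::vector<TokenPair> LearnMerges(std::vector<TokenId> tokens, std::size_t maxMerges) {
    std::vector<TokenPair> merges;
    TokenId nextId = 256;  // first id after the raw byte range
    while (merges.size() < maxMerges) {
        std::map<TokenPair, std::size_t> counts;
        for (std::size_t i = 0; i + 1 < tokens.size(); ++i)
            ++counts[TokenPair(tokens[i], tokens[i + 1])];

        TokenPair best;
        std::size_t bestCount = 0;
        for (const auto& entry : counts)
            if (entry.second > bestCount) { best = entry.first; bestCount = entry.second; }
        if (bestCount < 2) break;  // no pair repeats, so nothing is worth merging

        // Replace each left-to-right occurrence of the best pair with the new id.
        std::vector<TokenId> merged;
        for (std::size_t i = 0; i < tokens.size(); ++i) {
            const bool match = i + 1 < tokens.size() &&
                               tokens[i] == best.first && tokens[i + 1] == best.second;
            merged.push_back(match ? nextId : tokens[i]);
            if (match) ++i;  // skip the second half of the merged pair
        }
        assert(merged.size() < tokens.size());
        tokens = std::move(merged);
        merges.push_back(best);
        ++nextId;
    }
    return merges;
}
```

With no merges the same code is a plain byte tokenizer, which is why byte and BPE recipes can share one corpus and one evaluation path.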

MODEL

Composable modules

Recurrent, attention, Transformer, sparse MoE, VQ, and experimental SSM components build on OaMatrix operators.

TRAIN

Autograd and validation

Gradient recording, optimizers, validation metrics, checkpoint selection, and GPU timing use one training surface.
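
For readers who want the optimizer arithmetic spelled out, here is a minimal CPU sketch of one AdamW update with decoupled weight decay. It is not the OA implementation: `AdamWConfig`, `AdamWState`, and `AdamWStep` are hypothetical names, and the default hyperparameters are common choices rather than values taken from OA.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; defaults are common choices, not OA's.
struct AdamWConfig {
    float learningRate = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float epsilon = 1e-8f;
    float weightDecay = 1e-2f;
};

struct AdamWState {
    std::vector<float> m;  // running mean of gradients
    std::vector<float> v;  // running mean of squared gradients
    long step = 0;
};

// Apply one AdamW update in place.
void AdamWStep(std::vector<float>& params, const std::vector<float>& grads,
               AdamWState& state, const AdamWConfig& config) {
    assert(params.size() == grads.size());
    if (state.m.size() != params.size()) {  // first call: allocate the moments
        state.m.assign(params.size(), 0.0f);
        state.v.assign(params.size(), 0.0f);
        state.step = 0;
    }
    ++state.step;
    const float t = static_cast<float>(state.step);
    const float correction1 = 1.0f - std::pow(config.beta1, t);
    const float correction2 = 1.0f - std::pow(config.beta2, t);

    for (std::size_t i = 0; i < params.size(); ++i) {
        // Decoupled weight decay: shrink the weight directly, not via the gradient.
        params[i] -= config.learningRate * config.weightDecay * params[i];
        state.m[i] = config.beta1 * state.m[i] + (1.0f - config.beta1) * grads[i];
        state.v[i] = config.beta2 * state.v[i] + (1.0f - config.beta2) * grads[i] * grads[i];
        const float mHat = state.m[i] / correction1;  // bias-corrected first moment
        const float vHat = state.v[i] / correction2;  // bias-corrected second moment
        params[i] -= config.learningRate * mHat / (std::sqrt(vHat) + config.epsilon);
    }
}
```

On the first step the bias correction makes the update magnitude close to the learning rate regardless of gradient scale, which is a quick sanity check for any implementation.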

DEPLOY

Portable artifacts

.oam artifacts carry architecture identity, weights, optimizer state, and the metadata required for a verified reload.
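
The .oam layout is not described here, so the following is only a sketch of what a "verified reload" can involve: the loader rejects a file whose checksum or architecture name does not match before it touches the weights. The `DEMO` magic, the field order, and the FNV-1a checksum are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout for illustration only; this is not the .oam format.
// [magic "DEMO"][u64 name length][name][u64 weight count][float32 weights][u64 FNV-1a]
struct Artifact {
    std::string architecture;    // architecture identity checked on reload
    std::vector<float> weights;  // flattened parameters
};
using Bytes = std::vector<std::uint8_t>;

// 64-bit FNV-1a over the first `count` bytes.
std::uint64_t Fnv1a(const Bytes& bytes, std::size_t count) {
    std::uint64_t hash = 14695981039346656037ULL;
    for (std::size_t i = 0; i < count; ++i) {
        hash ^= bytes[i];
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Little-endian 64-bit integer helpers.
void AppendU64(Bytes& out, std::uint64_t value) {
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

std::uint64_t ReadU64(const Bytes& in, std::size_t pos) {
    assert(pos + 8 <= in.size());
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) value |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    return value;
}

Bytes Save(const Artifact& artifact) {
    Bytes out = {'D', 'E', 'M', 'O'};
    AppendU64(out, artifact.architecture.size());
    out.insert(out.end(), artifact.architecture.begin(), artifact.architecture.end());
    AppendU64(out, artifact.weights.size());
    const std::size_t weightBytes = artifact.weights.size() * sizeof(float);
    const std::size_t offset = out.size();
    out.resize(offset + weightBytes);
    if (weightBytes > 0)  // weights are stored in host byte order in this sketch
        std::memcpy(out.data() + offset, artifact.weights.data(), weightBytes);
    AppendU64(out, Fnv1a(out, out.size()));  // checksum covers everything before it
    return out;
}

// Returns false, leaving `result` untouched, unless every check passes.
bool Load(const Bytes& bytes, const std::string& expectedArchitecture, Artifact& result) {
    if (bytes.size() < 4 + 8 + 8 + 8) return false;
    const std::size_t payload = bytes.size() - 8;
    if (ReadU64(bytes, payload) != Fnv1a(bytes, payload)) return false;  // corrupted
    if (std::memcmp(bytes.data(), "DEMO", 4) != 0) return false;
    std::size_t pos = 4;
    const std::uint64_t nameLength = ReadU64(bytes, pos);
    pos += 8;
    if (nameLength > payload - pos - 8) return false;
    const std::string name(reinterpret_cast<const char*>(bytes.data() + pos), nameLength);
    if (name != expectedArchitecture) return false;  // wrong model for this file
    pos += nameLength;
    const std::uint64_t count = ReadU64(bytes, pos);
    pos += 8;
    const std::size_t remaining = payload - pos;
    if (remaining % sizeof(float) != 0 || count != remaining / sizeof(float)) return false;
    result.architecture = name;
    result.weights.resize(count);
    if (count > 0) std::memcpy(result.weights.data(), bytes.data() + pos, remaining);
    return true;
}
```

A production format would also need a version field, an explicit byte order, and room for optimizer state, all of which this sketch leaves out.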

Native training

A small public surface over a complete training graph.

Modules register parameters. The tape records backward rules. The training iterator owns optimizer stepping, synchronized metrics, validation, and checkpoint policy.

Train_step.cpp

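// Collect every registered parameter and hand the set to the optimizer.
auto params = model->AllParameterPtrs();
auto optimizer = OaMakeUniquePtr<OaAdamW>(params, learningRate);

// One step: clear old gradients, record the forward pass on the tape,
// then backpropagate from the loss.
optimizer->ZeroGrad();
OaGradientTape tape;
auto logits = model->Forward(batchX);
auto loss = OaFnLoss::CrossEntropy(logits, batchY);
tape.Backward(loss);
// The training iterator owns optimizer stepping, metrics, validation, and checkpoints.
training.Loop.Next(loss);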
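
To make "the tape records backward rules" concrete, here is a minimal scalar reverse-mode tape. It is a teaching sketch, not `OaGradientTape`: `ScalarTape` and its methods are hypothetical, and it works on single doubles on the CPU instead of GPU matrices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar tape for illustration; not the OaGradientTape interface.
class ScalarTape {
public:
    ScalarTape() = default;
    ScalarTape(const ScalarTape&) = delete;  // recorded rules capture `this`
    ScalarTape& operator=(const ScalarTape&) = delete;

    // Register an input or parameter and return its slot index.
    std::size_t Leaf(double value) {
        values_.push_back(value);
        grads_.push_back(0.0);
        return values_.size() - 1;
    }

    // Each operation computes its value now and records how to push the
    // output gradient back to its inputs later.
    std::size_t Add(std::size_t a, std::size_t b) {
        const std::size_t out = Leaf(values_[a] + values_[b]);
        rules_.push_back([this, a, b, out] {
            grads_[a] += grads_[out];
            grads_[b] += grads_[out];
        });
        return out;
    }

    std::size_t Mul(std::size_t a, std::size_t b) {
        const std::size_t out = Leaf(values_[a] * values_[b]);
        rules_.push_back([this, a, b, out] {
            grads_[a] += values_[b] * grads_[out];
            grads_[b] += values_[a] * grads_[out];
        });
        return out;
    }

    // Seed d(loss)/d(loss) = 1 and replay the rules in reverse recording order.
    void Backward(std::size_t loss) {
        assert(loss < values_.size());
        std::fill(grads_.begin(), grads_.end(), 0.0);
        grads_[loss] = 1.0;
        for (auto rule = rules_.rbegin(); rule != rules_.rend(); ++rule) (*rule)();
    }

    double Value(std::size_t slot) const { return values_[slot]; }
    double Grad(std::size_t slot) const { return grads_[slot]; }

private:
    std::vector<double> values_;
    std::vector<double> grads_;
    std::vector<std::function<void()>> rules_;
};
```

For z = x * y + x with x = 3 and y = 4, `Backward(z)` leaves a gradient of 5 on x and 3 on y. The same record-then-replay structure applies when the values are matrices and the rules are GPU kernels.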

Verified workloads

Evidence before architecture claims.

The controlled NLP matrix covers five architectures across three tokenization families. Desktop and Android runs use the same corpus, dimensions, training budget, prompt, and generation length, so cross-device results are directly comparable.

Capabilities

GPU-first execution

Operators, gradients, optimizer work, and model inference are recorded through the Vulkan compute runtime. Host boundaries remain explicit for data and synchronized metrics.

C++ and Python parity

Python composes the same native modules and kernels. The controlled NLP suite verifies matching parameter counts, convergence, generation, and checkpoint behavior.

Dense and sparse models

Transformer blocks accept dense, MoE, or hybrid feed-forward routes without replacing the surrounding architecture.
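
A sparse MoE route differs from a dense one mainly in the gating step: each token is sent to a few experts instead of through one shared feed-forward layer. The sketch below shows top-k gating for a single token on the CPU. `TopKRoute` and `ExpertWeight` are hypothetical names, and OA's actual router may differ, for example in how it normalizes weights or balances load.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical types for illustration; not the OA MoE interface.
struct ExpertWeight {
    std::size_t expert;  // index of the selected expert
    float weight;        // gate weight; the selected weights sum to 1
};

// Pick the k highest-scoring experts for one token and apply a softmax
// restricted to those k logits.
std::vector<ExpertWeight> TopKRoute(const std::vector<float>& routerLogits, std::size_t k) {
    assert(k >= 1 && k <= routerLogits.size());

    std::vector<std::size_t> order(routerLogits.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // stable_sort keeps the lower expert index first when logits tie,
    // so routing does not depend on sort internals.
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return routerLogits[a] > routerLogits[b];
    });

    const float maxLogit = routerLogits[order[0]];  // subtract for numerical stability
    float sum = 0.0f;
    std::vector<ExpertWeight> picked;
    for (std::size_t i = 0; i < k; ++i) {
        const float w = std::exp(routerLogits[order[i]] - maxLogit);
        picked.push_back({order[i], w});
        sum += w;
    }
    for (ExpertWeight& p : picked) p.weight /= sum;
    return picked;
}
```

The block output is then the weighted sum of the selected experts' outputs. With k equal to the number of experts, the same routine weights every expert, so all of them run for every token.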

Measured performance

OA reports wall time, GPU time, latency percentiles, sample throughput, token or byte throughput, loss, and task-specific evaluation.
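
Latency percentiles and throughput are simple to compute but easy to define inconsistently. As a reference point, the sketch below uses the nearest-rank percentile definition. OA's exact definition is not stated here, so treat that choice and the function names as assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Takes a copy so the caller's
// sample order is preserved.
double Percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double rank = std::ceil(p / 100.0 * static_cast<double>(samples.size()));
    const std::size_t index = static_cast<std::size_t>(rank) - 1;  // rank is 1-based
    return samples[std::min(index, samples.size() - 1)];
}

// Token (or byte) throughput over a measured wall-clock interval.
double TokensPerSecond(std::size_t tokens, double wallSeconds) {
    assert(wallSeconds > 0.0);
    return static_cast<double>(tokens) / wallSeconds;
}
```

Reporting p50 next to p95 or p99 shows whether a few slow steps dominate, which a mean alone would hide.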

Capability routing

Precision and kernel routes are selected from device capabilities. FP32 remains the verified reference on the Intel Iris Xe development system.
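
Capability routing can be pictured as a pure function from device features to a kernel choice, with FP32 as the fallback. The flags and route names below are hypothetical. In a Vulkan backend they could be filled from feature queries such as `VkPhysicalDeviceShaderFloat16Int8Features::shaderFloat16`, but how OA actually does this is not shown here.

```cpp
#include <cassert>

// Hypothetical capability flags and route names; not the OA API.
struct DeviceCaps {
    bool shaderFloat16 = false;      // FP16 arithmetic in shaders
    bool storage16Bit = false;       // 16-bit values in storage buffers
    bool cooperativeMatrix = false;  // e.g. VK_KHR_cooperative_matrix
};

enum class KernelRoute { Fp32Reference, Fp16, Fp16CooperativeMatrix };

// Choose the most capable route the device supports, falling back to the
// FP32 reference whenever reduced precision is disallowed or unsupported.
KernelRoute SelectRoute(const DeviceCaps& caps, bool allowReducedPrecision) {
    const bool fp16Usable = caps.shaderFloat16 && caps.storage16Bit;
    if (!allowReducedPrecision || !fp16Usable) return KernelRoute::Fp32Reference;
    return caps.cooperativeMatrix ? KernelRoute::Fp16CooperativeMatrix : KernelRoute::Fp16;
}

// Human-readable name for logs and benchmark reports.
const char* RouteName(KernelRoute route) {
    switch (route) {
        case KernelRoute::Fp32Reference: return "fp32-reference";
        case KernelRoute::Fp16: return "fp16";
        case KernelRoute::Fp16CooperativeMatrix: return "fp16-cooperative-matrix";
    }
    assert(false && "unhandled KernelRoute");
    return "unknown";
}
```

Keeping the decision in one function makes it easy to log the chosen route beside each measurement and to force the FP32 reference when validating a new device.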

Application models

The Animation Language Model combines text conditioning, a learned motion tokenizer, a dense Transformer prior, and USD generation in one artifact.
