
AI Infrastructure
AI infrastructure.
Purpose-built.
Vendor-neutral training and inference on a single compute substrate. The same compiled kernels deploy across Intel, NVIDIA, AMD, and multi-device configurations without modification. No vendor-specific SDK. No framework dependency. One binary.
Reference params: 9.6M
Train throughput: 77K tok/s
Compute API: Vulkan
VRAM utilization: ~10%
Pipeline
End-to-end on a single compute substrate.
Training and inference execute entirely through Vulkan compute dispatches. Forward pass, gradient accumulation, and optimizer steps remain on the accelerator. The host manages data loading and orchestration. Built on the Oa compute substrate.

Byte Embedding
Fixed 256-token vocabulary eliminates external tokenization. Raw byte input accepts text, code, and binary data through a single interface. No preprocessing dependencies.
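A minimal sketch of what a fixed 256-token byte vocabulary implies, assuming token IDs are simply the byte values 0 to 255. `BytesToTokens` and `TokensToBytes` are hypothetical helpers written for illustration; they are not part of the Oa API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: each input byte becomes one token ID in [0, 255].
// No vocabulary file or tokenizer library is involved.
std::vector<std::uint8_t> BytesToTokens(const std::string& bytes) {
    std::vector<std::uint8_t> tokens;
    tokens.reserve(bytes.size());
    for (char c : bytes) {
        // Cast through unsigned char so bytes >= 0x80 keep their value.
        tokens.push_back(static_cast<std::uint8_t>(static_cast<unsigned char>(c)));
    }
    return tokens;
}

// Inverse mapping: every ID is a valid byte, so decoding cannot hit an
// unknown token.
std::string TokensToBytes(const std::vector<std::uint8_t>& tokens) {
    std::string bytes;
    bytes.reserve(tokens.size());
    for (std::uint8_t t : tokens) {
        bytes.push_back(static_cast<char>(t));
    }
    return bytes;
}
```

Under this assumption UTF-8 text, source code, and arbitrary binary all take the same path: a multi-byte character such as "é" becomes two tokens (195, 169), and a NUL byte is token 0.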

Parallel Branch States
Multiple hypothesis branches execute concurrently. Each evaluates an independent reasoning path. An internal scoring mechanism selects the optimal result before output.
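The select-the-best-branch pattern can be sketched on the CPU as follows. This is an analogy under stated assumptions, not the model's internal mechanism: `BranchResult` and `SelectBestBranch` are hypothetical names, each branch is an ordinary callable run on its own thread, and the score is whatever the branch reports.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: a candidate output plus its score.
struct BranchResult {
    std::string Output;
    double Score;
};

// Launch every branch concurrently, then keep the highest-scoring result.
// Ties resolve to the earliest branch.
BranchResult SelectBestBranch(
    const std::vector<std::function<BranchResult()>>& branches) {
    if (branches.empty()) {
        throw std::invalid_argument("SelectBestBranch: no branches");
    }
    std::vector<std::future<BranchResult>> pending;
    pending.reserve(branches.size());
    for (const auto& branch : branches) {
        pending.push_back(std::async(std::launch::async, branch));
    }
    BranchResult best = pending[0].get();
    for (std::size_t i = 1; i < pending.size(); ++i) {
        BranchResult candidate = pending[i].get();
        if (candidate.Score > best.Score) {
            best = std::move(candidate);
        }
    }
    return best;
}
```

On some toolchains `std::async` requires linking against the threading library (for example `-pthread` with GCC).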

Vendor-Neutral Compute
OaEngine dispatches training and inference as compiled Vulkan compute kernels. Identical SPIR-V executes on Intel, NVIDIA, AMD, and multi-device configurations. No vendor-specific SDK dependency.

Deploy & Resume
Self-contained .oam artifact includes model weights and compiled compute kernels. One file deploys to any Vulkan-capable device. Deterministic resume from the exact training step.
Capabilities
Engineered for mission-critical throughput.
Hardware-optimized memory operations. Topology-aware threading with work-stealing schedulers. Vendor-neutral compute kernels. Post-quantum cryptographic primitives. The complete Oa HPC substrate underpins the AI pipeline.
GPU-Native Execution
All compute-intensive operations execute on the accelerator through Vulkan dispatches. The host manages orchestration, data loading, and checkpoint I/O. No secondary GPU API. No interpreter on the critical path.
Constant-Time Inference
Fixed-size state representation compresses the full context window. Per-token compute cost remains constant regardless of sequence position. No KV cache growth. No memory scaling with context length.
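To illustrate why a fixed-size state keeps per-token cost flat, here is a toy recurrence, assuming a state of `kStateDim` floats updated once per byte token. The update rule (a per-channel exponential moving average) and all names are illustrative stand-ins, not the OaLlm architecture.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStateDim = 8;  // hypothetical state width

// The entire "context" lives in this struct; nothing is appended per token.
struct FixedState {
    std::array<float, kStateDim> H{};  // zero-initialized
    std::size_t TokensSeen = 0;
};

// Fold one token into the state. The loop runs kStateDim times whether this
// is token 10 or token 100,000, so compute and memory per token are constant.
void Step(FixedState& s, std::uint8_t token) {
    const float x = static_cast<float>(token) / 255.0f;
    for (std::size_t i = 0; i < kStateDim; ++i) {
        // Per-channel decay in (0, 1); slower channels remember longer.
        const float decay = 1.0f - 1.0f / static_cast<float>(i + 2);
        s.H[i] = decay * s.H[i] + (1.0f - decay) * x;
    }
    ++s.TokensSeen;
}
```

A key-value cache, by contrast, stores one entry per past token, so its memory grows linearly with sequence position.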
Single Binary Deployment
OaEngine compiles the complete training and inference pipeline into a single statically linked binary with embedded compute kernels. No runtime interpreter. No dynamic dispatch overhead. 80-160x measured throughput advantage over equivalent Python frameworks.
Streaming Data Ingestion
Zero-copy data pipeline with hardware-optimized memory operations. Datasets stream directly to the accelerator without intermediate staging. Constant memory footprint from 1 GB to multi-TB corpora.
Configuration-Driven Scaling
Identical codebase operates from 140K to 10B+ parameters. Architecture adapts through configuration, not code modification. Validated at 9.6M parameters with coherent generation. Production scaling path to 8B+.
Deterministic Recovery
Metric-driven checkpointing with automated rotation and exact-step resume. Training resumes from the precise interruption point after any failure condition. Zero compute loss. Zero manual intervention.
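Exact-step resume requires that a checkpoint capture everything the next step depends on: the step counter, the parameters, and the random number generator state. The sketch below shows that idea with a toy one-weight trainer; `Trainer` and `Checkpoint` are hypothetical types, not the .oam format, and checkpoint rotation is left out.

```cpp
#include <cstdint>
#include <random>
#include <sstream>
#include <string>

// Hypothetical checkpoint: enough state to continue bit-for-bit.
struct Checkpoint {
    std::uint64_t Step = 0;
    double Weight = 0.0;
    std::string RngState;  // serialized std::mt19937_64
};

struct Trainer {
    std::uint64_t Step = 0;
    double Weight = 0.0;
    std::mt19937_64 Rng{42};

    // Toy training step: draw one random value and nudge the weight toward it.
    void TrainStep() {
        const double noise = static_cast<double>(Rng() % 1000) / 1000.0;
        Weight += 0.01 * (noise - Weight);
        ++Step;
    }

    Checkpoint Save() const {
        std::ostringstream os;
        os << Rng;  // the standard engines support stream serialization
        Checkpoint c;
        c.Step = Step;
        c.Weight = Weight;
        c.RngState = os.str();
        return c;
    }

    void Load(const Checkpoint& c) {
        Step = c.Step;
        Weight = c.Weight;
        std::istringstream is(c.RngState);
        is >> Rng;  // restores the exact position in the random sequence
    }
};
```

With this state captured, a run interrupted at step 40 and resumed for 60 more steps ends with the same weight as an uninterrupted 100-step run. Omitting the generator state would let the two runs diverge after the resume point.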

Reference Architecture
OaLlm
Byte-level language model operating on a fixed 256-token vocabulary. No external tokenizer. No vocabulary mismatch across languages or modalities. Text, code, and binary data enter through a single input interface.
Parallel hypothesis branches execute concurrent reasoning paths. An internal scoring mechanism evaluates each branch and selects the optimal result. Adaptive compute depth allocates resources proportional to token complexity. Validated at 9.6M parameters. Production scaling path to 8B+.
Implemented in C++ on OaEngine. Executes on NVIDIA, AMD, and Intel accelerators without modification. Training and inference dispatch through Vulkan compute. Reference configuration operates within a fraction of available device memory.
Reference parameters: 9.6M
Input vocabulary: 256 bytes
Compute path: GPU-resident
VRAM utilization: ~10%
API Examples
Verified tutorials. Real hardware numbers.
Both examples run end-to-end on Vulkan, with no CUDA dependency and no Python runtime. Numbers are measured on an RTX 5090 Laptop GPU using hardware timestamps. Full source and reproducible configs are in the developer docs.
Fashion-MNIST
784 → 128 (ReLU) → 10 MLP. 86.2% test accuracy on held-out data. 4,685 steps, batch 64. AdamW. OaModule + OaLinear + OaFnMatrix.
Autograd
// Fashion-MNIST: 86.2% test accuracy, <1s
// RTX 5090 Laptop: 900K samples/s (wall) · 1.3M samples/s (GPU)
class OaMnistClassifier : public OaModule {
public:
    OaMnistClassifier() {
        Fc1_ = Register(OaMakeShared<OaLinear>(784, 128));
        Fc2_ = Register(OaMakeShared<OaLinear>(128, 10));
    }

    OaTensor Forward(const OaTensor& x) override {
        auto h = OaFnMatrix::Scale(x, 1.0f / 255.0f);  // pixel values to [0, 1]
        h = OaFnMatrix::Relu(Fc1_->Forward(h));
        return Fc2_->Forward(h);
    }

private:
    OaSharedPtr<OaLinear> Fc1_, Fc2_;
};

int main() {
    auto rt = OaEngine::Create({.AppName = "Mnist"}).Unwrap();
    OaMnistClassifier model;
    OaAdamW opt(model.Parameters(), 0.001f);
    for (OaI32 step = 0; step < 4685; ++step) {
        // batchImages and labels hold the current batch of 64; data loading is omitted here.
        auto logits = model.Forward(batchImages);
        auto loss = OaFnMatrix::CrossEntropyLoss(logits, labels);
        OaFnGrad::Backward(loss);
        opt.Step();
        opt.ZeroGrad();
    }
    model.Save("mnist.oam");
}
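The figures quoted for the Fashion-MNIST example can be cross-checked with a little arithmetic. The sketch below assumes that `OaLinear` carries a bias vector and that each epoch drops the final partial batch; neither is stated in the example. The 60,000-image training split is the standard Fashion-MNIST size.

```cpp
#include <cstdint>

// Parameter count of a dense layer: weights plus (assumed) one bias per output.
constexpr std::int64_t LinearParams(std::int64_t in, std::int64_t out,
                                    bool bias = true) {
    return in * out + (bias ? out : 0);
}

// 784 -> 128 -> 10 MLP: 100,480 + 1,290 = 101,770 parameters.
constexpr std::int64_t kMlpParams =
    LinearParams(784, 128) + LinearParams(128, 10);

// 4,685 steps at batch 64 = 299,840 samples processed.
constexpr std::int64_t kSamplesSeen = std::int64_t{4685} * 64;

// 60,000 / 64 = 937 full batches per epoch, so 4,685 steps is 5 epochs.
constexpr std::int64_t kStepsPerEpoch = 60000 / 64;

// At the quoted 900K samples/s wall-clock rate: about 0.33 s of training.
constexpr double kWallSeconds = static_cast<double>(kSamplesSeen) / 900000.0;
```

Under these assumptions the step count corresponds to exactly five passes over the training set, and the estimated wall time agrees with the "<1s" figure in the example's header comment.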
Char-level LM
Embedding → Tanh MLP → vocab logits. 300 steps, batch 32. Final loss 0.041. 96.9% batch accuracy. OaEmbedding + OaFnMatrix::Tanh + OaAdamW.
Graph
// Char-level text model, Level 1 autograd
// RTX 5090 Laptop: 320K samples/s (wall) · 820K samples/s (GPU)
// Final loss 0.041 · 96.9% batch accuracy
class OaTextModel : public OaModule {
public:
    OaTextModel() {
        Embed_ = Register(OaMakeShared<OaEmbedding>(27, 16));
        Fc1_ = Register(OaMakeShared<OaLinear>(128, 64));
        Head_ = Register(OaMakeShared<OaLinear>(64, 27));
    }

    OaTensor Forward(const OaTensor& x) override {
        auto e = Embed_->Forward(x);                  // [batch*8, 16]
        // Fc1_ takes 128 inputs per sample, i.e. 8 characters x 16 embedding dims;
        // the flattening step is not shown in this snippet.
        auto h = OaFnMatrix::Tanh(Fc1_->Forward(e));  // [batch, 64]
        return Head_->Forward(h);                     // [batch, 27]
    }

private:
    OaSharedPtr<OaEmbedding> Embed_;
    OaSharedPtr<OaLinear> Fc1_, Head_;
};

int main() {
    auto rt = OaEngine::Create({.AppName = "Text"}).Unwrap();
    OaTextModel model;
    OaAdamW opt(model.Parameters(), 0.01f);
    for (OaI32 step = 0; step < 300; ++step) {
        // input and targets hold the current batch of 32; data loading is omitted here.
        auto logits = model.Forward(input);
        auto loss = OaFnMatrix::CrossEntropyLoss(logits, targets);
        OaFnGrad::Backward(loss);
        opt.Step();
        opt.ZeroGrad();
    }
    model.Save("text.oam");
}

Performance
Measured on hardware.
Fashion-MNIST classification on RTX 5090 Laptop GPU and Intel Arrow Lake (ARL) iGPU. Identical model architecture, identical batch size, same convergence target. Measured with hardware timestamps using Vulkan 1.4. Full reproducible configs in the developer reference.
RTX 5090 Laptop · Fashion-MNIST · v0.6.34
Wall SPS: 900K samples/s
GPU SPS: 1.3M samples/s
Accuracy: 86.2% (test set)
vs PyTorch: 8.6x (wall-clock)
Early Access
Vendor-neutral AI infrastructure.
Managed training and inference services on the Oa compute substrate. Byte-level language models with post-quantum security. Seeking infrastructure partners and early integration programs.
Get in Touch