Realm

AI infrastructure.
Purpose-built.

End-to-end on a single compute substrate.

Training and inference execute entirely through Vulkan compute dispatches. Forward pass, gradient accumulation, and optimizer steps remain on the accelerator. The host manages data loading and orchestration. Built on the Oa compute substrate.

01

Byte Embedding

Fixed 256-token vocabulary eliminates external tokenization. Raw byte input accepts text, code, and binary data through a single interface. No preprocessing dependencies.
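
To illustrate why a 256-entry vocabulary needs no tokenizer: each input byte can serve directly as its own token ID. The sketch below is not the OaLlm interface; `BytesToTokens` and `TokensToBytes` are hypothetical helper names, and the 16-bit token type is an assumption chosen for illustration.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: every byte value (0..255) is used directly as a token ID,
// so text, code, and binary data all take the same path.
std::vector<std::uint16_t> BytesToTokens(std::string_view bytes) {
    std::vector<std::uint16_t> tokens;
    tokens.reserve(bytes.size());
    for (unsigned char b : bytes) {
        tokens.push_back(static_cast<std::uint16_t>(b));
    }
    return tokens;
}

// Inverse mapping. It is lossless because the vocabulary is exactly the byte range.
std::string TokensToBytes(const std::vector<std::uint16_t>& tokens) {
    std::string bytes;
    bytes.reserve(tokens.size());
    for (std::uint16_t t : tokens) {
        bytes.push_back(static_cast<char>(static_cast<unsigned char>(t)));
    }
    return bytes;
}
```

Multi-byte UTF-8 characters simply become several tokens, and arbitrary binary input (including zero bytes) round-trips unchanged.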

02

Parallel Branch States

Multiple hypothesis branches execute concurrently. Each evaluates an independent reasoning path. An internal scoring mechanism ranks the branches and emits the top-ranked result.
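
The page does not describe how branches are scored, so the following is only a stand-in to make the select-one-of-N step concrete. It assumes each branch carries per-token log-probabilities and uses their mean as the score; `Branch`, `ScoreBranch`, and `SelectBestBranch` are hypothetical names, and the branches are scored sequentially here rather than concurrently on the accelerator.

```cpp
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical branch record: log-probability of each token the branch produced.
struct Branch {
    std::vector<float> tokenLogProbs;
};

// Assumed scoring rule (not from the source): mean token log-probability,
// so longer branches are not penalized for length alone.
float ScoreBranch(const Branch& b) {
    if (b.tokenLogProbs.empty()) {
        return -std::numeric_limits<float>::infinity();
    }
    const float sum = std::accumulate(b.tokenLogProbs.begin(), b.tokenLogProbs.end(), 0.0f);
    return sum / static_cast<float>(b.tokenLogProbs.size());
}

// Returns the index of the highest-scoring branch; ties resolve to the lowest index.
// Precondition: `branches` is non-empty.
std::size_t SelectBestBranch(const std::vector<Branch>& branches) {
    std::size_t best = 0;
    float bestScore = ScoreBranch(branches[0]);
    for (std::size_t i = 1; i < branches.size(); ++i) {
        const float s = ScoreBranch(branches[i]);
        if (s > bestScore) {
            bestScore = s;
            best = i;
        }
    }
    return best;
}
```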

03

Vendor-Neutral Compute

OaEngine dispatches training and inference as compiled Vulkan compute kernels. Identical SPIR-V executes on Intel, NVIDIA, AMD, and multi-device configurations. No vendor-specific SDK dependency.

04

Deploy & Resume

Self-contained .oam artifact includes model weights and compiled compute kernels. One file deploys to any Vulkan-capable device. Deterministic resume from the exact training step.

Engineered for mission-critical throughput.

Hardware-optimized memory operations. Topology-aware threading with work-stealing schedulers. Vendor-neutral compute kernels. Post-quantum cryptographic primitives. The complete Oa HPC substrate underpins the AI pipeline.

01

GPU-Native Execution

All compute-intensive operations execute on the accelerator through Vulkan dispatches. The host manages orchestration, data loading, and checkpoint I/O. No secondary GPU API. No interpreter on the critical path.

02

Constant-Time Inference

Fixed-size state representation compresses the full context window. Per-token compute cost remains constant regardless of sequence position. No KV cache growth. No memory scaling with context length.
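
As a toy illustration of the constant-cost property, the sketch below folds every incoming byte into a fixed-size state, so both the memory used and the work per token are independent of how many tokens came before. The update rule (an exponential moving average over bit features) is an assumption chosen for brevity and is not the model's actual state update; `FixedState`, `Step`, and `kStateDim` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStateDim = 8;  // illustrative state width

struct FixedState {
    std::array<float, kStateDim> h{};  // zero-initialized, never grows
    std::size_t tokensSeen = 0;
};

// One token step. Always performs kStateDim updates, whatever tokensSeen is.
void Step(FixedState& s, std::uint8_t token, float decay = 0.9f) {
    for (std::size_t i = 0; i < kStateDim; ++i) {
        // Toy feature: bit i of the byte, standing in for a learned projection.
        const float x = static_cast<float>((token >> i) & 1u);
        s.h[i] = decay * s.h[i] + (1.0f - decay) * x;
    }
    ++s.tokensSeen;
}
```

A key-value cache, by contrast, stores one entry per past token, so its memory grows linearly with sequence position.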

03

Single Binary Deployment

OaEngine compiles the complete training and inference pipeline into a single statically linked binary with embedded compute kernels. No runtime interpreter. No dynamic dispatch overhead. 80-160x measured throughput advantage over equivalent Python frameworks.

04

Streaming Data Ingestion

Zero-copy data pipeline with hardware-optimized memory operations. Datasets stream directly to the accelerator without intermediate staging. Constant memory footprint from 1 GB to multi-TB corpora.

05

Configuration-Driven Scaling

Identical codebase operates from 140K to 10B+ parameters. Architecture adapts through configuration, not code modification. Validated at 9.6M parameters with coherent generation. Production scaling path to 8B+.

06

Deterministic Recovery

Metric-driven checkpointing with automated rotation and exact-step resume. Training resumes from the precise interruption point after any failure condition. Zero compute loss. Zero manual intervention.
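
One way to picture metric-driven rotation is a policy that keeps the N lowest-loss checkpoints plus the most recent one, so the best weights survive and training can still resume from the latest step. The sketch below is an assumption about what such a policy could look like, not the engine's implementation; `Checkpoint` and `CheckpointRotation` are hypothetical names, and steps are assumed to increase monotonically.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Checkpoint {
    std::uint64_t step;
    float loss;  // assumed metric: lower is better
};

class CheckpointRotation {
public:
    explicit CheckpointRotation(std::size_t keepBest) : keepBest_(keepBest) {}

    // Records a checkpoint, then retains only the `keepBest` lowest-loss
    // entries plus the one just added (needed for exact-step resume).
    void Add(Checkpoint c) {
        const std::uint64_t latest = c.step;
        kept_.push_back(c);
        std::stable_sort(kept_.begin(), kept_.end(),
                         [](const Checkpoint& a, const Checkpoint& b) { return a.loss < b.loss; });
        std::vector<Checkpoint> next;
        for (std::size_t i = 0; i < kept_.size(); ++i) {
            if (i < keepBest_ || kept_[i].step == latest) {
                next.push_back(kept_[i]);
            }
        }
        kept_ = std::move(next);
    }

    // Step of the most recent retained checkpoint, or 0 if none exist.
    std::uint64_t ResumeStep() const {
        std::uint64_t s = 0;
        for (const auto& c : kept_) {
            s = std::max(s, c.step);
        }
        return s;
    }

    std::size_t Size() const { return kept_.size(); }

private:
    std::size_t keepBest_;
    std::vector<Checkpoint> kept_;
};
```

A real implementation would also delete the files belonging to dropped entries and persist optimizer state and data-loader position alongside the weights.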

ReaLLM Foundation Model

OaLlm

Byte-level language model operating on a fixed 256-token vocabulary. No external tokenizer. No vocabulary mismatch across languages or modalities. Text, code, and binary data enter through a single input interface.

Parallel hypothesis branches execute concurrent reasoning paths. An internal scoring mechanism evaluates each branch and selects the highest-scoring result. Adaptive compute depth allocates resources proportional to token complexity. Validated at 9.6M parameters. Production scaling path to 8B+.

Implemented in C++ on OaEngine. Executes on NVIDIA, AMD, and Intel accelerators without modification. Training and inference dispatch through Vulkan compute. Reference configuration uses roughly 10% of available device memory.

Reference parameters: 9.6M · Input vocabulary: 256 byte values · Compute path: GPU-resident · VRAM utilization: ~10%

Verified tutorials. Real hardware numbers.

Both examples run end-to-end on Vulkan, with no CUDA dependency and no Python runtime. Numbers are measured on an RTX 5090 Laptop GPU using hardware timestamps. Full source and reproducible configs are in the developer docs.

Image Classification

Fashion-MNIST

784 → 128 (ReLU) → 10 MLP. 86.2% test accuracy on held-out data. 4,685 steps, batch 64 (5 × 937 full batches of the 60,000-image training set). AdamW. OaModule + OaLinear + OaFnMatrix.

Autograd

// Fashion-MNIST: 86.2% test accuracy, <1s
// RTX 5090 Laptop: 900K samples/s (wall) · 1.3M samples/s (GPU)
class OaMnistClassifier : public OaModule {
public:
    OaMnistClassifier() {
        Fc1_ = Register(OaMakeShared<OaLinear>(784, 128));
        Fc2_ = Register(OaMakeShared<OaLinear>(128, 10));
    }
    OaTensor Forward(const OaTensor& x) override {
        // Scale raw pixel values from [0, 255] to [0, 1].
        auto h = OaFnMatrix::Scale(x, 1.0f / 255.0f);
        h = OaFnMatrix::Relu(Fc1_->Forward(h));
        return Fc2_->Forward(h);
    }
private:
    OaSharedPtr<OaLinear> Fc1_, Fc2_;
};
int main() {
    auto rt = OaEngine::Create({.AppName = "Mnist"}).Unwrap();
    OaMnistClassifier model;
    OaAdamW opt(model.Parameters(), 0.001f);
    // batchImages and labels hold the current mini-batch of 64 samples;
    // the host-side data loading that fills them is omitted from this excerpt.
    for (OaI32 step = 0; step < 4685; ++step) {
        auto logits = model.Forward(batchImages);
        auto loss = OaFnMatrix::CrossEntropyLoss(logits, labels);
        OaFnGrad::Backward(loss);
        opt.Step();
        opt.ZeroGrad();
    }
    model.Save("mnist.oam");
}
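
The figures quoted for this example can be cross-checked with a little arithmetic. The sketch below assumes that `OaLinear` carries a bias vector and that partial batches are dropped (both are inferences, not stated on this page); `LinearParams` and the constant names are hypothetical.

```cpp
#include <cstdint>

// Parameters of a fully connected layer with bias: in*out weights plus out biases.
constexpr std::uint64_t LinearParams(std::uint64_t in, std::uint64_t out) {
    return in * out + out;
}

// 784 -> 128 -> 10 MLP from the example above.
constexpr std::uint64_t kMlpParams = LinearParams(784, 128) + LinearParams(128, 10);

// Fashion-MNIST has 60,000 training images; integer division drops the partial batch.
constexpr std::uint64_t kStepsPerEpoch = 60000 / 64;
constexpr std::uint64_t kEpochs = 4685 / kStepsPerEpoch;
constexpr std::uint64_t kSamplesSeen = 4685 * 64;

static_assert(kMlpParams == 101770, "about 102K trainable parameters");
static_assert(kStepsPerEpoch == 937, "full batches per epoch");
static_assert(kEpochs == 5 && 4685 % kStepsPerEpoch == 0, "4,685 steps is exactly 5 epochs");
static_assert(kSamplesSeen == 299840, "samples processed over the run");
```

At the quoted 900K samples/s wall-clock rate, 299,840 samples take about a third of a second, which is consistent with the "<1s" figure in the listing.
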
Text Generation

Char-level LM

Embedding → Tanh MLP → vocab logits. 300 steps, batch 32. Final loss 0.041. 96.9% batch accuracy. OaEmbedding + OaFnMatrix::Tanh + OaAdamW.

Graph

// Char-level text model, Level 1 autograd
// RTX 5090 Laptop: 320K samples/s (wall) · 820K samples/s (GPU)
// Final loss 0.041 · 96.9% batch accuracy
class OaTextModel : public OaModule {
public:
    OaTextModel() {
        Embed_ = Register(OaMakeShared<OaEmbedding>(27, 16));
        Fc1_ = Register(OaMakeShared<OaLinear>(128, 64));
        Head_ = Register(OaMakeShared<OaLinear>(64, 27));
    }
    OaTensor Forward(const OaTensor& x) override {
        auto e = Embed_->Forward(x);          // [batch*8, 16]
        // Fc1 takes 128 = 8 x 16 inputs: per the shape comments, the 8
        // embeddings of each sample are consumed as one [batch, 128] row.
        auto h = OaFnMatrix::Tanh(
            Fc1_->Forward(e));                // [batch, 64]
        return Head_->Forward(h);             // [batch, 27]
    }
private:
    OaSharedPtr<OaEmbedding> Embed_;
    OaSharedPtr<OaLinear> Fc1_, Head_;
};
int main() {
    auto rt = OaEngine::Create({.AppName = "Text"}).Unwrap();
    OaTextModel model;
    OaAdamW opt(model.Parameters(), 0.01f);
    // input and targets hold the current mini-batch of 32 samples;
    // the host-side data loading that fills them is omitted from this excerpt.
    for (OaI32 step = 0; step < 300; ++step) {
        auto logits = model.Forward(input);
        auto loss = OaFnMatrix::CrossEntropyLoss(logits, targets);
        OaFnGrad::Backward(loss);
        opt.Step();
        opt.ZeroGrad();
    }
    model.Save("text.oam");
}
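
The layer sizes in this model imply its context length and parameter count. The sketch below reads the constructor arguments from the listing above and assumes that `OaLinear` has a bias and that the 128-wide input of the first layer is a concatenation of per-character embeddings; the constant names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kVocab = 27;     // OaEmbedding(27, 16): 27 symbols
constexpr std::uint64_t kEmbedDim = 16;  // 16 values per symbol
constexpr std::uint64_t kFc1In = 128;    // OaLinear(128, 64)
constexpr std::uint64_t kHidden = 64;

// Characters per sample, if Fc1's input is the concatenated embeddings.
constexpr std::uint64_t kContext = kFc1In / kEmbedDim;

// Fully connected layer with bias: in*out weights plus out biases.
constexpr std::uint64_t LinearParams(std::uint64_t in, std::uint64_t out) {
    return in * out + out;
}

constexpr std::uint64_t kTextParams =
    kVocab * kEmbedDim + LinearParams(kFc1In, kHidden) + LinearParams(kHidden, kVocab);

static_assert(kFc1In % kEmbedDim == 0, "Fc1 input must be a whole number of embeddings");
static_assert(kContext == 8, "matches the [batch*8, 16] shape comment");
static_assert(kTextParams == 10443, "432 + 8,256 + 1,755 parameters");
```

With 300 steps at batch 32, the run processes 9,600 samples in total.
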
GPU infrastructure

Measured on hardware.

Fashion-MNIST classification on an RTX 5090 Laptop GPU and an Intel Arrow Lake (ARL) iGPU. Identical model architecture, identical batch size, same convergence target. Measured with hardware timestamps using Vulkan 1.4. Full reproducible configs in the developer reference.

RTX 5090 Laptop · Fashion-MNIST · v0.6.34

Wall SPS: 900K samples/s · GPU SPS: 1.3M samples/s · Accuracy: 86.2% (test set) · vs PyTorch: 8.6x (wall-clock)

Vendor-neutral AI infrastructure.

Managed training and inference services on the Oa compute substrate. Byte-level language models with post-quantum security. Seeking infrastructure partners and early integration programs.

Get in Touch