Platform Documentation
Architecture decisions, technology choices, and development roadmap for the Ricche research platform.
Platform Architecture
Ricche is being built as a multi-layered research infrastructure where each layer handles a specific workload and scales independently. The architecture follows a five-stage pipeline from raw market data to validated research outputs.
End-to-End Pipeline
Market Data Ingestion → Data Infrastructure → ML Research Environment → Simulation Framework → Validation & Governance
Data Ingestion Layer
Designed to connect to real-time and historical market data feeds from global exchanges including NYSE, CME, LSE, Eurex, HKEX, and SGX. The pipeline architecture supports tick-level order book data, OHLCV pricing, corporate actions, economic indicators, and alternative data sources. Built for sub-second ingestion latency with exactly-once delivery guarantees and automatic gap detection.
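Automatic gap detection typically relies on the strictly increasing sequence numbers most exchange feed protocols attach to each message. A minimal sketch of that check, in plain Python (the function name and interface are illustrative, not the platform's actual code):

```python
def find_gaps(seq_numbers):
    """Return (expected, received) pairs where the feed skipped ahead.

    Assumes the exchange assigns strictly increasing integer sequence
    numbers to each message, as most binary feed protocols do.
    """
    gaps = []
    expected = None
    for seq in seq_numbers:
        if expected is not None and seq > expected:
            gaps.append((expected, seq))  # messages expected..seq-1 were lost
        expected = seq + 1
    return gaps

# Sequence numbers 4 and 5 never arrived
print(find_gaps([1, 2, 3, 6, 7]))  # → [(4, 6)]
```

In production the detected gap would trigger a retransmission request or a snapshot recovery, rather than just being reported.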
GPU Compute Layer
NVIDIA GPU clusters purpose-built for financial machine learning workloads. The storage architecture uses distributed columnar formats (Apache Parquet, Arrow) for petabyte-scale datasets with intelligent tiering across NVMe, SSD, and object storage. Auto-scaling compute allocation with workload-aware priority scheduling ensures optimal GPU utilisation.
Research & Experimentation Layer
CUDA-accelerated PyTorch environments with comprehensive experiment tracking, model versioning, and automated hyperparameter optimisation. Every experiment is reproducible by design — all code, data, parameters, and results are versioned and auditable. Supports deep learning, gradient boosting, transformers, and ensemble approaches.
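The shape of an automated hyperparameter search can be sketched in a few lines. This is a toy random-search loop with a fixed seed for reproducibility; the objective, search space, and function names are illustrative assumptions, not the platform's actual tooling:

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimal random-search sketch: sample hyperparameters from `space`
    (a dict of name -> list of candidate values), score each with
    `objective`, and return the best configuration and its score."""
    rng = random.Random(seed)  # fixed seed so the search itself is reproducible
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective: pretend validation loss is minimised at lr=0.01, depth=4
space = {"lr": [0.1, 0.01, 0.001], "depth": [2, 4, 8]}
objective = lambda c: abs(c["lr"] - 0.01) + abs(c["depth"] - 4)
best, loss = random_search(objective, space)
print(best, loss)
```

Real experiment tracking would also log every sampled configuration and score so the full search, not just the winner, is auditable.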
Simulation & Validation
GPU-parallelised Monte Carlo and agent-based simulation engines that stress-test candidate models across thousands of market scenarios before progression. Formal validation stages with statistical significance testing, walk-forward analysis, and peer review ensure only genuinely robust models advance.
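The core of a Monte Carlo engine is easy to show on a small scale. This sketch simulates geometric Brownian motion price paths in pure Python to illustrate the maths only; the GPU version generates all paths in parallel, and the parameters here are illustrative:

```python
import math
import random

def monte_carlo_paths(s0, mu, sigma, horizon, n_steps, n_paths, seed=42):
    """Simulate geometric Brownian motion price paths and return the
    terminal prices. Seeded so the simulation is reproducible."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    terminal = []
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        terminal.append(s)
    return terminal

prices = monte_carlo_paths(100.0, mu=0.05, sigma=0.2,
                           horizon=1.0, n_steps=252, n_paths=1000)
print(sum(prices) / len(prices))  # sample mean, close to 100 * exp(0.05)
```

Stress testing replaces the single (mu, sigma) pair with scenario sets spanning regime changes and tail events.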
Monitoring & Observability
The Control Room is being developed as a unified operations dashboard providing real-time visibility into GPU memory allocation, data pipeline health, experiment progress, and infrastructure telemetry. Built on Prometheus, Grafana, and OpenTelemetry.
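A Prometheus deployment of this kind is driven by a scrape configuration. The fragment below is purely illustrative (job names and targets are invented, though port 9400 is the NVIDIA DCGM exporter's default), not the platform's actual configuration:

```yaml
# Illustrative Prometheus scrape config, not the platform's actual setup.
scrape_configs:
  - job_name: gpu-telemetry        # NVIDIA DCGM exporter (default port 9400)
    static_configs:
      - targets: ['gpu-node-1:9400']
  - job_name: pipeline-health      # hypothetical ingestion-gateway metrics endpoint
    static_configs:
      - targets: ['ingest-gateway:8080']
```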
Technology Choices
Every tool in our stack was selected for a specific reason — performance, reliability, or researcher productivity. Below are the key decisions and the reasoning behind them.
Why NVIDIA CUDA
Financial ML workloads are compute-bound. Training a transformer on tick-level data across multiple instruments can take days on CPUs. CUDA-accelerated training on NVIDIA H200 GPUs with 141 GB HBM3e memory and 400G InfiniBand NDR interconnects compresses this to hours. We plan to use the full NVIDIA ecosystem: CUDA for compute, RAPIDS cuDF for GPU-accelerated data engineering, TensorRT for production inference, and NIM for model serving.
Why KDB+/q for Time-Series
KDB+ remains the gold standard for financial time-series storage and querying. Its columnar architecture and vector-native query language handle the analytical patterns common in quant research — windowed aggregations, temporal joins, and tick replay — with unmatched efficiency. PyKX provides the Python interface layer.
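The temporal join is worth illustrating, since it is the query pattern kdb+ is built around (the `aj` as-of join). Here is the same logic in pure Python for clarity; kdb+ executes this natively over billions of rows:

```python
from bisect import bisect_right

def asof_join(quotes, trades):
    """For each trade, find the latest quote at or before the trade's
    timestamp. `quotes` and `trades` are lists of (timestamp, value)
    tuples sorted by timestamp."""
    quote_times = [t for t, _ in quotes]
    joined = []
    for trade_time, trade_px in trades:
        i = bisect_right(quote_times, trade_time) - 1
        prevailing = quotes[i][1] if i >= 0 else None  # no quote yet
        joined.append((trade_time, trade_px, prevailing))
    return joined

quotes = [(1, 99.9), (3, 100.1), (7, 100.3)]
trades = [(2, 100.0), (7, 100.4), (9, 100.2)]
print(asof_join(quotes, trades))
# → [(2, 100.0, 99.9), (7, 100.4, 100.3), (9, 100.2, 100.3)]
```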
Why Redpanda Over Kafka
Redpanda delivers Kafka-compatible streaming with 10x lower tail latency, zero JVM overhead, and a single binary deployment. For a platform processing high-frequency market data, every millisecond of pipeline latency matters. Redpanda's thread-per-core architecture maps directly to our NUMA-aware infrastructure.
Why Rust for Critical Paths
Latency-critical components — order book reconstruction, feed handlers, protocol parsers — are written in Rust. Zero-cost abstractions and compile-time memory safety eliminate an entire class of production bugs without sacrificing performance. Python handles research workflows; Rust handles the hot paths.
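To show what order book reconstruction involves, here is the core logic as a toy price-level book in Python. The production implementation is in Rust and handles order-level detail, sequencing, and recovery; this sketch only illustrates the state machine:

```python
class OrderBook:
    """Toy price-level book: apply add/cancel events as signed size
    deltas and expose best bid/ask. Simplified to aggregated levels."""

    def __init__(self):
        self.bids = {}  # price -> total resting size
        self.asks = {}

    def apply(self, side, price, size_delta):
        book = self.bids if side == "bid" else self.asks
        new_size = book.get(price, 0) + size_delta
        if new_size <= 0:
            book.pop(price, None)   # level fully cancelled
        else:
            book[price] = new_size

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

book = OrderBook()
book.apply("bid", 99.9, 100)
book.apply("ask", 100.1, 50)
book.apply("bid", 100.0, 25)
book.apply("bid", 100.0, -25)   # cancel the 100.0 level
print(book.best_bid(), book.best_ask())  # → 99.9 100.1
```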
Why PyTorch Over TensorFlow
PyTorch's eager execution model and Pythonic API make it the natural choice for research-heavy workflows where rapid iteration matters more than static graph optimisation. Combined with JAX for differentiable programming and FlashAttention-3 for efficient transformer training, this gives researchers maximum flexibility.
Why Kubernetes
GPU workloads are inherently bursty — a researcher may need 8 GPUs for a training run, then none for a week. Kubernetes with NVIDIA GPU Operator handles scheduling, resource allocation, and auto-scaling. Argo CD manages GitOps-based deployment. Cilium provides eBPF-powered networking with fine-grained security policies.
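With the GPU Operator installed, a training job requests GPUs through the `nvidia.com/gpu` extended resource and the scheduler places it on a GPU node. The manifest below is illustrative only (names and image tag are examples, not the platform's actual manifests):

```yaml
# Illustrative pod spec, not the platform's actual manifests.
apiVersion: v1
kind: Pod
metadata:
  name: train-job
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.08-py3
      resources:
        limits:
          nvidia.com/gpu: 8   # whole-GPU allocation, released when the job exits
```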
Design Principles
These principles guide every architecture and engineering decision at Ricche.
Reproducibility First
Every experiment must be reproducible. All code, data versions, hyperparameters, random seeds, and environment configurations are captured automatically. If a result cannot be reproduced, it is not a result — it is noise.
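One way to make that capture concrete is a manifest that seeds the RNGs and records a content hash of the exact configuration, so a run can be replayed and verified later. A minimal sketch with illustrative field names:

```python
import hashlib
import json
import random

def experiment_manifest(config, seed):
    """Seed the RNG and record a SHA-256 hash of the canonicalised
    config, so any later rerun can prove it used identical inputs.
    In practice numpy/torch/CUDA seeds are captured as well."""
    random.seed(seed)
    blob = json.dumps(config, sort_keys=True).encode()  # canonical form
    return {
        "seed": seed,
        "config_sha256": hashlib.sha256(blob).hexdigest(),
        "config": config,
    }

m = experiment_manifest({"lr": 0.001, "model": "transformer"}, seed=7)
print(m["config_sha256"][:12])
```

Identical configurations always hash identically, so a mismatch between a rerun's hash and the recorded one immediately flags a non-reproducible setup.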
No Look-Ahead Bias
The most dangerous bug in quantitative research is invisible: using future information to make past decisions. Our data pipelines enforce strict temporal discipline at every stage. Walk-forward splits, point-in-time datasets, and temporal validation are mandatory, not optional.
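The walk-forward split is the simplest of these safeguards: every test window lies strictly after its training window, so nothing the model is evaluated on could have influenced its fitting. A minimal sketch (the function and parameter names are illustrative):

```python
def walk_forward_splits(n_samples, train_size, test_size, step):
    """Generate (train_indices, test_indices) windows where the test
    window always starts after the training window ends."""
    splits = []
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        splits.append((train, test))
        start += step  # slide both windows forward in time
    return splits

for train, test in walk_forward_splits(10, train_size=4, test_size=2, step=2):
    print(list(train), list(test))
```

Contrast this with a random shuffle split, which freely mixes future observations into the training set and silently inflates backtest performance.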
GPU-Native, Not GPU-Bolted
We don't take CPU pipelines and add GPU acceleration as an afterthought. The entire data path — from ingestion through feature engineering to training and inference — is designed for GPU execution. Data stays on the GPU between stages. This eliminates the CPU-GPU transfer bottleneck that undermines most "GPU-accelerated" platforms.
Validate Before You Trust
No model advances without surviving structured validation: out-of-sample testing, walk-forward analysis, Monte Carlo stress testing across regime changes and tail events, and formal peer review. The goal is to kill bad ideas early, not to ship models fast.
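As a flavour of the statistical side, here is the simplest such check: a one-sample t-statistic of mean return against zero, in pure Python. This is only one ingredient; multiple-testing corrections and regime robustness are assessed separately, and the numbers below are toy data:

```python
import math
import statistics

def t_statistic(returns):
    """One-sample t-statistic of the mean return against zero.
    A large value suggests the mean is unlikely to be pure noise,
    though it is never sufficient evidence on its own."""
    n = len(returns)
    mean = statistics.fmean(returns)
    se = statistics.stdev(returns) / math.sqrt(n)  # standard error of the mean
    return mean / se

# Toy daily returns with a small positive drift
rets = [0.001, -0.002, 0.003, 0.0015, -0.0005, 0.002, 0.001, -0.001]
print(round(t_statistic(rets), 3))
```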
Infrastructure as Code
Every component — from GPU cluster configuration to data pipeline definitions to monitoring dashboards — is defined in code, version-controlled, and deployed through automated pipelines. No manual configuration, no snowflake servers, no "it works on my machine."
Development Roadmap
Ricche is under active development. Below is our phased approach to building the full platform. We share this transparently because we believe our partners and collaborators deserve to know exactly where we are.
- Platform vision, architecture design, and technology selection
- Website and public documentation
- Institutional whitepaper and technical collateral
- Initial partner and investor outreach
- Data ingestion pipeline — exchange feed connectivity, normalisation, and quality checks
- GPU compute environment — NVIDIA CUDA cluster provisioning and job scheduling
- Experiment tracking — versioning, reproducibility, and audit trail
- Storage layer — KDB+/q time-series store, Parquet data lake, feature store
- ML research environment — PyTorch training pipelines with CUDA acceleration
- Simulation framework — GPU-parallelised Monte Carlo and agent-based engines
- Validation governance — structured review, walk-forward testing, peer review workflows
- Research SDK and CLI for experiment submission and results retrieval
- Control Room dashboard — real-time GPU, pipeline, and experiment monitoring
- Auto-scaling and workload-aware resource allocation
- Multi-asset class coverage expansion
- Partner onboarding and collaborative research workflows
Interested in our progress or want to discuss partnership opportunities? Get in touch — we welcome conversations at any stage.