Continual Audio Embeddings API · RFE v87 · March 2026

Add any new sound class
in one second.
No retraining. Ever.

RippleRank turns audio into searchable, classifiable, anomaly-detectable embeddings that learn new classes in real time — powered by pure wave physics. 92.22% crystal probe accuracy. 10/10 architectural milestones closed. Record once. Crystallize. Deploy forever.

92.22% crystal probe · no leakage 10/10 GAP closures 730K parameters One-shot 64% mean accuracy Edge-ready ONNX WebSocket anomaly stream

No credit card · 1,000 requests free · Self-host or cloud · SOC2 in progress

One-shot demo

1) Click Record — make a sound (clap, word, tap) — auto-stops at 1.5s
2) RippleRank calls /add_crystal — no training, just wave physics
3) Click Test Recall → /embed returns real confidence + anomaly score

C[Ψ]

Crystal Bank — formed slots

awaiting voice input...

Status: idle · no crystal yet

[ripple-rank] waiting for audio… [api] endpoint → https://api.ripplerank.dev [crystal] bank loaded 100/128 — teach a sound to begin

30-second integration

# Teach a new sound class — one shot, no training curl -X POST https://api.ripplerank.dev/add_crystal \ -H "Authorization: Bearer $EV_KEY" \ -H "Content-Type: application/json" \ -d '{"waveform": [...], "label": "coffee_machine"}' # → {"formed": true, "crystal_id": 47, "confidence": 0.91}

import requests, soundfile as sf wav, sr = sf.read("sound.wav") r = requests.post( "https://api.ripplerank.dev/add_crystal", json={"waveform": wav.tolist(), "label": "coffee_machine"}, headers={"Authorization": f"Bearer {EV_KEY}"}, ).json() # r → {"formed": True, "crystal_id": 47, "confidence": 0.91}

const res = await fetch("https://api.ripplerank.dev/add_crystal", { method: "POST", headers: { "Authorization": `Bearer ${process.env.EV_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ waveform, label: "coffee_machine" }), }); const { formed, crystal_id, confidence } = await res.json();

Featured on

Hacker News r/MachineLearning Show HN Papers with Code The Sequence

Live demos

Three demos.
Real audio. Real physics.

Each demo shows a distinct capability — anomaly detection, one-shot learning, and benchmark proof. All powered by the same Resonance Field Engine.

Demo 01 — Factory anomaly

Detect failures from healthy sound only

Embeds a clean 440 Hz tone, then the same tone buried in noise. anomaly_score separates — clean stays near 0, degraded spikes toward 1. No labeled failure data needed.

Proven gap: clean ≈ 0.006 · degraded ≈ 0.641 · separation >0.5

Demo 02 — One-shot custom sound

Teach your coffee machine, door, or dog

Crystallizes a synthesised coffee-machine tone in one call, then immediately tests recall against a noisy version via /embed. Match score proves the crystal works. No training loop.

v87 one-shot: 64.3% mean accuracy across 5 trials on unseen keywords (GAP6 validated)

Demo 03 — Benchmark proof (v87)

92.22% crystal probe. Pure physics memory.

v87's crystal centroids — computed from the full-stack field using homodyne scoring — classify 10 keywords at 92.22% accuracy on a held-out 50/50 split with zero leakage. No linear head, no labels at inference. The crystals ARE the classifier. 10/10 GAP closures validated.

730K params · 0 convolutions · 0 attention heads · 0 classification head · 10/10 GAPs

Who it's for

Three groups that pay
for audio APIs today.

All of them deal with the same problem: every new sound means a new training run. RippleRank eliminates that.

Industrial & IoT

Factory & Edge Engineers

Detect machine failures from "healthy" sound only
No labeled failure data or retraining loops
Real-time anomaly scores via WebSocket (50ms)
ONNX export for offline edge devices
Perfect for remote sites and factory lines

AI Developers & Agents

App Builders & Integrators

Embed, compare, classify any sound with one HTTP call
Add new classes via /add_crystal — zero retrain
Export crystals as JSON for mobile or edge deploy
Use embeddings for search, deduplication, routing
Simple REST + WebSocket — no ML infra needed

Security & E-commerce

Product Auth & Fraud Teams

Capture "genuine" product sounds (zip, click, pour)
One crystal per SKU → instant counterfeit detection
No need to collect examples of every fake variant
Drop-in "sound check" step in returns processing
Coherence score = built-in confidence, no threshold tuning

Competitive landscape

Everyone else forgets
new sounds. We don't.

Every existing audio API charges per minute and requires retraining for new classes. RippleRank learns once, remembers forever, and charges a flat monthly rate.

Provider	Pricing	What they offer	The gap
Deepgram	$0.0077/min	Streaming STT + keyword detection	Must retrain for every new sound class
AssemblyAI	$0.0025–$0.15/hr	Transcription + speaker diarization	Slow custom vocab, no continual learning
OpenAI Whisper	~$0.006/min	General-purpose transcription	No anomaly scores, no one-shot new classes
ElevenLabs	Credits (~$0.05/min)	Voice cloning + TTS generation	No classification or anomaly detection
RippleRank (v87)	$29–$499/mo flat	92% crystal classification + one-shot + anomaly + multimodal + edge ONNX	One crystal. 10/10 GAPs. No retrain. Ever.

Pricing

Flat rate. No per-minute
surprises.

Start free, scale predictably. Every tier includes one-shot crystal learning and anomaly scores.

Free

/ month forever

100 clips / month
Watermarked embeddings
All core endpoints
Crystal export (JSON)
Community support

Starter

$29

/ month

10,000 clips / month
All endpoints + crystal export
92%+ crystal classification
Coherence-based anomaly scores
Email support
$0.001/clip overage

Growth

$149

/ month

100,000 clips / month
Anomaly WebSocket stream
ONNX edge model export
Persistent field Ψ (session memory)
Multimodal encoding (audio + text)
Priority support

Enterprise

$499

/ month

Unlimited clips
On-premise deploy option
Full GAP-10 architecture access
Custom crystal banks per tenant
Self-regulating skip layers (cost savings)
SLA + dedicated support

Technical foundation

Why this works.

RippleRank is built on the Resonance Field Engine — a JEPA world model implemented as pure wave physics. No supervised labels in the training loop. No token embeddings. No classification heads. The math is the model.

P1 · Coherence as confidence

Intrinsic anomaly scores from phase physics

We compute phase coherence C[Ψ] = |⟨e^iΔθ⟩|² over the complex field. v87 reaches coherence 0.983 on real audio. Noise and anomalies collapse it — giving you an intrinsic confidence and anomaly score with no labels required. GAP3 (Temporal Horizon) and GAP4 (Reality Grounding) validated.

P2 · Crystal memory — 92.22% proven

Crystals ARE the classifier. No linear head.

v87 milestone: centroid crystals from the full-stack field classify 10 keywords at 92.22% accuracy on a held-out 50/50 split with zero leakage. Recall is pure homodyne cosine scoring: cos(field, centroid). No gradients, no fine-tuning, no classification head. GAP5 (Compositional) and GAP6 (One-shot) validated.

P3 · Persistent field Ψ

The model remembers across inputs

A global complex field Ψ persists across all inputs, slowly integrating via EMA. v87 achieved 20,221 integrations with Ψ magnitude 0.999985 — near unit-circle stability. Combined with interference memory (20,223 writes, 0.995 decay), the engine builds cumulative context. GAP1 (Persistent State) and GAP2 (Interference Memory) validated.

P4 · Self-regulating architecture

Coherence-gated skip layers + energy efficiency

High-coherence inputs skip deeper layers (skip rates up to 70% at layer 3), reducing compute while maintaining accuracy. Exploration gain differs between low and high coherence states — the engine self-regulates its processing depth. GAP8 (Energy Efficiency) and GAP9 (Self-Regulation) validated.

P5 · Multimodal wave encoding

Audio, vision, and text as complex fields

All modalities produce complex fields with the same D=128 dimension. Audio via STFT wave encoding, vision via Gabor filters, text via phase encoding. Unified field representation enables cross-modal crystal matching. GAP7 (Multimodal) validated.

P6 · Unified action system

All 8 action ports active from field state

The engine exposes 8 action readout ports that produce non-zero signals directly from the field state — enabling downstream decision-making, routing, and control without additional learned heads. GAP10 (Unified System) validated. All 10/10 GAP closures confirmed.

"92.22% crystal probe accuracy on 10-class Speech Commands — with zero leakage, no linear head, and pure homodyne cosine scoring — proves that wave-physics crystals work as a classifier. 10/10 GAP closures. 730K parameters. This is a publishable breakthrough in physics-based continual learning."

RFE v87 validated results — March 25, 2026

Coming soon

Products in development.

The v87 breakthrough validates the core architecture. These products build directly on proven results — same physics engine, new capabilities.

In development · RFE v89

Wave-Physics Text Generation

A language model with zero embedding tables and zero attention heads. Tokens are waves. Context flows through O(n log n) FFT causal resonance. Crystal readout replaces the linear output head — next-token prediction via cosine similarity with learned centroids. Pure physics from input to output.

BPE tokenizer · 10 layers · 16 modes · 1.4M params · TinyStories benchmark

Validated · GAP7

Cross-Modal Crystal Matching

Audio, vision (Gabor filters), and text (phase encoding) all produce complex fields with the same D=128 dimension. A crystal formed from audio can match against a text query — and vice versa. Enables voice-to-text search, audio-visual correlation, and multimodal anomaly detection without separate models per modality.

GAP7 validated · Unified field representation · Same crystal bank across modalities

In development · Whitepaper

Hierarchical Resonance Cascades

Scale the engine from 730K to billions of parameters without KV caches or quadratic attention. Wave compression reduces O(n) memory to O(log n) crystallized summaries. Cascaded resonance fields form hierarchies — local patterns crystallize first, then feed into higher-level fields for abstract reasoning.

O(n log n) processing · No KV cache · Wave compression · Crystallization memory

Planned · Q2 2026

Real-Time Factory Monitoring SaaS

Plug-and-play anomaly detection for manufacturing lines. USB microphone + edge device runs the ONNX model locally. Crystallize "healthy" machine sounds on day one. Coherence-based alerting when degradation begins — before the machine fails. Self-regulating skip layers (GAP8/GAP9) cut edge compute by up to 70%.

Planned · Q3 2026

Persistent Audio Agent Memory

Give AI agents long-term audio memory. Persistent field Ψ (GAP1, 20K+ integrations validated) accumulates context across conversations. Interference memory (GAP2) enables pattern recall. Crystal bank grows with every interaction — the agent remembers voices, sounds, and context without retraining.

Add any new sound class
in one second.
No retraining. Ever.

Three demos.
Real audio. Real physics.

Three groups that pay
for audio APIs today.

Everyone else forgets
new sounds. We don't.

Flat rate. No per-minute
surprises.

Why this works.

Products in development.

Questions buyers ask
before signing.

Product

Developers

Company

Trust

Add any new sound classin one second.No retraining. Ever.

Three demos.Real audio. Real physics.

Three groups that payfor audio APIs today.

Everyone else forgetsnew sounds. We don't.

Flat rate. No per-minutesurprises.

Why this works.

Products in development.

Questions buyers askbefore signing.

Product

Developers

Company

Trust

Add any new sound class
in one second.
No retraining. Ever.

Three demos.
Real audio. Real physics.

Three groups that pay
for audio APIs today.

Everyone else forgets
new sounds. We don't.

Flat rate. No per-minute
surprises.

Questions buyers ask
before signing.