Continual Audio Embeddings API · RFE v87 · March 2026

Add any new sound class
in one second.
No retraining. Ever.

RippleRank turns audio into searchable, classifiable, anomaly-detectable embeddings that learn new classes in real time — powered by pure wave physics. 92.22% crystal probe accuracy. 10/10 architectural milestones closed. Record once. Crystallize. Deploy forever.

92.22% crystal probe (no leakage) · 10/10 GAP closures · 730K parameters · One-shot: 64% mean accuracy · Edge-ready ONNX · WebSocket anomaly stream
No credit card  ·  1,000 requests free  ·  Self-host or cloud  ·  SOC2 in progress
One-shot demo

1) Click Record — make a sound (clap, word, tap) — auto-stops at 1.5s
2) RippleRank calls /add_crystal — no training, just wave physics
3) Click Test Recall → /embed returns real confidence + anomaly score

30-second integration
# Teach a new sound class — one shot, no training
curl -X POST https://api.ripplerank.dev/add_crystal \
  -H "Authorization: Bearer $EV_KEY" \
  -H "Content-Type: application/json" \
  -d '{"waveform": [...], "label": "coffee_machine"}'
# → {"formed": true, "crystal_id": 47, "confidence": 0.91}
import requests, soundfile as sf

wav, sr = sf.read("sound.wav")
r = requests.post(
    "https://api.ripplerank.dev/add_crystal",
    json={"waveform": wav.tolist(), "label": "coffee_machine"},
    headers={"Authorization": f"Bearer {EV_KEY}"},
).json()
# r → {"formed": True, "crystal_id": 47, "confidence": 0.91}
const res = await fetch("https://api.ripplerank.dev/add_crystal", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.EV_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ waveform, label: "coffee_machine" }),
});
const { formed, crystal_id, confidence } = await res.json();
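The calls above teach a class; step 3 of the demo reads it back via /embed. A minimal Python sketch of the request body, assuming the same "waveform" field and the 1.5 s auto-stop window from the demo copy (the helper name is ours):

```python
import numpy as np

def embed_payload(wav, sr=16000, clip_s=1.5):
    """Build the /embed request body, truncated to the demo's 1.5 s
    auto-stop window. Field name and truncation rule are assumptions
    drawn from the demo copy, not a documented spec."""
    wav = np.asarray(wav, dtype=float)[: int(sr * clip_s)]
    return {"waveform": wav.tolist()}

# Posting the payload (requires an API key):
#   requests.post("https://api.ripplerank.dev/embed",
#                 json=embed_payload(wav),
#                 headers={"Authorization": f"Bearer {EV_KEY}"})
# Per the demo, the JSON response carries the matched label,
# a real confidence, and an anomaly score.
```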
Featured on
Hacker News · r/MachineLearning · Show HN · Papers with Code · The Sequence
92.22%
Crystal probe accuracy
50/50 split · no leakage
10/10
GAP closures passed
Full architecture validated
730K
Parameters total
No convolutions or attention
1 pass
To learn any new class
Zero retraining required
0.983
Peak coherence C[Ψ]
Real audio vs. noise gap
Live demos

Three demos.
Real audio. Real physics.

Each demo shows a distinct capability — anomaly detection, one-shot learning, and benchmark proof. All powered by the same Resonance Field Engine.

Demo 01 — Factory anomaly
Detect failures from healthy sound only

Embeds a clean 440 Hz tone, then the same tone buried in noise. anomaly_score separates — clean stays near 0, degraded spikes toward 1. No labeled failure data needed.

Proven gap: clean ≈ 0.006 · degraded ≈ 0.641 · separation >0.5
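The demo's inputs are easy to reproduce locally. A sketch of the clean 440 Hz tone and its degraded twin, with an illustrative noise level (the demo's exact SNR is not stated):

```python
import numpy as np

def demo_signals(sr=16000, dur=1.0, f=440.0, noise_amp=0.8, seed=0):
    """Reproduce the demo's inputs: a clean 440 Hz sine tone and the
    same tone buried in Gaussian noise. noise_amp is an illustrative
    choice, not the demo's actual setting."""
    t = np.arange(int(sr * dur)) / sr
    clean = np.sin(2 * np.pi * f * t)
    rng = np.random.default_rng(seed)
    degraded = clean + noise_amp * rng.standard_normal(t.size)
    return clean, degraded

# Each signal would then be posted to /embed. Per the demo, the
# returned anomaly_score stays near 0 for `clean` and spikes
# toward 1 for `degraded`.
```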
Demo 02 — One-shot custom sound
Teach your coffee machine, door, or dog

Crystallizes a synthesised coffee-machine tone in one call, then immediately tests recall against a noisy version via /embed. Match score proves the crystal works. No training loop.

v87 one-shot: 64.3% mean accuracy across 5 trials on unseen keywords (GAP6 validated)
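The same flow in miniature: both request bodies for the teach-then-recall round trip, using an illustrative synthetic tone (field names follow the curl example; the frequency and noise level are assumptions):

```python
import numpy as np

def one_shot_requests(label="coffee_machine", sr=16000, f=1200.0,
                      noise=0.5, seed=1):
    """Sketch the one-shot demo's two request bodies: crystallize a
    clean synthetic tone, then recall a noisy copy of it. Tone
    frequency and noise level are illustrative, not the demo's."""
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * f * t)
    noisy = tone + noise * np.random.default_rng(seed).standard_normal(t.size)
    teach = {"waveform": tone.tolist(), "label": label}   # POST /add_crystal
    recall = {"waveform": noisy.tolist()}                 # POST /embed
    return teach, recall
```

The recall body carries no label: per the demo, the match score returned by /embed is what proves the crystal works.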
Demo 03 — Benchmark proof (v87)
92.22% crystal probe. Pure physics memory.

v87's crystal centroids — computed from the full-stack field using homodyne scoring — classify 10 keywords at 92.22% accuracy on a held-out 50/50 split with zero leakage. No linear head, no labels at inference. The crystals ARE the classifier. 10/10 GAP closures validated.

730K params · 0 convolutions · 0 attention heads · 0 classification head · 10/10 GAPs
Who it's for

Three groups that pay
for audio APIs today.

All of them deal with the same problem: every new sound means a new training run. RippleRank eliminates that.

Industrial & IoT
Factory & Edge Engineers
  • Detect machine failures from "healthy" sound only
  • No labeled failure data or retraining loops
  • Real-time anomaly scores via WebSocket (50ms)
  • ONNX export for offline edge devices
  • Perfect for remote sites and factory lines
AI Developers & Agents
App Builders & Integrators
  • Embed, compare, classify any sound with one HTTP call
  • Add new classes via /add_crystal — zero retrain
  • Export crystals as JSON for mobile or edge deploy
  • Use embeddings for search, deduplication, routing
  • Simple REST + WebSocket — no ML infra needed
Security & E-commerce
Product Auth & Fraud Teams
  • Capture "genuine" product sounds (zip, click, pour)
  • One crystal per SKU → instant counterfeit detection
  • No need to collect examples of every fake variant
  • Drop-in "sound check" step in returns processing
  • Coherence score = built-in confidence, no threshold tuning
Competitive landscape

Everyone else forgets
new sounds. We don't.

Every existing audio API charges per minute and requires retraining for new classes. RippleRank learns once, remembers forever, and charges a flat monthly rate.

Provider | Pricing | What they offer | The gap
Deepgram | $0.0077/min | Streaming STT + keyword detection | Must retrain for every new sound class
AssemblyAI | $0.0025–$0.15/hr | Transcription + speaker diarization | Slow custom vocab, no continual learning
OpenAI Whisper | ~$0.006/min | General-purpose transcription | No anomaly scores, no one-shot new classes
ElevenLabs | Credits (~$0.05/min) | Voice cloning + TTS generation | No classification or anomaly detection
RippleRank (v87) | $29–$499/mo flat | 92% crystal classification + one-shot + anomaly + multimodal + edge ONNX | One crystal. 10/10 GAPs. No retrain. Ever.
Pricing

Flat rate. No per-minute
surprises.

Start free, scale predictably. Every tier includes one-shot crystal learning and anomaly scores.

Free
$0
/ month forever
  • 100 clips / month
  • Watermarked embeddings
  • All core endpoints
  • Crystal export (JSON)
  • Community support
Growth
$149
/ month
  • 100,000 clips / month
  • Anomaly WebSocket stream
  • ONNX edge model export
  • Persistent field Ψ (session memory)
  • Multimodal encoding (audio + text)
  • Priority support
Enterprise
$499
/ month
  • Unlimited clips
  • On-premise deploy option
  • Full GAP-10 architecture access
  • Custom crystal banks per tenant
  • Self-regulating skip layers (cost savings)
  • SLA + dedicated support
Technical foundation

Why this works.

RippleRank is built on the Resonance Field Engine — a JEPA world model implemented as pure wave physics. No supervised labels in the training loop. No token embeddings. No classification heads. The math is the model.

P1 · Coherence as confidence
Intrinsic anomaly scores from phase physics

We compute phase coherence C[Ψ] = |⟨e^iΔθ⟩|² over the complex field. v87 reaches coherence 0.983 on real audio. Noise and anomalies collapse it — giving you an intrinsic confidence and anomaly score with no labels required. GAP3 (Temporal Horizon) and GAP4 (Reality Grounding) validated.
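Read literally, the statistic is a few lines of NumPy. A sketch assuming the average runs over successive phase differences of the field (the engine's exact averaging scheme is not public):

```python
import numpy as np

def coherence(psi):
    """C[Psi] = |<e^{i*dtheta}>|^2 over successive phase differences
    of a complex field. A direct reading of the formula as printed;
    the engine's actual averaging scheme is an assumption here."""
    dtheta = np.diff(np.angle(psi))
    return float(np.abs(np.exp(1j * dtheta).mean()) ** 2)

# A pure tone has phase-locked steps, so coherence sits near 1;
# noise scatters the phase steps, so coherence collapses toward 0,
# which is what makes it usable as an intrinsic anomaly score.
```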

P2 · Crystal memory — 92.22% proven
Crystals ARE the classifier. No linear head.

v87 milestone: centroid crystals from the full-stack field classify 10 keywords at 92.22% accuracy on a held-out 50/50 split with zero leakage. Recall is pure homodyne cosine scoring: cos(field, centroid). No gradients, no fine-tuning, no classification head. GAP5 (Compositional) and GAP6 (One-shot) validated.
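A minimal sketch of phase-invariant centroid recall, assuming "homodyne cosine" means the magnitude of the complex inner product over the norms (the exact v87 scoring function is not published):

```python
import numpy as np

def homodyne_score(field, centroid):
    """Phase-invariant cosine between complex vectors:
    |<centroid, field>| / (||field|| * ||centroid||).
    An illustrative reading of 'homodyne cosine scoring';
    the production scoring function may differ."""
    num = np.abs(np.vdot(centroid, field))
    return float(num / (np.linalg.norm(field) * np.linalg.norm(centroid) + 1e-12))

def classify(field, crystal_bank):
    """Nearest-centroid recall: no linear head, no gradients.
    The crystals themselves act as the classifier."""
    return max(crystal_bank, key=lambda label: homodyne_score(field, crystal_bank[label]))
```

Taking the magnitude of the complex inner product is what makes the score invariant to a global phase rotation of the field.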

P3 · Persistent field Ψ
The model remembers across inputs

A global complex field Ψ persists across all inputs, slowly integrating via EMA. v87 achieved 20,221 integrations with Ψ magnitude 0.999985 — near unit-circle stability. Combined with interference memory (20,223 writes, 0.995 decay), the engine builds cumulative context. GAP1 (Persistent State) and GAP2 (Interference Memory) validated.
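The EMA step itself is one line. A sketch assuming the quoted 0.995 decay also governs Ψ (the engine may use a different constant for the field than for interference memory):

```python
import numpy as np

def integrate(psi, field, decay=0.995):
    """One EMA integration step for the persistent field Psi:
    Psi <- decay * Psi + (1 - decay) * field.
    decay=0.995 echoes the quoted interference-memory decay; whether
    the same constant governs Psi itself is an assumption."""
    return decay * psi + (1.0 - decay) * field

# Feeding the same unit-magnitude field repeatedly, Psi converges to
# it geometrically, which is consistent with the reported near
# unit-circle stability after 20K+ integrations.
```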

P4 · Self-regulating architecture
Coherence-gated skip layers + energy efficiency

High-coherence inputs skip deeper layers (skip rates up to 70% at layer 3), reducing compute while maintaining accuracy. Exploration gain differs between low and high coherence states — the engine self-regulates its processing depth. GAP8 (Energy Efficiency) and GAP9 (Self-Regulation) validated.
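A sketch of the gating idea: stop descending once the field is already coherent. The threshold and the per-layer rule here are illustrative stand-ins, not the engine's actual gating logic:

```python
def forward(field, layers, coherence_fn, threshold=0.9):
    """Coherence-gated depth: run layers until the field's coherence
    clears a threshold, then skip the rest. threshold=0.9 and the
    check-before-each-layer rule are illustrative assumptions."""
    depth = 0
    for layer in layers:
        if coherence_fn(field) >= threshold:
            break  # high-coherence input: stop early, save compute
        field = layer(field)
        depth += 1
    return field, depth
```

Because the gate is a function of the input's own coherence, easy inputs naturally use fewer layers, which is the self-regulation claim in compute terms.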

P5 · Multimodal wave encoding
Audio, vision, and text as complex fields

All modalities produce complex fields with the same D=128 dimension. Audio via STFT wave encoding, vision via Gabor filters, text via phase encoding. Unified field representation enables cross-modal crystal matching. GAP7 (Multimodal) validated.
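As a toy illustration of text entering the shared D=128 complex space, here is one way phase encoding could look. The slot and phase assignment below are our invention for illustration, not the engine's scheme:

```python
import numpy as np

def phase_encode_text(text, dim=128):
    """Map text to a D=128 complex field by accumulating one unit
    phasor per character. The character-to-slot hash and the
    position-to-phase rule are illustrative assumptions only."""
    field = np.zeros(dim, dtype=complex)
    for pos, ch in enumerate(text):
        idx = (ord(ch) * 31 + pos) % dim                # character/position slot
        field[idx] += np.exp(2j * np.pi * (pos % dim) / dim)  # phase from position
    n = np.linalg.norm(field)
    return field / n if n else field
```

Whatever the real encoding, the point of the shared dimension is that a field like this can be scored against an audio-derived crystal with the same inner product.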

P6 · Unified action system
All 8 action ports active from field state

The engine exposes 8 action readout ports that produce non-zero signals directly from the field state — enabling downstream decision-making, routing, and control without additional learned heads. GAP10 (Unified System) validated. All 10/10 GAP closures confirmed.

"92.22% crystal probe accuracy on 10-class Speech Commands — with zero leakage, no linear head, and pure homodyne cosine scoring — proves that wave-physics crystals work as a classifier. 10/10 GAP closures. 730K parameters. This is a publishable breakthrough in physics-based continual learning."
RFE v87 validated results — March 25, 2026
Coming soon

Products in development.

The v87 breakthrough validates the core architecture. These products build directly on proven results — same physics engine, new capabilities.

In development · RFE v89
Wave-Physics Text Generation

A language model with zero embedding tables and zero attention heads. Tokens are waves. Context flows through O(n log n) FFT causal resonance. Crystal readout replaces the linear output head — next-token prediction via cosine similarity with learned centroids. Pure physics from input to output.

BPE tokenizer · 10 layers · 16 modes · 1.4M params · TinyStories benchmark
Validated · GAP7
Cross-Modal Crystal Matching

Audio, vision (Gabor filters), and text (phase encoding) all produce complex fields with the same D=128 dimension. A crystal formed from audio can match against a text query — and vice versa. Enables voice-to-text search, audio-visual correlation, and multimodal anomaly detection without separate models per modality.

GAP7 validated · Unified field representation · Same crystal bank across modalities
In development · Whitepaper
Hierarchical Resonance Cascades

Scale the engine from 730K to billions of parameters without KV caches or quadratic attention. Wave compression reduces O(n) memory to O(log n) crystallized summaries. Cascaded resonance fields form hierarchies — local patterns crystallize first, then feed into higher-level fields for abstract reasoning.

O(n log n) processing · No KV cache · Wave compression · Crystallization memory
Planned · Q2 2026
Real-Time Factory Monitoring SaaS

Plug-and-play anomaly detection for manufacturing lines. USB microphone + edge device runs the ONNX model locally. Crystallize "healthy" machine sounds on day one. Coherence-based alerting when degradation begins — before the machine fails. Self-regulating skip layers (GAP8/GAP9) cut edge compute by up to 70%.

Planned · Q3 2026
Persistent Audio Agent Memory

Give AI agents long-term audio memory. Persistent field Ψ (GAP1, 20K+ integrations validated) accumulates context across conversations. Interference memory (GAP2) enables pattern recall. Crystal bank grows with every interaction — the agent remembers voices, sounds, and context without retraining.

"

We replaced a 14M-parameter classifier and a weekly retraining pipeline with one RippleRank endpoint. New defect sounds get added in seconds — by the operators, not the ML team.

Design partner  ·  Industrial monitoring  ·  March 2026
Frequently asked

Questions buyers ask
before signing.

If your blocker isn't here, email [email protected] — we reply within a day.

How does "no retraining" actually work?
A new sound becomes a 128-dim complex field via one forward pass through the wave-physics encoder. We store it as a "crystal" and match against it with a phase-invariant inner product. No gradients, no backprop, no fine-tuning — just physics.
What's the inference latency?
<50ms per second of audio on a single CPU core. ONNX edge build runs on a Raspberry Pi 4 in real time. No GPU required for inference or for adding new classes.
Can I self-host?
Yes. The Growth tier ships a Docker image with the full v87 model, FastAPI server, and ONNX edge build. Air-gapped deployments are supported on the Enterprise tier with on-prem licensing.
How is this different from Picovoice / Nyckel?
Picovoice requires you to build keyword models in their console. Nyckel requires labeled training examples and a training run. RippleRank learns a new class from a single example, in production, with no console step and no training job.
What about my training data — do you keep it?
No. We never store your raw audio. Crystals are 128-dim complex vectors with no recoverable signal. EU data residency available on Enterprise. SOC2 Type I in progress, Type II planned for Q4 2026.
Does it work for speech / voice / music / non-speech?
All of the above. The architecture is modality-agnostic at the field level — speech commands, machine sounds, animal vocalizations, music phrases all crystallize the same way. 92.22% on Speech Commands; design partners use it for industrial, ecological, and medical audio.
What happens if a crystal is "wrong"?
Delete it with one DELETE call, or shadow it with a corrected example. The bank holds 100 active crystals by default, expandable to 1024 on Growth. EMA refinement smooths drift across repeat additions.
Is there a free tier?
Yes — 1,000 requests/month, no credit card. The Growth tier ($149/mo) covers most production projects. Self-hosting is free for non-commercial use under the source-available license.