Live benchmarks · Updated hourly

Live benchmarks.
Full dataset open-source.

Most KYC vendors hide behind "contact sales for accuracy." We publish ours hourly — on 1,000 labeled synthetic cases anyone can download and verify.

Last run: Pending first run · 1,000 synthetic cases · 30 jurisdictions
FlowAI — current run
Overall accuracy
OFAC catch rate
False positive rate
p50 latency
p95 latency
Cost per 1K decisions (Growth): $—
4-vendor comparison

Metric | FlowAI ★ Best | Sardine | Unit21 | Persona
Overall accuracy | Pending | Not disclosed | Not disclosed | Not disclosed
OFAC catch rate | Pending | Not disclosed | Not disclosed | Not disclosed
False positive rate | Pending | Not disclosed | Not disclosed | Not disclosed
p50 latency | Pending | 3.0 s | 5.0 s | 2.5 s
p95 latency | Pending | 8.0 s | 15.0 s | 6.0 s
$/1K decisions at 50K/mo | $— | ~$1,450 | ~$2,000 | ~$300

Competitor latency is taken from published SLAs. Sardine, Unit21, and Persona do not publicly disclose accuracy. Cost is estimated from public pricing or contact-sales minimums. Full methodology →

FlowAI per-category accuracy
Clean Approvals (600 cases): Low-risk applicants that should APPROVE
OFAC Sanctions (50 cases): Matches against SDN / OFAC blocked lists
PEP Matches (50 cases): Politically exposed persons that should ESCALATE
Document Fraud (100 cases): Altered or counterfeit identity docs
Velocity / Structuring (50 cases): Structuring and velocity-based AML signals
Geographic Risk (100 cases): FATF grey-list and high-risk jurisdictions
Edge Cases (50 cases): Ambiguous multi-signal cases; the hardest category

Download the dataset

1,000 labeled synthetic cases. CC-BY-4.0. Use it to test any vendor, retrain a model, or verify our numbers yourself. No account required.
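One quick sanity check after downloading is to confirm that the file's category counts match the breakdown published on this page. The sketch below assumes the CSV has a column named `category` whose values match the category names shown above; both the column name and the exact value spellings are assumptions, so adjust them to whatever the real file uses.

```python
import csv
import io
from collections import Counter

# Per-category case counts as published on this page (they sum to 1,000).
EXPECTED_COUNTS = {
    "Clean Approvals": 600,
    "OFAC Sanctions": 50,
    "PEP Matches": 50,
    "Document Fraud": 100,
    "Velocity / Structuring": 50,
    "Geographic Risk": 100,
    "Edge Cases": 50,
}


def count_categories(csv_file, category_column="category"):
    """Count rows per category in an open CSV file.

    The default column name is an assumption, not the documented schema.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[category_column] for row in reader)


def count_mismatches(actual, expected=EXPECTED_COUNTS):
    """Return {category: (expected, actual)} for every count that differs."""
    names = set(expected) | set(actual)
    return {
        name: (expected.get(name, 0), actual.get(name, 0))
        for name in names
        if expected.get(name, 0) != actual.get(name, 0)
    }


# Tiny in-memory demo; a real check would pass the downloaded file instead.
demo = io.StringIO("case_id,category\n1,Edge Cases\n2,Edge Cases\n")
demo_mismatches = count_mismatches(count_categories(demo))
```

An empty result from `count_mismatches` means the file matches the published breakdown; anything else lists the categories that differ.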


Or grab it directly (no gate):

Direct CSV · Direct JSON
Run it yourself
Quickstart docs · Node & Python SDKs · Get sandbox key
Methodology

How this works

Every hour, FlowAI's triage agent runs against all 1,000 cases in the synthetic dataset. Each case has a ground-truth label (APPROVE / ESCALATE / REJECT) that was assigned based on documented compliance criteria, not by the model being tested. The agent's output is compared to that label, and accuracy is computed per category.
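The comparison step can be sketched in a few lines. This is an illustration, not FlowAI's scoring code: it assumes each case is a dict with `category`, `label` (ground truth), and `prediction` (agent output) keys, all of which are hypothetical field names.

```python
from collections import defaultdict


def score(cases):
    """Compute overall and per-category accuracy.

    `cases` is an iterable of dicts with 'category', 'label', and
    'prediction' keys. These field names are assumptions for illustration.
    """
    tallies = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for case in cases:
        tally = tallies[case["category"]]
        tally[0] += int(case["prediction"] == case["label"])
        tally[1] += 1

    per_category = {
        category: correct / total
        for category, (correct, total) in tallies.items()
    }
    total = sum(t for _, t in tallies.values())
    correct = sum(c for c, _ in tallies.values())
    overall = correct / total if total else 0.0
    return {"overall": overall, "per_category": per_category}
```

A prediction counts as correct only on an exact match with the label, so an ESCALATE on a case labeled REJECT is scored as a miss. Whether the published numbers treat near-misses that way is not stated on this page.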

The dataset

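1,000 synthetic KYC cases built for breadth and realism, split across the seven categories listed under "FlowAI per-category accuracy."

Cases span 30 jurisdictions and 5 ID types. The dataset is fully reproducible (generated from a deterministic seed). Download to inspect →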
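To show what a deterministic seed buys you, here is a minimal sketch of seeded generation. It is not the generator behind this dataset: the jurisdiction codes, ID type names, field names, and the `generate_cases` function are all placeholders, and only the counts (30 jurisdictions, 5 ID types, 1,000 cases) come from this page.

```python
import random

# Placeholder values; only the list sizes (30 and 5) come from the page.
JURISDICTIONS = [f"J{i:02d}" for i in range(30)]
ID_TYPES = [f"id_type_{i}" for i in range(5)]


def generate_cases(seed, n=1000):
    """Generate n synthetic cases from a fixed seed (hypothetical schema)."""
    # A private Random instance keeps the output independent of any other
    # code that touches the global random state.
    rng = random.Random(seed)
    return [
        {
            "case_id": i,
            "jurisdiction": rng.choice(JURISDICTIONS),
            "id_type": rng.choice(ID_TYPES),
        }
        for i in range(n)
    ]
```

Calling `generate_cases` twice with the same seed returns identical lists, which is what lets a third party regenerate the data and compare it against the download.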

Where competitor numbers come from

⚠️ Disclosure: Sardine, Unit21, and Persona do not publicly disclose accuracy numbers. We have not tested their APIs against this dataset: we don't have access, and claiming otherwise would be dishonest. The competitor figures in the table are latency from their published SLA documentation and cost estimated from their public pricing pages or published minimums. Sources:

Why publish this

Most KYC vendors hide accuracy numbers behind an NDA. We think that's backwards. Compliance teams make high-stakes decisions with these tools, and they deserve verifiable numbers. Publishing ours, along with the dataset to reproduce them, is how we earn trust. If our numbers are wrong, we want to know.