LIVE · phase-0 evidence chain · Singapore (Neon) + APAC (R2)

The AI benchmark you can audit yourself.

Every score on GetAI is anchored to a public Merkle root. Pull any bundle, recompute its SHA-256, and walk the inclusion proof up to the daily root. No GetAI infrastructure needed. Trust nothing. Verify everything.
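The "walk the proof to the daily root" step can be checked with the Python standard library alone. This is a minimal sketch that assumes the daily tree uses RFC 6962 hashing (a 0x00 prefix for leaves, 0x01 for interior nodes), as the "RFC 6962-style tree" in the pipeline suggests, and it follows the inclusion-proof verification procedure published in RFC 9162 (the successor to RFC 6962). The function names and the choice of leaf data (for example, a bundle's SHA-256 digest) are hypothetical, not GetAI's published API.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || data)."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior-node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf_data: bytes, index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    """Recompute the root from one leaf plus its audit path.

    Returns True only if the recomputed root equals `root` byte for byte.
    """
    if index < 0 or index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf_data)
    for p in path:
        if sn == 0:
            return False  # proof is longer than the tree is tall
        if fn & 1 or fn == sn:
            # The sibling sits on the left of the running hash.
            r = node_hash(p, r)
            # Skip levels where the node was promoted without a right sibling.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            # The sibling sits on the right of the running hash.
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Two-leaf example: the proof for leaf 0 is just leaf 1's hash.
a, b = b"bundle-a-digest", b"bundle-b-digest"
root = node_hash(leaf_hash(a), leaf_hash(b))
print(verify_inclusion(a, 0, 2, [leaf_hash(b)], root))  # True
```

A verifier only needs the bundle bytes (or digest), its leaf index, the day's tree size, the sibling hashes, and the published root; a single flipped byte anywhere changes the recomputed root.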

SHA-256 manifest · Daily Merkle root · Per-workspace RLS · Edge runtime verify
Runs · Evidence bundles · Artifact blobs · Postgres tables · RLS policies · Infra spend / mo: $0
How it works

One pipeline. Five proofs.

Every model invocation that lands on the leaderboard travels the same path. Each step is independently verifiable; we publish the cryptographic glue between them so you don't have to take our word for anything.

  1. Sandboxed call

    Deterministic parameters, captured headers, header-hash baseline.

  2. Predicate eval

    8-axis scorers + 3-judge ensemble (Phase 1).

  3. Evidence bundle

    Canonical orjson serialization, SHA-256 manifest, signatures.

  4. Merkle anchor

    Daily root published at 00:00 UTC, RFC 6962-style tree.

  5. Public verify

    CLI, edge function, or any third party: all reach the same answer.

Why GetAI

Built for procurement-grade decisions.

Most AI benchmarks publish a number. GetAI publishes the number, the prompt, the response bytes, the judge verdicts, the cost snapshot, and a cryptographic proof you can replay six months from now.

Tenant-private eval

Distill your support tickets into a private benchmark pack in 48 hours. NDA-bound, RLS-isolated, never on the public board.

Silent update probe

A 2-of-3 fusion of signals (header hash, fingerprint cosine similarity, vendor notes) catches model swaps that your dashboard misses for weeks.
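The 2-of-3 rule can be sketched as a simple vote. This is a minimal sketch under stated assumptions: the observation fields, the cosine-similarity floor of 0.95, and the idea that "vendor notes" reduce to a single changed/unchanged flag are all hypothetical choices for illustration, not GetAI's actual detector.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeObservation:
    header_hash: str                 # hash of captured response headers (hypothetical field)
    fingerprint: list[float]         # behavioural fingerprint vector (hypothetical field)
    vendor_notes_flag_change: bool   # vendor notes mention a model change (hypothetical field)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def silent_update_suspected(baseline: ProbeObservation,
                            current: ProbeObservation,
                            cosine_floor: float = 0.95) -> bool:
    """Flag a probable model swap when at least two of three signals fire."""
    signals = [
        current.header_hash != baseline.header_hash,
        cosine(baseline.fingerprint, current.fingerprint) < cosine_floor,
        current.vendor_notes_flag_change,
    ]
    return sum(signals) >= 2
```

Requiring two independent signals trades a little sensitivity for far fewer false alarms: a header change alone (say, a CDN rotation) does not trip the probe, but a header change plus a fingerprint drift does.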

Verifiable evidence chain

SHA-256 content-addressable storage + daily Merkle root + envelope encryption + GDPR tombstones. Every byte accountable.
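Content addressing and a canonical manifest fit in a few lines. This minimal sketch uses the standard-library `json` module with sorted keys and compact separators as a stand-in for the canonical orjson serialization named in the pipeline, so its bytes are not guaranteed to match GetAI's manifests; the store, the `erase` tombstone behaviour, and all names are hypothetical, and envelope encryption is omitted.

```python
import hashlib
import json
from typing import Optional


def canonical_manifest(files: dict[str, bytes]) -> bytes:
    """Map each file name to its SHA-256 hex digest and serialize deterministically.

    Sorted keys and fixed separators make the output bytes (and therefore the
    manifest's own hash) independent of dict insertion order.
    """
    entries = {name: hashlib.sha256(blob).hexdigest() for name, blob in files.items()}
    return json.dumps(entries, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


class ContentStore:
    """In-memory content-addressable store: each key is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._tombstones: set[str] = set()

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data  # identical bytes always land on the same key
        return key

    def get(self, key: str) -> Optional[bytes]:
        data = self._blobs.get(key)
        if data is None:
            return None  # never stored, or erased
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"blob {key} failed its integrity check")
        return data

    def erase(self, key: str) -> None:
        """Drop the bytes but keep a tombstone recording that the key existed."""
        self._blobs.pop(key, None)
        self._tombstones.add(key)

    def is_tombstoned(self, key: str) -> bool:
        return key in self._tombstones
```

Because the key is the hash, the digest already committed to a manifest or Merkle leaf stays valid after erasure; only the content is gone, and the tombstone records that the deletion was deliberate.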

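Traditional Chinese (繁中) vertical packs

Real Taiwan workloads, not translated MMLU: invoice (發票) OCR · National Health Insurance and Labor Insurance official documents (健保勞保公文) · customer-service claims handling (客服理賠) · regulatory compliance (法遵).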

Live leaderboard

Phase 0 baseline. Full cohort joins Phase 1.

One model is currently being measured against the tw-coding-daily-v1 smoke pack. The queued rows below ship in Phase 1 (Q3 2026) under the D8 three-judge ensemble. Phase 0 scores are provisional and not eligible for public ranking until then.

Columns: # · Vendor · Model · Score · Bundles · Last seen · Status
Launch-Gate 12.7

14 consecutive days of Merkle roots.

One of the hard pre-GA gates: for 14 consecutive days, the daily Merkle root must be published and resolvable. Each green cell marks a day with at least one anchored bundle.
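The gate reduces to counting an unbroken run of days. This minimal sketch assumes the input is the set of UTC dates that have a published, resolvable root with at least one anchored bundle, and that the run is counted backwards from the most recent day that should have a root; the function names and the `required=14` parameter are hypothetical.

```python
from datetime import date, timedelta


def trailing_streak(anchored_days: set[date], latest: date) -> int:
    """Count consecutive UTC days, ending at `latest`, that have an anchored root."""
    streak = 0
    day = latest
    while day in anchored_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


def launch_gate_passes(anchored_days: set[date], latest: date, required: int = 14) -> bool:
    return trailing_streak(anchored_days, latest) >= required


def grid(anchored_days: set[date], latest: date, n: int = 14) -> list[bool]:
    """One cell per day for the trailing window, oldest first (True = anchored)."""
    return [(latest - timedelta(days=i)) in anchored_days for i in range(n - 1, -1, -1)]
```

A single missing day resets the streak, which is why the gate is a strong signal that the anchoring job runs unattended rather than only when someone remembers to trigger it.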

Trailing 14 days · UTC
Legend: day with anchored bundle · no bundle
Evidence stream

Every bundle, downloadable, replay-verified.

The 10 most recent bundles in the chain. Click any row to watch the full integrity check run live at the edge: Cloudflare fetches the ZIP from R2, recomputes its SHA-256, and reports verified, tampered, or missing.
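The same three-way verdict can be reproduced offline. This is not the Cloudflare edge code; it is a minimal Python sketch assuming you already hold the expected SHA-256 hex digest for a bundle and can fetch its ZIP as a stream of byte chunks (or get nothing back if the object is absent). The function names are hypothetical.

```python
import hashlib
from typing import Iterable, Optional


def sha256_stream(chunks: Iterable[bytes]) -> str:
    """Hash a download incrementally so large ZIPs never sit fully in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def classify_bundle(fetched: Optional[Iterable[bytes]], expected_sha256_hex: str) -> str:
    """Map a fetch result to one of: 'verified', 'tampered', 'missing'."""
    if fetched is None:
        return "missing"  # object not found in storage
    actual = sha256_stream(fetched)
    return "verified" if actual == expected_sha256_hex.lower() else "tampered"
```

Streaming the hash means the verdict depends only on the bytes actually served, not on any metadata the storage layer reports about them.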