Citation Certification

Built for the post-Mercy v. Mankind bar.

Every citation NyayLens shows you has passed a six-tier verification pipeline against an 18.8M-judgment Indian corpus — 25 High Courts + the Supreme Court of India. Every miss is flagged before you read the answer — never silently dropped.

18,824,596

High Court judgments

All 25 Indian HCs, 1950 – today

39,158

Supreme Court judgments

1950 – today, daily refresh

165k+

Parallel-citation crosswalk

AIR · SCC · SCR · INSC · JT · SCALE

1,188

Citator treatment verdicts

overruled · distinguished · doubted · partially overruled · good law

6,315

Judge profiles

Designation · parent HC · tenure · 28 historical CJIs

6 tiers

Verification pipeline

Per-tier breakdown below

The verification stack

Six tiers run on every /research call. Each tier is recorded in citation_audit so a future inquiry can replay every decision the system made.

Tier 1a
Strict-norm exact match
NFKC unicode normalisation + lowercased + dot-stripped + bracket-unified citation string compared byte-for-byte against the citation_norm column.
Triggered: Every claimed citation, regardless of confidence.
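The normalisation steps named above can be sketched in a few lines of Python. This is only an illustration: the helper name normalise_citation, the exact bracket mapping, and the whitespace collapsing are assumptions, since the text does not spell out the precise rules behind citation_norm.

```python
import re
import unicodedata

# Hypothetical bracket mapping: square and curly brackets become round ones.
_BRACKETS = str.maketrans({"[": "(", "]": ")", "{": "(", "}": ")"})


def normalise_citation(raw: str) -> str:
    """Illustrative strict-norm form; not the production implementation."""
    text = unicodedata.normalize("NFKC", raw)  # fold compatibility characters
    text = text.lower()                        # lowercase
    text = text.replace(".", "")               # dot-strip: "s.c.c." -> "scc"
    text = text.translate(_BRACKETS)           # bracket-unify
    # Assumption: runs of whitespace are collapsed and the ends trimmed.
    return re.sub(r"\s+", " ", text).strip()
```

Under these assumptions, "(2008) 13 S.C.C. 705" and "(2008) 13 SCC 705" reduce to the same string, so a byte-for-byte comparison succeeds.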
Tier 1b
Reporter-alt match
Same canonical form matched against the citation_alt_norm GIN-indexed array (covers AIR / SCC / SCR / INSC / etc. alternates of the same judgment).
Triggered: When strict-norm misses.
Tier 2
pg_trgm fuzzy on citation_norm
pg_trgm similarity over the normalised citation form with threshold 0.85. Typos in reporter format ("[2008] 13 SCC" vs "(2008) 13 SCC") still match.
Triggered: When Tier 1 misses and the verifier is allowed fuzzy fallback.
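For intuition about what a 0.85 trigram threshold means, here is a pure-Python approximation of how pg_trgm scores two strings. It follows the documented pg_trgm convention (words padded with two leading spaces and one trailing space, similarity as shared trigrams over total distinct trigrams), but it is a sketch for reasoning about scores, not the database function itself. The names trigrams, similarity and fuzzy_matches are hypothetical.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm extraction: lowercase, split on non-alphanumerics,
    pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_matches(claimed: str, candidates: list, threshold: float = 0.85) -> list:
    """Return candidates whose similarity to the claimed form meets the threshold."""
    return [c for c in candidates if similarity(claimed, c) >= threshold]
```

Identical normalised strings score 1.0. Because citation strings are short, a single changed token removes several trigrams at once, which is why a high threshold such as 0.85 admits only near-identical forms.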
Tier 3
Score-banded title fuzzy
pg_trgm similarity over lower(title) with a floor of 0.35, then banded by score. ≥0.65 → VERIFIED_BY_TITLE; 0.35–0.65 with year proximity (within ±1 yr of decision_date) → upgraded to verified; 0.35–0.65 without year corroboration → PROBABLE_BY_TITLE (amber, never auto-verified).
Triggered: When Tier 1 + Tier 2 miss. Catches case-name citations like "Sadanandan Bhadran v. Madhavan Sunil Kumar" against reporter notation in the corpus.
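The score bands above translate into a small decision function. The sketch below assumes the similarity score has already been computed; the function name band_title_match and the labels "VERIFIED" (for the year-corroborated upgrade) and "NOT_FOUND" (for scores under the floor) are hypothetical, while VERIFIED_BY_TITLE and PROBABLE_BY_TITLE come from the description.

```python
from typing import Optional

TITLE_FLOOR = 0.35     # below this, the title is not treated as a match
TITLE_VERIFIED = 0.65  # at or above this, the title alone verifies


def band_title_match(score: float, claimed_year: Optional[int],
                     decision_year: int) -> str:
    """Map a title-similarity score to a verification band (illustrative)."""
    if score < TITLE_FLOOR:
        return "NOT_FOUND"  # hypothetical label
    if score >= TITLE_VERIFIED:
        return "VERIFIED_BY_TITLE"
    # Middle band: upgrade only when the claimed year is within one year
    # of the judgment's decision year.
    year_ok = claimed_year is not None and abs(claimed_year - decision_year) <= 1
    return "VERIFIED" if year_ok else "PROBABLE_BY_TITLE"
```

For example, a score of 0.5 with a claimed year of 2009 against a 2008 decision is upgraded, while the same score with no year or a distant year stays amber.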
Tier 4
Stricter-prompt retry
If the verifier flags more than 20% of citations as unverified, the model is re-called once with a correction system message. The message names each unverified citation and instructs the model either to replace it with a citation drawn only from the retrieved chunks or to rewrite the answer without that proposition. The retry budget is capped at one attempt, which bounds Anthropic spend.
Triggered: hallucinationRate > 0.20 on the initial pass.
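The retry trigger reduces to a threshold check plus a budget check. The sketch below assumes the hallucination rate is the share of claimed citations left unverified, which the text implies but does not define; the function names are hypothetical.

```python
RETRY_THRESHOLD = 0.20  # retry only when the rate is strictly above this
MAX_RETRIES = 1         # the retry budget described above


def hallucination_rate(claimed: int, unverified: int) -> float:
    """Assumed definition: unverified citations as a share of claimed ones."""
    return unverified / claimed if claimed else 0.0


def should_retry(claimed: int, unverified: int, retries_used: int) -> bool:
    """True when the rate exceeds the threshold and the budget is not spent."""
    return (retries_used < MAX_RETRIES
            and hallucination_rate(claimed, unverified) > RETRY_THRESHOLD)
```

With ten claimed citations, three unverified ones trigger a retry on the first pass; two do not, because the comparison is strict; and nothing triggers a second retry.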
Tier 5
Citator V1 treatment overlay
Every verified citation is checked against the 1,188-row Citator treatment table built on top of the 18.8M-judgment graph (KeyCite/Shepard's convention). Distinguished / Doubted / Overruled / Partially overruled / Per Incuriam citations carry a visible Citator pill (Tier-1) with a hover-card showing which judgment treated it and on what point (Tier-2), and a full 3-tier lineage timeline on click. Good Law is the default.
Triggered: Every citation with a resolved corpus row.
Tier 6
Append-only audit
Each citation verification writes one row to citation_audit with HMAC-keyed claimed-citation hash + status + supportScore + action. Rows are append-only at the database role level; redactions never delete (DPDPA right-to-be-forgotten replaces personal fields and stamps redactedAt/By/reason).
Triggered: Every /research call.

Why this matters for the bar

On 27 February 2026 the Hon'ble Supreme Court of India declared that citing AI-hallucinated case law is professional misconduct, escalating it from an error to a disciplinary issue with sanctions at stake. The trigger was Mercy v. Mankind: a fake judgment cited in a real petition.

Every Indian legal-AI tool now positions around “citation safety”. Most stop at retrieval-augmented generation and trust the model. NyayLens treats a model-claimed citation as untrusted input until it has been resolved into a corpus row, classified by treatment status, and recorded in a tamper-evident audit log.

If a citation produced by NyayLens is later challenged, the lawyer can demonstrate (a) the tool explicitly flagged unverified citations (the citation either resolved against the 18.8M+ Indian judgment corpus or was annotated [citation unverified]), (b) every citation came with a verifier verdict and a Citator treatment badge (good law · distinguished · doubted · partially overruled · overruled) in the audit modal, and (c) the audit log row exists in append-only storage. The framing differs in kind, not degree, from “the lawyer used ChatGPT”.

What we don't claim

• We do not claim “zero hallucinations”. We claim every citation is marked, every miss is flagged, and the lawyer is the final authority, the same standard that the SC White Paper (Nov 2025) and the Kerala HC (Jul 2025) ruling require.
• Coverage maps to what each Indian court has published openly via the AWS Open Data registry, not to actual case volume. Allahabad publishes ~2.82M; smaller jurisdictions (Sikkim, Meghalaya, Tripura) publish fewer because they hear fewer matters. Tribunals (NCLT, ITAT, CESTAT, NGT, CAT, DRT) are not in this release; citations to those forums resolve to [corpus_scope] and are flagged.
• Statute corpus carries 1,807 sections across 29 central Acts (BNS 358/358, BNSS 531/531, BSA 170/170, plus IPC, CrPC, IEA, Constitution, and 20 civil/commercial Acts). Sub-section text that has not yet been digitised is flagged.
• Treatment status reflects the most recent corpus refresh (timestamp visible per-citation in the bulk-audit modal). A case overruled by a very recent judgment may not appear in the table until the next refresh.

Audit retention

Every /research call writes one row to citation_audit with: request id, rubric version, CRAG decision, claimed/verified/notFound counts, hallucination rate, verifier latency, and a JSON details field. The claimed-citation strings are HMAC-hashed rather than plainly hashed. The DPDPA risk with an unkeyed hash is that a knowable hash function over a finite citation space can be reversed with a rainbow table. With a keyed hash, only the server holding the audit key can group hallucinations on identical claimed citations.
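The keyed hashing of claimed citations can be sketched with Python's standard hmac module. The choice of SHA-256, the function name hash_claimed_citation, and the idea that the input is the normalised citation string are all assumptions made for illustration; the text says only that the hash is HMAC-keyed.

```python
import hashlib
import hmac


def hash_claimed_citation(audit_key: bytes, citation: str) -> str:
    """Keyed digest of a claimed citation (illustrative; SHA-256 is assumed).

    Without audit_key, an attacker cannot precompute digests for the finite
    set of plausible citations, which is the rainbow-table risk noted above.
    """
    return hmac.new(audit_key, citation.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same key and citation always produce the same digest, so the server can still group repeated hallucinations of an identical claimed citation, while a different key yields an unrelated digest.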

Retention: 90 days for free-tier orgs, 365 days for paid orgs. Right-to-be-forgotten requests redact personal fields rather than delete rows, which preserves chain integrity; a redaction marker hashes differently, and the verifier accounts for it.
