Citation Certification

Built for the post-Mercy v. Mankind bar.

Every citation NyayLens shows you has passed a six-tier verification pipeline against an 18.8M-judgment Indian corpus — 25 High Courts + the Supreme Court of India. Every miss is flagged before you read the answer — never silently dropped.

  • 18,824,596 High Court judgments: all 25 Indian HCs, 1950 – today
  • 39,158 Supreme Court judgments: 1950 – today, daily refresh
  • 165k+ Parallel-citation crosswalk: AIR · SCC · SCR · INSC · JT · SCALE
  • 1,188 Citator treatment verdicts: overruled · distinguished · doubted · partially overruled · good law
  • 6,315 Judge profiles: designation · parent HC · tenure · 28 historical CJIs
  • 6 tiers Verification pipeline: per-tier breakdown below

The verification stack

Six tiers run on every /research call. Each tier is recorded in citation_audit so a future inquiry can replay every decision the system made.

  1. Tier 1a

    Strict-norm exact match

    The citation string is NFKC Unicode-normalised, lowercased, dot-stripped and bracket-unified, then compared byte-for-byte against the citation_norm column.

    Triggered: Every claimed citation, regardless of confidence.

  2. Tier 1b

    Reporter-alt match

    Same canonical form matched against the citation_alt_norm GIN-indexed array (covers AIR / SCC / SCR / INSC / etc. alternates of the same judgment).

    Triggered: When strict-norm misses.

  3. Tier 2

    pg_trgm fuzzy on citation_norm

    pg_trgm similarity over the normalised citation form with threshold 0.85. Typos in reporter format ("[2008] 13 SCC" vs "(2008) 13 SCC") still match.

    Triggered: When Tier 1 misses and the verifier is allowed fuzzy fallback.

  4. Tier 3

    Score-banded title fuzzy

    pg_trgm similarity over lower(title) with threshold 0.35. ≥0.65 → VERIFIED_BY_TITLE; 0.35–0.65 + year-proximity (±1 yr to decision_date) → upgraded to verified; 0.35–0.65 without year corroboration → PROBABLE_BY_TITLE (amber, never auto-verified).

    Triggered: When Tier 1 + Tier 2 miss. Catches case-name citations like "Sadanandan Bhadran v. Madhavan Sunil Kumar" against reporter notation in the corpus.

  5. Tier 4

    Stricter-prompt retry

    If the verifier flags >20% of citations as unverified, the model is re-called ONCE with a correction system message naming each unverified citation and instructing it to either replace the citation with one drawn ONLY from the retrieved chunks or rewrite the answer without that proposition. Retry budget capped at 1× — bounds Anthropic spend.

    Triggered: hallucinationRate > 0.20 on the initial pass.

  6. Tier 5

    Citator V1 treatment overlay

    Every verified citation is checked against the 1,188-row Citator treatment table built on top of the 18.8M-judgment graph (KeyCite/Shepard's convention). Distinguished / Doubted / Overruled / Partially overruled / Per Incuriam citations carry a visible Citator pill (display tier 1) with a hover-card showing which judgment treated it and on what point (display tier 2), and a full 3-tier lineage timeline on click. Good Law is the default.

    Triggered: Every citation with a resolved corpus row.

  7. Tier 6

    Append-only audit

    Each citation verification writes one row to citation_audit with HMAC-keyed claimed-citation hash + status + supportScore + action. Rows are append-only at the database role level; redactions never delete (DPDPA right-to-be-forgotten replaces personal fields and stamps redactedAt/By/reason).

    Triggered: Every /research call.
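The matching rules in Tiers 1a, 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production verifier: the function names, the NOT_FOUND label, the bracket mapping and the whitespace collapsing are hypothetical, and the similarity score is assumed to come from pg_trgm upstream. Only the thresholds (0.35, 0.65, ±1 year, 20%, one retry) are taken from the descriptions above.

```python
import re
import unicodedata
from typing import Optional

# Assumption: "bracket-unified" means square and curly brackets become round ones.
_BRACKETS = str.maketrans("[]{}", "()()")


def normalise_citation(raw: str) -> str:
    """Tier 1a-style canonical form: NFKC, lowercased, dots stripped, brackets unified."""
    text = unicodedata.normalize("NFKC", raw).lower()
    text = text.replace(".", "").translate(_BRACKETS)
    # Assumption: runs of whitespace are collapsed; the page does not say.
    return re.sub(r"\s+", " ", text).strip()


def title_band(similarity: float,
               claimed_year: Optional[int],
               decision_year: Optional[int]) -> str:
    """Tier 3 banding over a title-similarity score computed elsewhere."""
    if similarity >= 0.65:
        return "VERIFIED_BY_TITLE"
    if similarity >= 0.35:
        year_ok = (claimed_year is not None
                   and decision_year is not None
                   and abs(claimed_year - decision_year) <= 1)
        # Assumption: an upgraded match reuses the VERIFIED_BY_TITLE label.
        return "VERIFIED_BY_TITLE" if year_ok else "PROBABLE_BY_TITLE"
    return "NOT_FOUND"  # hypothetical label for scores below the 0.35 floor


def needs_retry(unverified: int, claimed: int, retries_used: int) -> bool:
    """Tier 4 trigger: hallucination rate above 0.20, with at most one retry."""
    if claimed == 0 or retries_used >= 1:
        return False
    return unverified / claimed > 0.20
```

With this sketch, "[2008] 13 S.C.C. 1" and "(2008) 13 SCC 1" normalise to the same string, so they would already agree at the exact-match stage; a mid-band title score is only upgraded when the claimed year is within one year of the decision year.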

Why this matters for the bar

On 27 February 2026 the Hon'ble Supreme Court of India declared that citing AI-hallucinated case law is professional misconduct, escalating it from an error to a disciplinary issue with sanctions at stake. The trigger was Mercy v. Mankind: a fake judgment cited in a real petition.

Every Indian legal-AI tool now positions around “citation safety”. Most stop at retrieval-augmented generation and trust the model. NyayLens treats a model-claimed citation as untrusted input until it has been resolved into a corpus row, classified by treatment status, and recorded in a tamper-evident audit log.

If a citation produced by NyayLens is later challenged, the lawyer can demonstrate (a) the tool explicitly flagged unverified citations (the citation either resolved against the 18.8M+ Indian judgment corpus or was annotated [citation unverified]), (b) every citation came with a verifier verdict and a Citator treatment badge (good law · distinguished · doubted · partially overruled · overruled) in the audit modal, and (c) the audit log row exists in append-only storage. The framing differs in kind, not degree, from “the lawyer used ChatGPT”.

What we don't claim

  • We do not claim “zero hallucinations”. We claim every citation is marked, every miss is flagged, and the lawyer is the final authority: the same standard that the SC White Paper (Nov 2025) and the Kerala HC ruling (Jul 2025) require.
  • Coverage maps to what each Indian court has published openly via the AWS Open Data registry — not to actual case volume. Allahabad publishes ~2.82M; smaller jurisdictions (Sikkim, Meghalaya, Tripura) publish fewer because they hear fewer matters. Tribunals (NCLT, ITAT, CESTAT, NGT, CAT, DRT) are not in this release; citations to those forums resolve to [corpus_scope] and are flagged.
  • Statute corpus carries 1,807 sections across 29 central Acts (BNS 358/358, BNSS 531/531, BSA 170/170, plus IPC, CrPC, IEA, Constitution, and 20 civil/commercial Acts). Sub-section text that has not yet been digitised is flagged.
  • Treatment status reflects the most recent corpus refresh (timestamp visible per-citation in the bulk-audit modal). A case overruled by a very recent judgment may not yet be marked as such in the table.

Audit retention

Every /research call writes one row to citation_audit with: request id, rubric version, CRAG decision, claimed/verified/notFound counts, hallucination rate, verifier latency, and a JSON details field. The claimed-citation strings are HMAC-hashed rather than plainly hashed: the citation space is finite, so an unkeyed, publicly computable hash could be reversed with a rainbow table, which is a DPDPA risk. Only the server holding the audit key can group hallucinations on identical claimed citations.
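The keyed hashing described above can be sketched with Python's standard hmac module. This is an illustration only: the choice of SHA-256, the function name and the sample keys are assumptions, since the page does not state which digest or key format the audit log uses.

```python
import hashlib
import hmac


def hash_claimed_citation(citation_norm: str, audit_key: bytes) -> str:
    """Return a keyed digest of a normalised citation string.

    Assumption: HMAC-SHA256; the actual digest algorithm is not stated.
    """
    return hmac.new(audit_key, citation_norm.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical keys for demonstration; a real key would come from a secret store.
server_key = b"example-audit-key"
other_key = b"some-other-key"

a = hash_claimed_citation("(2008) 13 scc 1", server_key)
b = hash_claimed_citation("(2008) 13 scc 1", server_key)
c = hash_claimed_citation("(2008) 13 scc 1", other_key)

# Same key and same citation give the same digest, so the key holder can group rows.
same_key_groups = (a == b)
# Without the key, the digest cannot be recomputed from the citation alone.
differs_without_key = (a != c)
```

Because the digest depends on the key, precomputing hashes for every plausible citation is not possible without access to the audit key.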

Retention: 90 days for free-tier orgs, 365 days for paid orgs. Right-to-be-forgotten requests redact personal fields rather than delete the row, which preserves chain integrity; a redacted row carries a marker that hashes differently, and the verifier recognises it.
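The retention windows and redact-instead-of-delete behaviour can be sketched as follows. Everything here is a hypothetical model for illustration: the AuditRow fields, the user_email personal field, the plan names and the helper functions are assumptions. Only the 90/365-day windows and the redactedAt/By/reason stamps come from the text above.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows quoted above, keyed by a hypothetical plan name.
RETENTION_DAYS = {"free": 90, "paid": 365}


@dataclass(frozen=True)
class AuditRow:
    request_id: str
    created_at: datetime
    user_email: Optional[str]              # hypothetical personal field
    redacted_at: Optional[datetime] = None
    redacted_by: Optional[str] = None
    redacted_reason: Optional[str] = None


def is_expired(row: AuditRow, plan: str, now: datetime) -> bool:
    """True once the row is older than the plan's retention window."""
    return now - row.created_at > timedelta(days=RETENTION_DAYS[plan])


def redact(row: AuditRow, by: str, reason: str, now: datetime) -> AuditRow:
    """Blank the personal field and stamp the redaction; the row itself survives."""
    return replace(row, user_email=None, redacted_at=now,
                   redacted_by=by, redacted_reason=reason)


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
row = AuditRow("req-1", now - timedelta(days=91), "advocate@example.com")
redacted = redact(row, by="dpo", reason="DPDPA erasure request", now=now)
```

In this sketch the 91-day-old row is past retention on the free plan but not on the paid plan, and the redacted copy keeps its request id and timestamps while the personal field is cleared.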

Last reviewed . Verification stack and corpus stats reflect the build live in production at the time of review — 18,824,596 High Court judgments + 39,158 Supreme Court judgments + 1,807 statute sections + 6,315 judge profiles + 1,188 Citator treatment verdicts.

Questions? Contact contact@courtnetra.com.