Every time someone uses an LLM through an API, both sides are operating on trust. The user trusts that the provider's logs are accurate. The provider trusts that the user hasn't tampered with their records. Neither side has cryptographic proof of what actually happened. HDK fixes this.

HDK (Hedera Detection Key) is a Python middleware library that sits between your application and any LLM provider. It wraps each API call with three layers of cryptographic provenance: a hierarchical hash chain that encodes the full context of every interaction, a canary system that detects unauthorized access to the middleware itself, and periodic anchoring of Merkle roots on the Hedera Consensus Service — creating an immutable, publicly verifiable record of what was generated, when, and in what context.

The problem

Application logs are operator-controlled. Whoever runs the database can modify or delete records after the fact. This is fine for debugging. It's not fine when LLM outputs become legal evidence, compliance records, or the basis for published reporting.

Existing solutions address pieces of this: VeriLLM verifies that GPU computation was performed correctly. Prove AI tracks training data provenance. EQTY Lab provides hardware-level attestation for AI workflows. But none of them address the communication layer — the actual exchange between a user and a model. That's where HDK operates.

What stays private, what goes public

HDK's first design principle is data private, proof public. The actual content of prompts, responses, and user identities never leaves the user's infrastructure. Only cryptographic hashes — meaningless without the original data — reach the public ledger.

[Figure: three data zones. USER SIDE (encrypted, user-controlled): prompts, responses, user identities, canary secret, SQLite DAG, custom fields; never leaves user infrastructure. LLM PROVIDER (standard API, unchanged): plaintext prompt and response, model name; the provider sees content exactly as it would without HDK, and HDK adds zero data to provider requests. HEDERA (public, immutable, anyone can verify): Merkle root, node count, canary checkpoint, anchor sequence, aBFT timestamp; hashes only, no content recoverable.]
Fig 1 — Data zones. Content stays private. Only cryptographic hashes reach the public ledger. The LLM provider sees exactly what it would see without HDK.

This is the key point: Hedera never sees your prompts, responses, or user data. It receives a 32-byte Merkle root and a handful of metadata fields. You can't reconstruct content from a SHA-256 hash. Anyone can verify that your records haven't been tampered with — without accessing any private data.
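
To make the split concrete, here is a minimal sketch of what an anchor payload might contain, based on the fields in Fig 1. The payload shape and field names are illustrative assumptions, not HDK's actual wire format:

    import hashlib
    import json

    prompt = "confidential prompt text"                  # stays on user infrastructure
    leaf_hash = hashlib.sha256(prompt.encode()).hexdigest()
    # leaf_hash joins the local Merkle tree; only the tree's root is published.

    # Hypothetical anchor payload: the only material that reaches Hedera.
    anchor_payload = json.dumps({
        "merkle_root": "9f86d081884c7d65...",            # 32-byte digest, hex-encoded
        "node_count": 1000,
        "canary_checkpoint": "ab34cd56ef78...",
        "anchor_sequence": 42,
    })
    # None of these fields is invertible: SHA-256 preimage resistance means
    # the original prompt cannot be reconstructed from any published hash.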

How it works

HDK organizes every interaction into a five-level hierarchy: PROJECT → DOCUMENT → CONVERSATION → QUERY → RESPONSE. Each level's hash recursively embeds every ancestor hash above it.

[Figure: five-level hash genealogy.
PROJECT: H(project_id ‖ ts ‖ user)
DOCUMENT: H(doc ‖ h(PROJECT))
CONVERSATION: H(conv ‖ h(DOC) ‖ h(PROJ))
QUERY: H(prompt ‖ h(CONV) ‖ h(DOC) ‖ h(PROJ))
RESPONSE: hash commits to the full path back to the PROJECT root. Changing any ancestor invalidates all descendants.]
Fig 2 — Hierarchical hash genealogy. Each level embeds all ancestors.

This means the RESPONSE hash isn't just a hash of the response text — it's a cryptographic commitment to the entire context: which project, which document, which conversation, and what prompt produced it. Change any ancestor, and every descendant hash breaks.
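
A minimal sketch of the genealogy in Python, following the formulas in Fig 2. The identifiers, delimiter, and field layout are illustrative assumptions, not HDK's actual encoding:

    import hashlib
    import time

    def h(*parts: str) -> str:
        """SHA-256 over delimiter-joined parts (delimiter choice is illustrative)."""
        return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()

    # Hypothetical identifiers for one interaction
    ts = str(int(time.time()))
    project_h  = h("project-42", ts, "user-7")
    doc_h      = h("doc-3", project_h)
    conv_h     = h("conv-1", doc_h, project_h)
    query_h    = h("What changed in Q3?", conv_h, doc_h, project_h)
    response_h = h("The Q3 figures...", query_h, conv_h, doc_h, project_h)

    # Tamper with the project record and every descendant hash diverges:
    forged_project_h = h("project-42-tampered", ts, "user-7")
    assert h("doc-3", forged_project_h) != doc_h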

The canary

Hash chains detect modification after the fact. But what about unauthorized access that doesn't modify anything — someone reading data through a compromised session, an SSRF attack, or an API exploit?

HDK includes a canary commitment scheme: a secret-bound counter that increments with every legitimate middleware operation. The current commitment is periodically checkpointed to Hedera. If an attacker accesses the middleware, they face a dilemma: read without updating the counter (counter diverges → detected) or update the counter without knowing the secret (commitment diverges → detected). Either way, the next checkpoint reveals the intrusion.
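
A minimal sketch of the dilemma, under the assumption that the commitment is a hash over the secret and the counter (the exact construction is specified in the paper):

    import hashlib

    def commitment(secret: bytes, counter: int) -> str:
        # Computing a valid commitment requires the secret; publishing the
        # commitment reveals neither the secret nor anything about content.
        return hashlib.sha256(secret + counter.to_bytes(8, "big")).hexdigest()

    SECRET = b"known only to the legitimate middleware"
    counter = 0

    def middleware_operation() -> str:
        global counter
        counter += 1                      # every legitimate call advances state
        return commitment(SECRET, counter)

    checkpoint = middleware_operation()   # periodically published to Hedera

    # Attacker path 1: read without incrementing -> the counter implied by
    # legitimate traffic no longer matches the next checkpoint.
    # Attacker path 2: increment without SECRET -> the forged commitment
    # fails to match commitment(SECRET, counter).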

Limitation: The canary operates at the application layer. It detects access through the HDK middleware. It does not detect passive copying at the OS or infrastructure level — classical bits can be duplicated without altering application state. Single-event protection comes from the hash genealogy and Hedera anchoring, not from the canary alone.

Anchoring

Periodically (default: every 1,000 events), HDK computes a Merkle root over all un-anchored events and submits it to the Hedera Consensus Service. Hedera adds an aBFT consensus timestamp, a sequence number, and a running hash — creating an immutable reference point that no single party can alter.
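
A minimal sketch of the batching step, assuming a standard binary Merkle tree over the event hashes with a lone last node duplicated; HDK's exact tree construction may differ:

    import hashlib

    def sha256(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(event_hashes: list[bytes]) -> bytes:
        """Fold a list of leaf hashes up to a single 32-byte root."""
        level = list(event_hashes)
        while len(level) > 1:
            if len(level) % 2:                 # duplicate a lone last node
                level.append(level[-1])
            level = [sha256(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    batch = [sha256(f"event-{i}".encode()) for i in range(1000)]
    root = merkle_root(batch)   # the 32 bytes submitted to the HCS topic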

[Figure: anchoring timeline. Anchor 1 → Anchor 2 → Anchor 3 → … submitted in sequence to the Hedera Consensus Service (aBFT, immutable); each anchor cross-confirms its predecessors.]
Fig 3 — Anchoring timeline. Each anchor implicitly confirms all predecessors through genealogy chains.

This creates an emergent property we call cascading cross-confirmation: events in Anchor 2 have parent chains reaching into Anchor 1's range. If any event under Anchor 1 were modified after the fact, the parent chains in Anchor 2 would diverge — producing a Merkle root that doesn't match the immutable Hedera record. The longer the system runs, the harder retroactive tampering becomes.
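
A toy illustration of why this holds, assuming (as in Fig 2) that each event hash embeds its parent's hash:

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # Event A is anchored under Anchor 1; its child B under Anchor 2.
    a = h(b"event-A")
    b = h(b"event-B" + a)          # B's hash embeds A's hash (genealogy)
    anchor2_root = h(b)            # stand-in for Anchor 2's Merkle root

    # Retroactively tamper with A, then replay the chain forward:
    a_forged = h(b"event-A-tampered")
    b_replayed = h(b"event-B" + a_forged)
    assert h(b_replayed) != anchor2_root   # diverges from the Hedera record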

Cost

A single HCS message costs $0.0001. Batching determines what each individual interaction costs:

Batch size     Cost / interaction   % of LLM cost   Use case
N = 1,000      $0.0000001           0.0001%         Standard
N = 100        $0.000001            0.001%          High-security
N = 1          $0.0001              0.1%            Per-event anchoring

Reference: a typical LLM API call costs $0.01–$0.10. At the default batch size, HDK adds $0.0000001 per interaction, roughly 0.0001% of a $0.10 call.
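
The arithmetic behind the table, with percentages computed against the $0.10 end of the range:

    HCS_MESSAGE_USD = 0.0001      # cost of one Hedera Consensus Service message
    LLM_CALL_USD = 0.10           # upper end of a typical LLM API call

    for n in (1000, 100, 1):
        per_event = HCS_MESSAGE_USD / n
        print(f"N={n:>5}: ${per_event:.7f} per interaction "
              f"({per_event / LLM_CALL_USD:.4%} of LLM cost)")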

Integration

    from hdk import HDK

    hdk = HDK(
        hedera_account="0.0.XXXXX",
        hedera_key="..."
    )

    # Wraps any LLM provider — same interface, plus audit metadata
    response = hdk.complete(
        provider="openai",
        model="gpt-4",
        messages=[{"role": "user", "content": "..."}]
    )

    # response.audit → node_hash, merkle_root, canary_status, anchor_url

Three steps (import, initialize, call) add cryptographic provenance to any LLM workflow. The response object includes standard LLM output plus audit metadata. For existing systems that can't wrap API calls, a Bridge API hooks into event flows without changing the integration pattern.

Security model

HDK makes explicit what it protects against and what it doesn't.

What works: Any modification to any historical event — at any level of the hierarchy — deterministically breaks the hash chain, and the divergence is detectable against immutable Hedera anchors. The canary detects unauthorized middleware access in real time. Cascading cross-confirmation makes undetected retroactive tampering computationally infeasible after multiple anchor cycles.

What doesn't: The canary is application-layer only — it won't detect passive OS-level data copying. The random-sampling audit catches mass modification but not surgical single-event attacks (for those, the deterministic genealogy chain is the defense). HDK assumes at least one of three layers remains uncompromised: user-side data, Hedera anchors, or cryptographic secrets. Full compromise of all three voids all guarantees.

Origin

HDK grew out of work on RE::DACT, an AI-powered platform for investigative journalism where every AI interaction needs to be independently verifiable. Building that system meant thinking hard about what "tamper-evident" actually requires — and realizing that application logs, even encrypted ones, aren't enough when the operator controls the database.

The canary commitment scheme draws conceptual inspiration from an earlier paper on physics-based AGI containment, where unauthorized interaction with a quantum substrate produces measurable state divergence. HDK transposes that principle to classical infrastructure: the canary secret replaces the quantum state, and the commitment counter replaces the purity measure. Different physics, same invariant — unauthorized access through the defined interface produces deterministic, detectable state change.

The formal treatment — definitions, proofs, threat model, cost analysis — is in the full paper.

Read the full paper