
ADR-0001. Kafka-driven async ledger generation via a self-loop topic

| Field | Value |
| --- | --- |
| Status | Accepted |
| Date | 2026-03-30 |
| Deciders | ledger-team |
| Supersedes | |

Context

  • Generating a ledger is slow and bursty: each document needs a data fetch, a Typst PDF render, an ExcelJS XLSX render, AES encryption, and an S3 upload. That easily takes seconds per document, multiplied across a full-year batch.
  • A synchronous HTTP request cannot stay open that long, and a request that crashes midway would leave partial files in S3 and an ambiguous status.
  • Generation must be retriable and survive worker crashes mid-pipeline.

Decision

We will decouple enqueue from execution with a single Kafka topic ledger.generate that the service both produces to (api role) and consumes from (worker role). The HTTP request returns immediately with a LedgerJob in PENDING; the worker executes handleGeneration(ledgerId) and reports progress via WebSocket.
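The enqueue side can be sketched as follows. This is a minimal illustration, not the code in ledger-queue.service.ts: the `Producer` interface, the in-memory `Map` standing in for the job table, the `ledger-N` id scheme, and the message payload shape are all assumptions made for the example. Only the topic name, `LedgerJob`, and the PENDING status come from this ADR.

```typescript
// Hypothetical types for illustration; field names beyond those in the ADR are assumed.
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "REJECTED";

interface LedgerJob {
  ledgerId: string;
  merchantId: string;
  type: string;
  period: string;
  status: JobStatus;
}

// Minimal producer shape assumed for this sketch (not a real Kafka client API).
interface Producer {
  send(topic: string, key: string, value: string): Promise<void>;
}

const TOPIC = "ledger.generate";

class LedgerQueue {
  // In-memory stand-in for the job table, keyed by (merchantId, type, period).
  // A real store would enforce this with a unique constraint.
  private readonly jobs = new Map<string, LedgerJob>();
  private readonly producer: Producer;

  constructor(producer: Producer) {
    this.producer = producer;
  }

  async enqueue(merchantId: string, type: string, period: string): Promise<LedgerJob> {
    const key = `${merchantId}:${type}:${period}`;
    const existing = this.jobs.get(key);
    if (existing) {
      // Idempotent enqueue: the same key yields the same job and no second message.
      return existing;
    }
    const job: LedgerJob = {
      ledgerId: `ledger-${this.jobs.size + 1}`, // hypothetical id scheme
      merchantId,
      type,
      period,
      status: "PENDING",
    };
    this.jobs.set(key, job);
    await this.producer.send(TOPIC, job.ledgerId, JSON.stringify({ ledgerId: job.ledgerId }));
    // The HTTP handler would return this job right away; no rendering happens here.
    return job;
  }
}
```

Calling `enqueue` twice with the same (merchantId, type, period) returns the same PENDING job and publishes a single message, which is the property the api role relies on when clients retry a request.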

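The consumer runs with autocommit: false and commits the offset only after upload + finalize succeed. Job state is tracked separately from the Kafka offset, in a LedgerJob state machine (PENDING → PROCESSING → COMPLETED|REJECTED). A worker claims a job with an atomic conditional UPDATE, so only one worker can move a given job from PENDING to PROCESSING.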
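The claim-then-commit ordering can be illustrated with a small sketch. It assumes an in-memory `JobStore` in place of the real table, a `generate` callback standing for fetch + render + encrypt + upload + finalize, and a `commit` callback standing for the consumer's manual offset commit. All of these names are hypothetical, as are the SQL in the comment and the choice to commit duplicate messages; failure handling such as the REJECTED transition is left out.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "REJECTED";

// In-memory stand-in for the LedgerJob table (hypothetical; the real store is not described here).
class JobStore {
  private readonly status = new Map<string, JobStatus>();

  insert(ledgerId: string): void {
    this.status.set(ledgerId, "PENDING");
  }

  get(ledgerId: string): JobStatus | undefined {
    return this.status.get(ledgerId);
  }

  // Models a conditional update such as:
  //   UPDATE ledger_job SET status = :to WHERE id = :id AND status = :from
  // Returns true only when the row was in `from`, i.e. this caller won the transition.
  transition(ledgerId: string, from: JobStatus, to: JobStatus): boolean {
    if (this.status.get(ledgerId) !== from) return false;
    this.status.set(ledgerId, to);
    return true;
  }
}

// Handles one consumed message.
async function handleMessage(
  store: JobStore,
  ledgerId: string,
  generate: (ledgerId: string) => Promise<void>,
  commit: () => Promise<void>,
): Promise<"done" | "skipped"> {
  // Claim: only one worker can move PENDING -> PROCESSING.
  if (!store.transition(ledgerId, "PENDING", "PROCESSING")) {
    // Assumption: a duplicate or already-claimed message is committed and dropped.
    await commit();
    return "skipped";
  }
  // If this throws, the offset is not committed here and the job stays PROCESSING.
  await generate(ledgerId);
  store.transition(ledgerId, "PROCESSING", "COMPLETED");
  // Commit only after the whole pipeline has succeeded.
  await commit();
  return "done";
}
```

Because the commit is the last step, a worker that dies mid-pipeline leaves the job visibly in PROCESSING rather than silently acknowledged, which is what lets a later recovery pass find it.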

Consequences

| Pros | Cons |
| --- | --- |
| Fast HTTP response; long render off the request path | Eventual consistency: client must poll/subscribe for status |
| Idempotent enqueue on (merchantId, type, period) | A committed message is never auto-replayed; recovery needs explicit retry or the stall sweep |
| Horizontal scale via consumer count / worker replicas | Operators must understand the self-loop (no external producer/consumer) |
| Crash recovery via RecoveryComponent re-enqueue of stalled jobs | Slightly more moving parts than a BullMQ queue |
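To make the stall sweep concrete, here is one way such a pass could look. This is a sketch under assumptions, not the logic in recovery.component.ts: the `updatedAt` and `attempts` fields, the `stallAfterMs` and `maxAttempts` options, and the choice to mark exhausted jobs REJECTED are all hypothetical. The ADR only states that stalled jobs are re-enqueued and that the sweep is bounded.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "REJECTED";

interface JobRow {
  ledgerId: string;
  status: JobStatus;
  updatedAt: number; // epoch milliseconds of the last state change (assumed column)
  attempts: number;  // hypothetical counter that keeps the sweep bounded
}

interface SweepOptions {
  stallAfterMs: number; // how long a job may sit in PROCESSING before it counts as stalled
  maxAttempts: number;  // how many re-enqueues are allowed before giving up
}

// Scans the rows, resets stalled jobs to PENDING, and returns the ledgerIds
// that the caller should publish to the topic again. Mutates the rows in place.
function sweepStalled(jobs: JobRow[], now: number, opts: SweepOptions): string[] {
  const requeue: string[] = [];
  for (const job of jobs) {
    const stalled = job.status === "PROCESSING" && now - job.updatedAt >= opts.stallAfterMs;
    if (!stalled) continue;
    if (job.attempts >= opts.maxAttempts) {
      // Assumption: a job that keeps stalling is parked instead of retried forever.
      job.status = "REJECTED";
    } else {
      job.status = "PENDING";
      job.attempts += 1;
      requeue.push(job.ledgerId);
    }
    job.updatedAt = now;
  }
  return requeue;
}
```

A scheduler could call `sweepStalled(rows, Date.now(), { stallAfterMs, maxAttempts })` periodically and publish each returned id to ledger.generate. The attempt cap is what keeps a deterministically failing job from looping through the topic indefinitely.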

Alternatives Considered

| Option | Pros | Cons | Why rejected |
| --- | --- | --- | --- |
| Synchronous HTTP generation | Simplest | Long-held connections, no crash recovery, partial files | Unworkable for batch/large ledgers |
| BullMQ queue | Built-in retries/backoff | Another infra surface; Kafka already in the stack | Reused existing Kafka rather than add Redis-queue semantics |
| Auto-replay on consume failure | Self-healing | Risk of poison-message storms on deterministic parse failures | Manual retry + bounded stall sweep is safer |

References

  • ledger/src/services/ledger-queue.service.ts (handleEnqueueGeneration)
  • ledger/src/services/ledger-worker.service.ts (handleGeneration)
  • ledger/src/components/kafka.component.ts (consumer autocommit: false)
  • ledger/src/components/recovery.component.ts (stall sweep)
  • Generation Pipeline
