ADR-0003. Async issuance over 3-partition BullMQ with deterministic order hashing
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-04-15 |
| Deciders | invoice-team |
| Supersedes | — |
Context
- Provider issuance is I/O-bound (HTTP to VNIS/VNPAY/T-VAN) and can fail transiently, so it must not block the Kafka consumer or the REST request.
- The same order must never be issued concurrently across workers (double-issuance risk).
- Throughput must scale horizontally while preserving per-order serialization.
- Retries need bounded backoff; permanent (4xx) errors must not retry.
Decision
We will issue invoices asynchronously through BullMQ with 3 partitions per queue type (issuance, claim-expiry). An order is routed to a partition by getPartitionByKey(orderId), which takes a deterministic Java-style hashCode of the order ID modulo 3, so the same order always lands on the same partition. Issuance jobs use jobId = orderId for idempotency.
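The routing rule above can be sketched as follows. This is a minimal illustration, not the code in src/common/queues.ts: it assumes "Java-hashCode" means the `String.hashCode` recurrence (`h = 31 * h + c` over UTF-16 code units, wrapped to 32 bits), and the names `PARTITION_COUNT`, `javaHashCode`, `buildIssuanceJob` and the `invoice-issuance-` queue-name prefix are hypothetical.

```typescript
// Hypothetical constant: the ADR fixes the partition count at 3.
const PARTITION_COUNT = 3;

// Java-style String.hashCode: h = 31 * h + codeUnit, wrapped to a signed 32-bit int.
function javaHashCode(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
  }
  return hash;
}

// Maps a key to a partition index in [0, partitions).
// The hash can be negative, so the remainder is shifted into the non-negative range.
function getPartitionByKey(key: string, partitions: number = PARTITION_COUNT): number {
  return ((javaHashCode(key) % partitions) + partitions) % partitions;
}

// Hypothetical helper: picks the partitioned queue and sets jobId = orderId,
// so a second enqueue for the same order targets the same queue with the same id.
function buildIssuanceJob(orderId: string): {
  queueName: string;
  options: { jobId: string };
} {
  return {
    queueName: `invoice-issuance-${getPartitionByKey(orderId)}`, // assumed naming scheme
    options: { jobId: orderId },
  };
}
```

The returned options would be passed as the third argument of BullMQ's `queue.add(name, data, opts)`. Because the mapping depends only on the order ID and the partition count, it is stable across processes and restarts, but any change to the count remaps existing orders.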
Retry policy comes from InvoiceProviderConfig.retryMetadata (default maxRetryCount = 3, retryDelayMinutes = [5, 15, 60]). Permanent 4xx (≠429) errors short-circuit to FAILED; exhausted/DLQ jobs flip the invoice to FAILED and write an audit row. Issuance worker concurrency is APP_ENV_INVOICE_ISSUANCE_WORKER_CONCURRENCY (default 10); claim-expiry is fixed at 3.
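One way to read the retry rules above is as a pure decision function. The sketch below is illustrative only: the `RetryMetadata` shape mirrors the two fields named in this ADR, while `RetryDecision`, `isPermanentError`, `decideRetry`, and the choice to reuse the last delay when the schedule is shorter than maxRetryCount are assumptions, not the behaviour of _handleIssuanceFailure.

```typescript
// Assumed shape of InvoiceProviderConfig.retryMetadata; defaults are taken from this ADR.
interface RetryMetadata {
  maxRetryCount: number;
  retryDelayMinutes: number[];
}

const DEFAULT_RETRY: RetryMetadata = {
  maxRetryCount: 3,
  retryDelayMinutes: [5, 15, 60],
};

// Hypothetical result type: either schedule another attempt or mark the invoice FAILED.
type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "fail"; reason: "permanent" | "exhausted" };

// Any 4xx status except 429 is treated as permanent.
function isPermanentError(httpStatus?: number): boolean {
  return (
    httpStatus !== undefined &&
    httpStatus >= 400 &&
    httpStatus < 500 &&
    httpStatus !== 429
  );
}

// retryCount is the number of retries already made for this invoice.
function decideRetry(
  retryCount: number,
  httpStatus?: number,
  policy: RetryMetadata = DEFAULT_RETRY,
): RetryDecision {
  if (isPermanentError(httpStatus)) {
    return { action: "fail", reason: "permanent" };
  }
  if (retryCount >= policy.maxRetryCount) {
    return { action: "fail", reason: "exhausted" };
  }
  // Assumption: reuse the last configured delay if the schedule runs out early.
  const delays = policy.retryDelayMinutes;
  const minutes = delays[Math.min(retryCount, delays.length - 1)] ?? 0;
  return { action: "retry", delayMs: minutes * 60 * 1000 };
}
```

With the defaults, a first transient failure is retried after 5 minutes, the second after 15, and the third after 60. A further failure is reported as exhausted, at which point the worker would flip the invoice to FAILED and write the audit row.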
Consequences
| Pros | Cons |
|---|---|
| Per-order serialization without global locks | Partition count (3) is a fixed constant |
| Horizontal scale via concurrency knob | Rebalancing partitions later changes order→partition mapping |
| Bounded backoff; permanent errors fail fast | Retry state lives on the invoice row (retryCount, metadata) |
| Claim-expiry as delayed jobs (no polling) | DLQ handling is bespoke per worker type |
Alternatives Considered
| Option | Pros | Cons | Why rejected |
|---|---|---|---|
| Synchronous issuance in the Kafka handler | Simplest | Blocks consumer; no retry isolation | Transient provider failures stall the pipeline |
| Single (unpartitioned) queue | Simple routing | No per-order affinity; concurrency risks double-issue | Loses serialization guarantee |
| Distributed lock per order | Explicit mutual exclusion | Lock contention + leak risk under failures | Partition hashing achieves it for free |
| Cron-poll for pending invoices only | No queue infra | High latency for REAL_TIME mode | Kept only for SCHEDULED mode |
References
- src/common/queues.ts (InvoiceQueuePartitions, getPartitionByKey, definitions)
- src/components/invoice-queue/component.ts (partitioned queues/workers, DLQ)
- src/services/invoice-issuance-queue.service.ts (enqueueIssuance, _handleIssuanceFailure)
- See also: API Events §3