Real-World Distributed Transactions & Saga Pattern Scenarios (Java Edition)

This is a step-by-step guide that teaches distributed transactions and Sagas the way a senior architect would explain to a new engineer. We keep the language simple and show concrete Java code for every idea.

1️⃣ Payment succeeded, inventory failed — how to keep data consistent?

Problem (plain words)

Order → Payment → Inventory. Payment captured successfully. Inventory reservation failed. We must not leave the system in a half-done state.

Why Saga, not a global transaction?

Global (XA/2PC) locks are fragile and slow in distributed systems.
Sagas use local transactions per service and “undo” steps (compensations) if something downstream fails.

How it works (orchestrated Saga)

Create order as PENDING.
Capture payment.
Reserve inventory.
If inventory fails, refund payment and cancel order.

Orchestrator state (mental model)

PENDING → PAID → RESERVED → CONFIRMED ↘ (if fail) REFUNDING → CANCELLED

Java code — Order Orchestrator (Spring Boot style)

@Service
@RequiredArgsConstructor
public class OrderSagaOrchestrator {
  private final PaymentClient paymentClient;
  private final InventoryClient inventoryClient;
  private final OrderRepository orderRepo;
  private final SagaStepLogRepository stepLogRepo;

  @Transactional
  public void startSaga(UUID orderId) {
    Order order = orderRepo.findById(orderId).orElseThrow();
    logStep(orderId, "CreateOrder", "DONE");

    try {
      paymentClient.capture(orderId, order.getAmount());
      logStep(orderId, "ProcessPayment", "DONE");

      inventoryClient.reserve(orderId, order.getItems());
      logStep(orderId, "ReserveInventory", "DONE");

      order.markConfirmed();
      orderRepo.save(order);
    } catch (Exception ex) {
      compensate(order);
    }
  }

  @Transactional
  protected void compensate(Order order) {
    UUID orderId = order.getId();
    if (wasDone(orderId, "ProcessPayment") && !wasDone(orderId, "RefundPayment")) {
      paymentClient.refund(orderId);
      logStep(orderId, "RefundPayment", "COMPENSATED");
    }
    order.markCancelled("Inventory failed");
    orderRepo.save(order);
  }

  private boolean wasDone(UUID orderId, String step) {
    return stepLogRepo.existsByOrderIdAndStepAndStatus(orderId, step, "DONE");
  }

  private void logStep(UUID orderId, String step, String status) {
    stepLogRepo.save(new SagaStepLog(orderId, step, status));
  }
}

Saga step log table (tracks progress)

CREATE TABLE saga_step_log (
  id UUID PRIMARY KEY,
  order_id UUID NOT NULL,
  step TEXT NOT NULL,
  status TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT now()
);

2️⃣ Kafka event published twice — consumers acted twice. What now?

Problem

At-least-once delivery can produce duplicates. We must make the consumer idempotent.

Approach

Each message has a stable eventId (UUID).
Consumer stores processed IDs in a table (or cache) and skips duplicates.

Java code — Idempotent Consumer (Spring Kafka)

@Component
@RequiredArgsConstructor
public class InventoryConsumer {
  private final ProcessedEventRepository processedRepo;
  private final InventoryService inventoryService;

  @KafkaListener(topics = "payment-events", groupId = "inventory")
  public void onPaymentCaptured(PaymentCapturedEvent evt) {
    if (processedRepo.existsByEventId(evt.getEventId())) return; // duplicate

    try {
      inventoryService.reserve(evt.getOrderId(), evt.getItems());
      processedRepo.save(new ProcessedEvent(evt.getEventId()));
    } catch (Exception e) {
      throw e; // let retry/DLQ handle
    }
  }
}

Transactional Outbox (producer side)

Write DB change and outbox row in the same transaction. A CDC tool publishes to Kafka.

@Entity
@Table(name = "outbox")
public class OutboxEvent {
  @Id UUID eventId;
  String aggregateId; // orderId or paymentId
  String type;        // e.g., PaymentCaptured
  @Lob String payload; // JSON
  Instant createdAt;
}

3️⃣ Three services, one business transaction — 2PC or Saga?

Prefer Saga. 2PC hurts availability and performance.
Use orchestration if you need clear control and monitoring.
Use choreography if steps are simple and loosely coupled (each service reacts to events).

Java code — Choreography example (events only)

// Payment publishes PaymentCaptured → Inventory listens and reserves → Order listens and confirms

4️⃣ Rollback step times out — how to reach a consistent end state?

Strategy

Retry with backoff and jitter.
After N failures, send to DLQ.
Keep Saga step status so operators can replay compensations from DLQ safely.

Java code — Retry and DLQ (Spring Kafka)

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaFactory(
    ConsumerFactory<String, String> cf, KafkaTemplate<String, String> template) {
  var factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
  factory.setConsumerFactory(cf);
  factory.setCommonErrorHandler(new DefaultErrorHandler(
      new DeadLetterPublishingRecoverer(template),
      new ExponentialBackOffWithMaxRetries(3)
  ));
  return factory;
}

5️⃣ Partial failure made data inconsistent — how to detect and fix?

Detect

Nightly job that compares authoritative sources (e.g., Orders vs Payments vs Inventory).
Audit log or event store to trace truth.

Fix

Produce corrective events (e.g., OrderCancelled, RefundPayment).
For hard cases, provide a small admin tool to mark and replay compensations.

Java code — Simple reconciliation job

@Component
@RequiredArgsConstructor
public class ReconciliationJob {
  private final OrderRepository orderRepo;
  private final PaymentRepository paymentRepo;
  private final InventoryRepository inventoryRepo;

  @Scheduled(cron = "0 0 * * * *")
  public void reconcile() {
    var orders = orderRepo.findAllPendingOlderThan(Duration.ofHours(1));
    for (Order o : orders) {
      boolean paid = paymentRepo.existsCaptured(o.getId());
      boolean reserved = inventoryRepo.existsReserved(o.getId());
      if (paid && !reserved) {
        // trigger compensation: publish event or call orchestrator to refund
      }
    }
  }
}

6️⃣ Orchestrator is a single point of failure — how to mitigate?

Run orchestrator instances behind a queue; each Saga instance is owned by one worker (sharding by sagaId).
Persist Saga state in DB so another instance can continue if one dies.
Support message replay; handlers must be idempotent.

Java code — Persisted Saga state (simplified)

@Entity
public class SagaInstance {
  @Id UUID sagaId;
  UUID orderId;
  String state; // PENDING, PAID, RESERVED, CONFIRMED, CANCELLED
  Instant updatedAt;
}

7️⃣ “Exactly once” across services — what is realistic?

Aim for “effectively once”.

Producer: Outbox + CDC.
Consumer: Idempotency + dedup table.
Optional: Store consumed offset with your write in one local DB transaction.

Java code — Store offset with business write

@Transactional
public void handle(Event evt) {
  if (offsetRepo.exists(evt.getPartition(), evt.getOffset())) return;
  businessWrite(evt);
  offsetRepo.save(new ConsumedOffset(evt.getPartition(), evt.getOffset()));
}

Extra scenario prompts (practice)

Compensation arrives out of order — enforce sequence numbers per aggregate and detect gaps.
Non-compensatable step (email sent) — use forward-only Saga; send correction email if needed.
Hot partition causes retry storms — add circuit breakers, rate limits, and per-key backoff.
Replay overwhelms downstream — throttle replay, separate catch-up queues.

End-to-end Java example (mini blueprint)

1) Outbox entity and save helper

@Entity
@Table(name = "outbox")
public class OutboxEvent {
  @Id UUID eventId;
  String type;
  String aggregateId;
  @Lob String payload;
  Instant createdAt = Instant.now();
}

@Transactional
public void capturePaymentAndOutbox(UUID orderId, BigDecimal amount) {
  paymentRepo.markCaptured(orderId, amount);
  OutboxEvent evt = new OutboxEvent();
  evt.eventId = UUID.randomUUID();
  evt.type = "PaymentCaptured";
  evt.aggregateId = orderId.toString();
  evt.payload = toJson(Map.of("orderId", orderId.toString()));
  outboxRepo.save(evt);
}

2) CDC publishes to Kafka (Debezium config not shown).

3) Inventory consumer (idempotent)

@KafkaListener(topics = "payment-events", groupId = "inventory")
public void onPaymentCaptured(String json) {
  PaymentCapturedEvent evt = parse(json);
  if (processedRepo.existsByEventId(evt.getEventId())) return;
  inventoryService.reserve(evt.getOrderId(), evt.getItems());
  processedRepo.save(new ProcessedEvent(evt.getEventId()));
}

4) Orchestrator state transitions stored in DB so crash recovery is safe.

If you remember three things: prefer Sagas over global locks, make every step idempotent, and keep an audit trail (outbox + logs). That’s how you build reliable distributed systems.

Real-World Distributed Transactions & Saga Pattern Scenarios