Implementing Smart Retries with Stripe

Network and connection issues can occur for various reasons, such as timeouts or intermittent server availability. Automatic recovery from network issues plays an important role when integrating with any API. When dealing with financial transactions, it’s vital to handle network issues and other errors gracefully. Stripe recommends implementing smart retry logic, especially for payment declines. In this blog post we’ll build reliable retries with exponential backoff and look at how to prevent charging a customer more than once.

This article is part of a comprehensive series. Be sure to check out the previous post that helps you understand the most common errors and exceptions when handling payments with Stripe.

Disclaimer: I am not affiliated with Stripe. All insights shared in this article are based on my personal experience and opinions.

Exponential Backoff with Jitter

Suppose you’ve just tried to charge for a purchase and there was a network glitch. Instead of giving up and throwing an error, it’s better to retry after a short delay, up to a maximum number of attempts. The wait time between attempts increases exponentially, with random jitter added so that many clients recovering from the same outage don’t all retry at the same moment and overload the server.

Here’s an example implementation. To promote good coding practices and create a setup for easy testing let’s have three separate components:

  • A reusable wrapper that allows any business logic to be retried with a reasonable number of attempts.
  • A client that communicates with Stripe via an SDK.
  • A service that leverages the client and translates Stripe responses into custom model objects.

Creating a Retry Wrapper

The wrapper takes a function as a parameter and executes it in a safe manner. Should the execution fail, the wrapper retries after a short delay, up to a fixed number of times. Eventually the execution either succeeds or fails with a runtime exception. The retry strategy begins with a brief pause and progressively extends the wait before each subsequent attempt, giving intermittent problems time to resolve on their own.

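public <T> T withRetry(
  int maxRetries, 
  int initialDelay, 
  StripeSupplier<T> supplier
) {
  int attempt = 0;
  // try-with-resources on an ExecutorService requires Java 19 or later
  try (var scheduler = Executors.newScheduledThreadPool(1)) {
    while (true) {
      try {
        return supplier.get();
      } catch (StripeException e) {
        // maxRetries caps the total number of attempts
        if (++attempt >= maxRetries) {
          throw new RuntimeException(
            "Operation failed after " + attempt + " attempts", e
          );
        }
        long waitTime = calculateDelay(initialDelay, 2.0, attempt, 0.1);
        System.out.println("Waiting for " + waitTime + " ms");
        try {
          // schedule() returns immediately; get() blocks until the delay has elapsed
          scheduler.schedule(() -> {}, waitTime, TimeUnit.MILLISECONDS).get();
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          throw new RuntimeException("Retry interrupted", ex);
        } catch (ExecutionException | RejectedExecutionException ex) {
          throw new RuntimeException("Retry attempt failed", ex);
        }
      }
    }
  }
}

Instead of using Thread.sleep(), I prefer to schedule the delay via ScheduledExecutorService. Note that schedule() returns immediately, so the loop calls get() on the returned future to block until the delay has elapsed before the next attempt. In this simple blocking form the gain over Thread.sleep() is modest, but the scheduler provides a more flexible way of managing multiple tasks as the wrapper grows.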
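To try the retry loop without calling Stripe, here is a minimal, self-contained sketch. The names RetryDemo, TransientException and CheckedSupplier are hypothetical stand-ins (for the enclosing class, StripeException and StripeSupplier respectively), and the delay simply doubles without jitter to keep the example short. The sketch blocks on the future returned by schedule(), so each delay actually elapses before the next attempt.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RetryDemo {
  /** Hypothetical stand-in for StripeException. */
  static class TransientException extends Exception {
    TransientException(String message) { super(message); }
  }

  /** Hypothetical stand-in for StripeSupplier: a supplier that may throw. */
  @FunctionalInterface
  interface CheckedSupplier<T> {
    T get() throws TransientException;
  }

  static <T> T withRetry(int maxAttempts, long initialDelayMs, CheckedSupplier<T> supplier) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    try {
      int attempt = 0;
      while (true) {
        try {
          return supplier.get();
        } catch (TransientException e) {
          if (++attempt >= maxAttempts) {
            throw new RuntimeException("Operation failed after " + attempt + " attempts", e);
          }
          long waitMs = initialDelayMs << (attempt - 1); // plain doubling, no jitter
          try {
            // get() blocks the calling thread until the scheduled no-op has run
            scheduler.schedule(() -> {}, waitMs, TimeUnit.MILLISECONDS).get();
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Retry interrupted", ex);
          } catch (ExecutionException ex) {
            throw new RuntimeException("Retry attempt failed", ex);
          }
        }
      }
    } finally {
      scheduler.shutdown(); // explicit shutdown also works on Java versions before 19
    }
  }

  /** Runs a supplier that fails the given number of times; returns how often it was called. */
  static int callsUntilSuccess(int failuresBeforeSuccess, int maxAttempts) {
    int[] calls = {0};
    withRetry(maxAttempts, 5, () -> {
      if (++calls[0] <= failuresBeforeSuccess) {
        throw new TransientException("simulated network glitch");
      }
      return "ok";
    });
    return calls[0];
  }

  public static void main(String[] args) {
    System.out.println("Calls made: " + callsUntilSuccess(2, 5));
  }
}
```

With two simulated failures and a budget of five attempts, the supplier is called three times: two failures followed by one success.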

Implementing Exponential Backoff

There are several variables at play:

  • initial delay: The delay for the first retry in milliseconds.
  • factor: The exponential factor by which the delay is multiplied after each attempt.
  • attempt: The number of the current retry (starting at 1). It drives the progressively increasing delay.
  • jitter: A factor (between 0 and 1) to calculate the range for the random part of the delay.

Implementation:

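// Source of randomness for the jitter (java.util.Random)
private final Random random = new Random();

private long calculateDelay(
  long initialDelay, 
  double factor, 
  int attempt, 
  double jitter
) {
  // Calculate exponential delay: initialDelay * factor^(attempt - 1)
  long delay = (long) (initialDelay * Math.pow(factor, attempt - 1));

  // Apply jitter: a random offset within +/- (delay * jitter)
  long jitterValue = (long) (
    delay * jitter * (random.nextDouble() - 0.5) * 2
  );
  // Never return a negative wait time
  return Math.max(0, delay + jitterValue);
}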

This method calculates the delay from the initial delay, the exponential factor, the current attempt count, and the jitter.

Math.pow(factor, attempt - 1) produces the exponential growth: the first retry (attempt = 1) waits roughly initialDelay, and each subsequent retry multiplies the previous delay by factor.
(random.nextDouble() - 0.5) * 2 maps the random value from [0, 1) to [-1, 1). Multiplied by delay * jitter, it adds or subtracts up to jitter × 100 percent of the delay (10% for jitter = 0.1), so the wait time varies around the exponentially growing base.
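To make the arithmetic concrete, the sketch below computes the window in which each delay must fall. The class name BackoffBounds and the initial delay of 100 ms are assumptions chosen for illustration; the factor of 2.0 and jitter of 0.1 match the values passed to calculateDelay in the wrapper.

```java
class BackoffBounds {
  /** Returns {min, max} delay in milliseconds for a 1-based retry attempt. */
  static long[] bounds(long initialDelay, double factor, int attempt, double jitter) {
    double base = initialDelay * Math.pow(factor, attempt - 1);
    long center = Math.round(base);
    long spread = Math.round(base * jitter); // largest possible jitter offset
    return new long[] {Math.max(0, center - spread), center + spread};
  }

  public static void main(String[] args) {
    for (int attempt = 1; attempt <= 4; attempt++) {
      long[] b = bounds(100, 2.0, attempt, 0.1);
      System.out.println("attempt " + attempt + ": " + b[0] + " to " + b[1] + " ms");
    }
  }
}
```

Under these assumptions the four windows are 90 to 110 ms, 180 to 220 ms, 360 to 440 ms and 720 to 880 ms. The windows never overlap as long as the jitter stays well below the growth factor, so the delays keep increasing even in the unluckiest case.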


This approach offers more flexibility in adjusting how the retry mechanism responds to consecutive failures, allowing for both aggressive and conservative retry strategies based on the context and specific needs of your application.

Testing

The retry wrapper entails quite a bit of complex logic. It’s a good idea to write automated tests to ensure that everything works as intended. I’ll skip implementation details for brevity, but you are welcome to review the tests in my GitHub repository for a deeper understanding.
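As an illustration only (this is not the code from the repository), one way to make the jitter testable is to pass the random source in, so a test can pin it to known values. The class name JitterTestSketch and the extra DoubleSupplier parameter are assumptions; the formula is the same as in calculateDelay.

```java
import java.util.function.DoubleSupplier;

class JitterTestSketch {
  /** Same formula as calculateDelay, with the random source injected (an assumption). */
  static long calculateDelay(long initialDelay, double factor, int attempt,
                             double jitter, DoubleSupplier random) {
    long delay = (long) (initialDelay * Math.pow(factor, attempt - 1));
    long jitterValue = (long) (delay * jitter * (random.getAsDouble() - 0.5) * 2);
    return Math.max(0, delay + jitterValue);
  }

  public static void main(String[] args) {
    // A random value of 0.5 cancels the jitter, exposing the pure exponential series
    for (int attempt = 1; attempt <= 4; attempt++) {
      System.out.println(calculateDelay(100, 2.0, attempt, 0.1, () -> 0.5));
    }
    // A random value of 0.0 yields the lower edge of the jitter window
    System.out.println(calculateDelay(100, 2.0, 1, 0.1, () -> 0.0));
  }
}
```

Pinning the random value this way lets a test assert exact delays instead of ranges, which keeps failures reproducible.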

Here’s sample output showing that the delays progressively increase, with an added element of randomness.

Waiting for 101 ms
Waiting for 207 ms
Waiting for 404 ms
Waiting for 728 ms

Ensuring Payments Happen “Exactly Once”

With idempotent requests, Stripe ensures that if a request to create a payment is made multiple times due to network hiccups or other failures, the charge will only be performed once. This prevents duplicate charges, which could occur if an API call is inadvertently repeated, either by retries or by mistake.

Since shielding against duplicates makes sense only in specific cases, Stripe’s POST requests aren’t idempotent by default. To use this feature in the Java SDK, you include an idempotency key in the request options. Here’s an example of how you could do this when creating a payment.

var idempotencyKey = UUID.randomUUID().toString();
var options = new RequestOptions.RequestOptionsBuilder()
                .setIdempotencyKey(idempotencyKey)
                .build();

An idempotency key must be unique per logical operation, and the Stripe SDK automatically passes it in the Idempotency-Key header on your behalf. However, it’s crucial to be cautious with retries! You may be inclined to generate a new key immediately before each payment request, as shown in the example above. This approach becomes problematic during retries. If fresh keys are generated within a retry loop, each attempt is treated as a separate, legitimate charge attempt because every key is different. Therefore, it’s essential that you generate the key upfront and pass its value as a parameter to the retry loop. That way, Stripe can reliably identify duplicated requests and ensure exactly-once semantics.
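The difference between the two approaches can be demonstrated without touching Stripe. In the sketch below, FakeGateway is a hypothetical in-memory stand-in for a payment API that deduplicates by idempotency key; it records the charge and then "loses" the first response to force a retry. None of these names come from the Stripe SDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class IdempotencyDemo {
  /** Hypothetical stand-in for a payment API that honours idempotency keys. */
  static class FakeGateway {
    private final Map<String, String> chargesByKey = new HashMap<>();
    private int calls = 0;

    String charge(String idempotencyKey, long amount) throws Exception {
      calls++;
      String chargeId = chargesByKey.get(idempotencyKey);
      if (chargeId == null) {
        // Unknown key: this is treated as a brand-new charge
        chargeId = "charge-" + (chargesByKey.size() + 1);
        chargesByKey.put(idempotencyKey, chargeId);
      }
      if (calls == 1) {
        // The charge was recorded, but the response never reached the caller
        throw new Exception("simulated timeout");
      }
      return chargeId;
    }

    int chargeCount() {
      return chargesByKey.size();
    }
  }

  /** Retries a charge and returns how many charges the gateway ended up creating. */
  static int chargesCreated(boolean freshKeyPerAttempt) {
    FakeGateway gateway = new FakeGateway();
    String upfrontKey = UUID.randomUUID().toString(); // generated once, outside the loop
    for (int attempt = 1; attempt <= 3; attempt++) {
      String key = freshKeyPerAttempt ? UUID.randomUUID().toString() : upfrontKey;
      try {
        gateway.charge(key, 1_000);
        break; // success, stop retrying
      } catch (Exception e) {
        // retry (backoff omitted for brevity)
      }
    }
    return gateway.chargeCount();
  }

  public static void main(String[] args) {
    System.out.println("key generated upfront:  " + chargesCreated(false) + " charge(s)");
    System.out.println("fresh key per attempt:  " + chargesCreated(true) + " charge(s)");
  }
}
```

With the key generated upfront, the retry maps onto the existing charge and only one charge exists. With a fresh key per attempt, the same scenario produces two charges.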

Handling Declined Payments

Efficient handling of declined payments is crucial for maintaining a good user experience on any e-commerce platform or online payment scenario. Stripe classifies the most common declines and makes them available in their API as enumerated codes.

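Payment declines are unrecoverable exceptions, hence we need to distinguish them from other errors and exit the retry loop. Since CardException is a subclass of StripeException, its catch block has to come before the generic one.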

try {
  return supplier.get();
} catch (CardException e) {
  // No point in retrying if the error is due to the card
  // Capture as many or as few card errors as needed
  // Prefer the issuer's decline code; fall back to the general error code
  String code = e.getDeclineCode() != null ? e.getDeclineCode() : e.getCode();
  // Guard against a missing code, since switching on null throws
  switch (code == null ? "" : code) {
    case "approve_with_id":
      throw new RuntimeException("The payment can’t be authorised.", e);
    case "expired_card":
      throw new RuntimeException("The card has expired.", e);
    case "card_not_supported":
      throw new RuntimeException(
        "The card does not support this type of purchase.", e
      );
    default:
      throw new RuntimeException("Unexpected card error", e);
  }
}
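How the decline check fits into the retry loop can be sketched as follows. ApiException and DeclineException are hypothetical stand-ins for StripeException and CardException, and the backoff delay is left out so that only the control flow is visible.

```java
class DeclineAwareRetry {
  /** Hypothetical stand-in for StripeException. */
  static class ApiException extends Exception {
    ApiException(String message) { super(message); }
  }

  /** Hypothetical stand-in for CardException. */
  static class DeclineException extends ApiException {
    DeclineException(String message) { super(message); }
  }

  @FunctionalInterface
  interface Call<T> {
    T run() throws ApiException;
  }

  static <T> T withRetry(int maxAttempts, Call<T> call) {
    int attempt = 0;
    while (true) {
      try {
        return call.run();
      } catch (DeclineException e) {
        // The more specific type is caught first: a decline will not fix itself
        throw new IllegalStateException("Payment declined: " + e.getMessage(), e);
      } catch (ApiException e) {
        if (++attempt >= maxAttempts) {
          throw new RuntimeException("Operation failed after " + attempt + " attempts", e);
        }
        // backoff delay omitted for brevity
      }
    }
  }

  /** Counts how often a call that always fails is invoked before the wrapper gives up. */
  static int callsBeforeGivingUp(ApiException failure, int maxAttempts) {
    int[] calls = {0};
    try {
      withRetry(maxAttempts, () -> {
        calls[0]++;
        throw failure;
      });
    } catch (RuntimeException expected) {
      // both outcomes surface as an unchecked exception
    }
    return calls[0];
  }

  public static void main(String[] args) {
    System.out.println("decline:   " + callsBeforeGivingUp(new DeclineException("expired_card"), 3));
    System.out.println("transient: " + callsBeforeGivingUp(new ApiException("timeout"), 3));
  }
}
```

A decline ends the loop after a single call, while a transient failure uses up all three attempts before the wrapper gives up.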

Summary

In this blog post we’ve delved into critical aspects of managing payments reliably with Stripe. Intermittent network issues can jeopardise the user experience and, more importantly, undermine your customers’ trust if they lead to duplicate charges. We’ve learned how to mitigate this risk with smart retry strategies and idempotent requests. We’ve also looked at when it makes sense to exit the retry loop: a card decline is a good reason to do so. Additionally, we explored how to identify various reasons for declines and respond accordingly.

We’ve made it almost to the end of our journey of learning Stripe. Next time we will conclude the series by expanding on best practices and scalability. Thank you for following along.

The source code included in this post is available on GitHub. Happy coding!

One Comment

  1. Can you explain how using ScheduledExecutorService, instead of Thread.sleep, here can make delays in executing the successive business function calls (i.e. supplier.get())?