When building web applications and services in Go, it's important to handle failed HTTP requests gracefully. Whether it's due to network issues, server outages, or rate limiting, your application needs to be resilient and able to recover from errors. One common technique is to retry the failed request, usually with some form of backoff.
In this guide, we'll take an in-depth look at how to implement retries for failed HTTP requests in Go. We'll cover the different types of errors you may encounter, how to use exponential backoff, setting a maximum number of retries, logging and monitoring, and some best practices to keep in mind. By the end, you'll have a solid understanding of how to make your Go applications more fault-tolerant.
Handling Different Types of Errors
The first step is distinguishing between the different types of errors that can occur when making an HTTP request:
Network errors – These include things like "no such host", "connection refused", or "network is unreachable". They indicate an issue with establishing a connection to the server.
Timeouts – If the server takes too long to respond, the client may time out the request. You can configure timeouts on the client side.
Server errors (5xx status codes) – A 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, etc. This means the server encountered an error in processing the request.
Rate limiting – Some APIs limit the number of requests a client can make in a certain period of time. Exceeding the limit may result in a 429 Too Many Requests error.
How you handle a failed request depends on the type of error. Server errors and timeouts are often good candidates for a retry, since the error may be temporary. Rate limiting errors may require more complex handling, like using an exponential backoff.
Exponential Backoff
Exponential backoff is an algorithm that progressively increases the wait time between retries. The idea is to give the system time to recover from the failure before sending another request. A simple exponential backoff implementation starts with an initial wait time, then doubles it for each subsequent retry.
Here's an example in Go:

```go
func retry(fn func() error, maxRetries int) error {
	var attempts int
	for {
		err := fn()
		if err == nil {
			return nil
		}
		attempts++
		if attempts >= maxRetries {
			return err
		}
		time.Sleep(time.Duration(math.Pow(2, float64(attempts))) * time.Second)
	}
}
```
This `retry` function takes another function `fn`, which performs the actual HTTP request. It calls `fn`, and if it returns an error, it sleeps for a duration that doubles with each attempt, then tries again. Once the number of attempts reaches `maxRetries`, it gives up and returns the last error.
You'd use it like this:

```go
err := retry(func() error {
	resp, err := http.Get("https://api.example.com/data")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("got status %d", resp.StatusCode)
	}
	return nil
}, 5)
```
Setting a Maximum Number of Retries
It's crucial to set a maximum number of retries to avoid infinitely retrying a permanently failed request. How many retries you should allow depends on your specific use case and requirements around availability vs cost/resources.
For user-facing requests where responsiveness is important, you may want a lower number of retries to fail fast. For background or batch processing jobs, you might allow more retries since a delay is more acceptable. Consider how a retry storm could affect your system resources.
A common pattern is to use an exponential backoff up to a maximum wait time, then retry at that fixed interval. For example:
```go
func retry(fn func() error, maxRetries int, maxWait time.Duration) error {
	var attempts int
	for {
		err := fn()
		if err == nil {
			return nil
		}
		attempts++
		if attempts >= maxRetries {
			return err
		}
		backoff := time.Duration(math.Pow(2, float64(attempts))) * time.Second
		if backoff > maxWait {
			backoff = maxWait
		}
		time.Sleep(backoff)
	}
}
```
Logging and Monitoring
When implementing retries, logging is your friend. You should log each failed attempt, including metadata like the attempt number, wait time, error, and request details. This will help with debugging and understanding your system's behavior.
For example:

```go
func retry(fn func() error, maxRetries int) error {
	var attempts int
	for {
		err := fn()
		if err == nil {
			return nil
		}
		attempts++
		log.Printf("Attempt %d failed: %s", attempts, err)
		if attempts >= maxRetries {
			// Wrap with %w so callers can unwrap the underlying error.
			return fmt.Errorf("max retries exceeded: %w", err)
		}
		time.Sleep(time.Duration(math.Pow(2, float64(attempts))) * time.Second)
	}
}
```
In addition to logging, you should set up monitoring and alerts. Track metrics like the number of failed requests, number of retries, and total time spent retrying. Alert on sudden spikes in failures or if the retry rate exceeds a threshold. This can indicate issues with a downstream dependency that may require further investigation.
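The standard library's `expvar` package is one lightweight way to expose such counters (they appear at `/debug/vars` when `net/http`'s default mux is served). A minimal sketch, with metric names that are purely illustrative:

```go
package main

import (
	"expvar"
	"fmt"
)

// Counters for retry behavior, exposed via expvar.
// The metric names here are illustrative, not a convention.
var (
	requestFailures = expvar.NewInt("http_request_failures")
	retryAttempts   = expvar.NewInt("http_retry_attempts")
)

func recordFailure() { requestFailures.Add(1) }
func recordRetry()   { retryAttempts.Add(1) }

func main() {
	recordFailure()
	recordRetry()
	recordRetry()
	fmt.Println(requestFailures.Value(), retryAttempts.Value()) // 1 2
}
```

In a real system you would likely feed these into a metrics backend such as Prometheus rather than expvar, but the idea is the same: count failures and retries where they happen, and alert on the trend.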
Alternatives to Retries
While retries are a valuable tool, they're not always the best solution. Here are some alternatives to consider:
Circuit breakers – A circuit breaker "opens" when failures exceed a threshold, preventing further requests to the failing service for a period of time. This gives the system time to recover and can help prevent cascading failures.
Fallbacks – If a request fails, you may be able to provide a fallback response instead. For example, if a recommendation service is down, you could return a default set of recommendations.
Returning cached data – If you have cached data available, you could return that instead of making a new request that's likely to fail.
Best Practices
Here are some best practices to keep in mind when implementing retries:
Idempotency – Retrying a request should be safe and not cause any unintended side effects. Ensure your requests are idempotent, meaning they can be safely repeated without changing the result.
Jitter – To avoid many clients retrying simultaneously, add some randomness to the backoff duration. This spreads out the retries and can help prevent spikes in traffic.
Client-side throttling – Track the rate of failures and if it exceeds a threshold, proactively start throttling requests on the client side. This can help avoid overloading a struggling service.
Conclusion
Retrying failed HTTP requests is an important technique for building resilient Go applications. By handling different types of errors, using exponential backoff, setting a maximum number of retries, logging and monitoring, and following best practices, you can greatly improve your application's fault tolerance.
However, retries are not a silver bullet. They should be used judiciously and in combination with other resiliency techniques like circuit breakers, fallbacks, and caching. The key is to strike a balance that maximizes availability while minimizing resource usage and potential downtime.
By taking the time to implement retries thoughtfully and holistically, you can create Go applications that are reliable, responsive, and able to weather the inevitable failures and outages that are part of running systems in the real world.