Infinite loops can bring any Node.js application grinding to a halt. One minute your service is humming along; the next, CPU is pinned at 100% and requests are timing out. Every developer dreads an infinite loop taking down production.
In this comprehensive guide, you'll learn how to debug these tricky issues by:
- Attaching a debugger to print stack traces of running Node processes
- Using Kubernetes liveness probes to automatically trigger debugging
- Persisting stack traces by mounting networked storage
Follow along and you'll gain the skills to diagnose and resolve even the most stubborn infinite loops in your Node.js production environment.
The Nightmare: An Infinite Loop Takes Down Production
Recently our team ran into an infinite loop causing chaos in our production environment.
We were running a Node.js express app in stateless Docker containers orchestrated by Kubernetes. Traffic flowed through an NGINX ingress.
            Internet
               |
        |------------|
        |  NGINX LB  |
        |------------|
               |
             [PODs]
            /      \
        Pod 1      Pod 2
        App 1      App 2
Our production architecture before the incident
Suddenly we noticed load balancer latency spiking. The app was still responding to health checks, but consumer requests were timing out.
When we checked the pods, one container's CPU usage was pinned at 100%. An infinite loop!
|  Looping   |
|   Pod 1    |
| <CPU 100%> |
|            |
An infinite loop crashed Pod 1
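If you need to confirm which pod is spinning before reaching for a debugger, kubectl can show per-pod CPU usage. A minimal sketch, assuming metrics-server is installed; the namespace and pod names here are illustrative:

# Sort pods by CPU to spot the one pegged at ~100% of a core
kubectl top pods -n production --sort-by=cpu

# Confirm from inside the suspect container (requires top in the image)
kubectl exec -it app-pod-1 -n production -- top -b -n 1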
Our usual debugging tools like --prof and --cpu-prof weren't enough: the loop occurred non-deterministically, so we needed to dig deeper.
We had to find a way to set a breakpoint in the running container's Node process and print a stack trace at the exact point where the loop occurs.
Debugging with GDB
The solution turned out to be the GNU debugger (GDB). By attaching it to our Node process, we could:
- Breakpoint the running JavaScript code
- Print a stack trace for diagnosing the loop
Here's how to use GDB to debug Node.js in production:
1. Install GDB
First, install gdb in your application Dockerfile:
RUN apt-get update && apt-get install -y gdb
This bundles it into your app container image.
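To sanity-check the image, you can rebuild it and confirm the debugger is present; the image tag below is just an example. Depending on your runtime's security profile, gdb may also need the SYS_PTRACE capability to attach (e.g. docker run --cap-add=SYS_PTRACE).

# Rebuild the image and verify gdb made it in (tag is illustrative)
docker build -t my-app:debug .
docker run --rm --entrypoint gdb my-app:debug --version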
2. Attach GDB to Your Node Process
Next, exec into the running container and attach gdb:
docker exec -it <container> bash
gdb -p <PID>
Replace <container> with your container ID or name, and <PID> with your Node process ID.
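If you're not sure which PID to use, you can look it up from inside the container; note that pgrep comes from the procps package, which slim base images may not include:

# List Node processes and their PIDs
ps aux | grep [n]ode

# Or grab the oldest process named exactly "node"
pgrep -o -x node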
Attaching opens an interactive GDB session connected to your live Node process.
3. Set a Breakpoint at the Stack Overflow
Within GDB, set a breakpoint to pause execution at the V8 stack overflow:
break v8::Isolate::HandleStackOverflow
continue
This pauses the Node process right at the moment the runaway code trips V8's stack-overflow handler.
4. Print the Stack Trace
With execution paused, print the stack trace to see where in your code the loop originates:
bt
The backtrace will show the call stack leading to the infinite loop.
For example, if your code has:
function loopForever() {
  while (true) {
    // Infinite loop!
  }
}

loopForever();
Your GDB session would then capture a simplified backtrace like:
#0 v8::Isolate::HandleStackOverflow
#1 loopForever()
#2 main()
Now you've pinpointed exactly where the loop is coming from!
A Quick GDB Example
Let's walk through a simple hands-on example:
1. Create an infinite loop in Node:

// index.js
const infiniteLoop = () => {
  while (true) {}
}

infiniteLoop();

2. Run it in Docker:

docker build -t debug-demo .
docker run -it debug-demo

3. Attach GDB in another terminal:

docker ps                  # Get the container ID
docker exec -it <ID> bash
gdb -p <PID>
break v8::Isolate::HandleStackOverflow
continue
bt                         # Print the stack trace
Once you hit the breakpoint and run bt, GDB will print something like:
#0 v8::Isolate::HandleStackOverflow
#1 infiniteLoop() at index.js:3
#2 main() at index.js:6
GDB pauses execution and prints the call stack, showing that infiniteLoop is causing the issue.
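If you'd rather not keep an interactive session open, the same steps can be driven non-interactively with gdb's -batch and -ex flags. This is only a sketch; <PID> is still your Node process ID, and the command blocks until the breakpoint is hit:

# Waits for the breakpoint, prints the backtrace, then detaches
gdb -p <PID> -batch \
  -ex "break v8::Isolate::HandleStackOverflow" \
  -ex "continue" \
  -ex "bt" \
  -ex "detach"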
Now you have hands-on experience debugging Node.js processes with GDB! Let's look at how to apply these techniques at scale in Kubernetes.
Leveraging Liveness Probes
With GDB we can breakpoint single Node processes. But how do you automatically debug across a cluster of containers?
The answer lies in Kubernetes liveness probes. By combining liveness checks with our GDB script, we can auto-debug infinite loops across pods.
Here's how to set it up:
1. Understanding Liveness Probes
Kubernetes uses liveness probes to track container health. For example:
livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 5     # Check every 5 seconds
  failureThreshold: 4  # Restart after 4 consecutive failures
This performs an HTTP GET against /healthz every 5 seconds, tolerating 3 failures before the pod is restarted.
When the 4th failure occurs, Kubernetes considers the pod "unhealthy" and kills it.
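You can watch this happening from the outside with kubectl; the <pod-name> placeholder below stands in for your actual pod:

# Probe configuration as Kubernetes sees it
kubectl describe pod <pod-name> | grep -A 5 "Liveness"

# Cluster events emitted when a probe fails
kubectl get events --field-selector reason=Unhealthy

# How many times the container has been restarted
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].restartCount}'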
2. Using Probe Failures to Trigger GDB
We can use these failure events to automatically run GDB!
For example, create a wrapper script for your probe:
#!/bin/bash
FAILURE_FILE=/tmp/probe_failures

# Do the normal health check
if ! curl -sf http://localhost:3000/healthz > /dev/null; then
  # Persist the failure count across probe runs (each run starts a fresh shell)
  failures=$(cat "$FAILURE_FILE" 2>/dev/null)
  failures=$((failures + 1))
  echo "$failures" > "$FAILURE_FILE"
  if [ "$failures" -ge 4 ]; then
    # Attach GDB on the 4th consecutive failure
    gdb -p <PID> -batch -x /gdbscript.gdb
  fi
  exit 1  # Report the failure to Kubernetes
fi

rm -f "$FAILURE_FILE"  # Healthy again: reset the counter
exit 0
- Run the normal health check
- If it fails, bump a failure counter stored in a file (each probe run starts a fresh shell, so a plain variable would reset) and exit non-zero so Kubernetes still registers the failure
- On the 4th consecutive failure, attach GDB with the debug script
Now liveness probe failures trigger gdb to attach and debug your app!
3. Updating the Liveness Probe Definition
Point your livenessProbe at this wrapper script:
livenessProbe:
  exec:
    command:
      - /bin/bash
      - /wrapperscript.sh
  # Other probe parameters
Now Kubernetes will run your script containing the GDB debugging logic.
Whenever the loop causes failures, GDB launches automatically!
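Before relying on it in production, it's worth dry-running the wrapper inside a pod and checking its exit code, since Kubernetes treats any non-zero exit from an exec probe as a failure (the <pod-name> placeholder is illustrative):

# Run the wrapper by hand and inspect the exit status
kubectl exec <pod-name> -- /bin/bash /wrapperscript.sh
echo $?   # 0 = healthy, non-zero = counted as a probe failure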
Full Example
Here's a full example of a livenessProbe wrapping GDB debugging:
gdbscript.gdb:
break v8::Isolate::HandleStackOverflow
continue
bt
liveness.sh:
#!/bin/bash
FAILURE_FILE=/tmp/probe_failures

if ! curl -sf http://localhost:3000/healthz > /dev/null; then
  failures=$(cat "$FAILURE_FILE" 2>/dev/null)
  failures=$((failures + 1))
  echo "$failures" > "$FAILURE_FILE"
  if [ "$failures" -ge 4 ]; then
    gdb -p <PID> -batch -x /gdbscript.gdb
  fi
  exit 1
fi

rm -f "$FAILURE_FILE"
exit 0
deployment.yaml:
livenessProbe:
  exec:
    command:
      - /bin/bash
      - /liveness.sh
  periodSeconds: 5
  failureThreshold: 4
Now your liveness probe automatically debugs any infinite loops!
Persisting Stack Traces
With the above setup, if your app enters an infinite loop the liveness probe will trigger GDB and print a stack trace.
But because the pod is killed shortly after, the stack trace output is lost when the container restarts.
To persist the traces, we need to store them externally using networked storage. Kubernetes offers some great options for attaching shared volumes to save your data.
Using PersistentVolumes
A PersistentVolume (PV) provides networked storage for Kubernetes pods. Some options include:
- EBS Volumes (Elastic Block Store)
- EFS (Elastic File System)
- iSCSI
- NFS (Network File System)
For example, to mount an EFS volume:
- Create the EFS filesystem
- Create a PV and PersistentVolumeClaim (PVC)
- Mount the PVC into your pod
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gdb-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <file-system-id>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gdb-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs
  resources:
    requests:
      storage: 1Gi
---
# Pod spec
volumes:
  - name: debugger-storage
    persistentVolumeClaim:
      claimName: gdb-pvc
# Container spec
volumeMounts:
  - name: debugger-storage
    mountPath: /debugger
Now your pod can mount the PV for external storage!
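A quick way to apply and verify the storage objects; the file names are illustrative, while the object names match the manifests above:

# Create the PV and PVC, then confirm the claim binds
kubectl apply -f gdb-pv.yaml -f gdb-pvc.yaml
kubectl get pv gdb-pv
kubectl get pvc gdb-pvc   # STATUS should read "Bound" before the pod mounts it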
Saving Stack Traces
With networked storage attached, you can save your GDB traces:
# liveness.sh
gdb -p <PID> -batch -x /gdbscript.gdb | tee /debugger/stacktrace.txt
This pipes the GDB output into stacktrace.txt on the mounted volume.
On the next failure, your liveness probe will:
- Attach GDB and print the stack
- Save the trace to your persistent volume
So even when the pod restarts, you keep the debugging data!
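Once a trace has been written, you can pull it back out of the shared volume even after the original pod is gone. The deployment name below is illustrative; the path matches the /debugger mount above:

# Read the trace from any pod that mounts the same PVC
kubectl exec deploy/my-app -- cat /debugger/stacktrace.txt

# Or copy it to your machine for offline analysis
kubectl cp <pod-name>:/debugger/stacktrace.txt ./stacktrace.txt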
Real-World Example
Let's walk through a real-world example of how we used these techniques:
Our Kubernetes cluster was hosted on AWS, so we used EFS volumes for storage.
Dockerfile
RUN apt-get update && apt-get install -y gdb
liveness.sh
#!/bin/bash
if ! curl -sf http://localhost:3000/healthz > /dev/null; then
  # Increment the persisted failure counter
  ...
  # On the 4th failure, attach GDB and save the trace to EFS
  if [ "$failures" -ge 4 ]; then
    gdb -p <PID> -batch -x /gdbscript.gdb | tee /efs/stacktrace.txt
  fi
  exit 1
fi
deployment.yaml
# Container spec
volumeMounts:
  - name: debugger-efs
    mountPath: /efs

# Pod spec
volumes:
  - name: debugger-efs
    persistentVolumeClaim:
      claimName: efs-pvc  # EFS-backed storage

# Liveness probe
livenessProbe:
  exec:
    command:
      - /bin/bash
      - /liveness.sh
  failureThreshold: 4
This gave us automated debugging powered by liveness checks. Whenever the loop caused failures, it would:
- Trigger GDB with a stack trace
- Pipe the output to persistent EFS storage
We eventually found the culprit was a bug in a third-party library we imported. After patching the dependency, our infinite-loop nightmare was finally over!
Key Takeaways
Here are the key techniques for debugging infinite loops in Node.js production:
- Use GDB to breakpoint and print stack traces of running Node processes
- Leverage liveness probes to automatically trigger GDB on failure
- Persist data by piping traces to networked volumes like EFS
While the examples use Docker and Kubernetes, the same principles apply to any production environment:
- Attach a debugger like GDB to pause and inspect
- Automate debugging with health checks
- Externalize data so it persists across restarts
With these strategies, you can quickly diagnose and resolve even the most painful infinite loops. No more crashes in production!
I hope these techniques help you debug tricky infinite loops. Let me know in the comments if you have any other tips or tricks!