How to Debug an Infinite Loop in Node.js Production Code

Infinite loops can bring any Node.js application grinding to a halt. One minute your service is humming along, and the next, CPU is pinned at 100% and requests are timing out. Every developer dreads an infinite loop taking down production.

In this comprehensive guide, you'll learn how to debug these tricky issues by:

Attaching a debugger to print stack traces of running Node processes
Using Kubernetes liveness probes to automatically trigger debugging
Persisting stack traces by mounting networked storage

Follow along and you'll gain the skills to diagnose and resolve even the most stubborn infinite loops in your Node.js production environment.

The Nightmare: An Infinite Loop Takes Down Production

Recently our team ran into an infinite loop causing chaos in our production environment.

We were running a Node.js Express app in stateless Docker containers orchestrated by Kubernetes. Traffic flowed through an NGINX ingress.

      Internet
         |
  |------------|
  |  NGINX LB  |
  |------------|
         |
      [Pods]
      /    \
  Pod 1    Pod 2
  App 1    App 2

Our production architecture before the incident

Suddenly we noticed load balancer latency spiking. The app was still responding to health checks, but consumer requests were timing out.

Checking the pods, one container's CPU usage had peaked at 100%. An infinite loop!

|------------|
|   Pod 1    |
|  Looping   |
| CPU: 100%  |
|------------|

An infinite loop pinned Pod 1 at 100% CPU
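
Why does one spinning loop cause timeouts? Node.js runs your JavaScript on a single thread, so while a synchronous loop is running, no queued callback (timers, incoming requests) gets a turn. The sketch below illustrates this with a short, bounded busy-wait. The busyWait helper and the 50 ms figure are made up for illustration; a real infinite loop would simply never hand control back.

```javascript
// Illustrative only: a bounded busy-wait stands in for the infinite loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    iterations += 1; // burns CPU without yielding to the event loop
  }
  return iterations;
}

let timerFired = false;
setTimeout(() => { timerFired = true; }, 0); // queued, like a pending request

busyWait(50);

// The callback has been due for about 50 ms, but it cannot run until
// this synchronous code finishes.
console.log('timer fired during busy loop?', timerFired);
```

Running this prints false for the flag: the timer callback only runs after the synchronous code returns, which is the same reason pending requests time out behind a loop that never ends.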

Our usual debugging tools like --prof and --cpu-prof weren't enough: the loop occurred non-deterministically, so we needed to dig deeper.

We had to find a way to set a breakpoint in the running container and print the stack trace at the exact point where the loop occurs.

Debugging with GDB

The solution turned out to be the GNU debugger (GDB). By attaching it to our Node process, we could:

Set a breakpoint in the running process
Print a stack trace for diagnosing the loop

Here's how to use GDB to debug Node.js in production:

1. Install GDB

First, install gdb in your application Dockerfile:

RUN apt-get update && apt-get install -y gdb

This bundles GDB into your app container image. The apt-get command assumes a Debian- or Ubuntu-based base image.

2. Attach GDB to Your Node Process

Next, exec into the running container and attach gdb:

docker exec -it <container> bash
gdb -p <PID>

Replace <container> with your container ID or name, and <PID> with your Node process ID.

Attaching opens an interactive GDB session connected to your live Node process.
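
Finding the right PID is easier if the app reports it. Here is a minimal sketch, assuming you can add a startup log line to your service. describeProcess is a hypothetical helper; process.pid and process.version are standard Node.js properties.

```javascript
// Hypothetical helper: builds a startup line that includes the PID
// you would later pass to `gdb -p`.
function describeProcess(proc = process) {
  return `node ${proc.version} running with pid ${proc.pid}`;
}

console.log(describeProcess());
```

In many single-process containers the Node process ends up as PID 1, but that depends on how the image starts it, so it is safer to check than to assume.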

3. Set a Breakpoint at the Stack Overflow

Within GDB, set a breakpoint to pause execution at the V8 stack overflow:

break v8::Isolate::HandleStackOverflow
continue

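This stops the Node process when runaway recursion triggers V8's stack overflow handling. Two caveats apply. First, the symbol must exist in the V8 build bundled with your Node binary, and GDB will tell you if it cannot resolve it. Second, a flat loop such as while (true) {} never overflows the stack, so the breakpoint will not fire for it. Attaching already pauses the process, so in that case you can run bt immediately.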
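
To see what a stack overflow looks like from the JavaScript side, here is a small sketch (the function names are illustrative). Unbounded recursion adds a frame per call until V8 gives up and throws a RangeError.

```javascript
// Unbounded recursion: every call adds a frame until V8 runs out of stack.
function recurseForever(depth) {
  return recurseForever(depth + 1) + 1;
}

// Returns true when the runaway recursion ended in a stack overflow.
function overflowsStack() {
  try {
    recurseForever(0);
    return false;
  } catch (err) {
    return err instanceof RangeError; // "Maximum call stack size exceeded"
  }
}

console.log('recursion overflowed the stack:', overflowsStack());
```

In application code this error is often swallowed by a catch block or retried, which is one way recursion bugs end up looking like a hang rather than a crash.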

4. Print the Stack Trace

With execution paused, print the stack trace to see where in your code the loop originates:

bt

The backtrace will show the call stack leading to the infinite loop.

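For example, if your code has:

function spinForever() {
  while (true) {
    // Infinite loop!
  }
}

spinForever();

A simplified, illustrative view of what your GDB session would capture (real GDB output also includes addresses and native V8 frames):

#0 loop (cpu=100%) 
#1 spinForever()
#2 main()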

Now you've pinpointed exactly where the loop is coming from!

A Quick GDB Example

Let's walk through a simple hands-on example:

Create an infinite loop in Node:

 // index.js

 const infiniteLoop = () => {
   while(true) {}
 }

 infiniteLoop();

Run it in Docker:

 docker build -t debug-demo .
 docker run -it debug-demo

Attach GDB in another terminal:

 docker ps # Get container ID
 docker exec -it <ID> bash

 gdb -p <PID>
 break v8::Isolate::HandleStackOverflow
 continue 

 bt # Print stack trace

Once the breakpoint is hit and you run bt, GDB will print something like this (simplified for illustration):

#0 v8::Isolate::HandleStackOverflow 
#1 infiniteLoop() at index.js:4
#2 main() at index.js:7

GDB pauses execution and prints the call stack, showing that infiniteLoop is causing the issue.

Now you have hands-on experience with debugging Node.js processes with GDB! Let's look at how to apply these techniques at scale in Kubernetes.

Leveraging Liveness Probes

With GDB we can debug a single Node process. But how do you automatically debug across a cluster of containers?

The answer lies in Kubernetes liveness probes. By combining liveness checks with our GDB script, we can auto-debug infinite loops across pods.

Here's how to set it up:

1. Understanding Liveness Probes

Kubernetes uses liveness probes to track container health. For example:

livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 5 # Check every 5 seconds
  failureThreshold: 4 # Restart after 4 consecutive failures

This sends an HTTP GET to /healthz every 5 seconds and tolerates 3 consecutive failures.

On the 4th consecutive failure, within about 20 seconds of the app ceasing to respond, Kubernetes considers the container unhealthy and restarts it.
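
On the application side, the probe needs an endpoint to hit. Here is a minimal sketch using Node's built-in http module, assuming the /healthz path from the probe above. handleRequest is a hypothetical name, and an Express app would register an equivalent route instead. Because the handler runs on the same thread as the rest of your code, it stops answering as soon as that thread is stuck in a loop, which is what makes the probe fail.

```javascript
const http = require('http');

// Hypothetical handler: answers GET /healthz with 200, anything else with 404.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/healthz') {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok');
    return;
  }
  res.writeHead(404, { 'Content-Type': 'text/plain' });
  res.end('not found');
}

// createServer only builds the server object; the real app would also
// call server.listen(3000) to match the probe's port.
const server = http.createServer(handleRequest);
```

Keeping the handler this cheap is deliberate: a health check that does real work can fail for reasons unrelated to a stuck event loop.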

2. Using Probe Failures to Trigger GDB

We can use these failure events to automatically run GDB!

For example, create a wrapper script for your probe:

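#!/bin/bash

# Each probe run is a fresh process, so keep the failure count in a file
COUNT_FILE=/tmp/liveness_failures

# Do normal health check; -f fails on HTTP errors, --max-time avoids hanging
if curl -sf --max-time 2 http://localhost:3000/healthz > /dev/null; then
  echo 0 > "$COUNT_FILE"
  exit 0
fi

failures=$(( $(cat "$COUNT_FILE" 2>/dev/null || echo 0) + 1 ))
echo "$failures" > "$COUNT_FILE"

if [ "$failures" -ge 4 ]; then
  # Attach GDB on 4th failure; -batch exits once the script finishes
  gdb -p <PID> -batch -x /gdbscript.gdb
fi

# Report the failure to Kubernetes
exit 1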

Do the normal health check
If it fails, increment a counter
On 4 failures, run the GDB script

Now liveness probe failures trigger gdb to attach and debug your app!
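
The decision logic behind those three steps can be sketched as a small pure function. nextProbeState and its field names are hypothetical. Note that each probe invocation is a fresh process, so the previous failure count has to come from somewhere persistent, such as a file.

```javascript
// Hypothetical pure function mirroring the wrapper's three steps.
function nextProbeState(previousFailures, healthy, threshold = 4) {
  if (healthy) {
    return { failures: 0, attachDebugger: false }; // success resets the count
  }
  const failures = previousFailures + 1;
  return { failures, attachDebugger: failures >= threshold };
}

// Simulate four failed probes in a row.
let state = { failures: 0, attachDebugger: false };
for (let i = 0; i < 4; i += 1) {
  state = nextProbeState(state.failures, false);
}
console.log(state);
```

After the fourth consecutive failure the state is { failures: 4, attachDebugger: true }, and any successful check in between resets the count to zero.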

3. Updating the Liveness Probe Definition

Point your livenessProbe to this wrapper script:

livenessProbe:
  exec:
    command:
      - /bin/bash
      - /wrapperscript.sh 
  # Other probe parameters

Now Kubernetes will run your script containing the GDB debugging logic.

Whenever the loop causes failures, GDB launches automatically!

Full Example

Here's a full example livenessProbe wrapping GDB debugging:

gdbscript.gdb:

break v8::Isolate::HandleStackOverflow
continue
bt

liveness.sh:

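#!/bin/bash

COUNT_FILE=/tmp/liveness_failures # Persists the count between probe runs

if curl -sf --max-time 2 http://localhost:3000/healthz > /dev/null; then
  echo 0 > "$COUNT_FILE"
  exit 0
fi

failures=$(( $(cat "$COUNT_FILE" 2>/dev/null || echo 0) + 1 ))
echo "$failures" > "$COUNT_FILE"

if [ "$failures" -ge 4 ]; then
  gdb -p <PID> -batch -x /gdbscript.gdb
fi

exit 1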

deployment.yaml:

livenessProbe:
  exec:
    command:
      - /bin/bash
      - /liveness.sh
  periodSeconds: 5
  failureThreshold: 4

Now your liveness probe automatically debugs any infinite loops!

Persisting Stack Traces

With the above setup, if your app enters an infinite loop the liveness probe will trigger GDB and print a stack trace.

But because the pod is killed shortly after, the stack trace output is lost when the container restarts.

To persist the traces, we need to store them externally using networked storage. Kubernetes offers some great options for attaching shared volumes to save your data.

Using PersistentVolumes

A PersistentVolume (PV) provides networked storage for Kubernetes pods. Some options include:

EBS Volumes (Elastic Block Store)
EFS (Elastic File System)
iSCSI
NFS (Network File System)

For example, to mount an EFS volume:

Create the EFS filesystem
Create a PV and PersistentVolumeClaim (PVC)
Mount the PVC into your pod

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gdb-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <file-system-id>

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gdb-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs 
  resources:
    requests:
      storage: 1Gi

---  

# Pod spec
volumes:
  - name: debugger-storage
    persistentVolumeClaim:
      claimName: gdb-pvc

# Container spec
volumeMounts:
  - name: debugger-storage
    mountPath: /debugger

Now your pod can mount the PV for external storage!

Saving Stack Traces

With networked storage attached, you can save your GDB traces:

# liveness.sh

gdb -p <PID> -batch -x /gdbscript.gdb | tee /debugger/stacktrace.txt

This pipes the GDB output into stacktrace.txt on the mounted volume.

On the next failure, your liveness probe will:

Attach GDB and print the stack
Save the trace to your persistent volume

So even when the pod restarts, you keep the debugging data!
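
One thing to watch: tee without -a overwrites stacktrace.txt on every run, and several pods sharing the same volume would write to the same file. A hedged sketch of a naming scheme that avoids both problems follows. traceFilePath is a hypothetical helper, and the pod name is assumed to be available to whatever calls it.

```javascript
const path = require('path');

// Hypothetical helper: one file per capture, named by pod and UTC timestamp.
function traceFilePath(baseDir, podName, when = new Date()) {
  // Colons are awkward in file names on some filesystems, so replace them.
  const stamp = when.toISOString().replace(/:/g, '-');
  // posix.join because the path is meant for a Linux container.
  return path.posix.join(baseDir, `${podName}-${stamp}.txt`);
}

console.log(traceFilePath('/debugger', 'pod-1', new Date(Date.UTC(2024, 0, 2, 3, 4, 5))));
```

For the fixed date above this yields /debugger/pod-1-2024-01-02T03-04-05.000Z.txt. The same idea can be expressed directly in the shell script with date and the pod's hostname.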

Real-World Example

Let's walk through a real-world example of how we used these techniques:

Our Kubernetes cluster was hosted on AWS, so we used EFS volumes for storage.

Dockerfile

RUN apt-get update && apt-get install -y gdb

liveness.sh

#!/bin/bash

response=$(curl -s http://localhost:3000/healthz || exit 1)

if [ $? != 0 ]; then

  # Increment failure counter
  ...

  # On 4th failure, attach GDB
  if [ "$failures" -ge 4 ]; then

    gdb -p <PID> -batch -x /gdbscript.gdb | tee /efs/stacktrace.txt

  fi

fi

deployment.yaml

# Container spec

volumeMounts:
  - name: debugger-efs
    mountPath: /efs

# Pod spec

volumes:
  - name: debugger-efs
    persistentVolumeClaim:
      claimName: efs-pvc # EFS storage

# Liveness probe

livenessProbe:
  exec:
    command:
      - /bin/bash
      - /liveness.sh
  failureThreshold: 4

This gave us automated debugging powered by liveness checks. Whenever the loop caused failures, it would:

Trigger GDB with a stack trace
Pipe the output to persistent EFS storage

We eventually found the culprit was a buggy library import. After patching the dependency, our infinite loop nightmare was finally solved!

Key Takeaways

Here are the key techniques for debugging infinite loops in Node.js production:

Use GDB to set breakpoints and print stack traces of running Node processes
Leverage liveness probes to automatically trigger GDB on failure
Persist data by piping traces to networked volumes like EFS

While the examples use Docker and Kubernetes, the same principles apply to any production environment:

Attach a debugger like GDB to pause and inspect
Automate debugging with health checks
Externalize data so it persists across restarts

With these strategies, you can quickly diagnose and resolve even the most painful infinite loops. No more crashes in production!

I hope these techniques help you debug tricky infinite loops. Let me know in the comments if you have any other tips or tricks!