Correlate Heap Spikes with Application Events
This troubleshooting guide shows how to use performance monitoring and trend analysis to correlate a heap spike with application events in the Nodinite JMX Monitoring Agent.
Common Causes of Heap Spikes
Understanding the root cause of a heap spike helps you determine whether action is needed:
| Cause | Heap Pattern | Resolution |
|---|---|---|
| Deployment | Sharp spike, then stabilizes | Expected: allow the JVM to warm up |
| Batch Job | Spike during the job, drops afterward | Expected: tune the Duration setting |
| Traffic Surge | Gradual increase, then drops | Expected: may need heap tuning |
| Memory Leak | Slow, continuous increase that never drops | Action required: investigate code |
| Garbage Collection Pressure | Repeated spikes with minimal drops | Action required: tune GC settings |
| CPU Spike | Heap flat but CPU spikes | Action required: analyze thread dump |
Correlation Workflow
Use the following workflow to investigate a heap spike alert:
Diagram: Workflow for correlating a heap memory spike alert with application events to determine root cause.
Correlation Steps
Step 1: Note the Alert Timestamp
When a heap spike alert fires in Nodinite, note the exact timestamp. This is your anchor point for correlation.
Step 2: Check Deployment and Batch Schedules
Compare the alert timestamp against:
- Scheduled deployments and releases
- Scheduled batch jobs and data processing tasks
- Maintenance windows and JVM restarts
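The comparison in this step can be scripted once the schedule is available as data. The following is a minimal sketch, not part of Nodinite: the `ScheduledEvent` type, the `events_near_alert` helper, the event names, and the 10-minute tolerance are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ScheduledEvent(NamedTuple):
    """A known application event with a start and end time (hypothetical type)."""
    name: str
    start: datetime
    end: datetime


def events_near_alert(alert_time, events, tolerance=timedelta(minutes=10)):
    """Return names of events whose window, widened by tolerance, contains the alert."""
    return [
        e.name
        for e in events
        if e.start - tolerance <= alert_time <= e.end + tolerance
    ]


# Hypothetical schedule entries, for illustration only.
schedule = [
    ScheduledEvent("nightly-batch", datetime(2024, 5, 1, 2, 0), datetime(2024, 5, 1, 2, 45)),
    ScheduledEvent("release-deploy", datetime(2024, 5, 1, 14, 0), datetime(2024, 5, 1, 14, 15)),
]

alert = datetime(2024, 5, 1, 2, 50)  # timestamp noted in Step 1
matches = events_near_alert(alert, schedule)
print(matches)  # ['nightly-batch']
```

The tolerance allows for memory that stays elevated shortly after an event ends. An empty result means no known event explains the spike, so the investigation continues with the next steps.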
Step 3: Review Historical Heap Trends
Use the Nodinite historical monitoring data to review heap trends over the past hour, day, or week. A trend that keeps rising and never recovers after garbage collection points to a memory leak and, if left unaddressed, eventual heap exhaustion.
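If the heap samples are exported for offline review, one rough way to test for a trend without recovery is to compare the low points of consecutive windows. The sketch below is a heuristic, not a Nodinite feature: the function names, the window size of 4, and the sample values (heap used, in MB) are assumptions.

```python
def window_minimums(samples, window):
    """Lowest heap value in each consecutive, full window of samples."""
    full = len(samples) - len(samples) % window
    return [min(samples[i:i + window]) for i in range(0, full, window)]


def baseline_keeps_rising(samples, window=4):
    """True when at least three windows exist and each low point exceeds the previous one."""
    lows = window_minimums(samples, window)
    return len(lows) >= 3 and all(b > a for a, b in zip(lows, lows[1:]))


# Hypothetical heap-used samples in MB, for illustration only.
leak_like = [400, 520, 610, 450, 480, 600, 690, 530, 560, 680, 770, 610]
batch_like = [400, 410, 900, 950, 920, 420, 405, 410, 400, 415, 405, 410]

print(window_minimums(leak_like, 4))      # [400, 480, 560]
print(baseline_keeps_rising(leak_like))   # True
print(baseline_keeps_rising(batch_like))  # False
```

Comparing low points rather than peaks keeps a single large but temporary spike, such as a batch job, from being mistaken for a leak.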
Step 4: Capture a JVM Thread Dump
If the heap remains elevated and no application event explains the spike, capture a JVM thread dump to see what each thread is doing. Busy, blocked, or long-running threads often point to the code path behind a CPU spike or heavy memory use:
jstack <PID> > threaddump.txt
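A quick first pass over the dump is to count threads by state. The sketch below assumes the dump contains the `java.lang.Thread.State:` lines that `jstack` prints for Java threads. The `count_thread_states` helper and the shortened sample dump are illustrative, not real output from a monitored JVM.

```python
import re
from collections import Counter

# Matches the state line printed under each Java thread in a jstack dump.
STATE_LINE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text):
    """Count threads per state (RUNNABLE, BLOCKED, WAITING, ...) in a thread dump."""
    return Counter(STATE_LINE.findall(dump_text))


# Shortened, made-up dump excerpt for illustration only.
sample_dump = '''
"http-worker-1" #21 daemon prio=5 tid=0x01 nid=0x1a runnable
   java.lang.Thread.State: RUNNABLE
"http-worker-2" #22 daemon prio=5 tid=0x02 nid=0x1b waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
"scheduler" #23 prio=5 tid=0x03 nid=0x1c waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"http-worker-3" #24 daemon prio=5 tid=0x04 nid=0x1d runnable
   java.lang.Thread.State: RUNNABLE
'''

states = count_thread_states(sample_dump)
print(states["RUNNABLE"], states["BLOCKED"], states["TIMED_WAITING"])  # 2 1 1
```

To apply this to a real capture, read `threaddump.txt` and pass its contents to `count_thread_states`. A large number of `BLOCKED` threads, or many `RUNNABLE` threads in the same stack frame, is usually where to look first.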
Step 5: Analyze Heap Dump if Required
For persistent memory leaks, capture a heap dump for analysis with tools such as Eclipse MAT or VisualVM:
jcmd <PID> GC.heap_dump /tmp/heapdump.hprof
Adjusting Alert Configuration After Correlation
Once you identify the cause:
- Batch job spikes: Increase the Duration setting in the JMX Configuration to filter expected spikes
- Traffic-related spikes: Adjust the Warning Threshold to accommodate peak usage patterns
- Memory leaks: Fix the underlying code issue; raising thresholds, even temporarily, only masks the problem
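To see why a longer Duration filters out expected batch spikes, consider a simplified model in which an alert fires only after usage has stayed above the threshold for enough consecutive polls. This is an illustration of the idea only and does not describe how the Nodinite agent evaluates Duration internally. The `sustained_breach` function, its parameters, and the sample values are assumptions.

```python
import math


def sustained_breach(samples_pct, threshold_pct, duration_s, poll_interval_s):
    """True if usage exceeds threshold_pct for enough consecutive polls to cover duration_s."""
    needed = max(1, math.ceil(duration_s / poll_interval_s))
    run = 0
    for pct in samples_pct:
        # Extend the current run while above the threshold, otherwise reset it.
        run = run + 1 if pct > threshold_pct else 0
        if run >= needed:
            return True
    return False


# Hypothetical heap usage (%) polled every 60 seconds during a short batch job.
usage = [60, 85, 88, 86, 62, 61]

print(sustained_breach(usage, threshold_pct=80, duration_s=120, poll_interval_s=60))  # True
print(sustained_breach(usage, threshold_pct=80, duration_s=300, poll_interval_s=60))  # False
```

In this model the three-poll spike triggers an alert with a 120-second duration but is ignored with a 300-second duration, while a leak that stays above the threshold would still be reported.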
Important
Never increase thresholds to silence an alert without understanding the root cause. Raising the error threshold simply because heap usage is high can delay the warning you need before a real OutOfMemoryError occurs.
Next Step
Environment-Specific Thresholds
Polling Frequency
Related Topics
Heap Used vs Committed
Supported Garbage Collectors
JMX Troubleshooting Overview