gpu_corrected_ecc_storm P3 GPU

GPU corrected-ECC level high

Fires when the GPU's corrected-ECC error counter is high or the count of pages retired for single-bit errors is non-zero. Storms of single-bit errors (SBE) typically precede double-bit error (DBE) faults; this rule gives operators time to plan a preventive replacement before uncorrected ECC errors occur.
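The condition above can be sketched as a small check. This is an illustration only, not the rule's actual implementation: the threshold value, the `GpuEccSample` type, and the function names are hypothetical, and the input row format assumes a CSV line of GPU index, corrected-ECC total, and single-bit retired page count (for example, as might be produced by an `nvidia-smi --query-gpu=... --format=csv,noheader` query on drivers that expose those fields).

```python
from dataclasses import dataclass

# Hypothetical threshold: the rule description only says the counter is "high".
CORRECTED_ECC_THRESHOLD = 100


@dataclass
class GpuEccSample:
    index: int
    corrected_total: int    # corrected (single-bit) ECC error count
    sbe_retired_pages: int  # pages retired because of single-bit errors


def parse_row(row: str) -> GpuEccSample:
    """Parse one 'index, corrected_total, sbe_retired_pages' CSV row.

    Non-numeric fields (such as '[N/A]' on GPUs without ECC) are treated as 0.
    """
    def to_int(field: str) -> int:
        field = field.strip()
        return int(field) if field.isdigit() else 0

    idx, corrected, retired = row.split(",")
    return GpuEccSample(to_int(idx), to_int(corrected), to_int(retired))


def rule_fires(sample: GpuEccSample,
               threshold: int = CORRECTED_ECC_THRESHOLD) -> bool:
    """Return True if the corrected counter is high OR any page was SBE-retired."""
    return sample.corrected_total >= threshold or sample.sbe_retired_pages > 0
```

Either condition alone is enough to trigger the alert, so a GPU with a low corrected count but at least one single-bit retired page still fires under this sketch.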

Remediation

When this rule fires on one of your servers, the dashboard alert detail page renders the full remediation guidance: the command to run, what to verify afterward, and Furnace's annotation for your specific distro and hardware. Sign in at app.glassmkr.com to see the live alert.