gpu_uncorrected_ecc P0 GPU

GPU uncorrected ECC or DBE retired pages

The GPU reports uncorrected ECC errors, double-bit ECC (DBE) retired pages, or pending page retirements. An uncorrected ECC error means error correction could not recover the data, so in-flight data may have been corrupted. Pending retirements take effect only after a reboot.
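The three conditions above can be sketched as a small check. This is an illustration only, not the rule's actual implementation: it assumes an NVIDIA GPU queried through nvidia-smi, the field list and the helper names (FIELDS, find_affected_gpus, query_nvidia_smi) are chosen for this example, and treating unsupported fields such as [N/A] as zero is likewise an assumption.

```python
import csv
import io
import subprocess

# Assumed nvidia-smi query fields; the rule's real data source is not documented here.
FIELDS = [
    "index",
    "ecc.errors.uncorrected.volatile.total",
    "ecc.errors.uncorrected.aggregate.total",
    "retired_pages.double_bit.count",
    "retired_pages.pending",
]


def _count(value):
    """Parse a counter; non-numeric values such as '[N/A]' count as 0 (assumption)."""
    value = value.strip()
    return int(value) if value.isdigit() else 0


def find_affected_gpus(csv_text):
    """Return {gpu_index: [reasons]} for GPUs matching any of the three conditions."""
    affected = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        index, volatile, aggregate, dbe_pages, pending = (c.strip() for c in row)
        reasons = []
        if _count(volatile) > 0 or _count(aggregate) > 0:
            reasons.append("uncorrected ECC errors")
        if _count(dbe_pages) > 0:
            reasons.append("double-bit ECC retired pages")
        if pending.lower() == "yes":
            reasons.append("pending page retirement")
        if reasons:
            affected[index] = reasons
    return affected


def query_nvidia_smi():
    """Run nvidia-smi and return its CSV output (requires the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Canned sample so the sketch runs without a GPU; swap in query_nvidia_smi() on a real host.
    sample = "0, 0, 0, 0, No\n1, 2, 5, 1, Yes\n2, [N/A], [N/A], [N/A], [N/A]\n"
    print(find_affected_gpus(sample))
```

The parsing is kept separate from the nvidia-smi call so the logic can be exercised on a machine without a GPU. Note that the aggregate counter persists across reboots, so a check built this way would keep matching after a reboot; whether the actual rule behaves that way is not stated here.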

Remediation

When this rule fires on one of your servers, the dashboard alert detail page renders the full remediation guidance: the command to run, what to verify afterward, and Furnace's annotation for your specific distro and hardware. Sign in at app.glassmkr.com to see the live alert.