gpu_thermal_critical P1 GPU

GPU thermal critical

Fires when the GPU die temperature is at or above the hardware slowdown threshold, or when the kernel reports that thermal throttling is engaged. Sustained operation at thermal limits accelerates hardware wear and reduces inference throughput. The rule is suppressed for the first 300 s after boot (boot grace) so sensor readings can stabilise.
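The firing condition can be sketched roughly as below. This is a minimal illustration, not the actual rule implementation: the `GpuSample` type, its field names, and the `thermal_critical` function are hypothetical, and the sketch assumes the boot grace is measured against system uptime and ends exactly at 300 seconds.

```python
from dataclasses import dataclass

# Boot grace period in seconds, taken from the rule description.
BOOT_GRACE_SECONDS = 300


@dataclass
class GpuSample:
    """One reading for a single GPU. All field names are hypothetical."""
    temperature_c: float         # current GPU die temperature
    slowdown_threshold_c: float  # hardware slowdown threshold for this GPU
    throttle_engaged: bool       # kernel reports thermal throttling is active
    uptime_s: float              # seconds since the host booted (assumed clock)


def thermal_critical(sample: GpuSample) -> bool:
    """Return True when the sample meets the described firing condition."""
    # Assumption: readings taken during the boot grace window never fire.
    if sample.uptime_s < BOOT_GRACE_SECONDS:
        return False
    # Either condition alone is enough: at or above threshold, or throttling.
    at_threshold = sample.temperature_c >= sample.slowdown_threshold_c
    return at_threshold or sample.throttle_engaged
```

For example, a sample with a die temperature of 95 °C against a 90 °C threshold and an uptime of one hour would evaluate to true, while the same reading taken 120 seconds after boot would not.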

Remediation

When this rule fires on one of your servers, the dashboard alert detail page shows the full remediation guidance: the command to run, what to verify afterwards, and Furnace's annotation for your specific distro and hardware. Sign in at app.glassmkr.com to see the live alert.