IPMI troubleshooting
How Crucible detects IPMI, why "Not detected" does not always mean broken, how to self-diagnose with glassmkr-crucible doctor ipmi, and what to expect across BMC vendors.
#How Crucible detects IPMI
Detection is capability-based, not vendor-allowlist. Crucible does not look at your BMC vendor string and decide whether to support you; it asks "can I actually talk to the BMC?" and uses the answer.
The probe chain at agent start, and on every re-check:
- Device-node check. stat /dev/ipmi0 (also /dev/ipmi/0 and /dev/ipmidev/0). Permission errors here surface as permission_denied.
- ipmitool binary check. ipmitool -V. Missing binary surfaces as no_ipmitool_binary.
- Fast path. If both the device node and the binary are present, Crucible records the capability as available and stops probing.
- Sensor-probe fallback. Only used when the binary exists but the device node did not. Runs ipmitool sensor and inspects stderr. A "could not open device" message surfaces as no_bmc_device; other errors surface as execution_failed.
The result is a structured detection.reason field on every snapshot, with one of four values: no_ipmitool_binary, permission_denied, no_bmc_device, or execution_failed. The Dashboard surfaces this reason under "IPMI: Not detected" so you know which fix to apply.
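The probe chain above can be approximated in a few lines of shell. This is a sketch for illustration only, not Crucible's actual implementation: the function name probe_ipmi, its two optional arguments (a path prefix and a binary name, both there so the logic can be exercised against a fake /dev), and the choice to treat an existing but non-readable or non-writable node as a permission error are all assumptions.

```shell
# Hypothetical sketch of the probe chain; not Crucible's real code.
# $1: optional path prefix prepended to the device paths (assumption, for testing)
# $2: optional ipmitool binary name (assumption, defaults to ipmitool)
probe_ipmi() {
  root="${1:-}"
  tool="${2:-ipmitool}"
  node_present=no

  # 1. Device-node check across the three conventional paths.
  for node in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
    if [ -e "$root$node" ]; then
      # Assumption: a node we cannot read and write counts as a permission error.
      if [ ! -r "$root$node" ] || [ ! -w "$root$node" ]; then
        echo permission_denied
        return 1
      fi
      node_present=yes
      break
    fi
  done

  # 2. Binary check.
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo no_ipmitool_binary
    return 1
  fi

  # 3. Fast path: node and binary both present.
  if [ "$node_present" = yes ]; then
    echo available
    return 0
  fi

  # 4. Sensor-probe fallback: capture stderr only, discard stdout.
  err=$("$tool" sensor 2>&1 >/dev/null) && { echo available; return 0; }
  case "$err" in
    *"ould not open device"*) echo no_bmc_device ;;
    *) echo execution_failed ;;
  esac
  return 1
}
```

Calling probe_ipmi with no arguments on a real host prints either available or one of the four reason values. For anything authoritative, prefer the doctor subcommand described below, since it runs the agent's own probes.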
Crucible re-runs detection every hour. If you install ipmitool or load the kernel modules after the agent started, the next hourly re-check picks the change up automatically. No restart required.
#Detection vs collection: they can disagree, by design
It is normal for the Dashboard to report "IPMI: Not detected" on a host where some hardware metrics still appear. This is not a bug: detection and collection use different data sources.
The header IPMI verdict reflects Crucible's BMC probe. The dashboard's CPU temperature, fan, and ECC blocks can also be populated from non-BMC sources:
- CPU temperature often comes from hwmon (kernel-side, no BMC needed) or lm-sensors.
- ECC counters can come from kernel EDAC (/sys/devices/system/edac/mc/mc*/{ce,ue}_count) on systems where the BIOS exposes them, completely separate from the BMC.
- SMART, RAID, network, disk usage are kernel-side and do not depend on IPMI at all.
When the agent cannot probe IPMI at all, the snapshot emits null for ECC and SEL counters; the Dashboard renders that as "no signal (BMC not probed)" instead of the misleading "0 / 0" reading.
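To see for yourself whether a host exposes ECC counters outside the BMC, you can sum the EDAC files named above. The function below is a minimal sketch: the name edac_totals, its optional base-directory argument, and the output strings are illustrative choices, not part of Crucible.

```shell
# Sum correctable (ce) and uncorrectable (ue) error counts across all
# EDAC memory controllers. $1 optionally overrides the sysfs base directory.
edac_totals() {
  base="${1:-/sys/devices/system/edac/mc}"
  ce=0
  ue=0
  found=no
  for mc in "$base"/mc*; do
    # Skip controllers (or an unmatched glob) without readable counters.
    [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
    ce=$((ce + $(cat "$mc/ce_count")))
    ue=$((ue + $(cat "$mc/ue_count")))
    found=yes
  done
  if [ "$found" = yes ]; then
    echo "ce=$ce ue=$ue"
  else
    echo "no EDAC counters exposed"
  fi
}
```

If this prints counters on a host where the Dashboard says "IPMI: Not detected", the ECC block is being fed from the kernel rather than the BMC, which matches the behavior described in this section.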
#Self-diagnose with glassmkr-crucible doctor ipmi
The doctor subcommand runs the same probes the agent uses and prints actionable guidance for each failure mode. It is read-only and does not modify system state.
sudo glassmkr-crucible doctor ipmi
The available case looks like:
IPMI capability check:
Result: [OK] IPMI detected via ipmitool_in_band
ipmitool: 1.8.19
Crucible will collect:
- Sensor readings (temperature, fan, voltage, power)
- SEL events (recent + cumulative ECC counters)
- PSU redundancy state (per-PSU + aggregate)
Failure cases print the matching detection.reason plus a fix recipe.
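When checking many hosts, it can help to reduce the doctor output to a single word. The sketch below keys off the [OK] marker shown above and the [FAIL] marker mentioned in the support section of this page; the function name doctor_verdict is hypothetical, and the doctor subcommand's exit code is not documented here, so this parses the text instead of relying on it.

```shell
# Read doctor output on stdin and print ok, fail, or unknown.
doctor_verdict() {
  out=$(cat)
  case "$out" in
    *"[OK]"*) echo ok ;;
    *"[FAIL]"*) echo fail ;;
    *) echo unknown ;;
  esac
}
```

A possible invocation is sudo glassmkr-crucible doctor ipmi 2>&1 | doctor_verdict. Treat unknown as a cue to read the full output by hand.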
#no_ipmitool_binary
Meaning: the /dev/ipmi0 device exists, but ipmitool is not installed.
Fix: install the package:
- Debian / Ubuntu: sudo apt install ipmitool
- RHEL / Rocky / Alma: sudo dnf install ipmitool
- Arch: sudo pacman -S ipmitool
- Alpine: sudo apk add ipmitool
No restart needed. The next collection cycle (within ~60 seconds at the default interval) sees the binary, and the next hourly re-check flips detection.available to true. The Dashboard updates on the following ingest.
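For provisioning scripts that span several distributions, the install commands above can be selected from the ID field of /etc/os-release. This is a sketch under assumptions: the function name ipmitool_install_command is made up, and the ID values (debian, ubuntu, rhel, rocky, almalinux, arch, alpine) are the ones those distributions normally report, so check yours before relying on it.

```shell
# Print the install command for a given os-release ID; fail on unknown IDs.
ipmitool_install_command() {
  case "$1" in
    debian|ubuntu)        echo "apt install ipmitool" ;;
    rhel|rocky|almalinux) echo "dnf install ipmitool" ;;
    arch)                 echo "pacman -S ipmitool" ;;
    alpine)               echo "apk add ipmitool" ;;
    *)                    return 1 ;;
  esac
}
```

One way to use it is to source /etc/os-release first (. /etc/os-release) and pass "$ID", then run the printed command with sudo after reviewing it.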
#permission_denied
Meaning: Crucible cannot open /dev/ipmi0. The device node is mode 0600 owned by root.
Fix: Crucible runs as the non-root glassmkr user; the install script provisions a udev rule granting that user read access. If you customized the service unit, confirm:
systemctl cat glassmkr-crucible | grep '^User='
ls -l /dev/ipmi0
The default install ships a udev rule at /etc/udev/rules.d/99-glassmkr-ipmi.rules that grants the glassmkr group access. If you removed it, restore via the install script or run the agent as root (less preferred).
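Reading ls -l output tells you the mode bits, but not whether a particular user can actually open the node. A small check like the one below, run as the service user, answers that directly. The function name check_node_access and its three output words are illustrative, not something Crucible provides.

```shell
# Report whether the current user can read and write a device node.
# $1 optionally overrides the node path.
check_node_access() {
  node="${1:-/dev/ipmi0}"
  if [ ! -e "$node" ]; then
    echo missing
  elif [ -r "$node" ] && [ -w "$node" ]; then
    echo ok
  else
    echo denied
  fi
}
```

Saved into a script file that ends with a call to check_node_access (for example a hypothetical check-ipmi-access.sh), it could be run as sudo -u glassmkr sh ./check-ipmi-access.sh. A result of denied while root gets ok points at the udev rule or group membership.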
#no_bmc_device
Meaning: ipmitool is installed and runs, but the kernel has no IPMI device node and the in-band ipmitool probe could not open one. Usually the kernel modules are not loaded.
Fix: load the IPMI kernel modules, then confirm the device node appears. The -a flag is needed because, without it, modprobe treats everything after the first module name as parameters for that module:
sudo modprobe -a ipmi_si ipmi_devintf ipmi_msghandler
ls -l /dev/ipmi0 # should appear after the modules load
If /dev/ipmi0 still does not appear, the host may genuinely have no BMC. This is common on consumer hardware, Raspberry Pi, laptops, and virtual machines without IPMI passthrough. In that case set collection.ipmi: false in /etc/glassmkr/collector.yaml to silence the snapshot field; the dashboard stops trying to render IPMI for this host.
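Before concluding that a host has no BMC, it is worth confirming which of the three modules are actually loaded. The sketch below reads /proc/modules and lists the ones that are absent. The function name missing_ipmi_modules is hypothetical, and note one limitation: modules compiled directly into the kernel do not appear in /proc/modules, so a name listed as missing is not proof that the driver is unavailable.

```shell
# Print each IPMI module that is not listed as loaded.
# $1 optionally overrides the modules file.
missing_ipmi_modules() {
  modules_file="${1:-/proc/modules}"
  for m in ipmi_msghandler ipmi_devintf ipmi_si; do
    # Each line of /proc/modules starts with the module name and a space.
    grep -q "^$m " "$modules_file" || echo "$m"
  done
}
```

Empty output means all three are loaded as modules; any names printed are candidates for loading by hand.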
#execution_failed
Meaning: ipmitool ran, but the call returned an error other than "could not open device". The BMC is reachable in some sense but not responding the way Crucible expected.
Fix: reproduce by hand and read the error:
sudo ipmitool mc info
Common causes:
- The BMC is in a degraded state and dropped the request. Retry; if it persists, escalate via the support path below.
- The in-band interface (KCS or SSIF) is busy. Sustained busy state usually means firmware is mid-task; wait a few minutes and retry.
- The installed ipmitool is too old for the BMC's IPMI 2.0 dialect. Upgrade ipmitool via the distribution package manager.
Do not run sudo ipmitool mc reset cold without first confirming with your hardware vendor. Some BMCs do not recover cleanly from a cold reset and stay hung afterwards, which on a remote machine is much worse than the original failure.
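The "retry, then escalate" advice for a degraded or busy BMC can be scripted with a small bounded-retry helper. This is a generic sketch, not a Crucible feature: the function name retry and its argument order (attempt count, delay in seconds, then the command) are assumptions made for the example.

```shell
# Run a command up to N times, sleeping between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$n" -ge "$attempts" ] && return 1  # out of attempts: give up
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For a busy in-band interface, something like retry 3 120 sudo ipmitool mc info gives the firmware a few minutes between attempts. If all attempts fail, that is the point to gather output for a support request; the helper deliberately does nothing more invasive than re-running a read-only command.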
#Per-vendor notes
Crucible's detection is capability-based, so any BMC that responds to standard IPMI 2.0 commands works. These notes are vendor-specific quirks observed on real hardware, not detection-gating rules.
Supermicro
Usually clean. The BMC reports vendor strings cleanly via ipmitool mc info (Manufacturer Name: Supermicro or Super Micro Computer Inc.). PSU sensors typically appear as PS1 Status / PS2 Status with the discrete-state bitmask in the Reading column.
Gigabyte
The BMC sometimes reports Manufacturer Name: Unknown (0x3C0A) in ipmitool mc info output, even though the IANA manufacturer ID (15370) resolves to Gigabyte. This is a Gigabyte BMC firmware quirk; Crucible does not gate detection on the manufacturer string, so no customer action is needed. PSU sensors typically appear as PS1_Status with an underscore separator.
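The hex value in the Unknown (0x3C0A) string and the decimal IANA ID are the same number, which is easy to confirm from the shell. The helper below is an illustrative sketch (the name manufacturer_id_decimal is hypothetical): it pulls the parenthesized hex value out of a line of text and prints it in decimal.

```shell
# Extract a "(0x....)" value from a line of text and print it in decimal.
manufacturer_id_decimal() {
  hex=$(printf '%s\n' "$1" | sed -n 's/.*(\(0x[0-9A-Fa-f]*\)).*/\1/p')
  [ -n "$hex" ] && printf '%d\n' "$hex"
}

manufacturer_id_decimal "Manufacturer Name : Unknown (0x3C0A)"   # prints 15370
```

The arithmetic is 3×4096 + 12×256 + 0×16 + 10 = 15370, matching the IANA ID quoted above.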
ASUS
Validated on RS700-E10-RS4U. Detection works correctly when ipmitool is installed; the most common issue is that distributions sometimes ship without ipmitool by default, which surfaces as no_ipmitool_binary in the doctor output. Install via the per-distro command above.
ASRockRack
DMI sys_vendor may read "To Be Filled By O.E.M." on some boards (a known firmware default), but the BMC itself reports vendor cleanly via ipmitool mc info (Manufacturer Name: ASRock Rack Incorporation). PSU sensors appear as PSU1 Status / PSU2 Status.
Dell PowerEdge (iDRAC)
In-band IPMI through iDRAC works without an iDRAC Enterprise license. The license gates out-of-band IPMI over LAN, not the in-band KCS path Crucible uses. PSU sensors appear as PS1 Status / PS2 Status, and iDRAC also exposes an aggregate PS Redundancy sensor that Crucible reads for whole-pair redundancy state.
Dell iDRAC compatibility has not been validated on real hardware in our validation fleet. If you hit a detection or collection issue specific to iDRAC, file a support request with the output of sudo ipmitool mc info and sudo glassmkr-crucible doctor ipmi.
HP ProLiant (iLO)
In-band IPMI via KCS usually works without an iLO Advanced license. The license gates out-of-band iLO features, not in-band IPMI. Some older iLO firmware revisions require ipmitool 1.8.18 or later for IPMI 2.0 compatibility.
HP iLO compatibility has not been validated on real hardware in our validation fleet. Same support-request convention as Dell above.
#A note on PSU monitoring
The isPsuSensor classifier covers Supermicro, Gigabyte, ASRockRack, and ASUS naming conventions, and interprets discrete states as IPMI 2.0 spec table 42-3 hex bitmasks (Failure detected, AC lost, predictive, inactive) in addition to text-status strings.
If a multi-PSU box previously showed two healthy PSUs in the dashboard but one was actually failed or unplugged, that is the bug shape that current Crucible releases catch.
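To make the bitmask interpretation concrete, here is a sketch that decodes a discrete reading such as 0x03 into state names. It is not the isPsuSensor classifier itself, whose code is not shown on this page. The function name decode_psu_state is hypothetical, and the bit-to-name mapping follows my reading of the Power Supply sensor type offsets in the IPMI 2.0 specification (bit 0 presence detected, bit 1 failure detected, bit 2 predictive failure, bit 3 input lost, bit 7 inactive); verify it against the spec before depending on it.

```shell
# Decode a Power Supply discrete-state bitmask (hex or decimal) into names.
decode_psu_state() {
  mask=$(($1))   # arithmetic expansion accepts 0x-prefixed hex
  out=""
  bit=0
  for name in presence_detected failure_detected predictive_failure input_lost \
              input_lost_or_out_of_range input_out_of_range config_error inactive; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out }$name"
    fi
    bit=$((bit + 1))
  done
  echo "${out:-none}"
}

decode_psu_state 0x01   # prints: presence_detected
decode_psu_state 0x03   # prints: presence_detected failure_detected
```

A reading of 0x01 on one supply and 0x03 or 0x09 on the other is the "one PSU failed or unplugged" shape described above.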
#When to file a support request
Email [email protected] when:
- Your BMC vendor is not in the validated list above, and detection works (the doctor output shows [OK]) but a specific collection path (sensors, SEL, PSU) returns unexpected values.
- Detection fails (doctor output shows [FAIL]) but sudo ipmitool mc info works fine when you run it interactively.
- The doctor subcommand returns execution_failed with an error message not covered above.
Attach:
- The doctor output: sudo glassmkr-crucible doctor ipmi 2>&1
- A successful raw probe: sudo ipmitool mc info 2>&1
- One hour of agent logs: sudo journalctl -u glassmkr-crucible --since "1 hour ago" --no-pager > crucible.log
- Your server ID from the Dashboard.
Last verified: 2026-05-22 against Crucible v0.13.3.