#How Crucible detects IPMI

Detection is capability-based, not driven by a vendor allowlist. Crucible does not look at your BMC vendor string and decide whether to support you; it asks "can I actually talk to the BMC?" and uses the answer.

The probe chain runs at agent start and on every re-check:

  1. Device-node check. stat /dev/ipmi0 (also /dev/ipmi/0 and /dev/ipmidev/0). Permission errors here surface as permission_denied.
  2. ipmitool binary check. ipmitool -V. Missing binary surfaces as no_ipmitool_binary.
  3. Fast path. If both the device node and the binary are present, Crucible records the capability as available and stops probing.
  4. Sensor-probe fallback. Only used when the binary exists but the device node does not. Runs ipmitool sensor and inspects stderr. A "could not open device" message surfaces as no_bmc_device; other errors surface as execution_failed.

The result is a structured detection.reason field on every snapshot, with one of four values: no_ipmitool_binary, permission_denied, no_bmc_device, or execution_failed. The Dashboard surfaces this reason under "IPMI: Not detected" so you know which fix to apply.
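
The probe chain above can be sketched in Python. This is a minimal illustration, not Crucible's implementation: the function names, the "present" / "missing" / "denied" labels, and the use of shutil.which in place of running ipmitool -V are assumptions, as is treating a clean sensor probe as available (the steps above do not say what happens in that case).

```python
import os
import shutil
import subprocess

# Device-node paths named in step 1.
DEVICE_NODES = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def classify(device_state, binary_present, sensor_rc=None, sensor_stderr=""):
    """Map probe observations to (available, reason).

    device_state is "present", "missing", or "denied" (hypothetical labels).
    """
    if device_state == "denied":
        return False, "permission_denied"
    if not binary_present:
        return False, "no_ipmitool_binary"
    if device_state == "present":
        return True, None  # fast path: stop probing
    # Fallback: binary exists, device node missing; inspect the sensor probe.
    if "could not open device" in sensor_stderr.lower():
        return False, "no_bmc_device"
    if sensor_rc != 0:
        return False, "execution_failed"
    return True, None  # assumption: a clean sensor probe counts as available


def device_state():
    """Stat each candidate node and report the first one that exists."""
    denied = False
    for path in DEVICE_NODES:
        try:
            os.stat(path)
            return "present"
        except PermissionError:
            denied = True
        except OSError:
            continue
    return "denied" if denied else "missing"


def detect():
    """Run the real probes on this host and classify the result."""
    state = device_state()
    binary = shutil.which("ipmitool") is not None
    rc, stderr = None, ""
    if binary and state == "missing":
        try:
            proc = subprocess.run(
                ["ipmitool", "sensor"], capture_output=True, text=True, timeout=30
            )
            rc, stderr = proc.returncode, proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            rc, stderr = 1, str(exc)
    return classify(state, binary, rc, stderr)
```

Keeping classify free of I/O means the decision table can be checked without a BMC; detect() only gathers the inputs.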

Crucible re-runs detection every hour. If you install ipmitool or load the kernel modules after the agent started, the next hourly re-check picks the change up automatically. No restart required.

#Detection vs collection: they can disagree, by design

It is normal for the Dashboard to report "IPMI: Not detected" on a host where some hardware metrics still appear. This is not a bug: detection and collection use different data sources.

The header IPMI verdict reflects Crucible's BMC probe. The Dashboard's CPU temperature, fan, and ECC blocks can also be populated from non-BMC sources:

  • CPU temperature often comes from hwmon (kernel-side, no BMC needed) or lm-sensors.
  • ECC counters can come from kernel EDAC (/sys/devices/system/edac/mc/mc*/{ce,ue}_count) on systems where the BIOS exposes them, completely separate from the BMC.
  • SMART, RAID, network, and disk usage are kernel-side and do not depend on IPMI at all.
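
To see what the kernel EDAC path reports independently of the BMC, the counters can be read directly from sysfs. A minimal sketch, assuming the ce_count / ue_count layout quoted above; read_edac_counts is a hypothetical helper and says nothing about how Crucible itself aggregates these values.

```python
import glob
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Sum ce_count and ue_count across mc* memory controllers.

    Returns (ce, ue), or None when no controller exposes both counters.
    """
    ce_total, ue_total, seen = 0, 0, False
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc*"))):
        try:
            with open(os.path.join(mc_dir, "ce_count")) as fh:
                ce = int(fh.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as fh:
                ue = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # controller without readable counters
        ce_total += ce
        ue_total += ue
        seen = True
    return (ce_total, ue_total) if seen else None
```

Returning None rather than (0, 0) when nothing is exposed keeps "no data" distinct from "no errors".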

When the agent cannot probe IPMI at all, the snapshot emits null for ECC and SEL counters; the Dashboard renders that as "no signal (BMC not probed)" instead of the misleading "0 / 0" reading.
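
The null-versus-zero distinction can be illustrated with a small formatter. This is only a sketch of the idea; render_ecc and its argument names are hypothetical and do not describe the Dashboard's actual code.

```python
def render_ecc(ce_count, ue_count):
    """Format ECC counters, keeping "not probed" distinct from zero."""
    if ce_count is None or ue_count is None:
        return "no signal (BMC not probed)"
    return f"{ce_count} / {ue_count}"
```

A real reading of zero still renders as "0 / 0"; only a missing value produces the "no signal" text.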

#Self-diagnose with glassmkr-crucible doctor ipmi

The doctor subcommand runs the same probes the agent uses and prints actionable guidance for each failure mode. It is read-only and does not modify system state.

sudo glassmkr-crucible doctor ipmi

When IPMI is available, the output looks like:

IPMI capability check:
  Result:        [OK] IPMI detected via ipmitool_in_band
  ipmitool:      1.8.19

Crucible will collect:
  - Sensor readings (temperature, fan, voltage, power)
  - SEL events (recent + cumulative ECC counters)
  - PSU redundancy state (per-PSU + aggregate)

Failure cases print the matching detection.reason plus a fix recipe.
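
For scripted health checks, the Result line can be pulled out of the doctor output. A minimal sketch, assuming the line keeps the "Result: [TAG] message" shape shown above; parse_doctor_result is a hypothetical helper, not part of Crucible, and the exact wording of failure lines is not documented here.

```python
import re

# Matches a line such as "  Result:        [OK] IPMI detected via ipmitool_in_band".
RESULT_RE = re.compile(r"^\s*Result:\s*\[(\w+)\]\s*(.*)$", re.MULTILINE)


def parse_doctor_result(output):
    """Return (tag, message) from the Result line, or None if it is absent."""
    match = RESULT_RE.search(output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```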

#no_ipmitool_binary

Meaning: the /dev/ipmi0 device exists, but ipmitool is not installed.

Fix: install the package:

  • Debian / Ubuntu: sudo apt install ipmitool
  • RHEL / Rocky / Alma: sudo dnf install ipmitool
  • Arch: sudo pacman -S ipmitool
  • Alpine: sudo apk add ipmitool

No restart needed. The next collection cycle (within ~60 seconds at the default interval) sees the binary, and the next hourly re-check flips detection.available to true. The Dashboard updates on the following ingest.

#permission_denied

Meaning: Crucible cannot open /dev/ipmi0. The device node is mode 0600 owned by root.

Fix: Crucible runs as the non-root glassmkr user, and the install script provisions a udev rule that grants access to the device node. If you customized the service unit, confirm which user the service runs as and who owns the device node:

systemctl cat glassmkr-crucible | grep '^User='
ls -l /dev/ipmi0

The default install ships a udev rule at /etc/udev/rules.d/99-glassmkr-ipmi.rules that grants the glassmkr group access. If you removed it, restore via the install script or run the agent as root (less preferred).
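
The ls -l output can be interpreted with a classic owner/group/other check. A minimal sketch, assuming the agent needs both read and write bits on the node and ignoring ACLs and capabilities; can_open_rw is a hypothetical helper.

```python
import stat


def can_open_rw(mode, owner_uid, owner_gid, uid, gids):
    """Return True if (uid, gids) has read+write on a node with these stat fields."""
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need
```

On a live host the inputs would come from os.stat("/dev/ipmi0") (st_mode, st_uid, st_gid) and the service user's uid and group IDs. A node left at mode 0600 owned by root fails the check for any non-root user, which matches the permission_denied case.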

#no_bmc_device

Meaning: ipmitool is installed and runs, but the kernel has no IPMI device node and the in-band ipmitool probe could not open one. Usually the kernel modules are not loaded.

Fix: load the modules (-a tells modprobe that every argument is a module name, not a module parameter):

sudo modprobe -a ipmi_msghandler ipmi_devintf ipmi_si
ls -l /dev/ipmi0    # should appear after the modules load

If /dev/ipmi0 still does not appear, the host may genuinely have no BMC. This is common on consumer hardware, Raspberry Pi boards, laptops, and virtual machines without IPMI passthrough. In that case set collection.ipmi: false in /etc/glassmkr/collector.yaml to silence the snapshot field; the Dashboard stops trying to render IPMI for this host.
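
Before concluding that a host has no BMC, it can help to confirm which of the three modules are actually loaded. A minimal sketch that parses the text of /proc/modules; missing_ipmi_modules is a hypothetical helper. One caveat: drivers compiled into the kernel never appear in /proc/modules, so a "missing" result on such a kernel is not conclusive.

```python
IPMI_MODULES = ("ipmi_msghandler", "ipmi_devintf", "ipmi_si")


def missing_ipmi_modules(proc_modules_text):
    """Return the IPMI modules that are not listed in /proc/modules content."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in IPMI_MODULES if name not in loaded]
```

On a live host, pass in the contents of /proc/modules, for example open("/proc/modules").read().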

#execution_failed

Meaning: ipmitool ran, but the call returned an error other than "could not open device". The BMC is reachable in some sense but not responding the way Crucible expected.

Fix: reproduce by hand and read the error:

sudo ipmitool mc info

Common causes:

  • The BMC is in a degraded state and dropped the request. Retry; if it persists, escalate via the support path below.
  • The in-band interface (KCS or SSIF) is busy. Sustained busy state usually means firmware is mid-task; wait a few minutes and retry.
  • The installed ipmitool is too old for the BMC's IPMI 2.0 dialect. Upgrade ipmitool via the distribution package manager.

Do not run sudo ipmitool mc reset cold without first confirming with your hardware vendor. Some BMCs do not recover cleanly from a cold reset and stay hung afterwards, which on a remote machine is much worse than the original failure.

#Per-vendor notes

Crucible's detection is capability-based, so any BMC that responds to standard IPMI 2.0 commands works. These notes are vendor-specific quirks observed on real hardware, not detection-gating rules.

Supermicro

Usually trouble-free. The BMC reports its vendor string correctly via ipmitool mc info (Manufacturer Name: Supermicro or Super Micro Computer Inc.). PSU sensors typically appear as PS1 Status / PS2 Status with the discrete-state bitmask in the Reading column.

Gigabyte

The BMC sometimes reports Manufacturer Name: Unknown (0x3C0A) in ipmitool mc info output, even though the IANA manufacturer ID (15370) resolves to Gigabyte. This is a Gigabyte BMC firmware quirk; Crucible does not gate detection on the manufacturer string, so no customer action is needed. PSU sensors typically appear as PS1_Status with an underscore separator.
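
The hex value in the name string and the decimal manufacturer ID are the same number (0x3C0A is 15370), so the numeric ID is the more reliable field to read. A minimal sketch, assuming ipmitool mc info prints a "Manufacturer ID : <decimal>" line; the lookup table is a one-entry illustration taken from the note above, and manufacturer_from_mc_info is a hypothetical helper.

```python
import re

# Illustrative lookup; 15370 is the IANA ID cited in the Gigabyte note.
IANA_VENDORS = {15370: "Gigabyte"}


def manufacturer_from_mc_info(text):
    """Resolve the vendor from the numeric Manufacturer ID line.

    The Manufacturer Name line is ignored because it may read "Unknown (0x...)".
    """
    match = re.search(r"^Manufacturer ID\s*:\s*(\d+)", text, re.MULTILINE)
    if match is None:
        return None
    vendor_id = int(match.group(1))
    return vendor_id, IANA_VENDORS.get(vendor_id, "unknown")
```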

ASUS

Validated on RS700-E10-RS4U. Detection works correctly when ipmitool is installed; the most common issue is that some distributions do not install ipmitool by default, which surfaces as no_ipmitool_binary in the doctor output. Install via the per-distro command above.

ASRockRack

DMI sys_vendor may read "To Be Filled By O.E.M." on some boards (a known firmware default), but the BMC itself reports vendor cleanly via ipmitool mc info (Manufacturer Name: ASRock Rack Incorporation). PSU sensors appear as PSU1 Status / PSU2 Status.

Dell PowerEdge (iDRAC)

In-band IPMI through iDRAC works without an iDRAC Enterprise license. The license gates out-of-band IPMI over LAN, not the in-band KCS path Crucible uses. PSU sensors appear as PS1 Status / PS2 Status, and iDRAC also exposes an aggregate PS Redundancy sensor that Crucible reads for whole-pair redundancy state.

Dell iDRAC compatibility has not been validated on real hardware in our validation fleet. If you hit a detection or collection issue specific to iDRAC, file a support request with the output of sudo ipmitool mc info and sudo glassmkr-crucible doctor ipmi.

HP ProLiant (iLO)

In-band IPMI via KCS usually works without an iLO Advanced license. The license gates out-of-band iLO features, not in-band IPMI. Some older iLO firmware revisions require ipmitool 1.8.18 or later for IPMI 2.0 compatibility.
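
Whether the installed ipmitool meets the 1.8.18 floor can be checked by parsing the output of ipmitool -V. A minimal sketch, assuming the output contains a dotted three-part version such as "ipmitool version 1.8.19"; both helper names are hypothetical.

```python
import re


def parse_ipmitool_version(text):
    """Extract the first x.y.z version as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def meets_minimum(text, minimum=(1, 8, 18)):
    """True when the parsed version is at least the given minimum."""
    version = parse_ipmitool_version(text)
    return version is not None and version >= minimum
```

Comparing integer tuples avoids the string-comparison trap where "1.8.9" would sort after "1.8.18".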

HP iLO compatibility has not been validated on real hardware in our validation fleet. Same support-request convention as Dell above.

#A note on PSU monitoring

The isPsuSensor classifier covers Supermicro, Gigabyte, ASRockRack, and ASUS naming conventions, and interprets discrete states as IPMI 2.0 spec table 42-3 hex bitmasks (Failure detected, AC lost, predictive, inactive) in addition to text-status strings.
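
The two halves of that classification, name matching and bitmask decoding, can be sketched as follows. This is an illustration only and not the isPsuSensor source: the regular expression covers just the sensor names quoted in the vendor notes (PS1 Status, PS1_Status, PSU1 Status), the bit positions follow the Power Supply sensor-specific offsets in the IPMI 2.0 table cited above, and all function names are hypothetical.

```python
import re

# Names quoted in the vendor notes: "PS1 Status", "PS1_Status", "PSU1 Status".
PSU_NAME_RE = re.compile(r"^PSU?\d+[ _]Status$", re.IGNORECASE)

# Power Supply sensor-specific offsets (bit position -> meaning).
PSU_STATE_BITS = {
    0: "presence detected",
    1: "failure detected",
    2: "predictive failure",
    3: "input lost (AC/DC)",
    7: "inactive",
}


def is_psu_sensor(name):
    """True when the sensor name looks like a per-PSU status sensor."""
    return PSU_NAME_RE.match(name.strip()) is not None


def decode_psu_state(reading):
    """Decode a hex reading such as "0x1" into the asserted state names."""
    mask = int(reading, 16)
    return [label for bit, label in PSU_STATE_BITS.items() if mask & (1 << bit)]


def psu_is_healthy(reading):
    """Healthy here means present with no failure-type bit asserted."""
    return decode_psu_state(reading) == ["presence detected"]
```

Under these assumptions a reading of 0x1 decodes to a present, healthy supply, while 0x9 decodes to a present supply that has lost input power.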

If a multi-PSU box previously showed two healthy PSUs in the Dashboard while one was actually failed or unplugged, that is the class of bug that current Crucible releases catch.

#When to file a support request

Email [email protected] when:

  • Your BMC vendor is not in the validated list above, and detection works (the doctor output shows [OK]) but a specific collection path (sensors, SEL, PSU) returns unexpected values.
  • Detection fails (doctor output shows [FAIL]) but sudo ipmitool mc info works fine when you run it interactively.
  • The doctor subcommand returns execution_failed with an error message not covered above.

Attach:

  • The doctor output: sudo glassmkr-crucible doctor ipmi 2>&1
  • The raw probe output: sudo ipmitool mc info 2>&1
  • One hour of agent logs: sudo journalctl -u glassmkr-crucible --since "1 hour ago" --no-pager > crucible.log
  • Your server ID from the Dashboard.

Last verified: 2026-05-22 against Crucible v0.13.3.