Changelog
Behavior changes, operational improvements, and notable fixes for Glassmkr and the Crucible agent. Most recent at top.
#2026-05-22
Crucible v0.13.3
Current. The agent is at v0.13.3 on npm (@glassmkr/crucible) and in the ghcr.io/glassmkr/crucible and docker.io/glassmkr/crucible container registries. MIT-licensed. Runs as the non-root glassmkr user.
Resource footprint. Validation-fleet measurement on 2026-05-21 across 7 hosts shows a median 91 MB RSS idle, near-zero CPU, and an fio delta under 1.5%. RSS ranged 65 MB to 103 MB.
Default interval. 60 seconds (set in v0.10.0; the previous 300-second default is gone).
Rule library at 61 rules across 9 categories
Categories: storage, ZFS, filesystem, memory and CPU, network, hardware (BMC / IPMI), time and services, security and patching, GPU. 20 rules ship with deep FIX content (safe-mode, validation, rollback, impact counters); the rest are verified. GPU coverage is 8 rules across three tiers (nvidia-smi / DCGM exporter / Redfish OEM stub), validated on NVIDIA L4, A4000, and A16.
Browse all rules at /docs/rules; the machine-readable corpus is at /llms-full.txt.
Upgrade
sudo npm install -g @glassmkr/crucible@latest
sudo systemctl restart glassmkr-crucible
Or via Docker: docker pull ghcr.io/glassmkr/crucible:latest
#2026-05-13
Crucible 0.9.4
Fixed. PSU sensor classification now works across every observed BMC vendor (Supermicro, Gigabyte, ASRockRack, ASUS). Previously the PS<N> name shape was Dell-gated and four of five PSU-having boxes were silently filtered out, so per-PSU alerts never fired regardless of state. The bitmask interpretation now follows IPMI 2.0 spec table 42-3 (Failure detected, AC lost, predictive, inactive).
Changed. When the agent cannot probe IPMI at all (no ipmitool, no /dev/ipmi0, etc.), the snapshot now emits null for ipmi.ecc_errors and sel_entries_count rather than stub zeros. The dashboard renders this as "no signal (BMC not probed)" instead of the misleading "0 / 0".
Changed. IPMI capability detection re-runs every hour. Installing ipmitool after the agent started is picked up automatically; no service restart needed.
Added. glassmkr-crucible doctor ipmi subcommand for customer self-diagnosis. See /docs/troubleshooting/ipmi.
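For anyone consuming snapshots directly, the null change above means "no data" and "zero errors" have to be handled as separate cases. A minimal sketch, assuming the two values have already been pulled out of a snapshot; the function name and the "n/a" placeholder for a single missing value are illustrative, not part of the product:

```python
from typing import Optional


def render_ipmi_counts(ecc_errors: Optional[int],
                       sel_entries_count: Optional[int]) -> str:
    """Format the two IPMI counters, keeping null distinct from zero."""
    # Both values null: the agent could not probe the BMC at all.
    if ecc_errors is None and sel_entries_count is None:
        return "no signal (BMC not probed)"

    def show(value: Optional[int]) -> str:
        # Assumption: a single missing value is shown as "n/a".
        return "n/a" if value is None else str(value)

    return f"{show(ecc_errors)} / {show(sel_entries_count)}"
```

A genuine zero still renders as "0 / 0", so only hosts without a probed BMC show the "no signal" text.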
Tier-gating policy for the programmatic API
The programmatic API surface (account keys) is now uniformly gated on the Pro plan. Free customers retain full web-dashboard access and full read API access for their own data. Programmatic writes (channel CRUD, alert acknowledge / resolve, server CRUD, mutes, restore endpoints, key management) and Pro features (AI analysis, trend warnings) return 402 pro_required when called via an API key on a Free plan. Web-dashboard sessions are unaffected at any tier. Full breakdown at /docs/api/tier-gating.
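Automation that calls the API with an account key may want to treat this response as a plan problem rather than a transient failure. A minimal sketch of that check, assuming the response body is JSON with an "error" field carrying pro_required; the body shape, the function name, and the exception class are assumptions, so consult /docs/api/tier-gating for the actual format:

```python
class ProRequiredError(Exception):
    """Raised when the API reports that the call needs a Pro plan."""


def interpret_response(status_code: int, body: dict) -> dict:
    """Return the body on success; raise on a 402 pro_required reply."""
    # Assumption: the error code is carried in body["error"].
    if status_code == 402 and body.get("error") == "pro_required":
        raise ProRequiredError("endpoint requires a Pro plan for API-key access")
    return body
```

Raising a dedicated exception keeps retry logic from looping on a response that will not change until the plan does.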
Unexpected-reboot alerts now auto-resolve
An unexpected_reboot alert now auto-resolves once the host has accumulated at least 24 hours of continuous stable uptime, with resolution_reason: auto_decay_stable_24h. The original incident remains in the resolved-alerts history. The window is tunable per server via config_overrides.unexpected_reboot_decay_hours.
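The decay check can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's implementation: stable_since stands for the moment continuous uptime began (hypothetical name), and the override is read from a plain dict keyed as in the entry above.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_DECAY_HOURS = 24  # default window stated in the changelog entry


def should_auto_resolve(stable_since: datetime,
                        now: datetime,
                        config_overrides: Optional[dict] = None) -> bool:
    """True once uptime has been continuous for the decay window."""
    overrides = config_overrides or {}
    hours = overrides.get("unexpected_reboot_decay_hours", DEFAULT_DECAY_HOURS)
    return now - stable_since >= timedelta(hours=hours)
```

A server with an override of 6 would clear its alert after six stable hours, while servers without an override keep the 24-hour default.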
#2026-05-12 (later still)
Rate-based ECC error detection
The ecc_errors alert rule now uses a rolling 24-hour rate window instead of a cumulative threshold. Default: a warning fires when more than 10 correctable errors are observed in 24 hours. Uncorrectable ECC errors continue to fire critical immediately on any non-zero count.
This eliminates false-positive alerts on long-running hosts where BMC error counters accumulate over months without indicating an active hardware issue.
When the BMC counter is cleared (SEL reset, BMC reboot), the rule skips one evaluation cycle and resumes on the next snapshot; no false alerts from the reset itself.
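The behavior described above can be approximated with a small tracker that stores per-snapshot deltas and sums those inside the window. This is a sketch under stated assumptions, not the rule's actual code: the class and method names are hypothetical, and a counter reset is detected here simply as the cumulative count going backwards.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
WARN_THRESHOLD = 10  # warn when MORE than 10 correctable errors in the window


class EccRateTracker:
    def __init__(self) -> None:
        self._last_count = None   # last cumulative BMC counter seen
        self._deltas = deque()    # (timestamp, new errors in that snapshot)

    def observe(self, ts: datetime, correctable_count: int) -> bool:
        """Feed one snapshot; return True when the warning should fire."""
        if self._last_count is not None:
            if correctable_count < self._last_count:
                # Counter went backwards (SEL reset, BMC reboot):
                # re-baseline and skip this evaluation cycle.
                self._last_count = correctable_count
                return False
            self._deltas.append((ts, correctable_count - self._last_count))
        self._last_count = correctable_count
        # Drop deltas that have aged out of the rolling window.
        while self._deltas and ts - self._deltas[0][0] > WINDOW:
            self._deltas.popleft()
        return sum(delta for _, delta in self._deltas) > WARN_THRESHOLD
```

Because only deltas are counted, a host whose BMC counter already reads in the thousands stays quiet until new errors arrive at more than 10 per 24 hours.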
#2026-05-12 (later)
API key management UI + scopes
Pro customers can now manage API keys via the dashboard at /settings/keys. New keys can be created with one of three scopes (Read, Write, or Admin), an optional expiry date (up to 5 years), and graceful 48-hour rotation. The old key keeps working through the grace window so automation can be updated without downtime, then auto-revokes. Emergency revoke is one click away.
Reminder emails fire 7 and 1 days before key expiry, plus a notification when a key is auto-revoked on expiry. Audit log view is at /settings/audit with date-range and action filters (Pro plan; 365-day retention).
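The timing rules above reduce to simple date arithmetic. A minimal sketch, assuming the grace window starts at the moment of rotation and reminders are computed from the expiry timestamp; the function names are hypothetical:

```python
from datetime import datetime, timedelta

ROTATION_GRACE = timedelta(hours=48)  # old key keeps working this long
REMINDER_DAYS = (7, 1)                # reminders before expiry


def old_key_still_valid(rotated_at: datetime, now: datetime) -> bool:
    """True while the rotated-out key is inside its grace window."""
    return now < rotated_at + ROTATION_GRACE


def reminder_times(expires_at: datetime) -> list:
    """Timestamps at which expiry reminder emails would be due."""
    return [expires_at - timedelta(days=days) for days in REMINDER_DAYS]
```

For a key expiring on 2026-06-01, this yields reminders on 2026-05-25 and 2026-05-31.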
#2026-05-12
Pro-tier API gating
Programmatic API access (creating account API keys, rotating account keys, reading the audit log, plus programmatic server management via account keys: create, rename, delete, rotate collector keys) is now restricted to Pro-plan customers. Free-plan accounts can continue to ingest snapshots from their 3 free servers and use the dashboard to manage them; programmatic management requires Pro.
If you encounter a 402 response on an endpoint that previously worked, your account may have been downgraded or never had Pro access. Existing API keys created before this change keep working for ingest; manage them via the dashboard if you need to rotate or delete.
#2026-05-09
Hardware visibility on the dashboard
Server tiles and detail pages now show the hardware vendor and product detected by the monitoring agent (e.g., "GIGABYTE / R292-4S1-00"), plus an IPMI badge when sensor data is available. This helps identify what is actually on each box, especially when server names do not match the underlying hardware.
#2026-05-08
Stripe billing enforcement
Pro customers without a payment method on file now have servers beyond the 3-server free quota disabled at the end of their billing period. The oldest 3 servers stay active. Snapshot ingest continues for disabled servers, so historical data is preserved and restoration is instant once a card is added.
Affected customers receive a sequence of warning emails: when a payment method is removed, 3 days before disable, 1 day before disable, and at the moment of disable. Restoration is a single-click operation: Settings → Disabled servers → Restore all.
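The selection rule above (the oldest 3 servers stay active) can be sketched as a sort on creation time. This assumes "oldest" means earliest creation timestamp and uses a hypothetical (name, created_at) tuple per server:

```python
from datetime import datetime

FREE_QUOTA = 3  # servers that stay active without a payment method


def servers_to_disable(servers: list) -> list:
    """Return names of servers beyond the free quota, newest ones first to go.

    `servers` is a list of (name, created_at) tuples.
    """
    oldest_first = sorted(servers, key=lambda server: server[1])
    return [name for name, _ in oldest_first[FREE_QUOTA:]]
```

An account with five servers would keep the three earliest-created ones and have the remaining two disabled.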
#2026-05-07
cpu_temperature_high reads hwmon directly, with IPMI fallback
Most customers see no change. The evaluator now reads CPU thermals from the kernel's hwmon interface (more accurate, vendor-agnostic), falling back to IPMI sensor data when hwmon is not available.
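The hwmon-first, IPMI-second order can be illustrated as below. This is a sketch, not the evaluator's code: it assumes CPU sensors are identified by the coretemp (Intel) and k10temp (AMD) driver names, reads the standard temp*_input files (millidegrees Celsius), and takes the hottest reading. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

CPU_DRIVERS = {"coretemp", "k10temp"}  # assumption: drivers treated as CPU thermals


def read_hwmon_cpu_celsius(root: str = "/sys/class/hwmon") -> Optional[float]:
    """Hottest CPU reading from hwmon, or None when nothing usable exists."""
    readings = []
    for chip in Path(root).glob("hwmon*"):
        try:
            if (chip / "name").read_text().strip() not in CPU_DRIVERS:
                continue
            for sensor in chip.glob("temp*_input"):
                # hwmon reports millidegrees Celsius.
                readings.append(int(sensor.read_text().strip()) / 1000.0)
        except (OSError, ValueError):
            continue
    return max(readings) if readings else None


def cpu_temperature(ipmi_celsius: Optional[float],
                    root: str = "/sys/class/hwmon") -> Optional[float]:
    """Prefer hwmon; fall back to the IPMI reading when hwmon has no data."""
    hwmon = read_hwmon_cpu_celsius(root)
    return hwmon if hwmon is not None else ipmi_celsius
```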
Customers running on Gigabyte AMD platforms (B650, B660, X670, EPYC) see fewer false-positive alerts. Their BMC firmware (12.61, the shipping default on these boards) reports a CPU<N>_DTS IPMI sensor that runs about 30 °C hotter than the actual CPU die temperature exposed via the kernel. The agent (Crucible 0.9.1+) drops the inflated CPU<N>_DTS sensor whenever a CPU<N>_TEMP sibling exists on the same socket.
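The sibling rule can be sketched as a filter over sensor names. This is an illustration only; it assumes sensors arrive as a name-to-reading mapping, and the function name is hypothetical:

```python
import re

_DTS_NAME = re.compile(r"^CPU(\d+)_DTS$")


def drop_inflated_dts(sensors: dict) -> dict:
    """Remove CPU<N>_DTS when CPU<N>_TEMP exists for the same socket."""
    kept = {}
    for name, value in sensors.items():
        match = _DTS_NAME.match(name)
        if match and f"CPU{match.group(1)}_TEMP" in sensors:
            continue  # a TEMP sibling exists, so the DTS reading is dropped
        kept[name] = value
    return kept
```

A socket that exposes only CPU<N>_DTS keeps that sensor, so boards without a TEMP sibling do not lose their only reading.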
ECC alerts now fire on Dell iDRAC and HPE iLO platforms too
Previously, ECC alerts only fired when the BMC exposed a named "correctable / uncorrectable" sensor. Dell iDRAC and some HPE iLO firmwares report ECC events only via the System Event Log (SEL), not as named sensors, so customers on those platforms were missing alerts silently. This release adds a SEL-derived counter as a second source.
install.sh simplified
The bootstrap installer at glassmkr.com/install.sh now delegates configuration and systemd setup to the glassmkr-crucible init subcommand. The --dashboard-key argument is preserved as an alias for --api-key so existing automation keeps working.
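The alias pattern itself is simple: both flag spellings write to the same destination. The sketch below shows the idea with Python's argparse purely for illustration; it is not the installer's code, and the parser name is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="install-sketch")
    # Both spellings populate the same attribute, so older automation
    # that passes --dashboard-key keeps working unchanged.
    parser.add_argument("--api-key", "--dashboard-key",
                        dest="api_key", required=True)
    return parser
```

Calling build_parser().parse_args(["--dashboard-key", "abc"]) and the same call with --api-key both set api_key to "abc".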