#Topic pages

  • IPMI: how Crucible detects IPMI, why "Not detected" can be correct behavior, using glassmkr-crucible doctor ipmi, per-vendor notes.

#Crucible service fails to start

Symptom: systemctl status glassmkr-crucible shows failed or inactive (dead).

  1. Check the service logs:
    journalctl -u glassmkr-crucible --no-pager -n 50
  2. If you see a YAML parse error, re-run the init wizard with the same key to rewrite the config from scratch:
    sudo glassmkr-crucible init --api-key <your_collector_key>
    The wizard validates the key against the Dashboard before writing the config, so a typo surfaces immediately. Common YAML mistakes include tabs instead of spaces, missing quotes around strings with special characters, and incorrect indentation.
  3. If you see permission denied, check the configuration file's owner and mode:
    ls -la /etc/glassmkr/collector.yaml
    The file should be owned by root with mode 0600.
  4. If you see bind: address already in use, another instance may be running:
    pgrep -a glassmkr-crucible
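    Stop the stale process using the PID that pgrep prints, then start the service again:
    sudo kill <PID>
    sudo systemctl start glassmkr-crucible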
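The triage in steps 2 to 4 can be sketched as two shell functions. This is a minimal sketch that assumes bash and GNU coreutils. classify_failure and config_perms_ok are hypothetical helpers, not part of Crucible. The log patterns are also assumptions, based on the messages quoted above; the exact wording Crucible logs may differ.

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers for a failed glassmkr-crucible service.
# The log patterns are assumptions based on the messages quoted in the steps above.

# Read journal text on stdin and print which step applies.
classify_failure() {
  local log
  log=$(cat)
  if grep -qiE 'yaml.*(parse|error)|parse.*yaml' <<<"$log"; then
    echo "yaml"          # step 2: re-run the init wizard
  elif grep -qi 'permission denied' <<<"$log"; then
    echo "permissions"   # step 3: check owner and mode of collector.yaml
  elif grep -qi 'address already in use' <<<"$log"; then
    echo "port-in-use"   # step 4: look for a stale instance
  else
    echo "unknown"       # none matched: read the full journal output
  fi
}

# Succeed if FILE has mode 600 and is owned by OWNER (default: root).
config_perms_ok() {
  local file=$1 owner=${2:-root}
  [ "$(stat -c '%a %U' "$file")" = "600 $owner" ]
}
```

For example:
    journalctl -u glassmkr-crucible --no-pager -n 50 | classify_failure
    config_perms_ok /etc/glassmkr/collector.yaml || ls -la /etc/glassmkr/collector.yaml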

#Server shows "offline" in the dashboard

Symptom: The server card shows a gray status indicator and "last seen" is more than 2 minutes ago (the agent pushes every 60 seconds by default; the server_unreachable rule fires after 2 missed check-ins).

  1. Check that Crucible is running:
    systemctl status glassmkr-crucible
  2. Check network connectivity to the API:
    curl -s -o /dev/null -w "%{http_code}" https://app.glassmkr.com/api/v1/health
    You should get 200. If not, check DNS resolution, firewall rules, and proxy settings.
  3. Check whether the collector key is valid:
    sudo journalctl -u glassmkr-crucible --since "5 min ago" --no-pager
    If you see auth error: 401, rotate the key in the Dashboard, update /etc/glassmkr/collector.yaml, and restart the service with sudo systemctl restart glassmkr-crucible.
  4. Check for network-level blocks:
    nc -zv app.glassmkr.com 443
  5. If you are behind a proxy, configure it in collector.yaml:
    proxy:
      https: http://proxy.internal:3128
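You can wrap the curl check from step 2 so that the status code points to the next thing to check. This is a minimal sketch that assumes bash. check_api and explain_health_code are hypothetical names. The sketch relies on the fact that curl prints 000 for %{http_code} when no HTTP response arrives at all.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the health check in step 2.

# Map the status code printed by curl to the next thing to check.
explain_health_code() {
  case $1 in
    200) echo "ok: API reachable" ;;
    000) echo "no response: check DNS resolution, firewall rules, and proxy settings" ;;
    *)   echo "unexpected HTTP status $1: something answered, but not with 200" ;;
  esac
}

# Run the step 2 check (10-second timeout) and explain the result.
check_api() {
  local url=${1:-https://app.glassmkr.com/api/v1/health} code
  # curl still prints %{http_code} (as 000) when the connection fails.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  explain_health_code "$code"
}
```

curl honors the https_proxy environment variable, so you can test a proxy before adding it to collector.yaml:
    https_proxy=http://proxy.internal:3128 check_api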

#Metrics are delayed or missing

Symptom: The dashboard shows gaps in charts or data arrives minutes late.

  1. Check the agent's push timing:
    sudo journalctl -u glassmkr-crucible --since "5 min ago" --no-pager
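    The "Last push" value should be close to the configured interval (default 60 seconds); a value well above it means pushes are falling behind.
  2. If pushes are slow, check the agent log for timeout errors:
    grep -i "timeout\|retry" /var/log/glassmkr/crucible.log | tail -20
    If the agent logs only to the journal, search there instead:
    sudo journalctl -u glassmkr-crucible --since "1 hour ago" --no-pager | grep -iE "timeout|retry" | tail -20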
  3. If the server's clock is significantly off, snapshots may be dropped. Verify NTP is working:
    timedatectl status
    If not synchronized:
    sudo timedatectl set-ntp true
  4. If specific collectors are slow (e.g., SMART queries on many disks), they can delay the entire push. Inspect collector timing:
    sudo journalctl -u glassmkr-crucible -f
    Consider increasing the interval or disabling slow collectors.
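Steps 1 and 3 can be scripted. This is a minimal sketch. find_push_gaps and clock_synced are hypothetical helpers. The 1.5x gap threshold is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for spotting delayed pushes; not part of Crucible.

# Read one push timestamp (Unix seconds) per line on stdin, in ascending order.
# Report every gap longer than 1.5x the expected interval (arbitrary threshold).
find_push_gaps() {
  local interval=${1:-60}
  awk -v iv="$interval" '
    NR > 1 && ($1 - prev) > 1.5 * iv { printf "gap of %ds ending at %d\n", $1 - prev, $1 }
    { prev = $1 }'
}

# Read `timedatectl status` output on stdin; succeed if the clock is synchronized.
# Newer systemd prints "System clock synchronized", older releases "NTP synchronized".
clock_synced() {
  grep -qE '(System clock|NTP) synchronized: yes'
}
```

The pipeline below assumes that push events can be selected with grep -i push. That is a guess about Crucible's log wording, so adjust the filter to match what your journal shows:
    sudo journalctl -u glassmkr-crucible --since "1 hour ago" -o short-unix --no-pager | grep -i push | awk '{ print int($1) }' | find_push_gaps 60
    timedatectl status | clock_synced || sudo timedatectl set-ntp true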

#SMART data is not appearing

Symptom: The Disk tab in the dashboard shows no SMART information.

  1. Ensure smartmontools is installed:
    # Debian / Ubuntu
    sudo apt install smartmontools
    
    # RHEL / Rocky / Alma
    sudo dnf install smartmontools
  2. Verify that smartctl can read your drives:
    sudo smartctl -a /dev/sda
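    If this fails with a permission error, or succeeds under sudo while the Disk tab stays empty, Crucible's glassmkr service user needs read access to the drives (the default install grants this via udev rules).
  3. For hardware RAID controllers, drives behind the controller are not visible to smartctl without the -d flag. On a MegaRAID/PERC controller, N in megaraid,N is the physical drive's device ID; other controller families use other -d types (see man smartctl):
    sudo smartctl -a /dev/sda -d megaraid,0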
  4. Verify the SMART collector is enabled:
    collectors:
      smart:
        enabled: true
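Steps 1 and 3 can be scripted. This is a minimal sketch. smart_install_cmd and smart_commands_from_scan are hypothetical helpers. The second helper relies on smartctl --scan, which prints each device together with the -d type smartctl would use. For example, drives behind a MegaRAID controller appear as megaraid,N. If --scan does not detect your controller, you still need to set the -d type by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the SMART steps above; not part of Crucible.

# Print the smartmontools install command for the distribution described by an
# os-release file (default /etc/os-release), using its ID and ID_LIKE fields.
smart_install_cmd() {
  local osr=${1:-/etc/os-release} ids
  ids=$(. "$osr"; echo "$ID $ID_LIKE")
  case " $ids " in
    *" debian "*|*" ubuntu "*) echo "sudo apt install smartmontools" ;;
    *" rhel "*|*" fedora "*)   echo "sudo dnf install smartmontools" ;;
    *) echo "unrecognized distribution: $ids" >&2; return 1 ;;
  esac
}

# Turn `smartctl --scan` output into one `smartctl -a` command per device,
# keeping the -d type so drives behind RAID controllers are addressed correctly.
smart_commands_from_scan() {
  awk '$2 == "-d" { printf "sudo smartctl -a %s -d %s\n", $1, $3 }'
}
```

For example:
    smart_install_cmd
    sudo smartctl --scan | smart_commands_from_scan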

#IPMI, thermal, or fan data is missing

Symptom: The Hardware tab shows no temperature, fan, or PSU data.

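  1. Install lm-sensors for hwmon data:
    # Debian / Ubuntu
    sudo apt install lm-sensors
    
    # RHEL / Rocky / Alma
    sudo dnf install lm_sensors
    
    sudo sensors-detect --auto
  2. For IPMI data, install ipmitool and verify it works:
    # Debian / Ubuntu
    sudo apt install ipmitool
    
    # RHEL / Rocky / Alma
    sudo dnf install ipmitool
    
    sudo ipmitool sdr list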
  3. Run the IPMI self-diagnostic:
    sudo glassmkr-crucible doctor ipmi
    See the IPMI troubleshooting page for the full per-reason fix guide.
  4. If IPMI is not available (common on consumer hardware, cloud VMs without passthrough, laptops, Raspberry Pi), Crucible reads thermal data from hwmon directly.
  5. Confirm the thermal collector is not disabled:
    collectors:
      thermal:
        enabled: true
        source: auto
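The hwmon fallback in step 4 reads sysfs, which you can inspect yourself. This is a minimal sketch, not Crucible's code. read_hwmon_temps prints each temp*_input sensor. has_ipmi_device checks for the device nodes that ipmitool's open interface uses. Both function names are hypothetical. Each function takes an optional root directory, so you can try it against a copy of the tree.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of reading hwmon temperatures directly from sysfs.
# Illustrates the data source named in step 4; it is not Crucible's code.

# Print "<chip> <sensor> <degrees C>" for every temp*_input under ROOT.
read_hwmon_temps() {
  local root=${1:-/sys/class/hwmon} dir f chip milli
  for dir in "$root"/hwmon*; do
    [ -d "$dir" ] || continue
    chip=$(cat "$dir/name" 2>/dev/null || echo unknown)
    for f in "$dir"/temp*_input; do
      [ -r "$f" ] || continue
      milli=$(cat "$f" 2>/dev/null) || continue
      # hwmon reports temperatures in millidegrees Celsius.
      awk -v c="$chip" -v s="$(basename "$f" _input)" -v m="$milli" \
        'BEGIN { printf "%s %s %.1f\n", c, s, m / 1000 }'
    done
  done
}

# Succeed if one of the device nodes used by ipmitool's "open" interface exists
# under ROOT (default /). If none exists, the kernel IPMI drivers (ipmi_si,
# ipmi_devintf) are not loaded, or the machine has no BMC.
has_ipmi_device() {
  local root=${1:-} node
  for node in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
    [ -e "$root$node" ] && return 0
  done
  return 1
}
```

For example:
    read_hwmon_temps
    has_ipmi_device || echo "no IPMI device node; expect hwmon-only thermal data"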

#ZFS module not loaded

Symptom: the Storage tab shows no ZFS pools even though zpool list works on the host, or the zfs_* rules never fire.

  1. Check that the ZFS kernel module is loaded:
    lsmod | grep zfs
    On many distributions the module is loaded on-demand by the first zpool or zfs call. If Crucible starts before that happens, it sees no ZFS surface.
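  2. Load the module now, make it load at every boot, and restart Crucible so it picks up the ZFS surface. The modules-load.d entry only takes effect at the next boot, so modprobe loads the module for the current session:
    sudo modprobe zfs
    echo zfs | sudo tee /etc/modules-load.d/zfs.conf
    sudo systemctl restart glassmkr-crucible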
  3. If the module still does not load and you expected ZFS, install the package set for your distribution (zfsutils-linux on Debian/Ubuntu; zfs on Rocky/Alma from the OpenZFS repository, with EPEL enabled).
  4. If you have a kernel update pending, ZFS DKMS sometimes lags behind the running kernel; reboot or rebuild the module against the new kernel before assuming Crucible is at fault.
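You can make the lsmod | grep zfs check from step 1 exact. lsmod reads /proc/modules, and a plain grep matches "zfs" anywhere on a line, not just in the module-name column. This is a minimal sketch. module_loaded and ensure_zfs_loaded are hypothetical helpers that combine steps 1 and 2.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; lsmod itself reads /proc/modules.

# Succeed if module NAME appears in the first (name) column of FILE
# (default /proc/modules), ignoring matches elsewhere on the line.
module_loaded() {
  local name=$1 file=${2:-/proc/modules}
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Load ZFS now and on every boot, then restart Crucible so it rediscovers pools.
ensure_zfs_loaded() {
  if ! module_loaded zfs; then
    sudo modprobe zfs || return 1
  fi
  [ -f /etc/modules-load.d/zfs.conf ] || echo zfs | sudo tee /etc/modules-load.d/zfs.conf >/dev/null
  sudo systemctl restart glassmkr-crucible
}
```

For example:
    module_loaded zfs && echo "zfs loaded" || ensure_zfs_loaded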

#GPU tier-1 (nvidia-smi) unavailable

Symptom: a server with NVIDIA GPUs reports no GPU data even though nvidia-smi works interactively.

Crucible's GPU collector probes three tiers in order: nvidia-smi (most common), DCGM exporter (preferred when present), and Redfish OEM stub (BMC-side, vendor-dependent). The collector has been validated on NVIDIA L4, A4000, and A16 GPUs in the validation fleet.

  1. Confirm nvidia-smi is on the PATH that systemd sees:
    sudo systemd-run --pty --uid=glassmkr nvidia-smi
    Some distributions install nvidia-smi to /usr/lib/nvidia/current/ rather than /usr/bin/; the systemd unit's PATH may differ from your interactive shell.
  2. If the binary is found but exits non-zero, check the driver state:
    nvidia-smi --query-gpu=name,driver_version,pstate --format=csv
    A driver loaded against a different kernel than the running one will fail here.
  3. If DCGM is installed and you want the richer dataset, ensure the DCGM service is running:
    systemctl status nvidia-dcgm
  4. For BMC-side Redfish GPU telemetry (rare; vendor-specific OEM extension), confirm the BMC has the GPU sensor model populated. Replace 1 with your BMC's system ID; the available IDs are listed at /redfish/v1/Systems:
    curl -k -u user:pass https://<bmc>/redfish/v1/Systems/1/Oem/
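You can check the PATH mismatch from step 1 with a script. This is a minimal sketch. find_nvidia_smi is a hypothetical helper. Its fallback directories (/usr/lib/nvidia/current from step 1, plus /usr/bin) are assumptions, so adjust them for your distribution.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find nvidia-smi on PATH, otherwise in fallback directories
# (defaults: /usr/lib/nvidia/current and /usr/bin; pass others as arguments).
find_nvidia_smi() {
  local dirs=("$@") dir
  [ ${#dirs[@]} -gt 0 ] || dirs=(/usr/lib/nvidia/current /usr/bin)
  if command -v nvidia-smi >/dev/null 2>&1; then
    command -v nvidia-smi
    return 0
  fi
  for dir in "${dirs[@]}"; do
    if [ -x "$dir/nvidia-smi" ]; then
      echo "$dir/nvidia-smi"
      return 0
    fi
  done
  return 1
}
```

If the helper finds nvidia-smi only in a fallback directory, the service's PATH probably lacks that directory. One option is a drop-in created with sudo systemctl edit glassmkr-crucible that sets Environment=PATH=... to include it, followed by a restart.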

#Telegram notifications are not arriving

Symptom: Alerts fire in the dashboard but no Telegram messages are received.

  1. Test the channel from the dashboard or API:
    curl -X POST https://app.glassmkr.com/api/v1/channels/CHANNEL_ID/test \
      -H "Authorization: Bearer YOUR_TOKEN"
  2. If the test fails with 401 Unauthorized, the bot token is invalid. Re-create the bot via BotFather or regenerate the token, then update the token in the Telegram channel's settings.
  3. If the test fails with 400 Bad Request: chat not found, the chat ID is wrong or the bot cannot reach the chat. Common causes: the -100 prefix is missing from a supergroup ID, the bot was removed from the group, or the bot has never received a message in the chat (send it a message first).
  4. If the test succeeds but real alerts do not arrive, check the channel routing. Go to Settings → Alert Defaults and confirm your Telegram channel is listed.
  5. Check the alert cooldown. By default, Glassmkr sends one notification per active alert per hour. Acknowledged or recently notified alerts are suppressed.
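For step 3, the shape of the chat ID already rules out some mistakes. This is a minimal sketch. chat_id_kind is a hypothetical helper that encodes Telegram's ID conventions: supergroup and channel IDs start with -100, basic group IDs are other negative numbers, and private chats use positive user IDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a Telegram chat ID by its shape only.
chat_id_kind() {
  if [[ $1 =~ ^-100[0-9]+$ ]]; then echo "supergroup-or-channel"
  elif [[ $1 =~ ^-[0-9]+$ ]]; then echo "group"
  elif [[ $1 =~ ^[0-9]+$ ]]; then echo "private"
  else echo "not-numeric"   # e.g. an @username, which is not a numeric chat ID
  fi
}
```

A bare negative ID such as -123456789 may belong to a basic group, or it may be a supergroup ID with the -100 prefix missing. The shape alone cannot tell. To confirm, query Telegram directly with the Bot API getChat method, which fails with "chat not found" for a bad ID:
    curl "https://api.telegram.org/bot<BOT_TOKEN>/getChat?chat_id=<CHAT_ID>"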

#Email notifications go to spam

Symptom: Test emails arrive in the spam folder.

  1. Check the spam folder and mark messages as "not spam" to train your provider.
  2. Add the Glassmkr sender address (the From address on your test email) to your contacts or safe senders list.
  3. If you control the recipient mail server, allowlist Glassmkr's sending IP ranges. Contact support for the current ranges.
  4. For better deliverability, route through a custom SMTP server in your own domain. See the Channels page for setup.

#High CPU usage by Crucible

Symptom: the Crucible process consistently uses more than 1-2% CPU.

For reference, a measurement across 7 validation-fleet hosts on 2026-05-21 showed an RSS of 65 MB to 103 MB (median 91 MB idle), ~0% CPU, and an fio benchmark delta under 1.5%. Sustained higher usage is unusual.

  1. Check which collectors are running:
    sudo journalctl -u glassmkr-crucible -f
  2. SMART queries on many disks can be expensive. If you have more than 20 disks, narrow the device list or increase the interval:
    collectors:
      smart:
        devices:
          - /dev/sda
          - /dev/sdb
  3. Per-core CPU metrics on machines with 64+ cores generate a lot of data. Disable per-core reporting if you do not need it:
    collectors:
      cpu:
        per_core: false
  4. If the collection interval is set very low (e.g., 10 seconds), increase it:
    collectors:
      interval: 60
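To check whether usage really is sustained, sample the process over a time window instead of relying on ps, whose %CPU column is averaged over the process's lifetime. This is a minimal Linux-only sketch. proc_ticks, cpu_pct, and sample_cpu are hypothetical helpers that read utime and stime from /proc/<pid>/stat.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: measure a process's CPU use over a window.

# Sum utime + stime (fields 14 and 15 of /proc/<pid>/stat) from one stat line.
proc_ticks() {
  local rest=${1##*") "}   # drop "pid (comm) "; comm may itself contain spaces
  set -- $rest             # $1 is now field 3 (state), so fields 14/15 are $12/$13
  echo $(( ${12} + ${13} ))
}

# CPU percent of one core for TICKS clock ticks consumed over SECS seconds at HZ.
cpu_pct() {
  awk -v t="$1" -v s="$2" -v hz="$3" 'BEGIN { printf "%.1f\n", 100 * t / (s * hz) }'
}

# Sample PID twice, SECS apart (default 30), and print its CPU percent.
sample_cpu() {
  local pid=$1 secs=${2:-30} hz t1 t2
  hz=$(getconf CLK_TCK)
  t1=$(proc_ticks "$(cat "/proc/$pid/stat")") || return 1
  sleep "$secs"
  t2=$(proc_ticks "$(cat "/proc/$pid/stat")") || return 1
  cpu_pct $(( t2 - t1 )) "$secs" "$hz"
}
```

For example, the command below samples for one minute. The result is a percentage of one core, so 100.0 means one full core:
    sample_cpu "$(pgrep -o glassmkr-crucible)" 60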

#Registration fails with "server limit reached"

Symptom: + Add Server returns an error about the server limit.

  1. The Free plan allows 3 servers. Pro is $3/node/month with the first 3 nodes free.
  2. If you have decommissioned servers still registered, delete them from the dashboard to free up slots.
  3. To upgrade your plan, go to Settings → Billing.
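The pricing rule in step 1, written as arithmetic, can help you decide between deleting decommissioned servers and upgrading. This is only a sketch. pro_monthly_cost is a hypothetical helper, and it ignores taxes, proration, and any other charges.

```bash
#!/usr/bin/env bash
# Worked example of the Pro pricing rule: $3/node/month, first 3 nodes free.
# Ignores taxes, proration, and any other charges.
pro_monthly_cost() {
  local nodes=$1 free=3 rate=3 billable
  billable=$(( nodes > free ? nodes - free : 0 ))
  echo $(( billable * rate ))
}
```

For example, pro_monthly_cost 10 prints 21: (10 - 3) nodes at $3 each is $21 per month.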

#My servers are disabled (lock icon, "no payment method on file")

Symptom: some server tiles show a lock-icon overlay and "Manage in Settings". Notifications stopped firing for those servers.

Why: on the Pro plan, servers beyond the 3-server free quota are disabled at the end of the billing period (or 30 days after account creation, whichever is later) when no payment method is on file. The first 3 servers always stay active. Disabled servers continue to ingest snapshots so historical data is preserved; they just stop firing notifications.

  1. Add a payment method: Settings → Billing → Add card (opens the Stripe portal).
  2. Restore in bulk: Settings → Disabled servers → Restore all. Restoration is instant once a card is on file.
  3. If you would rather drop into the free quota than pay, delete individual servers from the same screen.

Glassmkr sends warning emails before disable: when the payment method is removed, 3 days before disable, 1 day before disable, and at the moment of disable. If you do not see these, check your spam folder and confirm the account email is correct.
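You can work out the disable rule above and the dated warning emails with GNU date. The rule is the later of the billing-period end and 30 days after account creation. This is a minimal sketch. disable_date and warning_dates are hypothetical helpers, and treating dates as whole UTC days is an assumption. The email sent when the payment method is removed is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the disable timeline (GNU date, YYYY-MM-DD in UTC).

# Disable date = the later of the billing-period end and account creation + 30 days.
disable_date() {
  local period_end=$1 created=$2 grace
  grace=$(date -u -d "$created +30 days" +%F)
  # YYYY-MM-DD strings compare correctly as plain strings.
  if [[ $period_end > $grace ]]; then echo "$period_end"; else echo "$grace"; fi
}

# Dated warning emails: 3 days before, 1 day before, and on the disable date.
warning_dates() {
  date -u -d "$1 -3 days" +%F
  date -u -d "$1 -1 day" +%F
  echo "$1"
}
```

For example, take an account created on 2026-01-01 whose billing period ends on 2026-01-20. Then disable_date 2026-01-20 2026-01-01 prints 2026-01-31, and warning_dates 2026-01-31 prints 2026-01-28, 2026-01-30, and 2026-01-31.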

#Configuration changes are not taking effect

Symptom: you edited collector.yaml but Crucible still uses the old settings.

  1. Restart the service after any configuration change:
    sudo systemctl restart glassmkr-crucible
  2. Verify the running config by inspecting the agent's startup banner:
    sudo journalctl -u glassmkr-crucible --since "1 min ago" --no-pager
    The first lines after restart print the resolved interval, enabled collectors, and Dashboard URL.
  3. Check that you edited the correct file. The systemd unit may pin a non-default config path:
    systemctl show glassmkr-crucible -p Environment
  4. Environment variables override the config file. Check for any GLASSMKR_* or CRUCIBLE_* variables in the systemd unit or shell environment.
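Steps 3 and 4 can be combined into a quick scan for overriding variables. This is a minimal sketch. override_vars is a hypothetical helper that parses the one-line output of systemctl show -p Environment. For simplicity, it does not handle values that contain spaces (systemctl quotes those).

```bash
#!/usr/bin/env bash
# Hypothetical helper: list GLASSMKR_* and CRUCIBLE_* assignments from the
# "Environment=..." line printed by `systemctl show <unit> -p Environment`.
# Simplification: values containing spaces are not handled.
override_vars() {
  local -a items
  local kv
  read -r -a items <<<"${1#Environment=}"
  for kv in "${items[@]}"; do
    case $kv in
      GLASSMKR_*|CRUCIBLE_*) echo "$kv" ;;
    esac
  done
}
```

For example, to check the unit and then your shell:
    override_vars "$(systemctl show glassmkr-crucible -p Environment)"
    env | grep -E '^(GLASSMKR|CRUCIBLE)_'
Variables loaded through an EnvironmentFile= line do not appear in the Environment property. To list those files, run systemctl show glassmkr-crucible -p EnvironmentFiles.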

#Per-core CPU data is not showing

Symptom: the per-core CPU chart does not appear, or per-core data is missing from AI analysis.

  1. Per-core monitoring requires Crucible 0.3.0 or later. Check:
    glassmkr-crucible --version
  2. Enable per-core in the config:
    collectors:
      cpu:
        per_core: true
  3. Restart Crucible:
    sudo systemctl restart glassmkr-crucible
  4. Wait for the next collection interval (default 60 seconds) for data to appear.
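You can automate the version check in step 1. Plain string comparison gets versions wrong, because "0.13.3" sorts before "0.3.0". This minimal sketch therefore uses GNU sort -V. It assumes that --version prints an X.Y.Z number somewhere in its output. extract_version and version_ge are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the minimum Crucible version.

# Extract the first X.Y.Z from text on stdin, such as --version output.
extract_version() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed if version $1 >= version $2 (version-aware ordering via sort -V).
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}
```

For example:
    version_ge "$(glassmkr-crucible --version | extract_version)" 0.3.0 || echo "upgrade Crucible for per-core data"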

#Muted rules are still firing

Symptom: you muted a rule but it continues to fire alerts or send notifications.

  1. Muting takes effect on the next ingest cycle. Wait at least one collection interval after muting.
  2. If you muted via the configuration file, restart Crucible:
    sudo systemctl restart glassmkr-crucible
  3. If you muted via the dashboard, no restart is needed; the change applies on the next push from that server.
  4. Verify the rule is muted in the dashboard under the server's Alerts tab. Muted rules show a mute icon.

#Getting help

If your issue is not covered here:

  • Capture an hour of agent logs: sudo journalctl -u glassmkr-crucible --since "1 hour ago" --no-pager > crucible.log. Attach it when contacting support.
  • Email Glassmkr support with your server ID and a description of the issue.

Last verified: 2026-05-22 against Crucible v0.13.3. Resource footprint figures are from a 7-host validation-fleet measurement on 2026-05-21.