FOR SRE / INFRASTRUCTURE ENGINEERING

Monitoring for SREs who manage bare-metal fleets.

Opinionated alert rules. Open-source agent. Pricing you can predict. Built by infrastructure operators, for infrastructure operators.

3 nodes free. No card required.

curl -fsSL https://app.glassmkr.com/install.sh | bash

Crucible v0.13.0 on npm

The problem

If you run bare-metal infrastructure, you already know what should be monitored. The frustrating part isn’t figuring out what to watch. It’s the operational overhead of getting a monitoring tool set up that actually catches the things that matter without burying you in noise.

The big platforms (Datadog, New Relic) are designed for ephemeral cloud workloads. Their default rules don’t know what a degraded RAID looks like, what ECC errors mean at scale, or when an NVMe drive is about to fail. Their pricing scales with telemetry volume, which is exactly the wrong shape for bare metal.

The open-source stack (Prometheus, Grafana, Alertmanager) works, but it’s a three-week project to deploy and a permanent maintenance burden. You’re not running a monitoring engineering team. You’re running infrastructure.

How Glassmkr fits

Glassmkr is monitoring built around the assumption that you’re running bare-metal hardware and you know what you’re doing.

Opinionated defaults that match real failure modes.

60 alert rules tuned for bare metal, covering disk SMART, RAID state, NVMe wear, IPMI sensors, ECC errors, ZFS health, network interface state, conntrack exhaustion, BMC accessibility, pending kernel updates, and available security patches. The thresholds are set for “this is genuinely a problem worth waking someone up about,” not “CPU briefly spiked.”
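To make the difference between a sustained problem and a brief spike concrete, here is a minimal sketch of a "sustained breach" check. It is an illustration only, not Glassmkr's rule engine: the function name, the integer samples, and the window size are all assumptions made for this example.

```shell
# Hypothetical helper for illustration; not part of Crucible or Glassmkr.
# Usage: sustained_breach THRESHOLD WINDOW SAMPLE...
# Succeeds (exit 0) only if the last WINDOW samples are all above THRESHOLD.
sustained_breach() {
  threshold="$1"
  window="$2"
  shift 2
  # Not enough history yet: do not alert.
  [ "$#" -ge "$window" ] || return 1
  # Drop older samples so only the last WINDOW remain.
  while [ "$#" -gt "$window" ]; do
    shift
  done
  # A single sample at or below the threshold means the breach was not sustained.
  for sample in "$@"; do
    [ "$sample" -gt "$threshold" ] || return 1
  done
  return 0
}

# One spike to 95 among normal readings: no alert.
sustained_breach 90 3 40 95 42 41 || echo "spike ignored"
# Three consecutive readings above 90: alert.
sustained_breach 90 3 40 95 96 97 && echo "sustained breach"
```

The window is what keeps a one-off spike from paging anyone. Widening it trades faster detection for fewer false alarms.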

A 150-line bash installer.

Read the install script before you run it. Read the agent source on GitHub before you install it. The agent runs as a non-root user, communicates over HTTPS only, and ships exactly the metrics it claims to ship.
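One way to read the script before running it is to download it to a file instead of piping it into bash. The sketch below assumes only the install URL shown on this page. The helper names, the local filename in the usage note, and the grep pattern are illustrative choices, and the summary is a quick first pass rather than a substitute for reading the whole script.

```shell
# Hypothetical helpers for illustration; not shipped with Glassmkr.

# Download the installer to a local file (URL and destination are arguments).
fetch_installer() {
  curl -fsSL "$1" -o "$2"
}

# First-pass summary: line count, then every line that escalates privileges,
# downloads something, or deletes files. The pattern is not exhaustive.
summarize_installer() {
  echo "lines: $(wc -l < "$1" | tr -d ' ')"
  grep -nE 'sudo|curl|wget|rm -' "$1" || true
}

# Usage (left commented out so nothing runs until you choose to):
#   fetch_installer https://app.glassmkr.com/install.sh ./glassmkr-install.sh
#   summarize_installer ./glassmkr-install.sh
#   less ./glassmkr-install.sh
#   bash ./glassmkr-install.sh
```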

Multi-channel notifications routed how your team works.

Telegram, Slack, Discord, PagerDuty, email, generic webhook. Group by team, by server, by severity. Suppress during maintenance windows. The alerting layer doesn’t impose a workflow on you.

Remediation guidance on every alert.

Every alert detail page shows the recommended next step: what to verify, what command to run, what to expect after. Related rules cross-reference each other. For example, an NVMe wear alert suggests pre-failure capacity planning before the drive escalates to a SMART critical alert.

Predictable pricing.

$3 per node per month after 3 free nodes. No telemetry volume billing. No tier upgrades for the features your team actually uses. Every Pro account gets every feature.

Specific alert rules that matter

The 60 rules fall roughly into these groups:

  • Storage health: disk SMART degradation, NVMe wear bands, disk I/O errors, disk latency anomalies
  • RAID state: degraded arrays, scrub errors, rebuilds in progress
  • ZFS health: pool state, vdev errors, scrub recency
  • Filesystem state: read-only mounts, inode exhaustion, file descriptor exhaustion, capacity thresholds
  • Memory pressure: RAM utilisation, swap utilisation, OOM kills observed
  • CPU state: sustained load, sustained I/O wait, temperature anomalies
  • Network state: interface errors, link speed mismatches, saturation, bond state, conntrack exhaustion
  • Hardware sensors (IPMI): fan failures, PSU redundancy loss, SEL critical entries, BMC accessibility
  • System health: clock drift, NTP sync state, systemd service failures, unexpected reboots
  • Security state: pending security updates, kernel vulnerabilities flagged, SSH password authentication exposed, no firewall configured

The agent you can read

Crucible is MIT licensed and on npm. The source is at github.com/glassmkr/crucible. The install script is ~150 lines of bash. The agent runs as the glassmkr user, never root. It runs under its own systemd unit. It communicates with the dashboard over HTTPS only. It ships metrics and alert state; it does not ship logs, command output, or anything secret.

If you can’t read the agent, you can’t trust it. We made the agent readable.

Pricing reminder

$3 per node per month. First 3 nodes free. No card required to install. Cancel any time.

  • For 10 nodes: $21/month
  • For 50 nodes: $141/month
  • For 200 nodes: $591/month
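
The figures above follow from charging $3 for every node beyond the first 3. A small sketch for estimating other fleet sizes, assuming that formula holds at every size (the function name is made up for this example):

```shell
# Price and free allowance as stated on this page.
PRICE_PER_NODE=3
FREE_NODES=3

# Hypothetical helper: monthly cost in dollars for a given node count.
monthly_cost() {
  nodes="$1"
  billable=$(( nodes - FREE_NODES ))
  # Fleets at or under the free allowance cost nothing.
  if [ "$billable" -lt 0 ]; then
    billable=0
  fi
  echo $(( billable * PRICE_PER_NODE ))
}

for n in 3 10 50 200; do
  echo "$n nodes: \$$(monthly_cost "$n")/month"
done
# 3 nodes: $0/month
# 10 nodes: $21/month
# 50 nodes: $141/month
# 200 nodes: $591/month
```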

If it fits, install on a single server first.

To see whether it works on your stack, install it on a single server. It takes about 2 minutes:

curl -fsSL https://app.glassmkr.com/install.sh | bash

Questions about whether Glassmkr fits your environment? Email [email protected] directly.