Furnace AI assistant: reads alert, looks at evidence, suggests fix. Self-hosted Gemma 4 26B, no third-party LLM APIs.

May 2026 · AI · 5 min read

Introducing Furnace: the AI assistant that helps you fix alerts.

Furnace is the AI assistant we built for Glassmkr. It reads your alerts, looks at the evidence, and suggests remediation steps. It’s been the most-skipped feature of every monitoring tool we’ve used. We think we’ve made it useful enough to keep on.

Why AI in monitoring is hard

Most monitoring tools that have shipped AI features in the last two years have made a similar mistake: they treat AI as the headline. Every alert page suddenly has a chatbot. Every threshold gets an “AI-powered” sticker. Marketing pages promise that AI will solve the alert-fatigue problem.

The problem is that AI in monitoring is most useful in exactly the cases where overconfident AI is the most dangerous. Real alerts on real infrastructure require careful, hedged thinking. Is the disk SMART degradation a soft error that’s reversible, or a sign that this drive needs to be replaced tonight? Is the OOM kill a one-time event from a misbehaving cron job, or evidence of a memory leak that will recur?

When an AI tool autocompletes a confident shell command at you and gets it wrong, you’ve turned a routine alert into an incident. The worst version of AI in monitoring isn’t an unhelpful AI; it’s a helpful-seeming AI that’s confidently wrong.

So we built Furnace with a different bias: conservative, hedging, willing to say “I don’t know.”

What Furnace does

Furnace appears on every alert detail page. When an alert fires, Furnace reads the alert state, the evidence the rule collected, and the FIX workflow defined for that rule. It then produces context-specific notes.

For example, on a nvme_wear_high alert (NVMe drive wear percentage trending high):

Drive /dev/nvme0n1 shows 92% wear. This is in the “schedule replacement” band, not the “replace immediately” band. The rule’s FIX workflow recommends planning a maintenance window in the next 4-8 weeks. I notice the drive’s lifetime_writes metric has been increasing at ~2% per week over the past 90 days, which is consistent with sustained heavy write workload. If you want to confirm whether this rate is unusual for this drive model, you could compare against the manufacturer’s endurance spec (typically published in TBW). I don’t have the specific spec for your drive model loaded; you’d find it on the manufacturer’s product page.

Three things to notice:

It’s specific. Not “this drive is wearing out, replace it.” Specific drive, specific percentage, specific wear band from the FIX workflow.

It explains its reasoning. “92% wear is in the schedule-replacement band, not the replace-immediately band” tells you why it’s suggesting maintenance window vs immediate replacement.

It hedges where it should. “If you want to confirm whether this rate is unusual…” not “this rate is unusual.” “I don’t have the specific spec for your drive model loaded” instead of guessing the TBW from training data.

What Furnace doesn’t do

Furnace doesn’t autocomplete shell commands. When the FIX workflow has a command, Furnace can explain what it does, but Furnace doesn’t generate new commands for you to run. The commands shown are the ones we’ve tested and documented.

Furnace doesn’t make claims about your business. It won’t tell you what the alert “really means for your customers” or recommend a “communication strategy” for your team. It does technical context. That’s it.

Furnace doesn’t try to be friendly. It’s not a chatbot. There’s no “Hey there!” or “Great question!” It writes the way a senior engineer writes during an incident: terse, factual, useful.

Furnace says “I don’t know” when it doesn’t know. If you ask it about something outside the alert context, it doesn’t hallucinate. If the evidence is inconclusive, it tells you the evidence is inconclusive.

How Furnace is built

We considered using a commercial LLM API (OpenAI, Anthropic, Google) for Furnace and decided against it.

Three reasons:

Data residency. Glassmkr customers’ alert data should not leave EU jurisdiction. Routing alert evidence to a US-based commercial API does not match our trust posture.

Cost predictability. Per-token API pricing scales with usage in ways we can’t pass on transparently to customers at $3/node/month. Self-hosted inference is fixed-cost.

Behavioural control. Commercial LLMs have safety filtering, refusal patterns, and system-prompt overrides that we don’t control. For a monitoring tool, we want full control of the behavior.

So Furnace runs on self-hosted Gemma 4 26B (a quantised variant) on a single NVIDIA L4 GPU in Amsterdam, served via llama.cpp. Inference latency is ~2-5 seconds for typical alert annotations.

We picked Gemma 4 for three reasons: it’s an open-weight model from a credible source (Google DeepMind), the 26B parameter size is large enough to be useful while small enough to run on a single L4 GPU, and the licensing terms allow commercial use without per-API-call fees.

The truthfulness scope

We wrote a scope document for what Furnace will and won’t do. Highlights:

Furnace hedges interpretive claims and states mechanical facts directly.
Furnace says “I don’t know” when it doesn’t know.
Furnace doesn’t autocomplete fix commands.
Furnace stays in scope: technical context for alerts and infrastructure questions, nothing else.

We treat the scope as a living document that we update when we find Furnace doing something we don’t want.

What’s next

This is the introduction of Furnace as a feature. The bigger product investments around Furnace are in the queue:

Richer remediation grounding. Furnace will read more of the structured remediation data attached to each rule and use it to ground its suggestions. The deepening work on our rule library brought the remediation content from prose-only descriptions to structured fields the dashboard renders inline. Furnace gets richer context as that data flows into the prompt.

Customer-specific patterns. As your fleet accumulates alert history, Furnace can reference “this rule has fired twice before on this server; both times it resolved within an hour without action” instead of treating each alert as isolated.

Better refusal patterns. We’re tracking cases where Furnace should have refused to answer or should have said “I don’t know” but didn’t. Each one is a training signal for the next iteration.

Try it

If you have a Glassmkr account, Furnace is enabled by default on all alert detail pages. No setup, no API key required.

If you don’t have a Glassmkr account, sign up is free for the first 3 nodes. Install Crucible on a server you care about. If anything is degraded on that server (and on bare metal, something usually is), an alert will fire within minutes. Click into the alert detail page and you’ll see Furnace’s notes.

If Furnace’s notes are useful, tell us. If they’re not, tell us harder. Email [email protected].

One more thing

We chose the name Furnace deliberately. A furnace is a tool that works hot, transforms input under pressure, and is fundamentally about controlled process. The metaphor maps to alert remediation in ways we think are honest: alerts are pressure, the response is process, the goal is controlled work not theatre.

If the name strikes you as overwrought, that’s fair. Names are arbitrary. The product is what matters.

Read more about how Glassmkr handles AI honestly: the trust posture, the documentation.