Blog
Updates, guides, and operational knowledge from the team.
What We Learned Running Gemma 4 on an L4 GPU for Production Server Analysis
How we deployed Gemma 4 26B on an NVIDIA L4 for AI health analysis of bare metal servers. Covers model selection, why vLLM failed, quantization choices, and prompting for structured infrastructure output.
IPMI, SMART, and RAID: The Hardware Layer Your Cloud Monitoring Tool Ignores
Most monitoring tools stop at the OS. Below it sits an entire hardware layer: disk firmware predicting its own failure, fans at 0 RPM, ECC memory correcting silent errors. Here is what to monitor and why.
Introducing Crucible: 38 alert rules for bare metal
We built an open-source monitoring agent that covers the failure modes that actually matter when running physical servers. It watches IPMI sensors, SMART attributes, RAID health, network errors, and OS-level alerts, all in a single binary with zero configuration.
Why bare metal monitoring is different
Cloud monitoring tools were built for ephemeral workloads. They track HTTP latency and container restarts. But when you run physical servers, the failure modes are fundamentally different: drives wear out, DIMM slots develop bit errors, fans fail silently, and RAID arrays degrade without anyone noticing.