Open Source · AGPL-3.0

Rackscope

Prometheus-first physical infrastructure monitoring

When an alert fires, monitoring tools indicate what is wrong — but rarely where the problem is located in the physical infrastructure. Rackscope provides that physical context, mapping every metric to its exact location: site, datacenter, room, aisle, rack, device, instance.


🔔 Toulouse HPC → Machine Room A → Compute Aisle → Rack C04 → compute-042



Analytics Dashboard

Drag-and-drop widget grid with live health, alerts, world map and Prometheus stats

Design Philosophy

See your infrastructure, not your spreadsheets.

Three principles that are non-negotiable.

📄

Zero Database

All configuration is stored in YAML files — GitOps-compatible, version-controlled, and diff-friendly. Commit your infrastructure topology to Git and roll back with a single command.

📡

Prometheus-Only

Every health state derives from a live PromQL query against your existing Prometheus instance. No agents, no collectors, no additional telemetry infrastructure to operate.
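Here is a minimal sketch of how a health state can be derived from a live PromQL query. The function classifies the samples of a Prometheus instant-query response into OK, WARN or CRIT. The response shape (`status`, `data.resultType`, `data.result[*].metric`, and `value` as `[timestamp, "string"]`) is Prometheus' documented HTTP API format. Everything else is assumed for illustration and is not Rackscope's actual check engine: the OK/WARN state names, the threshold rule, the `instance` key label, and the function name `states_from_vector`.

```python
import math
from typing import Any

# State names assumed for this sketch; only CRIT is named on this page.
OK, WARN, CRIT = "OK", "WARN", "CRIT"


def states_from_vector(response: dict[str, Any], warn: float, crit: float,
                       key_label: str = "instance") -> dict[str, str]:
    """Map each series of an instant-query vector result to a health state.

    `response` is the decoded JSON body returned by Prometheus'
    /api/v1/query endpoint. Hypothetical threshold rule: value >= crit
    is CRIT, value >= warn is WARN, anything lower is OK.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']!r}")

    states: dict[str, str] = {}
    for series in data["result"]:
        name = series["metric"].get(key_label)
        if name is None:
            continue  # a series without the key label cannot be placed anywhere
        # Prometheus encodes each sample as [unix_timestamp, "value_as_string"].
        value = float(series["value"][1])
        if math.isnan(value):
            continue  # NaN compares false against every threshold; skip it
        if value >= crit:
            states[name] = CRIT
        elif value >= warn:
            states[name] = WARN
        else:
            states[name] = OK
    return states


# Hypothetical response for a temperature-style query.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "compute-041"}, "value": [1700000000.0, "62"]},
            {"metric": {"instance": "compute-042"}, "value": [1700000000.0, "91"]},
        ],
    },
}
print(states_from_vector(sample, warn=75, crit=85))
# {'compute-041': 'OK', 'compute-042': 'CRIT'}
```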

🏗️

Physical Hierarchy

Site → Room → Aisle → Rack → Device → Instance. Health states propagate upward: a failing node elevates its rack to CRIT, and that state in turn propagates to the room level.
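The upward propagation rule can be sketched as a worst-of fold over the tree. A node's effective state is the most severe state found on the node itself or on any of its descendants. Several pieces are assumptions: the severity order OK < WARN < CRIT, the `Node` type, and the sample tree. The tree's names are borrowed from this page's example breadcrumb, plus a hypothetical `Rack C03`. Rackscope's actual aggregation may apply richer rules.

```python
from dataclasses import dataclass, field

# Severity order assumed for this sketch: a parent takes its worst child's state.
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


@dataclass
class Node:
    """One level of the physical hierarchy (room, aisle, rack, device, ...)."""
    name: str
    state: str = "OK"  # state observed on this node itself
    children: list["Node"] = field(default_factory=list)


def propagate(node: Node) -> str:
    """Return the effective state of `node`: the worst of its own state
    and the effective states of all its descendants."""
    worst = node.state
    for child in node.children:
        child_state = propagate(child)
        if SEVERITY[child_state] > SEVERITY[worst]:
            worst = child_state
    return worst


# Hypothetical sample tree.
room = Node("Machine Room A", children=[
    Node("Compute Aisle", children=[
        Node("Rack C03", children=[Node("compute-031")]),
        Node("Rack C04", children=[Node("compute-042", state="CRIT")]),
    ]),
])
rack_c04 = room.children[0].children[1]
print(propagate(rack_c04), propagate(room))  # CRIT CRIT
```

Recomputing from the leaves on each evaluation keeps the sketch simple. A real implementation would likely cache per-level results between Prometheus refreshes.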

Physical drill-down

Zoom in. All the way.

Every alert is anchored to a precise physical location. Navigate progressively from a global overview to the exact device — at each level, only the relevant information is displayed.

🌍

Global: all sites (health summary, world map, active alerts). Start here.

🏢

Datacenter: site-level overview (rooms, live status, drill-down)

🗺️

Room: floor plan (aisle layout, rack grid, health heatmap)

🔲

Aisle: row of racks (aisle state, cooling zones)

🖥️

Rack: front/rear elevation (device placement, U occupancy)

⚡

Device: chassis or unit (instances, checks, live metrics)

🔬

Instance: single node (health state, check results)
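Drilling down, and reading the physical path back from an alert, only requires an index from each instance to its chain of ancestors. The sketch below builds that index with a depth-first walk over a nested topology. The `name`/`children` dict layout is a hypothetical stand-in for Rackscope's YAML schema. The sketch treats every leaf as an instance, so it assumes instance names are unique across the whole topology.

```python
from typing import Any


def index_paths(node: dict[str, Any],
                prefix: tuple[str, ...] = ()) -> dict[str, tuple[str, ...]]:
    """Walk a nested topology and map every leaf (instance) name to the
    full chain of names from the top level down to that leaf."""
    path = prefix + (node["name"],)
    children = node.get("children", [])
    if not children:
        return {node["name"]: path}
    index: dict[str, tuple[str, ...]] = {}
    for child in children:
        # Later duplicates would overwrite earlier ones: names must be unique.
        index.update(index_paths(child, path))
    return index


# Hypothetical topology reusing the names of this page's example breadcrumb.
topology = {
    "name": "Toulouse HPC",
    "children": [{
        "name": "Machine Room A",
        "children": [{
            "name": "Compute Aisle",
            "children": [{
                "name": "Rack C04",
                "children": [{"name": "compute-042"}, {"name": "compute-043"}],
            }],
        }],
    }],
}

paths = index_paths(topology)
print(" > ".join(paths["compute-042"]))
# Toulouse HPC > Machine Room A > Compute Aisle > Rack C04 > compute-042
```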

Universal by design

Any metric. Any team.

Any metric exposed in Prometheus can become a visible health check in Rackscope — whether it originates from hardware, software, network infrastructure, or HPC workloads.

🔩

Hardware teams

Physical infrastructure

Server / rack health

Temperature & cooling

PDU load & power

Network infrastructure

Storage health

Cooling sensors

PromQL

💻

Software teams

Services & applications

Service availability

Application error rates

HPC workload states

Job queue depth

Any custom exporter

Plugin system

⚡

CMDB-agnostic. Generate your YAML topology from NetBox, RacksDB, or any script, or use the API directly. No vendor lock-in: if your tools can write a file, Rackscope can read it.
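As an illustration of "any script", here is a minimal sketch that turns a flat inventory export into a nested room → aisle → rack → device tree and serializes it. The CSV columns, the output keys, and the sample rows are all hypothetical and are not Rackscope's documented schema, so check the project documentation for the real keys. The output is JSON because YAML 1.2 was designed as a superset of JSON, which lets the file be committed as `.yaml` without a third-party YAML library.

```python
import csv
import io
import json

# Hypothetical inventory export (e.g. from a CMDB query or a spreadsheet).
INVENTORY = """room,aisle,rack,u,device
Machine Room A,Compute Aisle,Rack C04,12,compute-042
Machine Room A,Compute Aisle,Rack C04,14,compute-043
Machine Room A,Storage Aisle,Rack S01,20,storage-01
"""


def build_topology(csv_text: str) -> dict:
    """Group flat inventory rows into a room -> aisle -> rack -> device tree.

    The output keys ("rooms", "aisles", "racks", "devices", "u") are
    illustrative only; they are NOT Rackscope's documented schema.
    """
    rooms: dict[str, dict[str, dict[str, list[dict]]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        racks = rooms.setdefault(row["room"], {}).setdefault(row["aisle"], {})
        racks.setdefault(row["rack"], []).append(
            {"name": row["device"], "u": int(row["u"])}
        )
    # Convert the nested dicts into lists of named nodes, preserving row order.
    return {
        "rooms": [
            {
                "name": room,
                "aisles": [
                    {
                        "name": aisle,
                        "racks": [
                            {"name": rack, "devices": devices}
                            for rack, devices in racks.items()
                        ],
                    }
                    for aisle, racks in aisles.items()
                ],
            }
            for room, aisles in rooms.items()
        ]
    }


text = json.dumps(build_topology(INVENTORY), indent=2)
print(text)
```

Because the generator is deterministic for a given inventory, re-running it and committing the result produces clean Git diffs.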

Positioning

The physical layer that was missing.

Rackscope does not replace existing tools. It fills the gap between metrics dashboards and supervision platforms — adding the physical location of every alert to the monitoring chain.

📊

Grafana

Metrics & dashboards. Charts, panels, time series. Indicates what is happening.

"cpu_usage is 95%"

→

RACKSCOPE

🔭

Physical context

Bridges metrics to physical location. Answers where — which rack, which aisle, which room.

"Rack C04, Aisle 2, Machine Room A"

→

🚨

Supervision

Full monitoring & alerting. Nagios, Zabbix, PagerDuty. Determines what action to take.

"Ticket #4821 opened"

Not a replacement. The intermediate layer that was missing between your metrics dashboards and your supervision platform.

How it works

From Prometheus to physical view

Four steps — from your existing infrastructure to a live physical view. No agent to deploy, no database to provision.

01

📄

Define your topology

Write YAML files describing your physical infrastructure: sites, rooms, aisles, racks, devices. Or generate them from NetBox, RacksDB, or any script, or use the API directly.

topology.yaml

→

02

📡

Connect Prometheus

One URL. Point Rackscope at your existing Prometheus instance. No collector to deploy, no agent to install, nothing to change in your stack.

prometheus_url:
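Pointing at Prometheus really does take only a base URL, because PromQL is evaluated server-side and exposed through Prometheus' standard HTTP API (`GET /api/v1/query` with a `query` parameter). The standard-library sketch below builds and sends such a request. This page does not describe how Rackscope itself issues queries (authentication, caching, batching), so the function names here are hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def instant_query_url(prometheus_url: str, expr: str) -> str:
    """Build the URL for an instant PromQL query against Prometheus' HTTP API.

    A trailing slash is forced onto the base so that urljoin keeps any path
    prefix (e.g. Prometheus served behind a reverse proxy under /prometheus).
    """
    base = prometheus_url.rstrip("/") + "/"
    return urljoin(base, "api/v1/query") + "?" + urlencode({"query": expr})


def run_instant_query(prometheus_url: str, expr: str, timeout: float = 10.0) -> dict:
    """Send the query and return the decoded JSON body (requires network access)."""
    with urlopen(instant_query_url(prometheus_url, expr), timeout=timeout) as resp:
        return json.load(resp)


# Placeholder URL: substitute the address of your own Prometheus server.
print(instant_query_url("http://prometheus.example:9090", "up == 0"))
# http://prometheus.example:9090/api/v1/query?query=up+%3D%3D+0
```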

→

03

🔗

Map your checks

Any metric with the right labels becomes a visible health check. IPMI temperature, PDU load, software service status, Slurm node state — anything Prometheus scrapes.

expr: up{...}
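The "right labels" requirement amounts to a join: each result series must carry a label whose value names an instance in the topology. The sketch below splits a vector result into the series it can place and the series it cannot. The `Check` type, its `join_label` field, and the sample data are hypothetical. Only the `up` metric and its `instance` label are standard Prometheus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Hypothetical check definition: a PromQL expression plus the label
    whose value identifies a topology instance."""
    name: str
    expr: str
    join_label: str = "instance"


def map_results(check: Check, result: list[dict],
                known: set[str]) -> tuple[dict[str, float], list[dict]]:
    """Split the series of a vector result into (mapped, unmapped).

    mapped:   topology instance name -> sample value
    unmapped: label sets of series whose join label is missing or does
              not name any instance in the topology
    """
    mapped: dict[str, float] = {}
    unmapped: list[dict] = []
    for series in result:
        labels = series["metric"]
        target = labels.get(check.join_label)
        if target in known:
            mapped[target] = float(series["value"][1])
        else:
            unmapped.append(labels)
    return mapped, unmapped


check = Check(name="target_up", expr="up")  # `up` is 1 if the last scrape succeeded
result = [
    {"metric": {"__name__": "up", "instance": "compute-042", "job": "node"},
     "value": [1700000000.0, "0"]},
    {"metric": {"__name__": "up", "instance": "10.0.0.99:9100", "job": "node"},
     "value": [1700000000.0, "1"]},
]
mapped, unmapped = map_results(check, result, known={"compute-042", "compute-043"})
print(mapped)         # {'compute-042': 0.0}
print(len(unmapped))  # 1
```

In practice the `instance` label often looks like `host:port`. Either the topology names or a Prometheus `relabel_configs` rule therefore has to make the two sides agree. Reporting unmapped series, rather than silently dropping them, makes such mismatches visible.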

→

04

🔭

See your infrastructure

Launch and navigate from global to instance level. When something is CRIT, you know exactly which rack, which aisle, which room — not just a hostname in an alert.