Rackscope REST API

Rackscope exposes a JSON REST API at http://localhost:8000.

Interactive docs

Browse and test every endpoint directly in Swagger UI — automatically generated from the FastAPI code.

Base URL

http://localhost:8000

All paths are relative to this base. In production, replace localhost:8000 with your server's hostname and port.

Authentication

Authentication is optional and disabled by default. When disabled, all endpoints are publicly accessible with no credentials required.

Enable it in config/app.yaml:

auth:
  enabled: true
  username: admin
  password_hash: $2b$12$... # bcrypt hash

Login

curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "yourpassword"}'

{
  "access_token": "eyJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "expires_in": 28800,
  "username": "admin"
}

Using the token

Pass the token in the Authorization header for all subsequent requests:

curl http://localhost:8000/api/rooms \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiJ9..."
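The same login and token flow can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names (`build_login_request`, `build_authed_request`, `fetch_json`) are hypothetical, and the endpoint paths and field names are the ones shown on this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this page


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST /api/auth/login request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authed_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token as a Bearer credential."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage against a running instance with auth enabled:
#   login = fetch_json(build_login_request("admin", "yourpassword"))
#   rooms = fetch_json(build_authed_request("/api/rooms", login["access_token"]))
```

The login response above reports expires_in as 28800, so a long-running client would presumably need to log in again once that period has elapsed.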

Auth endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/auth/status | Check whether auth is enabled and configured |
| POST | /api/auth/login | Validate credentials and receive a JWT |
| GET | /api/auth/me | Return the currently authenticated user |
| POST | /api/auth/change-password | Update password (writes new bcrypt hash to app.yaml) |
| POST | /api/auth/change-username | Update username (requires current password for verification) |

Response Format

All endpoints return JSON. Successful responses vary by endpoint — see each endpoint's section for the exact schema.

All error responses share the same envelope:

{
  "detail": "Error message describing what went wrong"
}
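A client can rely on this envelope to surface a readable message. The helper below is a small sketch with a hypothetical name (`error_detail`); the fallbacks for a non-JSON body or a non-string detail value are assumptions added for robustness, not behaviour documented here.

```python
import json


def error_detail(body: bytes) -> str:
    """Return the "detail" message from an error body, or a best-effort fallback."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all (for example, a proxy error page): return the raw text.
        return body.decode("utf-8", errors="replace")
    if isinstance(payload, dict) and "detail" in payload:
        detail = payload["detail"]
        # Assumption: if detail is not a plain string, serialize it as JSON.
        return detail if isinstance(detail, str) else json.dumps(detail)
    return json.dumps(payload)
```

With urllib, the raw body of a failed request is available from `urllib.error.HTTPError.read()`, which can be passed straight to this helper.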

HTTP Status Codes

| Code | Meaning |
| --- | --- |
| 200 | Success |
| 400 | Bad request: validation error or conflict |
| 401 | Unauthorized: invalid or missing token |
| 404 | Resource not found |
| 500 | Internal server error |
| 502 | Bad gateway: Prometheus unreachable |
| 503 | Service unavailable: configuration not loaded |
| 504 | Timeout: Prometheus query timed out |
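One way a client might act on these codes is sketched below. The meanings are copied from the table; treating 502, 503 and 504 as worth retrying is an assumption of this sketch (they describe Prometheus or configuration being temporarily unavailable), not a documented retry policy, and the names are hypothetical.

```python
# Meanings copied from the status code table on this page.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad request: validation error or conflict",
    401: "Unauthorized: invalid or missing token",
    404: "Resource not found",
    500: "Internal server error",
    502: "Bad gateway: Prometheus unreachable",
    503: "Service unavailable: configuration not loaded",
    504: "Timeout: Prometheus query timed out",
}

# Assumption: these conditions may clear on their own, so a retry is reasonable.
RETRYABLE_STATUS = frozenset({502, 503, 504})


def describe_status(status: int) -> str:
    """Return the documented meaning, or a marker for undocumented codes."""
    return STATUS_MEANINGS.get(status, f"Undocumented status {status}")


def should_retry(status: int) -> bool:
    """True when the status suggests a temporary upstream problem."""
    return status in RETRYABLE_STATUS
```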

Health States

Every entity (node, device, rack, room, site) carries one of four health states:

| State | Meaning |
| --- | --- |
| OK | All checks passing |
| WARN | At least one warning threshold exceeded |
| CRIT | At least one critical threshold exceeded |
| UNKNOWN | No data from Prometheus or check error |

States propagate upward through the hierarchy: Node → Device → Rack → Room → Site. The worst state wins at each level (CRIT beats WARN beats UNKNOWN beats OK).
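The worst-state rule can be illustrated with a few lines of Python. This is a sketch of the ordering described above, not the backend's actual code; the function name and the choice to return UNKNOWN for an entity with no children are assumptions.

```python
from typing import Iterable

# Severity ranking from the text: CRIT beats WARN beats UNKNOWN beats OK.
SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARN": 2, "CRIT": 3}


def worst_state(states: Iterable[str]) -> str:
    """Return the most severe state among the children of one entity."""
    # Assumption: an entity with no children has no data, so report UNKNOWN.
    return max(states, key=SEVERITY.__getitem__, default="UNKNOWN")


# Roll device states up to a rack, then rack states up to a room.
rack_a = worst_state(["OK", "WARN", "OK"])   # "WARN"
rack_b = worst_state(["OK", "UNKNOWN"])      # "UNKNOWN"
room = worst_state([rack_a, rack_b])         # "WARN"
```

Applying the same function at each level (Node, Device, Rack, Room, Site) yields the upward propagation described above.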

API Groups

| Group | Base path | Description |
| --- | --- | --- |
| Telemetry | /api/ | Health states, alerts, room/rack states, stats |
| Topology | /api/topology/ | Sites, rooms, aisles, racks, devices (CRUD) |
| Catalog | /api/catalog/ | Device and rack hardware templates |
| Checks | /api/checks/ | Health check library |
| Metrics | /api/metrics/ | Metrics library and live time-series queries |
| Plugins | /api/plugins/ | Plugin discovery and dynamic menu |
| Simulator | /api/simulator/ | Demo mode control and metric overrides |
| Slurm | /api/slurm/ | HPC workload manager states |
| Config | /api/config | Application configuration read/write |
| System | /api/system/ | Backend management (status, restart, process metrics) |

System endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/system/status | Liveness probe; returns { "status": "running", "pid": ... } |
| POST | /api/system/restart | Trigger a uvicorn reload (dev mode only) |
| GET | /api/system/process-stats | Memory and CPU usage for backend, simulator and Prometheus |

GET /api/system/process-stats

Returns live process metrics for the three core services. The backend reads its own stats from /proc/self/; simulator and Prometheus stats are fetched asynchronously.

curl http://localhost:8000/api/system/process-stats

{
  "backend": {
    "memory_bytes": 108482560,
    "cpu_seconds": 2.53,
    "available": true
  },
  "simulator": {
    "memory_bytes": 820785152,
    "cpu_seconds": 875.96,
    "available": true
  },
  "prometheus": {
    "memory_bytes": 3788701696,
    "cpu_seconds": 1085.25,
    "available": true
  }
}

Each service block contains:

| Field | Type | Description |
| --- | --- | --- |
| memory_bytes | number or null | Resident set size in bytes (null if unavailable) |
| cpu_seconds | number or null | Total CPU time in seconds since process start |
| available | boolean | Whether the service was reachable |

Note: Simulator metrics are queried via the Prometheus API (not the /metrics endpoint directly) to avoid timeouts on large topologies. If the simulator is not enabled, available will be false.
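A consumer of this endpoint has to cope with null values and with services that are not available. The sketch below shows one way to do that on an already decoded response; the function name and the choice to report memory in MiB are illustrative assumptions, while the field names come from the table above.

```python
def summarize_process_stats(payload: dict) -> dict:
    """Convert a decoded process-stats response into a compact per-service summary."""
    summary = {}
    for service, stats in payload.items():
        memory = stats.get("memory_bytes")  # may be None (JSON null)
        summary[service] = {
            "available": bool(stats.get("available")),
            # Report memory in MiB, keeping None when the value is unavailable.
            "memory_mib": None if memory is None else round(memory / 2**20, 1),
            "cpu_seconds": stats.get("cpu_seconds"),
        }
    return summary


sample = {
    "backend": {"memory_bytes": 108482560, "cpu_seconds": 2.53, "available": True},
    "simulator": {"memory_bytes": None, "cpu_seconds": None, "available": False},
}
print(summarize_process_stats(sample))
```

Because cpu_seconds is a running total since process start, a rate would have to be derived from the difference between two successive polls.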


Quick Start

# Liveness probe
curl http://localhost:8000/healthz

# Global infrastructure summary
curl http://localhost:8000/api/stats/global

# All rooms with rack counts
curl http://localhost:8000/api/rooms

# Room health state with per-rack breakdown
curl http://localhost:8000/api/rooms/dc1-r001/state

# Rack health only — fast (~30ms)
curl http://localhost:8000/api/racks/a01-r01/state

# Rack health + metrics — slower (~743ms, 20+ Prometheus queries)
curl "http://localhost:8000/api/racks/a01-r01/state?include_metrics=true"

# All active WARN/CRIT alerts with topology context
curl http://localhost:8000/api/alerts/active
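
For scripted use, the rack state URLs from the examples above can be built programmatically. The helper name below is hypothetical; the path and the include_metrics parameter are taken from the curl commands.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # base URL from this page


def rack_state_url(rack_id: str, include_metrics: bool = False) -> str:
    """Build the rack state URL, optionally requesting metrics as well."""
    url = f"{BASE_URL}/api/racks/{quote(rack_id, safe='')}/state"
    if include_metrics:
        # The metrics variant is the slower call noted in the Quick Start.
        url += "?" + urlencode({"include_metrics": "true"})
    return url


print(rack_state_url("a01-r01"))
print(rack_state_url("a01-r01", include_metrics=True))
```

Requesting metrics only when they are actually displayed keeps routine polling on the faster health-only call.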