Metrics API

The Metrics API exposes the metrics library and live time-series data sourced from Prometheus. It is the primary interface used by the frontend for charts, metric discovery, and simulator configuration.

Architecture note

Rackscope does not collect or store metrics. All data is queried live from Prometheus at request time. The metrics library is a YAML-driven catalog that describes how to query and display each metric — not the data itself.


Endpoints Overview

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/metrics/library | List all metrics (with optional filtering) |
| GET | /api/metrics/library/{metric_id} | Get a single metric definition |
| GET | /api/metrics/library/files | List all YAML files in the metrics library directory |
| GET | /api/metrics/library/files/{name} | Read raw YAML content of a specific metric file |
| PUT | /api/metrics/library/files/{name} | Create or update a metric YAML file |
| DELETE | /api/metrics/library/files/{name} | Delete a metric YAML file |
| GET | /api/metrics/data | Query live time-series data from Prometheus |
| GET | /api/metrics/categories | List all unique categories |
| GET | /api/metrics/tags | List all unique tags |
| GET | /api/metrics/files | List YAML files in the metrics library directory (legacy) |

GET /api/metrics/library

List all metric definitions in the library. Supports filtering by category, by tag, or by both at once.

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| category | string | No | Filter by category. One of: temperature, power, compute, storage, network, infrastructure |
| tag | string | No | Filter by tag (e.g. compute, hardware, ipmi, eseries, pdu, slurm) |

Response

{
  "count": 38,
  "metrics": [
    {
      "id": "node_temperature",
      "name": "Node Temperature",
      "description": "CPU/IPMI temperature sensor",
      "metric": "node_hwmon_temp_celsius",
      "category": "temperature",
      "tags": ["compute", "hardware"],
      "display": {
        "unit": "°C",
        "chart_type": "line",
        "color": "#ef4444",
        "time_ranges": ["1h", "6h", "24h", "7d"],
        "default_range": "24h",
        "aggregation": "avg",
        "thresholds": { "warn": 70, "crit": 85 }
      }
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| count | integer | Total number of metrics returned |
| metrics | array | List of metric definition objects |
| metrics[].id | string | Unique metric identifier |
| metrics[].name | string | Human-readable display name |
| metrics[].description | string | Short description of what the metric measures |
| metrics[].metric | string | Prometheus metric name used in PromQL queries |
| metrics[].category | string | Category for grouping and filtering |
| metrics[].tags | array of strings | Tags for secondary filtering |
| metrics[].display | object | Display configuration (see below) |
| metrics[].display.unit | string | Unit string shown in the UI (e.g. °C, W, A) |
| metrics[].display.chart_type | string | Chart type: line, bar, or gauge |
| metrics[].display.color | string | Hex color for the chart series |
| metrics[].display.time_ranges | array of strings | Allowed time range options |
| metrics[].display.default_range | string | Default time range selected in UI |
| metrics[].display.aggregation | string | Default aggregation function |
| metrics[].display.thresholds | object | Optional warn/crit thresholds for visual indicators |

Examples

# All metrics
curl "http://localhost:8000/api/metrics/library"

# Temperature metrics only
curl "http://localhost:8000/api/metrics/library?category=temperature"

# All metrics tagged with 'ipmi'
curl "http://localhost:8000/api/metrics/library?tag=ipmi"

# Storage metrics tagged with 'eseries'
curl "http://localhost:8000/api/metrics/library?category=storage&tag=eseries"
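
For scripted clients, the same filtered URLs can be built programmatically. The following is a minimal Python sketch, not part of Rackscope itself: the helper name library_url is hypothetical, and the base URL is assumed to match the curl examples above.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the curl examples above.
BASE_URL = "http://localhost:8000"


def library_url(category=None, tag=None, base_url=BASE_URL):
    """Build the /api/metrics/library URL, adding only the filters that are set."""
    params = {}
    if category is not None:
        params["category"] = category
    if tag is not None:
        params["tag"] = tag
    query = urlencode(params)
    url = f"{base_url}/api/metrics/library"
    return f"{url}?{query}" if query else url


print(library_url())
print(library_url(category="storage", tag="eseries"))
```

Using urlencode rather than string concatenation keeps category and tag values correctly escaped if they ever contain reserved characters.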

GET /api/metrics/library/{metric_id}

Retrieve a single metric definition by its unique identifier.

Path Parameters

| Parameter | Description |
| --- | --- |
| metric_id | Metric identifier (e.g. node_temperature, pdu_active_power) |

Response

Returns a single metric object with the same structure as entries in the library list response.

{
  "id": "pdu_active_power",
  "name": "PDU Active Power",
  "description": "Active power draw reported by the PDU",
  "metric": "pdu_active_power_watts",
  "category": "power",
  "tags": ["pdu", "infrastructure"],
  "display": {
    "unit": "W",
    "chart_type": "line",
    "color": "#f59e0b",
    "time_ranges": ["1h", "6h", "24h", "7d"],
    "default_range": "6h",
    "aggregation": "avg",
    "thresholds": { "warn": 4000, "crit": 5000 }
  }
}

Error Responses

| Status | Condition |
| --- | --- |
| 404 Not Found | No metric with the given metric_id exists in the library |
| 503 Service Unavailable | Metrics library has not been loaded (check backend startup logs) |

Example

curl "http://localhost:8000/api/metrics/library/node_temperature"

Metrics Library Files (CRUD)

These endpoints allow you to list, read, create, update, and delete metric YAML files directly via the API. The metrics library is automatically reloaded after every write or delete operation.

Visual editor

The Metrics Library Editor at /editors/metrics provides a visual UI for managing metric definitions without editing raw YAML. Use these API endpoints for automation, importers, or CI/CD workflows.


GET /api/metrics/library/files

List all YAML files present in the metrics library directory.

Response

{
  "files": [
    {
      "name": "node_temperature.yaml",
      "path": "/app/config/metrics/library/node_temperature.yaml"
    },
    {
      "name": "pdu_active_power.yaml",
      "path": "/app/config/metrics/library/pdu_active_power.yaml"
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| files | array | List of file entries found in the library directory |
| files[].name | string | Filename (basename, e.g. node_temperature.yaml) |
| files[].path | string | Absolute path inside the container |

Example

curl "http://localhost:8000/api/metrics/library/files"

GET /api/metrics/library/files/{name}

Read the raw YAML content of a specific metric file. The {name} parameter is the filename including the .yaml extension.

Path Parameters

| Parameter | Description |
| --- | --- |
| name | Filename to read (e.g. node_temperature.yaml) |

Response

{
  "name": "node_temperature.yaml",
  "content": "id: node_temperature\nname: Node Temperature\n..."
}

Error Responses

| Status | Condition |
| --- | --- |
| 404 Not Found | No file with the given name exists in the library directory |
| 503 Service Unavailable | Metrics library configuration has not been loaded |

Example

curl "http://localhost:8000/api/metrics/library/files/node_temperature.yaml"

PUT /api/metrics/library/files/{name}

Create or update a metric YAML file. The metrics library is automatically reloaded after the file is saved.

Path Parameters

| Parameter | Description |
| --- | --- |
| name | Filename to create or overwrite (e.g. my_metric.yaml) |

Request Body

{
  "content": "id: my_metric\nname: My Metric\ndescription: A custom metric\nmetric: my_prometheus_metric\ndisplay:\n unit: W\n chart_type: line\n aggregation: avg\ncategory: power\ntags: [infrastructure]"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| content | string | Yes | Raw YAML content to write to the file |

Response

{ "status": "ok", "name": "my_metric.yaml" }

Error Responses

| Status | Condition |
| --- | --- |
| 400 Bad Request | The provided YAML content is invalid and could not be parsed |
| 503 Service Unavailable | Metrics library configuration has not been loaded |

Example

curl -X PUT "http://localhost:8000/api/metrics/library/files/my_metric.yaml" \
-H "Content-Type: application/json" \
-d '{
"content": "id: my_metric\nname: My Metric\nmetric: my_prometheus_metric\ncategory: power\ndisplay:\n unit: W\n chart_type: line\n aggregation: avg"
}'
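
Hand-escaping newlines inside a JSON string is error-prone in automation scripts. The sketch below shows one way to let a JSON encoder do the escaping, using only the Python standard library. The helper name build_put_request is hypothetical and the base URL is assumed from the curl example above. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed, as in the curl examples


def build_put_request(name, yaml_text, base_url=BASE_URL):
    """Return a urllib Request that PUTs raw YAML as the JSON "content" field."""
    # json.dumps escapes newlines and quotes in the YAML text for us.
    body = json.dumps({"content": yaml_text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/metrics/library/files/{name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


yaml_text = "\n".join([
    "id: my_metric",
    "name: My Metric",
    "metric: my_prometheus_metric",
    "category: power",
    "display:",
    "  unit: W",
    "  chart_type: line",
    "  aggregation: avg",
])
request = build_put_request("my_metric.yaml", yaml_text)
# To send it against a running backend: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the backend.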

DELETE /api/metrics/library/files/{name}

Delete a metric YAML file and reload the metrics library. This operation is irreversible.

Path Parameters

| Parameter | Description |
| --- | --- |
| name | Filename to delete (e.g. my_metric.yaml) |

Response

{ "status": "deleted", "name": "my_metric.yaml" }

Error Responses

| Status | Condition |
| --- | --- |
| 404 Not Found | No file with the given name exists in the library directory |
| 503 Service Unavailable | Metrics library configuration has not been loaded |

Example

curl -X DELETE "http://localhost:8000/api/metrics/library/files/my_metric.yaml"

GET /api/metrics/data

Query live time-series metric data from Prometheus. This is the primary endpoint consumed by frontend charts.

Performance

Each call to this endpoint issues one or more range queries against Prometheus. Avoid calling it in tight loops or for many instances at once. Batch requests where possible, and use the Telemetry Planner for health state queries instead.
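
When a client does need data for several targets, capping the number of in-flight requests is one way to avoid hitting Prometheus with everything at once. The following is a minimal sketch of that idea, not a Rackscope feature: fetch_many, fetch_one, and the max_workers value are all hypothetical, and a stand-in function replaces the real HTTP call so the example runs without a backend.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(target_ids, fetch_one, max_workers=4):
    """Call fetch_one(target_id) for each target, at most max_workers at a time.

    fetch_one is a caller-supplied function (hypothetical here) that performs
    one GET /api/metrics/data request and returns the decoded response.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of target_ids in its results.
        results = list(pool.map(fetch_one, target_ids))
    return dict(zip(target_ids, results))


# Stand-in for a real HTTP call so the sketch runs without a backend.
def fake_fetch(target_id):
    return {"target_id": target_id, "series": []}


results = fetch_many(["compute001", "compute002", "compute003"], fake_fetch, max_workers=2)
```

This only limits client-side concurrency. It does not reduce the total number of range queries, so it complements rather than replaces the advice above.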

Query Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| metric_id | string | Yes |  | Metric ID from the library (e.g. node_temperature) |
| target_id | string | Yes |  | Instance or node name to query (e.g. compute001, a01-r01) |
| time_range | string | No | 24h | Time window to query. One of: 1h, 6h, 24h, 7d, 30d |
| aggregation | string | No | metric default | Override the aggregation function. One of: avg, max, min, sum, p95, p99 |
| step | string | No | 1m | Prometheus query resolution step. One of: 1m, 5m, 15m, 1h |

Response

{
  "metric_id": "node_temperature",
  "target_id": "compute001",
  "time_range": "24h",
  "step": "1m",
  "unit": "°C",
  "aggregation": "avg",
  "query": "avg_over_time(node_hwmon_temp_celsius{instance=\"compute001\"}[5m])",
  "series": [
    {
      "metric": { "instance": "compute001", "job": "node_exporter" },
      "values": [
        [1706745600, "42.5"],
        [1706745660, "43.1"],
        [1706745720, "42.8"]
      ]
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| metric_id | string | The metric ID that was queried |
| target_id | string | The target instance that was queried |
| time_range | string | The time range used |
| step | string | The resolution step used |
| unit | string | Display unit (from the metric definition) |
| aggregation | string | Aggregation function applied |
| query | string | The exact PromQL query that was sent to Prometheus |
| series | array | List of time series returned by Prometheus |
| series[].metric | object | Label set identifying this series |
| series[].values | array | List of [unix_timestamp, string_value] pairs |

Values format

The values array uses the standard Prometheus range query format: each entry is a two-element array [unix_timestamp, string_value]. The timestamp is a Unix epoch integer; the value is a string representation of the float (e.g. "42.5"). Parse the value with parseFloat() on the frontend.
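
Non-browser clients need the same conversion. As a minimal sketch in Python, the hypothetical helper parse_values below turns the pairs into timezone-aware datetimes and floats. Prometheus can encode non-finite samples as strings such as "NaN". Whether this endpoint passes them through is an assumption, and skipping them is a choice made for this sketch only.

```python
import math
from datetime import datetime, timezone


def parse_values(values):
    """Convert [unix_timestamp, string_value] pairs into (datetime, float) tuples.

    Non-finite samples such as "NaN" are skipped; whether to drop or keep
    them is a choice made for this sketch.
    """
    points = []
    for timestamp, raw in values:
        value = float(raw)  # float() accepts "42.5" as well as "NaN" and "+Inf"
        if math.isfinite(value):
            points.append((datetime.fromtimestamp(timestamp, tz=timezone.utc), value))
    return points


sample = [[1706745600, "42.5"], [1706745660, "43.1"], [1706745720, "NaN"]]
points = parse_values(sample)
```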

Error Responses

| Status | Condition |
| --- | --- |
| 400 Bad Request | metric_id not found in library, or target_id is missing/invalid |
| 500 Internal Server Error | Prometheus query failed (check Prometheus connectivity) |
| 503 Service Unavailable | Metrics library has not been loaded |

Examples

# Temperature for compute001 over the last 24 hours
curl "http://localhost:8000/api/metrics/data?metric_id=node_temperature&target_id=compute001&time_range=24h"

# PDU active power for rack a01-r01 over the last 6 hours, at 5-minute resolution
curl "http://localhost:8000/api/metrics/data?metric_id=pdu_active_power&target_id=a01-r01&time_range=6h&step=5m"

# Peak CPU load for compute042 over the last hour
curl "http://localhost:8000/api/metrics/data?metric_id=node_cpu_load&target_id=compute042&time_range=1h&aggregation=max"
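
A client can catch invalid parameter values before sending a request by checking them against the documented options. The sketch below is one way to do that in Python. The helper name data_url is hypothetical, the base URL is assumed from the curl examples, and the allowed values are copied from the query parameter table above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed, as in the curl examples

# Allowed values as listed in the query parameter table.
TIME_RANGES = {"1h", "6h", "24h", "7d", "30d"}
AGGREGATIONS = {"avg", "max", "min", "sum", "p95", "p99"}
STEPS = {"1m", "5m", "15m", "1h"}


def data_url(metric_id, target_id, time_range=None, aggregation=None,
             step=None, base_url=BASE_URL):
    """Build a /api/metrics/data URL, rejecting undocumented option values."""
    if not metric_id or not target_id:
        raise ValueError("metric_id and target_id are required")
    params = {"metric_id": metric_id, "target_id": target_id}
    for key, value, allowed in (
        ("time_range", time_range, TIME_RANGES),
        ("aggregation", aggregation, AGGREGATIONS),
        ("step", step, STEPS),
    ):
        if value is None:
            continue  # omit the parameter so the server applies its default
        if value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {value!r}")
        params[key] = value
    return f"{base_url}/api/metrics/data?{urlencode(params)}"


print(data_url("pdu_active_power", "a01-r01", time_range="6h", step="5m"))
```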

GET /api/metrics/categories

List all unique metric categories present in the loaded library.

Response

{
  "categories": [
    "compute",
    "infrastructure",
    "network",
    "power",
    "storage",
    "temperature"
  ]
}

Categories are returned in alphabetical order.

Example

curl "http://localhost:8000/api/metrics/categories"

GET /api/metrics/tags

List all unique tags across all metrics in the loaded library.

Response

{
  "tags": [
    "compute",
    "eseries",
    "hardware",
    "hpc",
    "infrastructure",
    "ipmi",
    "network",
    "pdu",
    "sequana3",
    "slurm",
    "storage"
  ]
}

Tags are returned in alphabetical order.

Example

curl "http://localhost:8000/api/metrics/tags"

GET /api/metrics/files

List YAML files present in the metrics library directory. Used internally by the simulator and the Settings UI to enumerate available configuration files.

Response

{
  "files": [
    {
      "name": "node_temperature.yaml",
      "path": "config/metrics/library/node_temperature.yaml"
    },
    {
      "name": "pdu_active_power.yaml",
      "path": "config/metrics/library/pdu_active_power.yaml"
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| files | array | List of file entries found in the library directory |
| files[].name | string | Filename (basename only) |
| files[].path | string | Relative path from the project root |

Example

curl "http://localhost:8000/api/metrics/files"

Metric Definition Format

Metrics are defined in YAML files in config/metrics/library/. A single file may contain one or more metric definitions, listed under a top-level metrics key as in the example below.

metrics:
  - id: node_temperature
    name: "Node Temperature"
    description: "CPU/IPMI temperature sensor reported by node_exporter hwmon"
    metric: node_hwmon_temp_celsius
    category: temperature
    tags:
      - compute
      - hardware
      - ipmi
    display:
      unit: "°C"
      chart_type: line
      color: "#ef4444"
      time_ranges: ["1h", "6h", "24h", "7d"]
      default_range: "24h"
      aggregation: avg
      thresholds:
        warn: 70
        crit: 85

Field Reference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Unique identifier used in API calls and template references |
| name | string | Yes | Human-readable label shown in the UI |
| description | string | No | Short explanation of what the metric measures |
| metric | string | Yes | Prometheus metric name (used to construct PromQL queries) |
| category | string | Yes | Category for filtering and UI grouping |
| tags | list of strings | No | Secondary labels for finer filtering |
| display.unit | string | No | Unit string appended to values in the UI |
| display.chart_type | string | No | line (default), bar, or gauge |
| display.color | string | No | Hex color for the chart series |
| display.time_ranges | list of strings | No | Time ranges offered in the UI dropdown |
| display.default_range | string | No | Default time range pre-selected in the UI |
| display.aggregation | string | No | Default aggregation: avg, max, min, sum, p95, p99 |
| display.thresholds.warn | number | No | Value above which the UI shows a warning indicator |
| display.thresholds.crit | number | No | Value above which the UI shows a critical indicator |

Adding a new metric
  1. Create or edit a YAML file in config/metrics/library/.
  2. Add your metric definition following the format above.
  3. Restart the backend: make restart (or docker compose restart backend).
  4. Verify it appears at GET /api/metrics/library.

No code changes required — the library is fully configuration-driven.
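
Before restarting the backend, a definition can be checked against the field reference above. The sketch below is a hypothetical pre-flight check, not the backend's own validation. It takes a definition as a plain dict, such as a YAML parser would produce, and the warn/crit ordering rule is an assumption based on both thresholds being "value above which" limits.

```python
# Required fields and allowed values, copied from the field reference.
REQUIRED_FIELDS = ("id", "name", "metric", "category")
CHART_TYPES = {"line", "bar", "gauge"}
AGGREGATIONS = {"avg", "max", "min", "sum", "p95", "p99"}


def validate_metric(definition):
    """Return a list of problems found in one metric definition (empty if none)."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not definition.get(field)
    ]
    display = definition.get("display") or {}
    chart_type = display.get("chart_type", "line")  # line is the documented default
    if chart_type not in CHART_TYPES:
        problems.append(f"unknown chart_type: {chart_type}")
    aggregation = display.get("aggregation")
    if aggregation is not None and aggregation not in AGGREGATIONS:
        problems.append(f"unknown aggregation: {aggregation}")
    thresholds = display.get("thresholds") or {}
    warn, crit = thresholds.get("warn"), thresholds.get("crit")
    # Assumption for this sketch: warn should not exceed crit.
    if warn is not None and crit is not None and warn > crit:
        problems.append("thresholds.warn is greater than thresholds.crit")
    return problems


good = {
    "id": "node_temperature",
    "name": "Node Temperature",
    "metric": "node_hwmon_temp_celsius",
    "category": "temperature",
    "display": {"chart_type": "line", "aggregation": "avg",
                "thresholds": {"warn": 70, "crit": 85}},
}
bad = {"id": "broken", "display": {"chart_type": "pie"}}

print(validate_metric(good))
print(validate_metric(bad))
```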


Available Categories

| Category | Description | Typical metrics |
| --- | --- | --- |
| temperature | Thermal sensors from IPMI, hwmon, or chassis management | CPU temperature, chassis inlet/outlet, drive temperature |
| power | Power consumption at device, rack, or PDU level | Node power draw, PDU active power, rack total wattage |
| compute | CPU and memory utilization metrics | CPU load, memory used/free, system load average |
| storage | Disk and array health or throughput metrics | Drive status, controller IOPS, array rebuild progress |
| network | Interface state and throughput metrics | Port link state, port speed, RX/TX bytes |
| infrastructure | Rack-level shared components: PDUs, cooling, humidity | PDU current, PDU voltage, fan speed, humidity |

Workflow Example

The following example walks through a typical frontend chart rendering flow.

Step 1 — Discover temperature metrics

GET /api/metrics/library?category=temperature
{
  "count": 3,
  "metrics": [
    { "id": "node_temperature", "name": "Node Temperature", ... },
    { "id": "chassis_inlet_temp", "name": "Chassis Inlet Temperature", ... },
    { "id": "drive_temperature", "name": "Drive Temperature", ... }
  ]
}

Step 2 — Query a node's temperature over 24 hours

GET /api/metrics/data?metric_id=node_temperature&target_id=compute001&time_range=24h
{
  "metric_id": "node_temperature",
  "target_id": "compute001",
  "time_range": "24h",
  "step": "1m",
  "unit": "°C",
  "aggregation": "avg",
  "query": "avg_over_time(node_hwmon_temp_celsius{instance=\"compute001\"}[5m])",
  "series": [
    {
      "metric": { "instance": "compute001", "job": "node_exporter" },
      "values": [
        [1706745600, "42.5"],
        [1706745660, "43.1"],
        [1706745720, "42.8"]
      ]
    }
  ]
}

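To plot the result with Chart.js, map each entry of series[0].values to a data point: values[i][0] is the x-axis timestamp and parseFloat(values[i][1]) is the y-axis value. If the x-axis is a Chart.js time scale, convert the timestamp from seconds to milliseconds first.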

Step 3 — Plot PDU power over 6 hours for a rack

GET /api/metrics/data?metric_id=pdu_active_power&target_id=a01-r01&time_range=6h&step=5m

Use step=5m for PDU and rack-level metrics to reduce the number of data points returned and keep chart rendering fast. The target_id here is the rack or PDU instance name as defined in the topology.

Tip — choosing step

Smaller steps (1m) give higher resolution but return more points. For time ranges of 24h or longer, prefer step=5m or step=15m to keep response sizes reasonable and Prometheus load low.
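
The trade-off can be made concrete by estimating how many points a query returns, roughly the time range divided by the step. The sketch below is illustrative only: pick_step and the 500-point budget are hypothetical, and the durations and steps are the documented options converted to seconds.

```python
# Documented time ranges and steps, in seconds.
DURATIONS = {"1h": 3600, "6h": 21600, "24h": 86400, "7d": 604800, "30d": 2592000}
STEPS = {"1m": 60, "5m": 300, "15m": 900, "1h": 3600}


def estimated_points(time_range, step):
    """Approximate number of samples per series for a range query."""
    return DURATIONS[time_range] // STEPS[step]


def pick_step(time_range, max_points=500):
    """Return the finest documented step that stays within max_points."""
    for step in sorted(STEPS, key=STEPS.get):
        if estimated_points(time_range, step) <= max_points:
            return step
    return "1h"  # coarsest documented step, even if it exceeds the budget


for time_range in DURATIONS:
    step = pick_step(time_range)
    print(time_range, step, estimated_points(time_range, step))
```

With this budget, a 24h query at 1m would return about 1440 points per series, so the helper falls back to 5m (about 288 points), in line with the guidance above.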