Plugins, Simulator & Slurm API

This page covers the Plugin Discovery endpoints, the Simulator and Slurm plugin APIs, the Config & System endpoints, and the Authentication endpoints.


Plugin Discovery

GET /api/plugins

Returns all registered plugins with their current status.

GET /api/plugins

Response

[
{"plugin_id": "simulator", "plugin_name": "Metrics Simulator", "enabled": true, "version": "1.0.0"},
{"plugin_id": "workload-slurm", "plugin_name": "Slurm Workload Manager", "enabled": true, "version": "1.0.0"}
]

| Field | Type | Description |
| --- | --- | --- |
| plugin_id | string | Unique plugin identifier |
| plugin_name | string | Human-readable plugin name |
| enabled | boolean | Whether the plugin is currently active |
| version | string | Plugin version |
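
As a minimal sketch of consuming this response, the Python below reduces the list to the IDs of enabled plugins. The helper name `enabled_plugin_ids` is hypothetical, and the sample payload is copied from the response above; fetching the JSON over HTTP is left out.

```python
import json

# Sample payload copied from the response above.
SAMPLE = """[
  {"plugin_id": "simulator", "plugin_name": "Metrics Simulator", "enabled": true, "version": "1.0.0"},
  {"plugin_id": "workload-slurm", "plugin_name": "Slurm Workload Manager", "enabled": true, "version": "1.0.0"}
]"""


def enabled_plugin_ids(plugins):
    """Return the plugin_id of every plugin whose 'enabled' flag is true."""
    return [p["plugin_id"] for p in plugins if p.get("enabled")]


plugins = json.loads(SAMPLE)
print(enabled_plugin_ids(plugins))
```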

GET /api/plugins/menu

Returns the sidebar navigation sections contributed by all active plugins. The frontend uses this endpoint to build dynamic navigation — each plugin registers its own menu sections and items.

GET /api/plugins/menu

Response

[
{
"id": "workload",
"label": "Workload",
"icon": "Zap",
"order": 50,
"items": [
{"id": "slurm-overview", "label": "Overview", "path": "/slurm/overview", "icon": "BarChart2"},
{"id": "slurm-nodes", "label": "Nodes", "path": "/slurm/nodes", "icon": "Server"},
{"id": "slurm-partitions", "label": "Partitions", "path": "/slurm/partitions", "icon": "Layers"},
{"id": "slurm-alerts", "label": "Alerts", "path": "/slurm/alerts", "icon": "AlertTriangle"}
]
},
{
"id": "simulator",
"label": "Simulator",
"icon": "FlaskConical",
"order": 200,
"items": [
{"id": "sim-control", "label": "Control", "path": "/editors/settings#simulator", "icon": "Settings"}
]
}
]

| Field | Type | Description |
| --- | --- | --- |
| id | string | Section identifier |
| label | string | Display label in the sidebar |
| icon | string | Lucide icon name |
| order | integer | Sidebar sort order (lower = higher in sidebar) |
| items | array | Navigation items within this section |
| items[].path | string | Frontend route path |
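
The ordering rule for `order` can be sketched as a plain sort. This is an illustration of how a client might arrange the sections, not the frontend's actual code; `sidebar_labels` is a hypothetical helper and the sample data is a trimmed copy of the response above.

```python
def sidebar_labels(sections):
    """Return (section label, [item labels]) pairs, lowest 'order' first."""
    ordered = sorted(sections, key=lambda s: s["order"])
    return [(s["label"], [i["label"] for i in s["items"]]) for s in ordered]


# Trimmed sample, deliberately listed out of order.
sections = [
    {"id": "simulator", "label": "Simulator", "order": 200,
     "items": [{"id": "sim-control", "label": "Control", "path": "/editors/settings#simulator"}]},
    {"id": "workload", "label": "Workload", "order": 50,
     "items": [{"id": "slurm-overview", "label": "Overview", "path": "/slurm/overview"}]},
]

for label, items in sidebar_labels(sections):
    print(label, items)
```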

GET /api/plugins/{plugin_id}

Returns details about a specific plugin.

GET /api/plugins/simulator

Response

{"plugin_id": "simulator", "plugin_name": "Metrics Simulator", "enabled": true, "version": "1.0.0"}

Returns 404 if the plugin is not registered.


GET /api/plugins/{plugin_id}/config

Returns the full configuration for a specific plugin, read from its dedicated YAML file under config/plugins/<plugin_id>/. The path field in the response reports the exact file that was read.

GET /api/plugins/simulator/config

Response

{
"config": {
"incident_mode": "light",
"changes_per_hour": 2,
"update_interval_seconds": 20
},
"source": "file",
"path": "config/plugins/simulator/config/plugin.yaml"
}

| Field | Type | Description |
| --- | --- | --- |
| config | object | Full plugin configuration dict |
| source | string | "file" if loaded from disk, "defaults" if no file found |
| path | string | Filesystem path of the config file |

POST /api/plugins/{plugin_id}/config

Updates the configuration for a specific plugin. Writes the new config to the plugin's YAML file and hot-reloads it.

POST /api/plugins/simulator/config
Content-Type: application/json

{"config": {"incident_mode": "heavy", "changes_per_hour": 4}}

Notes

  • For the Simulator plugin, incident_mode and changes_per_hour apply on the next tick (~20 s).
  • Path changes (overrides_path, metrics_catalog_path) require a POST /api/simulator/restart.
  • Returns the updated config object on success.
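
A client-side call could be assembled as in the sketch below, which builds the request with the Python standard library without sending it. `BASE_URL` and the helper name `build_plugin_config_request` are assumptions for illustration; adjust the host and port to your deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your backend address


def build_plugin_config_request(plugin_id, config):
    """Build (but do not send) a POST /api/plugins/{plugin_id}/config request."""
    body = json.dumps({"config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/plugins/{plugin_id}/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_plugin_config_request(
    "simulator", {"incident_mode": "heavy", "changes_per_hour": 4}
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body should then carry the updated config object.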

Simulator Plugin

The Simulator plugin generates realistic Prometheus metrics for testing without real hardware. It is enabled by setting plugins.simulator.enabled: true in config/app.yaml. Prometheus scrapes the simulator and the backend queries Prometheus normally, making the demo environment behaviorally identical to production.


GET /api/simulator/status

Returns the current simulator status, including the active incident mode and number of active overrides.

GET /api/simulator/status

Response

{
"running": true,
"endpoint": "http://simulator:9000",
"update_interval": 20,
"incident_mode": "light",
"changes_per_hour": 2,
"overrides_count": 2
}

| Field | Type | Description |
| --- | --- | --- |
| running | boolean | Whether the simulator process is reachable |
| endpoint | string | Simulator scrape endpoint used by Prometheus |
| update_interval | integer | Metric refresh interval in seconds |
| incident_mode | string | Active incident mode (full_ok / light / medium / heavy / chaos / custom) |
| changes_per_hour | integer | How many times per hour incidents are reshuffled |
| overrides_count | integer | Number of active metric overrides |

POST /api/simulator/restart

Sends a restart signal to the simulator container via its internal control server (port 9001). Docker then restarts the container automatically (restart: unless-stopped). Use this after changing overrides_path or metrics_catalog_path, which are not hot-reloaded.

POST /api/simulator/restart

Response

{"status": "restarting"}

Returns 503 if the simulator control server is unreachable.


POST /api/simulator/incidents

Triggers a simulated incident stored as an override. Currently supports rack_down; aisle_cooling is reserved but not yet implemented.

POST /api/simulator/incidents
Content-Type: application/json

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | "rack_down" (aisle_cooling reserved) |
| target_id | string | Yes | Rack ID to bring down |
| duration | integer | No | Duration in seconds (default: 300). 0 = permanent. |

Response

{
"status": "triggered",
"incident_type": "rack_down",
"target_id": "r01-01",
"duration": 300,
"expires_at": 1770000000
}

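The incident is stored as a rack_down override and expires after duration seconds (a duration of 0 makes it permanent). To end it early, delete the override with DELETE /api/simulator/overrides/{override_id}, or clear every active override with DELETE /api/simulator/overrides.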
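
The request body can be assembled as in the following sketch. `build_incident` is a hypothetical helper; it applies the documented default of 300 seconds and rejects any type other than rack_down, since aisle_cooling is not yet implemented.

```python
def build_incident(target_id, duration=300, incident_type="rack_down"):
    """Build the JSON body for POST /api/simulator/incidents."""
    if incident_type != "rack_down":
        # aisle_cooling is reserved but not implemented, per the text above.
        raise ValueError("only 'rack_down' is currently supported")
    if duration < 0:
        raise ValueError("duration must be 0 (permanent) or a positive number of seconds")
    return {"type": incident_type, "target_id": target_id, "duration": duration}


print(build_incident("r01-01"))
```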


GET /api/simulator/overrides

Returns all currently active metric overrides.

GET /api/simulator/overrides

Response

{
"overrides": [
{"id": "ov-001", "instance": "compute001", "rack_id": null, "metric": "up", "value": 0, "expires_at": null},
{"id": "ov-002", "instance": "compute002", "rack_id": null, "metric": "node_temperature_celsius", "value": 90, "expires_at": 1709251200}
]
}

| Field | Type | Description |
| --- | --- | --- |
| id | string | Override identifier |
| instance | string or null | Target node instance name |
| rack_id | string or null | Target rack ID (for rack-level overrides) |
| metric | string | Metric name to override |
| value | number | Override value |
| expires_at | integer or null | Unix timestamp when override expires, or null if permanent |
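
As an illustration of reading expires_at, the sketch below computes how long each override has left. `seconds_remaining` is a hypothetical helper, the sample entries are copied from the response above, and `now` is a fixed timestamp chosen for the example.

```python
def seconds_remaining(override, now):
    """Seconds until the override expires; None if it is permanent (expires_at is null)."""
    expires_at = override["expires_at"]
    if expires_at is None:
        return None
    return max(0, expires_at - now)


overrides = [
    {"id": "ov-001", "instance": "compute001", "rack_id": None,
     "metric": "up", "value": 0, "expires_at": None},
    {"id": "ov-002", "instance": "compute002", "rack_id": None,
     "metric": "node_temperature_celsius", "value": 90, "expires_at": 1709251200},
]

now = 1709251000  # fixed example timestamp, 200 s before ov-002 expires
for ov in overrides:
    print(ov["id"], seconds_remaining(ov, now))
```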

POST /api/simulator/overrides

Adds a new metric override. Use overrides to simulate failures, temperature spikes, or power anomalies without restarting the simulator.

POST /api/simulator/overrides
Content-Type: application/json

Request body examples

Force a node down permanently:

{"instance": "compute001", "metric": "up", "value": 0, "ttl_seconds": 0}

Simulate a high temperature for 5 minutes:

{"instance": "compute001", "metric": "node_temperature_celsius", "value": 90, "ttl_seconds": 300}

Override an entire rack PDU:

{"rack_id": "a01-r01", "metric": "up", "value": 0}

Request fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| instance | string | Conditional | Target node instance name. Either instance or rack_id must be provided. |
| rack_id | string | Conditional | Target rack ID. Either instance or rack_id must be provided. |
| metric | string | Yes | Metric name to override (see GET /api/simulator/metrics) |
| value | number | Yes | Value to inject |
| ttl_seconds | integer | No | Duration in seconds. 0 = permanent. Omit to use the default TTL from config. |
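
The field rules above can be checked on the client before sending, as in this sketch. `build_override` is a hypothetical helper. It only enforces that at least one of instance or rack_id is present; whether the backend accepts both together is not stated here.

```python
def build_override(metric, value, instance=None, rack_id=None, ttl_seconds=None):
    """Build the JSON body for POST /api/simulator/overrides."""
    if instance is None and rack_id is None:
        raise ValueError("either instance or rack_id must be provided")
    payload = {"metric": metric, "value": value}
    if instance is not None:
        payload["instance"] = instance
    if rack_id is not None:
        payload["rack_id"] = rack_id
    if ttl_seconds is not None:
        # Omitting ttl_seconds lets the server apply its configured default TTL.
        payload["ttl_seconds"] = ttl_seconds
    return payload


print(build_override("up", 0, instance="compute001", ttl_seconds=0))
print(build_override("up", 0, rack_id="a01-r01"))
```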

Response

Returns the updated override list:

{"overrides": []}

DELETE /api/simulator/overrides

Clears all active overrides immediately.

DELETE /api/simulator/overrides

Response

{"overrides": []}

DELETE /api/simulator/overrides/{override_id}

Deletes a specific override by its ID.

DELETE /api/simulator/overrides/{override_id}

Path parameter

| Parameter | Description |
| --- | --- |
| override_id | The override ID returned by GET /api/simulator/overrides |

Response

Returns the remaining override list:

{"overrides": []}

GET /api/simulator/metrics

Returns all metrics available for override. Each entry carries a category field that clients can use for grouping.

GET /api/simulator/metrics

Response

{
"metrics": [
{"id": "node_temperature", "name": "Node Temperature", "unit": "°C", "category": "temperature"},
{"id": "up", "name": "Node Up", "unit": "", "category": "compute"},
{"id": "pdu_active_power", "name": "PDU Active Power", "unit": "W", "category": "power"}
]
}

| Field | Type | Description |
| --- | --- | --- |
| id | string | Metric identifier used in override requests |
| name | string | Human-readable metric name |
| unit | string | Measurement unit (empty string if dimensionless) |
| category | string | Grouping category (compute, temperature, power, etc.) |

Slurm Plugin

The Slurm plugin reads node states from Prometheus via a Slurm exporter and maps them to the physical topology. It provides workload-aware views for HPC cluster operations. The plugin is only available when slurm.enabled: true is set in config/app.yaml.

Node states are mapped to health severities using slurm.status_map in the application config. For example: allocated → OK, drain → CRIT, down → CRIT.
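
The mapping can be pictured as a dictionary lookup, as in the sketch below. `STATUS_MAP` holds only the three example entries from the text. The ranking in `SEVERITY_RANK`, the choice of the worst severity when a node reports several statuses, and the fallback to UNKNOWN for unmapped statuses are all assumptions made for illustration, not the plugin's confirmed behavior.

```python
# Example entries from the text; a real slurm.status_map would list more states.
STATUS_MAP = {"allocated": "OK", "drain": "CRIT", "down": "CRIT"}

# Assumption: higher rank wins when a node reports several statuses.
SEVERITY_RANK = {"OK": 0, "UNKNOWN": 1, "WARN": 2, "CRIT": 3}


def severity_for(statuses, status_map=STATUS_MAP):
    """Map a node's Slurm statuses to a single severity (worst one wins)."""
    severities = [status_map.get(s, "UNKNOWN") for s in statuses]
    if not severities:
        return "UNKNOWN"
    return max(severities, key=lambda sev: SEVERITY_RANK[sev])


print(severity_for(["allocated"]))
print(severity_for(["allocated", "drain"]))
```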


GET /api/slurm/rooms/{room_id}/nodes

Returns Slurm node states for all nodes in a given room, keyed by instance name. Used by the Slurm Wallboard view to color-code devices by workload state.

GET /api/slurm/rooms/{room_id}/nodes

Path parameter

| Parameter | Description |
| --- | --- |
| room_id | The room identifier from the topology |

Response

{
"room_id": "dc1-r001",
"nodes": {
"compute001": {
"status": "allocated",
"severity": "OK",
"statuses": ["allocated"],
"partitions": ["compute", "all"]
},
"compute002": {
"status": "drain",
"severity": "CRIT",
"statuses": ["drain"],
"partitions": ["compute"]
}
}
}

| Field | Type | Description |
| --- | --- | --- |
| room_id | string | Room identifier |
| nodes | object | Map of instance name → node state |
| nodes[].status | string | Primary Slurm status |
| nodes[].severity | string | Mapped severity: OK, WARN, CRIT, or UNKNOWN |
| nodes[].statuses | array | All Slurm statuses reported for the node |
| nodes[].partitions | array | Partitions the node belongs to |

GET /api/slurm/summary

Returns an aggregate summary of node counts by Slurm status and health severity. Optionally scoped to a single room.

GET /api/slurm/summary?room_id=dc1-r001

Query parameter

| Parameter | Required | Description |
| --- | --- | --- |
| room_id | No | Scope the summary to a specific room. Omit for cluster-wide totals. |

Response

{
"room_id": null,
"total_nodes": 320,
"by_status": {
"allocated": 280,
"idle": 24,
"down": 8,
"drain": 6,
"mixed": 2
},
"by_severity": {
"OK": 306,
"WARN": 6,
"CRIT": 8,
"UNKNOWN": 0
}
}
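
To illustrate how such counts relate to per-node data, the sketch below tallies a node map shaped like the one returned by GET /api/slurm/rooms/{room_id}/nodes. The function name `summarize` is hypothetical, and counting each node once by its primary status is an assumption; the backend may aggregate differently.

```python
from collections import Counter


def summarize(nodes):
    """Count nodes by primary status and by severity."""
    by_status = Counter(n["status"] for n in nodes.values())
    by_severity = Counter(n["severity"] for n in nodes.values())
    return {
        "total_nodes": len(nodes),
        "by_status": dict(by_status),
        "by_severity": dict(by_severity),
    }


# Node map copied from the rooms endpoint example.
nodes = {
    "compute001": {"status": "allocated", "severity": "OK",
                   "statuses": ["allocated"], "partitions": ["compute", "all"]},
    "compute002": {"status": "drain", "severity": "CRIT",
                   "statuses": ["drain"], "partitions": ["compute"]},
}

print(summarize(nodes))
```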

GET /api/slurm/partitions

Returns per-partition node count breakdowns. Optionally scoped to a single room.

GET /api/slurm/partitions?room_id=dc1-r001

Query parameter

| Parameter | Required | Description |
| --- | --- | --- |
| room_id | No | Scope to a specific room. Omit for cluster-wide data. |

Response

{
"room_id": null,
"partitions": {
"compute": {"allocated": 200, "idle": 15, "down": 5, "drain": 4, "mixed": 2},
"visu": {"allocated": 8, "idle": 4, "down": 0, "drain": 0, "mixed": 0},
"all": {"allocated": 280, "idle": 24, "down": 8, "drain": 6, "mixed": 2}
}
}
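
A client might derive per-partition totals and an allocation ratio from this response, as sketched below. Both helper names are hypothetical, and the sample data is copied from the response above.

```python
def partition_totals(partitions):
    """Sum the per-status counts of each partition into a node total."""
    return {name: sum(counts.values()) for name, counts in partitions.items()}


def allocated_share(partitions):
    """Fraction of each partition's nodes in the 'allocated' state (0.0 if empty)."""
    totals = partition_totals(partitions)
    return {
        name: (counts.get("allocated", 0) / totals[name]) if totals[name] else 0.0
        for name, counts in partitions.items()
    }


partitions = {
    "compute": {"allocated": 200, "idle": 15, "down": 5, "drain": 4, "mixed": 2},
    "visu": {"allocated": 8, "idle": 4, "down": 0, "drain": 0, "mixed": 0},
    "all": {"allocated": 280, "idle": 24, "down": 8, "drain": 6, "mixed": 2},
}

print(partition_totals(partitions))
print(allocated_share(partitions))
```

Because a node can belong to several partitions (for example both compute and all), the per-partition totals should not be added together to obtain a cluster total.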

GET /api/slurm/nodes

Returns the full flat node list with Slurm state and topology placement context. Used by the Node List dashboard view.

GET /api/slurm/nodes?room_id=dc1-r001

Query parameter

| Parameter | Required | Description |
| --- | --- | --- |
| room_id | No | Filter nodes to a specific room. Omit for all nodes. |

Response

{
"room_id": null,
"nodes": [
{
"node": "compute001",
"status": "allocated",
"severity": "OK",
"statuses": ["allocated"],
"partitions": ["compute"],
"site_id": "dc1",
"room_id": "dc1-r001",
"aisle_id": "a01",
"rack_id": "a01-r01",
"device_id": "compute-blade-01"
}
]
}

| Field | Type | Description |
| --- | --- | --- |
| node | string | Slurm node name (matched via slurm.mapping_path if configured) |
| status | string | Primary Slurm status |
| severity | string | Mapped severity (OK, WARN, CRIT, UNKNOWN) |
| statuses | array | All reported Slurm statuses |
| partitions | array | Partitions the node belongs to |
| site_id | string or null | Topology site ID |
| room_id | string or null | Topology room ID |
| aisle_id | string or null | Topology aisle ID |
| rack_id | string or null | Topology rack ID |
| device_id | string or null | Topology device ID |
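
Since every entry carries its topology placement, the flat list can be regrouped on the client. The sketch below groups node names by rack_id; `nodes_by_rack` is a hypothetical helper, the first sample entry is trimmed from the response above, and the second entry (login01, with no placement) is invented to show the null case.

```python
from collections import defaultdict


def nodes_by_rack(nodes):
    """Group node names by rack_id; nodes without placement land under None."""
    racks = defaultdict(list)
    for n in nodes:
        racks[n["rack_id"]].append(n["node"])
    return dict(racks)


nodes = [
    {"node": "compute001", "status": "allocated", "severity": "OK", "rack_id": "a01-r01"},
    # Hypothetical node with no topology placement (rack_id is null).
    {"node": "login01", "status": "idle", "severity": "OK", "rack_id": None},
]

print(nodes_by_rack(nodes))
```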

GET /api/slurm/mapping

Returns current node name → topology instance mapping entries.

GET /api/slurm/mapping

Response

{
"mapping_path": "config/plugins/slurm/node_mapping.yaml",
"entries": [
{"node": "n*", "instance": "compute*"},
{"node": "gpu001", "instance": "gpu001"}
]
}
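
The exact wildcard semantics of the entries are not spelled out here. The sketch below shows one plausible reading, stated as an assumption: a single `*` in the node pattern captures part of the Slurm node name, the captured text replaces the `*` in the instance pattern, and the first matching entry wins. `map_node` is a hypothetical helper, not the plugin's implementation.

```python
def map_node(node, entries):
    """Translate a Slurm node name to a topology instance name, or None if no entry matches."""
    for entry in entries:
        pattern, target = entry["node"], entry["instance"]
        if "*" not in pattern:
            if node == pattern:
                return target
            continue
        prefix, suffix = pattern.split("*", 1)
        long_enough = len(node) >= len(prefix) + len(suffix)
        if long_enough and node.startswith(prefix) and node.endswith(suffix):
            captured = node[len(prefix):len(node) - len(suffix)]
            return target.replace("*", captured, 1)
    return None


entries = [
    {"node": "n*", "instance": "compute*"},
    {"node": "gpu001", "instance": "gpu001"},
]

print(map_node("n001", entries))
print(map_node("gpu001", entries))
```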

POST /api/slurm/mapping

Saves node mapping entries to the configured YAML file. Used by the node mapping editor in Settings → Plugins → Slurm.

POST /api/slurm/mapping
Content-Type: application/json

Request body

{
"entries": [
{"node": "n*", "instance": "compute*"}
]
}

Returns 400 if mapping_path is not configured.


GET /api/slurm/metrics/catalog

Returns all loaded Slurm metric definitions and the list of available catalog files.

GET /api/slurm/metrics/catalog

Response

{
"metrics": [...],
"loaded_files": [{"id": "slurm", "path": "...", "enabled": true}],
"available_files": [{"name": "metrics_slurm.yaml", "path": "..."}]
}

POST /api/slurm/metrics/catalog/config

Updates which Slurm metric catalog files are active (persisted to plugin config).

POST /api/slurm/metrics/catalog/config
Content-Type: application/json

Request body

{
"metrics_catalogs": [
{"id": "slurm", "path": "config/plugins/slurm/metrics/metrics_slurm.yaml", "enabled": true}
]
}

GET /api/slurm/metrics/data

Queries Prometheus for a specific Slurm metric from the loaded catalog.

GET /api/slurm/metrics/data?metric_id=slurm_node_status&scope=all

| Parameter | Type | Description |
| --- | --- | --- |
| metric_id | string | Metric ID from the Slurm metrics catalog |
| scope | string | Optional scope filter |

Config & System


GET /api/config

Returns the full application configuration as a JSON object. This reflects the contents of config/app.yaml at the time of the last reload.

GET /api/config

Response

The full AppConfig object. See Configuration Reference for the complete schema.


PUT /api/config

Updates the application configuration and persists the changes to config/app.yaml. Triggers a config reload and syncs dependent plugin configurations (simulator incident mode, Slurm settings, etc.).

PUT /api/config
Content-Type: application/json

Request body

The full AppConfig object. Sensitive fields such as password_hash and secret_key are preserved from the current config if they are not included in the request body.

Notes

  • Prometheus URL and credential changes take effect on the next query.
  • Simulator incident_mode and changes_per_hour are hot-reloaded on the next tick (~20 s). Path changes (overrides_path, metrics_catalog_path) require a POST /api/simulator/restart.
  • Slurm label and status map changes apply to the next state fetch.
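
The handling of sensitive fields described above can be sketched as a merge that restores missing secrets from the current config. This is an illustration, not the backend's code: `preserve_sensitive` is a hypothetical helper, `SENSITIVE_KEYS` contains only the two fields named in the text, and the nesting of those fields under an `auth` section in the sample is an assumption.

```python
import copy

SENSITIVE_KEYS = {"password_hash", "secret_key"}  # named in the text; the real list may differ


def preserve_sensitive(current, incoming):
    """Return a copy of `incoming` with absent sensitive keys restored from `current`."""
    merged = copy.deepcopy(incoming)
    for key, value in current.items():
        if key in SENSITIVE_KEYS and key not in merged:
            merged[key] = value
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse into sections present on both sides.
            merged[key] = preserve_sensitive(value, merged[key])
    return merged


current = {"auth": {"enabled": True, "username": "admin", "password_hash": "STORED-HASH"}}
incoming = {"auth": {"enabled": False, "username": "admin"}}

print(preserve_sensitive(current, incoming))
```

In this sketch a sensitive value sent explicitly in the request body is kept as sent, and a section omitted entirely from the request is not recreated.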

GET /api/env

Returns the environment variables that affect Rackscope's behavior. Useful for debugging deployment configuration.

GET /api/env

Response

{
"RACKSCOPE_APP_CONFIG": "config/app.yaml",
"PROMETHEUS_URL": "http://prometheus:9090",
"RACKSCOPE_CONFIG_DIR": null
}

| Variable | Description |
| --- | --- |
| RACKSCOPE_APP_CONFIG | Path to the main application config file |
| PROMETHEUS_URL | Prometheus base URL (from config or environment) |
| RACKSCOPE_CONFIG_DIR | Base config directory override (null if not set) |

GET /api/system/status

Returns the current status of the backend process.

GET /api/system/status

Response

{"status": "running", "pid": 12345}

| Field | Type | Description |
| --- | --- | --- |
| status | string | Always "running" when the backend responds |
| pid | integer | Backend process ID |

POST /api/system/restart

Triggers a backend server restart. Only available when the backend is running in development mode with uvicorn --reload.

POST /api/system/restart

Response

{"status": "ok", "message": "Backend restart initiated"}

Warning: This endpoint is intended for development use only. It requires the backend to be started with uvicorn --reload. It has no effect in production deployments.


Authentication

These endpoints manage user credentials. They are available whether or not auth.enabled is true in app.yaml — credential updates always go through these endpoints.


POST /api/auth/change-password

Updates the user's password. Verifies the current password before applying the change, then writes the new bcrypt hash to config/app.yaml and hot-reloads the config.

POST /api/auth/change-password
Content-Type: application/json

{
"current_password": "old-password",
"new_password": "new-secure-password"
}

Request body

| Field | Type | Description |
| --- | --- | --- |
| current_password | string | The user's current password (verified against the stored bcrypt hash) |
| new_password | string | The new password (validated against auth.policy) |

Response

{"ok": true}

Errors

| Status | Reason |
| --- | --- |
| 401 | current_password does not match the stored hash |
| 422 | new_password fails policy validation (too short, missing digit, etc.) |

Password policy rules are configured under auth.policy in app.yaml:

auth:
  policy:
    min_length: 6
    max_length: 128
    require_digit: false
    require_symbol: false
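
A client could pre-check a candidate password against these rules before calling the endpoint, as in the sketch below. `policy_violations` is a hypothetical helper. Treating any character in Python's `string.punctuation` as a symbol is an assumption; the backend's definition of a symbol may differ.

```python
import string


def policy_violations(password, policy):
    """Return a list of human-readable policy violations (empty if the password passes)."""
    problems = []
    if len(password) < policy.get("min_length", 0):
        problems.append("too short")
    max_length = policy.get("max_length")
    if max_length is not None and len(password) > max_length:
        problems.append("too long")
    if policy.get("require_digit") and not any(c.isdigit() for c in password):
        problems.append("missing digit")
    # Assumption: a "symbol" is any ASCII punctuation character.
    if policy.get("require_symbol") and not any(c in string.punctuation for c in password):
        problems.append("missing symbol")
    return problems


policy = {"min_length": 6, "max_length": 128, "require_digit": False, "require_symbol": False}

print(policy_violations("abc", policy))
print(policy_violations("new-secure-password", policy))
```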

POST /api/auth/change-username

Updates the username. Requires the current password for verification. Writes the new username to config/app.yaml and hot-reloads the config.

POST /api/auth/change-username
Content-Type: application/json

{
"new_username": "newname",
"password": "current-password"
}

Request body

| Field | Type | Description |
| --- | --- | --- |
| new_username | string | The new username (must not be empty or whitespace-only) |
| password | string | The current password (for verification) |

Response

{"ok": true, "username": "newname"}

Errors

| Status | Reason |
| --- | --- |
| 401 | password does not match the stored hash |
| 422 | new_username is empty or whitespace-only |