app.yaml — Complete Reference

The config/app.yaml file is the central configuration file for Rackscope. Every aspect of the system — from Prometheus connectivity to plugin behavior — is controlled here. The file is loaded at startup and can be reloaded without restarting the container via the Settings UI or by calling POST /api/config/reload.
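
As a rough illustration, a reload could be triggered from a script along the lines of the sketch below. Only the POST /api/config/reload path comes from this page; the base URL http://localhost:8000 and the helper name build_reload_request are assumptions, so substitute whatever address your backend is actually exposed on.

```python
import urllib.request

# Assumption: the backend is reachable at this base URL; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_reload_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the config reload endpoint."""
    url = base_url.rstrip("/") + "/api/config/reload"
    # An empty body together with method="POST" makes this a bodyless POST.
    return urllib.request.Request(url, data=b"", method="POST")


# To actually send it against a running backend (response format not documented here):
#   with urllib.request.urlopen(build_reload_request(), timeout=10) as resp:
#       print(resp.status)
```

The request is built separately from sending so the URL and method can be checked without a running stack.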

Quick Navigation

| Section | Purpose |
| --- | --- |
| app | Application identity |
| paths | File system paths for all config |
| refresh | State polling intervals |
| cache | Prometheus query cache TTLs |
| telemetry | Prometheus connection and authentication |
| planner | Batching and caching for PromQL execution |
| features | Feature flags |
| auth | UI authentication |
| map | World map defaults |
| playlist | NOC screen rotation |
| plugins | Plugin enable/disable flags |

app

Application identity shown in the browser title and header.

app:
  name: Rackscope
  description: Datacenter Monitoring

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | Rackscope | Application name displayed in the UI header and browser tab |
| description | string | Datacenter Overview | Short subtitle shown below the name |

paths

Filesystem paths (relative to the working directory, or absolute) for the four configuration directories that Rackscope loads at startup.

paths:
  topology: config/topology
  templates: config/templates
  checks: config/checks/library
  metrics: config/metrics/library

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| topology | string | (required) | Path to topology root. Can be a directory (segmented layout) or a single topology.yaml file |
| templates | string | (required) | Path to device and rack template directories |
| checks | string | (required) | Path to health checks library directory (YAML files, one per family) |
| metrics | string | config/metrics/library | Path to metrics library directory (YAML files describing queryable metrics) |

Segmented vs monolithic topology

When topology points to a directory, Rackscope expects the segmented layout: sites.yaml at the root plus per-site/room/aisle/rack files. When it points to a single .yaml file, the entire topology is loaded from that file. The segmented layout is strongly recommended for production environments.


refresh

Controls how often Rackscope re-fetches health states from Prometheus for room and rack views. These intervals affect how quickly changes appear in the UI.

refresh:
  room_state_seconds: 60
  rack_state_seconds: 60

| Key | Type | Default | Min | Description |
| --- | --- | --- | --- | --- |
| room_state_seconds | integer | 30 | 10 | How often room-level health state is refreshed (seconds) |
| rack_state_seconds | integer | 30 | 10 | How often rack-level health state is refreshed (seconds) |

Performance impact

Lower values increase Prometheus query frequency. For large topologies (hundreds of racks), keep these at 60 seconds or higher. The TelemetryPlanner batches all queries, so the actual Prometheus load is much lower than the number of devices might suggest.


cache

Controls the time-to-live (TTL) for different categories of Prometheus query results stored in Rackscope's in-process cache. Separate TTLs allow fast health feedback while reducing load from expensive metric queries.

cache:
  ttl_seconds: 60                # Generic cache (backward compatibility)
  health_checks_ttl_seconds: 30  # Health check queries
  metrics_ttl_seconds: 120       # Detailed metrics

| Key | Type | Default | Min | Description |
| --- | --- | --- | --- | --- |
| ttl_seconds | integer | 30 | 1 | Generic cache TTL. Used for queries that do not fall into the categories below. Kept for backward compatibility |
| health_checks_ttl_seconds | integer | 30 | 1 | TTL for health check query results. Shorter = more responsive to failures |
| metrics_ttl_seconds | integer | 120 | 1 | TTL for detailed metric queries (temperature, power, PDU). Longer = fewer heavy Prometheus calls |

Choosing TTL values
  • health_checks_ttl_seconds: 30 is a good balance — failures appear within 30 s.
  • metrics_ttl_seconds: 120 is intentional: metric charts are expensive and do not need sub-minute refresh. Reduce only if users need near-realtime metric graphs.
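
To make the idea of per-category TTLs concrete, here is a minimal sketch of an in-process cache where each category expires on its own schedule. The class name CategoryTTLCache, the category keys, and the injectable clock are all hypothetical; this illustrates the concept and is not Rackscope's actual cache implementation.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs mirror the example values shown in the cache section.
TTLS = {"generic": 60, "health_checks": 30, "metrics": 120}


class CategoryTTLCache:
    """Tiny in-process cache where each category has its own TTL (hypothetical)."""

    def __init__(self, ttls: Dict[str, int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttls = ttls
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def put(self, category: str, query: str, value: Any) -> None:
        """Store a result together with the time it was cached."""
        self._store[(category, query)] = (self._clock(), value)

    def get(self, category: str, query: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get((category, query))
        if entry is None:
            return None
        stored_at, value = entry
        # Categories without their own TTL fall back to the generic one.
        ttl = self._ttls.get(category, self._ttls["generic"])
        if self._clock() - stored_at >= ttl:
            del self._store[(category, query)]
            return None
        return value
```

With these values, a health check result cached at t = 0 is gone by t = 45 s, while a metric result cached at the same moment is still served.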

telemetry

All configuration for connecting to Prometheus, including URL, authentication, TLS, and diagnostic tunables.

telemetry:
  prometheus_url: http://prometheus:9090
  identity_label: instance
  rack_label: rack_id
  chassis_label: chassis_id
  job_regex: node|rackscope-simulator
  prometheus_heartbeat_seconds: 30
  prometheus_latency_window: 20
  debug_stats: false
  basic_auth_user: null
  basic_auth_password: null
  tls_verify: false
  tls_ca_file: null
  tls_cert_file: null
  tls_key_file: null

Connection

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| prometheus_url | string or null | null | Full URL of the Prometheus HTTP API. Example: http://prometheus:9090 or https://prometheus.example.com |
| identity_label | string | instance | Prometheus label that maps to topology instance names. Rackscope uses this label to match metric series to physical devices |
| rack_label | string | rack_id | Prometheus label used to identify rack-scoped metrics |
| chassis_label | string | chassis_id | Prometheus label used to identify chassis-scoped metrics |
| job_regex | string | .* | Regular expression matched against the Prometheus job label to filter relevant scrape targets. Example: the pattern in the snippet above matches two jobs, node and rackscope-simulator |

identity_label is critical

If your node_exporter labels use node instead of instance, set identity_label: node. Mismatched labels cause all devices to show UNKNOWN state.
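
The effect of a mismatched identity_label can be sketched as follows. The series shape (a dict with "labels" and "state") and the function states_by_instance are hypothetical simplifications, not Rackscope's internal data model; the point is only that a device whose name is never found under the configured label ends up with the fallback state.

```python
from typing import Any, Dict, Iterable, List


def states_by_instance(series: Iterable[Dict[str, Any]],
                       instances: List[str],
                       identity_label: str = "instance",
                       unknown_state: str = "UNKNOWN") -> Dict[str, str]:
    """Map topology instance names to a state via the identity label of each series."""
    # Index series by the value of the identity label; series lacking it are skipped.
    seen = {
        s["labels"][identity_label]: s["state"]
        for s in series
        if identity_label in s["labels"]
    }
    # Devices with no matching series fall back to the unknown state.
    return {name: seen.get(name, unknown_state) for name in instances}


# Hypothetical series whose exporter uses "node" rather than "instance".
example_series = [{"labels": {"node": "compute001"}, "state": "OK"}]
print(states_by_instance(example_series, ["compute001"]))
print(states_by_instance(example_series, ["compute001"], identity_label="node"))
```

The first call looks under instance, finds nothing, and reports UNKNOWN; the second uses node and resolves the device.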

Diagnostic tunables

| Key | Type | Default | Min | Description |
| --- | --- | --- | --- | --- |
| prometheus_heartbeat_seconds | integer | 30 | 10 | Interval in seconds for the background Prometheus reachability probe. Shown in the connection status indicator |
| prometheus_latency_window | integer | 20 | 1 | Number of samples used to compute the rolling average query latency shown in diagnostics |
| debug_stats | boolean | false | | When true, logs per-query timing and cache hit/miss statistics to the backend log. Useful for diagnosing slow views |

Basic authentication

telemetry:
  basic_auth_user: monitoring
  basic_auth_password: s3cr3t

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| basic_auth_user | string or null | null | HTTP Basic Auth username for Prometheus |
| basic_auth_password | string or null | null | HTTP Basic Auth password. Requires basic_auth_user to also be set |
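
For reference, HTTP Basic authentication sends the username and password as a Base64-encoded Authorization header. The sketch below shows what that header looks like for the two settings above; the helper basic_auth_header is hypothetical, and how Rackscope builds the header internally is not described on this page.

```python
import base64
from typing import Dict, Optional


def basic_auth_header(user: Optional[str], password: Optional[str]) -> Dict[str, str]:
    """Return the Authorization header for HTTP Basic auth, or {} if no user is set."""
    if user is None:
        # A password without a username is not usable, so nothing is sent.
        return {}
    credentials = f"{user}:{password or ''}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Base64 is an encoding, not encryption, so Basic auth should be combined with HTTPS whenever the credentials matter.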

TLS

telemetry:
  tls_verify: true
  tls_ca_file: config/certs/ca.pem
  tls_cert_file: config/certs/client.crt
  tls_key_file: config/certs/client.key

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| tls_verify | boolean | true | Whether to verify the Prometheus server's TLS certificate. Set to false only for self-signed certificates in development |
| tls_ca_file | string or null | null | Path to a custom CA certificate bundle (PEM) for verifying the Prometheus server |
| tls_cert_file | string or null | null | Path to a client certificate (PEM) for mutual TLS authentication |
| tls_key_file | string or null | null | Path to the client private key (PEM). Requires tls_cert_file |

planner

The TelemetryPlanner batches all topology node/chassis/rack IDs into a small set of PromQL queries (using instance=~"id1|id2|...") and caches the result snapshot. This avoids per-device query explosion in large topologies.

planner:
  unknown_state: UNKNOWN
  cache_ttl_seconds: 60
  max_ids_per_query: 300

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| unknown_state | OK, WARN, CRIT, or UNKNOWN | UNKNOWN | Health state assigned to devices for which Prometheus returns no data. Set to OK to suppress noise for devices not yet instrumented |
| cache_ttl_seconds | integer | 30 | How long (seconds) a PlannerSnapshot is reused before the planner re-queries Prometheus. Lower values increase Prometheus load |
| max_ids_per_query | integer | 200 | Maximum number of IDs packed into a single PromQL regex match. Prevents URL length limits from being hit with very large clusters |

Tuning max_ids_per_query

With max_ids_per_query set to 300, as in the example above, a topology with 1 000 nodes produces 4 batched queries (1 000 / 300, rounded up) instead of 1 000 individual ones. Reduce this value if Prometheus returns URI Too Long errors (typically at 500+ IDs depending on ID length).
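
The batching arithmetic can be checked with a short sketch. The functions batch_ids and build_selector are hypothetical stand-ins for what the TelemetryPlanner does; the sketch also assumes IDs contain no regex metacharacters, since it joins them without escaping.

```python
from typing import Iterator, List, Sequence


def batch_ids(ids: Sequence[str], max_ids_per_query: int) -> Iterator[List[str]]:
    """Split IDs into consecutive chunks of at most max_ids_per_query."""
    if max_ids_per_query < 1:
        raise ValueError("max_ids_per_query must be at least 1")
    for start in range(0, len(ids), max_ids_per_query):
        yield list(ids[start:start + max_ids_per_query])


def build_selector(label: str, ids: Sequence[str]) -> str:
    """Build a PromQL regex matcher such as instance=~"id1|id2"."""
    # Assumption: IDs are plain names, so no regex escaping is applied here.
    pattern = "|".join(ids)
    return f'{label}=~"{pattern}"'


node_ids = [f"node{i:04d}" for i in range(1000)]
batches = list(batch_ids(node_ids, 300))
print(len(batches))                # 4
print([len(b) for b in batches])   # [300, 300, 300, 100]
print(build_selector("instance", ["id1", "id2"]))  # instance=~"id1|id2"
```

The same 1 000 nodes with a limit of 200 would yield 5 batches, so the limit trades query count against the length of each query string.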


features

Feature flags control which UI sections and behaviors are active. All are boolean except notifications_max_visible, which is an integer.

features:
  notifications: true
  notifications_max_visible: 10
  playlist: true
  offline: true
  worldmap: true
  dev_tools: true
  wizard: true

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| notifications | boolean | true | Show the notifications bell in the header. When false, the bell and notification panel are hidden |
| notifications_max_visible | integer | 10 | Maximum number of notifications shown in the panel at once (minimum: 1) |
| playlist | boolean | false | Enable NOC playlist mode (screen rotation). Exposes the playlist controls and /playlist route |
| offline | boolean | false | Enable offline mode indicator. When Prometheus is unreachable, shows a banner rather than erroring |
| worldmap | boolean | true | Show the World Map view (/views/worldmap). Hide this if all your sites lack geolocation data |
| dev_tools | boolean | false | Show developer tools pages (UI component showcase, internal diagnostics). Disable in production |
| wizard | boolean | true | Show the setup wizard on first launch. Set to false to disable permanently |

auth

Controls access to the Rackscope UI. Authentication is disabled by default. When enabled, users must log in with a username and bcrypt-hashed password.

auth:
  enabled: false
  username: admin
  password_hash: $2b$12$X91lqP3eT0gSs7rF9JdM0OtowhwbsugMYDDGySdrXscVmtggB4eCS
  secret_key: ''
  session_duration: 24h
  policy:
    min_length: 6
    max_length: 128
    require_digit: false
    require_symbol: false

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable authentication. When false, all UI routes are accessible without login |
| username | string | admin | Login username |
| password_hash | string | "" | bcrypt hash of the password. An empty string means authentication is not yet configured even if enabled: true |
| secret_key | string | "" | JWT signing secret. Auto-generated at startup when empty. Set explicitly for multi-instance deployments to preserve sessions across restarts |
| session_duration | 8h, 24h, or unlimited | 24h | How long a login session remains valid |

Generating a password hash

docker compose exec backend python -c \
"import bcrypt; print(bcrypt.hashpw(b'yourpassword', bcrypt.gensalt()).decode())"

Copy the output (starting with $2b$) into password_hash.

auth.policy

Password validation rules applied when changing the password via the Settings UI.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| min_length | integer | 6 | Minimum password length (1–128) |
| max_length | integer | 128 | Maximum password length (6–512) |
| require_digit | boolean | false | Require at least one digit |
| require_symbol | boolean | false | Require at least one non-alphanumeric character |
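
The four policy keys translate naturally into a small validation routine. The sketch below is one plausible reading of the rules, with hypothetical names (PasswordPolicy, policy_violations); the actual checks and error messages in Rackscope may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PasswordPolicy:
    """Mirrors the auth.policy keys, using the documented defaults."""
    min_length: int = 6
    max_length: int = 128
    require_digit: bool = False
    require_symbol: bool = False


def policy_violations(password: str, policy: PasswordPolicy) -> List[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    problems: List[str] = []
    if len(password) < policy.min_length:
        problems.append(f"must be at least {policy.min_length} characters")
    if len(password) > policy.max_length:
        problems.append(f"must be at most {policy.max_length} characters")
    if policy.require_digit and not any(c.isdigit() for c in password):
        problems.append("must contain at least one digit")
    # Assumption: "symbol" means any non-alphanumeric character, whitespace included.
    if policy.require_symbol and all(c.isalnum() for c in password):
        problems.append("must contain at least one non-alphanumeric character")
    return problems
```

Returning every violation at once, rather than stopping at the first, lets a settings form show the user all the rules they still need to satisfy.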

map

Defaults for the World Map view. These control the initial viewport when the map is loaded.

map:
  default_view: world
  default_zoom: 3
  min_zoom: 2
  max_zoom: 7
  zoom_controls: true
  center:
    lat: 20.0
    lon: 0.0

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default_view | world, continent, country, city, or null | world | Initial zoom preset applied when the map first loads |
| default_zoom | integer or null | null | Explicit zoom level (1–18). When set, overrides default_view zoom calculation |
| min_zoom | integer | 2 | Minimum zoom level the user can zoom out to (1–18) |
| max_zoom | integer | 7 | Maximum zoom level the user can zoom in to (1–18) |
| zoom_controls | boolean | true | Show the + / − zoom buttons on the map |
| center.lat | float | 20.0 | Initial map center latitude (−90 to 90) |
| center.lon | float | 0.0 | Initial map center longitude (−180 to 180) |

playlist

Configures the NOC wallboard screen-rotation mode. When enabled via features.playlist: true, the UI cycles through the listed routes automatically.

playlist:
  interval_seconds: 30
  views:
    - /views/worldmap
    - /slurm/overview

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| interval_seconds | integer | 30 | Seconds each view is displayed before advancing (minimum: 5) |
| views | list of strings | [/views/worldmap, /slurm/overview] | Ordered list of frontend routes to cycle through. Any valid Rackscope route can be used |

Example with multiple room views:

playlist:
  interval_seconds: 20
  views:
    - /views/worldmap
    - /views/room/r001
    - /views/room/r002
    - /slurm/wallboard/a01

plugins

Plugin enable/disable flags. Detailed plugin configuration lives in separate files at config/plugins/{plugin_id}/config.yml, not in app.yaml. This separation was introduced to avoid configuration duplication and improve maintainability.

plugins:
  simulator:
    enabled: true
  slurm:
    enabled: true

| Plugin | Key | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Simulator | simulator.enabled | boolean | false | Activate the SimulatorPlugin for demo/testing mode |
| Slurm | slurm.enabled | boolean | true | Activate the SlurmPlugin for HPC workload manager integration |

app.yaml must only carry enabled

Never put simulator behaviour settings in app.yaml (incident_mode, changes_per_hour, slurm_random_statuses, etc.). The simulator process merges plugin.yaml with app.yaml, and app.yaml wins on conflicts — so any incident_mode in app.yaml silently overrides the value you set in the Settings UI.

Rule: app.yaml → enabled: true/false only. Everything else → dedicated file.
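
The silent override described above is easy to reproduce with a plain dictionary merge. This is only a sketch of the precedence rule: the function merge_simulator_config and the values "light" and "heavy" are invented placeholders, and the simulator's real merge logic may be deeper or structured differently.

```python
from typing import Any, Dict


def merge_simulator_config(plugin_yaml: Dict[str, Any],
                           app_yaml_section: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge in which app.yaml values win on key conflicts."""
    # Later mappings take precedence, so app.yaml shadows plugin.yaml.
    return {**plugin_yaml, **app_yaml_section}


# Placeholder values: what the Settings UI wrote to plugin.yaml ...
from_plugin_yaml = {"incident_mode": "light", "changes_per_hour": 4}
# ... and a stray behaviour setting left in app.yaml next to the enabled flag.
from_app_yaml = {"enabled": True, "incident_mode": "heavy"}

merged = merge_simulator_config(from_plugin_yaml, from_app_yaml)
print(merged["incident_mode"])  # heavy
```

The value chosen in the Settings UI is lost without any warning, which is why app.yaml should carry nothing but the enabled flag.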

Plugin configuration files

Each plugin's detailed settings are managed in dedicated configuration files:

| Plugin | File | Managed by |
| --- | --- | --- |
| Simulator | config/plugins/simulator/config/plugin.yaml | Settings UI → Plugins → Simulator |
| Slurm | config/plugins/slurm/config.yml | Settings UI → Plugins → Slurm |

See Plugins for the full configuration reference.


Complete example

The following is the full config/app.yaml as shipped with the development stack:

# ── Application identity ──────────────────────────────────────────────────────
app:
  name: Rackscope
  description: Datacenter Monitoring

# ── Config file paths ─────────────────────────────────────────────────────────
paths:
  topology: config/topology
  templates: config/templates
  checks: config/checks/library
  metrics: config/metrics/library

# ── State refresh intervals (seconds) ─────────────────────────────────────────
refresh:
  room_state_seconds: 60
  rack_state_seconds: 60

# ── Prometheus query cache TTLs ───────────────────────────────────────────────
cache:
  ttl_seconds: 60
  health_checks_ttl_seconds: 30
  metrics_ttl_seconds: 120

# ── Prometheus connection ─────────────────────────────────────────────────────
telemetry:
  prometheus_url: http://prometheus:9090
  identity_label: instance
  rack_label: rack_id
  chassis_label: chassis_id
  job_regex: node|rackscope-simulator
  prometheus_heartbeat_seconds: 30
  prometheus_latency_window: 20
  debug_stats: false
  basic_auth_user: null
  basic_auth_password: null
  tls_verify: false
  tls_ca_file: null
  tls_cert_file: null
  tls_key_file: null

# ── Telemetry planner ─────────────────────────────────────────────────────────
planner:
  unknown_state: UNKNOWN
  cache_ttl_seconds: 60
  max_ids_per_query: 300

# ── Feature flags ─────────────────────────────────────────────────────────────
features:
  notifications: true
  notifications_max_visible: 10
  playlist: true
  offline: true
  worldmap: true
  dev_tools: true
  wizard: true  # Show setup wizard on first launch

# ── Authentication ────────────────────────────────────────────────────────────
auth:
  enabled: false
  username: admin
  password_hash: ''
  secret_key: ''
  session_duration: 24h
  policy:
    min_length: 6
    max_length: 128
    require_digit: false
    require_symbol: false

# ── World map defaults ────────────────────────────────────────────────────────
map:
  default_view: world
  default_zoom: 3
  min_zoom: 2
  max_zoom: 7
  zoom_controls: true
  center:
    lat: 20.0
    lon: 0.0

# ── Playlist (NOC wallboard rotation) ─────────────────────────────────────────
playlist:
  interval_seconds: 30
  views:
    - /views/worldmap
    - /slurm/overview

# ── Plugins ───────────────────────────────────────────────────────────────────
# Only enabled flags live here; full config in config/plugins/{id}/config.yml
plugins:
  simulator:
    enabled: true
  slurm:
    enabled: true

Reference file

A fully annotated reference file is included in the repository at config/app.yaml.reference. It documents every key with its default value, type, and description — useful as a starting point when setting up a new deployment.

# Start from the reference
cp config/app.yaml.reference config/app.yaml
# Then edit for your environment

The reference file always reflects the current schema. It is kept in sync with this page.