
Configuration

All Rackscope configuration lives in config/ as YAML files. The main entry point is config/app.yaml.

Reference file

config/app.yaml.reference contains every available key with its default value and a description. Copy it to config/app.yaml as a starting point for your own deployment.
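If you script your setup, the copy step can be automated. A minimal sketch (the `bootstrap_config` helper is illustrative, not part of Rackscope):

```python
import shutil
from pathlib import Path

def bootstrap_config(config_dir: str = "config") -> Path:
    """Copy app.yaml.reference to app.yaml unless app.yaml already exists.

    Illustrative helper; adjust the directory to your deployment layout.
    """
    ref = Path(config_dir) / "app.yaml.reference"
    dst = Path(config_dir) / "app.yaml"
    if not dst.exists():
        shutil.copyfile(ref, dst)  # never clobber an existing app.yaml
    return dst
```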


app.yaml overview

app:
  name: Rackscope
  description: My Datacenter

# Paths to your topology, templates, checks, and metrics
paths:
  topology: config/examples/hpc-cluster/topology
  templates: config/examples/hpc-cluster/templates
  checks: config/examples/hpc-cluster/checks/library
  metrics: config/examples/hpc-cluster/metrics/library

# Prometheus connection
telemetry:
  prometheus_url: http://prometheus:9090
  identity_label: instance  # Prometheus label mapping to a node
  rack_label: rack_id
  chassis_label: chassis_id
  job_regex: node  # Jobs included in health check queries

# Cache and performance
cache:
  health_checks_ttl_seconds: 60
  metrics_ttl_seconds: 120
planner:
  cache_ttl_seconds: 60
  max_ids_per_query: 300  # Tune upward for large clusters (>1000 nodes)

# Feature flags
features:
  notifications: true
  playlist: true
  worldmap: true
  wizard: true

# Authentication (disabled by default)
auth:
  enabled: false

# Plugins
plugins:
  simulator:
    enabled: true  # Set to false on real infrastructure
  slurm:
    enabled: false  # Enable if you have a Slurm workload manager

Full reference: every key with its default and description is in config/app.yaml.reference.


Setting up for real infrastructure

When connecting Rackscope to a real Prometheus and real hardware, replace the paths section to point to your own config directories:

paths:
  topology: config/topology            # your topology YAML files
  templates: config/templates          # your hardware templates
  checks: config/checks/library        # your health check definitions
  metrics: config/metrics/library      # your metric definitions

Then disable the simulator and configure your Prometheus:

telemetry:
  prometheus_url: http://your-prometheus:9090
  job_regex: node|ipmi  # match your actual Prometheus job names

plugins:
  simulator:
    enabled: false
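Before starting Rackscope against real hardware, it can help to confirm that prometheus_url is reachable. The sketch below issues an instant query for `up` via Prometheus's standard HTTP API; it is a hedged illustration for sanity checking, not a Rackscope tool:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def query_ok(payload: dict) -> bool:
    """True when a Prometheus API response reports success."""
    return payload.get("status") == "success"

def check_prometheus(prometheus_url: str) -> bool:
    """Issue an instant query for `up` and report whether Prometheus answered."""
    url = f"{prometheus_url}/api/v1/query?{urlencode({'query': 'up'})}"
    try:
        with urlopen(url, timeout=5) as resp:
            return query_ok(json.load(resp))
    except OSError:
        return False  # unreachable, refused, or timed out

if __name__ == "__main__":
    print(check_prometheus("http://your-prometheus:9090"))
```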

Topology

Topology can be defined in two formats:

Monolithic (simple labs)

A single config/topology.yaml file containing all sites, rooms, racks, and devices.

Split directory (larger deployments)

One file per object, arranged in a directory tree:

config/topology/
  sites.yaml
  datacenters/{site_id}/
    rooms/{room_id}/
      room.yaml
      aisles/{aisle_id}/
        aisle.yaml
        racks/{rack_id}.yaml

Rack example

id: a01-r01
name: "Rack A01-R01"
u_height: 42
template_id: standard-42u

devices:
  - id: compute-01
    name: "Compute 01"
    template_id: generic-1u-server
    u_position: 1
    instance: compute[001-004]  # expands to compute001..compute004

See Topology YAML for the full schema.
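The instance range syntax above can be sketched in a few lines. This illustrates the expansion semantics (compute[001-004] becomes compute001..compute004, preserving zero padding); it is not Rackscope's actual implementation:

```python
import re

def expand_instances(pattern: str) -> list[str]:
    """Expand a bracket range like 'compute[001-004]'; plain names pass through."""
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if not m:
        return [pattern]
    prefix, lo, hi, suffix = m.groups()
    width = len(lo)  # zero padding taken from the lower bound
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(lo), int(hi) + 1)]
```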


Templates

Templates define hardware characteristics and are reused across the topology.

templates:
  - id: generic-1u-server
    name: "Generic 1U Server"
    type: server
    u_height: 1
    checks:
      - node_up
      - ipmi_temp_warn
    metrics:
      - node_temperature
      - node_power

See Templates for all fields.


Plugin configuration

Each plugin has its own config file. Only the enabled flag lives in app.yaml.

Plugin      Config file
Simulator   config/plugins/simulator/config/plugin.yaml
Slurm       config/plugins/slurm/config.yml

See Simulator Plugin and Slurm Plugin for details.


Environment variables

Variable               Default          Description
RACKSCOPE_APP_CONFIG   config/app.yaml  Path to app.yaml
APP_CONFIG             app.yaml         Filename within config/ (set by make use)
RACKSCOPE_CONFIG_DIR   config           Base config directory
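The table can be read as a resolution order. A sketch, assuming RACKSCOPE_APP_CONFIG takes precedence over the APP_CONFIG + RACKSCOPE_CONFIG_DIR pair (verify this precedence against your Rackscope version):

```python
import os
from pathlib import Path

def resolve_app_config() -> Path:
    """Resolve the app.yaml path from the environment (illustrative only).

    An explicit RACKSCOPE_APP_CONFIG wins; otherwise APP_CONFIG names a
    file inside RACKSCOPE_CONFIG_DIR. Defaults mirror the table above.
    """
    explicit = os.environ.get("RACKSCOPE_APP_CONFIG")
    if explicit:
        return Path(explicit)
    config_dir = Path(os.environ.get("RACKSCOPE_CONFIG_DIR", "config"))
    return config_dir / os.environ.get("APP_CONFIG", "app.yaml")
```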