Reliability at build time, not incident time.
Teams deploy code without reliability validation:
- Alerts created after the first incident
- Dashboards built after users complain
- SLOs defined after budget is exhausted
- No gates to prevent risky deploys
NthLayer shifts reliability left into your CI/CD pipeline:

    ┌──────────────────────────────────────────────────────────────────┐
    │  service.yaml → generate → lint → verify → check-deploy → deploy │
    │                    ↓        ↓       ↓           ↓                │
    │                artifacts  valid? metrics?  budget ok?            │
    │                                                                  │
    │  "Is this production-ready?" - answered BEFORE deployment        │
    └──────────────────────────────────────────────────────────────────┘

    # In your Tekton/GitHub Actions pipeline:
    nthlayer apply service.yaml --lint   # Generate + validate PromQL syntax
    nthlayer verify service.yaml         # Verify declared metrics exist
    nthlayer check-deploy service.yaml   # Check error budget gate
    # Only if all pass: deploy to production

Works with: Tekton, GitHub Actions, GitLab CI, ArgoCD, Mimir/Cortex

| Command | What It Does | Pipeline Exit Code |
|---|---|---|
| `nthlayer verify` | Validates declared metrics exist in Prometheus | 1 if missing metrics |
| `nthlayer check-deploy` | Checks error budget - blocks if exhausted | 2 if budget exhausted |
| `nthlayer apply --lint` | Validates PromQL syntax with pint | 1 if invalid queries |
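
One way to wire these exit codes into a pipeline step is a small wrapper that runs the three gates in order and stops at the first failure. The sketch below is illustrative and not part of NthLayer: `explain` and `run_gates` are hypothetical helper names, and the messages simply restate the table above. It assumes `nthlayer` is on the PATH and that exit code 0 means the gate passed.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Gate commands in pipeline order, copied from the commands shown above.
GATES = [
    ["nthlayer", "apply", "service.yaml", "--lint"],
    ["nthlayer", "verify", "service.yaml"],
    ["nthlayer", "check-deploy", "service.yaml"],
]


def explain(command: Sequence[str], returncode: int) -> str:
    """Translate a gate's exit code into a message, per the exit-code table."""
    if returncode == 0:  # assumption: 0 means the gate passed
        return "passed"
    subcommand = command[1]
    if subcommand == "check-deploy" and returncode == 2:
        return "blocked: error budget exhausted"
    if subcommand == "verify" and returncode == 1:
        return "failed: declared metrics missing"
    if subcommand == "apply" and returncode == 1:
        return "failed: invalid PromQL"
    return f"failed: unexpected exit code {returncode}"


def _run(command: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_gates(runner: Optional[Callable[[Sequence[str]], int]] = None) -> int:
    """Run each gate in order; return the first non-zero exit code, else 0."""
    runner = runner or _run
    for command in GATES:
        code = runner(command)
        print(f"{' '.join(command)}: {explain(command, code)}")
        if code != 0:
            return code  # stop here so later gates and the deploy never run
    return 0
```

Passing a fake `runner` lets the ordering be tested without a live Prometheus. In CI, calling `sys.exit(run_gates())` would hand the failing gate's exit code back to the pipeline.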

    pipx install nthlayer
    nthlayer apply service.yaml
    # Output: generated/payment-api/
    #   ├── dashboard.json        → Grafana
    #   ├── alerts.yaml           → Prometheus
    #   ├── slos.yaml             → OpenSLO
    #   └── recording-rules.yaml  → Prometheus

| Benefit | How It Works |
|---|---|
| Prevent, Don't React | Validate reliability requirements before deploy, not after incidents |
| Contract Verification | nthlayer verify fails pipeline if declared metrics don't exist |
| Deployment Gates | nthlayer check-deploy blocks deploys when error budget exhausted |
| Immutable Standards | Update NthLayer version = all services get new standards |
| GitOps Native | Generated files commit to git, works with any CD system |
| Tool | Focus | NthLayer Difference |
|---|---|---|
| PagerDuty | Incident response | "They respond to incidents, we prevent them" |
| Datadog | Post-deploy monitoring | "They monitor after, we validate before" |
| Nobl9 | SLO tracking | "They track SLOs, we enforce them as gates" |
| Backstage | Service catalog | "They document, we generate and enforce" |

    # Minimal example (5 lines)
    name: payment-api
    tier: critical
    type: api
    dependencies:
      - postgresql

Integrations are configured through environment variables:

    # PagerDuty - auto-create team, escalation policy, service
    export PAGERDUTY_API_KEY=...

    # Grafana - auto-push dashboards
    export NTHLAYER_GRAFANA_URL=...
    export NTHLAYER_GRAFANA_API_KEY=...
    export NTHLAYER_GRAFANA_ORG_ID=1        # Default: 1

    # Prometheus - metric discovery for intent resolution
    export NTHLAYER_PROMETHEUS_URL=...
    export NTHLAYER_METRICS_USER=...        # If auth required
    export NTHLAYER_METRICS_PASSWORD=...

| Output | File | Deploy To |
|---|---|---|
| Dashboard | `generated/<service>/dashboard.json` | Grafana |
| Alerts | `generated/<service>/alerts.yaml` | Prometheus |
| SLOs | `generated/<service>/slos.yaml` | OpenSLO-compatible |
| Recording Rules | `generated/<service>/recording-rules.yaml` | Prometheus |
| PagerDuty | Created via API | Team, escalation policy, service |

Track reliability across your entire organization:

    nthlayer portfolio                  # Org-wide reliability view
    nthlayer portfolio --format json    # Machine-readable for dashboards
    nthlayer slo collect service.yaml   # Query current budget from Prometheus

A fuller service.yaml:

    name: payment-api
    tier: critical              # critical | standard | low
    type: api                   # api | worker | stream
    team: payments
    slos:
      availability: 99.95       # Generates Prometheus alerts
      latency_p99_ms: 200       # Generates histogram queries
    dependencies:
      - postgresql              # Adds PostgreSQL panels
      - redis                   # Adds Redis panels
      - kubernetes              # Adds K8s pod metrics
    pagerduty:
      enabled: true
      support_model: self       # self | shared | sre | business_hours

| Task | Manual Effort | With NthLayer |
|---|---|---|
| Define SLOs & error budgets | 6 hours | Generated from tier |
| Research & configure alerts | 4 hours | 400+ battle-tested rules |
| Build Grafana dashboards | 5 hours | 12-28 panels auto-generated |
| PagerDuty escalation setup | 2 hours | Tier-based defaults |
| Write recording rules | 3 hours | 20+ pre-computed metrics |

| Problem | Without NthLayer | With NthLayer |
|---|---|---|
| Missing metrics | Discover after deploy | nthlayer verify blocks promotion |
| Invalid PromQL | Prometheus rejects rules | --lint catches in CI |
| Policy violations | Manual review | nthlayer validate-spec enforces |
| Exhausted budget | Deploy anyway, incident | check-deploy blocks risky deploys |
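
In the same spirit of catching problems early, the enumerated values documented in the service.yaml example (`tier`, `type`, `support_model`) can be sanity-checked locally before the file reaches CI. The sketch below is a hypothetical pre-check and not NthLayer's own `validate-spec` logic: the function name `check_spec` is invented for illustration, and treating `name`, `tier`, and `type` as required is an assumption based on the minimal example. It works on an already-parsed dictionary, so it needs no YAML library.

```python
# Allowed values are copied from the comments in the service.yaml example.
# Everything else here is a hypothetical local pre-check, not NthLayer code.
ALLOWED = {
    "tier": {"critical", "standard", "low"},
    "type": {"api", "worker", "stream"},
}
SUPPORT_MODELS = {"self", "shared", "sre", "business_hours"}


def check_spec(spec: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Assumption: name, tier and type are required (they appear in the
    # minimal example).
    if not spec.get("name"):
        problems.append("name is required")
    for field, allowed in ALLOWED.items():
        value = spec.get(field)
        if value not in allowed:
            problems.append(f"{field}={value!r} is not one of {sorted(allowed)}")

    # An availability target is a percentage strictly between 0 and 100.
    availability = spec.get("slos", {}).get("availability")
    if availability is not None:
        if not isinstance(availability, (int, float)) or not 0 < availability < 100:
            problems.append(f"slos.availability={availability!r} is not a valid percentage")

    model = spec.get("pagerduty", {}).get("support_model")
    if model is not None and model not in SUPPORT_MODELS:
        problems.append(f"pagerduty.support_model={model!r} is not one of {sorted(SUPPORT_MODELS)}")

    return problems


spec = {
    "name": "payment-api",
    "tier": "critical",
    "type": "api",
    "slos": {"availability": 99.95, "latency_p99_ms": 200},
    "pagerduty": {"enabled": True, "support_model": "self"},
}
print(check_spec(spec) or "spec looks consistent")
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a single run.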
| Scale | Generation Time Saved | Incidents Prevented* |
|---|---|---|
| 50 services | 996 hours ($100K) | ~12/year |
| 200 services | 3,983 hours ($400K) | ~48/year |
| 1,000 services | 19,917 hours ($2M) | ~240/year |

*Estimated, assuming a 60% reduction in "missing monitoring" incidents. Dollar values assume an engineering cost of $100/hr.
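
The figures in the table can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers stated above (hours saved, incidents prevented, and the $100/hr rate); the variable and function names are illustrative.

```python
HOURLY_RATE_USD = 100  # engineering cost from the footnote above

# (services, hours saved, incidents prevented per year), from the table above
ROWS = [(50, 996, 12), (200, 3983, 48), (1000, 19917, 240)]


def dollar_value(hours: int, rate: int = HOURLY_RATE_USD) -> int:
    """Convert saved engineering hours into a dollar figure."""
    return hours * rate


for services, hours, incidents in ROWS:
    print(
        f"{services} services: "
        f"{hours / services:.2f} hours saved per service, "
        f"${dollar_value(hours):,} total, "
        f"{incidents / services:.2f} incidents prevented per service per year"
    )
```

The exact products are $99,600, $398,300 and $1,991,700, which the table rounds to $100K, $400K and $2M. Each row works out to just under 20 hours saved per service, in line with the 20 hours of manual effort listed in the task table, and to 0.24 incidents prevented per service per year.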
| Step | What Happens |
|---|---|
| Intent Resolution | Maps "availability SLO" → best matching PromQL query |
| Type Routing | API services get HTTP metrics, workers get job metrics |
| Tier Defaults | Critical = 99.95% SLO + 5min escalation, Low = 99.5% + 60min |
| Technology Templates | 23 built-in: PostgreSQL, Redis, Kafka, MongoDB, etc. |
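
To make the tier defaults concrete, an availability target can be converted into the downtime it allows. The sketch below applies the standard error-budget formula to the Critical and Low targets from the table; the 30-day window and the function name are assumptions for illustration, not something the table specifies.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO.

    The 30-day default window is an assumption; use whatever window
    the SLO is actually evaluated over.
    """
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


# Tier targets taken from the "Tier Defaults" row above.
for tier, slo in [("critical", 99.95), ("low", 99.5)]:
    print(f"{tier}: {slo}% allows about {error_budget_minutes(slo):.1f} minutes per 30 days")
```

Under that assumption, 99.95% leaves about 21.6 minutes of budget per 30 days and 99.5% leaves 216 minutes, a tenfold difference.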

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │  Generate   │───▶│  Validate   │───▶│   Protect   │───▶│   Deploy    │
    │  nthlayer   │    │  --lint     │    │ check-deploy│    │  kubectl    │
    │  apply      │    │  verify     │    │             │    │  argocd     │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
           │                  │                  │
           ▼                  ▼                  ▼
       artifacts          exit 1 if          exit 2 if
        to git             invalid        budget exhausted

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins

    nthlayer init                        # Interactive service.yaml creation
    nthlayer plan service.yaml           # Preview what will be generated
    nthlayer apply service.yaml          # Generate all artifacts
    nthlayer apply --push                # Also push dashboard to Grafana
    nthlayer apply --push-ruler          # Push alerts to Mimir/Cortex Ruler API

    nthlayer apply --lint                # Validate PromQL syntax (pint)
    nthlayer validate-spec service.yaml  # Check against policies (OPA/Rego)
    nthlayer verify service.yaml         # Verify metrics exist in Prometheus

    nthlayer check-deploy service.yaml   # Check error budget gate (exit 2 = blocked)
    nthlayer portfolio                   # Org-wide SLO health
    nthlayer slo collect service.yaml    # Query current budget from Prometheus

| Feature | Description | Status |
|---|---|---|
| Error Budgets | Track budget consumption, correlate with deploys | Done |
| SLO Portfolio | Org-wide reliability view across all services | Done |
| Deployment Gates | Block deploys when error budget exhausted | Done |
| Contract Verification | Verify declared metrics exist before promotion | Done |
| Loki Integration | Generate LogQL alert rules, technology-specific log patterns | Next |
| AI Generation | Conversational service.yaml creation via MCP | Planned |

    # Recommended
    pipx install nthlayer

    # Or with pip
    pip install nthlayer

    # Verify
    nthlayer --version

See NthLayer in action with real Grafana dashboards and generated configs:

Full Documentation - Comprehensive guides and reference.
| Quick Links | |
|---|---|
| Quick Start | Get running in 5 minutes |
| Setup Wizard | Interactive configuration |
| SLO Portfolio | Org-wide reliability view |
| 18 Technologies | PostgreSQL, Redis, Kafka... |
| CLI Reference | All commands |
| Contributing | How to contribute |

Build docs locally:

    pip install -e ".[docs]"
    mkdocs serve    # Opens at http://localhost:8000

Set up a development environment:

    git clone https://github.com/rsionnach/nthlayer.git
    cd nthlayer
    make setup      # Install deps, start services
    make test       # Run tests (84 should pass)

See CONTRIBUTING.md for details.

MIT - See LICENSE.txt
- grafana-foundation-sdk - Dashboard generation SDK (Apache 2.0)
- awesome-prometheus-alerts - 580+ battle-tested alert rules (CC BY 4.0)
- pint - PromQL linting and validation (Apache 2.0)
- conftest / OPA - Policy validation (Apache 2.0)
- PagerDuty Python SDK - Incident management integration (MIT)
- autograf - Dynamic Prometheus metric discovery
- Sloth - SLO specification and burn rate calculations
- OpenSLO - SLO specification standard
- Rich - Terminal formatting and styling (MIT)
- Questionary - Interactive CLI prompts (MIT)
- MkDocs Material - Documentation theme (MIT)
- VHS - Terminal demo recordings (MIT)
- Nord Theme - Color palette inspiration (MIT)
- Shields.io - Badges
- Slidev - Presentation framework


