NthLayer

Reliability at build time, not incident time.

Status: Alpha PyPI License: MIT


⚠️ The Problem

Teams deploy code without reliability validation:

  • Alerts created after the first incident
  • Dashboards built after users complain
  • SLOs defined after budget is exhausted
  • No gates to prevent risky deploys

✅ The Solution

NthLayer shifts reliability left into your CI/CD pipeline:

┌────────────────────────────────────────────────────────────────────────────┐
│ service.yaml → generate → lint → verify → check-deploy → deploy            │
│                   ↓         ↓       ↓           ↓                          │
│               artifacts   valid?  metrics?  budget ok?                     │
│                                                                            │
│ "Is this production-ready?" - answered BEFORE deployment                   │
└────────────────────────────────────────────────────────────────────────────┘

# In your Tekton/GitHub Actions pipeline:
nthlayer apply service.yaml --lint    # Generate + validate PromQL syntax
nthlayer verify service.yaml          # Verify declared metrics exist
nthlayer check-deploy service.yaml    # Check error budget gate
# Only if all pass: deploy to production

Works with: Tekton, GitHub Actions, GitLab CI, ArgoCD, Mimir/Cortex


🚦 Shift Left Features

| Command | What It Does | Pipeline Exit Code |
| --- | --- | --- |
| nthlayer verify | Validates declared metrics exist in Prometheus | 1 if missing metrics |
| nthlayer check-deploy | Checks error budget - blocks if exhausted | 2 if budget exhausted |
| nthlayer apply --lint | Validates PromQL syntax with pint | 1 if invalid queries |
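
The exit codes in the table above can be consumed by a small wrapper in a pipeline step. What follows is a minimal sketch, assuming the three commands behave as the table describes. The names `GATES`, `explain_exit`, and `run_gates`, and the choice to stop at the first failure, are illustrative and not part of NthLayer.

```python
import subprocess

# Exit-code meanings are taken from the table above; any other non-zero
# code is reported generically. The command lines mirror the pipeline example.
GATES = [
    (["nthlayer", "apply", "service.yaml", "--lint"], {1: "invalid PromQL queries"}),
    (["nthlayer", "verify", "service.yaml"], {1: "declared metrics missing"}),
    (["nthlayer", "check-deploy", "service.yaml"], {2: "error budget exhausted"}),
]


def explain_exit(meanings, code):
    """Translate an exit code into a human-readable reason."""
    if code == 0:
        return "ok"
    return meanings.get(code, f"unexpected exit code {code}")


def _run(cmd):
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_gates(runner=_run):
    """Run each gate in order and stop at the first failure.

    Returns a (passed, reason) tuple. `runner` is injectable so the
    logic can be exercised without NthLayer installed.
    """
    for cmd, meanings in GATES:
        code = runner(cmd)
        if code != 0:
            return False, f"{' '.join(cmd)}: {explain_exit(meanings, code)}"
    return True, "all gates passed"
```

A pipeline step could call `run_gates()` and exit non-zero when the first element of the result is `False`, printing the reason so the blocked deploy is easy to diagnose.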

Deployment Gate Example

[Demo recording: nthlayer check-deploy]

⚡ Quick Start

pipx install nthlayer

nthlayer apply service.yaml

# Output: generated/payment-api/
#   ├── dashboard.json       → Grafana
#   ├── alerts.yaml          → Prometheus
#   ├── slos.yaml            → OpenSLO
#   └── recording-rules.yaml → Prometheus

🎯 Why NthLayer?

| Benefit | How It Works |
| --- | --- |
| Prevent, Don't React | Validate reliability requirements before deploy, not after incidents |
| Contract Verification | nthlayer verify fails pipeline if declared metrics don't exist |
| Deployment Gates | nthlayer check-deploy blocks deploys when error budget exhausted |
| Immutable Standards | Update NthLayer version = all services get new standards |
| GitOps Native | Generated files commit to git, works with any CD system |

Competitive Positioning

| Tool | Focus | NthLayer Difference |
| --- | --- | --- |
| PagerDuty | Incident response | "They respond to incidents, we prevent them" |
| Datadog | Post-deploy monitoring | "They monitor after, we validate before" |
| Nobl9 | SLO tracking | "They track SLOs, we enforce them as gates" |
| Backstage | Service catalog | "They document, we generate and enforce" |

📥 What You Put In

1. Service Spec (service.yaml)

# Minimal example (5 lines)
name: payment-api
tier: critical
type: api
dependencies:
  - postgresql

2. Environment Variables (optional)

# 📟 PagerDuty - auto-create team, escalation policy, service
export PAGERDUTY_API_KEY=...

# 📊 Grafana - auto-push dashboards
export NTHLAYER_GRAFANA_URL=...
export NTHLAYER_GRAFANA_API_KEY=...
export NTHLAYER_GRAFANA_ORG_ID=1              # Default: 1

# 🔍 Prometheus - metric discovery for intent resolution
export NTHLAYER_PROMETHEUS_URL=...
export NTHLAYER_METRICS_USER=...              # If auth required
export NTHLAYER_METRICS_PASSWORD=...
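
Because every variable above is optional, a pipeline may want to report which integrations are configured before it runs. The sketch below reads the same variable names; it is a hypothetical pre-flight check, not how NthLayer itself loads configuration. The `IntegrationSettings` class, both function names, and the rule for what counts as "configured" are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class IntegrationSettings:
    pagerduty_api_key: Optional[str]
    grafana_url: Optional[str]
    grafana_api_key: Optional[str]
    grafana_org_id: int
    prometheus_url: Optional[str]
    metrics_user: Optional[str]
    metrics_password: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> IntegrationSettings:
    """Read the documented variables; only the Grafana org ID has a default (1)."""
    env = os.environ if env is None else env
    return IntegrationSettings(
        pagerduty_api_key=env.get("PAGERDUTY_API_KEY"),
        grafana_url=env.get("NTHLAYER_GRAFANA_URL"),
        grafana_api_key=env.get("NTHLAYER_GRAFANA_API_KEY"),
        grafana_org_id=int(env.get("NTHLAYER_GRAFANA_ORG_ID", "1")),
        prometheus_url=env.get("NTHLAYER_PROMETHEUS_URL"),
        metrics_user=env.get("NTHLAYER_METRICS_USER"),
        metrics_password=env.get("NTHLAYER_METRICS_PASSWORD"),
    )


def configured_integrations(settings: IntegrationSettings) -> List[str]:
    """Assumed rule: an integration counts as configured once its URL/key is set."""
    found = []
    if settings.pagerduty_api_key:
        found.append("pagerduty")
    if settings.grafana_url and settings.grafana_api_key:
        found.append("grafana")
    if settings.prometheus_url:
        found.append("prometheus")
    return found
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test in isolation.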

📤 What You Get Out

| Output | File | Deploy To |
| --- | --- | --- |
| 📊 Dashboard | generated/<service>/dashboard.json | Grafana |
| 🚨 Alerts | generated/<service>/alerts.yaml | Prometheus |
| 🎯 SLOs | generated/<service>/slos.yaml | OpenSLO-compatible |
| ⚡ Recording Rules | generated/<service>/recording-rules.yaml | Prometheus |
| 📟 PagerDuty | Created via API | Team, escalation policy, service |
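
The file layout in the table above lends itself to a quick check that generation produced everything before the files are committed. This is a small sketch under that assumption. The helper names `expected_artifacts` and `missing_artifacts` are hypothetical, and the file names are copied from the table rather than read from NthLayer.

```python
from pathlib import Path

# File names and targets as listed in the table above.
ARTIFACTS = {
    "dashboard.json": "Grafana",
    "alerts.yaml": "Prometheus",
    "slos.yaml": "OpenSLO-compatible",
    "recording-rules.yaml": "Prometheus",
}


def expected_artifacts(service, root="generated"):
    """Map each expected file path (POSIX style) to its deploy target."""
    return {
        (Path(root) / service / name).as_posix(): target
        for name, target in ARTIFACTS.items()
    }


def missing_artifacts(service, root="generated"):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_artifacts(service, root) if not Path(p).is_file()]
```

Running `missing_artifacts("payment-api")` after `nthlayer apply service.yaml` would return an empty list if all four files were written.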

📊 SLO Portfolio

Track reliability across your entire organization:

[Demo recording: nthlayer portfolio]
nthlayer portfolio              # Org-wide reliability view
nthlayer portfolio --format json  # Machine-readable for dashboards
nthlayer slo collect service.yaml  # Query current budget from Prometheus

๐Ÿ“ Full Service Example

name: payment-api
tier: critical              # critical | standard | low
type: api                   # api | worker | stream
team: payments

slos:
  availability: 99.95       # Generates Prometheus alerts
  latency_p99_ms: 200       # Generates histogram queries

dependencies:
  - postgresql              # Adds PostgreSQL panels
  - redis                   # Adds Redis panels
  - kubernetes              # Adds K8s pod metrics

pagerduty:
  enabled: true
  support_model: self       # self | shared | sre | business_hours

💰 The Value

Generation: 20 hours → 5 minutes per service

| Task | Manual Effort | With NthLayer |
| --- | --- | --- |
| 🎯 Define SLOs & error budgets | 6 hours | Generated from tier |
| 🚨 Research & configure alerts | 4 hours | 400+ battle-tested rules |
| 📊 Build Grafana dashboards | 5 hours | 12-28 panels auto-generated |
| 📟 PagerDuty escalation setup | 2 hours | Tier-based defaults |
| 📋 Write recording rules | 3 hours | 20+ pre-computed metrics |

Validation: Catch issues before production

| Problem | Without NthLayer | With NthLayer |
| --- | --- | --- |
| Missing metrics | Discover after deploy | nthlayer verify blocks promotion |
| Invalid PromQL | Prometheus rejects rules | --lint catches in CI |
| Policy violations | Manual review | nthlayer validate-spec enforces |
| Exhausted budget | Deploy anyway, incident | check-deploy blocks risky deploys |

At Scale

| Scale | Generation Saved | Incidents Prevented* |
| --- | --- | --- |
| 🚀 50 services | 996 hours ($100K) | ~12/year |
| 📈 200 services | 3,983 hours ($400K) | ~48/year |
| 🏢 1,000 services | 19,917 hours ($2M) | ~240/year |

*Estimated, assuming a 60% reduction in "missing monitoring" incidents. Hours are valued at $100/hr engineering cost.
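
The "Generation Saved" figures follow from the per-service numbers stated earlier: 20 hours of manual work replaced by 5 minutes, valued at $100/hr. The short sketch below reproduces that arithmetic; the constant and function names are illustrative.

```python
MANUAL_HOURS = 20.0          # manual effort per service, from the task table
GENERATED_HOURS = 5 / 60     # 5 minutes per service with generation
HOURLY_RATE_USD = 100        # engineering cost stated in the footnote


def hours_saved(services):
    """Hours saved across a fleet of `services` services."""
    return services * (MANUAL_HOURS - GENERATED_HOURS)


def dollars_saved(services):
    """Dollar value of the hours saved at the stated hourly rate."""
    return hours_saved(services) * HOURLY_RATE_USD


for n in (50, 200, 1000):
    print(n, round(hours_saved(n)), round(dollars_saved(n)))
# 50 996 99583
# 200 3983 398333
# 1000 19917 1991667
```

The dollar figures in the table are these values rounded to the nearest headline number ($100K, $400K, $2M).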


🧠 How It Works

Generation

| Step | What Happens |
| --- | --- |
| 🎯 Intent Resolution | Maps "availability SLO" → best matching PromQL query |
| 🔀 Type Routing | API services get HTTP metrics, workers get job metrics |
| ⚡ Tier Defaults | Critical = 99.95% SLO + 5min escalation, Low = 99.5% + 60min |
| 🏗️ Technology Templates | 23 built-in: PostgreSQL, Redis, Kafka, MongoDB, etc. |

CI/CD Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Generate  │───▶│   Validate  │───▶│   Protect   │───▶│   Deploy    │
│ nthlayer    │    │ --lint      │    │ check-deploy│    │ kubectl     │
│ apply       │    │ verify      │    │             │    │ argocd      │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
      │                  │                  │
      ▼                  ▼                  ▼
  artifacts         exit 1 if          exit 2 if
  to git            invalid            budget exhausted

Works with: GitHub Actions, GitLab CI, ArgoCD, Tekton, Jenkins


🛠️ CLI Commands

Generate

nthlayer init                   # Interactive service.yaml creation
nthlayer plan service.yaml      # Preview what will be generated
nthlayer apply service.yaml     # Generate all artifacts
nthlayer apply --push           # Also push dashboard to Grafana
nthlayer apply --push-ruler     # Push alerts to Mimir/Cortex Ruler API

Validate

nthlayer apply --lint           # Validate PromQL syntax (pint)
nthlayer validate-spec service.yaml  # Check against policies (OPA/Rego)
nthlayer verify service.yaml    # Verify metrics exist in Prometheus

Protect

nthlayer check-deploy service.yaml  # Check error budget gate (exit 2 = blocked)
nthlayer portfolio              # Org-wide SLO health
nthlayer slo collect service.yaml   # Query current budget from Prometheus

🔮 Coming Soon

| Feature | Description | Status |
| --- | --- | --- |
| 💰 Error Budgets | Track budget consumption, correlate with deploys | ✅ Done |
| 📊 SLO Portfolio | Org-wide reliability view across all services | ✅ Done |
| 🚦 Deployment Gates | Block deploys when error budget exhausted | ✅ Done |
| ✅ Contract Verification | Verify declared metrics exist before promotion | ✅ Done |
| 📝 Loki Integration | Generate LogQL alert rules, technology-specific log patterns | 🔨 Next |
| 🤖 AI Generation | Conversational service.yaml creation via MCP | 📋 Planned |

📦 Installation

# Recommended
pipx install nthlayer

# Or with pip
pip install nthlayer

# Verify
nthlayer --version

🌐 Live Demo

See NthLayer in action with real Grafana dashboards and generated configs:

Live Dashboards | Interactive Demo


📚 Documentation

Full Documentation - Comprehensive guides and reference.

| Quick Links | |
| --- | --- |
| 🚀 Quick Start | Get running in 5 minutes |
| 🔧 Setup Wizard | Interactive configuration |
| 📊 SLO Portfolio | Org-wide reliability view |
| 🔌 18 Technologies | PostgreSQL, Redis, Kafka... |
| 📖 CLI Reference | All commands |
| 🤝 Contributing | How to contribute |

Build docs locally:

pip install -e ".[docs]"
mkdocs serve  # Opens at http://localhost:8000

🤝 Contributing

git clone https://github.com/rsionnach/nthlayer.git
cd nthlayer
make setup    # Install deps, start services
make test     # Run tests (84 should pass)

See CONTRIBUTING.md for details.


📄 License

MIT - See LICENSE.txt


🙏 Acknowledgments

Core Dependencies

Architecture Inspiration

  • autograf - Dynamic Prometheus metric discovery
  • Sloth - SLO specification and burn rate calculations
  • OpenSLO - SLO specification standard

CLI & Documentation

  • Rich - Terminal formatting and styling (MIT)
  • Questionary - Interactive CLI prompts (MIT)
  • MkDocs Material - Documentation theme (MIT)
  • VHS - Terminal demo recordings (MIT)
  • Nord Theme - Color palette inspiration (MIT)

Tooling
