CLI that wraps any ONNX or TensorRT engine into an NVIDIA NIM microservice with auto-generated OpenAPI + Prometheus metrics. Turn any model into a production-ready API in seconds.
NVIDIA NIM (NVIDIA Inference Microservices) added open MoE models in July 2025, but developers still hand-roll deployment configs. Nimify automates the entire process:
- One-command deployment from model file to production API
- Auto-generated OpenAPI with type-safe clients
- Built-in monitoring via Prometheus/Grafana
- Smart autoscaling based on latency and GPU utilization
- Helm charts for Kubernetes deployment
# Transform any ONNX model into a NIM service
nimify create my-model.onnx --name my-service --port 8080
# Deploy to Kubernetes with autoscaling
nimify deploy my-service --replicas 3 --autoscale
# Access your API
curl http://localhost:8080/v1/predict -d '{"input": [1, 2, 3]}'
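The same request from Python, assuming the quick-start service above is listening on port 8080 and returns a JSON body:
import requests
# Equivalent of the curl call above
response = requests.post(
    "http://localhost:8080/v1/predict",
    json={"input": [1, 2, 3]},
    timeout=10,
)
print(response.json())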
# Core dependencies
python>=3.10
onnx>=1.16.0
onnxruntime-gpu>=1.18.0
tensorrt>=10.0.0
tritonclient>=2.45.0
nvidia-pyindex
# API & Infrastructure
fastapi>=0.110.0
uvicorn>=0.30.0
pydantic>=2.0.0
prometheus-client>=0.20.0
# Deployment tooling (Docker, Kubernetes, and Helm are installed outside pip)
docker>=24.0.0
kubernetes>=29.0.0
helm>=3.14.0
pip install nimify-anything
git clone https://github.com/yourusername/nimify-anything.git
cd nimify-anything
pip install -e .
# Check version and dependencies
nimify --version
nimify doctor
# ONNX model
nimify create model.onnx --name my-classifier
# TensorRT engine
nimify create model.trt --name my-detector --input-shapes "images:3,224,224"
# Hugging Face model
nimify create "facebook/bart-large-mnli" --source huggingface
# Create with custom settings
nimify create model.onnx \
--name sentiment-analyzer \
--port 8080 \
--max-batch-size 32 \
--dynamic-batching \
--gpu-memory 4GB \
--metrics-port 9090
from nimify import Nimifier, ModelConfig
# Configure model
config = ModelConfig(
name="my-model",
max_batch_size=64,
dynamic_batching=True,
preferred_batch_sizes=[8, 16, 32, 64],
max_queue_delay_microseconds=100
)
# Create NIM service
nim = Nimifier(config)
service = nim.wrap_model(
"model.onnx",
input_schema={"input": "float32[?,3,224,224]"},
output_schema={"predictions": "float32[?,1000]"}
)
# Generate artifacts
service.generate_openapi("openapi.json")
service.generate_helm_chart("./helm/my-model")
service.build_container("myregistry/my-model:latest")
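A sketch of a request that matches the float32[?,3,224,224] input schema above, assuming the service accepts tensors as nested JSON arrays with a leading batch dimension (the payload layout is an assumption, not taken from the generated spec):
import numpy as np
import requests
# One 3x224x224 image as a batch of size 1, serialized as nested lists
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
resp = requests.post(
    "http://localhost:8080/v1/predict",
    json={"input": batch.tolist()},
    timeout=30,
)
predictions = np.asarray(resp.json()["predictions"])  # expected shape (1, 1000)
print(predictions.shape)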
┌─────────────────┐      ┌──────────────┐      ┌─────────────────┐
│   Model File    │─────▶│   Nimifier   │─────▶│   NIM Service   │
│  (ONNX/TRT/HF)  │      │    Engine    │      │   (Container)   │
└─────────────────┘      └──────────────┘      └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐      ┌──────────────┐      ┌─────────────────┐
│ Model Analysis  │      │    Triton    │      │   Kubernetes    │
│                 │      │    Config    │      │   Deployment    │
└─────────────────┘      └──────────────┘      └─────────────────┘
# Generated openapi.yaml
openapi: 3.0.0
info:
title: my-model NIM API
version: 1.0.0
paths:
/v1/predict:
post:
summary: Run inference
requestBody:
content:
application/json:
schema:
type: object
properties:
input:
type: array
items:
type: number
responses:
200:
description: Successful prediction
content:
application/json:
schema:
type: object
properties:
predictions:
type: array
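Pydantic models that mirror the generated schema give a minimal type-safe client; a sketch under the assumption that request and response bodies look exactly like the spec above (the model and function names here are illustrative, not the generated client):
from typing import List
import requests
from pydantic import BaseModel
class PredictRequest(BaseModel):
    input: List[float]
class PredictResponse(BaseModel):
    predictions: List[float]
def predict(base_url: str, request: PredictRequest) -> PredictResponse:
    # POST the validated body and parse the reply against the response schema
    resp = requests.post(f"{base_url}/v1/predict", json=request.model_dump(), timeout=10)
    resp.raise_for_status()
    return PredictResponse(**resp.json())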
# Automatically exposed metrics:
# - nim_request_duration_seconds
# - nim_request_count_total
# - nim_batch_size_histogram
# - nim_gpu_utilization_percent
# - nim_model_loading_time_seconds
# - nim_queue_size
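A quick way to sanity-check these series is to scrape the metrics endpoint directly; a sketch assuming the metrics port configured earlier (9090) and the standard Prometheus text exposition format:
import requests
# Fetch the text-format exposition and keep only the nim_* series
metrics_text = requests.get("http://localhost:9090/metrics", timeout=5).text
for line in metrics_text.splitlines():
    if line.startswith("nim_"):
        print(line)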
# Generated HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-model-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-model
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: gpu
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: nim_request_duration_seconds_p99
target:
type: AverageValue
averageValue: "100m" # 100ms
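The nim_request_duration_seconds_p99 pod metric is assumed to reach the HPA through an adapter such as prometheus-adapter; the same quantile can be computed directly from the histogram buckets. A sketch against the Prometheus HTTP API (the server address and 5-minute window are assumptions):
import requests
# p99 request latency over the last 5 minutes, derived from the histogram buckets
query = (
    "histogram_quantile(0.99, "
    "sum(rate(nim_request_duration_seconds_bucket[5m])) by (le))"
)
resp = requests.get(
    "http://prometheus.monitoring:9090/api/v1/query",
    params={"query": query},
    timeout=10,
)
print(resp.json()["data"]["result"])  # value is in seconds; the HPA above targets 0.1s (100m)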
# Auto-generated Dockerfile
FROM nvcr.io/nvidia/tritonserver:24.06-py3
# Install NIM runtime
RUN pip install nvidia-nim-runtime
# Copy model and config
COPY model_repository/ /models/
COPY nim_config.pbtxt /models/my-model/config.pbtxt
# Expose ports
EXPOSE 8000 8001 8002
# Launch Triton with NIM
CMD ["tritonserver", "--model-repository=/models", "--nim-mode"]
# Build optimized container
nimify build my-model --optimize --tag myregistry/my-model:v1
# Push to registry
nimify push myregistry/my-model:v1
# Or use GitHub Actions
nimify generate-ci --platform github
# Generate production-ready Helm chart
nimify helm create my-model --values prod-values.yaml
# Deploy to Kubernetes
helm install my-model ./my-model-chart \
--namespace nim \
--set image.tag=v1 \
--set autoscaling.enabled=true
# values.yaml
replicaCount: 3
image:
repository: myregistry/my-model
tag: latest
pullPolicy: IfNotPresent
service:
type: LoadBalancer
port: 80
targetPort: 8000
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetGPUUtilizationPercentage: 80
targetLatencyMilliseconds: 100
resources:
limits:
nvidia.com/gpu: 1
memory: 16Gi
requests:
nvidia.com/gpu: 1
memory: 8Gi
monitoring:
prometheus:
enabled: true
port: 9090
grafana:
enabled: true
dashboards:
- nim-overview
- gpu-metrics
# Deploy Grafana dashboard
nimify grafana deploy --model my-model
# Access dashboard
kubectl port-forward svc/grafana 3000:3000
- Request latency P50/P95/P99
- Throughput (requests/sec)
- GPU utilization and memory
- Batch size distribution
- Queue depth and wait times
- Error rates and types
# Create ensemble service
nimify ensemble create \
--name multi-stage-pipeline \
--models preprocessor:preprocess.onnx \
detector:yolov8.trt \
classifier:resnet50.onnx \
--pipeline sequential
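Conceptually, a sequential pipeline feeds each stage's output into the next stage's input; a hand-rolled sketch of that flow (the stage callables are placeholders, not the generated ensemble):
from typing import Callable, Sequence
import numpy as np
def run_sequential(stages: Sequence[Callable[[np.ndarray], np.ndarray]],
                   data: np.ndarray) -> np.ndarray:
    # Mirror --pipeline sequential: each stage consumes the previous stage's output
    for stage in stages:
        data = stage(data)
    return data
# e.g. run_sequential([preprocess_fn, detect_fn, classify_fn], image_batch)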
import nimify
from nimify import ABTestConfig
# Configure A/B test
ab_config = ABTestConfig(
variants={
"control": "model_v1.onnx",
"treatment": "model_v2.onnx"
},
traffic_split={"control": 0.8, "treatment": 0.2},
metrics=["latency", "accuracy"]
)
nimify.create_ab_test("my-experiment", ab_config)
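The traffic split itself amounts to weighted random routing; a conceptual sketch of how an 80/20 split picks a variant per request (this illustrates the idea, not Nimify's internals):
import random
# Roughly 80% of requests route to control, 20% to treatment
TRAFFIC_SPLIT = {"control": 0.8, "treatment": 0.2}
def pick_variant(split: dict[str, float]) -> str:
    variants, weights = zip(*split.items())
    return random.choices(variants, weights=weights, k=1)[0]
print(pick_variant(TRAFFIC_SPLIT))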
from nimify import Preprocessor
@Preprocessor.register("image_normalize")
def normalize_image(input_data):
"""Custom preprocessing logic"""
return (input_data - 127.5) / 127.5
# Use in configuration
nimify create model.onnx \
--preprocessor image_normalize \
--postprocessor argmax
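For reference, the image_normalize function above maps uint8 pixel values in [0, 255] to roughly [-1, 1]; a quick check with NumPy:
import numpy as np
# Apply the same normalization to a dummy uint8 image and confirm the output range
image = np.random.randint(0, 256, size=(3, 224, 224), dtype=np.uint8).astype(np.float32)
normalized = (image - 127.5) / 127.5
print(normalized.min(), normalized.max())  # close to -1.0 and 1.0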
# Run built-in load test
nimify loadtest my-model \
--concurrent-users 100 \
--duration 5m \
--rps 1000
# Generate report
nimify loadtest report --output performance.html
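If you want a hand-rolled smoke test before reaching for the built-in load tester, a minimal sketch with concurrent.futures and requests (the endpoint and payload are the quick-start assumptions from above):
import time
from concurrent.futures import ThreadPoolExecutor
import requests
URL = "http://localhost:8080/v1/predict"
PAYLOAD = {"input": [1, 2, 3]}
def one_request(_):
    # Time a single POST and return its latency in seconds
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=10).raise_for_status()
    return time.perf_counter() - start
with ThreadPoolExecutor(max_workers=16) as pool:
    latencies = sorted(pool.map(one_request, range(200)))
print(f"p50={latencies[len(latencies) // 2]:.3f}s  p99={latencies[int(len(latencies) * 0.99)]:.3f}s")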
from nimify import ModelValidator
validator = ModelValidator()
# Validate model serving
results = validator.validate(
model_path="model.onnx",
test_data="test_samples.json",
checks=["output_shape", "latency", "throughput"]
)
assert results.passed, f"Validation failed: {results.errors}"
# .github/workflows/nimify.yml
name: Build and Deploy NIM
on:
push:
branches: [main]
jobs:
nimify:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Nimify
run: pip install nimify-anything
- name: Build NIM service
run: |
nimify create model.onnx --name my-model
nimify build my-model --tag ${{ github.sha }}
- name: Deploy to Kubernetes
run: |
nimify deploy my-model \
--image my-model:${{ github.sha }} \
--wait
# .gitlab-ci.yml
stages:
- build
- deploy
build-nim:
stage: build
script:
- nimify create $MODEL_PATH --name $SERVICE_NAME
- nimify build $SERVICE_NAME --tag $CI_COMMIT_SHA
- nimify push $REGISTRY/$SERVICE_NAME:$CI_COMMIT_SHA
deploy-nim:
stage: deploy
script:
- nimify deploy $SERVICE_NAME --image $REGISTRY/$SERVICE_NAME:$CI_COMMIT_SHA
# Object detection service
nimify create yolov8.onnx \
--name object-detector \
--input-type image \
--output-type bounding-boxes \
--preprocessing resize,normalize \
--postprocessing nms
# Deploy with GPU optimization
nimify deploy object-detector \
--gpu-memory 8GB \
--tensorrt-optimization aggressive
# Text classification
nimify create bert-sentiment.onnx \
--name sentiment-analyzer \
--input-type text \
--tokenizer bert-base-uncased \
--max-sequence-length 512
# Financial forecasting
nimify create lstm-forecast.onnx \
--name stock-predictor \
--input-shape "sequence:30,features:5" \
--output-shape "predictions:5" \
--streaming-mode true
We welcome contributions! Priority areas:
- Additional model format support
- Custom metric collectors
- Cloud provider integrations
- Performance optimizations
- Documentation improvements
See CONTRIBUTING.md for guidelines.
@software{nimify_anything,
title={Nimify Anything: Automated NVIDIA NIM Service Generation},
author={Your Name},
year={2025},
url={https://github.com/yourusername/nimify-anything}
}
- NVIDIA for the NIM framework
- The Triton Inference Server team
- Contributors to the Kubernetes ecosystem
MIT License - See LICENSE for details.
- GitHub Issues: Bug reports and features
- Email: [email protected]
- Twitter: @NimifyAnything