A TypeScript-based orchestrator that generates complete music videos from a user prompt, coordinating multiple sub-agents (song generator, script generator, image/video generator) using the A2A (Agent-to-Agent) protocol. Supports JSON-RPC 2.0, Server-Sent Events (SSE), webhooks, and real-time task orchestration.
This project demonstrates how to build an Orchestrator Agent that receives a creative brief for a music video (e.g., "A cyberpunk rap anthem about AI collaboration"), then proceeds through several steps to:
- Generate a Song (lyrics + audio track) via an A2A-compatible Song Generator Agent
- Generate a Script (scenes, camera movements, character descriptions, settings) via an A2A-compatible Script Generator Agent
- Create Images for each character and setting via an A2A-compatible Image/Video Generator Agent
- Produce Short Video Clips based on the generated prompts
- Compile the clips and audio track into a final music video (MP4)
- Return the final IPFS URL for the video to the user
All orchestration and communication between agents is performed using the A2A protocol (JSON-RPC 2.0 over HTTP, with support for streaming/SSE and webhooks).
[User Prompt] | v [Orchestrator] |--(A2A)--> [Song Generator Agent] |--(A2A)--> [Script Generator Agent] |--(A2A, concurrent)--> [Media Generator Agent] (images/videos) |--(local)--> [Compile video + audio] |--(local)--> [Upload to IPFS] v [Return final video URL]
- Prerequisites
- Installation
- Environment Variables
- Project Structure
- Main Orchestration Flow
- A2A Protocol & Interoperability
- Usage
- License
- Node.js (>= 14 recommended)
- TypeScript (project built on version 4.x or later)
- Pinata account with API key and secret for IPFS uploads (optional, for video storage)
- Clone the repository:
git clone https://github.com/nevermined-io/music-video-orchestrator-a2a.git cd music-video-orchestrator-a2a - Install dependencies:
npm install
- Build the project (optional step if you want the compiled JS):
npm run build
Rename .env.example to .env and configure all relevant environment variables. Example:
# Pinata Config for IPFS uploads (optional)
PINATA_API_KEY=your_pinata_api_key
PINATA_API_SECRET=your_pinata_api_secret.
├── src
│ ├── agents
│ │ ├── a2aAgentClient.ts # A2A client for fetching agent cards
│ │ └── a2aResultExtractor.ts # Utilities to extract characters, settings, scenes
│ ├── controllers
│ │ └── a2aController.ts # Main A2A controller
│ ├── core
│ │ ├── errorHandler.ts # Error handling
│ │ ├── logger.ts # Logging utility
│ │ ├── sessionManager.ts # Session management
│ │ ├── taskProcessor.ts # Task processing logic
│ │ ├── taskQueue.ts # Task queue management
│ │ ├── taskStore.ts # In-memory task storage
│ │ └── videoUtils.ts # Video compilation utilities
│ ├── models
│ │ ├── a2aEventType.ts # Unified A2A event type enum
│ │ └── task.ts # Task and TaskState definitions
│ ├── routes
│ │ └── a2aRoutes.ts # Express routes for A2A endpoints
│ ├── services
│ │ ├── orchestrationTasks.ts # Song, script, image and video generation helpers
│ │ ├── pushNotificationService.ts# SSE & webhook notifications
│ │ ├── streamingService.ts # SSE streaming service
│ │ └── uploadVideoToIPFS.ts # Uploads compiled video to IPFS
│ ├── orchestrator.ts # Main orchestration workflow
│ └── server.ts # Express server entry point
├── package.json
├── README.md
├── tsconfig.json
├── .gitignore
└── .env
The main orchestration logic is implemented in src/orchestrator.ts:
- Agent Discovery: Fetches the agentCard of each sub-agent (song, script, media) using A2A discovery.
- Song Generation: Sends a prompt to the Song Generator Agent (A2A) and receives the song, audio URL, and title.
- Script Generation: Sends the prompt and song result to the Script Generator Agent (A2A) and receives the script.
- Extraction: Extracts characters, settings, and scenes from the script using utility functions.
- Media Generation: Requests the Media Agent (A2A) to generate images for characters/settings and video clips for each scene.
- Compilation: Compiles all video clips and the audio track into a final music video (MP4) using FFmpeg utilities.
- IPFS Upload: Uploads the final video to IPFS via Pinata and returns the IPFS URL.
- Result: Returns a structured result with all generated artifacts and metadata.
- The orchestrator exposes endpoints compatible with the A2A (Agent-to-Agent) protocol, supporting:
- JSON-RPC 2.0 methods:
/tasks/send,/tasks/sendSubscribe, etc. - Real-time updates via Server-Sent Events (SSE)
- Push notifications via webhooks
- Agent discovery via
/.well-known/agent.json(agentCard)
- JSON-RPC 2.0 methods:
- Any other agent or client that implements A2A can interoperate with this orchestrator.
Once your environment is set up and dependencies are installed, run:
npm run build
npm startThe Orchestrator will:
- Expose A2A endpoints for task orchestration and status updates
- Accept music video prompts and coordinate sub-agents using A2A
- Stream task updates and results via SSE or webhooks
To request a music video orchestration, send a POST to /tasks/send with a body like:
{
"jsonrpc": "2.0",
"id": "req-001",
"method": "tasks/send",
"params": {
"id": "task-001",
"sessionId": "session-001",
"message": {
"role": "user",
"parts": [
{ "type": "text", "text": "Create a cyberpunk rap anthem about AI collaboration." }
]
}
}
}The message parameter must be an object with a role (usually user) and a parts array, where the first part contains the prompt text.
Apache License 2.0
Copyright 2025 Nevermined AG
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
