JAF - Just Another Flow


JAF (Just Another Flow) is a streaming data-processing system for JSON/JSONL data, built around lazy evaluation, composability, and a fluent API.

Features

  • 🚀 Streaming Architecture - Process large datasets without loading everything into memory
  • 🔗 Lazy Evaluation - Build complex pipelines that only execute when needed
  • 🎯 Fluent API - Intuitive method chaining for readable code
  • 🧩 Composable - Combine operations freely, integrate with other tools
  • 📦 Multiple Sources - Files, directories, stdin, memory, compressed files, infinite streams
  • 🛠️ Unix Philosophy - Works great with pipes and other command-line tools

Installation

pip install jaf

Quick Start

Command Line

# Filter JSON data (lazy by default)
jaf filter users.jsonl '["gt?", "@age", 25]'

# Evaluate immediately
jaf filter users.jsonl '["gt?", "@age", 25]' --eval

# Chain operations
jaf filter users.jsonl '["eq?", "@status", "active"]' | \
jaf map - "@email" | \
jaf eval -

# Combine with other tools
jaf filter logs.jsonl '["eq?", "@level", "ERROR"]' --eval | \
ja groupby service

Python API

from jaf import stream

# Build a pipeline
pipeline = stream("users.jsonl") \
    .filter(["gt?", "@age", 25]) \
    .map(["dict", "name", "@name", "email", "@email"]) \
    .take(10)

# Execute when ready
for user in pipeline.evaluate():
    print(user)

Core Concepts

Lazy Evaluation

Operations don't execute until you call .evaluate() or use --eval:

# This doesn't read any data yet
pipeline = stream("huge_file.jsonl") \
    .filter(["contains?", "@tags", "important"]) \
    .map("@message")

# Now it processes data
for message in pipeline.evaluate():
    process(message)

Query Language

JAF uses S-expression syntax for queries:

# Simple comparisons
["eq?", "@status", "active"]         # status == "active"
["gt?", "@age", 25]                  # age > 25
["contains?", "@tags", "python"]     # "python" in tags

# Boolean logic
["and", 
    ["gte?", "@age", 18],
    ["eq?", "@verified", true]
]

# Path navigation with @
["eq?", "@user.profile.name", "Alice"]  # Nested access
["any", "@items.*.inStock"]             # Wildcard
["exists?", "@**.error"]                # Recursive search

Streaming Operations

  • filter - Keep items matching a predicate
  • map - Transform each item
  • take/skip - Limit or paginate results
  • batch - Group items into chunks
  • Boolean ops - AND, OR, NOT on filtered streams
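
A minimal sketch chaining several of these operations with the fluent API, assuming .skip() mirrors .take(); the file name, fields, and handle() placeholder are illustrative:

from jaf import stream

# Page through active users: skip the first 100 matches, keep the next 50,
# project two fields, and group the results into chunks of 10
pages = stream("users.jsonl") \
    .filter(["eq?", "@status", "active"]) \
    .skip(100) \
    .take(50) \
    .map(["dict", "name", "@name", "email", "@email"]) \
    .batch(10)

for chunk in pages.evaluate():
    handle(chunk)  # placeholder for your own processing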

Documentation

Examples

Log Analysis

# Find errors in specific services
errors = stream("app.log.jsonl") \
    .filter(["and",
        ["eq?", "@level", "ERROR"],
        ["in?", "@service", ["api", "auth"]]
    ]) \
    .map(["dict", 
        "time", "@timestamp",
        "service", "@service",
        "message", "@message"
    ]) \
    .evaluate()

Data Validation

# Find invalid records
invalid = stream("users.jsonl") \
    .filter(["or",
        ["not", ["exists?", "@email"]],
        ["not", ["regex-match?", "@email", "^[^@]+@[^@]+\\.[^@]+$"]]
    ]) \
    .evaluate()

ETL Pipeline

# Transform and filter data
pipeline = stream("raw_sales.jsonl") \
    .filter(["eq?", "@status", "completed"]) \
    .map(["dict",
        "date", ["date", "@timestamp"],
        "amount", "@amount",
        "category", ["if", ["gt?", "@amount", 1000], "high", "low"]
    ]) \
    .batch(1000)

# Process in chunks
for batch in pipeline.evaluate():
    bulk_insert(batch)

Integration

JAF works seamlessly with other tools:

# With jsonl-algebra
jaf filter orders.jsonl '["gt?", "@amount", 100]' --eval | \
ja groupby customer_id --aggregate 'total:amount:sum'

# With jq
jaf filter data.jsonl '["exists?", "@metadata"]' --eval | \
jq '.metadata'

# With standard Unix tools
jaf map users.jsonl "@email" --eval | sort | uniq -c

Performance

JAF is designed for streaming large datasets:

  • Processes one item at a time
  • Minimal memory footprint
  • Early termination (e.g., with take)
  • Efficient pipeline composition
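
For example, pairing a filter with take lets the pipeline stop reading as soon as enough matches are found. A minimal sketch, assuming events.jsonl is a large (or unbounded) JSONL source:

from jaf import stream

# Only enough of the source is read to find the first 5 errors;
# the rest is never touched
first_errors = stream("events.jsonl") \
    .filter(["eq?", "@level", "ERROR"]) \
    .take(5)

for event in first_errors.evaluate():
    print(event)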

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

JAF is licensed under the MIT License. See LICENSE for details.

Related Projects

  • jsonl-algebra - Relational operations on JSONL
  • jq - Command-line JSON processor