Batch option sorts the files #3880

@AdemSaglamRB

Is your feature request related to a problem? Please describe.
With --batch, Snakemake sorts the input files of the batched rule alphabetically before splitting them into batches. In some cases this puts a downstream file into the first batch, so the whole DAG is already built and executed in the first batch.

Here is a minimal example:

Snakefile

NUM = list(range(10))

rule all:
    input:
        expand("folder/file{num}.txt", num = NUM),
        expand("folder/tile{num}.txt", num = NUM),
        "folder/a.txt",
        

rule makefile:
    output:
        touch("folder/file{num}.txt")

rule maketile:
    input:
        "folder/file{num}.txt"
    output:
        touch("folder/tile{num}.txt")

rule agg:
    input:
        expand("folder/file{num}.txt", num = NUM),
        expand("folder/tile{num}.txt", num = NUM)
    output:
        touch("folder/a.txt")

And here is the batch file that runs snakemake:

run_batches.bat

@echo off
setlocal EnableDelayedExpansion

:: Total number of batches
set TOTAL_BATCHES=5

:: Rule to batch on
set RULE=all

:: Loop through each batch
for /L %%I in (1,1,%TOTAL_BATCHES%) do (
    echo Running batch %%I of %TOTAL_BATCHES%...
    snakemake --cores 1 --batch %RULE%=%%I/%TOTAL_BATCHES%
    if errorlevel 1 (
        echo Error in batch %%I
        exit /b 1
    )
)

echo All batches completed.
endlocal

All outputs are already generated in the first batch, and the remaining batches have nothing to do. Once the input files of rule all are sorted alphabetically, the final file folder/a.txt lands in the first batch, and producing it requires every other file.
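The effect can be reproduced outside Snakemake with a few lines of Python. This is only a sketch under assumptions: `split_batch` is a hypothetical helper, not Snakemake's API, and it assumes that batches are contiguous, near-equal slices of the sorted input list.

```python
def split_batch(items, idx, batches):
    """Return the idx-th (1-based) contiguous slice of the sorted items.

    Hypothetical helper for illustration only; not Snakemake's code.
    """
    items = sorted(items)  # the alphabetical sort this issue is about
    k, m = divmod(len(items), batches)
    start = (idx - 1) * k + min(idx - 1, m)
    end = idx * k + min(idx, m)
    return items[start:end]


NUM = list(range(10))

# Same targets, in the same order, as the input of rule all.
targets = (
    [f"folder/file{n}.txt" for n in NUM]
    + [f"folder/tile{n}.txt" for n in NUM]
    + ["folder/a.txt"]
)

first_batch = split_batch(targets, idx=1, batches=5)
print(first_batch)
```

Because "folder/a.txt" sorts before every "folder/file*.txt" and "folder/tile*.txt" path, it ends up in batch 1, which pulls in rule agg and therefore all of its inputs.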

Describe the solution you'd like

I would like batching to preserve the order in which the files are listed in the input of rule all.

Describe alternatives you've considered

Comment out or remove the sorting line:
https://github.com/snakemake/snakemake/blob/9598655f9ba99380b13b8a5797110352d41d2831/src/snakemake/settings/types.py#L158

After commenting this line out in my local installation of Snakemake, I got the behaviour I wanted.
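For comparison, here is how the same targets would be distributed if the declared order were kept. Again, this is a sketch under assumptions rather than Snakemake's implementation: `batch_slices` is a hypothetical helper that splits a list into contiguous, near-equal slices without sorting it first.

```python
def batch_slices(items, batches):
    """Split items into contiguous, near-equal slices, keeping their order.

    Hypothetical helper for illustration only; not Snakemake's code.
    """
    k, m = divmod(len(items), batches)
    slices = []
    for idx in range(batches):
        start = idx * k + min(idx, m)
        end = (idx + 1) * k + min(idx + 1, m)
        slices.append(items[start:end])
    return slices


NUM = list(range(10))

# Targets in the order they are declared in the input of rule all.
targets = (
    [f"folder/file{n}.txt" for n in NUM]
    + [f"folder/tile{n}.txt" for n in NUM]
    + ["folder/a.txt"]
)

ordered = batch_slices(targets, batches=5)
for number, batch in enumerate(ordered, start=1):
    print(number, batch)
```

With the order preserved, "folder/a.txt" falls into the last batch, so the earlier batches only request file and tile targets and the aggregation runs at the end.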


Labels: enhancement (New feature or request)
