Description
Is your feature request related to a problem? Please describe.
With --batch, Snakemake sorts the input files of the batched rule alphabetically before splitting them into batches. In some cases this puts a downstream file into the first batch, so the whole DAG is already built in the first batch.
Here is a minimal example:
Snakefile
NUM = list(range(10))

rule all:
    input:
        expand("folder/file{num}.txt", num = NUM),
        expand("folder/tile{num}.txt", num = NUM),
        "folder/a.txt",

rule makefile:
    output:
        touch("folder/file{num}.txt")

rule maketile:
    input:
        "folder/file{num}.txt"
    output:
        touch("folder/tile{num}.txt")

rule agg:
    input:
        expand("folder/file{num}.txt", num = NUM),
        expand("folder/tile{num}.txt", num = NUM)
    output:
        touch("folder/a.txt")
And here is the batch file that runs snakemake:
run_batches.bat
@echo off
setlocal EnableDelayedExpansion

:: Total number of batches
set TOTAL_BATCHES=5

:: Rule to batch on
set RULE=all

:: Loop through each batch
for /L %%I in (1,1,%TOTAL_BATCHES%) do (
    echo Running batch %%I of %TOTAL_BATCHES%...
    snakemake --cores 1 --batch %RULE%=%%I/%TOTAL_BATCHES%
    if errorlevel 1 (
        echo Error in batch %%I
        exit /b 1
    )
)

echo All batches completed.
endlocal
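For anyone reproducing this outside Windows, the same loop can be sketched in Python. This is only an illustrative equivalent of run_batches.bat; the helper names `batch_command` and `run_batches` are hypothetical, and it assumes `snakemake` is on the PATH.

```python
import subprocess


def batch_command(rule, batch, total, cores=1):
    """Build the snakemake call for one batch (same flags as run_batches.bat)."""
    return ["snakemake", "--cores", str(cores), "--batch", f"{rule}={batch}/{total}"]


def run_batches(rule="all", total=5):
    """Run every batch in order and stop at the first failure."""
    for batch in range(1, total + 1):
        print(f"Running batch {batch} of {total}...")
        result = subprocess.run(batch_command(rule, batch, total))
        if result.returncode != 0:
            print(f"Error in batch {batch}")
            return 1
    print("All batches completed.")
    return 0
```

Calling `run_batches()` from the directory containing the Snakefile should behave like the batch file above.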
All outputs are generated in the first batch, and the remaining batches have nothing to do. Once the input files of rule all are sorted alphabetically, the final file folder/a.txt ends up in the first batch, and because it depends on every other file, building it builds everything.
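The effect can be reproduced without Snakemake. The sketch below is a simplified model of batching, not Snakemake's actual implementation: `split_batches` is a hypothetical helper that optionally sorts the targets and then cuts them into contiguous, near-equal chunks.

```python
def split_batches(items, n_batches, sort=True):
    """Split items into n_batches contiguous chunks of near-equal size."""
    if sort:
        items = sorted(items)
    # k items per batch; the first m batches get one extra item
    k, m = divmod(len(items), n_batches)
    return [
        items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
        for i in range(n_batches)
    ]


# Same target order as the input of rule all in the Snakefile above
NUM = list(range(10))
targets = (
    [f"folder/file{n}.txt" for n in NUM]
    + [f"folder/tile{n}.txt" for n in NUM]
    + ["folder/a.txt"]
)

sorted_batches = split_batches(targets, 5, sort=True)
ordered_batches = split_batches(targets, 5, sort=False)

print(sorted_batches[0])    # contains folder/a.txt
print(ordered_batches[-1])  # contains folder/a.txt
```

With sorting, folder/a.txt comes before every folder/file*.txt and folder/tile*.txt entry and so falls into batch 1. With the original order preserved, it stays in the last batch.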
Describe the solution you'd like
I would like batching to preserve the order in which I list the files in the input of rule all.
Describe alternatives you've considered
Comment out or remove the sorting line:
https://github.com/snakemake/snakemake/blob/9598655f9ba99380b13b8a5797110352d41d2831/src/snakemake/settings/types.py#L158
By commenting this out in my local installation of snakemake, I got the behaviour I wanted.