Skip to content

No output vcf obtained from custom model #869

Closed
@bioinformaticaomicalabs

Description

ISSUE
First of all, I found DeepVariant to be a very good and innovative tool. I'm considering including it in my exome analysis pipeline. I followed the tutorial (DeepVariant worked correctly with the Complete Genomics model), and I created my own model using Genome in a Bottle samples. To do this, I sequenced the same reference sample three times to use each BAM file for training, validation, and testing. I didn't encounter any errors during the model creation process, but when I tried to test it, the process got stuck at the call_variants step.

Setup

  • Operating system: Ubuntu 22.04.4 LTS
  • DeepVariant version:1.6.1
  • Installation method:docker
  • Type of data: MGI DNBSEQ 400, exome sequencing.

Steps to reproduce:

  • Command:
    Create examples for trainning set
    sudo docker run -v "${PWD}/input":"/input" -v "${PWD}/REF":"/ref" -v "${PWD}"/output:"/output" google/deepvariant:"1.6.1" make_examples --mode training --ref "/ref/GRCh38.p14.genome.fa" --reads "/input/26_r_groups.bam" --examples "/output/training_set.gz" --truth_variants "/ref/HG001_GRCh38_1_22_v4.2.1_benchmark_ROCHE.vcf.gz" --confident_regions "/ref/HG001_GRCh38_1_22_v4.2.1_benchmark_ROCHE.bed"
    Create examples for validation set
    sudo docker run -v "${PWD}/input":"/input" -v "${PWD}/REF":"/ref" -v "${PWD}"/output:"/output" google/deepvariant:"1.6.1" make_examples --mode training --ref "/ref/GRCh38.p14.genome.fa" --reads "/input/27_r_groups.bam" --examples "/output/validation_set.gz" --truth_variants "/ref/HG001_GRCh38_1_22_v4.2.1_benchmark_ROCHE.vcf.gz" --confident_regions "/ref/HG001_GRCh38_1_22_v4.2.1_benchmark_ROCHE.bed"
    Trainning Shuffling
    python3 scripts/shuffle_tfrecords_beam.py --input_pattern_list=output/training_set.gz --output_pattern_prefix="output/training_shuffled" --output_dataset_name="26" --output_dataset_config_pbtxt="output/training.pbtxt" --job_name=shuffle-tfrecords
    Validation Shuffling
    python3 scripts/shuffle_tfrecords_beam.py --input_pattern_list=output/validation_set.gz --output_pattern_prefix="output/validation_shuffled" --output_dataset_name="27" --output_dataset_config_pbtxt="output/validation.pbtxt" --job_name=shuffle-tfrecords
    Model trainning
    sudo docker run -v "${PWD}/input":"/input" -v "${PWD}/REF":"/ref" -v "${PWD}"/output:"/output" google/deepvariant:"1.6.1" train --config=/input/dv_config.py:base --config.train_dataset_pbtxt="/output/training.pbtxt" --config.tune_dataset_pbtxt="/output/validation.pbtxt" --config.num_epochs=10 --config.learning_rate=0.0001 --config.num_validation_examples=0 --strategy=mirrored --experiment_dir="/output/" --config.batch_size=512
    Model test
    sudo docker run -v "${PWD}/input":"/input" -v "${PWD}/REF":"/ref" -v "${PWD}"/output:"/output" google/deepvariant:"1.6.1" /opt/deepvariant/bin/run_deepvariant --model_type WES --customized_model "/output/checkpoints/ckpt-679" --ref "/ref/GRCh38.p14.genome.fa" --reads "/input/33_r_groups.bam" --output_vcf "/output/33.vcf.gz" --output_gvcf "/output/33.g.vcf.gz" -intermediate_results_dir "/output/intermediate_results_dir" --num_shards 10

  • Error trace:
    ` - I0822 07:51:54.272576 127450974123840 make_examples_core.py:301] Task 17/20: Writing example info to /output/intermediate_results_dir/make_examples.tfrecord-00017-of-00020.gz.example_info.json
    I0822 07:51:54.272647 127450974123840 make_examples_core.py:2958] example_shape = [100, 221, 7]
    I0822 07:51:54.272740 127450974123840 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 19]
    I0822 07:51:54.272911 127450974123840 make_examples_core.py:301] Task 17/20: Found 17451 candidate variants
    I0822 07:51:54.272932 127450974123840 make_examples_core.py:301] Task 17/20: Created 18817 examples
    I0822 07:52:09.283522 133276175411008 make_examples_core.py:301] Task 8/20: Writing example info to /output/intermediate_results_dir/make_examples.tfrecord-00008-of-00020.gz.example_info.json
    I0822 07:52:09.283617 133276175411008 make_examples_core.py:2958] example_shape = [100, 221, 7]
    I0822 07:52:09.283712 133276175411008 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 19]
    I0822 07:52:09.283882 133276175411008 make_examples_core.py:301] Task 8/20: Found 17371 candidate variants
    I0822 07:52:09.283904 133276175411008 make_examples_core.py:301] Task 8/20: Created 18820 examples

real 34m15.728s
user 624m43.553s
sys 2m24.932s

***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" --examples "/output/intermediate_results_dir/[email protected]" --checkpoint "/output/checkpoints/ckpt-679"

/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).

For more information see: tensorflow/addons#2807

warnings.warn(
I0822 07:52:10.812179 127086447671104 call_variants.py:563] Total 1 writing processes started.
I0822 07:52:10.813103 127086447671104 dv_utils.py:370] From /output/intermediate_results_dir/make_examples.tfrecord-00000-of-00020.gz.example_info.json: Shape of input examples: [100, 221, 7], Channels of input examples: [1, 2, 3, 4, 5, 6, 19].
I0822 07:52:10.813141 127086447671104 call_variants.py:588] Shape of input examples: [100, 221, 7]
I0822 07:52:10.813338 127086447671104 call_variants.py:592] Use saved model: True
Traceback (most recent call last):
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 789, in
app.run(main)
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/absl_py/absl/app.py", line 312, in run
_run_main(main, args)
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/absl_py/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 768, in main
call_variants(
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 598, in call_variants
model_example_shape = dv_utils.get_shape_and_channels_from_json(
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/com_google_deepvariant/deepvariant/dv_utils.py", line 367, in get_shape_and_channels_from_json
example_info = json.load(f)
File "/usr/lib/python3.8/json/init.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.8/json/init.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Process ForkProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/tmp/Bazel.runfiles_5v5s5_vp/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 454, in post_processing
item = output_queue.get(timeout=180)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty

real 3m2.335s
user 0m7.450s
sys 0m4.274s`

Does the quick start test work on your system?
Yes

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions