Training & Testing Details
Junyong Lee edited this page Feb 4, 2022
# multi GPU (with DistributedDataParallel) example
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
--is_train \
--mode IFAN \
--config config_IFAN \
--trainer trainer \
--network IFAN \
-b 2 \
-th 8 \
-dl \
-ss \
-dist
# resuming example (the trainer loads the checkpoint saved after epoch 100; training resumes from epoch 101)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
... \
-th 8 \
-r 100 \
-ss \
-dist
# single GPU (with DataParallel) example
CUDA_VISIBLE_DEVICES=0 python -B run.py \
... \
-ss

Note:
- The image loss (MSE) is always applied.
- If `IFAN` is included in `[mode]`, both the disparity and reblurring losses are triggered.
- To apply the disparity and reblurring losses separately, do not include `IFAN` in `[mode]`; instead, trigger each loss by including `D` (for the disparity loss) or `R` (for the reblurring loss) in `[mode]` (e.g., `--mode my_net_D`).
- To train a network that takes dual-pixel stereo images as input, include `dual` in `[mode]` and specify `IFAN_dual` for `[network]`.
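The loss rules in the notes above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function name `active_losses` is hypothetical, and it assumes the mode name is split on underscores and matched token by token (the project may match substrings instead).

```python
def active_losses(mode: str) -> set:
    """Return the losses implied by a mode name, following the notes above.

    Assumption: the mode name is split on "_" and matched token by token.
    """
    tokens = mode.split("_")
    losses = {"image"}  # the MSE image loss is always applied
    if "IFAN" in tokens:
        # IFAN in the mode name triggers both auxiliary losses
        losses |= {"disparity", "reblurring"}
    if "D" in tokens:
        losses.add("disparity")
    if "R" in tokens:
        losses.add("reblurring")
    return losses


print(sorted(active_losses("IFAN")))
print(sorted(active_losses("my_net_D")))
```

Under this assumption, `my_net_D` enables the image and disparity losses only, while any mode name containing the `IFAN` token enables all three.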
- Options
  - `--is_train`: If specified, `run.py` trains the network. Default: `False`
  - `--mode`: The name of the model to train. A logging folder named after `[mode]` is created at `[LOG_ROOT]/IFAN_CVPR2021/[mode]/`. Default: `IFAN`
  - `--config`: The name of a config file located at `./configs/[config].py`. Default: `None`, and the default should not be changed.
  - `--trainer`: The name of a trainer file located at `./models/trainers/[trainer].py`. Default: `trainer`
  - `--network`: The name of a network file located at `./models/archs/[network].py`. Default: `IFAN`
  - `-b`, `--batch_size`: The batch size. With multiple GPUs (`DistributedDataParallel`), the total batch size is `nproc_per_node * b`. Default: 8
  - `-th`, `--thread_num`: The number of threads (`num_workers`) for the data loader. Default: 8
  - `-dl`, `--delete_log`: Whether to delete the logs under `[mode]` (i.e., `[LOG_ROOT]/IFAN_CVPR2021/[mode]/*`). This option takes effect only when `--is_train` is specified. Default: `False`
  - `-r`, `--resume`: Resume training from the checkpoint of the specified epoch (e.g., `-r 100`). Note that `-dl` should not be specified together with this option.
  - `-ss`, `--save_sample`: Save sample images for both training and testing. Images are saved in `[LOG_ROOT]/IFAN_CVPR2021/[mode]/sample/`. Default: `False`
  - `-dist`: Enables multi-processing with `DistributedDataParallel`. Default: `False`
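To make the option semantics concrete, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is not the parser in `run.py`: the function names `build_parser`, `validate`, `total_batch_size`, and `resume_start_epoch` are hypothetical, the `None` default for `-r` is an assumption, and treating the single-process batch size as `b` is also an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser()
    p.add_argument("--is_train", action="store_true")
    p.add_argument("--mode", default="IFAN")
    p.add_argument("--config", default=None)
    p.add_argument("--trainer", default="trainer")
    p.add_argument("--network", default="IFAN")
    p.add_argument("-b", "--batch_size", type=int, default=8)
    p.add_argument("-th", "--thread_num", type=int, default=8)
    p.add_argument("-dl", "--delete_log", action="store_true")
    p.add_argument("-r", "--resume", type=int, default=None)  # default assumed
    p.add_argument("-ss", "--save_sample", action="store_true")
    p.add_argument("-dist", action="store_true")
    return p


def validate(args: argparse.Namespace) -> None:
    """Reject the combination the option list warns against."""
    if args.resume is not None and args.delete_log:
        raise ValueError("-dl should not be specified together with -r")


def total_batch_size(args: argparse.Namespace, nproc_per_node: int = 1) -> int:
    """Total batch size: nproc_per_node * b with -dist, else b (assumed)."""
    return nproc_per_node * args.batch_size if args.dist else args.batch_size


def resume_start_epoch(resume: int) -> int:
    """-r 100 loads the epoch-100 checkpoint and continues from epoch 101."""
    return resume + 1


# The multi-GPU example from this page: 4 processes, -b 2 per process.
args = build_parser().parse_args(
    "--is_train --mode IFAN --config config_IFAN --trainer trainer "
    "--network IFAN -b 2 -th 8 -dl -ss -dist".split()
)
validate(args)
print(total_batch_size(args, nproc_per_node=4))  # 4 * 2 = 8
```

With `-b 2` and `--nproc_per_node=4`, the total batch size equals the documented single-process default of 8.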
CUDA_VISIBLE_DEVICES=0 python run.py --mode [mode] --data [DATASET]
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py --mode IFAN --data DPDD

Note:
- Specify only `[mode]` of the trained model. `[config]` doesn't have to be specified, as it will be automatically loaded.
- Testing results will be saved in `[LOG_ROOT]/IFAN_CVPR2021/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/`.
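As a quick illustration of where results land, the path template above can be filled in with `pathlib`. This is a sketch only: `result_dir` is a hypothetical helper, `/logs` stands in for `[LOG_ROOT]`, and the epoch is passed as a string because this page does not say how `[epoch]` is formatted in the folder name.

```python
from pathlib import PurePosixPath

PROJECT = "IFAN_CVPR2021"  # project folder name used throughout this page


def result_dir(log_root: str, mode: str, epoch: str, data: str) -> PurePosixPath:
    """Fill in [LOG_ROOT]/IFAN_CVPR2021/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/."""
    return (
        PurePosixPath(log_root)
        / PROJECT
        / mode
        / "result"
        / "quanti_quali"
        / f"{mode}_{epoch}"  # epoch formatting is left to the caller
        / data
    )


# "/logs" is a placeholder for [LOG_ROOT]; "100" is an illustrative epoch string.
print(result_dir("/logs", "IFAN", "100", "DPDD"))
```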
- Options
  - `--mode`: The name of the model to test.
  - `--data`: The name of the dataset for evaluation: `DPDD|RealDOF|CUHK|PixelDP|random`. Default: `DPDD`
    - The data structure can be modified by the function `set_eval_path(..)` in `./configs/config.py`.
    - `random` is for testing models with any images, which should be placed as `[DATASET_ROOT]/random/*.[jpg|png]`.
  - `-ckpt_name`: Loads the checkpoint with the given name under `[LOG_ROOT]/IFAN_CVPR2021/[mode]/checkpoint/train/epoch/ckpt/` (e.g., `python run.py --mode IFAN --data DPDD --ckpt_name IFAN_00100.pytorch`).
  - `-ckpt_abs_name`: Loads the checkpoint at the given absolute path (e.g., `python run.py --mode IFAN --data DPDD --ckpt_abs_name ./ckpt/IFAN.pytorch`).
  - `-ckpt_epoch`: Loads the checkpoint of the specified epoch (e.g., `python run.py --mode IFAN --data DPDD --ckpt_epoch 100`).
  - `-ckpt_sc`: Loads the checkpoint with the best validation score (e.g., `python run.py --mode IFAN --data DPDD --ckpt_sc`).
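The checkpoint options above each point at a file, which the following sketch resolves to a path. It is an illustration under stated assumptions, not the project's loader: `resolve_checkpoint` is a hypothetical name, the precedence among the options is assumed, and the `[mode]_[5-digit epoch].pytorch` file name pattern is inferred only from the single example `IFAN_00100.pytorch`. `-ckpt_sc` is left out because it depends on recorded validation scores.

```python
from pathlib import PurePosixPath


def resolve_checkpoint(log_root, mode, ckpt_name=None, ckpt_abs_name=None, ckpt_epoch=None):
    """Map the documented checkpoint options to a checkpoint path (sketch)."""
    ckpt_dir = (
        PurePosixPath(log_root) / "IFAN_CVPR2021" / mode
        / "checkpoint" / "train" / "epoch" / "ckpt"
    )
    if ckpt_abs_name is not None:
        # -ckpt_abs_name: the path is used as given
        return PurePosixPath(ckpt_abs_name)
    if ckpt_name is not None:
        # -ckpt_name: a file name inside the mode's checkpoint folder
        return ckpt_dir / ckpt_name
    if ckpt_epoch is not None:
        # -ckpt_epoch: file name pattern assumed from "IFAN_00100.pytorch"
        return ckpt_dir / f"{mode}_{ckpt_epoch:05d}.pytorch"
    raise ValueError("no checkpoint option given")


# "/logs" is a placeholder for [LOG_ROOT].
print(resolve_checkpoint("/logs", "IFAN", ckpt_epoch=100))
```

If the assumed naming pattern holds, `-ckpt_epoch 100` and `-ckpt_name IFAN_00100.pytorch` resolve to the same file for the `IFAN` mode.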