@@ -2,7 +2,7 @@
 Hyperparameter tuning with Ray Tune
 ===================================
 
-**Author:** `Ricardo Decal <https://github.com/crypdick>`_
+**Author:** `Ricardo Decal <https://github.com/crypdick>`__
 
 This tutorial shows how to integrate Ray Tune into your PyTorch training
 workflow to perform scalable and efficient hyperparameter tuning.
@@ -57,7 +57,7 @@
 
 ######################################################################
 # How to use PyTorch data loaders with Ray Tune
-# ---------------------------------------------
+# =============================================
 #
 # Wrap the data loaders in a constructor function. Pass a global data
 # directory here to reuse the dataset across different trials.
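The body of ``load_data`` is elided by this hunk. A minimal sketch of such a constructor, assuming the CIFAR-10 dataset that the rest of the tutorial trains on, could look like this:

import torchvision
import torchvision.transforms as transforms


def load_data(data_dir="./data"):
    # Normalize CIFAR-10 images to the [-1, 1] range.
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    # A shared data_dir lets concurrent trials reuse the downloaded dataset.
    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform
    )
    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform
    )
    return trainset, testset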
@@ -80,10 +80,11 @@ def load_data(data_dir="./data"):
 
 ######################################################################
 # Configure the hyperparameters
-# -----------------------------
+# =============================
 #
-# In this example, we specify the layer sizes of the fully connected
-# layers.
+# In this tutorial, we will tune the sizes of the fully connected layers
+# and the learning rate. In order to do so, we need to expose them as
+# configurable parameters.
 
 class Net(nn.Module):
     def __init__(self, l1=120, l2=84):
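As an illustration (with hypothetical values), one sampled configuration flows into the model like this; Ray Tune later draws such dictionaries from the search space defined further below:

# One hypothetical hyperparameter configuration, as Ray Tune would pass it
# to the training function.
config = {"l1": 128, "l2": 64, "lr": 1e-3, "batch_size": 8}

# The layer sizes parameterize the model; the learning rate and batch size
# are consumed later inside the training function.
net = Net(config["l1"], config["l2"])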
@@ -106,7 +107,7 @@ def forward(self, x):
 
 ######################################################################
 # Use a train function with Ray Tune
-# ----------------------------------
+# ==================================
 #
 # Now it gets interesting, because we introduce some changes to the
 # example `from the PyTorch
@@ -144,13 +145,13 @@ def forward(self, x):
 #
 # optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
 #
-# We also split the dataset into training and validation subsets.
-# We thus train on 80% of the data and calculate the validation loss on
-# the remaining 20%. The batch sizes with which we iterate through the
+# We also split the dataset into training and validation subsets. We thus
+# train on 80% of the data and calculate the validation loss on the
+# remaining 20%. The batch sizes with which we iterate through the
 # training and test sets are configurable by Ray Tune.
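The splitting code itself sits outside this hunk. A sketch of how it might look, assuming ``trainset`` comes from ``load_data`` and ``config`` is the hyperparameter dictionary Ray Tune passes to the training function:

from torch.utils.data import DataLoader, random_split

# Hold out 20% of the training data for validation.
n_train = int(len(trainset) * 0.8)
train_subset, val_subset = random_split(
    trainset, [n_train, len(trainset) - n_train]
)

# The batch size is one of the tuned hyperparameters.
trainloader = DataLoader(
    train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=2
)
valloader = DataLoader(
    val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=2
)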
 #
 # Add multi-GPU support with DataParallel
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# ---------------------------------------
 #
 # Image classification benefits largely from GPUs. Luckily, you can
 # continue to use PyTorch tools in Ray Tune. Thus, you can wrap the model
@@ -182,7 +183,7 @@ def forward(self, x):
 # the GPU memory. We will return to that later.
 #
 # Communicating with Ray Tune
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# ---------------------------
 #
 # The most interesting part is the communication with Ray Tune. As you’ll
 # see, integrating Ray Tune into your training code requires only a few
@@ -226,7 +227,7 @@ def forward(self, x):
 # remains standard PyTorch.
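The reporting call is not visible in this hunk, and its exact form depends on the Ray version. Assuming a recent release in which ``tune.report`` accepts a plain metrics dictionary, the core of the communication is a single call at the end of each validation pass (``val_loss``, ``val_steps``, ``correct``, and ``total`` come from the surrounding training function; the full tutorial additionally attaches a checkpoint):

from ray import tune

# Hand the epoch's validation metrics back to Ray Tune; the scheduler uses
# them to rank trials and stop underperforming ones early.
tune.report({"loss": val_loss / val_steps, "accuracy": correct / total})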
 #
 # Full training function
-# ~~~~~~~~~~~~~~~~~~~~~~
+# ----------------------
 #
 # The full code example looks like this:
 
@@ -336,7 +337,7 @@ def train_cifar(config, data_dir=None):
 # example.
 #
 # Compute test set accuracy
-# -------------------------
+# =========================
 #
 # Commonly the performance of a machine learning model is tested on a
 # held-out test set with data that has not been used for training the
@@ -367,7 +368,7 @@ def test_accuracy(net, device="cpu"):
 # set validation on a GPU.
 #
 # Configure the search space
-# --------------------------
+# ==========================
 #
 # Lastly, we need to define Ray Tune’s search space. Ray Tune offers a
 # variety of `search space
@@ -395,7 +396,7 @@ def test_accuracy(net, device="cpu"):
 # the search space is explored efficiently across different magnitudes.
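The search space definition itself is elided from this diff. In the tutorial it is a plain dictionary of Ray Tune sampling primitives; a sketch with illustrative ranges:

from ray import tune

config = {
    # Layer sizes are sampled from powers of two.
    "l1": tune.choice([2**i for i in range(9)]),
    "l2": tune.choice([2**i for i in range(9)]),
    # Log-uniform sampling spreads trials evenly across orders of magnitude.
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16]),
}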
 #
 # Smarter sampling and scheduling
-# -------------------------------
+# ===============================
 #
 # To make the hyperparameter search process efficient, Ray Tune provides
 # two main controls:
@@ -406,7 +407,7 @@ def test_accuracy(net, device="cpu"):
 # such as
 # `Optuna <https://docs.ray.io/en/latest/tune/api/suggestion.html#optuna>`__
 # or
-# ```bayesopt`` <https://docs.ray.io/en/latest/tune/api/suggestion.html#bayesopt>`__,
+# `BayesOpt <https://docs.ray.io/en/latest/tune/api/suggestion.html#bayesopt>`__,
 # instead of relying only on random or grid search.
 # 2. It can detect underperforming trials and stop them early using
 # `schedulers <https://docs.ray.io/en/latest/tune/key-concepts.html#tune-schedulers>`__,
@@ -417,7 +418,7 @@ def test_accuracy(net, device="cpu"):
 # terminates low-performing trials to save computational resources.
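For the second control, a widely used scheduler is ASHA; a minimal setup (with an assumed budget of 10 epochs) might look like the sketch below, and the resulting object is later passed to the tuner together with the metric to optimize:

from ray.tune.schedulers import ASHAScheduler

# Successive-halving style early stopping: every trial runs for at least
# grace_period epochs, and at each rung only the better half of trials
# survives, up to max_t epochs in total.
scheduler = ASHAScheduler(max_t=10, grace_period=1, reduction_factor=2)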
 #
 # Configure the resources
-# -----------------------
+# =======================
 #
 # Tell Ray Tune what resources should be available for each trial using
 # ``tune.with_resources``:
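The corresponding call sits outside this hunk. Assuming the ``train_cifar`` training function defined earlier and a ``gpus_per_trial`` value as used in the rest of the tutorial, it looks roughly like:

from ray import tune

# Reserve 2 CPUs and a (possibly fractional) share of a GPU for each trial;
# tune.with_parameters binds the shared data directory to the trainable.
trainable_with_resources = tune.with_resources(
    tune.with_parameters(train_cifar, data_dir=data_dir),
    resources={"cpu": 2, "gpu": gpus_per_trial},
)

The wrapped trainable is then handed to ``tune.Tuner``, as described in the next section.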
@@ -436,11 +437,11 @@ def test_accuracy(net, device="cpu"):
 #
 # For example, if you are running this experiment on a cluster of 20
 # machines, each with 8 GPUs, you can set ``gpus_per_trial = 0.5`` to
-# schedule 2 concurrent trials per GPU. This configuration runs 320 trials
-# in parallel across the cluster.
+# schedule two concurrent trials per GPU. This configuration runs 320
+# trials in parallel across the cluster.
 #
 # Putting it together
-# -------------------
+# ===================
 #
 # The Ray Tune API is designed to be modular and composable: you pass your
 # configurations to the ``tune.Tuner`` class to create a tuner object,
@@ -560,21 +561,20 @@ def main(num_trials=10, max_num_epochs=10, gpus_per_trial=2):
 # You can now tune the parameters of your PyTorch models.
 #
 # Observability
-# -------------
+# =============
 #
 # When running large-scale experiments, monitoring is crucial. Ray
 # provides a
 # `Dashboard <https://docs.ray.io/en/latest/ray-observability/getting-started.html>`__
 # that lets you view the status of your trials, check cluster resource
 # utilization, and inspect logs in real-time.
 #
-# For debugging, Ray also offers `Distributed
-# Debugging <https://docs.ray.io/en/latest/ray-observability/index.html>`__
-# tools that let you attach a debugger to running trials across the
-# cluster.
+# For debugging, Ray also offers `distributed debugging
+# tools <https://docs.ray.io/en/latest/ray-observability/index.html>`__
+# that let you attach a debugger to running trials across the cluster.
 #
 # Conclusion
-# ----------
+# ==========
 #
 # In this tutorial, you learned how to tune the hyperparameters of a
 # PyTorch model using Ray Tune. You saw how to integrate Ray Tune into
@@ -588,7 +588,7 @@ def main(num_trials=10, max_num_epochs=10, gpus_per_trial=2):
 # efficiently.
 #
 # Further reading
-# ---------------
+# ===============
 #
 # - `Ray Tune
 #   documentation <https://docs.ray.io/en/latest/tune/index.html>`__