README.md: 2 additions & 2 deletions

@@ -324,7 +324,7 @@ torchtune focuses on integrating with popular tools and libraries from the ecosystem
 - [EleutherAI's LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) for [evaluating](recipes/eleuther_eval.py) trained models
 - [Hugging Face Datasets](https://huggingface.co/docs/datasets/en/index) for [access](torchtune/datasets/_instruct.py) to training and evaluation datasets
 - [PyTorch FSDP2](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md) for distributed training
-- [torchao](https://github.com/pytorch-labs/ao) for lower precision dtypes and [post-training quantization](recipes/quantize.py) techniques
+- [torchao](https://github.com/pytorch/ao) for lower precision dtypes and [post-training quantization](recipes/quantize.py) techniques
 - [Weights & Biases](https://wandb.ai/site) for [logging](https://pytorch.org/torchtune/main/deep_dives/wandb_logging.html) metrics and checkpoints, and tracking training progress
 - [Comet](https://www.comet.com/site/) as another option for [logging](https://pytorch.org/torchtune/main/deep_dives/comet_logging.html)
 - [ExecuTorch](https://pytorch.org/executorch-overview) for [on-device inference](https://github.com/pytorch/executorch/tree/main/examples/models/llama2#optional-finetuning) using finetuned models

@@ -351,7 +351,7 @@ We really value our community and the contributions made by our wonderful users.
 The transformer code in this repository is inspired by the original [Llama2 code](https://github.com/meta-llama/llama/blob/main/llama/model.py). We also want to give a huge shout-out to EleutherAI, Hugging Face and
 Weights & Biases for being wonderful collaborators and for working with us on some of these integrations within torchtune. In addition, we want to acknowledge some other awesome libraries and tools from the ecosystem:
 
-- [gpt-fast](https://github.com/pytorch-labs/gpt-fast) for performant LLM inference techniques which we've adopted out-of-the-box
+- [gpt-fast](https://github.com/meta-pytorch/gpt-fast) for performant LLM inference techniques which we've adopted out-of-the-box
 - [llama recipes](https://github.com/meta-llama/llama-recipes) for spring-boarding the llama2 community
 - [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for bringing several memory and performance based techniques to the PyTorch ecosystem
 - [@winglian](https://github.com/winglian/) and [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for early feedback and brainstorming on torchtune's design and feature set.

docs/source/tutorials/e2e_flow.rst: 1 addition & 1 deletion

@@ -341,7 +341,7 @@ these parameters.
 Introduce some quantization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We rely on `torchao <https://github.com/pytorch-labs/ao>`_ for `post-training quantization <https://github.com/pytorch/ao/tree/main/torchao/quantization#quantization>`_.
+We rely on `torchao <https://github.com/pytorch/ao>`_ for `post-training quantization <https://github.com/pytorch/ao/tree/main/torchao/quantization#quantization>`_.
 To quantize the fine-tuned model after installing torchao we can run the following command::
 
    # we also support `int8_weight_only()` and `int8_dynamic_activation_int8_weight()`, see
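
For context on the API this hunk points at, here is a minimal, hedged sketch of post-training quantization with torchao applied directly in Python. The toy module and shapes are stand-ins (not the torchtune recipe or its config); `quantize_` and `int8_weight_only` are the torchao entry points the snippet above refers to.

```python
# Hedged sketch: int8 weight-only post-training quantization with torchao,
# applied to a toy module rather than a finetuned torchtune checkpoint.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(
    nn.Linear(4096, 11008),
    nn.Linear(11008, 4096),
).to(torch.bfloat16)

# Replace each Linear's weight with an int8 weight-only quantized version, in place.
quantize_(model, int8_weight_only())

with torch.no_grad():
    out = model(torch.randn(2, 4096, dtype=torch.bfloat16))
print(out.shape)  # torch.Size([2, 4096])
```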

docs/source/tutorials/llama3.rst: 1 addition & 1 deletion

@@ -241,7 +241,7 @@ Running generation with our LoRA-finetuned model, we see the following output:
 Faster generation via quantization
 ----------------------------------
 
-We rely on `torchao <https://github.com/pytorch-labs/ao>`_ for `post-training quantization <https://github.com/pytorch/ao/tree/main/torchao/quantization#quantization>`_.
+We rely on `torchao <https://github.com/pytorch/ao>`_ for `post-training quantization <https://github.com/pytorch/ao/tree/main/torchao/quantization#quantization>`_.
 To quantize the fine-tuned model after installing torchao we can run the following command::
 
    # we also support `int8_weight_only()` and `int8_dynamic_activation_int8_weight()`, see
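
The second technique named in that comment goes through the same entry point; a minimal sketch, again on a stand-in layer rather than the finetuned Llama3 checkpoint:

```python
# Hedged sketch: dynamic int8 activation + int8 weight quantization via torchao,
# applied to a stand-in linear layer (not the finetuned Llama3 model).
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

layer = nn.Linear(4096, 4096).to(torch.bfloat16)
quantize_(layer, int8_dynamic_activation_int8_weight())

with torch.no_grad():
    print(layer(torch.randn(1, 4096, dtype=torch.bfloat16)).shape)
```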

docs/source/tutorials/qlora_finetune.rst: 3 additions & 3 deletions

@@ -42,7 +42,7 @@ accuracy.
 
 The QLoRA authors introduce two key abstractions to decrease memory usage and avoid accuracy degradation: the bespoke 4-bit NormalFloat
 type, and a double quantization method that quantizes the quantization parameters themselves to save even more memory. torchtune uses
-the `NF4Tensor <https://github.com/pytorch-labs/ao/blob/b9beaf351e27133d189b57d6fa725b1a7824a457/torchao/dtypes/nf4tensor.py#L153>`_ abstraction from the `torchao library <https://github.com/pytorch-labs/ao>`_ to build QLoRA components as specified in the paper.
+the `NF4Tensor <https://github.com/pytorch/ao/blob/b9beaf351e27133d189b57d6fa725b1a7824a457/torchao/dtypes/nf4tensor.py#L153>`_ abstraction from the `torchao library <https://github.com/pytorch/ao>`_ to build QLoRA components as specified in the paper.
 torchao is a PyTorch-native library that allows you to quantize and prune your models.
 
 
@@ -275,7 +275,7 @@ As mentioned above, torchtune takes a dependency on torchao for some of the core
 
 The key changes on top of the LoRA layer are the usage of the ``to_nf4`` and ``linear_nf4`` APIs.
 
-``to_nf4`` accepts an unquantized (bf16 or fp32) tensor and produces an ``NF4`` representation of the weight. See the `implementation <https://github.com/pytorch-labs/ao/blob/c40358072f99b50cd7e58ec11e0e8d90440e3e25/torchao/dtypes/nf4tensor.py#L587>`_ of ``to_nf4`` for more details.
+``to_nf4`` accepts an unquantized (bf16 or fp32) tensor and produces an ``NF4`` representation of the weight. See the `implementation <https://github.com/pytorch/ao/blob/c40358072f99b50cd7e58ec11e0e8d90440e3e25/torchao/dtypes/nf4tensor.py#L587>`_ of ``to_nf4`` for more details.
 ``linear_nf4`` handles the forward pass and autograd when running with quantized base model weights. It computes the forward pass as a regular
 ``F.linear`` with the incoming activation and unquantized weight. The quantized weight is saved for backward, as opposed to the unquantized version of the weight, to avoid extra
-memory usage due to storing higher precision variables to compute gradients in the backward pass. See `linear_nf4 <https://github.com/pytorch-labs/ao/blob/main/torchao/dtypes/nf4tensor.py#L577>`_ for more details.
+memory usage due to storing higher precision variables to compute gradients in the backward pass. See `linear_nf4 <https://github.com/pytorch/ao/blob/main/torchao/dtypes/nf4tensor.py#L577>`_ for more details.
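
To make the ``to_nf4``/``linear_nf4`` description in this hunk concrete, here is a small self-contained sketch; the shapes and dtypes are arbitrary illustrations, not torchtune defaults.

```python
# Hedged sketch of the NF4 round trip described above; shapes are arbitrary.
import torch
from torchao.dtypes.nf4tensor import to_nf4, linear_nf4

weight = torch.randn(512, 1024, dtype=torch.bfloat16)  # unquantized (bf16) base weight
nf4_weight = to_nf4(weight)                            # 4-bit NormalFloat representation

x = torch.randn(2, 1024, dtype=torch.bfloat16, requires_grad=True)
out = linear_nf4(x, nf4_weight)  # forward pass as a regular F.linear
out.sum().backward()             # the quantized (NF4) weight is saved for backward
print(out.shape, x.grad.shape)
```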