
Conversation

@hiyouga hiyouga (Owner) commented Aug 5, 2025

Apply LoRA fine-tuning to the GPT-OSS model in 3 steps

1. Install LLaMA-Factory and transformers

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
pip install "transformers==4.55.0"

2. Train GPT-OSS on a single GPU (> 44 GB of VRAM); multi-GPU training is also supported

llamafactory-cli train examples/train_lora/gpt_lora_sft.yaml

3. Merge the LoRA weights into the base model

llamafactory-cli export --model_name_or_path openai/gpt-oss-20b --adapter_name_or_path saves/gpt-20b/lora/sft --export_dir gpt_merged
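If you prefer to do the merge in Python rather than via the CLI, here is a rough sketch using transformers and PEFT. The paths mirror the command above; loading the base model dequantized to BF16 is an assumption of this sketch, not necessarily what the export command does internally.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (dequantized to BF16), attach the LoRA adapter, merge, and save.
base = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "saves/gpt-20b/lora/sft")
merged = model.merge_and_unload()
merged.save_pretrained("gpt_merged")

# Save the tokenizer alongside the merged weights so the exported folder is self-contained.
AutoTokenizer.from_pretrained("openai/gpt-oss-20b").save_pretrained("gpt_merged")
```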

(Optional) Chat with the fine-tuned model

llamafactory-cli chat --model_name_or_path gpt_merged --template gpt --skip_special_tokens False
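Alternatively, a minimal transformers sketch for chatting with the merged checkpoint, assuming the tokenizer saved in gpt_merged carries the gpt-oss chat template:

```python
from transformers import pipeline

# The text-generation pipeline applies the tokenizer's chat template to the messages.
pipe = pipeline("text-generation", model="gpt_merged", torch_dtype="auto", device_map="auto")
messages = [{"role": "user", "content": "Briefly explain what LoRA fine-tuning does."}]
result = pipe(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the last message is the assistant reply
```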

Full fine-tuning recipes

See #8837


Use the Web UI to fine-tune the model:

@hiyouga hiyouga added the "solved" label Aug 5, 2025
@hiyouga hiyouga merged commit 706b3e5 into main Aug 5, 2025
16 checks passed
@hiyouga hiyouga deleted the hiyouga/gpt branch August 5, 2025 21:56
@yuimo yuimo commented Aug 6, 2025

What is the weight precision after fine-tuning? Is it still MXFP4 for the MoE layers?

@ziheng0924

When will full-parameter fine-tuning be supported?

@hiyouga hiyouga (Owner, Author) commented Aug 6, 2025

@ziheng0924 Full fine-tuning is supported.

@hiyouga hiyouga (Owner, Author) commented Aug 6, 2025

@yuimo The LoRA weights will be in FP32 format.

@PROoshio PROoshio commented Aug 6, 2025

When will vLLM inference be supported?

@BenjaminBossan

Hi, I just saw this PR to support gpt-oss. IIUC, with this recipe, only the nn.Linear layers are being targeted. To target the MoE layers, you'd have to use a new PEFT feature, namely LoraConfig(target_parameters=[...]). The OpenAI cookbook has an example of that.

Generally, targeting just the nn.Linear layers may be fine, but the majority of parameters reside in the MoE layers (90% for the 20b model), so users may want to target those too. On the other hand, this will be much more expensive memory-wise, so there is a trade-off here. If you have questions about the new PEFT feature, LMK.
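For reference, a minimal sketch of such a config, assuming a PEFT release that supports target_parameters; the layer indices and parameter names below follow the OpenAI cookbook example and are illustrative, not taken from this PR:

```python
from peft import LoraConfig

# Target the regular nn.Linear layers via target_modules, and additionally the MoE
# expert weights (plain nn.Parameter tensors) via the new target_parameters option.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
    ],
)
```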

@hiyouga hiyouga (Owner, Author) commented Aug 6, 2025

@BenjaminBossan Sure, I agree with you. I'm excited to explore these new PEFT features. Thank you for pointing it out.

@RalphMao RalphMao commented Aug 6, 2025

> When will vLLM inference be supported?

vLLM doesn't support the BF16 checkpoint yet:

vllm-project/vllm#22380

@Pikachu1412

[screenshot] CPU memory grows abnormally when loading the gpt-oss-120b model, leading to an OOM.

@KosmoCHE KosmoCHE commented Aug 7, 2025

> What is the weight precision after fine-tuning? Is it still MXFP4 for the MoE layers?

It will be BF16 if Triton < 3.4.0.

@Opdoop Opdoop commented Aug 7, 2025

After fine-tuning the gpt-oss model, I have BF16 weights. Is there any tool to create MXFP4 weights from the BF16 weights?

@liuqianchao

+1, is there any method to convert a trained BF16 SFT model to MXFP4, or to train directly in native MXFP4?

kahlun pushed a commit to DataInsightAutomation/LLaMA-Factory that referenced this pull request Aug 8, 2025
@bobzhang208 bobzhang208 commented Aug 9, 2025

Roughly how much GPU memory does fine-tuning gpt-oss-20b use? It takes about 50 GB on an A100, so why does it OOM on 4x RTX 3090 (24 GB)?

@WeiminWu2000

Fine-tuning gpt-oss-20b does not seem to support Liger Kernel; with a longer input context length, memory blows up. Is there a workaround to enable Liger Kernel?

@dragon18456 dragon18456 mentioned this pull request Aug 21, 2025
@Imbernoulli

What is the minimum number of GPUs needed for full-parameter fine-tuning of gpt-oss-120b?

@Lei-Tin Lei-Tin commented Sep 3, 2025

Does this allow us to train with the reasoning content within the prompt? Or do we have to process the prompt to allow the gpt template to take in reasoning content as well?
