Description
Describe the bug/ 问题描述 (Mandatory / 必填)
A clear and concise description of what the bug is.
When mindnlp loads LoRA adapter weights via PeftModel.from_pretrained, the weights were saved in fp32 beforehand, but after loading their dtype is reported as fp16.
By contrast, the same code under torch keeps the loaded weights in fp32 (see the PyTorch sketch after the list below).
- dtype stored in the safetensors file
- dtype of the weights after loading
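For comparison, a minimal sketch of the same check under PyTorch/HF PEFT (a hypothetical equivalent, not part of the original report; the adapter path is a placeholder and it assumes the adapter was saved in a torch-loadable format such as safetensors). According to the report, the LoRA weights keep their saved fp32 dtype here:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

adapter_model_dir = "xxx/lora_init_checkpoint"  # placeholder path
pt_base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", torch_dtype=torch.float16
)
pt_model = PeftModel.from_pretrained(pt_base_model, adapter_model_dir, is_trainable=True)
for name, param in pt_model.named_parameters():
    if "lora_" in name:
        print(f"{name} : {param.dtype}")  # reported as torch.float32 (the saved dtype)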
- Hardware Environment (Ascend/GPU/CPU) / 硬件环境:
Please delete the backend not involved / 请删除不涉及的后端:
/device ascend/GPU
- Software Environment / 软件环境 (Mandatory / 必填):
-- MindSpore version (e.g., 1.7.0.Bxxx) : 2.5.0
-- Python version (e.g., Python 3.7.5) : 3.9
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
-- GCC/Compiler version (if compiled from source): -
- Execute Mode / 执行模式 (Mandatory / 必填) (PyNative/Graph):
Please delete the mode not involved / 请删除不涉及的模式:
/mode pynative
To Reproduce / 重现步骤 (Mandatory / 必填)
Steps to reproduce the behavior:
import os  # needed for os.path.join below
import mindspore
import mindnlp
from mindnlp.transformers import AutoModelForCausalLM, AutoTokenizer
from mindnlp.engine import TrainingArguments, Trainer
from mindnlp.dataset import load_dataset, BaseMapFunction
from mindspore import load_checkpoint, Tensor
from mindnlp.transformers import GenerationConfig
import troubleshooter as ts
import numpy as np
checkpoint_save_dir = "xxx"
# MindSpore base model
ms_base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", ms_dtype=mindspore.float16)
ms_base_model.generation_config = GenerationConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
ms_base_model.generation_config.pad_token_id = ms_base_model.generation_config.eos_token_id
# MindSpore LoRA Adapter
from mindnlp.peft import LoraConfig, TaskType, get_peft_model, PeftModel
# (the adapter checkpoint loaded below is assumed to come from this config;
#  see the preparation sketch after this script)
ms_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for details
    lora_dropout=0.0  # dropout probability
)
# Check the parameter dtype stored in the LoRA checkpoint
adapter_model_dir = os.path.join(checkpoint_save_dir, "lora_init_checkpoint")
ms_adapter_model_path = os.path.join(adapter_model_dir, "adapter_model.ckpt")
ms_param_dict = mindspore.load_checkpoint(ms_adapter_model_path)
print("-"*20, "Check ckpt dtype", "-"*20)
for key, value in ms_param_dict.items():
    print(f"{key} : {value.dtype}")
# Load the LoRA weights
ms_model = PeftModel.from_pretrained(ms_base_model, adapter_model_dir, is_trainable=True)
# Check the parameter dtype after loading
print("-"*20, "Check param dtype after loading ckpt", "-"*20)
for name, param in ms_model.parameters_dict().items():
    if "lora_" in name:
        print(f"{name} : {param.dtype}")
Expected behavior / 预期结果 (Mandatory / 必填)
The LoRA adapter weights loaded by PeftModel.from_pretrained should keep the fp32 dtype they were saved with, matching the behavior of the equivalent torch code.
Screenshots/ 日志 / 截图 (Mandatory / 必填)
If applicable, add screenshots to help explain your problem.
Additional context / 备注 (Optional / 选填)
Add any other context about the problem here.
Investigation shows that in mindnlp's Module._load_from_state_dict, parameters are assigned using the dtype of the existing model parameter. Because the base model was instantiated and its weights loaded in fp16, the LoRA adapter weights are also cast to fp16 during loading. torch, by contrast, copies the saved parameter values directly. Changing the dtype selection there to the following line fixes the issue:
dtype = input_param.dtype
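A minimal standalone sketch of the two dtype policies (a simplified illustration, not the actual mindnlp implementation of _load_from_state_dict; the parameter name and shape are made up):

import numpy as np
import mindspore

# Existing model parameter, fp16 because the base model was loaded with ms_dtype=float16
param = mindspore.Parameter(mindspore.Tensor(np.zeros((2, 2)), mindspore.float16), name="lora_A")
# Value read from the adapter checkpoint, saved in fp32
input_param = mindspore.Tensor(np.ones((2, 2)), mindspore.float32)

# Current behavior: cast the loaded value to the existing parameter's dtype
dtype = param.dtype
print(input_param.astype(dtype).dtype)  # Float16

# Proposed fix: keep the dtype stored in the checkpoint, matching torch's copy behavior
dtype = input_param.dtype
print(input_param.astype(dtype).dtype)  # Float32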