
PeftModel.from_pretrained: dtype of loaded weights does not match the ckpt #2006

Closed
@xing-yiren

Description


Describe the bug (Mandatory)

When mindnlp loads LoRA adapter weights through PeftModel.from_pretrained, weights that were saved as fp32 come back with dtype fp16 after loading.

  • dtype in the ckpt file:
    [screenshot]

  • dtype after loading the weights:
    [screenshot]

For comparison, the equivalent torch code loads the weights as fp32.

  • dtype in the safetensors file:
    [screenshot]

  • dtype after loading the weights:
    [screenshot]
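For reference, a minimal sketch of the PyTorch-side check described above (assuming the adapter was saved with peft's save_pretrained, so the weights live in adapter_model.safetensors; the directory path is a placeholder):

import os
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM
from peft import PeftModel

adapter_model_dir = "xxx"  # placeholder, same adapter directory as in the repro below

# dtype as stored in the safetensors file
for key, value in load_file(os.path.join(adapter_model_dir, "adapter_model.safetensors")).items():
    print(f"{key} : {value.dtype}")

# dtype after loading through PeftModel.from_pretrained
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, adapter_model_dir, is_trainable=True)
for name, param in model.named_parameters():
    if "lora_" in name:
        print(f"{name} : {param.dtype}")  # per the report above, these stay torch.float32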

  • Hardware Environment (Ascend/GPU/CPU):
    /device ascend/GPU

  • Software Environment (Mandatory):
    -- MindSpore version: 2.5.0
    -- Python version: 3.9
    -- OS platform and distribution: Ubuntu 22.04

  • Execute Mode (Mandatory) (PyNative/Graph):
    /mode pynative

To Reproduce (Mandatory)
Steps to reproduce the behavior:

import os

import mindspore
from mindnlp.transformers import AutoModelForCausalLM, GenerationConfig

checkpoint_save_dir = "xxx"

# MindSpore base model

ms_base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", ms_dtype=mindspore.float16)
ms_base_model.generation_config = GenerationConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
ms_base_model.generation_config.pad_token_id = ms_base_model.generation_config.eos_token_id

# MindSpore LoRA Adapter
from mindnlp.peft import LoraConfig, TaskType, PeftModel

ms_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,                   # LoRA rank
    lora_alpha=32,         # LoRA alpha; see the LoRA paper for details
    lora_dropout=0.0,      # dropout ratio
)

# Check the parameter dtypes stored in the LoRA checkpoint
adapter_model_dir = os.path.join(checkpoint_save_dir, "lora_init_checkpoint")
ms_adapter_model_path = os.path.join(adapter_model_dir, "adapter_model.ckpt")
ms_param_dict = mindspore.load_checkpoint(ms_adapter_model_path)
print("-"*20, "Check ckpt dtype", "-"*20)
for key, value in ms_param_dict.items():
    print(f"{key} : {value.dtype}")

# Load the LoRA weights
ms_model = PeftModel.from_pretrained(ms_base_model, adapter_model_dir, is_trainable=True)


# Check the parameter dtypes after loading the checkpoint
print("-"*20, "Check param dtype after loading ckpt", "-"*20)
for name, param in ms_model.parameters_dict().items():
    if "lora_" in name:
        print(f"{name} : {param.dtype}")

Expected behavior (Mandatory)
The loaded LoRA adapter parameters should keep the fp32 dtype they were saved with in the ckpt, rather than being cast to the base model's fp16.

Screenshots / Logs (Mandatory)
See the screenshots embedded in the description above.

Additional context (Optional)

Investigation shows that in mindnlp's Module._load_from_state_dict, parameters are assigned using the dtype of the existing model parameter. Because the base model was instantiated and loaded in fp16, the LoRA adapter weights are also converted to fp16 during loading.

[Screenshot: mindnlp's Module._load_from_state_dict]

torch, by contrast, copies the parameter over directly:

[Screenshot: torch's _load_from_state_dict]

So the fix is to change this line to the following:

[Screenshot: the proposed change in mindnlp]

dtype = input_param.dtype
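The effect of the two choices can be seen in a self-contained toy example using only public MindSpore tensor APIs. This is illustrative only, not the mindnlp source: input_param stands for the checkpoint tensor and param_dtype for the dtype of the freshly initialized parameter.

import numpy as np
import mindspore
from mindspore import Tensor

# The checkpoint tensor was saved as fp32.
input_param = Tensor(np.random.randn(8, 8), dtype=mindspore.float32)
# A freshly initialized LoRA parameter inherits the base model's fp16.
param_dtype = mindspore.float16

# Old behavior: cast to the existing parameter's dtype, so the fp32 value
# silently becomes fp16.
loaded_old = input_param.astype(param_dtype)
print(loaded_old.dtype)  # Float16

# Proposed behavior: keep the checkpoint tensor's own dtype.
loaded_new = input_param.astype(input_param.dtype)
print(loaded_new.dtype)  # Float32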

Labels: bug