
v0.3.02

@xming521 xming521 released this 17 Aug 07:26
· 2 commits to master since this release
a96996f

🎉 What's Changed

This release makes thinking configurable in offline cleaning, improves image and GIF handling in QA processing, refactors the configuration models for cleaner dataset naming, and bumps versions and dependencies for v0.3.02.

New Features:

  • Introduce enable_thinking flag in LLMCleanConfig to control offline cleaning behavior
  • Support scoring and cleaning of datasets that contain images; QA pairs that include images are assigned the highest score
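
The two features above can be pictured with a minimal sketch. Only `LLMCleanConfig` and `enable_thinking` come from these notes; the dataclass layout, the default value, the `QAPair` type, the `score_pair` helper, and the 1 to 5 scale are assumptions made for illustration and may not match the project's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_SCORE = 5  # assumed top of the scoring scale; the release notes do not state it


@dataclass
class LLMCleanConfig:
    # Only `enable_thinking` is named in the release notes; the default is a guess.
    enable_thinking: bool = False


@dataclass
class QAPair:
    # Hypothetical record type for one question/answer pair.
    question: str
    answer: str
    images: List[str] = field(default_factory=list)


def score_pair(pair: QAPair, llm_score: Callable[[QAPair], int]) -> int:
    """Return MAX_SCORE for pairs that include images, else ask the LLM scorer."""
    if pair.images:
        # Pairs with images skip LLM scoring and get the highest score.
        return MAX_SCORE
    return llm_score(pair)
```

With this shape, a text-only pair is still judged by whatever scorer is passed in, while a pair carrying at least one image is never filtered out by a low score.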

Enhancements:

  • Refactor cleaned_dataset_name to derive dynamically from original dataset
  • Pass enable_thinking through vLLM inference pipeline and adjust repetition_penalty and max_new_tokens accordingly
  • Implement CommonMethods to parse dataset names with modality-based suffixes and remove deprecated config fields
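
As a rough illustration of deriving the cleaned dataset name from the original one, the sketch below parses a modality suffix and rebuilds the name. The `-vl` and `-cleaned` suffixes, the function names, and the example dataset names are hypothetical; the release notes do not spell out the actual naming scheme or the `CommonMethods` interface.

```python
from typing import Tuple

# Hypothetical suffixes; the actual naming scheme is not given in the release notes.
MODALITY_SUFFIXES = {"-vl": "image"}
CLEANED_SUFFIX = "-cleaned"


def parse_dataset_name(name: str) -> Tuple[str, str]:
    """Split a dataset name into (base, modality); modality defaults to 'text'."""
    for suffix, modality in MODALITY_SUFFIXES.items():
        if name.endswith(suffix):
            return name[: -len(suffix)], modality
    return name, "text"


def cleaned_dataset_name(name: str) -> str:
    """Derive the cleaned dataset's name from the original, keeping its modality suffix."""
    base, modality = parse_dataset_name(name)
    # Look up the suffix that belongs to the detected modality (empty for text).
    suffix = next((s for s, m in MODALITY_SUFFIXES.items() if m == modality), "")
    return f"{base}{CLEANED_SUFFIX}{suffix}"
```

Deriving the name this way means there is one source of truth: changing the original dataset name changes the cleaned name too, so no separate config field has to be kept in sync.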

Build:

  • Bump project version to 0.3.02 and config_version to 0.3.02
  • Update dependencies: openai to 1.87.0, vllm to 0.10.0, torch to 2.7.1, transformers to 4.53.2, and triton to 3.3.1; add torchvision
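
To check whether an environment matches the versions listed above, something like the following could be used. It relies only on the standard library's `importlib.metadata`; the `PINNED` table and the `find_mismatches` helper are illustrative names, not part of the project. torchvision is left out because these notes give no version for it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Versions copied from the release notes above.
PINNED = {
    "openai": "1.87.0",
    "vllm": "0.10.0",
    "torch": "2.7.1",
    "transformers": "4.53.2",
    "triton": "3.3.1",
}


def find_mismatches(
    pins: Dict[str, str], lookup: Callable[[str], str] = version
) -> Dict[str, str]:
    """Return {package: found_version} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            # Drop local build tags such as "+cu126" before comparing.
            found = lookup(name).split("+")[0]
        except PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for package, found in find_mismatches(PINNED).items():
        print(f"{package}: expected {PINNED[package]}, found {found}")
```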

Full Changelog: v0.3.01...v0.3.02

CI:

  • Upgrade pre-commit-hooks to v6.0.0 and ruff to v0.12.8