
Conversation

@muyouzhi6 commented Dec 27, 2025

Fix the bug where cli2api abnormally outputs chain-of-thought for Gemini 3 series models

Modifications

Modified the `_query_stream` method in `gemini_source.py` to iterate over `chunk.candidates[0].content.parts` and filter out the parts with `thought=True`, instead of using `chunk.text` directly (a minimal sketch follows below).
  • This is NOT a breaking change.
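
A minimal sketch of the filtering described above, assuming the google-genai streaming types; `chunk` stands for one streamed `GenerateContentResponse`, and the buffer names are illustrative rather than taken verbatim from the patch:

```python
visible_buf: list[str] = []
reasoning_buf: list[str] = []
for part in chunk.candidates[0].content.parts:
    if getattr(part, "thought", False):
        # Chain-of-thought part: keep it out of the user-visible stream.
        reasoning_buf.append(part.text or "")
    elif getattr(part, "text", None):
        visible_buf.append(part.text)

visible_text = "".join(visible_buf)       # streamed to the user
reasoning_text = "".join(reasoning_buf)   # surfaced only as reasoning_content
```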

Screenshots or Test Results

Before:
PixPin_2025-12-27_22-15-51 (screenshot)
After:
PixPin_2025-12-27_22-14-00 (screenshot)


Checklist

  • 😊 If there are new features added in the PR, I have discussed them with the authors through issues/emails, etc.
  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Handle Gemini streaming responses by separating internal reasoning content from user-visible output in cli2api.

Bug Fixes:

  • Prevent Gemini 3 series models from leaking thought/reasoning content into the streamed user-visible output in cli2api.

Enhancements:

  • Normalize processing of Gemini content parts in streaming responses by aggregating reasoning and final answer text separately.

Summary by Sourcery

Prevent Gemini streaming responses from exposing internal reasoning content in cli2api outputs while preserving normal user-visible text.

Bug Fixes:

  • Ignore Gemini content parts marked as thought in both content processing and streaming responses to avoid leaking chain-of-thought into user-visible output.

Enhancements:

  • Improve Gemini streaming handling by separately aggregating reasoning and final answer text from content parts, with a fallback to chunk.text when parts are absent.

Summary by Sourcery

Handle Gemini streaming responses in cli2api to avoid leaking internal reasoning content while preserving user-visible text.

Bug Fixes:

  • Prevent Gemini 3 series streaming responses from including chain-of-thought reasoning content in user-visible output by ignoring parts marked as thought in both streaming and final responses.

Enhancements:

  • Introduce a structured ChunkView helper and split_chunk_content utility to defensively parse Gemini streaming chunks and separate reasoning text from visible text, with fallback behavior when parts are absent.

@auto-assign auto-assign bot requested review from Fridemn and anka-afk December 27, 2025 16:59
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner) labels Dec 27, 2025
Contributor
@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 3 issues, and left some high-level feedback:

  • The reasoning extraction logic is now implemented both in _split_chunk_content and _process_content_parts; consider reusing ChunkView.reasoning_text (or a shared helper) in _process_content_parts to avoid divergence between streaming and non‑streaming paths.
  • In _query_stream, chunks with chunk_view.parts is None are skipped entirely even though _split_chunk_content can still derive visible_text from chunk.text; consider basing the early‑return check on visible_text/has_function_call instead of parts is None to avoid dropping valid text-only chunks.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The reasoning extraction logic is now implemented both in `_split_chunk_content` and `_process_content_parts`; consider reusing `ChunkView.reasoning_text` (or a shared helper) in `_process_content_parts` to avoid divergence between streaming and non‑streaming paths.
- In `_query_stream`, chunks with `chunk_view.parts is None` are skipped entirely even though `_split_chunk_content` can still derive `visible_text` from `chunk.text`; consider basing the early‑return check on `visible_text`/`has_function_call` instead of `parts is None` to avoid dropping valid text-only chunks.

## Individual Comments

### Comment 1
<location> `astrbot/core/provider/sources/gemini_source.py:389-398` </location>
<code_context>
+    def _split_chunk_content(
</code_context>

<issue_to_address>
**suggestion:** Reuse ChunkView in more places to avoid duplicating reasoning/text extraction logic.

`_split_chunk_content` already centralizes the logic for separating `reasoning_text` and `visible_text`, including the defensive handling of `thought`, `text`, and `function_call`. In `_process_content_parts` you reimplement part of this against `result_parts`. If `_process_content_parts` accepted a `ChunkView` (or used a small helper that takes `parts` and returns `(reasoning_text, visible_text)`), this logic could live in one place and keep streaming and non-streaming behavior consistent.
</issue_to_address>

### Comment 2
<location> `astrbot/core/provider/sources/gemini_source.py:386-395` </location>
<code_context>
+                finish_reason=None,
+            )

-        thought_buf: list[str] = [
-            (p.text or "") for p in candidate.content.parts if p.thought
-        ]
</code_context>

<issue_to_address>
**suggestion:** Extract reasoning text using a shared helper instead of inlining list comprehension logic here.

This reasoning extraction duplicates the logic in `_split_chunk_content` (collecting `p.text` when `thought` is true). To keep behavior consistent and easier to maintain, either reuse `_split_chunk_content`/`ChunkView` here or introduce a helper like `_extract_reasoning_from_parts(parts)` and call it both in the streaming path and `_process_content_parts`.
</issue_to_address>

### Comment 3
<location> `astrbot/core/provider/sources/gemini_source.py:36` </location>
<code_context>
 logging.getLogger("google_genai.types").addFilter(SuppressNonTextPartsWarning())


+@dataclass
+class ChunkView:
+    """流式响应 chunk 的结构化视图对象
</code_context>

<issue_to_address>
**issue (complexity):** Consider narrowing the new chunk helper to just structural data and centralizing reasoning extraction to reduce optional state and keep the text/thought logic local to where it’s used.

You can keep all the new behavior while flattening the flow and removing most of the optional fields by:

1. Narrowing the “view” helper to pure structure (no reasoning/visible text).
2. Reusing a single “reasoning extraction” helper in both places.
3. Keeping the text/thought split in `_query_stream`, close to where it is used.

### 1. Replace `ChunkView` with a narrower structural helper

Instead of a broad `ChunkView` carrying multiple optional fields and mixing concerns, keep a small helper that only does minimal defensive checks and structure extraction:

```python
@dataclass
class ChunkBasics:
    candidate: types.Candidate
    parts: list[types.Part]
    finish_reason: types.FinishReason | None
    has_function_call: bool


def _get_chunk_basics(
    self, chunk: types.GenerateContentResponse
) -> ChunkBasics | None:
    if not chunk.candidates:
        logger.warning(f"收到的 chunk 中 candidates 为空: {chunk}")
        return None

    candidate = chunk.candidates[0]
    content = candidate.content
    if not content or not content.parts:
        logger.warning(f"收到的 chunk 中 content 为空: {chunk}")
        return None

    parts = content.parts
    has_function_call = any(
        part.function_call for part in parts if hasattr(part, "function_call")
    )

    return ChunkBasics(
        candidate=candidate,
        parts=parts,
        finish_reason=getattr(candidate, "finish_reason", None),
        has_function_call=has_function_call,
    )
```

This drops `reasoning_text`, `visible_text` and most `getattr` usage from the “view” layer and keeps the number of possible states small (`None` vs a fully usable object).

### 2. Centralize reasoning extraction on parts

You currently compute reasoning twice with slightly different logic. You can centralize this into a helper that both `_query_stream` and `_process_content_parts` use:

```python
def _extract_reasoning_from_parts(self, parts: list[types.Part]) -> str:
    thought_buf: list[str] = [
        (p.text or "") for p in parts if getattr(p, "thought", False)
    ]
    return "".join(thought_buf).strip()
```

Then in `_process_content_parts`:

```python
reasoning = self._extract_reasoning_from_parts(result_parts)
if reasoning:
    llm_response.reasoning_content = reasoning

for part in result_parts:
    if getattr(part, "thought", False):
        continue
    ...
```

### 3. Keep reasoning/visible text split local in `_query_stream`

With `ChunkBasics` and `_extract_reasoning_from_parts`, `_query_stream` can be simplified and made more explicit, without `ChunkView.reasoning_text` / `.visible_text`:

```python
async for chunk in result:
    llm_response = LLMResponse("assistant", is_chunk=True)

    basics = self._get_chunk_basics(chunk)
    if basics is None:
        continue

    if basics.has_function_call:
        llm_response = LLMResponse("assistant", is_chunk=False)
        llm_response.raw_completion = chunk
        llm_response.result_chain = self._process_content_parts(
            basics.candidate, llm_response
        )
        llm_response.id = chunk.response_id
        if chunk.usage_metadata:
            llm_response.usage = self._extract_usage(chunk.usage_metadata)
        yield llm_response
        return

    has_content = False

    # reasoning
    reasoning = self._extract_reasoning_from_parts(basics.parts)
    if reasoning:
        has_content = True
        accumulated_reasoning += reasoning
        llm_response.reasoning_content = reasoning

    # visible text (parts without thought=True)
    visible_text_parts = [
        p.text or ""
        for p in basics.parts
        if not getattr(p, "thought", False) and getattr(p, "text", None)
    ]
    visible_text = "".join(visible_text_parts).strip()
    if not visible_text and getattr(chunk, "text", None):
        visible_text = chunk.text  # keep the fallback to chunk.text

    if visible_text:
        has_content = True
        accumulated_text += visible_text
        llm_response.result_chain = MessageChain(
            chain=[Comp.Plain(visible_text)]
        )

    if has_content:
        yield llm_response

    if basics.finish_reason:
        if basics.parts:
            final_response = LLMResponse("assistant", is_chunk=False)
            final_response.raw_completion = chunk
            final_response.result_chain = self._process_content_parts(
                basics.candidate, final_response
            )
            ...
        break
```

This keeps:

- The same function-call detection behavior.
- The reasoning/visible-text separation (including `chunk.text` fallback).
- A single source of truth for “thought vs visible” semantics.

But it:

- Removes the over-general `ChunkView` (less optional state to track).
- Localizes presentation decisions to `_query_stream` / `_process_content_parts`.
- Reduces `getattr` noise and brings code closer to the underlying Gemini types.
</issue_to_address>


@Soulter (Member) commented Dec 29, 2025

I think we could put the thought=true content into reasoning_content on the llm_response?

@muyouzhi6 (Author) commented

> I think we could put the thought=true content into reasoning_content on the llm_response?

Yes, that's exactly how it's already implemented.

@Soulter (Member) commented Dec 29, 2025

My understanding is that it should be enough to simply skip, inside the for loop of _process_content_parts, any part whose part.text is non-empty and whose part.thought is true; it doesn't feel like it needs to be this complicated. A sketch of that simpler approach follows below.
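
A minimal sketch of that simpler approach, assuming the existing loop over `result_parts` in `_process_content_parts` (the rest of the loop body is elided):

```python
for part in result_parts:
    # Skip chain-of-thought parts so they never enter the user-visible chain.
    if getattr(part, "thought", False) and part.text:
        continue
    # ... existing handling of text / inline data / function-call parts ...
```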
