
csm : implement Sesame-based conversation example #12392

Closed
@ggerganov

Description

With the first Sesame CSM model openly available, we should implement a local example similar to their online research demo. The released CSM model appears to use Kyutai's Mimi audio codec, which we would need to implement in a similar way to the WavTokenizer. Next, we can extend the talk-llama example to support audio generation with the CSM. This way we will be able to plug in any LLM for the text response generation and use Sesame for speech input/output.
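
As a rough illustration of the intended data flow, here is a minimal C++ sketch. All `csm_*` / `mimi_*` names below are hypothetical placeholders (no such API exists yet); the real implementation would follow the same pattern as the WavTokenizer integration and plug into the existing talk-llama loop.

```cpp
// Minimal sketch of the intended conversation loop, assuming hypothetical
// csm_* / mimi_* APIs. The stubs below only illustrate the data flow; the
// real implementation would be ggml-based, similar to the WavTokenizer.

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Placeholder contexts for the future models (hypothetical).
struct csm_context  {};
struct mimi_context {};

// Hypothetical: CSM maps the response text to Mimi codebook tokens.
static std::vector<int32_t> csm_generate_codes(csm_context * /*ctx*/, const std::string & /*text*/) {
    return {}; // stub - the real model would run the CSM backbone + audio decoder
}

// Hypothetical: the Mimi decoder converts codebook tokens into PCM samples.
static std::vector<float> mimi_decode(mimi_context * /*ctx*/, const std::vector<int32_t> & /*codes*/) {
    return {}; // stub - the real codec would reconstruct the waveform
}

int main() {
    csm_context  ctx_csm;   // would be loaded from a GGUF conversion of the CSM checkpoint
    mimi_context ctx_mimi;  // would be loaded from a GGUF conversion of the Mimi codec

    // 1. whisper.cpp transcribes the user's speech (as in the current talk-llama example).
    const std::string user_text = "transcribed user speech";

    // 2. Any llama.cpp-compatible LLM produces the text response (placeholder here).
    const std::string reply_text = "response to: " + user_text;

    // 3. CSM turns the response text into Mimi audio tokens.
    const std::vector<int32_t> codes = csm_generate_codes(&ctx_csm, reply_text);

    // 4. Mimi decodes the tokens into PCM for playback through the existing audio output path.
    const std::vector<float> pcm = mimi_decode(&ctx_mimi, codes);

    printf("generated %zu audio samples for: %s\n", pcm.size(), reply_text.c_str());

    return 0;
}
```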
