The Chat Format You Can't Ignore: LLM CPU Deployment Series 03

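Author: 引线小白 - Permalink: https://www.limoncc.com/post/14476f03eace0cbc/
License: This blog is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).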

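Abstract: This post aims to clarify how chat formats work and how to use them when deploying GGUF models locally. Corrections are welcome.
Keywords: Chat Format, large language models, chat templates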

1. Introduction

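A chat format is simply a conversation layout that the model can understand. A model is not trained turn by turn the way people talk, one side saying something and then the other; the whole dialogue has to be flattened into a single sequence of text. Already in the BERT era there were special tokens such as [CLS], [SEP], and [PAD]. In the LLM era the problem gets a little more complicated: besides marking where a passage starts and ends, you also have to distinguish the speaker roles, that is, who said what. That is why so many different chat formats have appeared.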

A simple idea is to lay the conversation out like this:

{system prompt that sets the role}
User:
{user message}
Assistant:
{assistant reply}

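This still has a problem, though: what if a message itself contains a conversation, like a set of Russian nesting dolls? Plain-text labels such as "User:" cannot separate the real turn boundaries from ones quoted inside a message, so special tokens are still needed. When deploying a model locally, pay close attention to its chat format. Every model family uses its own, and if you get it wrong the model will misread the user's instructions.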
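To make the nesting problem concrete, here is a minimal sketch of the plain-text layout above. The function names format_plain and count_turn_markers are hypothetical, chosen only for illustration. When a user pastes a transcript that itself contains an "Assistant:" label, the rendered prompt ends up with two lines that look like the start of an assistant turn, and nothing in the text tells the model which one is real.

```python
from typing import Dict, List


def format_plain(messages: List[Dict[str, str]]) -> str:
    """Render messages with plain-text role labels (the naive layout above)."""
    labels = {"user": "User:", "assistant": "Assistant:"}
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(m["content"])
        else:
            parts.append(f"{labels[m['role']]}\n{m['content']}")
    # Leave an open "Assistant:" label so the model knows it should answer next
    parts.append("Assistant:")
    return "\n".join(parts)


def count_turn_markers(prompt: str) -> int:
    """Count lines that look like the start of an assistant turn."""
    return sum(1 for line in prompt.splitlines() if line == "Assistant:")


msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    # The user pastes a transcript that itself contains role labels
    {"role": "user", "content": "Summarize this chat:\nUser:\nhi\nAssistant:\nhello"},
]
prompt = format_plain(msgs)
print(count_turn_markers(prompt))  # 2: one quoted inside the message, one real
```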

2. Some typical chat formats

2.1 ChatML (Chat Markup Language)

The best known is probably OpenAI's ChatML, short for Chat Markup Language. The format is as follows:

<|im_start|>system
{system prompt that sets the role}
<|im_end|>
<|im_start|>user
{user message}
<|im_end|>
<|im_start|>assistant
{assistant reply}
<|im_end|>
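A minimal sketch of rendering a message list into ChatML. Assumptions: format_chatml and its add_generation_prompt flag are hypothetical names; <|im_end|> is placed directly after the content, a common convention that differs slightly from the line breaks shown above; and the prompt ends with an open assistant header so the model writes the next reply. In a real deployment <|im_start|> and <|im_end|> are typically single entries in the model's vocabulary, which is what keeps quoted text from being mistaken for a turn boundary; this string-level sketch does not model tokenization.

```python
from typing import Dict, List


def format_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Wrap each message in <|im_start|>{role} ... <|im_end|> markers."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn without closing it; the model fills it in
        prompt += "<|im_start|>assistant\n"
    return prompt


print(format_chatml([
    {"role": "system", "content": "You are a translator."},
    {"role": "user", "content": "你好"},
]))
```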

2.2 Llama 2 format

The Llama 2 format for a single-turn conversation:

<s>[INST] <<SYS>>
{system prompt that sets the role}
<</SYS>>
{user message}
[/INST]
{assistant reply}
</s>

A multi-turn conversation looks like this:

<s>[INST] <<SYS>>
{system prompt that sets the role}
<</SYS>>
{user message}
[/INST]
{assistant reply}
</s>
<s>[INST]
{user message}
[/INST]
{assistant reply}
</s>
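A minimal sketch of rendering a multi-turn Llama 2 prompt. Assumptions: format_llama2 is a hypothetical name; after an optional system message, the messages strictly alternate user/assistant; and the whitespace follows the commonly used single-line layout ([INST] ... [/INST] reply </s>), so the line breaks differ slightly from the schematic above. As in the layout above, the system prompt lives inside the first [INST] block, and every completed round is wrapped in <s> ... </s>.

```python
from typing import Dict, List


def format_llama2(messages: List[Dict[str, str]]) -> str:
    """Render an optional system message plus alternating user/assistant turns."""
    system = ""
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]

    prompt = ""
    # Walk the conversation in (user, assistant) pairs
    for i in range(0, len(messages), 2):
        user = messages[i]["content"]
        if i == 0 and system:
            # The system prompt is folded into the first user turn
            user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {user} [/INST]"
        if i + 1 < len(messages):
            # A completed assistant reply closes the round with </s>
            prompt += f" {messages[i + 1]['content']} </s>"
    return prompt


msgs = [
    {"role": "system", "content": "S"},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Bye"},
]
print(format_llama2(msgs))
```

The last user turn is left open after [/INST], which is where the model starts generating.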

2.3 Alpaca format

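Both the Llama 2 format and ChatML are somewhat involved. The Alpaca format is simpler: it marks roles with plain-text headers instead of special tokens, using only the end-of-sequence token </s>. It looks like this: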

{system prompt that sets the role}
\n
\n
### Instruction:
{user message}
### Response:
{assistant reply}
</s>
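A minimal sketch for the Alpaca layout. format_alpaca is a hypothetical name, and the exact line breaks (a blank line after the system text, a newline after each header, "### Response:" with a colon) are assumptions based on the layout above. When response is omitted, the prompt ends at the response header, which is where the model starts generating.

```python
from typing import Optional


def format_alpaca(system: str, instruction: str, response: Optional[str] = None) -> str:
    """Build an Alpaca-style prompt from plain-text headers (no role tokens)."""
    prompt = f"{system}\n\n### Instruction:\n{instruction}\n\n### Response:\n"
    if response is not None:
        # Training-style sample: close the reply with the end-of-sequence token
        prompt += f"{response}</s>"
    return prompt


print(format_alpaca("You are a translator.", "你好"))
```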

3. Implementing custom chat formats

3.1 Modifying llama-cpp-python

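All three formats are implemented in llama-cpp-python. Still, some models use rather unusual formats, and supporting them means changing llama-cpp-python's source code. Is there a more elegant way? One proposal is #875, "Integration of Jinja2 Templating", which has not been accepted yet. Here I propose another approach that makes such customization easy. First, the versions used: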

llama-cpp-python==0.2.15
openai==1.1.1
pip install pyautogen==0.2.0b3  # pyautogen 0.2 is coming soon

llama-cpp-python v0.2.15 added a seed parameter, which makes debugging much easier, so I strongly recommend upgrading to this version.

Edit llama_cpp/llama_chat_format.py in llama-cpp-python v0.2.15 and add the following two functions:

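# Add a custom chat_format. This code goes inside llama_cpp/llama_chat_format.py,
# which already imports List, Tuple, Optional, Any and llama_types, and defines
# register_chat_format, ChatFormatterResponse, _map_roles and _get_system_message.
def _format_custom(
    system_message: str, messages: List[Tuple[str, Optional[str]]], sep: str, sep2: str
) -> str:
    """Format the prompt with the custom style: no role markers, only separators."""
    seps = [sep, sep2]
    ret = system_message
    for i, (role, message) in enumerate(messages):
        if message:
            # Even-indexed turns (user) end with sep, odd-indexed turns (assistant) with sep2
            ret += role + message + seps[i % 2]
        else:
            ret += role
    return ret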


@register_chat_format("custom")
def format_custom(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _system_template = "{system_message}"
    _roles = dict(user="", assistant="")
    _messages = _map_roles(messages, _roles)
    system_message = _get_system_message(messages)
    if system_message:
        system_message = _system_template.format(system_message=system_message)
    _prompt = _format_custom(system_message, _messages, " ", "</s>")
    return ChatFormatterResponse(prompt=_prompt)

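Here we strip out all role formatting and only append the end-of-sequence token </s> after each assistant reply in a multi-turn conversation. This leaves the prompt layout entirely up to the client at request time. The resulting prompt looks like this: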

{system prompt that sets the role}
{user message}
{assistant reply}
</s>
{user message}
{assistant reply}
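To see the exact string the custom format produces, here is a standalone copy of the _format_custom separator logic that runs without llama-cpp-python. It assumes, as the sep and sep2 arguments suggest, that user turns end with sep (" ") and assistant turns end with sep2 ("</s>"), and that roles map to empty strings as in format_custom. The name format_custom_standalone is hypothetical.

```python
from typing import List, Optional, Tuple


def format_custom_standalone(
    system_message: str,
    messages: List[Tuple[str, Optional[str]]],
    sep: str = " ",
    sep2: str = "</s>",
) -> str:
    """Concatenate turns with no role markers; alternate sep (user) and sep2 (assistant)."""
    seps = [sep, sep2]
    ret = system_message
    for i, (role, message) in enumerate(messages):
        if message:
            ret += role + message + seps[i % 2]
        else:
            ret += role
    return ret


# Roles are empty strings, exactly as in format_custom's _roles mapping
turns = [("", "Q1"), ("", "A1"), ("", "Q2")]
print(repr(format_custom_standalone("SYS", turns)))  # 'SYSQ1 A1</s>Q2 '
```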

When starting the server, add --chat_format custom:

python -m llama_cpp.server --model ../models/ggml-model-Q6_K.gguf \
    --n_gpu_layers 32 --n_ctx 2048 --chat_format custom

3.2 Implementing an arbitrary chat format

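In the previous section we modified the llama-cpp-python server so that it no longer applies any role formatting. This lets us define the chat format ourselves at call time.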

First, we define a few types that pin down the input and output formats:

from pathlib import Path
from typing import Literal, TypedDict, Optional, Union, List, Protocol

# Input message types
class Chat_Completion_Request_System_Message(TypedDict):
    role: Literal["system"]
    content: Optional[str]


class Chat_Completion_Request_User_Message(TypedDict):
    role: Literal["user"]
    content: Optional[str]


class Chat_Completion_Request_Assistant_Message(TypedDict):
    role: Literal["assistant"]
    content: Optional[str]


Chat_Completion_Request_Message = Union[
    Chat_Completion_Request_System_Message,
    Chat_Completion_Request_User_Message,
    Chat_Completion_Request_Assistant_Message,
]

# Chat template types: a prefix/suffix pair per role
class Role_Fix(TypedDict):
    prefix: str
    suffix: str


class Chat_Template(TypedDict):
    system: Role_Fix
    user: Role_Fix
    assistant: Role_Fix


# Interface for the LLM call that the decorator wraps
class LLM(Protocol):
    def __call__(self, messages: List[Chat_Completion_Request_Message], **kwargs):
        ...

Next, we implement a decorator:

import functools
from pathlib import Path
from typing import List

import rtoml


class Chat_Format:
    @classmethod
    def build(cls, chat_template_path: Path):
        def decorator(func: LLM):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Reload the template on every call so edits to the TOML take effect immediately
                with chat_template_path.open("r", encoding="utf-8") as f:
                    chat_template = rtoml.load(f)
                if args:
                    # messages passed positionally: rewrite the first argument, keep the rest
                    args = (cls.gen_fix_messages(args[0], chat_template),) + args[1:]
                elif "messages" in kwargs:
                    # messages passed as a keyword argument
                    kwargs["messages"] = cls.gen_fix_messages(kwargs["messages"], chat_template)
                return func(*args, **kwargs)

            return wrapper

        return decorator

    @classmethod
    def add_fix(cls, item: Chat_Completion_Request_Message, chat_template: Chat_Template):
        # Wrap the content with this role's prefix/suffix; build a new dict so the caller's message is untouched
        fix = chat_template[item["role"]]
        return {**item, "content": f"{fix['prefix']}{item['content'] or ''}{fix['suffix']}"}

    @classmethod
    def gen_fix_messages(cls, messages: List[Chat_Completion_Request_Message], chat_template: Chat_Template) \
            -> List[Chat_Completion_Request_Message]:
        return [cls.add_fix(item, chat_template) for item in messages]

Now let's use the decorator. Note that this uses the openai v1.1 client:

from openai import OpenAI

api_key = "NULL"
organization = "limoncc"
base_url = "http://127.0.0.1:8000/v1"

client = OpenAI(api_key=api_key, organization=organization, base_url=base_url)

@Chat_Format.build(Path("./openbuddy.toml"))
def llm(messages: List[Chat_Completion_Request_Message], temperature: float = 0.0):
    stream = client.chat.completions.create(
        model="mistral-7b",
        messages=messages,  # type: ignore
        temperature=temperature,
        n=1,
        top_p=1.0,
        presence_penalty=1.1,
        stop=["</s>"],
        max_tokens=3024,
        seed=101,
        stream=True
    )
    
    for part in stream:
        print(part.choices[0].delta.content or "", end='')  # type: ignore

Our TOML template looks like this:

[system]
prefix = ""
suffix = "\n\n"

[user]
prefix = "### Instruction:"
suffix = ""

[assistant]
prefix = "### Response:"
suffix = "</s>"

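Finally, let's test whether the model follows the instruction correctly. With open-source models it helps to give several examples and to describe the task in detail.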

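# Few-shot prompt, kept in Chinese because the task is Chinese-to-English translation.
# Instruction: "You are a translation expert. Translate the user's input into English and
# say nothing else; after translating, produce no extra output. Here are some examples:"
inst = """你是一个翻译专家，你会翻译用户的输入为英文，不要回答多余的话。翻译完后，不要做多余的输出。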
下面是一些例子：

输入: 你好。
英文: hello.
输入: 大模型将如何影响未来工作？
英文: How will big models affect the future of work?
输入: 在街道上一个兔子起自行车.
英文: In the street a rabbit picked up a bicycle.

"""
msg = [
    {"role": "user", "content": inst+"输入: 我想提高自己的技能水平，这样能提高自己未来的工资。"}
]
llm(msg)
# Model output ("英文" means "English"):
# 英文: I want to improve my skills so that I can increase my future salary.
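Since the prompt above is just an instruction followed by input/output pairs, it can be assembled from a list of examples, which makes it easy to add more of them as recommended. A minimal sketch; build_few_shot is a hypothetical helper that reproduces the 输入 (input) / 英文 (English) layout used above, including the blank lines.

```python
from typing import List, Tuple


def build_few_shot(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Hypothetical helper: instruction, example pairs, then the new input."""
    lines = [instruction, "下面是一些例子：", ""]
    for source, target in examples:
        lines.append(f"输入: {source}")
        lines.append(f"英文: {target}")
    # One blank line separates the examples from the real query
    lines.append("")
    lines.append(f"输入: {query}")
    return "\n".join(lines)


examples = [
    ("你好。", "hello."),
    ("大模型将如何影响未来工作？", "How will big models affect the future of work?"),
]
print(build_few_shot("你是一个翻译专家。", examples, "我想提高自己的技能水平。"))
```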

If you are interested in LLM applications, you are welcome to join the Autogen QQ group: 593623958

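Copyright notice
	The 柠檬CC (Limoncc) blog, created and maintained by 引线小白, is licensed under CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International). This article was first published on 柠檬CC [ https://www.limoncc.com ]. All rights reserved.
Permalink	https://www.limoncc.com/post/14476f03eace0cbc/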

If you cite this article, please use:

引线小白. (Oct. 25, 2023). 《不得不说的Chat Format——大模型CPU部署系列03》[Blog post]. Retrieved from https://www.limoncc.com/post/14476f03eace0cbc

@online{limoncc-14476f03eace0cbc,
title={不得不说的Chat Format——大模型CPU部署系列03},
author={引线小白},
year={2023},
month={Oct},
date={25},
url={\url{https://www.limoncc.com/post/14476f03eace0cbc}},
}