Langchain (do not repost without permission)

2024-04-22  9_SooHyun

Langchain Overview

If you are not yet familiar with LangChain, set this overview aside and read the following chapters first; come back here after you have learned all the important concepts of LangChain.

Runnable interface

In LangChain, chaining a series of Runnable implementations together forms a Chain. So let us first look at the basic building block of a Chain: the Runnable interface.

methods of Runnable

The core method of the Runnable interface is invoke (with its async counterpart ainvoke).

The other key methods, such as batch, stream and their async counterparts, have default implementations built on top of invoke/ainvoke.
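A minimal sketch of these methods in action, using RunnableLambda from langchain_core (the simplest Runnable implementation, which wraps a plain function):

from langchain_core.runnables import RunnableLambda

# wrap a plain function as a Runnable
add_one = RunnableLambda(lambda x: x + 1)

print(add_one.invoke(1))         # 2 -- the core synchronous entry point
print(add_one.batch([1, 2, 3]))  # [2, 3, 4] -- default implementation maps invoke over the items
for chunk in add_one.stream(1):  # yields output chunks; a single chunk here
    print(chunk)                 # 2
# async counterparts: await add_one.ainvoke(1), .abatch(...), .astream(...)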

Runnables schematic information

Runnables expose schematic information about their input, output and config via the input_schema property, the output_schema property and the config_schema method.

Every Runnable implementation has a specific input schema and output schema, which can be obtained through the input_schema and output_schema properties.

Note that different Runnable implementations have different input/output schemas: the output schema of the preceding Runnable must match the input schema of the following Runnable for the two to be chained successfully.
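For example, a quick sketch inspecting a prompt's schemas (the schemas are Pydantic models):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

# the input schema tells you which keys invoke() expects
print(prompt.input_schema.schema())   # requires a "topic" field
# the output schema describes what invoke() returns (a ChatPromptValue here)
print(prompt.output_schema.schema())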

Runnable sequences

One key advantage of the Runnable interface is that any two runnables can be “chained” together into sequences. The output of the previous runnable’s .invoke() call is passed as input to the next runnable. This can be done using the pipe operator (|), or the more explicit .pipe() method

For example, chain = prompt | llm | output_parser chains prompt, llm and output_parser, each of them a Runnable implementation, into a single runnable sequence.

If you define a Runnable whose output structure differs from prompt's, that instance cannot be connected directly to llm. To connect them, you must make the data exchange between the new Runnable and llm compatible: either redefine your Runnable, or insert an adapter between them that converts the input/output data structures so the two parts can work together, as sketched below.
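A minimal sketch of the adapter idea, assuming a hypothetical upstream runnable that emits a dict the llm cannot consume directly (llm stands for any chat model, e.g. the ChatCohere instance in the next example):

from langchain_core.runnables import RunnableLambda

# hypothetical upstream runnable whose output schema does not match llm's input
upstream = RunnableLambda(lambda x: {"question": x["input"], "lang": "en"})

# adapter: pull out the plain string that llm expects
adapter = RunnableLambda(lambda d: d["question"])

chain = upstream | adapter | llm  # with the adapter in between, the schemas line up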

Core Implementation of Runnable: LLMChain

LLMChain is the most fundamental and widely used Runnable implementation in LangChain. As mentioned above, an LLMChain has three key components: a prompt, an llm and an output_parser.

Below is a complete example that combines a Prompt Template and an Output Parser into an LLMChain:

from langchain_cohere import ChatCohere
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.chains.llm import LLMChain
def basic_chain_example():
    """
    basic_chain_example
    """

    print("-"*20 + "basic_chain_example start" + "-"*20)
    
    # generate a prompt
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are world class technical documentation writer."),
        ("user", "{input}")
    ])

    # get an llm (load_config is assumed to be a user-defined helper that supplies the API key)
    llm = ChatCohere(cohere_api_key=load_config().cohere_api_key)

    # get an output_parser
    output_parser = StrOutputParser()

    # make a chain
    # chain = prompt | llm | output_parser
    chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

    # do a query
    resp = chain.invoke({"input": "how can langsmith help with testing?"})
    print(f"basic_chain_example resp is {resp}")
    print("-"*20 + "basic_chain_example end" + "-"*20)

The two forms chain = prompt | llm | output_parser and chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser) are functionally equivalent; they differ only in syntax.

chain = prompt | llm | output_parser uses the pipe (|) operator. This style chains multiple operations together so that the output of one operation becomes the input of the next; here prompt, llm and output_parser are connected into one chain object. It is concise and readable, but it relies on special methods behind the scenes (overriding __or__) to support the pipe operator. Concretely, __or__ is overridden in RunnableSerializable, which ChatPromptTemplate inherits via BasePromptTemplate.
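A toy illustration (not LangChain source) of how overriding __or__ lets a | b compose two steps:

class Pipeable:
    """Minimal pipeline element: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # compose: feed self's output into other
        return Pipeable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

double = Pipeable(lambda x: x * 2)
inc = Pipeable(lambda x: x + 1)
print((double | inc).invoke(3))  # 7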

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser) states more explicitly that we are creating an instance of the LLMChain class, and makes it easier to pass extra parameters at construction time.

The explicit constructor style is the more common one and is recommended.
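One practical difference worth noting (a rough sketch reusing prompt, llm and output_parser from the example above; the exact keys depend on the chain): the two styles return different shapes from invoke:

# pipe style: invoke returns whatever the last runnable (the parser) produces
pipe_chain = prompt | llm | output_parser
print(pipe_chain.invoke({"input": "hi"}))  # a plain string

# LLMChain style: invoke returns a dict of inputs and outputs
llm_chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)
print(llm_chain.invoke({"input": "hi"}))   # e.g. {"input": "hi", "text": "..."}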

Runnable framework overview

            Runnable interface
              /     |     \     \
            impl   impl   impl     Chain interface
            /       |       \       / (LLMChain as an impl)
      ------------------------------------
      |  prompt --> llm --> output_parser |      
      ------------------------------------

Agent

What is an Agent

We often talk about AI Agents. What exactly is an Agent?

Agent is a concrete class whose instances must carry the attributes llm_chain: LLMChain, allowed_tools and output_parser, as the from_llm_and_tools method shows directly. In short, an Agent is a further wrapper around an LLMChain that integrates Tools (Tools are introduced below).

What does an Agent do

The core goal of an Agent is to use the LLM to reason about which action to take next.

The key logic is: build the full prompt inputs from the intermediate steps, call the LLMChain, and parse the LLM's output into an action. This logic is encapsulated in the Agent's core method Agent.plan:

class Agent(BaseSingleActionAgent):
    llm_chain: LLMChain
    output_parser: AgentOutputParser
    allowed_tools: Optional[List[str]] = None

    ...

    def plan(
        self,
        intermediate_steps: List[Tuple[AgentAction, str]],
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Union[AgentAction, AgentFinish]:
        """Given input, decided what to do.

        Args:
            intermediate_steps: Steps the LLM has taken to date,
                along with observations
            callbacks: Callbacks to run.
            **kwargs: User inputs.

        Returns:
            Action specifying what tool to use.
        """
        # make full_inputs for llm_chain
        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
        # call `predict` on `llm_chain` with the concrete prompt
        full_output = self.llm_chain.predict(callbacks=callbacks, **full_inputs)
        # use `output_parser` to parse the `llm_chain`'s output, get -> (AgentAction | AgentFinish)
        return self.output_parser.parse(full_output)

As we can see, an Agent's call to the LLM is essentially no different from an ordinary LLMChain call: generate the concrete prompt -> call LLMChain -> parse the output from the llm.

So why can an Agent decide the next action based on the output of the LLMChain? The answer is simple.

For developers, the LLM itself cannot be modified, only called. All the developer controls is the prompt and the output_parser. The key design of an Agent is therefore twofold: it supplies a specific prompt, embedding the tool descriptions and the required answer format, to guide the LLM to reason, decide, and produce formatted output; and it configures a specific output_parser to parse that formatted output into an AgentAction | AgentFinish instance in the Python runtime.

For example, ZeroShotAgent's default PromptTemplate can be inspected as follows:

from langchain_google_community import GoogleSearchAPIWrapper
from langchain_core.tools import Tool
from langchain.agents import ZeroShotAgent

search = GoogleSearchAPIWrapper(
        google_api_key="xxx", google_cse_id="yyy")
google_tool = Tool(name="google search",
                    description="For any questions, you must use this tool to search Google for helpful results", func=search.run)

prompt = ZeroShotAgent.create_prompt(
    tools=[google_tool],
    input_variables=["input", "agent_scratchpad"]
)
print(f"prompt.template is:\n\n{prompt.template}")

output:

prompt.template is:

Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

google search: For any questions, you must use this tool to search Google for helpful results

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question



Begin! 

Question: {input}
Thought:{agent_scratchpad}

As you can see, prompt.template contains the tool descriptions and requires the llm to reply in the Thought/Action/Action Input/Observation format.
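To close the loop, here is a sketch of how an output parser turns that formatted text into an AgentAction. It uses MRKLOutputParser (ZeroShotAgent's default parser) and a hypothetical llm reply:

from langchain.agents.mrkl.output_parser import MRKLOutputParser

# hypothetical llm reply that follows the required format
llm_reply = """Thought: I should search for this.
Action: google search
Action Input: langsmith testing"""

action = MRKLOutputParser().parse(llm_reply)
print(type(action).__name__, action.tool, action.tool_input)
# AgentAction google search langsmith testing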

Tool

In LangChain, a Tool defines one tool. Depending on the developer's needs, a tool can do anything: a DB query tool, a search engine tool, a tokenizer, a sentiment analysis tool, and so on.

How to define a Tool

# google_tool is a Google search tool
search = GoogleSearchAPIWrapper(
        google_api_key="xxx", google_cse_id="yyy")
google_tool = Tool(name="google search",
                   description="For any questions, you must use this tool to search Google for helpful results",
                   func=search.run)  # func: Callable[..., str]
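Any Callable[..., str] can back a Tool. A minimal custom-tool sketch (word_count is a hypothetical helper, not a LangChain API):

from langchain_core.tools import Tool

def word_count(text: str) -> str:
    """Hypothetical helper: count the words in the input text."""
    return str(len(text.split()))

word_count_tool = Tool(
    name="word count",
    description="Counts the number of words in the given text.",
    func=word_count,
)
print(word_count_tool.run("hello brave new world"))  # "4"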

How does the LLM choose a Tool? In fact, the Agent's Tools are rendered into plain text and filled into the prompt of the Agent's llm, as in the ZeroShotAgent.create_prompt method mentioned above:

class ZeroShotAgent(Agent):
    ...

    @classmethod
    def create_prompt(
        cls,
        tools: Sequence[BaseTool],
        prefix: str = PREFIX,
        suffix: str = SUFFIX,
        format_instructions: str = FORMAT_INSTRUCTIONS,
        input_variables: Optional[List[str]] = None,
    ) -> PromptTemplate:
        """Create prompt in the style of the zero shot agent.

        Args:
            tools: List of tools the agent will have access to, used to format the
                prompt.
            prefix: String to put before the list of tools.
            suffix: String to put after the list of tools.
            input_variables: List of input variables the final prompt will expect.

        Returns:
            A PromptTemplate with the template assembled from the pieces here.
        """
        tool_strings = render_text_description(list(tools))  # render tools to text
        tool_names = ", ".join([tool.name for tool in tools])
        format_instructions = format_instructions.format(tool_names=tool_names)
        template = "\n\n".join([prefix, tool_strings, format_instructions, suffix])  # generate PromptTemplate
        if input_variables:
            return PromptTemplate(template=template, input_variables=input_variables)
        return PromptTemplate.from_template(template)

Calling create_prompt gives us a prompt with the tool descriptions embedded:

Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

# Editor's note (by wxx), not part of the prompt: below is the rendered plain text of the tools, for the llm to comprehend and choose from
google search: For any questions, you must use this tool to search Google for helpful results
langsmith_search: Search for information about LangSmith. For any questions about LangSmith, you must use this tool!

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google search, langsmith_search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question



Begin! 

Question: {input}
Thought:{agent_scratchpad}

AgentExecutor

AgentExecutor is a concrete implementation of the Chain interface. Its core data structure is as follows:

class AgentExecutor(Chain):
    """Agent that is using tools."""

    agent: Union[BaseSingleActionAgent, BaseMultiActionAgent]
    """The agent to run for creating a plan and determining actions
    to take at each step of the execution loop."""
    tools: Sequence[BaseTool]
    """The valid tools the agent can call."""

With an Agent in hand, why do we still need an AgentExecutor? Because the Agent is only responsible for decisions (using the LLM to decide the next action), not for execution; the AgentExecutor follows the Agent's decisions and actually runs the chosen tool.

That is why AgentExecutor holds references to agent: Union[BaseSingleActionAgent, BaseMultiActionAgent] and tools: Sequence[BaseTool]: the AgentExecutor takes its "instructions" from the agent and executes the target tool.

Concretely, the agent executor is the runtime for an agent. It is what actually calls the agent, executes the actions the agent chooses, passes the action-related tool outputs back to the agent, and repeats. In pseudocode, this looks roughly like:

# logic of the agent executor
next_action = agent.get_action(...)  # the agent only holds the tools' rendered text, merged into the prompt so the llm can pick next_action
while next_action != AgentFinish:
    observation = run(next_action)  # the agent executor holds the tools themselves and can execute the target tool
    next_action = agent.get_action(..., next_action, observation)
return next_action

While this may seem simple, this runtime handles several complexities for you, including: the agent selecting a non-existent tool, tool execution errors, agent output that cannot be parsed into a tool invocation, and logging at every step.

The core method in which AgentExecutor implements the agent runtime:

class AgentExecutor(Chain):
    ...

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Run text through and get agent response."""
        # Construct a mapping of tool name to tool for easy lookup
        name_to_tool_map = {tool.name: tool for tool in self.tools}
        # We construct a mapping from each tool to a color, used for logging.
        color_mapping = get_color_mapping(
            [tool.name for tool in self.tools], excluded_colors=["green", "red"]
        )
        intermediate_steps: List[Tuple[AgentAction, str]] = []
        # Let's start tracking the number of iterations and time elapsed
        iterations = 0
        time_elapsed = 0.0
        start_time = time.time()
        # We now enter the agent loop (until it returns something).
        while self._should_continue(iterations, time_elapsed):
            next_step_output = self._take_next_step(
                name_to_tool_map,
                color_mapping,
                inputs,
                intermediate_steps,
                run_manager=run_manager,
            )
            if isinstance(next_step_output, AgentFinish):
                return self._return(
                    next_step_output, intermediate_steps, run_manager=run_manager
                )

            intermediate_steps.extend(next_step_output)
            if len(next_step_output) == 1:
                next_step_action = next_step_output[0]
                # See if tool should return directly
                tool_return = self._get_tool_return(next_step_action)
                if tool_return is not None:
                    return self._return(
                        tool_return, intermediate_steps, run_manager=run_manager
                    )
            iterations += 1
            time_elapsed = time.time() - start_time
        output = self.agent.return_stopped_response(
            self.early_stopping_method, intermediate_steps, **inputs
        )
        return self._return(output, intermediate_steps, run_manager=run_manager)

How AgentExecutor invokes a Tool in LangChain

In LangChain, the Agent chooses the next Tool, and the AgentExecutor actually invokes it.

The Agent produces an output containing an action and the action input: concretely, Agent.plan() returns an instruction of type Union[AgentAction, AgentFinish], and the AgentExecutor invokes the corresponding Tool according to that instruction.

Finally, the AgentExecutor returns the final answer produced by the Agent.
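Putting the pieces together, a minimal end-to-end sketch (reusing llm and google_tool from the earlier examples; verbose=True prints each Thought/Action/Observation step):

from langchain.agents import ZeroShotAgent, AgentExecutor

# build an agent (prompt + llm_chain + output_parser) from the llm and the tools
agent = ZeroShotAgent.from_llm_and_tools(llm=llm, tools=[google_tool])

# wrap it in the runtime that actually executes the chosen tools
executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=[google_tool], verbose=True)

resp = executor.invoke({"input": "how can langsmith help with testing?"})
print(resp)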
