Langchain (do not repost without permission)

2024-04-22  9_SooHyun

Langchain Overview

If you are not yet familiar with LangChain, set this overview aside and read the following chapters first; come back here after you have learned all the important concepts of LangChain.

Runnable interface

In LangChain, chaining a series of Runnable implementations together forms a Chain. So let us first look at the basic building block of a Chain: the Runnable interface.

methods of Runnable

The core method of the Runnable interface is invoke (with its async counterpart ainvoke).

The other key methods, such as batch, stream and their async counterparts, have default implementations built on top of invoke/ainvoke.
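A minimal sketch of these methods in action, using RunnableLambda from langchain_core (the simplest Runnable implementation, which wraps a plain function):

from langchain_core.runnables import RunnableLambda

# wrap a plain function as a Runnable
add_one = RunnableLambda(lambda x: x + 1)

print(add_one.invoke(1))         # 2 -- the core synchronous entry point
print(add_one.batch([1, 2, 3]))  # [2, 3, 4] -- default implementation maps invoke over the items
for chunk in add_one.stream(1):  # yields output chunks; a single chunk here
    print(chunk)                 # 2
# async counterparts: await add_one.ainvoke(1), .abatch(...), .astream(...)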

Runnables schematic information

Runnables expose schematic information about their input, output and config via the input_schema property, the output_schema property and the config_schema method.

Every Runnable implementation has a specific input schema and output schema, which can be obtained through the input_schema and output_schema properties.

Note that different Runnable implementations have different input/output schemas: the output schema of the preceding Runnable must match the input schema of the following Runnable for the two to be chained successfully.
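For example, a quick sketch inspecting a prompt's schemas (the schemas are Pydantic models):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

# the input schema tells you which keys invoke() expects
print(prompt.input_schema.schema())   # requires a "topic" field
# the output schema describes what invoke() returns (a ChatPromptValue here)
print(prompt.output_schema.schema())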

Runnable sequences

One key advantage of the Runnable interface is that any two runnables can be “chained” together into sequences. The output of the previous runnable’s .invoke() call is passed as input to the next runnable. This can be done using the pipe operator (|), or the more explicit .pipe() method

For example, chain = prompt | llm | output_parser chains prompt, llm and output_parser, each of them a Runnable implementation, into a single runnable sequence.

If you define a Runnable whose output structure differs from prompt's, that instance cannot be connected directly to llm. To connect them, you must make the data exchange between the new Runnable and llm compatible: either redefine your Runnable, or insert an adapter between them that converts the input/output data structures so the two parts can work together, as sketched below.
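A minimal sketch of the adapter idea, assuming a hypothetical upstream runnable that emits a dict the llm cannot consume directly (llm stands for any chat model, e.g. the ChatCohere instance in the next example):

from langchain_core.runnables import RunnableLambda

# hypothetical upstream runnable whose output schema does not match llm's input
upstream = RunnableLambda(lambda x: {"question": x["input"], "lang": "en"})

# adapter: pull out the plain string that llm expects
adapter = RunnableLambda(lambda d: d["question"])

chain = upstream | adapter | llm  # with the adapter in between, the schemas line up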

Core Implementation of Runnable: LLMChain

LLMChain is the most fundamental and widely used Runnable implementation in LangChain. As mentioned above, an LLMChain has three key components: a prompt, an llm and an output_parser.

Below is a complete example that combines a Prompt Template and an Output Parser into an LLMChain:

from langchain_cohere import ChatCohere
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.chains.llm import LLMChain
def basic_chain_example():
    """
    basic_chain_example
    """

    print("-"*20 + "basic_chain_example start" + "-"*20)
    
    # generate a prompt
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are world class technical documentation writer."),
        ("user", "{input}")
    ])

    # get an llm (load_config is assumed to be a user-defined helper that supplies the API key)
    llm = ChatCohere(cohere_api_key=load_config().cohere_api_key)

    # get an output_parser
    output_parser = StrOutputParser()

    # make a chain
    # chain = prompt | llm | output_parser
    chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

    # do a query
    resp = chain.invoke({"input": "how can langsmith help with testing?"})
    print(f"basic_chain_example resp is {resp}")
    print("-"*20 + "basic_chain_example end" + "-"*20)

The two forms chain = prompt | llm | output_parser and chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser) are functionally equivalent; they differ only in syntax.

chain = prompt | llm | output_parser uses the pipe (|) operator. This style chains multiple operations together so that the output of one operation becomes the input of the next; here prompt, llm and output_parser are connected into one chain object. It is concise and readable, but it relies on special methods behind the scenes (overriding __or__) to support the pipe operator. Concretely, __or__ is overridden in RunnableSerializable, which ChatPromptTemplate inherits via BasePromptTemplate.
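A toy illustration (not LangChain source) of how overriding __or__ lets a | b compose two steps:

class Pipeable:
    """Minimal pipeline element: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # compose: feed self's output into other
        return Pipeable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

double = Pipeable(lambda x: x * 2)
inc = Pipeable(lambda x: x + 1)
print((double | inc).invoke(3))  # 7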

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser) states more explicitly that we are creating an instance of the LLMChain class, and makes it easier to pass extra parameters at construction time.

The explicit constructor style is the more common one and is recommended.
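One practical difference worth noting (a rough sketch reusing prompt, llm and output_parser from the example above; the exact keys depend on the chain): the two styles return different shapes from invoke:

# pipe style: invoke returns whatever the last runnable (the parser) produces
pipe_chain = prompt | llm | output_parser
print(pipe_chain.invoke({"input": "hi"}))  # a plain string

# LLMChain style: invoke returns a dict of inputs and outputs
llm_chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)
print(llm_chain.invoke({"input": "hi"}))   # e.g. {"input": "hi", "text": "..."}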

Runnable framework overview

            Runnable interface
              /     |     \     \
            impl   impl   impl     Chain interface
            /       |       \       / (LLMChain as an impl)
      ------------------------------------
      |  prompt --> llm --> output_parser |      
      ------------------------------------

Agent

What is an Agent

We often talk about AI Agents. What exactly is an Agent?

Agent is a concrete class whose instances must carry the attributes llm_chain: LLMChain, allowed_tools and output_parser, as the from_llm_and_tools method shows directly. In short, an Agent is a further wrapper around an LLMChain that integrates Tools (Tools are introduced below).

What does an Agent do

The core goal of an Agent is to use the LLM to reason about which action to take next.

The key logic is: build the full prompt inputs from the intermediate steps, call the LLMChain, and parse the LLM's output into an action. This logic is encapsulated in the Agent's core method Agent.plan:

class Agent(BaseSingleActionAgent):
    llm_chain: LLMChain
    output_parser: AgentOutputParser
    allowed_tools: Optional[List[str]] = None

    ...

    def plan(
        self,
        intermediate_steps: List[Tuple[AgentAction, str]],
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Union[AgentAction, AgentFinish]:
        """Given input, decided what to do.

        Args:
            intermediate_steps: Steps the LLM has taken to date,
                along with observations
            callbacks: Callbacks to run.
            **kwargs: User inputs.

        Returns:
            Action specifying what tool to use.
        """
        # make full_inputs for llm_chain
        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
        # call `predict` on `llm_chain` with the concrete prompt
        full_output = self.llm_chain.predict(callbacks=callbacks, **full_inputs)
        # use `output_parser` to parse the `llm_chain`'s output, get -> (AgentAction | AgentFinish)
        return self.output_parser.parse(full_output)

As we can see, an Agent's call to the LLM is essentially no different from an ordinary LLMChain call: generate the concrete prompt -> call LLMChain -> parse the output from the llm.

So why can an Agent decide the next action based on the output of the LLMChain? The answer is simple.

For developers, the LLM itself cannot be modified, only called. All the developer controls is the prompt and the output_parser. The key design of an Agent is therefore twofold: it supplies a specific prompt, embedding the tool descriptions and the required answer format, to guide the LLM to reason, decide, and produce formatted output; and it configures a specific output_parser to parse that formatted output into an AgentAction | AgentFinish instance in the Python runtime.

For example, ZeroShotAgent's default PromptTemplate can be inspected as follows:

from langchain_google_community import GoogleSearchAPIWrapper
from langchain_core.tools import Tool
from langchain.agents import ZeroShotAgent

search = GoogleSearchAPIWrapper(
        google_api_key="xxx", google_cse_id="yyy")
google_tool = Tool(name="google search",
                    description="For any questions, you must use this tool to search Google for helpful results", func=search.run)

prompt = ZeroShotAgent.create_prompt(
    tools=[google_tool],
    input_variables=["input", "agent_scratchpad"]
)
print(f"prompt.template is:\n\n{prompt.template}")

output:

prompt.template is:

Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

google search: For any questions, you must use this tool to search Google for helpful results

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question



Begin! 

Question: {input}
Thought:{agent_scratchpad}

As you can see, prompt.template contains the tool descriptions and requires the llm to reply in the Thought/Action/Action Input/Observation format.
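To close the loop, here is a sketch of how an output parser turns that formatted text into an AgentAction. It uses MRKLOutputParser (ZeroShotAgent's default parser) and a hypothetical llm reply:

from langchain.agents.mrkl.output_parser import MRKLOutputParser

# hypothetical llm reply that follows the required format
llm_reply = """Thought: I should search for this.
Action: google search
Action Input: langsmith testing"""

action = MRKLOutputParser().parse(llm_reply)
print(type(action).__name__, action.tool, action.tool_input)
# AgentAction google search langsmith testing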

Tool

In LangChain, a Tool defines one tool. Depending on the developer's needs, a tool can do anything: a DB query tool, a search engine tool, a tokenizer, a sentiment analysis tool, and so on.

How to define a Tool

# google_tool is a Google search tool
search = GoogleSearchAPIWrapper(
        google_api_key="xxx", google_cse_id="yyy")
google_tool = Tool(name="google search",
                   description="For any questions, you must use this tool to search Google for helpful results",
                   func=search.run)  # func: Callable[..., str]
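Any Callable[..., str] can back a Tool. A minimal custom-tool sketch (word_count is a hypothetical helper, not a LangChain API):

from langchain_core.tools import Tool

def word_count(text: str) -> str:
    """Hypothetical helper: count the words in the input text."""
    return str(len(text.split()))

word_count_tool = Tool(
    name="word count",
    description="Counts the number of words in the given text.",
    func=word_count,
)
print(word_count_tool.run("hello brave new world"))  # "4"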

How does the LLM choose a Tool? In fact, the Agent's Tools are rendered into plain text and filled into the prompt of the Agent's llm, as in the ZeroShotAgent.create_prompt method mentioned above:

class ZeroShotAgent(Agent):
    ...

    @classmethod
    def create_prompt(
        cls,
        tools: Sequence[BaseTool],
        prefix: str = PREFIX,
        suffix: str = SUFFIX,
        format_instructions: str = FORMAT_INSTRUCTIONS,
        input_variables: Optional[List[str]] = None,
    ) -> PromptTemplate:
        """Create prompt in the style of the zero shot agent.

        Args:
            tools: List of tools the agent will have access to, used to format the
                prompt.
            prefix: String to put before the list of tools.
            suffix: String to put after the list of tools.
            input_variables: List of input variables the final prompt will expect.

        Returns:
            A PromptTemplate with the template assembled from the pieces here.
        """
        tool_strings = render_text_description(list(tools))  # render tools to text
        tool_names = ", ".join([tool.name for tool in tools])
        format_instructions = format_instructions.format(tool_names=tool_names)
        template = "\n\n".join([prefix, tool_strings, format_instructions, suffix])  # generate PromptTemplate
        if input_variables:
            return PromptTemplate(template=template, input_variables=input_variables)
        return PromptTemplate.from_template(template)

Calling create_prompt gives us a prompt with the tool descriptions embedded:

Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:

# Editor's note (by wxx), not part of the prompt: below is the rendered plain text of the tools, for the llm to comprehend and choose from
google search: For any questions, you must use this tool to search Google for helpful results
langsmith_search: Search for information about LangSmith. For any questions about LangSmith, you must use this tool!

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google search, langsmith_search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question



Begin! 

Question: {input}
Thought:{agent_scratchpad}

AgentExecutor

AgentExecutor is a concrete implementation of the Chain interface. Its core data structure is as follows:

class AgentExecutor(Chain):
    """Agent that is using tools."""

    agent: Union[BaseSingleActionAgent, BaseMultiActionAgent]
    """The agent to run for creating a plan and determining actions
    to take at each step of the execution loop."""
    tools: Sequence[BaseTool]
    """The valid tools the agent can call."""

With an Agent in hand, why do we still need an AgentExecutor? Because the Agent is only responsible for decisions (using the LLM to decide the next action), not for execution; the AgentExecutor follows the Agent's decisions and actually runs the chosen tool.

That is why AgentExecutor holds references to agent: Union[BaseSingleActionAgent, BaseMultiActionAgent] and tools: Sequence[BaseTool]: the AgentExecutor takes its "instructions" from the agent and executes the target tool.

Concretely, the agent executor is the runtime for an agent. It is what actually calls the agent, executes the actions the agent chooses, passes the action-related tool outputs back to the agent, and repeats. In pseudocode, this looks roughly like:

# logic of the agent executor
next_action = agent.get_action(...)  # the agent only holds the tools' rendered text, merged into the prompt so the llm can pick next_action
while next_action != AgentFinish:
    observation = run(next_action)  # the agent executor holds the tools themselves and can execute the target tool
    next_action = agent.get_action(..., next_action, observation)
return next_action

While this may seem simple, this runtime handles several complexities for you, including: the agent selecting a non-existent tool, tool execution errors, agent output that cannot be parsed into a tool invocation, and logging at every step.

The core method in which AgentExecutor implements the agent runtime:

class AgentExecutor(Chain):
    ...

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Run text through and get agent response."""
        # Construct a mapping of tool name to tool for easy lookup
        name_to_tool_map = {tool.name: tool for tool in self.tools}
        # We construct a mapping from each tool to a color, used for logging.
        color_mapping = get_color_mapping(
            [tool.name for tool in self.tools], excluded_colors=["green", "red"]
        )
        intermediate_steps: List[Tuple[AgentAction, str]] = []
        # Let's start tracking the number of iterations and time elapsed
        iterations = 0
        time_elapsed = 0.0
        start_time = time.time()
        # We now enter the agent loop (until it returns something).
        while self._should_continue(iterations, time_elapsed):
            next_step_output = self._take_next_step(
                name_to_tool_map,
                color_mapping,
                inputs,
                intermediate_steps,
                run_manager=run_manager,
            )
            if isinstance(next_step_output, AgentFinish):
                return self._return(
                    next_step_output, intermediate_steps, run_manager=run_manager
                )

            intermediate_steps.extend(next_step_output)
            if len(next_step_output) == 1:
                next_step_action = next_step_output[0]
                # See if tool should return directly
                tool_return = self._get_tool_return(next_step_action)
                if tool_return is not None:
                    return self._return(
                        tool_return, intermediate_steps, run_manager=run_manager
                    )
            iterations += 1
            time_elapsed = time.time() - start_time
        output = self.agent.return_stopped_response(
            self.early_stopping_method, intermediate_steps, **inputs
        )
        return self._return(output, intermediate_steps, run_manager=run_manager)

How AgentExecutor invokes a Tool in LangChain

In LangChain, the Agent chooses the next Tool, and the AgentExecutor actually invokes it.

The Agent produces an output containing an action and the action input: concretely, Agent.plan() returns an instruction of type Union[AgentAction, AgentFinish], and the AgentExecutor invokes the corresponding Tool according to that instruction.

Finally, the AgentExecutor returns the final answer produced by the Agent.
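Putting the pieces together, a minimal end-to-end sketch (reusing llm and google_tool from the earlier examples; verbose=True prints each Thought/Action/Observation step):

from langchain.agents import ZeroShotAgent, AgentExecutor

# build an agent (prompt + llm_chain + output_parser) from the llm and the tools
agent = ZeroShotAgent.from_llm_and_tools(llm=llm, tools=[google_tool])

# wrap it in the runtime that actually executes the chosen tools
executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=[google_tool], verbose=True)

resp = executor.invoke({"input": "how can langsmith help with testing?"})
print(resp)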
