Building a Custom Chatbot: From RAG to a LangChain Agent, Hands-On

Published 2024/08/22 · Updated 2024/08/22 · About a 2-minute read

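Ever since ChatGPT took off, many companies have wanted to build their own chatbots to serve customers, and the job market has started listing roles such as LLM engineer and even chatbot engineer/trainer/manager.

Building a chatbot on an LLM that can answer questions about your company's internal information takes more than inserting that information into a prompt and calling an LLM API. A lot of custom preprocessing and postprocessing has to be developed around it: for example, handling inserted data that contains many images or even videos, or a database with many tables, and customizing the LLM's output with product links, image captions, and so on.

Setting aside that tedious pre- and postprocessing, a custom chatbot whose knowledge can be updated at any time can basically be built with RAG and an agent.

Below I walk through how to use a LangChain agent to build a chatbot with both retrieval and search capabilities.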

Retrieval Augmented Generation (RAG)

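When the extra information we want the LLM to know (e.g. internal company information, FAQs, ...) is too long, we can't write all of it into the prompt because of the input token limit. That's when we need RAG, so the LLM can work from retrieved passages instead.

I think the figure below explains it clearly: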

Gao, Yunfan et al. “Retrieval-Augmented Generation for Large Language Models: A Survey.” (2023).

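Start from the indexing block in the upper right. First we prepare the Documents, the text containing the extra knowledge we want the LLM to have. We then split this long text into many chunks, convert the chunks to embeddings, and store them in a vectorstore.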
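To make the "split into chunks" step concrete, here is a minimal sketch of a fixed-size sliding window with overlap. It is only an illustration: the function name split_with_overlap is made up, and it is not what RecursiveCharacterTextSplitter does internally (that splitter also tries to break on separators such as paragraph breaks before falling back to smaller pieces).

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that overlap their neighbours."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.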

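Next, when a User Query comes in, we embed the query the same way, run a vector similarity search in the vectorstore to find the k closest chunks, and insert the original text of those chunks into the prompt so the LLM has the relevant knowledge to answer the user's question.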
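The retrieve-then-prompt flow can be sketched in plain Python. This is a toy under loud assumptions: embed is just word counts standing in for a real embedding model, the "vectorstore" is a Python list, and every name here (embed, cosine, retrieve, build_prompt, toy_chunks) is made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system uses a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Insert the retrieved chunk text into the prompt ahead of the question."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Use the context to answer.\n\n{context}\n\nQuestion: {query}"


toy_chunks = [
    "a lease is a contract to rent property",
    "a gift is a transfer of property without payment",
    "marriage requires two witnesses",
]
top = retrieve("what is a lease contract", toy_chunks, k=1)
print(top)
print(build_prompt("what is a lease contract", toy_chunks, k=2))
```

Only the top-k chunks reach the prompt, which is how RAG stays under the token limit no matter how long the source document is.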

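The concept is easy to grasp, but in practice there are many ways to improve on it. The paper Retrieval-Augmented Generation for Large Language Models: A Survey covers many of them; if basic RAG doesn't meet your needs, it's worth a look.

Now let's see how to put together a RAG pipeline in LangChain. Suppose the text I want to retrieve from is this PDF of the Civil Code of the Republic of China (中華民國民法), 171 pages in total.

Without further ado, here's the code (I'm using langchain 0.2.11):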

# The old langchain.* import paths for loaders, vectorstores, embeddings and chat models
# are deprecated in langchain 0.2; these are the current locations.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings  # needs the langchain-openai package
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain


system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

def get_qa_chain(pdf_path):
    # read file
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()
    
    # split your docs into texts chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = text_splitter.split_documents(documents)
    
    # embed the chunks into vectorstore (FAISS)
    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_documents(texts, embeddings)
    
    # create retriever and rag chain
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
    question_answer_chain = create_stuff_documents_chain(llm=ChatOpenAI(model_name='gpt-4o-mini', temperature=0),
                                                         prompt=rag_prompt)
    rag_chain = create_retrieval_chain(retriever, question_answer_chain)
    
    return rag_chain
    
rag_chain = get_qa_chain('民法.pdf')

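This is basically copy-pasted from the official tutorials XD

As you can see, LangChain already wraps a lot for us: loading the PDF, embedding, the vectorstore, vector search, and wiring up the LLM. Just thinking about building all of that by hand is exhausting 💦

Here I use OpenAI's gpt-4o-mini, so remember to set your API key in the OPENAI_API_KEY environment variable.

Now let's see how rag_chain handles a conversation:

Looks decent. We can also inspect result['context'] (result being the dict returned by rag_chain.invoke) to see the k closest chunks that RAG retrieved.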
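As a rough sketch of that inspection: a chain built with create_retrieval_chain returns a dict with the keys input, context, and answer, where context is a list of documents carrying page_content and metadata. The helper summarize_result below is hypothetical, and fake_result is a hand-made stand-in with placeholder text so the snippet runs without an API key.

```python
from types import SimpleNamespace


def summarize_result(result: dict, preview_chars: int = 60) -> str:
    """Format the answer plus a short preview of each retrieved chunk."""
    lines = [f"Q: {result['input']}", f"A: {result['answer']}"]
    for i, doc in enumerate(result["context"], start=1):
        page = doc.metadata.get("page", "?")  # PyPDFLoader stores the page index here
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{i}] page {page}: {preview}")
    return "\n".join(lines)


# With the real chain this would be (needs OPENAI_API_KEY, so left commented out):
# result = rag_chain.invoke({"input": "your question here"})

# Hand-made stand-in with the same shape; all text below is placeholder data.
fake_result = {
    "input": "placeholder question",
    "answer": "placeholder answer",
    "context": [
        SimpleNamespace(page_content="placeholder chunk one", metadata={"page": 12}),
        SimpleNamespace(page_content="placeholder chunk two", metadata={"page": 40}),
    ],
}
print(summarize_result(fake_result))
```

Looking at the retrieved chunks next to the answer is a quick way to tell whether a bad answer comes from retrieval (wrong chunks) or from generation (right chunks, wrong reply).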

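You'll notice that no matter what you ask, it always retrieves and then generates a response, even if you're just saying hello. So we'd like it to decide for itself whether a question needs retrieval and to run different functions for different questions. That is the idea behind an agent.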

Agent

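Our goal now is to let the LLM decide whether the client's question needs retrieval, and to run different functions depending on the question.

We could set up an extra LLM to do this ourselves, as in the figure below, using two LLMs: the green LLM decides which tool the client question needs (RAG, Google search, or none), the tool is executed to fetch information, and that information is then put into the blue LLM's prompt with an instruction to use it when generating the reply to the client.

One possible flow for a hand-rolled agent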
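The two-LLM flow described above can be sketched as follows. Both LLM calls are stubbed with plain functions so the snippet runs offline; in a real version router_llm and answer_llm would each call an LLM API with a suitable prompt. All names (answer, fake_router, fake_answer, toy_tools) are hypothetical.

```python
from typing import Callable, Dict


def answer(question: str,
           router_llm: Callable[[str], str],
           answer_llm: Callable[[str], str],
           tools: Dict[str, Callable[[str], str]]) -> str:
    """Route the question to a tool (or none), then generate the reply."""
    choice = router_llm(question).strip()  # "green" LLM: pick a tool label
    # An unknown label or "none" means: answer without any tool.
    info = tools[choice](question) if choice in tools else ""
    prompt = f"Information:\n{info}\n\nQuestion: {question}" if info else question
    return answer_llm(prompt)  # "blue" LLM: write the final reply


def fake_router(question: str) -> str:
    """Stub for the routing LLM: crude keyword rules instead of a model call."""
    q = question.lower()
    if "law" in q:
        return "rag"
    if "weather" in q:
        return "search"
    return "none"


def fake_answer(prompt: str) -> str:
    """Stub for the answering LLM: echoes the prompt it received."""
    return f"ANSWER<{prompt}>"


toy_tools = {
    "rag": lambda q: f"[rag] {q}",
    "search": lambda q: f"[search] {q}",
}

print(answer("Hello!", fake_router, fake_answer, toy_tools))
print(answer("A law question", fake_router, fake_answer, toy_tools))
```

A greeting skips the tools entirely, which is exactly the behaviour the plain rag_chain could not give us.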

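We can also do this with LangChain's agent.

Each approach has its pros and cons. Rolling your own isn't complicated: you use prompt engineering to have the LLM choose a tool. A LangChain agent takes less effort, but it is more complex and harder to debug, and LangChain keeps releasing new versions that change how its APIs are used.

But I'm a bit too lazy to write that much, so let's just implement the agent with LangChain XD

Suppose that besides the rag_chain from the previous section, I also want to give the agent Google search. In other words, the agent gets two tools: RAG to search the Civil Code, and Google search.

We already built rag_chain in the previous section. For Google search we use a search API; here I use Serper. Registering on the Serper website gets you an API key (set it as the SERPER_API_KEY environment variable) along with 2,500 free queries. LangChain also provides GoogleSerperAPIWrapper, which lets us do this in one line XD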

from langchain_community.utilities import GoogleSerperAPIWrapper
search = GoogleSerperAPIWrapper()

A quick test of search:

OK, search runs a Google search for us and returns what it found as a string.

Now wrap rag_chain and search together as tools:

from langchain.agents import Tool, AgentExecutor, create_react_agent

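def ask_civil_code(question: str) -> str:
    # The agent passes a plain string, but rag_chain expects {"input": ...}
    # and returns a dict; hand only the answer text back to the agent.
    return rag_chain.invoke({"input": question})["answer"]

tools = [
    Tool(
        name="RAG Legal",
        func=ask_civil_code,
        description="Useful when you're asked legal-related questions"
    ),
    Tool(
        name="Google Search",
        func=search.run,
        description="For answering questions that are not related to legal or when you don't know the answer, use Google search to find the answer",
    )
]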

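The tool description is what the LLM uses to decide which tool a question needs, so describe each tool as clearly as you can for your use case.

Next we design a prompt that constrains how the agent reasons, so it ends up producing the reply we want: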
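To see why the description matters, it helps to picture how the tools end up in the prompt. The sketch below assumes a simple "name: description" rendering; LangChain's actual rendering differs in detail between versions, and ToolSpec, render_tools, and render_tool_names are made-up names, not LangChain APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a tool: just the fields the LLM gets to read."""
    name: str
    description: str


def render_tools(tools: List[ToolSpec]) -> str:
    """Text for a {tools} placeholder: one 'name: description' line per tool."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def render_tool_names(tools: List[ToolSpec]) -> str:
    """Text for a {tool_names} placeholder: a comma-separated list of names."""
    return ", ".join(t.name for t in tools)


specs = [
    ToolSpec("RAG Legal", "Useful when you're asked legal-related questions"),
    ToolSpec("Google Search", "Use for questions that are not legal-related"),
]
print(render_tools(specs))
print(render_tool_names(specs))
```

The LLM never sees the Python function behind a tool, only this text, so a vague description gives it nothing to choose on.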

character_prompt = """Answer the following questions as best you can. You have access to the following tools:
{tools}

For any questions requiring tools, you should first search the provided knowledge base. If you don't find relevant information from provided knowledge base, then use Google search to find related information.

To use a tool, you MUST use the following format:
1. Thought: Do I need to use a tool? Yes
2. Action: the action to take, should be one of [{tool_names}]
3. Action Input: the input to the action
4. Observation: the result of the action

When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the following format:
1. Thought: Do I need to use a tool? No
2. Final Answer: [your response here]

It's very important to always include the 'Thought' before any 'Action' or 'Final Answer'. Ensure your output strictly follows the formats above.

Begin!

Previous conversation history:
{chat_history}

Question: {input}
Thought: {agent_scratchpad}
"""

You may wonder how I came up with this prompt. You can find examples on LangChain's prompt hub and adapt them:

from langchain import hub
hub.pull("hwchase17/react")

Then use create_react_agent to bundle the tools and prompt into an agent, and AgentExecutor to run the agent and produce replies. I also added memory while I was at it:

from langchain.prompts.prompt import PromptTemplate
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

chat_model = ChatOpenAI(model_name='gpt-4',
                        temperature=0,
                        streaming=True,
                        verbose=True,
                        max_tokens=1024,
                        )

prompt = PromptTemplate.from_template(character_prompt)
agent = create_react_agent(chat_model, tools, prompt)

memory = ConversationBufferWindowMemory(memory_key='chat_history', k=5, return_messages=True, output_key="output")
agent_chain = AgentExecutor(agent=agent,
                            tools=tools,
                            memory=memory,
                            max_iterations=5,
                            handle_parsing_errors=True,
                            verbose=True,
                            )

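This agent_chain is now ready for conversation. When the user inputs a question, it decides whether to use a tool and which one, executes the action, and then checks from the tool's response whether it has enough information. If it does, it generates a reply from what it gathered; if not, it keeps executing actions until it reaches max_iterations.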
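That loop can be sketched in a few lines. This is a simplified illustration of the idea, not AgentExecutor's implementation: decide stands in for the LLM call plus output parsing, and run_agent, STOP_MESSAGE, lookup, and decide_once are all hypothetical names.

```python
STOP_MESSAGE = "Stopped: iteration limit reached."  # made-up text, not LangChain's message


def run_agent(question, decide, tools, max_iterations=5):
    """Alternate between deciding and acting until a final answer or the limit."""
    scratchpad = []  # (tool_name, tool_input, observation) triples gathered so far
    for _ in range(max_iterations):
        step = decide(question, scratchpad)  # in a real agent: LLM call + parsing
        if step["type"] == "final":
            return step["text"]
        observation = tools[step["tool"]](step["input"])  # run the chosen tool
        scratchpad.append((step["tool"], step["input"], observation))
    return STOP_MESSAGE


calls = []


def lookup(query: str) -> str:
    """Toy tool that records each call and returns a canned note."""
    calls.append(query)
    return f"note about {query}"


def decide_once(question, scratchpad):
    """Stub policy: use the tool once, then answer from the observation."""
    if not scratchpad:
        return {"type": "action", "tool": "lookup", "input": question}
    return {"type": "final", "text": f"Based on: {scratchpad[-1][2]}"}


result = run_agent("leases", decide_once, {"lookup": lookup})
print(result)
```

The max_iterations cap is what keeps a confused model from calling tools forever.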

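One problem you run into easily here: when the LLM doesn't follow the response format our prompt asks for, you get an LLM parsing error, and you'll find that it has clearly gathered enough information to answer the question yet can't use it to reply. I tried to mitigate this with prompt engineering. As you can see above, the prompt keeps stressing "you MUST use the following format" and "It's very important to always include the 'Thought' before any 'Action' or 'Final Answer'. Ensure your output strictly follows the formats above."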
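To show why the format matters so much, here is a minimal sketch of a parser for this kind of output. It is not LangChain's actual parser, just an assumption-laden illustration (parse_step and ACTION_RE are made-up names) of how the reply text gets turned into either a tool call or a final answer.

```python
import re

# Matches "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\n\s*Action Input:\s*(.*)", re.DOTALL)


def parse_step(llm_text: str) -> dict:
    """Turn raw LLM text into a final answer or a tool call, or fail loudly."""
    if "Final Answer:" in llm_text:
        return {"type": "final", "text": llm_text.split("Final Answer:", 1)[1].strip()}
    match = ACTION_RE.search(llm_text)
    if match:
        return {"type": "action",
                "tool": match.group(1).strip(),
                "input": match.group(2).strip()}
    raise ValueError(f"Could not parse LLM output: {llm_text!r}")


print(parse_step("Thought: Do I need to use a tool? Yes\n"
                 "Action: Google Search\n"
                 "Action Input: weather in Taipei"))
print(parse_step("Thought: Do I need to use a tool? No\nFinal Answer: Hi there"))

try:
    parse_step("I think the answer is 42.")  # has an answer but no label
except ValueError as err:
    print("parse error:", err)
```

The last call is the failure described above: the text contains an answer, but without the "Final Answer:" label the parser has nothing to hold on to.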

I also used gpt-4 here, because I found that larger models follow the prompt instructions better.

Finally, let's look at agent_chain's conversation results.