An engineering-grade blueprint for an advanced AI Agent system, plus minimal runnable examples.
It covers everything requested:
- ✅ Tool-calling fine-tuning data generation
- ✅ LLaMA / Qwen LoRA fine-tuning script
- ✅ RAG (vector retrieval)
- ✅ Automatic tool selection
- ✅ Graph memory
- ✅ Multi-step reasoning agent
- ✅ ChatGPT-style interface
1. Final System Architecture
┌─────────────────┐
│  User Chat UI   │
└────────┬────────┘
         │
         ▼
┌──────────────────┐
│  Agent Planner   │
│ (Multi-step LLM) │
└────────┬─────────┘
         │
   ┌─────┴──────────────┐
   ▼                    ▼
Tool Selection     RAG Retriever
   │                    │
   ▼                    ▼
Tool Executor     Vector Database
   │                    │
   └─────────┬──────────┘
             ▼
       Graph Memory
             │
             ▼
      Final Response
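The control flow above can be sketched as a single loop; every callable here is a placeholder standing in for a module built in the later sections:

```python
# Minimal orchestration sketch of the architecture diagram.
# plan / select_tool / execute / retrieve / memory are stand-ins for the
# real planner, tool selector, executor, retriever, and graph memory.
def run_agent(user_msg, plan, select_tool, execute, retrieve, memory):
    steps = plan(user_msg)              # Agent Planner: multi-step reasoning
    tool_call = select_tool(steps)      # automatic tool selection from the plan
    tool_result = execute(tool_call)    # Tool Executor
    context = retrieve(user_msg)        # RAG retrieval from the vector database
    memory.add(user_msg, tool_result)   # record the interaction in graph memory
    return {"plan": steps, "tool_result": tool_result, "context": context}
```

A final LLM call would normally turn `tool_result` plus `context` into the response shown to the user.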
2. Project Layout
Suggested structure:
ai_agent/
│
├── train/
│ ├── generate_data.py
│ ├── train_lora.py
│
├── rag/
│ ├── build_index.py
│ ├── retriever.py
│
├── tools/
│ ├── calendar_tool.py
│ ├── database_tool.py
│ ├── search_tool.py
│
├── memory/
│ ├── graph_memory.py
│
├── agent/
│ ├── planner.py
│ ├── executor.py
│
├── server/
│ ├── api.py
│
└── ui/
├── chat_app.py
3. Generating 100 Tool-Calling Training Examples
train/generate_data.py

import json
import random

# Each entry pairs a tool with the user queries that actually invoke it,
# so the instruction and the emitted tool call never contradict each other.
tools = [
    ("query_database", "查詢預算資料", ["幫我查一下預算", "查詢銷售數據"]),
    ("add_calendar_event", "新增行事曆", ["幫我安排會議", "明天下午三點安排會議"]),
    ("search_web", "搜尋網路", ["找一下AI新聞", "幫我找AI新聞"]),
]

def generate_example():
    name, _desc, queries = random.choice(tools)
    if name == "query_database":
        params = {"query_str": "budget"}
    elif name == "add_calendar_event":
        params = {"title": "會議", "start_time": "tomorrow 15:00"}
    else:  # search_web
        params = {"query": "AI news"}
    return {
        "instruction": random.choice(queries),
        "output": {"tool": name, "params": params},
    }

data = [generate_example() for _ in range(100)]
with open("tool_data.json", "w") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
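A quick schema check on the generated records catches tool/params mismatches before training; `is_valid` below is an illustrative helper, not part of the generator:

```python
# Hypothetical validator for one record in tool_data.json.
VALID_TOOLS = {"query_database", "add_calendar_event", "search_web"}

def is_valid(record):
    # A record must pair a text instruction with a structured tool call.
    if not isinstance(record.get("instruction"), str):
        return False
    out = record.get("output")
    if not isinstance(out, dict):
        return False
    return out.get("tool") in VALID_TOOLS and isinstance(out.get("params"), dict)
```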
4. LLaMA / Qwen LoRA Fine-Tuning Script
train/train_lora.py
Requires:
pip install transformers peft trl datasets accelerate bitsandbytes

import json
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer

model_name = "Qwen/Qwen2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    device_map="auto",
)

dataset = load_dataset("json", data_files="tool_data.json")

# Train on the instruction AND the expected tool call; training on the
# instruction field alone would never teach the model to emit tool calls.
def to_text(example):
    example["text"] = example["instruction"] + "\n" + json.dumps(
        example["output"], ensure_ascii=False)
    return example

dataset = dataset.map(to_text)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    peft_config=lora_config,
    dataset_text_field="text",
)
trainer.train()
trainer.save_model("agent_model")
5. RAG System
Build the vector index
rag/build_index.py

from sentence_transformers import SentenceTransformer
import faiss
import pickle

docs = [
    "2026 公司預算為 300 萬",
    "AI 市場預計成長 20%",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(docs)          # float32 array, shape (n_docs, 384)

index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

faiss.write_index(index, "vector.index")
pickle.dump(docs, open("docs.pkl", "wb"))
Retriever
rag/retriever.py

import faiss
import pickle
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("vector.index")
docs = pickle.load(open("docs.pkl", "rb"))

def retrieve(query, k=3):
    emb = model.encode([query])
    k = min(k, index.ntotal)   # never ask for more neighbors than stored docs
    D, I = index.search(emb, k)
    return [docs[i] for i in I[0]]
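For intuition, IndexFlatL2 performs exhaustive L2 search; the same computation written out in NumPy (illustration only, not part of the retriever):

```python
import numpy as np

# Brute-force L2 search, i.e. what faiss.IndexFlatL2.search computes:
# squared distance to every stored vector, then the k smallest.
def l2_search(index_vectors, query, k):
    diffs = index_vectors - query            # broadcast over (n, d)
    dists = (diffs ** 2).sum(axis=1)         # squared L2 distance per vector
    order = np.argsort(dists)[:k]
    return dists[order], order

vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
d, i = l2_search(vecs, np.array([0.9, 0.1]), k=2)  # nearest is vecs[1]
```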
6. Tool System
tools/database_tool.py

def query_database(query_str):
    # Toy in-memory lookup; swap in a real database client in production.
    db = {
        "budget": "2026 預算 300 萬"
    }
    return db.get(query_str, "not found")
7. Graph Memory
memory/graph_memory.py

import networkx as nx

class GraphMemory:
    """Directed graph linking each user to the facts stored about them."""

    def __init__(self):
        self.graph = nx.DiGraph()

    def add_memory(self, user, info):
        self.graph.add_node(user)
        self.graph.add_node(info)
        self.graph.add_edge(user, info)

    def query(self, user):
        if user not in self.graph:   # unknown user: no memories yet
            return []
        return list(self.graph.neighbors(user))
8. Multi-step Reasoning Agent
agent/planner.py

from transformers import pipeline

# "agent_model" is the LoRA output directory saved by train_lora.py.
llm = pipeline("text-generation", model="agent_model")

def plan(task):
    prompt = f"""
Task: {task}
Think step by step and decide which tools to call.
"""
    return llm(prompt, max_new_tokens=256)[0]["generated_text"]
9. Tool Executor
agent/executor.py

import json
from tools.database_tool import query_database

def execute(tool_call):
    # tool_call is a JSON string: {"tool": "...", "params": {...}}
    data = json.loads(tool_call)
    if data["tool"] == "query_database":
        return query_database(**data["params"])
    raise ValueError(f"unknown tool: {data['tool']}")
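plan() returns free-form text while execute() expects a pure JSON string, so something has to bridge them; a naive extractor like the one below is one option (`extract_tool_call` is an assumption, not part of the original code):

```python
import json
import re

# Pull the first JSON object out of LLM output so it can be fed to execute().
# Naive: the greedy regex assumes the text contains a single JSON object.
def extract_tool_call(text):
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```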
10. ChatGPT-style API
server/api.py
pip install fastapi uvicorn

from fastapi import FastAPI
from pydantic import BaseModel
from agent.planner import plan

app = FastAPI()

class ChatRequest(BaseModel):
    msg: str                     # JSON body: {"msg": "..."}

@app.post("/chat")
def chat(req: ChatRequest):
    return {"response": plan(req.msg)}

Start it (from inside server/):
uvicorn api:app --reload
11. Simple Chat UI
ui/chat_app.py
pip install streamlit

import streamlit as st
import requests

st.title("AI Agent")

msg = st.text_input("Message")
if st.button("Send"):
    r = requests.post(
        "http://localhost:8000/chat",
        json={"msg": msg},
    )
    st.write(r.json())

Run it:
streamlit run ui/chat_app.py
12. Local Training on an RTX 4090
conda create -n agent python=3.10
conda activate agent
Install:
pip install torch transformers accelerate peft trl bitsandbytes
GPU requirement:
RTX 4090 (24 GB VRAM)
Trainable with QLoRA:
- Qwen2 7B
- Llama3 8B
- Mistral 7B
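The load_in_4bit=True shortcut in the training script works, but recent transformers versions prefer an explicit quantization config; a typical 4-bit (QLoRA) setup for fitting a 7B-8B model in 24 GB looks like this (the values are common defaults, not tuned for any specific model):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute: the usual QLoRA memory recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
# Then: AutoModelForCausalLM.from_pretrained(model_name,
#           quantization_config=bnb_config, device_map="auto")
```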
13. Full Agent Flow
At inference time:
User
↓
Planner LLM
↓
Tool Selection
↓
Tool Execution
↓
RAG Retrieval
↓
Graph Memory update
↓
LLM Final Answer
14. Upgrading to AutoGPT Level
Then add:
1️⃣ Task decomposition: Goal → Subtasks
2️⃣ Long-term memory: Weaviate, ChromaDB, or Milvus
3️⃣ Self-reflection: an LLM critic
4️⃣ Planning graph: LangGraph
15. Strongest Local Open-Source Stack (2026)
Recommended:
- LLM: Qwen2.5 14B
- Embedding: bge-m3
- Vector DB: Milvus
- Agent framework: LangGraph
- Fine-tuning: QLoRA
All of it runs on a single 4090.