Updated 2024/10/21 · Reading time: about 3 minutes

AI說書 - 從0開始 (AI Storytelling: Starting from 0) - 220 | GPT 4 & RAG Article Retriever

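I want to share a little each day about how LLMs are built up from the underlying layers, and I keep every article under three minutes of reading so that nobody feels too much pressure but everyone can still grow a bit each day.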


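Continuing from AI說書 - 從0開始 - 218 | OpenAI GPT 4 & RAG, where we installed the required dependencies, today we write a function that retrieves article content and summarizes it: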

import requests
from bs4 import BeautifulSoup
from transformers import pipeline

def fetch_and_summarize(user_query):
    # Pick the URLs to read for this query (helper not shown here)
    urls = select_urls_based_on_query(user_query)
    summarizer = pipeline("summarization", model = "sshleifer/distilbart-cnn-12-6")
    summaries = []

    for url in urls:
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'html.parser')
        # Prefer the <article> element; otherwise fall back to all <p> tags
        article = soup.find('article')
        if article:
            article_text = article.get_text()
        else:
            paragraphs = soup.find_all('p')
            article_text = ' '.join([para.get_text() for para in paragraphs])

        # Keep only the first 1024 characters to limit the summarizer's input
        if len(article_text) > 1024:
            article_text = article_text[:1024]

        summary = summarizer(article_text, max_length = 130, min_length = 30, do_sample = False)[0]['summary_text']
        summaries.append(summary)
    return summaries
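
The function above calls select_urls_based_on_query, which is not defined in this article. To make the flow easier to follow, here is a minimal sketch of what such a helper might look like, assuming a simple keyword-to-URL lookup. The KEYWORD_URLS table and the example.com addresses are hypothetical placeholders, not the sources actually used in this series.

```python
import re

# Hypothetical keyword-to-URL table; the example.com addresses are placeholders.
KEYWORD_URLS = {
    "climate": ["https://example.com/climate-overview"],
    "rag": ["https://example.com/rag-intro", "https://example.com/rag-evaluation"],
    "gpt": ["https://example.com/gpt-4-notes"],
}

def select_urls_based_on_query(user_query):
    """Return the URLs whose keyword appears as a whole word in the query."""
    # Split into lowercase alphanumeric words so "rag" does not match "storage"
    words = set(re.findall(r"[a-z0-9]+", user_query.lower()))
    urls = []
    for keyword, keyword_urls in KEYWORD_URLS.items():
        if keyword in words:
            for url in keyword_urls:
                if url not in urls:  # keep insertion order, skip duplicates
                    urls.append(url)
    return urls
```

With a helper like this in place, fetch_and_summarize("How does RAG work with GPT 4?") would download and summarize the pages listed under the "rag" and "gpt" keywords. A query that matches no keyword yields an empty list, so the loop is skipped and the function returns no summaries.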



