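I want to share a little about "the technology behind LLMs, built up from the lowest layers" each day, and keep every post under three minutes of reading time, so that nobody feels too much pressure but everyone can still grow a little each day.
Continuing from AI說書 - 從0開始 - 218 | OpenAI GPT 4 & RAG, where the required dependencies were installed, today we write a function that retrieves article content for a user query and summarizes it: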
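import requests
from bs4 import BeautifulSoup
from transformers import pipeline

def fetch_and_summarize(user_query):
    # select_urls_based_on_query is not defined in this post; it is assumed to return a list of URLs
    urls = select_urls_based_on_query(user_query)
    # Load the DistilBART summarization model
    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    summaries = []
    for url in urls:
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'html.parser')
        # Prefer the <article> element; otherwise fall back to joining the text of all <p> tags
        article = soup.find('article')
        if article:
            article_text = article.get_text()
        else:
            paragraphs = soup.find_all('p')
            article_text = ' '.join([para.get_text() for para in paragraphs])
        # Keep only the first 1024 characters so the summarizer input stays short
        if len(article_text) > 1024:
            article_text = article_text[:1024]
        # Deterministic decoding (do_sample=False), summary length bounded by min_length and max_length
        summary = summarizer(article_text, max_length=130, min_length=30, do_sample=False)[0]['summary_text']
        summaries.append(summary)
    return summaries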
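
The function above calls select_urls_based_on_query, which this post does not define. To try fetch_and_summarize on its own, a stand-in along the following lines could be used. This is only a minimal sketch and not the series' actual implementation: the keyword table, the example.com URLs, and the DEFAULT_URLS fallback are all hypothetical placeholders that should be swapped for real article URLs.

```python
# Hypothetical stand-in for select_urls_based_on_query (not the original helper).
# The keywords and URLs below are placeholders chosen only for illustration.
KEYWORD_URLS = {
    "climate": ["https://example.com/climate-overview"],
    "rag": ["https://example.com/rag-intro"],
}
DEFAULT_URLS = ["https://example.com/general"]


def select_urls_based_on_query(user_query):
    """Return the placeholder URLs whose keyword occurs in the query."""
    query = user_query.lower()
    urls = []
    for keyword, keyword_urls in KEYWORD_URLS.items():
        # Simple case-insensitive substring match; no tokenization or ranking
        if keyword in query:
            urls.extend(keyword_urls)
    # Fall back to a copy of the default list when no keyword matches
    return urls or list(DEFAULT_URLS)
```

With such a stand-in defined before fetch_and_summarize, a call like fetch_and_summarize("What is RAG?") would fetch each selected URL and return one summary per page.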