Deploying the Multilingual Large Language Model BLOOM on AWS

Updated 2023/07/10 · Published 2023/07/07 · 10 min read

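Large language models (LLMs) are currently one of the most active areas of AI/ML, and in a short time they have produced notable breakthroughs in natural language processing and text generation. Over the past two years, advances in deep learning and hardware have driven rapid progress in LLMs, with a far-reaching impact on language-processing applications.
The LLM trend can be traced back to the Transformer architecture and to pre-trained models built on it, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers). These models use deep neural networks and attention mechanisms, and learn the statistical properties and semantic relationships of language through large-scale unsupervised pre-training. The pre-trained models can then be fine-tuned for specific tasks such as text classification, named entity recognition, and sentiment analysis.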

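LLMs have a wide range of applications. They can be used for natural language understanding and generation tasks such as machine translation, text summarization, dialogue systems, and question answering, as well as for more advanced text generation such as automated writing, script generation, and poetry. LLMs have also driven notable progress in information retrieval, recommender systems, and knowledge graph construction.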

This article describes how to deploy and use the multilingual large language model BLOOM, specifically bloom-176B, on Amazon SageMaker.

The Multilingual Large Language Model BLOOM (bloom-176B)

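BLOOM is a pre-trained large language model based on a decoder-only Transformer architecture. Its architecture is similar to GPT-3 (175B parameters) and is optimized for text generation. As a language model, BLOOM works by using the preceding text to predict the next token, repeating this step until a complete passage has been produced. bloom-176B, one of the pre-trained models in the BLOOM family, uses the following model architecture and objective function: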

Model Architecture and Objective
Modified from Megatron-LM GPT2
Decoder-only architecture
Layer normalization applied to word embeddings layer (StableEmbedding)
ALiBI positional encodings, with GeLU activation functions

176,247,271,424 parameters:
    3,596,615,680 embedding parameters
    70 layers, 112 attention heads
    Hidden layers are 14336-dimensional
    Sequence length of 2048 tokens used (see BLOOM tokenizer, tokenizer description)
    Objective Function: Cross Entropy with mean reduction.
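
As a sanity check, the total above can be reproduced from the listed dimensions. The sketch below makes two assumptions that are not stated in the list: a padded vocabulary of 250,880 tokens (derived here as embedding parameters divided by hidden size), and a standard GPT-style block with a fused query/key/value projection, a 4x MLP expansion, and biases on every linear layer.

```python
# Values taken from the architecture list above.
HIDDEN = 14336
LAYERS = 70
EMBEDDING_PARAMS = 3_596_615_680

# Assumption: the vocabulary size is implied by embedding params / hidden size.
VOCAB = EMBEDDING_PARAMS // HIDDEN  # 250,880


def count_parameters(hidden: int, layers: int, vocab: int) -> int:
    """Count parameters for an assumed GPT-style decoder with biases."""
    embedding = vocab * hidden
    # Fused QKV projection plus the attention output projection.
    attention = (3 * hidden * hidden + 3 * hidden) + (hidden * hidden + hidden)
    # Two linear layers with a 4x expansion in between.
    mlp = (4 * hidden * hidden + 4 * hidden) + (4 * hidden * hidden + hidden)
    # Two layer norms per block, each with a weight and a bias vector.
    block_norms = 2 * (2 * hidden)
    per_layer = attention + mlp + block_norms
    # Layer norm on the embeddings plus the final layer norm.
    extra_norms = 2 * (2 * hidden)
    return embedding + layers * per_layer + extra_norms


total = count_parameters(HIDDEN, LAYERS, VOCAB)
print(f"{total:,}")  # 176,247,271,424
```

ALiBi positional encodings add no learned parameters, which is why no positional embedding term appears in the count.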

Deploying Bloom-176B on AWS

Deployment requirements

An AWS account.
Onboarding to Amazon SageMaker Studio (initial setup usually takes about 10 minutes).
For BLOOM-176B, the ml.p4d.24xlarge instance type is recommended.
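
A rough, weights-only memory estimate helps explain why such a large instance is recommended. The sketch below assumes that ml.p4d.24xlarge provides 8 GPUs with 40 GB of memory each (not stated in this article), and it ignores activations, the attention cache, and framework overhead, so real requirements are higher.

```python
PARAMS = 176_247_271_424  # total parameters of bloom-176B

# Assumptions about the instance, not taken from this article.
GPU_COUNT = 8
GPU_MEMORY_GB = 40

BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return params * bytes_per_param / 1e9


total_gpu_gb = GPU_COUNT * GPU_MEMORY_GB
for dtype, nbytes in BYTES_PER_PARAM.items():
    need = weights_gb(PARAMS, nbytes)
    fits = need < total_gpu_gb
    print(f"{dtype:>9}: {need:7.1f} GB of weights, fits in {total_gpu_gb} GB: {fits}")
```

Under these assumptions, 16-bit weights alone (about 352 GB) exceed the combined 320 GB of GPU memory, while 8-bit weights (about 176 GB) fit. This suggests that a deployment on a single instance of this type depends on reduced-precision weights or a similar memory-saving technique.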

Using Amazon SageMaker

Bloom-176B is the largest Bloom model available. We can deploy it using a SageMaker Deep Learning Container (DLC).

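Given the size of the model, deployment requires a large instance type. SageMaker's support for distributed computing makes it straightforward to spread the model's layers and parameters across multiple GPUs. In this walkthrough, we use DeepSpeed for tensor parallelism.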
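
The idea behind tensor parallelism can be illustrated without any GPU. The toy sketch below splits the columns of one weight matrix across a hypothetical number of "devices", computes each partial product independently, and concatenates the results. It is a pure-Python illustration of column-wise sharding, not a description of how DeepSpeed is implemented, and all function names are made up for the example.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, num_shards):
    """Split matrix w column-wise into num_shards equally sized shards."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("columns must divide evenly across shards")
    width = cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]


def tensor_parallel_matmul(x, w, num_shards):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, num_shards)
    # In a real system each shard would live on, and be multiplied by, its own GPU.
    partials = [matmul(x, shard) for shard in shards]
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
result = tensor_parallel_matmul(x, w, num_shards=2)
print(result)  # [11.0, 14.0, 17.0, 20.0]
```

Each shard only needs to hold its own slice of the weights, which is what lets a model too large for one GPU be spread across several.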

Follow these steps to deploy Bloom-176B using an existing AWS Jupyter notebook:

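Open Amazon SageMaker Studio in the AWS Region of your choice. When choosing a Region, check which instance types are available there; this model requires ml.p4d.24xlarge.
In Amazon SageMaker Studio, clone the amazon-sagemaker-examples repository.
Go to the path inference/nlp/realtime/llm/bloom_176b/ and open the notebook "djl_deepspeed_deploy.ipynb".
Run all the cells in the notebook. Note that the last 5 cells clean up resources, so run them only when you have finished using the endpoint. Also note the two configurable options in the notebook: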

Downloading the model from the Hugging Face Hub

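By default, the notebook reuses a copy of the Bloom model, originally downloaded from Hugging Face, that is stored in the Amazon S3 bucket "sagemaker-sample-files". If you would rather download the model from Hugging Face and store it in your own Amazon S3 bucket, set the variable install_model_locally to True.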

Creating the endpoint with a VpcConfig

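Use this section if you want to specify a VpcConfig for the model endpoint. For security reasons, it is recommended to keep AWS resources running inside your own VPC. If you choose to use VpcConfig, run the optional cells in this section and uncomment the line VpcConfig=privateVpcConfig in the step that creates the endpoint.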
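
For reference, the VpcConfig accepted by SageMaker's CreateModel API is a dictionary with Subnets and SecurityGroupIds lists. The sketch below builds one. The subnet and security group IDs are placeholders, and the helper name build_vpc_config is hypothetical rather than something defined in the notebook.

```python
def build_vpc_config(subnet_ids, security_group_ids):
    """Build a VpcConfig dictionary in the shape SageMaker expects."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("at least one subnet and one security group are required")
    return {
        "Subnets": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


# Placeholder IDs; replace them with the subnets and security groups of your VPC.
privateVpcConfig = build_vpc_config(
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
print(privateVpcConfig)
```

The resulting dictionary is what would be passed as VpcConfig=privateVpcConfig once that line is uncommented.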

Using the BLOOM model

Once the endpoint has been created, you can use the notebook section "Leverage the Boto3 to invoke the endpoint", as in the following example:

Query:

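%%time
import json

import boto3

# SageMaker runtime client used to call the deployed endpoint.
smr_client = boto3.client("sagemaker-runtime")

# endpoint_name is the name of the endpoint created earlier in the notebook.
smr_client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(
        {
            # Prompts to complete; one generated text is returned per prompt.
            "input": [
                "Cloud computing advances",
                "AWS is the best",
            ],
            # Generation settings passed along with the prompts.
            "gen_kwargs": {
                "min_length": 20,
                "max_new_tokens": 100,
                "temperature": 0.8,
                "num_beams": 5,
                "no_repeat_ngram_size": 2,
            },
        }
    ),
    ContentType="application/json",
)["Body"].read().decode("utf8")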

Response:

CPU times: user 18.6 ms, sys: 0 ns, total: 18.6 ms
Wall time: 11.8 s

'[\n  "Cloud computing advances in the last few years have made
 it possible to store and process large amounts of data in a 
cost-effective manner. Cloud computing is a model for enabling
 ubiquitous, convenient, on-demand network access to a shared 
pool of configurable computing resources (e.g., networks, 
servers, storage, applications, and services) that can be 
rapidly provisioned and released with minimal management 
effort or service provider interaction. This cloud model 
promotes availability and is composed of five essential 
characteristics, three service models",\n  "AWS is the best
 cloud computing service provider in the world. It provides a
 wide range of services to its customers. The services provided
 by the company are as follows:\\nThe company has a large number 
of data centers in different parts of the globe. This is done to
 ensure that the services are available to the customers at all 
times."\n]'
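
The decoded response body above is a JSON array with one generated string per prompt, so it can be turned into a Python list with json.loads. The sketch below uses a shortened, made-up body of the same shape; the helper name parse_generations is hypothetical.

```python
import json


def parse_generations(body: str) -> list:
    """Parse a decoded response body into a list of generated texts."""
    texts = json.loads(body)
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise ValueError("expected a JSON array of strings")
    return texts


# Shortened stand-in for the decoded response body shown above.
sample_body = '[\n  "Cloud computing advances ...",\n  "AWS is the best ..."\n]'

generations = parse_generations(sample_body)
for prompt_index, text in enumerate(generations):
    print(prompt_index, text)
```

The order of the returned texts follows the order of the prompts in the "input" list of the request.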