Using Amazon SageMaker
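BLOOM-176B is the largest available BLOOM model. We can deploy it with a SageMaker Deep Learning Container (DLC). Because the model is so large, deployment requires a large instance type. Amazon SageMaker's support for distributed computing lets us split the model's layers and parameters across multiple GPUs. In this walkthrough, we use DeepSpeed for tensor parallelism, which divides the model's tensor computations across those GPUs. Follow the steps below to deploy BLOOM-176B using an existing AWS Jupyter notebook.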
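A quick weights-only estimate helps explain why the model has to be split across GPUs. The sketch below makes several assumptions that are not in the source. It assumes 176e9 parameters, the 8 GPUs with roughly 40 GB each on ml.p4d.24xlarge, an even split at tensor-parallel degree 8, and decimal gigabytes. It ignores activations and the KV cache. The source does not say which precision the notebook's model weights use.

```python
# Rough, weights-only memory estimate for tensor-parallel BLOOM-176B.
# Assumptions (not from the notebook): 176e9 parameters, 8 GPUs with ~40 GB
# each (ml.p4d.24xlarge), decimal gigabytes, activations and KV cache ignored.

NUM_PARAMS = 176e9
NUM_GPUS = 8
GPU_MEMORY_GB = 40

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_per_gpu_gb(dtype: str, tensor_parallel_degree: int = NUM_GPUS) -> float:
    """Weight memory per GPU when every layer's tensors are split evenly."""
    total_gb = NUM_PARAMS * BYTES_PER_PARAM[dtype] / 1e9
    return total_gb / tensor_parallel_degree


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        per_gpu = weights_per_gpu_gb(dtype)
        fits = per_gpu < GPU_MEMORY_GB
        print(f"{dtype}: {per_gpu:.0f} GB per GPU, fits in {GPU_MEMORY_GB} GB: {fits}")
```

Running it prints 88, 44 and 22 GB per GPU for fp32, fp16 and int8. Under these assumptions, no single GPU could hold the model. Even the fp16 weights split eight ways would slightly exceed one 40 GB GPU, so a lower-precision format would be needed on this instance type.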
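- Open Amazon SageMaker Studio in the AWS Region of your choice. When choosing a Region, check which instance types are available there, because this model requires ml.p4d.24xlarge.
- In Amazon SageMaker Studio, clone the amazon-sagemaker-examples repository.
- Go to the path inference/nlp/realtime/llm/bloom_176b/ and open the notebook “djl_deepspeed_deploy.ipynb”.
- Run all cells in the notebook. The last five cells clean up resources, so run them only when you have finished using the endpoint. Also note the two configurable options in the notebook:
  - By default, the notebook reuses a copy of the BLOOM model that was downloaded from Hugging Face and stored in the Amazon S3 bucket “sagemaker-sample-files”. If you would rather download the model from Hugging Face and store it in your own Amazon S3 bucket, set the variable install_model_locally to True.
  - Use the VpcConfig section if you want to specify a VpcConfig for the model endpoint. For security reasons, we recommend keeping your AWS resources running inside your own VPC. If you choose to use VpcConfig, run the optional cells in that section and uncomment VpcConfig=privateVpcConfig in the endpoint-creation step.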
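The optional VPC setting and the cleanup cells both map onto standard SageMaker API calls. In that API, VpcConfig is a parameter of CreateModel and takes SecurityGroupIds and Subnets. This is a minimal sketch assuming boto3. The helper names and every ARN, image URI and S3 path are hypothetical placeholders, and the notebook's own cells may be organized differently.

```python
from typing import Optional, Sequence


def build_create_model_request(
    model_name: str,
    role_arn: str,
    image_uri: str,
    model_data_url: str,
    subnets: Optional[Sequence[str]] = None,
    security_group_ids: Optional[Sequence[str]] = None,
) -> dict:
    """Build keyword arguments for SageMaker CreateModel, optionally inside a VPC."""
    request = {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
    }
    if subnets and security_group_ids:
        # Plays the role of VpcConfig=privateVpcConfig in the notebook:
        # the model containers run in these subnets with these security groups.
        request["VpcConfig"] = {
            "SecurityGroupIds": list(security_group_ids),
            "Subnets": list(subnets),
        }
    return request


def create_model(request: dict) -> None:
    """Register the model with SageMaker (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works offline

    boto3.client("sagemaker").create_model(**request)


def cleanup(endpoint_name: str, endpoint_config_name: str, model_name: str) -> None:
    """Delete the endpoint (which stops instance charges), its config, and the model."""
    import boto3

    sm = boto3.client("sagemaker")
    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm.delete_model(ModelName=model_name)
```

Keeping the request builder separate from the boto3 calls lets you inspect the exact CreateModel arguments before anything is created in your account.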
After the endpoint has been created, you can invoke it with the notebook section “Leverage the Boto3 to invoke the endpoint”, as in the following example.
Query:
{
  "input": [
    "Cloud computing advances",
    "AWS is the best"
  ],
  "gen_kwargs": {
    "min_length": 20,
    "max_new_tokens": 100,
    "temperature": 0.8,
    "num_beams": 5,
    "no_repeat_ngram_size": 2
  }
}
Response:
CPU times: user 18.6 ms, sys: 0 ns, total: 18.6 ms
Wall time: 11.8 s
'[\n "Cloud computing advances in the last few years have made it possible to store and process large amounts of data in a cost-effective manner. Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models",\n "AWS is the best cloud computing service provider in the world. It provides a wide range of services to its customers. The services provided by the company are as follows:\\nThe company has a large number of data centers in different parts of the globe. This is done to ensure that the services are available to the customers at all