Day 3 Start with the Vertex AI Gemini API and Python SDK


Published in Cloud AI Study Jams

Updated 2024/10/17 · Published 2024/10/17 · Reading time: about 21 minutes

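All content comes from the course Beginner: Introduction to Generative AI Learning Path. These notes capture only the parts I wanted to record; for the details, please see the course provided by Google.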

Today there are two exercises:

Try out the Gemini 1.0 Pro model
Try out the Gemini 1.0 Pro Vision model

Setup: open the Google Cloud Console, click the "☰" menu in the top-left corner, and select Vertex AI > Workbench.

Gemini 1.0 Pro model

# Import the required classes (this must run before the model is loaded)
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Image,
    Part,
)

# Load the model
model = GenerativeModel("gemini-1.0-pro")

1. Ask a general question

responses = model.generate_content("Why is the sky blue?", stream=True)

# responses is a generator object, e.g.
# <generator object _GenerativeModel._generate_content_streaming at 0x7f8439187d80>,
# so iterate over it with a for loop to print the streamed chunks

for response in responses:
    print(response.text, end="")

The sky appears blue because of a phenomenon called Rayleigh scattering. This occurs when sunlight, which is made up of all the colors of the rainbow, interacts with the Earth's atmosphere. The blue wavelengths of light are scattered more easily by the tiny particles in the atmosphere than the longer wavelengths like red and yellow.

This scattered blue light reaches our eyes from all directions in the sky, making it appear blue. The scattering is most efficient at shorter wavelengths, which is why the sky appears a deeper blue when looking directly upwards, where the light has to travel through more atmosphere.

Here's a little more information about Rayleigh scattering:

* It was first described by Lord Rayleigh in the late 1800s.
* It's why sunsets are often red or orange. As the sun sets, the sunlight has to travel through more atmosphere, and the blue light is scattered away, leaving the longer wavelengths like red and orange to reach our eyes.
* It's also responsible for the blue color of some birds' feathers and the clear blue eyes of some humans.

I hope this explanation helps! Let me know if you have any other questions.
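Because the streamed reply above arrives as a generator of chunks, it can only be read once. The following is a minimal sketch of collecting the chunks into a single string. `fake_stream` is a hypothetical stand-in for `model.generate_content(..., stream=True)`, used so the example runs without a Google Cloud project; the only assumption about real chunks is that each one exposes `.text`, as in the loop above.

```python
from types import SimpleNamespace


def collect_stream(responses):
    """Join the .text of every streamed chunk into one string."""
    return "".join(chunk.text for chunk in responses)


def fake_stream():
    # Hypothetical stand-in for model.generate_content(..., stream=True)
    for piece in ["The sky ", "appears ", "blue."]:
        yield SimpleNamespace(text=piece)


stream = fake_stream()
full_text = collect_stream(stream)
print(full_text)

# A generator is exhausted after one pass, so a second read yields nothing
second_read = collect_stream(stream)
print(repr(second_read))
```

If the full text is needed later (for logging or parsing), it is probably simplest to collect it once like this rather than iterate over the stream again.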

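2. Model parameters

1. temperature: Controls how random the generated reply is. The lower the value, the more deterministic the LLM's reply, at the cost of creativity and open-endedness. For details, see: https://docs.cohere.com/docs/temperature
2. top_p & top_k: Before emitting each token, the LLM assigns a probability to every candidate token, and the candidates are ranked by that probability. top_p, also known as nucleus sampling, keeps a variable number of candidates: it takes the highest-ranked tokens until their cumulative probability reaches p. For example, top_p=0.15 keeps only the top-ranked tokens that together account for 15% of the probability mass. top_k instead keeps a fixed number of candidates: the k tokens with the highest likelihood scores. For both top_p and top_k, the smaller the value, the more fixed the generated content. For details, see: https://docs.cohere.com/docs/controlling-generation-with-top-k-top-p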
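To make the three parameters concrete, here is a toy sketch of temperature scaling followed by top_k and top_p filtering over a four-word vocabulary. It is an illustration only and not how Gemini is implemented internally; the function names, the toy scores, and the order in which the filters are applied are all assumptions made for this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the k most likely tokens, then the shortest prefix whose
    cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample_next_token(logits, temperature=0.9, top_k=32, top_p=1.0, rng=random):
    """Sample one token from the filtered, renormalized distribution."""
    probs = softmax_with_temperature(logits, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    return rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]


# Hypothetical scores for the next word after "The sky is"
toy_logits = {"blue": 2.0, "clear": 1.0, "grey": 0.5, "green": -1.0}

probs = softmax_with_temperature(toy_logits, temperature=1.0)
# With top_p=0.15 only the single most likely token survives
print(filter_top_k_top_p(probs, top_k=4, top_p=0.15))
print(sample_next_token(toy_logits))
```

With top_k=1 the sampler always returns the most likely token, which is why small values make the output more repeatable.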

generation_config = GenerationConfig(
    temperature=0.9, 
    top_p=1.0,
    top_k=32,
    candidate_count=1,
    max_output_tokens=8192,
)

responses = model.generate_content(
    "Why is the sky blue?",
    generation_config=generation_config,
    stream=True,
)

for response in responses:
    print(response.text, end="")

3. Gemini supports multi-turn conversations (memory)

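# Start a chat session
chat = model.start_chat()

# First turn
prompt = """My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.
Suggest another movie I might like.
"""
responses = chat.send_message(prompt, stream=True)
# Iterate over the stream so the turn completes and is recorded in chat.history
for response in responses:
    print(response.text, end="")

# Second turn
prompt2 = "Are my favorite movies based on a book series?"
responses = chat.send_message(prompt2, stream=True)
for response in responses:
    print(response.text, end="")

# Print the conversation history
print(chat.history)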

[role: "user"
parts {
  text: "My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.\n\nSuggest another movie I might like.\n"
}
, role: "model"
parts {
  text: "Hi Ned, since you love Lord of the Rings and The Hobbit, both fantasy epics with grand adventures, mythical creatures, and themes of good versus evil, you might enjoy:\n\n* **Willow (1988):**  A classic fantasy film with a similar whimsical feel to LOTR, featuring a dwarf warrior on a quest to protect a baby from an evil queen. \n* **The Chronicles of Narnia (series):**  Another epic tale about a group of children who discover a magical world.\n\nLet me know what you think!  I can offer more suggestions based on what you liked most about LOTR and The Hobbit.  Did you enjoy the battles, the friendships, the magical creatures, or something else entirely?  \360\237\230\212 \n"
}
, role: "user"
parts {
  text: "Are my favorite movies based on a book series?"
}
, role: "model"
parts {
  text: "Yes, Ned, both *The Lord of the Rings* and *The Hobbit* are based on book series written by J.R.R. Tolkien! \n\n* **The Hobbit** is a standalone book.\n* **The Lord of the Rings** is a trilogy, consisting of:\n    * *The Fellowship of the Ring*\n    * *The Two Towers*\n    * *The Return of the King*\n\nMany people consider them to be some of the greatest fantasy novels ever written! Have you read any of them? \n"
}
]
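The raw history printed above is hard to scan. The following is a small sketch of formatting it as one line per turn. It assumes that each history entry exposes `.role` and `.parts` and that every part is text with a `.text` attribute, which matches the output shown above; `format_history` is a name made up for this example, and the stand-in objects replace a real `chat.history` so the snippet runs on its own.

```python
from types import SimpleNamespace


def format_history(history):
    """Render each turn as '[role] text', assuming text-only parts."""
    lines = []
    for content in history:
        text = "".join(part.text for part in content.parts)
        lines.append(f"[{content.role}] {text.strip()}")
    return "\n".join(lines)


# Hypothetical stand-in for chat.history with the same attribute names
demo_history = [
    SimpleNamespace(
        role="user",
        parts=[SimpleNamespace(text="Are my favorite movies based on a book series?")],
    ),
    SimpleNamespace(
        role="model",
        parts=[SimpleNamespace(text="Yes, Ned, both are based on books by J.R.R. Tolkien!\n")],
    ),
]

formatted = format_history(demo_history)
print(formatted)
```

In a real session, `format_history(chat.history)` would be the intended call; histories that contain image or video parts would need extra handling.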

Gemini 1.0 Pro Vision model

# load model
multimodal_model = GenerativeModel("gemini-1.0-pro-vision")

Ways to load images and videos (full examples follow below):

# Load an image from a local file
image = Image.load_from_file("image.jpg")
# Load an image from GCS
image = Part.from_uri(gcs_uri, mime_type="image/jpeg")
# Load an image directly from a URL
image = load_image_from_url(image_url) # convert to bytes
# Load a video
video = Part.from_uri(video_uri, mime_type="video/mp4")
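The snippets in this section call `load_image_from_url` and `print_multimodal_prompt`, which are helper functions defined in the course notebook and not reproduced in these notes. The following is a simplified sketch of what such helpers might look like, not the course's own code: `get_image_bytes_from_url` and `describe_prompt_part` are names introduced here, and this `print_multimodal_prompt` prints a text placeholder for each non-text part and does not render images.

```python
import urllib.request


def get_image_bytes_from_url(image_url: str) -> bytes:
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(image_url) as response:
        return response.read()


def load_image_from_url(image_url: str):
    """Fetch an image over HTTP(S) and wrap it as a Vertex AI Image."""
    # Imported lazily so the byte-fetching helper works without the SDK installed
    from vertexai.generative_models import Image

    return Image.from_bytes(get_image_bytes_from_url(image_url))


def describe_prompt_part(content) -> str:
    """Return text parts as-is and a type placeholder for everything else."""
    if isinstance(content, str):
        return content
    return f"<{type(content).__name__}>"


def print_multimodal_prompt(contents: list) -> None:
    """Simplified stand-in: print each prompt part on its own line."""
    for content in contents:
        print(describe_prompt_part(content))


print_multimodal_prompt(["Describe the scene?", b"raw image bytes"])
```

Defining helpers like these before running the examples below avoids a `NameError`; in a notebook, the placeholder could be swapped for an inline image display.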

Example 1. Generate text from a local image

# Download an image from Google Cloud Storage
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

# Load from local file
image = Image.load_from_file("image.jpg")

# Prepare contents
prompt = "Describe this image?"
contents = [image, prompt]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

Example 2. Load an image from GCS

# Load image from Cloud Storage URI
gcs_uri = "gs://cloud-samples-data/generative-ai/image/boats.jpeg"

# Prepare contents
image = Part.from_uri(gcs_uri, mime_type="image/jpeg")
prompt = "Describe the scene?"
contents = [image, prompt]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

Example 3. Load an image from a URL

# Load image from a public HTTPS URL
image_url = (
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/boats.jpeg"
)
image = load_image_from_url(image_url)  # convert to bytes

# Prepare contents
prompt = "Describe the scene?"
contents = [image, prompt]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

Example 4. Multiple images + few-shot prompting

# Load images from public HTTPS URLs
image1_url = "https://storage.googleapis.com/github-repo/img/gemini/intro/landmark1.jpg"
image2_url = "https://storage.googleapis.com/github-repo/img/gemini/intro/landmark2.jpg"
image3_url = "https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg"
image1 = load_image_from_url(image1_url)
image2 = load_image_from_url(image2_url)
image3 = load_image_from_url(image3_url)

# Prepare prompts
prompt1 = """{"city": "London", "Landmark": "Big Ben"}"""
prompt2 = """{"city": "Paris", "Landmark": "Eiffel Tower"}"""

# Prepare contents
contents = [image1, prompt1, image2, prompt2, image3]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

Example 5. Load a video and request the reply in JSON format

prompt = """
Answer the following questions using the video only:
What is the profession of the main person?
What are the main features of the phone highlighted?
Which city was this recorded in?
Provide the answer JSON.
"""

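# video_uri must be set beforehand to the URI of an MP4 video (e.g. a gs:// URI); its value is not shown in these notes
video = Part.from_uri(video_uri, mime_type="video/mp4")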
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

for response in responses:
    print(response.text, end="")

```json
{
  "person": {
    "profession": "photographer"
  },
  "phone": {
    "features": [
      "Video Boost",
      "Night Sight"
    ]
  },
  "city": "Tokyo"
}
```
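As the output above shows, the model wraps its JSON answer in a Markdown code fence, so the reply cannot be passed to `json.loads` directly. Here is a minimal sketch of stripping the fence first. `parse_json_reply` is a hypothetical helper written for this note, and it assumes the reply contains at most one fenced block holding valid JSON.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example itself stays readable


def parse_json_reply(reply_text: str):
    """Extract and parse JSON from a reply, with or without a Markdown fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, reply_text, flags=re.DOTALL)
    payload = match.group(1) if match else reply_text
    return json.loads(payload)


# Shortened copy of the reply shown above
reply = FENCE + 'json\n{"person": {"profession": "photographer"}, "city": "Tokyo"}\n' + FENCE
answer = parse_json_reply(reply)
print(answer["city"])
print(answer["person"]["profession"])
```

With `stream=True`, the chunks would first need to be joined into one string before parsing, since the fence and the JSON body can be split across chunks.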