vocus (方格子)

Does Specialization Make a Difference? Gemini 2.0 Flash Thinking vs. the Image-Specialized Version: Reverse-Engineering Generated Images

林楷庭（熊三飛彈）

Published in 科科科科科

Updated 2025/03/25 · Published 2025/03/25 · 11 min read

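Because Google keeps all of its lab's new experiments inside the AI Studio web page, most people don't have a very high opinion of Gemini. This post documents an Image to Text (prompts) test.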

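The rules are simple: I fed images I had generated in the past into Gemini 2.0 Flash Thinking and into the image-specialized version available on the AI Studio web page, then compared the results. All settings were left at their defaults.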

Gemini 2.0 Flash Thinking

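Based on the image you uploaded, I infer that the prompt may have included the following elements:

* Subject: A young anime-style woman wearing a train conductor's uniform.
* Clothing details: Black jacket with gold trim, white multi-layered skirt, red gloves, long black boots, and a conductor's hat.
* Background: Appears to be a train station platform or the interior of a train, with elements such as rails and train cars.
* Visual style: Anime style, finely detailed artwork.

Putting this together, one possible prompt is:

anime girl, young woman, train conductor uniform, black jacket with gold trim, white ruffled skirt, red gloves, black boots, conductor hat, train station platform, train in background, detailed, high quality

Of course, this is only one of many possible prompts.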

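(As you can see, it is short and to the point. There is a pile of filler up front, but the result is still quite good: it pulled out the core of the image.)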
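Because the suggested prompt is just a comma-separated list of tags, it is easy to handle programmatically. Here is a minimal sketch, assuming plain comma-separated tags with no weighting syntax. The helper name `split_prompt` is hypothetical and not part of any tool used in this test. It splits the prompt into normalized tags so it can be counted or compared with other candidate prompts:

```python
def split_prompt(prompt: str) -> list[str]:
    """Split a comma-separated prompt into lowercase tags.

    Hypothetical helper for illustration: collapses extra whitespace,
    drops empty entries, and removes duplicates while keeping order.
    """
    seen = set()
    tags = []
    for raw in prompt.split(","):
        tag = " ".join(raw.lower().split())  # normalize case and spacing
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


# The prompt suggested by Gemini 2.0 Flash Thinking, copied from the output above.
basic_prompt = (
    "anime girl, young woman, train conductor uniform, "
    "black jacket with gold trim, white ruffled skirt, red gloves, "
    "black boots, conductor hat, train station platform, "
    "train in background, detailed, high quality"
)

tags = split_prompt(basic_prompt)
print(len(tags))  # number of distinct tags
print(tags)
```

Keeping the tags as a list rather than one long string makes it simple to check later which details one model caught and another missed.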

The image-specialized version's thinking

Core Subject:

* A young woman: This is the central figure.
* Anime style / Manga style: Evident in the large eyes, detailed hair, and overall aesthetic.
* Train conductor uniform inspired: The hat, jacket with gold trim, potentially the gloves, and the overall formal yet slightly stylized look suggest this.
* Long black hair: The flowing, dark hair is a prominent feature. (The finer hair details were not captured, but for the model that level of detail is unnecessary.)
* Red or pink eyes: The eye color is clearly visible. (Even Grok struggles with eye color: what it produces looks consistent from a distance but turns out to be heterochromia up close.)

Clothing Details:

* Conductor's hat: Likely with a specific design or emblem (though partially obscured). (The obscured emblem is actually a problem that goes back to when the LoRA was trained.)
* Dark jacket with gold trim/detailing: The jacket has a structured shape and noticeable gold accents.
* White frilly blouse/shirt: Visible at the neckline and possibly cuffs.
* Corset or fitted waist: The outfit seems to cinch at the waist.
* Long, light-colored skirt with ruffles: The skirt is voluminous and has layered ruffles at the hem.
* Red gloves: Bright red gloves stand out.
* Dark boots or stockings: Visible below the skirt. (It apparently could not tell which of the two it was.)

(The character is based on Hachiroku (八六) from まいてつ.)

Setting/Background:

* Inside a train or train station: The presence of train carriages, overhead structures, pipes, and possibly platform elements indicates this. (Very close. My original prompt specified the driver's cab.)
* Slightly industrial or vintage feel: The visible machinery and architectural details hint at this. (It is a steam locomotive; how could it not look vintage?)
* Soft, diffused lighting: The lighting doesn't seem harsh, creating a slightly dreamy atmosphere.

(For the lighting, my original prompt specified sunlight.)

Artistic Style and Details:

* Detailed linework: Common in anime and manga.
* Smooth shading: The rendering of the character and background suggests smooth gradients.
* Emphasis on hair detail: Individual strands and flow of the hair are well-defined. (Not fine enough by human standards, but very fine for a computer.)
* Slightly low angle perspective: The viewer seems to be looking slightly up at the character.

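By this point you will notice that it lists every detail it caught and explains why each one is worth listing. Its vocabulary is also considerably more precise than that of the plain thinking version.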

But what really sets it apart is what comes next: multiple candidate prompts.

Possible Prompts (Variations): Here are a few ways these elements could have been combined in prompts:

Simple Prompt:
anime girl train conductor uniform, long black hair, red eyes, inside a train, detailed, soft lighting

More Specific Prompt:
young woman with long black flowing hair and red eyes wearing a stylized train conductor uniform with gold trim and a white frilly skirt, red gloves, inside a vintage train carriage, detailed anime art style, soft ambient light, low angle view

Prompt with Artistic Style Keywords:
anime illustration of a beautiful girl in a modified train conductor outfit, black hair, pink eyes, frilly white blouse, long ruffled light beige skirt, red gloves, standing inside a train station with visible train details, detailed shading, high-quality

Prompt Emphasizing Mood:
dreamy anime girl, train conductor uniform inspired, black hair flowing, captivating red eyes, delicate white ruffles, long skirt, inside an old train, atmospheric lighting, detailed background

Negative Prompts (Things to exclude, though not always explicitly stated):
realistic, photorealistic, messy, blurry, bad anatomy, multiple people

Key Takeaways for Reversing Prompts:

(My goodness, it even teaches you how to reverse-engineer prompts.)

* Identify the core subject and its defining features.
* Break down the visual details of clothing, setting, and artistic style.
* Consider the overall mood and atmosphere.
* Think about keywords commonly used in image generation for desired styles (e.g., "anime," "detailed," "soft lighting").
* Recognize that there isn't always one exact prompt, but rather a combination of descriptive terms.

The generated image likely used a combination of these elements, and potentially more specific details or stylistic keywords depending on the AI model used.
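The takeaways above boil down to two steps: break the image into categories, then recombine them into prompts of varying detail. Here is a rough sketch of that idea, under stated assumptions. The `ImageBreakdown` class and its field names are hypothetical, the tags are taken from the model output quoted in this post, and joining tags with commas is just a common convention, not a requirement of any particular image model:

```python
from dataclasses import dataclass, field


@dataclass
class ImageBreakdown:
    """Hypothetical container for the details read off an image."""
    subject: list[str] = field(default_factory=list)
    clothing: list[str] = field(default_factory=list)
    setting: list[str] = field(default_factory=list)
    style: list[str] = field(default_factory=list)
    mood: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)

    def to_prompt(
        self,
        include=("subject", "clothing", "setting", "style", "mood"),
    ) -> str:
        """Join the tags of the chosen categories, in the order given."""
        tags = []
        for category in include:
            tags.extend(getattr(self, category))
        return ", ".join(tags)

    def to_negative_prompt(self) -> str:
        """Join the tags that should be excluded from the image."""
        return ", ".join(self.negative)


# Tags below are copied from the model output quoted in this post.
breakdown = ImageBreakdown(
    subject=["anime girl", "long black hair", "red eyes"],
    clothing=["train conductor uniform", "black jacket with gold trim", "red gloves"],
    setting=["inside a vintage train carriage"],
    style=["detailed", "low angle view"],
    mood=["soft lighting"],
    negative=["realistic", "photorealistic", "messy", "blurry",
              "bad anatomy", "multiple people"],
)

simple = breakdown.to_prompt(include=("subject", "setting"))  # short variation
full = breakdown.to_prompt()                                  # every category
print(simple)
print(full)
print(breakdown.to_negative_prompt())
```

Choosing which categories to include is what produces the "simple", "more specific", and "mood" variations. The negative tags are kept separate because many image generators take them as a separate input.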
