
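Translator | 李睿
Reviewer | 重樓
Imagine sending memes and cartoon stickers that are entirely your own instead of relying on generic material from the web. With OpenAI's recently released GPT-Image-1 model, you can turn selfies or everyday photos into funny or distinctively styled personal stickers. This article shows how to build a WhatsApp sticker generator with Python in Colab that supports several art styles, including caricature and Pixar-style filters.
Specifically, it covers how to set up the OpenAI image edit API, capture or upload an image in Colab, choose a preset funny phrase or enter custom text, and use multiple API keys to generate three stickers in different styles in parallel, which shortens the wait considerably. The end result is a sticker maker driven by GPT-Image-1 and custom text prompts.
Why GPT-Image-1?
For this article, leading image generation models such as Gemini 2.0 Flash, Flux, and Phoenix were evaluated on the Leonardo.ai platform. All of them turned out to struggle with rendering text and facial expressions accurately. For example:
- Google's Gemini 2.0 image API frequently produces misspellings or scrambled text even when given explicit instructions. Given "Big Sale Today!", the output may read "Big Sale Todai" or contain random gibberish.
- Flux produces images of high overall quality, but users commonly report that it "tends to introduce subtle errors when rendering text". Misspellings and gibberish become more noticeable as the text gets longer. The model also tends to generate very similar facial features, so unless it is tightly constrained, every face starts to look the same.
- Phoenix is optimized for image fidelity and prompt adherence, but like most diffusion models it treats text as a visual element rather than semantic content, so text errors are common. In the evaluation, Phoenix produced correctly worded stickers only occasionally and kept returning the same default facial features for similar prompts.
In short, these limitations of existing models motivated the choice of GPT-Image-1 for this project. Unlike the models above, GPT-Image-1 is used here with a specifically tuned prompt pipeline that explicitly pushes the model toward correct text and varied expressions.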
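The text errors described above can be made measurable. The following is a minimal sketch, assuming the text on a generated sticker has already been read back by some means (for example an OCR step, which is not part of the original script). The function names `text_similarity` and `text_matches` and the 0.95 threshold are illustrative choices, not something the article defines.

```python
from difflib import SequenceMatcher


def text_similarity(expected: str, rendered: str) -> float:
    """Return a 0..1 similarity ratio between the requested and rendered text."""
    # Case and surrounding whitespace are ignored so that only real
    # spelling differences lower the score.
    a = expected.strip().lower()
    b = rendered.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def text_matches(expected: str, rendered: str, threshold: float = 0.95) -> bool:
    """Hypothetical acceptance check: True when the rendered text is close enough."""
    return text_similarity(expected, rendered) >= threshold
```

With this check, a rendering such as "Big Sale Todai" for the requested "Big Sale Today!" would be rejected, while an exact match passes.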
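How GPT-Image-1 Edits Images
GPT-Image-1 is OpenAI's flagship multimodal model. It creates and edits images from text and image prompts and outputs high-quality results. Its core capability here is applying a specified edit to an original image according to a text instruction. In this article's example, the GPT-Image-1 API is called to apply a playful filter to the input photo and overlay text, producing a personalized sticker.
A carefully built prompt constrains the model to output an image that meets the sticker specification (1024×1024, PNG). In effect, GPT-Image-1 becomes an AI-powered sticker creator: it changes the appearance of the photo's subject and adds humorous text, automating the whole process.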
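Because the prompt only asks for a 1024×1024 PNG, it can be worth verifying what actually comes back. The sketch below reads the width and height from the PNG header using only the standard library. The helper names `png_dimensions` and `is_sticker_sized` are hypothetical and do not appear in the original script.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_sticker_sized(data: bytes, expected=(1024, 1024)) -> bool:
    """Check whether PNG bytes have the expected sticker dimensions."""
    return png_dimensions(data) == expected
```

A check like this could run on the decoded image bytes before they are written to disk, so that an unexpected size is caught early.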
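Python
# Set up OpenAI clients for each API key (to run parallel requests)
# API_KEYS is the list of keys defined in the setup code later in this article
from openai import OpenAI

clients = [OpenAI(api_key=key) for key in API_KEYS]
In other words, a separate OpenAI client is created for each API key. With three independent keys configured, three API calls can run at the same time. This multi-key, multi-threaded approach relies on ThreadPoolExecutor for parallel processing, so each run produces three stickers at once. As the script's output puts it, the stickers are generated "in parallel with 3 API keys", which speeds up sticker creation noticeably.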
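The effect of running the calls concurrently can be illustrated without touching any API. The sketch below replaces the image request with a stand-in that only sleeps; `fake_api_call` and `run_in_parallel` are hypothetical names, and the 0.2 second delay is an arbitrary illustration value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(label: str, delay: float = 0.2) -> str:
    """Stand-in for a slow image request; it only sleeps and echoes the label."""
    time.sleep(delay)
    return f"{label} done"


def run_in_parallel(labels, delay: float = 0.2):
    """Run one fake call per label on its own worker thread.

    Returns the results (in input order) and the elapsed wall-clock time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(labels)) as executor:
        results = list(executor.map(lambda lb: fake_api_call(lb, delay), labels))
    elapsed = time.perf_counter() - start
    return results, elapsed
```

For three labels, the elapsed time should come out closer to a single delay than to the sum of all three, which is the same reason three real requests on three keys finish sooner than three sequential ones.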
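Step-by-Step Guide
Building your own AI sticker generator may sound complicated, but this guide breaks it down. It starts with setting up the development environment in Google Colab, then covers using the API, understanding the phrase categories, validating text, generating stickers in different styles, and finally generating several stickers in parallel. Each step comes with code and an explanation. Let's start coding.
Installation and Running on Colab
The right setup is a prerequisite for generating stickers. The project uses Python libraries such as PIL and rembg for basic image processing, and the install command also includes the google-genai library for use inside the Colab instance. The openai package, which the later code imports, is installed as well. The first step is to install these dependencies directly in the Colab notebook.
Python
!pip install --upgrade google-genai openai pillow rembg
!pip install --upgrade onnxruntime
!pip install python-dotenv
OpenAI Integration and API Keys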
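After installation, import the modules and set the API keys. The script creates one OpenAI client per API key, which lets the code distribute image edit requests across several keys in parallel. The sticker generation functions then use this list of clients.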
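The setup code in this section places the keys directly in a list. As a hedged alternative, the keys could be read from environment variables instead. The sketch below assumes variables named `OPENAI_API_KEY_1` to `OPENAI_API_KEY_3`; that naming scheme and the helper `load_api_keys` are assumptions made for illustration, not part of the original script.

```python
import os


def load_api_keys(count: int = 3, prefix: str = "OPENAI_API_KEY_", env=None):
    """Collect keys from variables named <prefix>1 .. <prefix><count>.

    The variable naming scheme is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    keys = []
    for i in range(1, count + 1):
        name = f"{prefix}{i}"
        value = env.get(name, "").strip()
        if not value:
            # Fail early with the exact variable name that is missing
            raise KeyError(f"Missing environment variable: {name}")
        keys.append(value)
    return keys
```

The returned list could then take the place of the hardcoded `API_KEYS`. The python-dotenv package from the install step is one way to populate such variables from a `.env` file.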
Python
# Stickerverse
import os
import random
import base64
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from io import BytesIO

from openai import OpenAI
from PIL import Image
from rembg import remove
from google.colab import files
from google.colab.output import eval_js
from IPython.display import display, Javascript

# Replace the placeholders with your own keys (one per parallel request)
API_KEYS = [  # 3 API keys
    "API KEY 1",
    "API KEY 2",
    "API KEY 3",
]

# One client per key so that requests can be spread across the keys
clients = [OpenAI(api_key=key) for key in API_KEYS]
Image Upload and Capture (Logic)
The next step is to take a photo with the camera or upload an image file. capture_photo() uses JavaScript injected into Colab to open the webcam and return the captured image. upload_image() uses Colab's file upload widget and validates the file format with PIL.
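The camera path hands the photo back to Python as a base64 data URL, which then has to be decoded into raw bytes. The following sketch isolates that decoding step so it can be tried without a browser. `decode_data_url` is a hypothetical helper, slightly stricter than a plain split on the first comma.

```python
import base64


def decode_data_url(data_url: str):
    """Split a base64 data URL into its MIME type and decoded bytes."""
    # A data URL looks like "data:image/jpeg;base64,<payload>"
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):].split(";")[0]
    return mime_type, base64.b64decode(payload)
```

Checking the header before decoding gives a clearer error when the browser returns something unexpected, for example when camera access is denied.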
Python
# Camera capture via JS
def capture_photo(filename='photo.jpg', quality=0.9):
    js_code = """
    async function takePhoto(quality) {
        const div = document.createElement('div');
        const video = document.createElement('video');
        const btn = document.createElement('button');
        btn.textContent = 'Capture';
        div.appendChild(video);
        div.appendChild(btn);
        document.body.appendChild(div);
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        video.srcObject = stream;
        await video.play();
        await new Promise(resolve => btn.onclick = resolve);
        const canvas = document.createElement('canvas');
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        canvas.getContext('2d').drawImage(video, 0, 0);
        stream.getTracks().forEach(track => track.stop());
        div.remove();
        return canvas.toDataURL('image/jpeg', quality);
    }
    """
    display(Javascript(js_code))
    data = eval_js("takePhoto(%f)" % quality)
    # The result is a data URL; the base64 payload follows the first comma
    binary = base64.b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    print(f"Saved: {filename}")
    return filename
# Image upload function
def upload_image():
    print("Please upload your image file...")
    uploaded = files.upload()
    if not uploaded:
        print("No file uploaded!")
        return None
    filename = list(uploaded.keys())[0]
    print(f"Uploaded: {filename}")
    # Validate if it's an image
    try:
        img = Image.open(filename)
        img.verify()  # raises if the file is not a readable image
        print(f"Image verified: {img.format} {img.size}")
        return filename
    except Exception as e:
        print(f"Invalid image file: {str(e)}")
        return None
# Interactive image source selection
def select_image_source():
    print("Choose image source:")
    print("1. Capture from camera")
    print("2. Upload image file")
    while True:
        try:
            choice = input("Select option (1-2): ").strip()
            if choice == "1":
                return "camera"
            elif choice == "2":
                return "upload"
            else:
                print("Invalid choice! Please enter 1 or 2.")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            return None

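Categories and Example Phrases
Next come the phrase categories used for the sticker text. A PHRASE_CATEGORIES dictionary covers several themes: corporate, Bollywood, Hollywood, Tollywood, sports, and memes (the excerpt below shows three of them). When the user picks a category, the script randomly selects three different phrases from it and applies one to each of the three sticker styles.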
Python
PHRASE_CATEGORIES = {
    "corporate": [
        "Another meeting? May the force be with you!",
        "Monday blues activated!",
        "This could have been an email, boss!"
    ],
    "bollywood": [
        "Mogambo khush hua!",
        "Kitne aadmi the?",
        "Picture abhi baaki hai mere dost!"
    ],
    "memes": [
        "Bhagwan bharose!",
        "Main thak gaya hoon!",
        "Beta tumse na ho payega!"
    ]
}
Categories and Custom Text
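The generator ships with a preset dictionary of phrase categories. The user can either take random funny phrases from a chosen category or type in custom text. Interactive selection helpers are provided, along with a simple length check that makes sure a custom phrase fits the sticker requirements.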
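The random selection from a category can be sketched as a small function. This is an illustration under stated assumptions: `pick_phrases` is a hypothetical helper, and the explicit checks for unknown or too-small categories are additions that the original script may handle differently.

```python
import random


def pick_phrases(categories: dict, category: str, k: int = 3, rng=None):
    """Pick k distinct phrases from a category, or raise with a clear message."""
    rng = rng or random
    phrases = categories.get(category)
    if phrases is None:
        raise KeyError(f"Unknown category: {category!r}")
    if len(phrases) < k:
        # random.sample would raise a less descriptive ValueError here
        raise ValueError(
            f"Category {category!r} has only {len(phrases)} phrases, need {k}"
        )
    return rng.sample(phrases, k)
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable, which can help when comparing styles on the same phrases.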
Python
def select_category_or_custom():
    print("\nChoose your sticker text option:")
    print("1. Pick from phrase category (random selection)")
    print("2. Enter my own custom phrase")
    while True:
        try:
            choice = input("Choose option (1 or 2): ").strip()
            if choice == "1":
                return "category"
            elif choice == "2":
                return "custom"
            else:
                print("Invalid choice! Please enter 1 or 2.")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            return None

# Function to get a custom phrase from the user
def get_custom_phrase():
    while True:
        phrase = input("\nEnter your custom sticker text (2-50 characters): ").strip()
        if len(phrase) < 2:
            print("Too short! Please enter at least 2 characters.")
            continue
        elif len(phrase) > 50:
            print("Too long! Please keep it to 50 characters or fewer.")
            continue
        else:
            print(f"Custom phrase accepted: '{phrase}'")
            return phrase
For custom phrases, the input length (2 to 50 characters) is checked before the phrase is accepted.
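The length rule can also be expressed as a pure function, separate from the input loop, which makes it easy to test. The sketch below is one possible form; `check_phrase` is a hypothetical name, and the limits are passed as parameters with the 2 and 50 from the text as defaults.

```python
def check_phrase(phrase: str, min_len: int = 2, max_len: int = 50):
    """Return (ok, message_or_cleaned_text) for a candidate sticker phrase."""
    cleaned = phrase.strip()
    if len(cleaned) < min_len:
        return False, f"Too short: need at least {min_len} characters."
    if len(cleaned) > max_len:
        return False, f"Too long: keep it to {max_len} characters or fewer."
    # On success the second element is the trimmed phrase itself
    return True, cleaned
```

An interactive loop could call this function and keep prompting until the first element of the result is `True`.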
Phrase Validation and Spelling Guardrails
Python
def validate_and_correct_spelling(text):
    spelling_prompt = f"""
Please check the spelling and grammar of the following text and return ONLY the corrected version.
Do not add explanations, comments, or change the meaning.
Text to check: "{text}"
"""
    # A low temperature keeps the correction close to the input text
    response = clients[0].chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": spelling_prompt}],
        max_tokens=100,
        temperature=0.1
    )
    corrected_text = response.choices[0].message.content.strip()
    return corrected_text
Next comes a sample build_prompt function that gives the model some baseline instructions. Note that build_prompt() calls the spelling validator and then embeds the corrected text in a strict prompt template:
Python
# Concise Prompt Builder with Spelling Validation
def build_prompt(text, style_variant):
    corrected_text = validate_and_correct_spelling(text)
    base_prompt = f"""
Create a HIGH-QUALITY WhatsApp sticker in {style_variant} style.
OUTPUT:
- 1024x1024 transparent PNG with 8px white border
- Subject centered, balanced composition, sharp details
- Preserve original facial identity and proportions
- Match expression to sentiment of text: '{corrected_text}'
TEXT:
- Use EXACT text: '{corrected_text}' (no changes, no emojis)
- Bold comic font with black outline, high-contrast colors
- Place text in empty space (top/bottom), never covering the face
RULES:
- No hallucinated elements or decorative glyphs
- No cropping of head/face or text
- Maintain realistic but expressive look
- Ensure consistency across stickers
"""
    return base_prompt.strip()
Style Variants: Caricature vs. Pixar
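The three style templates live in STYLE_VARIANTS. The first two are exaggerated caricature treatments, and the third is a Pixar-style 3D look. These strings are passed straight into the prompt builder and determine the final visual style.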
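How the style strings are paired with phrases and turned into short labels can be sketched as follows. The label rule mirrors the check used in the script's log messages (a style counts as a caricature when its description contains that word). The function names and the idea of returning tuples are illustrative assumptions.

```python
def style_label(style_variant: str) -> str:
    """Derive a short display label from a style description."""
    return "Caricature" if "caricature" in style_variant.lower() else "Pixar"


def pair_styles_with_phrases(styles, phrases):
    """Pair each style with one phrase; both lists must have the same length."""
    if len(styles) != len(phrases):
        raise ValueError("need exactly one phrase per style")
    # Each entry is (label, full style description, phrase)
    return [(style_label(s), s, p) for s, p in zip(styles, phrases)]
```

The explicit length check guards against `zip` silently dropping a style or a phrase when the two lists differ in size.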
Python
STYLE_VARIANTS = [
    "Transform into detailed caricature with slightly exaggerated facial features...",
    "Transform into expressive caricature with enhanced personality features...",
    "Transform into high-quality Pixar-style 3D animated character..."
]
Generating Stickers in Parallel
The project's real strength is generating stickers in parallel. Using multithreading, the script runs three generation tasks at once with three independent API keys, which shortens the waiting time noticeably.
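With three keys and three tasks, each task can simply use the client at its own index. If the number of tasks and keys ever differs, a round-robin assignment is one simple option. The sketch below is a hedged generalization and not part of the original script; `assign_clients` is a hypothetical helper.

```python
def assign_clients(num_tasks: int, num_clients: int):
    """Map each task index to a client index in round-robin order."""
    if num_clients < 1:
        raise ValueError("at least one client is required")
    # Task 0 -> client 0, task 1 -> client 1, ..., wrapping around as needed
    return [task_idx % num_clients for task_idx in range(num_tasks)]
```

For three tasks and three clients this reproduces the one-to-one mapping, while five tasks on three clients would reuse the first two clients.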
Python
# Generate single sticker using OpenAI GPT-image-1 with specific client (WITH TIMING)
def generate_single_sticker(input_path, output_path, text, style_variant, client_idx):
    try:
        start_time = time.time()
        thread_id = threading.current_thread().name
        print(f"[START] Thread-{thread_id}: API-{client_idx+1} generating {style_variant[:30]}... at {time.strftime('%H:%M:%S', time.localtime(start_time))}")
        prompt = build_prompt(text, style_variant)
        # Open the photo in a context manager so the file handle is always closed
        with open(input_path, "rb") as image_file:
            result = clients[client_idx].images.edit(
                model="gpt-image-1",
                image=[image_file],
                prompt=prompt,
                # input_fidelity="high"
                quality="medium",
            )
        # The edited image comes back base64-encoded
        image_base64 = result.data[0].b64_json
        image_bytes = base64.b64decode(image_base64)
        with open(output_path, "wb") as f:
            f.write(image_bytes)
        end_time = time.time()
        duration = end_time - start_time
        style_type = "Caricature" if "caricature" in style_variant.lower() else "Pixar"
        print(f"[DONE] Thread-{thread_id}: {style_type} saved as {output_path} | Duration: {duration:.2f}s | Text: '{text[:30]}...'")
        return True
    except Exception as e:
        print(f"[ERROR] API-{client_idx+1} failed: {str(e)}")
        return False
# Create stickers with a custom phrase (all 3 styles use the same custom text)
def create_custom_stickers_parallel(photo_file, custom_text):
    print(f"\nCreating 3 stickers with your custom phrase: '{custom_text}'")
    print(" - Style 1: Caricature #1")
    print(" - Style 2: Caricature #2")
    print(" - Style 3: Pixar Animation")
    # Map futures to their info
    tasks_info = {}
    with ThreadPoolExecutor(max_workers=3, thread_name_prefix="CustomSticker") as executor:
        start_time = time.time()
        print(f"\n[PARALLEL START] Submitting 3 API calls SIMULTANEOUSLY at {time.strftime('%H:%M:%S', time.localtime(start_time))}")
        # Submit ALL tasks at once (non-blocking) - all using the same custom text
        for idx, style_variant in enumerate(STYLE_VARIANTS):
            output_name = f"custom_sticker_{idx+1}.png"
            future = executor.submit(generate_single_sticker, photo_file, output_name, custom_text, style_variant, idx)
            tasks_info[future] = {
                'output_name': output_name,
                'text': custom_text,
                'style_variant': style_variant,
                'client_idx': idx,
                'submit_time': time.time()
            }
        print("All 3 API requests submitted! Processing as they complete...")
        completed = 0
        completion_times = []
        # Process results as they complete
        # (as_completed raises TimeoutError if the tasks exceed 180 s in total)
        for future in as_completed(tasks_info.keys(), timeout=180):
            try:
                success = future.result()
                task_info = tasks_info[future]
                if success:
                    completed += 1
                    completion_time = time.time()
                    completion_times.append(completion_time)
                    duration = completion_time - task_info['submit_time']
                    style_type = "Caricature" if "caricature" in task_info['style_variant'].lower() else "Pixar"
                    print(f"[{completed}/3] {style_type} completed: {task_info['output_name']} "
                          f"(API-{task_info['client_idx']+1}, {duration:.1f}s)")
                else:
                    print(f"Failed: {task_info['output_name']}")
            except Exception as e:
                task_info = tasks_info[future]
                print(f"Error with {task_info['output_name']} (API-{task_info['client_idx']+1}): {str(e)}")
    total_time = time.time() - start_time
    print(f"\n [FINAL RESULT] {completed}/3 custom stickers completed in {total_time:.1f} seconds!")
# Create 3 stickers in PARALLEL (using as_completed)
def create_category_stickers_parallel(photo_file, category):
    if category not in PHRASE_CATEGORIES:
        print(f" Category '{category}' not found! Available: {list(PHRASE_CATEGORIES.keys())}")
        return
    # Choose 3 unique phrases for 3 stickers
    chosen_phrases = random.sample(PHRASE_CATEGORIES[category], 3)
    print(f" Selected phrases for {category.title()} category:")
    for i, phrase in enumerate(chosen_phrases, 1):
        style_type = "Caricature" if i <= 2 else "Pixar Animation"
        print(f" {i}. [{style_type}] '{phrase}' → API Key {i}")
    # Map futures to their info
    tasks_info = {}
    with ThreadPoolExecutor(max_workers=3, thread_name_prefix="StickerGen") as executor:
        start_time = time.time()
        print(f"\n [PARALLEL START] Submitting 3 API calls SIMULTANEOUSLY at {time.strftime('%H:%M:%S', time.localtime(start_time))}")
        # Submit ALL tasks at once (non-blocking)
        for idx, (style_variant, text) in enumerate(zip(STYLE_VARIANTS, chosen_phrases)):
            output_name = f"{category}_sticker_{idx+1}.png"
            future = executor.submit(generate_single_sticker, photo_file, output_name, text, style_variant, idx)
            tasks_info[future] = {
                'output_name': output_name,
                'text': text,
                'style_variant': style_variant,
                'client_idx': idx,
                'submit_time': time.time()
            }
        print("All 3 API requests submitted! Processing as they complete...")
        print(" - API Key 1 → Caricature #1")
        print(" - API Key 2 → Caricature #2")
        print(" - API Key 3 → Pixar Animation")
        completed = 0
        completion_times = []
        # Process results as they complete (NOT in submission order)
        for future in as_completed(tasks_info.keys(), timeout=180):  # 3 minute total timeout
            try:
                success = future.result()  # returns at once: this future has already finished
                task_info = tasks_info[future]
                if success:
                    completed += 1
                    completion_time = time.time()
                    completion_times.append(completion_time)
                    duration = completion_time - task_info['submit_time']
                    style_type = "Caricature" if "caricature" in task_info['style_variant'].lower() else "Pixar"
                    print(f"[{completed}/3] {style_type} completed: {task_info['output_name']} "
                          f"(API-{task_info['client_idx']+1}, {duration:.1f}s) - '{task_info['text'][:30]}...'")
                else:
                    print(f"Failed: {task_info['output_name']}")
            except Exception as e:
                task_info = tasks_info[future]
                print(f"Error with {task_info['output_name']} (API-{task_info['client_idx']+1}): {str(e)}")
    total_time = time.time() - start_time
    print(f"\n[FINAL RESULT] {completed}/3 stickers completed in {total_time:.1f} seconds!")
    if len(completion_times) > 1:
        fastest_completion = min(completion_times) - start_time
        print(f"Parallel efficiency: Fastest completion in {fastest_completion:.1f}s")
Here, generate_single_sticker() builds the prompt and calls the image edit endpoint, and its client_idx parameter selects the API client to use. The parallel layer creates a ThreadPoolExecutor with at most three worker threads, submits all generation tasks at once, and uses as_completed to collect and handle results as they finish. This way the script logs each sticker as soon as it is done.
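The submit-then-collect pattern can be tried in isolation with stand-in tasks. The sketch below maps each future back to its task name and gathers results in completion order rather than submission order. `slow_task` and `run_and_collect` are hypothetical names, and the sleep durations stand in for real API latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_task(name: str, delay: float) -> str:
    """Stand-in for a sticker request; sleeps, then returns its name."""
    time.sleep(delay)
    return name


def run_and_collect(delays: dict, timeout: float = 10.0):
    """Submit one task per entry and return the names in completion order."""
    finished = []
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        # Map each future back to the task it belongs to
        future_to_name = {
            executor.submit(slow_task, name, delay): name
            for name, delay in delays.items()
        }
        for future in as_completed(future_to_name, timeout=timeout):
            name = future_to_name[future]
            try:
                finished.append(future.result())
            except Exception as exc:  # a failed task should not stop the loop
                print(f"{name} failed: {exc}")
    return finished
```

Because `as_completed` yields futures as they finish, the shortest task is reported first even if it was submitted last.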
The logs track each thread's progress and record the task durations and the style applied (caricature or Pixar), which helps with monitoring runs and comparing results.
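The logged durations can be turned into a rough speedup figure by comparing their sum, which approximates a sequential run, with the measured wall-clock time. This is a sketch under that assumption; `summarize_timings` is a hypothetical helper and not part of the original script.

```python
def summarize_timings(durations, wall_time):
    """Compare the summed task durations with the measured wall-clock time."""
    if wall_time <= 0:
        raise ValueError("wall_time must be positive")
    # Running the tasks one after another would take roughly the sum
    sequential_estimate = sum(durations)
    return {
        "tasks": len(durations),
        "sequential_estimate": sequential_estimate,
        "wall_time": wall_time,
        "speedup": sequential_estimate / wall_time,
    }
```

The estimate ignores effects such as rate limiting or varying server load, so it is best read as an indication rather than a benchmark.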
Main Execution Block
At the bottom of the script, the __main__ guard runs sticker_from_camera() by default. You can comment or uncomment the relevant lines to run interactive_menu(), create_all_category_stickers(), or other functions instead.
Python
# Main execution
# sticker_from_camera() is defined in the full script, which is not reproduced here
if __name__ == "__main__":
    sticker_from_camera()
Output video:

https://cdn.analyticsvidhya.com/wp-content/uploads/2025/10/final_op_stickerverse_1.mp4

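The full version of this WhatsApp sticker generator code is available in the GitHub repository.
Conclusion
This article walked through configuring GPT-Image-1 calls, building the sticker prompt, obtaining an image by capture or upload, choosing a preset funny phrase or entering custom text, and generating three style variants in parallel. The whole project turns an ordinary photo into a personalized caricature-style sticker in a few hundred lines of code.
Combining OpenAI's vision model with creative prompt engineering and multithreading lets users produce fun, personalized stickers within seconds. The resulting AI-powered WhatsApp sticker generator creates stickers that are ready to share with friends and groups right away. Grab your best photos and favorite funny phrases and start making your own stickers.
Frequently Asked Questions
Q1. What does the AI-powered WhatsApp sticker generator do?
A: It uses OpenAI's GPT-Image-1 model to turn uploaded or captured photos into fun, stylized WhatsApp stickers, with optional text.
Q2. Why is GPT-Image-1 better than other image models?
A: GPT-Image-1 handles text accuracy and facial expressions better than models such as Gemini, Flux, or Phoenix, so the text on the sticker is correct and the result is visually expressive.
Q3. How does the script speed up sticker generation?
A: It uses three OpenAI API keys and a ThreadPoolExecutor to generate three stickers in parallel, which shortens the processing time.
Original title: Create an AI-Powered WhatsApp Sticker Generator using Python
Author: Vipin Vashisth
Article link: https://www.analyticsvidhya.com/blog/2025/10/whatsapp-sticker-generator/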