Multimodal Input

Claude accepts images and PDFs as input, not just text.

Image Input

Base64 Encoding

import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

# Read the image file and encode it as a base64 string
image_data = base64.standard_b64encode(Path("photo.jpg").read_bytes()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    }
                },
                {
                    "type": "text",
                    "text": "What is shown in this image?"
                }
            ]
        }
    ]
)
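
The call above returns a message whose `content` is a list of blocks. As a minimal sketch of reading the answer back, the helper below joins the text of every block of type `text`. The name `extract_text` and the stand-in objects are illustrative assumptions, not part of the SDK; in real use you would pass `message.content`.

```python
from types import SimpleNamespace


def extract_text(content_blocks):
    """Join the text of every block whose type is "text" (hypothetical helper)."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in objects shaped like response content blocks, so the sketch runs offline.
fake_content = [
    SimpleNamespace(type="text", text="A cat sitting "),
    SimpleNamespace(type="text", text="on a windowsill."),
]
print(extract_text(fake_content))
```

With a real response, `extract_text(message.content)` would return the model's description of the image.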

URL Source

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/image.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ]
        }
    ]
)

Supported Formats

Format	media_type
JPEG	`image/jpeg`
PNG	`image/png`
GIF	`image/gif`
WebP	`image/webp`
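
Since `media_type` has to match one of the four values in the table, it can help to derive it from the file name instead of hard-coding it. The following is a rough sketch under that assumption; `media_type_for` and `image_block_from_bytes` are hypothetical helper names, and the check looks only at the file extension, not at the actual bytes.

```python
import base64
from pathlib import Path

# Extension-to-media_type mapping, taken from the table above.
SUPPORTED_MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(filename):
    """Return the media_type for a file name, or raise if the format is unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or '(no extension)'}")
    return SUPPORTED_MEDIA_TYPES[suffix]


def image_block_from_bytes(raw_bytes, filename):
    """Build a base64 image content block from raw bytes (hypothetical helper)."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type_for(filename),
            "data": base64.standard_b64encode(raw_bytes).decode("utf-8"),
        },
    }
```

For a file on disk, something like `image_block_from_bytes(Path("photo.jpg").read_bytes(), "photo.jpg")` would produce the same block as the base64 example.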

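Comparing Multiple Images

# image1_b64 and image2_b64 are base64-encoded PNG strings, prepared as in the base64 example
message = client.messages.create(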
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/png", "data": image1_b64}
                },
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/png", "data": image2_b64}
                },
                {
                    "type": "text",
                    "text": "Analyze the differences between these two UI designs."
                }
            ]
        }
    ]
)
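
When more than two images are involved, writing the `content` list by hand gets repetitive. One possible sketch is shown below: it builds the list from any number of base64 strings and puts a short text label before each image so the prompt can refer to them by number. The labeling convention and the name `build_comparison_content` are assumptions for illustration, not something the example above requires.

```python
def build_comparison_content(images_b64, prompt, media_type="image/png"):
    """Build a content list: a numbered label and an image per entry, then the prompt."""
    content = []
    for index, data in enumerate(images_b64, start=1):
        # A short text label lets the prompt say "Image 1", "Image 2", and so on.
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append(
            {
                "type": "image",
                "source": {"type": "base64", "media_type": media_type, "data": data},
            }
        )
    # The instruction goes last, after all images.
    content.append({"type": "text", "text": prompt})
    return content
```

The result can be passed as the `content` of a single user message, in place of the hand-written list.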

PDF Input

import base64
from pathlib import Path

pdf_data = base64.standard_b64encode(Path("report.pdf").read_bytes()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the contents of this PDF."
                }
            ]
        }
    ]
)

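PDF Processing Limits

Up to 100 pages per document
Roughly 1,600 tokens consumed per page (actual usage varies with content density)
Scanned-image PDFs can also be processed (pages are read in an OCR-like manner)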
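
The two numbers above are enough for a back-of-the-envelope budget check before sending a document. The sketch below treats 100 pages and 1,600 tokens per page as given by this section; the function names are hypothetical, and the estimate is only as good as the per-page figure.

```python
MAX_PDF_PAGES = 100      # page limit stated above
TOKENS_PER_PAGE = 1_600  # approximate per-page cost stated above


def estimate_pdf_tokens(page_count, tokens_per_page=TOKENS_PER_PAGE):
    """Rough input-token estimate for a PDF; raises if it exceeds the page limit."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count > MAX_PDF_PAGES:
        raise ValueError(f"{page_count} pages exceeds the {MAX_PDF_PAGES}-page limit")
    return page_count * tokens_per_page


def split_page_ranges(page_count, chunk_size=MAX_PDF_PAGES):
    """Return 1-based inclusive (first, last) page ranges that each fit the limit."""
    return [
        (start, min(start + chunk_size - 1, page_count))
        for start in range(1, page_count + 1, chunk_size)
    ]


print(estimate_pdf_tokens(25))
print(split_page_ranges(250))
```

Actually cutting a PDF into those ranges needs a PDF library, which is left out here on purpose.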

Example Use Cases

1. UI/Design Review

content = [
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},
    {"type": "text", "text": (
        "Please review this screenshot of a web app.\n"
        "- Accessibility issues\n"
        "- UI consistency\n"
        "- Suggested improvements"
    )}
]

2. Graph and Chart Analysis

content = [
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": chart_b64}},
    {"type": "text", "text": "Explain the trends and key data points in this chart."}
]

3. Analyzing Code Screenshots

content = [
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": code_screenshot_b64}},
    {"type": "text", "text": "Transcribe the code in this screenshot to text and point out any bugs."}
]

4. Reading Invoices and Forms

content = [
    {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": invoice_b64}},
    {"type": "text", "text": (
        "Extract the following from this invoice and return it as JSON:\n"
        '{"company": string, "date": string, "total": number, "items": [{name, quantity, price}]}'
    )}
]
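
The invoice prompt asks for JSON, but the reply may still wrap it in extra prose. A minimal sketch of pulling the object out and checking it against the requested keys follows. `parse_invoice_reply` and the sample reply are made up for illustration; the sample is not real model output.

```python
import json

# Top-level keys requested in the invoice prompt above.
REQUIRED_KEYS = {"company", "date", "total", "items"}


def parse_invoice_reply(reply_text):
    """Extract the outermost {...} span from a reply and validate its keys."""
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the reply")
    invoice = json.loads(reply_text[start:end + 1])
    missing = REQUIRED_KEYS - invoice.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return invoice


# Hypothetical reply text, used only to exercise the parser offline.
sample_reply = (
    "Here is the extracted data:\n"
    '{"company": "Acme", "date": "2025-01-31", "total": 1200, '
    '"items": [{"name": "Widget", "quantity": 2, "price": 600}]}'
)
print(parse_invoice_reply(sample_reply)["total"])
```

Taking the span from the first `{` to the last `}` is a deliberately simple choice; it fails loudly through `json.loads` if the reply contains anything that is not valid JSON in between.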

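Best Practices

Place images before the instruction text (Claude sees the image first, then reads the instructions)
High-resolution images consume more tokens, so resize them when needed
For screenshots with legible text, extracting the text first and passing it as text can be more token-efficient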
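
To make the resizing advice concrete, the sketch below computes target dimensions that keep the aspect ratio while capping the long edge, plus a rough token estimate. Both constants are assumptions: a 1568-pixel long edge and about 750 pixels per token are figures recalled from Anthropic's vision documentation and should be checked against the current guide. The actual resize would be done with an imaging library, which is not shown.

```python
MAX_LONG_EDGE = 1568    # assumed long-edge cap, in pixels
PIXELS_PER_TOKEN = 750  # assumed rule of thumb: tokens ~ width * height / 750


def fit_within(width, height, max_long_edge=MAX_LONG_EDGE):
    """Return (width, height) scaled down so the long edge fits, keeping aspect ratio."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height  # already small enough, leave unchanged
    scale = max_long_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width, height):
    """Rough token estimate for an image of the given pixel dimensions."""
    return round(width * height / PIXELS_PER_TOKEN)


new_width, new_height = fit_within(3136, 1568)
print(new_width, new_height, estimate_image_tokens(new_width, new_height))
```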

Reference Links

Vision Guide: the official guide to image input
PDF Support: details on PDF processing
Anthropic Cookbook: multimodal recipes