# Multimodal & Content Ingestion

> Handling rich media: using `from_file` for automatic MIME resolution and direct constructor instantiation for in-memory bytes (`Image`, `Video`, `Audio`, `Document`).

- Repository: google-antigravity/antigravity-sdk-python
- GitHub: https://github.com/google-antigravity/antigravity-sdk-python
- Human wiki: https://grok-wiki.com/public/wiki/google-antigravity-antigravity-sdk-python-2abd361a7867
- Complete Markdown: https://grok-wiki.com/public/wiki/google-antigravity-antigravity-sdk-python-2abd361a7867/llms-full.txt

## Source Files

- `google/antigravity/types.py`
- `examples/getting_started/multimodal.py`
- `examples/deep_dives/multimodal_pipeline.py`
- `examples/resources/sample_doc.txt`

---


The Google Antigravity SDK provides first-class support for multimodal interactions, allowing agents to process and generate rich media such as images, videos, audio, and documents. Each payload wraps raw bytes together with metadata (a MIME type and a description), so models receive content in a format they can interpret without exposure to implementation details such as file paths or temporary storage locations.

This system is built around a set of immutable media primitives that handle validation and MIME-type resolution automatically. Whether you are loading assets from disk or streaming in-memory bytes, the SDK provides a unified interface for content ingestion that remains portable across different model providers.

## Media Content Primitives

The core of the ingestion system is a hierarchy of media classes derived from a common base. These classes ensure that the data passed to the agent is correctly typed and supported by the underlying model's capabilities.

### Available Media Classes

| Class | Description | Supported Formats (Examples) |
| :--- | :--- | :--- |
| `Image` | Static visual data. | PNG, JPEG, WEBP, BMP |
| `Video` | Motion visual data. | MP4, WEBM, MOV, AVI |
| `Audio` | Sound data. | MP3, WAV, OGG, FLAC |
| `Document` | Structured or unstructured text/data files. | PDF, TXT, JSON, CSV, HTML |

Sources: `google/antigravity/types.py:1007-1053`

### Class Hierarchy

```mermaid
classDiagram
    class _BaseMedia {
        +bytes data
        +str mime_type
        +str description
        +from_file(path, description)
    }
    class Image {
        +validate_mime_type(v)
    }
    class Video {
        +validate_mime_type(v)
    }
    class Audio {
        +validate_mime_type(v)
    }
    class Document {
        +validate_mime_type(v)
    }
    _BaseMedia <|-- Image
    _BaseMedia <|-- Video
    _BaseMedia <|-- Audio
    _BaseMedia <|-- Document
```
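
As a rough illustration of the hierarchy above, the sketch below models a base class with per-subclass MIME validation using a frozen standard-library dataclass. The names `BaseMediaSketch`, `ImageSketch`, `DocumentSketch`, and `allowed_mime_types` are hypothetical, and the MIME sets are illustrative. The SDK's real classes in `google/antigravity/types.py` may validate differently; the `validate_mime_type(v)` signature in the diagram suggests a field validator rather than `__post_init__`.

```python
from dataclasses import dataclass
from typing import ClassVar, FrozenSet


@dataclass(frozen=True)
class BaseMediaSketch:
    """Hypothetical stand-in for the SDK's base media class."""

    data: bytes
    mime_type: str
    description: str = ""

    # Subclasses override this with the MIME types they accept.
    allowed_mime_types: ClassVar[FrozenSet[str]] = frozenset()

    def __post_init__(self) -> None:
        # Reject payloads whose declared type the subclass does not support.
        if self.mime_type not in self.allowed_mime_types:
            raise ValueError(
                f"{type(self).__name__} does not accept MIME type {self.mime_type!r}"
            )


@dataclass(frozen=True)
class ImageSketch(BaseMediaSketch):
    # Illustrative subset matching the formats listed in the table.
    allowed_mime_types: ClassVar[FrozenSet[str]] = frozenset(
        {"image/png", "image/jpeg", "image/webp", "image/bmp"}
    )


@dataclass(frozen=True)
class DocumentSketch(BaseMediaSketch):
    # Illustrative subset matching the formats listed in the table.
    allowed_mime_types: ClassVar[FrozenSet[str]] = frozenset(
        {"application/pdf", "text/plain", "application/json", "text/csv", "text/html"}
    )
```

Freezing the dataclass mirrors the immutability described earlier: once constructed, a payload's bytes and MIME type cannot be reassigned.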

## Content Ingestion Methods

There are two primary ways to ingest content: automatic resolution from local files and direct manual instantiation for in-memory data.

### 1. Automatic Resolution with `from_file`

The `types.from_file` function is the recommended way to load local assets. It automatically detects the file's MIME type using standard system utilities and returns the appropriate specialized class instance.

```python
from google.antigravity import types

# The SDK detects this is a PNG and returns an Image object
media = types.from_file("path/to/screenshot.png", description="Application error state")

# You can also call it on the specific class if the type is known
image = types.Image.from_file("path/to/image.jpg")
```

Sources: `google/antigravity/types.py:1072-1103`
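
To make the dispatch idea concrete, here is a minimal sketch of extension-based MIME resolution using Python's standard `mimetypes` module. Whether the SDK resolves types this way is an assumption; the helper `resolve_media_category` and its category strings are hypothetical stand-ins for returning `Image`, `Video`, `Audio`, or `Document` instances.

```python
import mimetypes
from pathlib import Path

# Hypothetical mapping from MIME top-level type to a media category name.
_CATEGORY_BY_TOP_LEVEL = {
    "image": "Image",
    "video": "Video",
    "audio": "Audio",
}
# Document types that do not fall under the "text/" top-level type.
_DOCUMENT_TYPES = {"application/pdf", "application/json"}


def resolve_media_category(path: str) -> tuple[str, str]:
    """Return (category, mime_type) guessed from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(Path(path).name)
    if mime_type is None:
        raise ValueError(f"Could not determine a MIME type for {path!r}")

    top_level = mime_type.split("/", 1)[0]
    if top_level in _CATEGORY_BY_TOP_LEVEL:
        return _CATEGORY_BY_TOP_LEVEL[top_level], mime_type
    if top_level == "text" or mime_type in _DOCUMENT_TYPES:
        return "Document", mime_type
    raise ValueError(f"Unsupported MIME type {mime_type!r} for {path!r}")
```

Extension-based guessing never opens the file, so a mislabeled file would be classified by its name alone; a stricter loader might also inspect the file's contents.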

### 2. Direct Constructor Instantiation

For in-memory bytes (e.g., from a network request, a database, or a dynamically generated buffer), you can instantiate the classes directly. This requires specifying the MIME type explicitly, which prevents the agent from receiving ambiguous data.

```python
from google.antigravity import types

# Example: Loading bytes from an existing file handle
with open("report.pdf", "rb") as f:
    pdf_bytes = f.read()

# Direct instantiation for in-memory data
doc = types.Document(
    data=pdf_bytes,
    mime_type="application/pdf",
    description="Q3 Financial Report",
)
```

Sources: `examples/deep_dives/multimodal_pipeline.py:158-164`
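
When bytes arrive without a filename, the caller has to supply `mime_type` explicitly. One possible approach, sketched below, is to compare the payload's leading bytes against well-known file signatures before constructing the media object. The helper `sniff_mime_type` is hypothetical and not part of the SDK, and it covers only three formats.

```python
from typing import Optional

# Leading byte signatures for a few common formats (illustrative, not exhaustive).
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
)


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Return a MIME type inferred from the payload's leading bytes, or None."""
    for signature, mime_type in _SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None
```

The result could then be passed as `mime_type=` to `types.Document(...)` or `types.Image(...)`, falling back to a caller-provided type when the helper returns `None`.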

## Integrating with the Agent

Multimodal primitives are passed to the `Agent.chat` method as part of a `Content` list or as a single primitive. The agent processes these alongside text prompts to provide contextual responses.

```python
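from google.antigravity import types

# `Agent` and `config` are assumed to be imported and configured elsewhere;
# see examples/getting_started/multimodal.py for the full setup.
async def main():
    async with Agent(config) as my_agent:
        image = types.Image.from_file("example_image.png")
        doc = types.Document.from_file("sample_doc.txt")

        # Pass mixed content types in a single turn
        prompt = ["Analyze the chart in this image based on the data in this doc.", image, doc]
        response = await my_agent.chat(prompt)
        print(await response.text())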
```

Sources: `examples/getting_started/multimodal.py:48-62`

## Summary

Multimodal ingestion in the Antigravity SDK simplifies complex media handling into a few predictable patterns. By using `from_file` for local assets and direct constructors for buffers, developers can build agents that "see," "hear," and "read" with minimal boilerplate. This architecture maintains provider neutrality by focusing on standard MIME types and raw byte payloads, ensuring that your multimodal logic remains portable across different agent backends.

Sources: `google/antigravity/types.py:1055-1069`
