Structured Outputs with Pydantic
Large language models return text. In production you usually need structured data: JSON you can store, pass to APIs, or load into typed objects. Parsing free-form prose with regular expressions or ad hoc prompts is fragile and breaks easily when the model's phrasing shifts.
CrewAI lets you attach a Pydantic schema to a task. The crew validates the model's answer against that schema and materializes it as a BaseModel instance or a plain dictionary, so downstream code reads stable fields instead of hunting through strings.
output_pydantic on a Task
Define a model that matches the shape you want. Pass it as output_pydantic on the Task:
```python
from crewai import Task
from pydantic import BaseModel

class BlogPost(BaseModel):
    title: str
    content: str
    tags: list[str]

# `writer` is an Agent defined elsewhere in your crew setup
task = Task(
    description="Write a blog post about AI agents",
    expected_output="A blog post with title, content, and tags",
    agent=writer,
    output_pydantic=BlogPost,
)
```

After result = crew.kickoff(), read validated fields on the crew result:

```python
result.pydantic.title
result.pydantic.tags
```
You get a real Pydantic instance (IDE autocomplete, .model_dump(), nested models, validators).
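To see what validation buys you without running a crew, here is a minimal sketch in plain Pydantic v2 with no CrewAI calls. The raw_answer and bad_answer strings are made-up stand-ins for model output, and model_validate_json is Pydantic's own parser, not necessarily the exact path CrewAI takes internally.

```python
from pydantic import BaseModel, ValidationError


class BlogPost(BaseModel):
    title: str
    content: str
    tags: list[str]


# Hypothetical raw answer standing in for what an LLM might return.
raw_answer = '{"title": "Agents 101", "content": "Agents plan and act.", "tags": ["ai", "agents"]}'

post = BlogPost.model_validate_json(raw_answer)
print(post.title)         # Agents 101
print(post.tags)          # ['ai', 'agents']
print(post.model_dump())  # plain dict with the same three fields

# A malformed answer (missing "tags") fails loudly instead of
# silently producing a half-filled object.
bad_answer = '{"title": "Agents 101", "content": "Agents plan and act."}'
errors = []
try:
    BlogPost.model_validate_json(bad_answer)
except ValidationError as exc:
    errors = exc.errors()

print(len(errors), errors[0]["loc"])  # 1 ('tags',)
```

The point of the failure case is that a schema turns "the model forgot a field" into an explicit error you can catch, retry, or log.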
output_json alternative
If you pass the same model to output_json instead of output_pydantic, CrewAI still uses the schema to shape the output but gives you a plain dict rather than a model instance:
```python
task = Task(
    description="Write a blog post about AI agents",
    expected_output="A blog post with title, content, and tags",
    agent=writer,
    output_json=BlogPost,
)
```

Access fields with normal dict lookups, for example result.json_dict["title"].
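Working with the dictionary form can be sketched in plain Pydantic v2 as follows. The json_dict variable is a hypothetical stand-in for result.json_dict, assuming it has the same keys as BlogPost; no CrewAI calls are made here.

```python
from pydantic import BaseModel


class BlogPost(BaseModel):
    title: str
    content: str
    tags: list[str]


# Hypothetical stand-in for result.json_dict after a crew run.
json_dict = {
    "title": "Agents 101",
    "content": "Agents plan and act.",
    "tags": ["ai", "agents"],
}

# Plain dict lookups, as with result.json_dict["title"].
print(json_dict["title"])            # Agents 101
print(", ".join(json_dict["tags"]))  # ai, agents

# The schema still tells you which keys to expect: a quick sanity
# check before passing the dict on to another API.
expected_keys = set(BlogPost.model_fields)
missing = expected_keys - set(json_dict)
print(missing)  # set()

# If you later need a typed object, you can rebuild one from the dict.
post = BlogPost.model_validate(json_dict)
```

Keeping the model class around even when you choose the dict form means you can upgrade to a typed object at any point without changing the schema.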
Nested and richer models
Real outputs are often hierarchical. Pydantic composes naturally:
```python
from pydantic import BaseModel, Field

class Citation(BaseModel):
    source: str
    url: str

class Section(BaseModel):
    heading: str
    body: str

class ResearchBrief(BaseModel):
    topic: str
    summary: str
    sections: list[Section]
    citations: list[Citation] = Field(default_factory=list)
```

Use ResearchBrief as output_pydantic (or output_json) on a research-style task; nested lists and models are validated end to end.
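What "validated end to end" means for nested models can be sketched in plain Pydantic v2, outside CrewAI. The raw dict below is a made-up stand-in for a model's answer; it deliberately omits citations to show the default_factory at work.

```python
from pydantic import BaseModel, Field, ValidationError


class Citation(BaseModel):
    source: str
    url: str


class Section(BaseModel):
    heading: str
    body: str


class ResearchBrief(BaseModel):
    topic: str
    summary: str
    sections: list[Section]
    citations: list[Citation] = Field(default_factory=list)


# Hypothetical raw output; note that "citations" is omitted entirely.
raw = {
    "topic": "AI agents",
    "summary": "Agents combine planning with tool use.",
    "sections": [
        {"heading": "Planning", "body": "Break goals into steps."},
        {"heading": "Tools", "body": "Call APIs to act."},
    ],
}

brief = ResearchBrief.model_validate(raw)

# Nested dicts become Section instances.
print(type(brief.sections[0]).__name__)     # Section
print([s.heading for s in brief.sections])  # ['Planning', 'Tools']

# default_factory supplies an empty list for the missing field.
print(brief.citations)  # []

# A bad nested item is reported with its full path.
bad = dict(raw, sections=[{"heading": "Planning"}])
nested_errors = []
try:
    ResearchBrief.model_validate(bad)
except ValidationError as exc:
    nested_errors = exc.errors()

print(nested_errors[0]["loc"])  # ('sections', 0, 'body')
```

The error location ('sections', 0, 'body') pinpoints exactly which nested item is broken, which is far easier to act on than a failed regex over a long report.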
When to use output_pydantic vs output_json
- output_pydantic: Prefer this when you want type safety, Pydantic validators, and methods like model_dump()/model_dump_json(). Best when the rest of your Python code already works with models.
- output_json: Prefer this when you only need a flexible bag of data (a dict) for logging, JSON APIs, or dynamic keys, and you do not need a live BaseModel instance.
In both cases the schema guides the LLM toward the expected fields, which reduces format drift compared with asking for a single free-form string.
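One concrete reason to prefer output_pydantic is that validators and serialization helpers travel with the object. Here is a minimal sketch in plain Pydantic v2, with no CrewAI calls; the normalize_tags validator is a hypothetical addition for illustration, not part of the original BlogPost.

```python
from pydantic import BaseModel, field_validator


class BlogPost(BaseModel):
    title: str
    content: str
    tags: list[str]

    # Hypothetical validator: normalize tags so downstream code can
    # compare them without worrying about case, whitespace, or duplicates.
    @field_validator("tags")
    @classmethod
    def normalize_tags(cls, tags: list[str]) -> list[str]:
        return sorted({t.strip().lower() for t in tags if t.strip()})


post = BlogPost(
    title="Agents 101",
    content="Agents plan and act.",
    tags=[" AI", "Agents", "ai"],
)
print(post.tags)  # ['agents', 'ai']

# Typed object -> dict (for logging) or JSON string (for an HTTP API).
as_dict = post.model_dump()
as_json = post.model_dump_json()
print(as_json)
```

With a plain dict you would have to reapply this kind of cleanup wherever the data is consumed; with a model it happens once, at validation time.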
Key takeaways
- Raw LLM text is hard to rely on; schemas turn answers into data you can depend on.
- output_pydantic yields a validated model; use result.pydantic.<field>.
- output_json uses the same schema but returns a dict; use result.json_dict["field"].
- Nested Pydantic models scale to sections, citations, and other structured reports.
- Choose output_pydantic for typed Python workflows and output_json when a dictionary is enough.