AutoGen (Microsoft): Multi-Agent Conversations and Code Generation

LangGraph gives you graphs. CrewAI gives you teams. AutoGen gives you conversations.

Microsoft’s AutoGen framework was rebuilt from scratch in 2025 and is nothing like the original. The new architecture drops the old autobuild system for a clean agent-as-program model — agents are regular Python objects that communicate through typed messages. It shipped native Azure AI integration, enterprise security, and a code execution sandbox that doesn’t require Docker.

AutoGen had 47,000+ GitHub stars before the rebuild and has found a second life since the rewrite. It is Microsoft’s bet on multi-agent conversations as the primary interaction pattern for AI systems.

What Is AutoGen?

AutoGen is a multi-agent conversation framework from Microsoft Research. Agents communicate through structured messages — think of it as a messaging platform where each participant is an AI agent with specific capabilities.

from autogen_agent import Agent, ChatAgent, ToolAgent
from autogen_runtime import Runtime, GroupChat

# The tools listed below (query_database, generate_chart, run_linter,
# check_security) are plain functions defined elsewhere in your project.
class DataAnalyst(ChatAgent):
    def __init__(self):
        super().__init__(name="analyst")
        self.system_prompt = "You analyze data and create visualizations"
        self.tools = [query_database, generate_chart]

class CodeReviewer(ChatAgent):
    def __init__(self):
        super().__init__(name="reviewer")
        self.system_prompt = "You review code for bugs and security issues"
        self.tools = [run_linter, check_security]

Agents register with a Runtime and participate in a GroupChat. Messages flow between agents based on routing rules or designated roles.

The Rebuild: What Changed in 2025

The original AutoGen (2023-2024) was complex. The autobuild system generated agents dynamically, debugging was painful, and state management was implicit. The 2025 rewrite targets each of these complaints:

Dimension	Original AutoGen	AutoGen 2025
Agent model	Autobuild (dynamic)	Agent-as-program (explicit)
Communication	Implicit message passing	Typed messages, declared contracts
State	Hidden in agent internals	Explicit state on Runtime
Debugging	Black box	Full execution traces
Code execution	Docker required	Built-in sandbox (no Docker)
Azure integration	Manual	Native Azure AI + Entra ID
Memory	None	Persistent across sessions

The rebuild was controversial because some existing users had to rewrite their agents. In return, production teams report significantly fewer runtime surprises.

Core Concepts

Agents

Every agent inherits from ChatAgent:

class CustomerSupportAgent(ChatAgent):
    def __init__(self):
        super().__init__(name="support")
        self.system_prompt = """You are an L1 support agent.
        Handle common issues:
        - Password reset: use reset_password tool
        - Billing: escalate to billing_agent
        - Technical: escalate to tech_agent
        """
        self.tools = [reset_password, lookup_account, search_kb]

    async def on_message(self, msg: Message, ctx: Context):
        if msg.intent == "billing":
            await ctx.send_to("billing_agent", msg)
        elif msg.intent == "technical":
            await ctx.send_to("tech_agent", msg)
        else:
            result = await self.process_with_tools(msg)
            await ctx.reply(result)

Key methods:

on_message — Called when the agent receives a message
process_with_tools — Runs the LLM with available tools
ctx.send_to — Routes a message to another agent
ctx.reply — Sends a response back
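To see how these four methods fit together without installing anything, here is a minimal sketch built only on the standard library. Everything in it is a hypothetical stand-in: `Message`, `Context`, and `SupportAgent` are toy classes written for this example, not AutoGen's own types, and `process_with_tools` just returns a canned string where the LLM call would go.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Message:
    intent: str
    text: str


@dataclass
class Context:
    """Hypothetical stand-in: records what an agent sends instead of delivering it."""
    forwarded: list = field(default_factory=list)  # (recipient, message) pairs
    replies: list = field(default_factory=list)

    async def send_to(self, recipient: str, msg: Message) -> None:
        self.forwarded.append((recipient, msg))

    async def reply(self, result: str) -> None:
        self.replies.append(result)


class SupportAgent:
    # Intents this agent hands off instead of answering itself.
    ESCALATIONS = {"billing": "billing_agent", "technical": "tech_agent"}

    async def process_with_tools(self, msg: Message) -> str:
        # Placeholder for the LLM-plus-tools call.
        return f"handled: {msg.text}"

    async def on_message(self, msg: Message, ctx: Context) -> None:
        target = self.ESCALATIONS.get(msg.intent)
        if target is not None:
            await ctx.send_to(target, msg)
        else:
            await ctx.reply(await self.process_with_tools(msg))


async def demo() -> Context:
    ctx = Context()
    agent = SupportAgent()
    await agent.on_message(Message("billing", "refund please"), ctx)
    await agent.on_message(Message("account", "reset my password"), ctx)
    return ctx


ctx = asyncio.run(demo())
print(ctx.forwarded[0][0])  # recipient of the escalated billing message
print(ctx.replies)          # replies the agent produced itself
```

Recording sends on the context instead of delivering them is a convenient way to unit-test routing logic before any model is wired in.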

Typed Messages

Messages have typed fields, not freeform text:

from dataclasses import dataclass

@dataclass
class SupportTicket(Message):
    ticket_id: str
    customer_email: str
    issue_type: str  # "billing" | "technical" | "account"
    description: str
    priority: int

This is a significant difference from CrewAI’s freeform text passing. Typed messages mean:

Agents can validate message structure
Routing decisions are based on structured fields, not text parsing
Debugging is easier — you can inspect exact message schemas
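The first two points can be illustrated with a plain dataclass. This is a sketch under assumptions, not AutoGen's validation mechanism: the class below does not inherit from the framework's `Message`, the 1 to 5 priority range is invented for the example, and the `route` mapping is hypothetical.

```python
from dataclasses import dataclass

ISSUE_TYPES = {"billing", "technical", "account"}


@dataclass(frozen=True)
class SupportTicket:
    ticket_id: str
    customer_email: str
    issue_type: str
    description: str
    priority: int

    def __post_init__(self) -> None:
        # Reject malformed messages at construction time, before any agent sees them.
        if self.issue_type not in ISSUE_TYPES:
            raise ValueError(f"unknown issue_type: {self.issue_type!r}")
        if not 1 <= self.priority <= 5:  # assumed range, for illustration only
            raise ValueError(f"priority out of range: {self.priority}")


def route(ticket: SupportTicket) -> str:
    # Routing reads a structured field; no text parsing involved.
    return {"billing": "billing", "technical": "tech", "account": "triage"}[ticket.issue_type]


ticket = SupportTicket("T-1", "a@example.com", "billing", "double charge", 2)
print(route(ticket))
```

A ticket with `issue_type="shipping"` raises `ValueError` when it is constructed, so the bad message never reaches the router.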

GroupChat

The GroupChat manages message routing:

chat = GroupChat(
    agents=[triage, billing, tech, escalation],
    routing="round_robin",  # or "role_based", "broadcast", "custom"
    max_turns=10,
    admin_agent=escalation  # can override routing
)

Routing modes:

Mode	Behavior	Use Case
`round_robin`	Each agent speaks in turn	Debates, brainstorming
`role_based`	Messages routed by `to` field	Customer support, workflows
`broadcast`	All agents receive all messages	Information sharing
`custom`	User-defined routing function	Complex orchestration
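One way to read the table is as four small functions that answer the same question: given a message and the turn number, who receives it? The sketch below is a toy model of that idea, not GroupChat's internals. Messages are plain dicts, agents are plain names, and `recipients` and `urgent_first` are hypothetical helpers.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A router takes (message, agent names, turn index) and returns recipient names.
Router = Callable[[dict, Sequence[str], int], List[str]]


def round_robin(msg: dict, agents: Sequence[str], turn: int) -> List[str]:
    # Each agent speaks in turn, wrapping around at the end of the list.
    return [agents[turn % len(agents)]]


def role_based(msg: dict, agents: Sequence[str], turn: int) -> List[str]:
    # Deliver only to the agent named in the message's `to` field.
    return [msg["to"]] if msg.get("to") in agents else []


def broadcast(msg: dict, agents: Sequence[str], turn: int) -> List[str]:
    # Every agent receives every message.
    return list(agents)


ROUTERS: Dict[str, Router] = {
    "round_robin": round_robin,
    "role_based": role_based,
    "broadcast": broadcast,
}


def recipients(mode: str, msg: dict, agents: Sequence[str], turn: int,
               custom: Optional[Router] = None) -> List[str]:
    if mode == "custom":
        if custom is None:
            raise ValueError("custom routing needs a routing function")
        return custom(msg, agents, turn)
    return ROUTERS[mode](msg, agents, turn)


def urgent_first(msg: dict, agents: Sequence[str], turn: int) -> List[str]:
    # A user-defined rule: urgent messages skip straight to escalation.
    if msg.get("priority", 5) <= 1:
        return ["escalation"]
    return role_based(msg, agents, turn)


agents = ["triage", "billing", "tech", "escalation"]
msg = {"to": "billing", "priority": 1, "body": "refund request"}

print(recipients("round_robin", msg, agents, turn=5))
print(recipients("custom", msg, agents, turn=0, custom=urgent_first))
```

Keeping every mode behind the same signature is what makes `custom` cheap to support: a user-defined function just slots in next to the built-ins.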

Code Execution Sandbox

AutoGen’s code execution is its standout feature. The 2025 version includes a built-in sandbox — no Docker required:

from autogen_code import CodeExecutionAgent

coder = CodeExecutionAgent(
    name="code_runner",
    language="python",
    sandbox="built-in",  # or "docker"
    timeout=30,
    max_output_size=10000
)

The built-in sandbox works by:

Creating an isolated process per execution
Restricting filesystem access to a temp directory
Limiting network access (configurable allowlist)
Enforcing memory and CPU limits
Killing processes that exceed timeouts
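To make that list concrete, here is a rough approximation of the process-level steps using only the Python standard library. It is a sketch under loud assumptions, not AutoGen's sandbox: `run_untrusted` is a hypothetical helper, the resource limits apply on POSIX systems only, and starting the child in a temp directory does not actually confine filesystem or network access. Real confinement needs OS-level mechanisms that are out of scope here.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional

try:
    import resource  # POSIX only; absent on Windows
except ImportError:
    resource = None


@dataclass
class RunResult:
    stdout: str
    returncode: Optional[int]
    timed_out: bool


def run_untrusted(code: str, timeout: float = 30, max_output_size: int = 10_000,
                  cpu_seconds: int = 60, memory_bytes: int = 1 << 30) -> RunResult:
    def apply_limits() -> None:
        # Runs in the child just before exec: cap CPU time and address space.
        for res, limit in ((resource.RLIMIT_CPU, cpu_seconds),
                           (resource.RLIMIT_AS, memory_bytes)):
            try:
                _, hard = resource.getrlimit(res)
                if hard != resource.RLIM_INFINITY:
                    limit = min(limit, hard)
                resource.setrlimit(res, (limit, hard))
            except (ValueError, OSError):
                pass  # some platforms refuse certain limits; skip them

    # Fresh working directory per execution, deleted afterwards.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated interpreter mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,  # the child is killed when this expires
                preexec_fn=apply_limits if resource else None,
            )
        except subprocess.TimeoutExpired:
            return RunResult("", None, True)
    return RunResult(proc.stdout[:max_output_size], proc.returncode, False)


ok = run_untrusted("print(1 + 1)", timeout=10)
print(ok.stdout.strip(), ok.returncode)

stuck = run_untrusted("while True: pass", timeout=1)
print(stuck.timed_out)
```

The `timeout` and `max_output_size` parameters mirror the names in the configuration above. `cpu_seconds` and `memory_bytes` are names invented for this sketch.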

For teams that prefer Docker isolation, the Docker sandbox is still available and unchanged.

Azure Integration

Of the three frameworks compared here, AutoGen is the only one with native Azure AI integration. This matters for Microsoft shops:

import os

from autogen_azure import AzureRuntime

runtime = AzureRuntime(
    model="gpt-4o",
    deployment="my-deployment",
    endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    auth="entra_id",
    content_filter=True
)

Features:

Entra ID authentication — No API keys in code
Content filtering — Azure AI content safety built in
Audit logging — All agent activity logged to Azure Monitor
Managed identity — No credential management
Private networking — Agents can access VNet-protected resources

For enterprises already on Azure, this integration alone can justify choosing AutoGen over the other frameworks.

Production Patterns

Pattern 1: Code Generation & Review Loop

AutoGen excels at code generation tasks because of its built-in code execution sandbox:

specialist = ChatAgent(name="specialist", tools=[write_code])
reviewer = ChatAgent(name="reviewer", tools=[run_tests, check_style])
runner = CodeExecutionAgent(name="runner")

chat = GroupChat(
    agents=[specialist, reviewer, runner],
    routing="role_based",
    max_turns=10  # cap the retry loop so failing tests cannot cycle forever
)

The workflow:

Specialist generates code
Runner executes it in sandbox
Reviewer checks output and style
If tests fail, specialist retries
Loop until all tests pass
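Stripped of agents and models, the workflow above is a bounded retry loop that feeds test failures back to the generator. The sketch below shows that control flow with stubs. `generate_until_passing`, `fake_specialist`, and `run_and_review` are hypothetical names, and the demo uses `exec` purely for brevity where a real runner would go through the sandbox.

```python
from typing import Callable, Optional, Tuple

Generator = Callable[[Optional[str]], str]           # feedback -> code
Checker = Callable[[str], Tuple[bool, Optional[str]]]  # code -> (passed, feedback)


def generate_until_passing(generate: Generator, check: Checker,
                           max_attempts: int = 5) -> Tuple[str, int]:
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)       # specialist writes or revises code
        passed, feedback = check(code)  # runner executes, reviewer judges
        if passed:
            return code, attempt
    raise RuntimeError(f"tests still failing after {max_attempts} attempts: {feedback}")


def fake_specialist(feedback: Optional[str]) -> str:
    # The first draft has a bug; the retry fixes it once feedback arrives.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_and_review(code: str) -> Tuple[bool, Optional[str]]:
    namespace: dict = {}
    exec(code, namespace)  # demo only; untrusted code belongs in a sandbox
    result = namespace["add"](2, 3)
    if result == 5:
        return True, None
    return False, f"add(2, 3) returned {result}, expected 5"


final_code, attempts = generate_until_passing(fake_specialist, run_and_review)
print(attempts)
```

The `max_attempts` cap matters as much as the loop itself: without it, a generator that never converges keeps the loop running indefinitely.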

Pattern 2: Enterprise Support Pipeline

A Microsoft enterprise customer runs this in production across 20,000+ daily tickets:

# Keep a reference to the escalation manager so it can also serve as admin_agent.
escalation = EscalationManager(name="escalation", human_handoff=True)

agents = [
    TriageAgent(name="triage", tools=[lookup_account, search_kb]),
    BillingAgent(name="billing", tools=[query_invoices, process_refund]),
    TechAgent(name="tech", tools=[check_logs, run_diagnostics]),
    escalation
]

chat = GroupChat(
    agents=agents,
    routing="role_based",
    max_turns=5,
    admin_agent=escalation
)

The triage agent routes on the issue_type field. If no agent resolves the ticket within 5 turns, the escalation manager takes over and hands off to a human.
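The turn budget and fallback can be sketched in a few lines. This is a toy model with hypothetical names (`run_pipeline`, `Outcome`, `make_flaky_handler`), where each handler stands in for a specialist agent and returns `None` while the ticket is still open. It is not how GroupChat implements `max_turns`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A handler returns a resolution string, or None if the ticket is still open.
Handler = Callable[[dict], Optional[str]]


@dataclass
class Outcome:
    resolved_by: str
    resolution: str
    turns_used: int


def run_pipeline(ticket: dict, handlers: Dict[str, Handler], max_turns: int = 5) -> Outcome:
    handler = handlers.get(ticket["issue_type"])  # triage: route on the typed field
    if handler is not None:
        for turn in range(1, max_turns + 1):
            resolution = handler(ticket)
            if resolution is not None:
                return Outcome(ticket["issue_type"], resolution, turn)
    # Unknown issue type or turn budget exhausted: hand off to a human.
    turns = max_turns if handler is not None else 0
    return Outcome("escalation", "handed off to a human", turns)


def make_flaky_handler(succeeds_on: int) -> Handler:
    calls = {"n": 0}

    def handler(ticket: dict) -> Optional[str]:
        calls["n"] += 1
        return "refund issued" if calls["n"] >= succeeds_on else None

    return handler


handlers = {
    "billing": make_flaky_handler(succeeds_on=2),
    "technical": lambda ticket: None,  # never resolves, so it must escalate
}

billing_outcome = run_pipeline({"issue_type": "billing"}, handlers)
tech_outcome = run_pipeline({"issue_type": "technical"}, handlers)
print(billing_outcome)
print(tech_outcome)
```

Tickets with an issue type that no handler covers escalate immediately instead of burning the turn budget.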

AutoGen vs LangGraph vs CrewAI

Dimension	AutoGen	LangGraph	CrewAI
Core metaphor	Conversations	Graphs	Roles
Message model	Typed messages	Shared state	Freeform text
Code execution	Built-in sandbox	External tools	External tools
Azure native	Yes	No	No
Learning curve	Medium	High	Low
Best for	Code gen, enterprise support	Complex workflows, compliance	Rapid prototyping, pipelines

Limitations

Heavy Azure dependency: Features like content filtering and managed identity only work with Azure. OpenAI and Bedrock users miss most of the enterprise features.
Typed messages require upfront planning: You define message schemas before you write agents. This is good for correctness but slows prototyping.
GroupChat can loop indefinitely: Poor routing rules or conflicting agent instructions can keep agents handing messages back and forth without ever resolving them. Setting max_turns is essential.
Smaller ecosystem: Fewer community tools and integrations than LangChain. You’ll write more custom code.

Next in the Series

Post	Framework
1	LangGraph
2	CrewAI
3	AutoGen (this)
4	Claude Agent SDK
5	OpenAI Agents SDK
6	Head-to-Head Comparison

Series: AI Agent Frameworks 2026 — Production Comparison. Post 3: AutoGen. Post 4: Claude Agent SDK → coming next.