Continuum Suite · v1.0.0

Continuum

The agent platform. Build production AI agents on the open Framework, route every call through Aura, and govern the whole system with Provenance.

pip install shyftlabs-continuum

One platform · three layers

Continuum is a product suite, not a single library. The open Framework builds and runs your agents, Aura routes every call to the right model, and Provenance keeps the whole system governed, safe, and auditable.

01 · Framework

Continuum

The open-source agent runtime. BaseAgent, 9 composable workflow patterns, durable Temporal jobs, two-tier memory, MCP-native tools, and time-travel decision traces.

Open source

02 · Inference

Aura

The Smart Inference layer. One OpenAI-compatible endpoint to 250+ models across 45+ providers; a classifier picks the cheapest model that clears the quality bar, per prompt.

Proprietary

03 · Governance

Provenance

Prompt management, guardrails, PII redaction, and policy. The governance & security layer that makes every agent decision observable, safe, and compliant.

Coming soon

250+ models routed · 45+ LLM providers · 9 workflow patterns · 100% auto-traced

From notebook to production

Three layers, one workflow. Build your agents, let Aura route every model call, and let Provenance keep the whole system governed and audit-ready.

01 · Framework

Build

Define agents with BaseAgent, compose nine workflow patterns, attach MCP tools and two-tier memory.

02 · Aura

Route

One OpenAI-compatible endpoint. A classifier picks the cheapest model that clears the quality bar, per prompt.

03 · Provenance

Govern

Prompt versioning, guardrails, PII redaction, and policy, so every decision is observable and audit-ready.

Up and running in minutes

Install the open-source framework and ship your first agent in a few lines. Aura and Provenance plug in when you need them.

# pip install shyftlabs-continuum
from continuum.agent import BaseAgent
from continuum.agent.runner import AgentRunner

agent = BaseAgent(name="assistant", instructions="You are helpful.", model="gpt-4o-mini")
response = await AgentRunner().run(agent, "Hello!")
print(response.content)

Build on Continuum

Open-source framework today. Proprietary inference and governance when you scale.

pip install shyftlabs-continuum

Continuum Suite · Open Source · v1.0.0

Continuum Framework

The open-source agent runtime for builders who ship. Production-grade reasoning, durable multi-agent workflows, two-tier memory, and full observability, out of the box.

pip install shyftlabs-continuum · Python 3.13 · 250+ models via Aura · Temporal (durable workflows) · Langfuse (auto-traced) · 9 multi-agent patterns

What you get

Aura · Smart Inference routing

One OpenAI-compatible endpoint, 250+ models. A classifier scores every prompt; the router picks the cheapest model that clears the quality bar. Per-agent strict / modest / quality tiers.

Two-tier persistent memory

Long-term semantic recall via mem0 + Milvus/Qdrant; short-term session in Redis. Four isolation scopes (USER, AGENT, SHARED, CONVERSATION) for multi-tenant safety.

9 composable workflow patterns

Sequential · Parallel · Loop · Reflection · Router · Planner · Debate · Scatter · SupervisedSequential. Mix and nest freely; every step is independently traceable.

MCP-native tooling

Stdio, SSE, or StreamableHTTP: connect any MCP server with zero adapters. Top-k Tool Attention promotes only the relevant tools each turn, cutting prompt tokens 30–60%.

Durable workflows · Temporal

Long-running multi-agent jobs that survive crashes, restarts, and deploys. Human-in-the-loop approval gates with timeout escalation, signals, and full audit trails.

Observability is non-negotiable

Every LLM call, tool invocation, handoff, and memory op auto-traced in Langfuse. Custom spans via @observe. Build golden eval sets from production traces.

Safe by default

Input/output PII redaction. Configurable scrubbers on memory writes. Cycle detection on agent handoffs. Graceful shutdown with in-flight trace flush. Bearer-scoped budgets at the gateway.

Production primitives

Dependency-injection container, health checks for every dependency, lifecycle hooks, FastAPI helpers, fakeredis for tests. Async-native, protocol-based, code-first.

Get started

Install the open-source framework from PyPI and have your first agent running in a few lines. Full setup, the environment reference, and the end-to-end walkthrough live under Build Agents.

# Python 3.13
pip install shyftlabs-continuum

Build Agents

From a minimal single agent to complex multi-agent workflows, everything starts with BaseAgent.

Single-agent example: playground/gateway-local-shop, one agent with MCP tools over HTTP.

Multi-agent example: playground/gateway-multi-agent-shop.

Installation & setup

Install
The fastest path is to install the released package straight from PyPI:

# Python 3.13 required
python3.13 -m venv .venv
source .venv/bin/activate
pip install shyftlabs-continuum                 # latest release (v1.0.0)
pip install "shyftlabs-continuum[temporal,eval]"    # optional extras

Prefer to work from source? Clone the repository and install it editable:

git clone https://github.com/shyftlabs/continuum.git
cd continuum
python3.13 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"                          # add [temporal] / [eval] as needed

Setup environment and spin up infrastructure

cp .env.template .env

Next, configure your .env file:

# ── LLM Provider Keys ────────────────────────────────────────────────────────
OPENAI_API_KEY=your-openai-api-key        # optional 
GEMINI_API_KEY=your-gemini-api-key        # optional
# ANTHROPIC_API_KEY=your-anthropic-api-key  # optional

# ── Default LLM ──────────────────────────────────────────────────────────────
DEFAULT_LLM_MODEL=gemini/gemini-2.5-flash
FALLBACK_LLM_MODEL=gpt-4o-mini
DEFAULT_LLM_TEMPERATURE=0.7
DEFAULT_LLM_MAX_TOKENS=4096
LLM_REQUEST_TIMEOUT=300
LLM_MAX_RETRIES=3
LLM_ENABLE_FALLBACK=true

# ── Embeddings ────────────────────────────────────────────────────────────────
EMBEDDER_PROVIDER=openai
EMBEDDER_MODEL=openai/text-embedding-3-small  # use provider/model prefix when routing through gateway
EMBEDDING_DIMS=1536
# EMBEDDER_API_KEY=  # explicit key (falls back to SMART_GATEWAY_API_KEY when EMBEDDER_API_BASE is set)
EMBEDDER_API_BASE=https://continuum.shyftops.io/v1

# ── Memory (mem0 + Milvus) ───────────────────────────────────────────────────
MEMORY_ENABLED=true
VECTOR_STORE_PROVIDER=milvus
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=orchestrator_memories_openai
MILVUS_HOST=localhost
MILVUS_PORT=19530
MILVUS_TOKEN=
MILVUS_COLLECTION=orchestrator_memories
MEMORY_LLM_MODEL=gemini/gemini-2.5-flash
MEMORY_LLM_TEMPERATURE=0.1
MEMORY_ISOLATION=user
MEMORY_SEARCH_LIMIT=5
MEMORY_HISTORY_DB_PATH=~/.orchestrator/memory_history.db

# ── Session (Redis) ───────────────────────────────────────────────────────────
SESSION_ENABLED=true
SESSION_REDIS_HOST=localhost
SESSION_REDIS_PORT=6380
SESSION_REDIS_PASSWORD=sdk123456789
SESSION_REDIS_DB=0
SESSION_REDIS_SSL=false
SESSION_TTL_SECONDS=172800
SESSION_MAX_MESSAGES=1000
SESSION_KEY_PREFIX=orchestrator:session

# ── Langfuse Observability ────────────────────────────────────────────────────
LANGFUSE_ENABLED=true
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
LANGFUSE_SECRET_KEY=your-langfuse-secret-key
LANGFUSE_HOST=http://localhost:3000
LANGFUSE_BASE_URL=http://localhost:3000
NEXTAUTH_SECRET=your-nextauth-secret-here
LANGFUSE_SAMPLE_RATE=1.0
LANGFUSE_FLUSH_INTERVAL=1
LANGFUSE_FLUSH_AT=15
LANGFUSE_DEBUG=false

# ── Temporal (durable workflows) ─────────────────────────────────────────────
TEMPORAL_ENABLED=true
TEMPORAL_HOST=localhost:7233
TEMPORAL_NAMESPACE=default
TEMPORAL_TASK_QUEUE=orchestrator-agents
TEMPORAL_ENABLE_HUMAN_IN_LOOP=true
TEMPORAL_APPROVAL_TIMEOUT_SECONDS=86400
TEMPORAL_WORKFLOW_EXECUTION_TIMEOUT=604800
TEMPORAL_ACTIVITY_START_TO_CLOSE_TIMEOUT=300
TEMPORAL_ACTIVITY_RETRY_MAX_ATTEMPTS=3

# ── Smart Gateway ─────────────────────────────────────────────────────────────
SMART_GATEWAY_URL=https://continuum.shyftops.io/v1
SMART_GATEWAY_API_KEY=your-smart-gateway-api-key

# ── Misc ──────────────────────────────────────────────────────────────────────
ENVIRONMENT=development
LOG_LEVEL=INFO
SHARED_SERVICES_ENABLED=true
MEM0_TELEMETRY=false
ANONYMIZED_TELEMETRY=false
TOKENIZERS_PARALLELISM=false

Finally, spin up the infrastructure:

docker compose up -d       # Redis (:6380) + Milvus (:19530)

Create and run your first agent

import asyncio
from continuum.agent import BaseAgent
from continuum.agent.runner import AgentRunner

agent = BaseAgent(
    name="assistant",
    instructions="You are a helpful assistant.",
    model="gpt-4o-mini",
)

async def main():
    runner = AgentRunner()
    response = await runner.run(agent, "Hello!", user_id="u-001")
    print(response.content)

asyncio.run(main())

Base Agent

from continuum.agent import BaseAgent

agent = BaseAgent(
    name="support-bot",
    instructions="You are a support agent for Acme Corp. Be concise.",
    model="gpt-4o-mini",
    temperature=0.3,
)

BaseAgent is a dataclass. Every field has a sensible default; in practice, only name and instructions are required.

Parameter	Type	Description
name	str	Unique identifier (alphanumeric, hyphens, underscores)
instructions	str	System prompt. Supports `{slot}` placeholders.
model	str	LLM model string; defaults to the `DEFAULT_LLM_MODEL` env var
temperature	float	Sampling temperature; default `0.7`
max_tokens	int | None	Response token cap
tools	list[ToolDefinition]	Static tool definitions for the LLM
mcp_servers	list[MCPServer]	MCP servers whose tools are dynamically loaded
handoffs	list[Handoff]	Agents this agent can delegate to
memory_config	AgentMemoryConfig	Long-term memory search/store behavior
config	AgentConfig	Execution config: turns, timeouts, retries, react_mode
output_schema	type[BaseModel] | None	Pydantic model for structured output
input_schema	type[BaseModel] | None	Pydantic model that validates input before execution
template_vars	dict	Static values injected into `{slot}` placeholders
examples	list[dict]	Few-shot examples; each needs `input` and `output` keys
instruction_modifiers	list[Callable]	Dynamic prompt modifiers applied at runtime
metadata	dict	Arbitrary key-value metadata
tags	list[str]	Categorization tags for filtering / routing

Stateless vs Stateful

An agent can be stateless or stateful. For a stateless agent, disable both long-term and short-term memory. A stateful agent can use long-term memory (Memory), short-term memory (Session), or both; control each through the parameters of AgentMemoryConfig and AgentConfig.

# Stateless, no Redis, no mem0 calls (fastest for prototypes)
agent = BaseAgent(
    name="stateless",
    instructions="...",
    memory_config=AgentMemoryConfig(search_memories=False, store_memories=False),
    config=AgentConfig(log_to_session=False, session_history_turns=0),
)

# Stateful, loads last 20 turns + long-term memories (default)
agent = BaseAgent(
    name="stateful",
    instructions="...",
    memory_config=AgentMemoryConfig(search_memories=True, store_memories=True),
    config=AgentConfig(log_to_session=True, session_history_turns=None),
)

Memory (default):

search_memories=True: looks up long-term memories before responding
store_memories=True: saves long-term memories after responding

Session (default):

log_to_session=True: saves each turn to session history (short-term memory)
session_history_turns=None: loads the last 20 turns of session history (short-term memory)

session_history_turns behaviour

Value	Behaviour
`None` (default)	Load last 20 turns from Redis
`0`	Skip Redis entirely, load nothing
`5`	Load last 5 turns from Redis
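
The table above can be read as a small resolution rule. A minimal sketch of that rule, assuming negative values are rejected; `turns_to_load` and `DEFAULT_HISTORY_TURNS` are hypothetical names for illustration, not Continuum APIs:

```python
from typing import Optional

DEFAULT_HISTORY_TURNS = 20  # the default stated in the table above


def turns_to_load(session_history_turns: Optional[int]) -> int:
    """Hypothetical helper: how many turns to fetch from session history."""
    if session_history_turns is None:
        return DEFAULT_HISTORY_TURNS
    if session_history_turns < 0:
        # Assumption: negative values are invalid; the docs do not say.
        raise ValueError("session_history_turns must be None or >= 0")
    # 0 means "skip the Redis read entirely"; any other value is used as-is.
    return session_history_turns


for value in (None, 0, 5):
    print(value, "->", turns_to_load(value))
```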

Lifecycle Hooks

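Hooks are callables attached directly to BaseAgent. Each receives the agent instance and the run context dict; some also receive one extra argument (the exception, the tool name, or the handoff target), as the signatures below show.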

async def on_start(agent, context):
    print(f"Starting {agent.name}")

async def on_tool(agent, tool_name, context):
    print(f"Calling tool: {tool_name}")

agent = BaseAgent(
    name="my-agent",
    instructions="...",
    on_start=on_start,
    on_end=lambda agent, ctx: print("Done"),
    on_error=lambda agent, exc, ctx: print(f"Error: {exc}"),
    on_tool_call=on_tool,
    on_handoff=lambda agent, target, ctx: print(f"Handing off to {target}"),
)
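
The example mixes coroutine functions (on_start, on_tool) with plain lambdas. A minimal sketch of how a caller could invoke either kind, assuming both forms are accepted; `call_hook` is a hypothetical helper and not part of Continuum, and plain strings stand in for agent objects:

```python
import asyncio
import inspect


async def call_hook(hook, *args):
    """Invoke a hook that may be sync or async (hypothetical helper)."""
    if hook is None:
        return None          # no hook registered for this event
    result = hook(*args)
    if inspect.isawaitable(result):
        result = await result  # async hooks return a coroutine to await
    return result


events = []


async def on_start(agent, context):
    events.append(f"start:{agent}")


def on_end(agent, context):
    events.append(f"end:{agent}")


async def demo():
    await call_hook(on_start, "my-agent", {})
    await call_hook(on_end, "my-agent", {})
    await call_hook(None, "my-agent", {})  # silently skipped


asyncio.run(demo())
print(events)
```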

Prompt Engineering

Template Variables

agent = BaseAgent(
    name="regional-agent",
    instructions="You serve customers in {region}. Currency: {currency}.",
    template_vars={"region": "North America", "currency": "USD"},
)
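
How the placeholders are filled is not spelled out here. A minimal sketch, assuming `str.format`-style substitution; `render_instructions` is a hypothetical helper, not a Continuum API:

```python
def render_instructions(template: str, template_vars: dict) -> str:
    """Fill {slot} placeholders (hypothetical helper, assumes str.format syntax)."""
    # format_map raises KeyError on a missing slot, so typos surface early.
    return template.format_map(template_vars)


prompt = render_instructions(
    "You serve customers in {region}. Currency: {currency}.",
    {"region": "North America", "currency": "USD"},
)
print(prompt)
```

Under this assumption, a literal brace in the instructions would need to be doubled (`{{` and `}}`).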

Few-shot Examples

agent = BaseAgent(
    name="classifier",
    instructions="Classify the sentiment of the input.",
    examples=[
        {"input": "I love this product!", "output": "positive"},
        {"input": "This is terrible.",  "output": "negative"},
    ],
)

Instruction Modifiers

Modifiers are called at runtime and receive (instructions: str, context: dict) → str.

def add_date(instructions, context):
    return instructions + f"\n\nToday is {context.get('date', 'unknown')}."

agent = BaseAgent(
    name="date-aware",
    instructions="You are an assistant.",
    instruction_modifiers=[add_date],
)
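
When several modifiers are registered, the natural reading is that they run as a chain. A minimal sketch, assuming modifiers are applied in list order and each one receives the previous one's output; `apply_modifiers` and `add_language` are hypothetical names for illustration:

```python
def apply_modifiers(instructions, modifiers, context):
    """Run each modifier in order (hypothetical helper, not a Continuum API)."""
    for modify in modifiers:
        instructions = modify(instructions, context)
    return instructions


def add_date(instructions, context):
    return instructions + f"\n\nToday is {context.get('date', 'unknown')}."


def add_language(instructions, context):
    return instructions + f" Reply in {context.get('language', 'English')}."


final_prompt = apply_modifiers(
    "You are an assistant.",
    [add_date, add_language],
    {"date": "2025-01-01"},
)
print(final_prompt)
```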

Structured Output

from pydantic import BaseModel

class Review(BaseModel):
    sentiment: str
    score: float
    summary: str

agent = BaseAgent(
    name="reviewer",
    instructions="Analyze the review and return structured output.",
    output_schema=Review,
)

# Run inside an async function, with runner = AgentRunner() created beforehand
response = await runner.run(agent, "The hotel was fantastic but expensive.")
review: Review = response.structured_output
print(review.score)  # e.g. 0.75

Multi-Agent Workflows

Continuum ships 9 built-in workflow patterns. Each has a class-based form and a factory shorthand. All patterns are composable: you can nest them or combine them with handoffs.

Sequential

Pipeline: each agent's output feeds the next

Parallel

All agents run concurrently on the same input

Loop

Iterates until a termination condition is met

Reflection

Agent runs, critic evaluates, agent retries if needed

Router

Dispatches to the most relevant specialist agent

Planner

Decomposes a goal into steps, executes each in turn

Debate

Pro/con agents argue; a judge synthesizes a verdict

Scatter

LLM splits input into slices; agents process in parallel

Custom Workflow

We strongly recommend building custom workflows using BaseAgent directly for anything closely tied to your project's business logic. A custom agent gives you full control over the flow, session saving, and memory behaviour: you decide which sub-agents are stateless and which are stateful.

Example: ParallelCoordinatorAgent in playground/multi-agent-shop/workflows.py

Instead of using ParallelAgent directly, the playground defines a custom BaseAgent subclass that orchestrates parallel search and synthesis manually:

class ParallelCoordinatorAgent(BaseAgent):
    synthesiser: BaseAgent | None = None
    parallel: ParallelAgent | None = None

    async def execute(self, input_text, runner, context, llm_client=None) -> AgentResponse:
        # Block auto-save for every sub-agent run that shares this context
        context.suppress_session_log = True

        # Step 1: parallel workers with a fresh stateless context (no history, no save)
        parallel_ctx = create_run_context(
            user_id=context.user_id,
            conversation_id=context.conversation_id,
        )
        parallel_result = await self.parallel.execute(input_text, runner, parallel_ctx)

        # Build the synthesiser prompt from the merged worker output.
        # (Illustrative format; the playground's exact prompt is not shown here.)
        synthesis_input = f"{input_text}\n\nWorker results:\n{parallel_result.content}"

        # Step 2: synthesiser uses session history + memory; suppress_session_log
        # blocks auto-save but context carries session_id so history loads normally
        final = await runner.run(
            agent=self.synthesiser,
            input=synthesis_input,
            context=context,
        )

        # Step 3: save exactly one clean turn
        await runner.save_turn(
            session_id=context.session_id,
            user_message=input_text,
            assistant_message=final.content,
        )
        return final

This gives the parallel workers a clean stateless context (no Redis calls) while the synthesiser still loads session history and memory, something the built-in ParallelAgent cannot do out of the box.

Built-in Workflows

Sequential Workflow

Executes agents as a pipeline: each agent receives the previous agent's output.

from continuum.agent.workflow import SequentialAgent

pipeline = SequentialAgent(
    name="research-pipeline",
    instructions="Research pipeline coordinator.",
    agents=[researcher, summarizer, formatter],
)

Or use the factory shorthand:

from continuum.agent import BaseAgent, AgentRunner, create_sequential_agent

researcher = BaseAgent(name="researcher", instructions="Research the topic. Output key facts.", model="gpt-4o-mini")
writer     = BaseAgent(name="writer",     instructions="Write a short report from the facts.",  model="gpt-4o-mini")
editor     = BaseAgent(name="editor",     instructions="Polish the report for clarity.",         model="gpt-4o-mini")

pipeline = create_sequential_agent(
    name="research-pipeline",
    agents=[researcher, writer, editor],
)
response = await AgentRunner().run(pipeline, "AI in healthcare")
print(response.content)  # editor's final output

Parallel Workflow

All agents receive the same input and run concurrently. Results are merged by strategy.

from continuum.agent.workflow import ParallelAgent
from continuum.agent.config import ParallelConfig, MergeStrategy

fan_out = ParallelAgent(
    name="multi-analyst",
    instructions="Run multiple analyses in parallel.",
    agents=[sentiment_agent, topic_agent, entity_agent],
    parallel_config=ParallelConfig(
        merge_strategy=MergeStrategy.LLM_SUMMARIZE,
    ),
)

MergeStrategy	Behavior
`CONCATENATE`	All outputs joined with newlines
`LLM_SUMMARIZE`	A secondary LLM call synthesizes all outputs
`STRUCTURED_DICT`	Returns a dict keyed by agent name
`FIRST_SUCCESS`	Returns the first successful agent result
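
The three merge strategies that need no model call can be sketched in plain Python. This is an illustration of the table above, not Continuum's implementation: the local `MergeStrategy` enum is a stand-in, `merge` is a hypothetical helper, a failed agent is represented as `None`, and FIRST_SUCCESS is assumed to mean first in list order rather than first to finish. LLM_SUMMARIZE is omitted because it requires a model.

```python
from enum import Enum


class MergeStrategy(Enum):  # local stand-in for the library's enum
    CONCATENATE = "concatenate"
    STRUCTURED_DICT = "structured_dict"
    FIRST_SUCCESS = "first_success"


def merge(results, strategy):
    """results: list of (agent_name, output) pairs; output is None on failure."""
    succeeded = [(name, out) for name, out in results if out is not None]
    if strategy is MergeStrategy.CONCATENATE:
        return "\n".join(out for _, out in succeeded)
    if strategy is MergeStrategy.STRUCTURED_DICT:
        return dict(succeeded)
    if strategy is MergeStrategy.FIRST_SUCCESS:
        return succeeded[0][1] if succeeded else None
    raise ValueError(f"unsupported strategy: {strategy}")


results = [("sentiment", None), ("topic", "pricing"), ("entity", "Acme")]
for strategy in MergeStrategy:
    print(strategy.name, "->", repr(merge(results, strategy)))
```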

Factory shorthand:

from continuum.agent import create_parallel_agent
from continuum.agent.types import MergeStrategy

parallel = create_parallel_agent(
    name="parallel-analysts",
    agents=[analyst_a, analyst_b, analyst_c],
    merge_strategy=MergeStrategy.CONCATENATE,
)

Loop Workflow

Iterates an agent until a termination condition is satisfied.

from continuum.agent.workflow import LoopAgent
from continuum.agent.config import TerminationConfig, TerminationType

loop = LoopAgent(
    name="refinement-loop",
    instructions="Iteratively refine output.",
    agent=writer_agent,
    termination=TerminationConfig(
        type=TerminationType.LLM_DECISION,  # LLM decides when done
        max_iterations=5,
    ),
)

TerminationType	When it stops
`LLM_DECISION`	The inner agent's LLM signals completion
`TOOL_CALL`	A specific tool name is called
`OUTPUT_MATCH`	Output matches a regex pattern
`MAX_ITERATIONS`	Always run exactly N times
`CUSTOM`	User-supplied callable returns `True`

Factory shorthand with OUTPUT_MATCH:

from continuum.agent import create_loop_agent
from continuum.agent.types import TerminationType

loop = create_loop_agent(
    name="refinement-loop",
    agent=refiner,
    termination_type=TerminationType.OUTPUT_MATCH,
    termination_pattern=r"\bDONE\b",
    max_iterations=3,
)
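
To make the OUTPUT_MATCH semantics concrete, here is a library-free sketch of the control flow: call the step, test its output against the regex, and stop at the first match or at the iteration cap. `run_loop` and the toy `refiner` are hypothetical stand-ins; feeding each output back in as the next input is an assumption.

```python
import re


def run_loop(step, first_input, pattern, max_iterations):
    """Repeat step until its output matches pattern or the cap is reached."""
    stop = re.compile(pattern)
    output = first_input
    for iteration in range(1, max_iterations + 1):
        output = step(output)  # assumption: each output becomes the next input
        if stop.search(output):
            return output, iteration
    return output, max_iterations


def refiner(text):
    """Toy stand-in for an agent: marks the draft DONE on its second pass."""
    draft = text + "!"
    return draft + " DONE" if draft.count("!") >= 2 else draft


output, iterations = run_loop(refiner, "draft", r"\bDONE\b", max_iterations=3)
print(output, iterations)
```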

Reflection Workflow

The agent runs, a critic evaluates the output, and the agent retries if the critique is NEEDS IMPROVEMENT.

from continuum.agent.workflow import ReflectionAgent
from continuum.agent.config import ReflectionConfig

reflective = ReflectionAgent(
    name="quality-writer",
    instructions="Improve output quality via self-critique.",
    agent=writer_agent,
    reflection_config=ReflectionConfig(
        max_reflections=3,
        critique_prompt="Is this response accurate and complete? Reply APPROVED or NEEDS IMPROVEMENT.",
    ),
)
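
The generate, critique, retry cycle can be sketched without the library. In this illustration `reflect`, `generate`, and `critique` are hypothetical stand-ins for the agent and critic calls, and approval is assumed to be detected by the critic's reply starting with APPROVED:

```python
def reflect(generate, critique, task, max_reflections=3):
    """Generate a draft, then revise it until the critic approves or the cap hits."""
    draft = generate(task, feedback=None)
    for _ in range(max_reflections):
        verdict = critique(draft)
        if verdict.strip().upper().startswith("APPROVED"):
            break
        draft = generate(task, feedback=verdict)  # retry with the critique attached
    return draft


def generate(task, feedback):
    """Toy writer: adds sources only after being asked to."""
    return f"{task}: summary" if feedback is None else f"{task}: summary with sources"


def critique(draft):
    """Toy critic: approves only drafts that mention sources."""
    return "APPROVED" if "sources" in draft else "NEEDS IMPROVEMENT: cite sources"


final_draft = reflect(generate, critique, "AI in healthcare")
print(final_draft)
```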

Router Workflow

Routes requests to specialist agents based on content. Three strategies: LLM, rule-based, or hybrid.

from continuum.agent.workflow import RouterAgent, Route

router = RouterAgent(
    name="triage",
    instructions="Route customer requests to the right specialist.",
    routes=[
        Route(agent_name="billing-agent",  description="Billing, payments, invoices"),
        Route(agent_name="tech-support",  description="Technical issues and bugs"),
        Route(agent_name="sales-agent",   description="Pricing and upgrades"),
    ],
    fallback_agent_name="general-agent",
)

Or use the factory with tuple-based routes (recommended; the old Route(target=...) API has been removed):

from continuum.agent import BaseAgent, AgentRunner, create_router_agent

billing   = BaseAgent(name="billing-agent",   instructions="Handle billing questions.")
technical = BaseAgent(name="technical-agent", instructions="Handle technical support.")
general   = BaseAgent(name="general-agent",   instructions="Handle general questions.")

router = create_router_agent(
    name="triage",
    routes=[
        ("billing-agent",   "billing, invoice, payment, subscription, refund"),
        ("technical-agent", "bug, error, crash, not working, how to"),
    ],
    fallback="general-agent",
    strategy="hybrid",
)

runner = AgentRunner(agent_registry={
    "billing-agent": billing, "technical-agent": technical, "general-agent": general,
})
response = await runner.run(router, "My payment failed twice this week")

Planner Workflow

Decomposes a goal into sub-tasks and executes them, either with a single agent or by routing each step to a specialist from a pool.

from continuum.agent.workflow import PlannerAgent
from continuum.agent.config import PlanningConfig

# Agent-pool mode: LLM routes each step to a specialist
planner = PlannerAgent(
    name="research-planner",
    instructions="Decompose research goals into steps.",
    agents=[web_researcher, analyst, writer],
    planning_config=PlanningConfig(
        max_steps=8,
        enable_replanning=True,
    ),
)

Advanced patterns

DebateAgent

Two agents argue pro/con; a judge synthesizes a final verdict. Good for nuanced decisions.

ScatterAgent

LLM splits input into N slices; each agent processes its own slice in parallel; results merged.

SupervisedSequential

Like Sequential but an LLM quality gate checks each step's output before proceeding.

DAGAgent

Dependency-aware parallel execution. Steps with dependencies wait; independent steps run together.
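
Dependency-aware execution can be illustrated with the standard library alone. The sketch below is not DAGAgent's implementation: `run_dag` is a hypothetical helper that uses `graphlib.TopologicalSorter` to find the steps whose dependencies are done and runs each such batch concurrently with `asyncio.gather`. It schedules in waves, so a step waits for its whole wave to finish even if its own dependencies finished earlier.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_dag(steps, deps):
    """steps: name -> async callable(results); deps: name -> set of prerequisites."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    results = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # every step whose dependencies are done
        outputs = await asyncio.gather(*(steps[name](results) for name in ready))
        for name, output in zip(ready, outputs):
            results[name] = output
            sorter.done(name)
    return results


async def fetch(results):
    return "raw"


async def clean(results):
    return results["fetch"] + ">clean"


async def stats(results):
    return results["fetch"] + ">stats"


async def report(results):
    return f'{results["clean"]} | {results["stats"]}'


steps = {"fetch": fetch, "clean": clean, "stats": stats, "report": report}
deps = {"fetch": set(), "clean": {"fetch"}, "stats": {"fetch"}, "report": {"clean", "stats"}}
dag_results = asyncio.run(run_dag(steps, deps))
print(dag_results["report"])
```

Here clean and stats run together once fetch completes, and report waits for both.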

Handoffs

An agent can delegate to another agent mid-conversation. Handoffs appear to the LLM as callable tools.

from continuum.agent.handoff import Handoff

triage = BaseAgent(
    name="triage",
    instructions="Triage requests and hand off to specialists.",
    handoffs=[
        Handoff(
            target_agent="billing",
            description="Transfer to billing for payment questions.",
            return_to_parent=True,
        ),
        Handoff(
            target_agent="tech-support",
            description="Transfer for technical issues.",
        ),
    ],
)

Handoff History Modes

Control how much context is passed when handing off between agents.

Mode	What's passed	Best for
`FULL`	Complete conversation history	Short conversations, full context needed
`SUMMARY`	LLM-generated abstract of the conversation	Long conversations, context window limits
`RECENT_N`	Last N turns only	When only recent context matters
`HYBRID`	Summary of older messages + full recent N turns	Best of both; the recommended default

from continuum.agent.handoff import Handoff, HistorySummarizationMode

Handoff(
    target_agent="specialist",
    description="Escalate complex issues.",
    summarization_mode=HistorySummarizationMode.HYBRID,
    recent_turns=4,
)
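
As a rough picture of what HYBRID passes along, the sketch below splits a message list into an older part, which is summarized, and a recent part, which is kept verbatim. It is an illustration under stated assumptions: `hybrid_history` is a hypothetical helper, one turn is assumed to be a user message plus an assistant reply, and `summarize` stands in for the LLM call.

```python
def hybrid_history(messages, recent_turns, summarize):
    """Return [summary of older messages] + the last recent_turns turns verbatim."""
    keep = recent_turns * 2  # assumption: one turn = user message + assistant reply
    split = max(len(messages) - keep, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return recent  # nothing old enough to summarize
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary] + recent


# Six turns of toy history (12 messages)
history = []
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

passed = hybrid_history(history, recent_turns=4, summarize=lambda msgs: f"{len(msgs)} earlier messages")
print(len(passed), passed[0]["content"])
```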

Run Agents

AgentRunner is the execution engine: it orchestrates LLM calls, tool invocations, memory retrieval, and handoffs.

Single-agent example: playground/gateway-local-shop, one agent with MCP tools over HTTP.

Multi-agent example: playground/gateway-multi-agent-shop

AgentRunner

Create one AgentRunner instance per application. It manages internal service clients and can be shared across concurrent runs.

from continuum.agent.runner import AgentRunner
from continuum.agent.config import RunnerConfig

runner = AgentRunner(
    config=RunnerConfig(
        default_max_turns=20,
        parallel_tool_calls=True,
        max_parallel_tools=5,
        circuit_breaker_threshold=5,
    )
)

runner.run()

response = await runner.run(
    agent,
    "What is my account balance?",
    user_id="user-123",
    session_id="sess-456",  # optional, loads Redis history
    conversation_id="conv-789",
    max_turns=15,
    metadata={"channel": "web"},
    tags=["prod"],
)

print(response.content)         # main text output
print(response.status)          # SUCCESS, ERROR, MAX_TURNS_REACHED …
print(response.usage.total_tokens)
print(response.latency_ms)
print(response.trace_id)        # Langfuse trace link

Streaming

Use run_stream() to yield tokens and events as they occur. Ideal for WebSocket or SSE endpoints.

from continuum.agent.types import EventType

async for event in runner.run_stream(agent, "Explain quantum entanglement.", user_id="u-1"):
    if event.type == EventType.CONTENT_DELTA:
        print(event.data["content"], end="", flush=True)

    elif event.type == EventType.TOOL_CALL_START:
        print(f"\n[calling tool: {event.data['tool_name']}]")

    elif event.type == EventType.RUN_END:
        print(f"\nDone, {event.data['usage']['total_tokens']} tokens")

EventType Reference

EventType	data keys
`RUN_START`	`run_id`, `agent_name`
`CONTENT_DELTA`	`content` (text chunk)
`CONTENT_COMPLETE`	`content` (full response)
`TOOL_CALL_START`	`tool_name`, `arguments`
`TOOL_CALL_END`	`tool_name`, `result`
`TOOL_CALL_ERROR`	`tool_name`, `error`
`HANDOFF_START`	`from_agent`, `to_agent`, `reason`
`HANDOFF_END`	`to_agent`, `result`
`MEMORY_RETRIEVAL`	`query`, `results`
`RUN_END`	`status`, `usage`, `latency_ms`
`RUN_ERROR`	`error`

Session History

Pass a session_id to automatically load and save conversation history from Redis.

# Turn 1, user's first message
response1 = await runner.run(agent, "My name is Alice.", session_id="sess-1", user_id="u-1")

# Turn 2, agent remembers context from Redis
response2 = await runner.run(agent, "What's my name?", session_id="sess-1", user_id="u-1")
# → "Your name is Alice."

Control how many turns are loaded with AgentConfig.session_history_turns. Set log_to_session=False in AgentConfig to disable session writes for intermediate pipeline agents.

Creating sessions with get_or_create_session()

If you want session history to persist across requests, create a session before calling runner.run(). Passing a session_id that was never created will silently fail to save or load history.

# Step 1: create or retrieve the session
session_id = await session_client.get_or_create_session(
    session_id=session_id,          # pass existing ID to resume
    user_id="user-123",
    conversation_id="conv-456",   # optional, see below
)

# Step 2: run with that session_id
response = await runner.run(
    agent=agent,
    input="Hello!",
    session_id=session_id,
    user_id="user-123",
)

How session_id is computed

get_or_create_session() derives a deterministic key from the arguments you pass:

Arguments passed	Computed session_id
explicit `session_id`	used as-is
`conversation_id` + `user_id`	`c:{conversation_id}:u:{user_id}`
`user_id` only	`u:{user_id}`
neither	random UUID
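
The derivation in the table can be written as a few lines of Python. This is a sketch of the documented rules, not the library's code: `compute_session_id` is a hypothetical name, and the case the table does not cover (a `conversation_id` without a `user_id`) is assumed to fall through to a random UUID.

```python
import uuid


def compute_session_id(session_id=None, user_id=None, conversation_id=None):
    """Mirror the table above (hypothetical helper, not a Continuum API)."""
    if session_id:
        return session_id                            # explicit ID: used as-is
    if conversation_id and user_id:
        return f"c:{conversation_id}:u:{user_id}"    # one session per chat window
    if user_id:
        return f"u:{user_id}"                        # one session per user
    return str(uuid.uuid4())                         # nothing to key on


print(compute_session_id(session_id="sess-1"))
print(compute_session_id(user_id="user-123", conversation_id="conv-456"))
print(compute_session_id(user_id="user-123"))
```

Because the first three cases are deterministic, the same arguments always resolve to the same session, which is what lets a later request resume the history.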

When to use conversation_id

Use conversation_id when a single user can have multiple independent chat windows. Without it, all conversations for a user share one session (u:{user_id}). With it, each window gets its own isolated session (c:{conversation_id}:u:{user_id}).

Chat UI projects: generate a conversation_id on the backend when the user opens a new chat window; pass it back with each request.
Task-based or webhook projects: use your natural entity ID (ticket ID, invoice ID, job ID) as conversation_id. Never reuse IDs across unrelated tasks.

Multi-agent session saving pattern

Every workflow agent calls runner.run() one or more times internally. Without intervention, each sub-agent call auto-saves a turn, creating noisy intermediate history that the user never saw. Prevent this with suppress_session_log + save_turn():

async def execute(self, input_text, runner, context) -> AgentResponse:
    context.suppress_session_log = True  # blocks auto-save for ALL sub-agent runs

    response = await runner.run(
        agent=sub_agent,
        input=current_input,
        context=context,   # same context object passed every time
    )
    # ... more sub-agent calls ...

    # Save exactly one clean turn at the end
    await runner.save_turn(
        session_id=context.session_id,
        user_message=input_text,        # what user originally sent
        assistant_message=final_output, # what user actually sees
    )

Custom workflow agents: If you build a custom workflow by subclassing BaseAgent, you must follow this pattern yourself. Forgetting suppress_session_log = True will save every sub-agent turn to session history.

Deciding which agent's output is the final response

In a multi-agent workflow, you must explicitly decide which agent's output is the final response; that is what you pass to save_turn(). Two examples:

Sequential: agents run one after another, each passing output to the next. The last agent may produce the final response:

await runner.save_turn(session_id, user_input, last_agent_response.content)

Handoff / Router: sub-agents do work and their results are injected back into the top-level agent's message list. The top-level agent then synthesizes its own final response, so the top-level agent's output is what to save, not the sub-agents' intermediate results:

await runner.save_turn(session_id, user_input, top_level_agent_response.content)

Saving the wrong output: If you save an intermediate agent's output by mistake, session history will contain turns the user never saw, and future turns will load them as prior context.

Context Management

When a conversation approaches the model's context window, Continuum automatically compresses older messages.

from continuum import AgentConfig, ContextManagementConfig, CompressionStrategy

agent = BaseAgent(
    name="long-conv",
    instructions="...",
    config=AgentConfig(
        context_management=ContextManagementConfig(
            compression_strategy=CompressionStrategy.SMART,  # SMART / SUMMARIZE_OLD / TRUNCATE_OLDEST
        )
    ),
)

Tip: Set CONTEXT_COMPRESSION_THRESHOLD=0.8 (default) to trigger compression at 80% of the model's context window. CONTEXT_KEEP_RECENT_MESSAGES=10 ensures the last 10 messages are never truncated.
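
The two settings in the tip can be pictured with a little arithmetic. The sketch below is an illustration, not Continuum's compression code: `should_compress` and `split_for_compression` are hypothetical helpers, and token counts are passed in rather than measured. With a 128,000-token window and a 0.8 threshold, compression would start at about 102,400 tokens.

```python
def should_compress(used_tokens, context_window, threshold=0.8):
    """True once usage reaches the given fraction of the context window."""
    return used_tokens >= threshold * context_window


def split_for_compression(messages, keep_recent=10):
    """Return (compressible, protected): the last keep_recent messages stay untouched."""
    split = max(len(messages) - keep_recent, 0)
    return messages[:split], messages[split:]


messages = [f"message {i}" for i in range(25)]
compressible, protected = split_for_compression(messages, keep_recent=10)

print(should_compress(100_000, 128_000))   # below the threshold
print(should_compress(110_000, 128_000))   # above the threshold
print(len(compressible), len(protected))
```

Only the compressible part would be summarized or truncated, depending on the chosen strategy.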

Time-Travel (Decision Trace)

Continuum can record every decision in a run (each LLM call, tool call, handoff, and workflow step) as a structured, replayable ledger. Once recorded, you can rewind to any step, edit it, and re-execute only what's downstream; everything upstream replays from the saved checkpoint. Think of it as git rebase for an agent run.

Two halves: record (the trace, what happened) and what-if (fork, rewind, change one input, replay). The feature is off by default and costs one boolean check per turn when disabled.

Worked examples: playground/decision-trace-glassbox runs across all nine workflow patterns (+ handoff), so you can fork and compare each one.

Enable & configure

Disabled by default. Opt in with environment variables (or set the same fields on settings programmatically). Recording is independent of forkability: turn on CHECKPOINT only when you want to rewind, since it stores per-step message snapshots.

# .env, everything off unless you opt in
DECISION_TRACE_ENABLED=true
DECISION_TRACE_DETAIL=full         # off | full
DECISION_TRACE_STORE=redis         # redis | memory | null
DECISION_TRACE_CHECKPOINT=true     # per-step snapshots → enables fork/rewind
DECISION_TRACE_TTL_DAYS=14         # auto-expiry of persisted traces

…or flip the same switches programmatically before the first run, which is what the decision-trace-glassbox playground does:

from continuum.config import settings
settings.decision_trace_enabled = True
settings.decision_trace_checkpoint = True   # only if you want fork/rewind
settings.decision_trace_store = "redis"

Setting	Default	Controls
`DECISION_TRACE_ENABLED`	`false`	Master switch. When off, no recorder is created and capture is skipped entirely.
`DECISION_TRACE_DETAIL`	`full`	What to attach to the response (the full trace is always persisted): `off` = persist only, attach nothing; `full` = attach the complete trace.
`DECISION_TRACE_STORE`	`redis`	Where traces persist: `redis`, `memory`, or `null`.
`DECISION_TRACE_CHECKPOINT`	`false`	Store per-step message snapshots. Required for `fork()`.
`DECISION_TRACE_TTL_DAYS`	`14`	Redis TTL for persisted traces.

A note on storage cost. CHECKPOINT=true stores a full message snapshot at every step, so a persisted trace can be sizeable and Redis usage grows with run volume. That's the price of being able to rewind, and it's worth it when you need it. To keep it modest, leave CHECKPOINT off unless you actually fork, and rely on TTL_DAYS to auto-expire old traces. DETAIL only affects the trace returned on the response; the persisted trace is always full, so DETAIL isn't a storage lever. With the whole feature off (the default), nothing is stored at all.

Inspect a trace

When enabled, the trace is attached to the response. You can also reload any past run by id from the configured store.

response = await runner.run(agent, "Run the month-end close.")

# Attached to the response (when tracing is enabled)
trace = response.decision_trace        # dict: steps, metrics, final_response…

# …or reload any past run by id from the configured store
from continuum.agent.trace.config import get_trace_store
from continuum.agent.trace.types import TraceDetail

stored = await get_trace_store().get(response.run_id)
data = stored.to_dict(TraceDetail.FULL)   # steps with message checkpoints

Fork & rewind

runner.fork() resumes a past run at from_step: steps before it replay from the saved checkpoint (no LLM or tool calls), an optional override edits that step, and the loop re-executes forward. The parent run is never mutated; the fork is a new run that records its lineage.

# Rewind run X to a step, change one input, re-run only what's downstream
forked = await runner.fork(
    run_id,
    from_step="s11",                      # the step to resume at
    override={"set_tool_result": {          # "what if the tool had returned X?"
        "tool_call_id": "call_abc",
        "content": '{"materiality_threshold_usd": 1000000}',
    }},
    label="threshold $1M",
)
print(forked.content)                    # the new downstream outcome
print(forked.decision_trace["parent_run_id"])   # lineage back to the original

Requires checkpoints. fork() needs the parent run to have been recorded with DECISION_TRACE_CHECKPOINT=true and a persisting store (redis or memory). Without a checkpoint at from_step it raises a clear error.

The override applies a small, well-defined edit to the restored messages before the loop re-runs:

Override key	Type	Effect
`system`	str	Replace (or prepend) the system instruction, a "what if the policy were…" edit.
`set_tool_result`	{tool_call_id, content}	Override a recorded tool result, the "what if the tool had returned X?" knob.
`replace_last_user`	str	Replace the most recent user message content.
`append`	message dict	Append an extra message (e.g. one more instruction).
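
To make the four override keys concrete, here is a minimal sketch of how such an edit could be applied to a restored message list. It is not Continuum's implementation: apply_override is a hypothetical helper, the OpenAI-style message dicts are an assumption, and the sample snapshot values are invented for illustration.

```python
from copy import deepcopy


def apply_override(messages: list[dict], override: dict) -> list[dict]:
    """Return a copy of `messages` with one fork override applied (illustrative only)."""
    out = deepcopy(messages)  # leave the parent run's snapshot untouched

    if "system" in override:
        for msg in out:
            if msg["role"] == "system":
                msg["content"] = override["system"]  # replace the existing instruction
                break
        else:
            # No system message found, so prepend one instead.
            out.insert(0, {"role": "system", "content": override["system"]})

    if "set_tool_result" in override:
        edit = override["set_tool_result"]
        for msg in out:
            if msg["role"] == "tool" and msg.get("tool_call_id") == edit["tool_call_id"]:
                msg["content"] = edit["content"]

    if "replace_last_user" in override:
        for msg in reversed(out):
            if msg["role"] == "user":
                msg["content"] = override["replace_last_user"]
                break

    if "append" in override:
        out.append(override["append"])

    return out


# Invented sample snapshot: one user turn and one recorded tool result.
snapshot = [
    {"role": "user", "content": "Run the month-end close."},
    {"role": "tool", "tool_call_id": "call_abc",
     "content": '{"materiality_threshold_usd": 5000000}'},
]
edited = apply_override(snapshot, {"set_tool_result": {
    "tool_call_id": "call_abc",
    "content": '{"materiality_threshold_usd": 1000000}',
}})
```

Working on a deep copy mirrors the guarantee above that the parent run is never mutated.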

Forking works across multi-agent runs too: it resumes the agent that produced the step and restores the handoff stack. All nine workflow orchestrators, Sequential, Router, Loop, Reflection, Supervised, Planner, Parallel, Scatter, and Debate, implement the Forkable protocol and resume from the stage that owns the step. Pipeline patterns replay earlier stages from cache; concurrent patterns (Parallel, Scatter, Debate) re-run only the forked branch and replay sibling branches' outputs from the saved trace.

Branch & diff

Because upstream replays from cache, forking the same step at several values is cheap: run the forks concurrently and compare them. diff_traces() reports what changed between two runs.

import asyncio
from continuum.agent.trace import diff_traces
from continuum.agent.trace.config import get_trace_store

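# Branch the same step at three values, concurrently.
# edit(v) is not defined here; it stands for your own helper that builds
# the replacement user message for value v.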
forks = await asyncio.gather(*[
    runner.fork(run_id, "s11", override={"replace_last_user": edit(v)})
    for v in (5_000_000, 2_500_000, 1_000_000)
])

# Compare a fork against its parent
parent = await get_trace_store().get(run_id)
child  = await get_trace_store().get(forks[-1].run_id)
delta = diff_traces(parent, child)   # final_response before/after, step deltas

Storage backends

DECISION_TRACE_STORE chooses where a finished trace is saved so it can be reloaded, and forked, later. A Redis backend that can't connect falls back to null so persistence never breaks a run.

Backend	Where traces live	Survives restart / multi-process	Supports fork?
`redis` (default)	External Redis server	Yes / Yes, needs Redis running	Yes
`memory`	A dict inside the running process	No / No, zero setup	Yes, in the same process
`null`	Discarded (no-op)	Not persisted	No, nothing to reload

Decision Trace is an execution ledger the runtime reads back to resume runs, separate from Langfuse, the human-facing observability sink, which keeps running in parallel.

Limitations

Return-to-parent handoffs aren't forkable. Resuming the child can't reconstruct the parent's final answer, so forking a step inside one raises a clear error. Use return_to_parent=False for handoffs you intend to fork. However, you can rerun an upstream agent before the parent agent with return_to_parent=True.
Overrides depend on how a stage resumes. Workflow orchestrators re-run the resumed stage fresh and re-call its tools, so set_tool_result is ignored there; use replace_last_user or system to change a workflow stage's input. set_tool_result only takes effect where the fork replays the snapshot verbatim (single-agent and handoff resume).
Parallel can't rewind from its merge step. Forking a branch works in every concurrent pattern, but re-running only the final merge is supported by Scatter (gather-stage fork) and not yet by Parallel. Fork a branch instead, or use Scatter if you need to re-merge cached branch outputs.
Planner mid-plan forks need an embedded plan. Forking a mid-plan step re-uses the plan recorded in the parent trace. Traces recorded before plan-embedding existed can't be forked mid-plan; fork from the plan stage (stage 0) to re-plan and re-execute instead.
LLM nondeterminism. Re-executed steps are fresh model calls, so their wording can differ between runs even where your edit had no effect, and a diff may show cosmetic text changes. Keep outcome-determining logic in tools and use temperature=0 so verdicts and numbers stay stable; treat narrative diffs as informational.

App Lifecycle

Use OrchestratorLifecycle to initialise and cleanly shut down all shared services (Redis, Langfuse, vector store). Use Container to inject custom clients, useful in tests and multi-tenant setups.

from continuum.core import OrchestratorLifecycle, Container

lifecycle = OrchestratorLifecycle()
await lifecycle.startup()   # connects Redis, Langfuse, vector store

# Health checks
health = await lifecycle.health_check()
# → {"redis": "ok", "qdrant": "ok", "langfuse": "ok", "llm": "ok"}

await lifecycle.shutdown()  # flushes Langfuse, closes Redis connections

# Inject custom clients (e.g. in tests or multi-tenant setups)
from continuum.core import Container

container = Container()
container.set_llm_client(my_llm_client)
container.set_memory_client(my_memory_client)
container.set_session_client(my_session_client)

runner = AgentRunner(container=container)

FastAPI Server

from fastapi import FastAPI
from continuum.agent import BaseAgent
from continuum.agent.runner import AgentRunner

app = FastAPI()
runner = AgentRunner()
agent = BaseAgent(name="api-agent", instructions="You are an API assistant.")

@app.post("/chat")
async def chat(body: dict):
    response = await runner.run(
        agent,
        body["message"],
        user_id=body["user_id"],
        session_id=body.get("session_id"),
    )
    return {"reply": response.content, "session_id": body.get("session_id")}

Temporal Workers

from continuum.temporal import WorkerManager, AgentRegistry

registry = AgentRegistry()
registry.register(my_agent)

worker_manager = WorkerManager(agent_registry=registry)
await worker_manager.start_worker()  # connects to TEMPORAL_HOST

Input / Output Scanning

Attach scanner callables to AgentConfig to detect prompt injection, PII, or unsafe content before/after the LLM call.

import re

def pii_scanner(text: str) -> str:
    # Replace anything shaped like an email address with [REDACTED]
    return re.sub(r'\S+@\S+', '[REDACTED]', text)

agent = BaseAgent(
    name="safe-agent",
    instructions="...",
    config=AgentConfig(
        input_scanners=[pii_scanner],
        output_scanners=[pii_scanner],
        injection_detection=True,
    ),
)
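
Because a scanner is just a str → str callable, several of them compose naturally. The sketch below shows one way that composition might look. How Continuum invokes the list internally is not documented here, so run_scanners, redact_emails, and redact_card_numbers are hypothetical names, and the patterns are deliberately simple rather than production-grade PII detection.

```python
import re
from typing import Callable

Scanner = Callable[[str], str]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact_emails(text: str) -> str:
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def redact_card_numbers(text: str) -> str:
    return CARD.sub("[REDACTED_CARD]", text)


def run_scanners(text: str, scanners: list[Scanner]) -> str:
    """Apply each scanner in list order, feeding the output of one into the next."""
    for scan in scanners:
        text = scan(text)
    return text


cleaned = run_scanners(
    "Mail alice@example.com, card 4111 1111 1111 1111.",
    [redact_emails, redact_card_numbers],
)
```

Keeping each scanner single-purpose makes it easy to reuse the same callables for both input_scanners and output_scanners.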

Testing Guide

Continuum favors integration tests over mocks. The [dev] extra ships fakeredis, respx, and pytest-asyncio.

# conftest.py
import pytest
import fakeredis.aioredis as fakeredis
from continuum.core import Container
from continuum.session import SessionClient

@pytest.fixture
async def container():
    c = Container()
    c.set_session_client(SessionClient(redis=fakeredis.FakeRedis()))
    return c

# test_agent.py
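import pytest
from continuum.agent import BaseAgent
from continuum.agent.runner import AgentRunner

# Agent under test; real_llm_client is a fixture you provide in conftest.py
agent = BaseAgent(name="test-agent", instructions="You are a helpful assistant.")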

@pytest.mark.asyncio
async def test_basic_response(container, real_llm_client):
    container.set_llm_client(real_llm_client)
    runner = AgentRunner(container=container)
    response = await runner.run(agent, "Say hello.", user_id="test-user")
    assert response.status.value == "success"
    assert len(response.content) > 0

Components

Continuum's building blocks: tools, memory, sessions, and durable workflows.

MCP Servers

Every tool is exposed via the Model Context Protocol. Three transport types are supported:

from continuum.tools import MCPServerStdio, MCPServerSse, MCPServerStreamableHttp

# Spawn a subprocess (local Python script or shell command)
fs_server = MCPServerStdio(command="python", args=["-m", "mcp_filesystem"])

# Server-Sent Events (remote server)
sse_server = MCPServerSse(url="https://tools.example.com/mcp/sse")

# StreamableHTTP (recommended for production)
http_server = MCPServerStreamableHttp(url="https://tools.example.com/mcp")

agent = BaseAgent(
    name="tool-agent",
    instructions="You have access to the filesystem.",
    mcp_servers=[fs_server],
)

Passing tools explicitly with MCPUtil

If you need the tool definitions as Python objects (e.g. to inspect, filter, or pass them manually), use MCPUtil.get_function_tools():

from continuum.tools import MCPUtil, ToolExecutor

# Get tool definitions from a connected server
tool_defs = await MCPUtil.get_function_tools(server)
tools = [t.model_dump() for t in tool_defs]

executor = ToolExecutor({server: None})   # None = expose all tools
await executor.initialize()

agent = BaseAgent(
    name="agent",
    instructions="...",
    tools=tools,
    tool_executor=executor,
)

Tool Filtering

When you have many MCP tools, semantic tool filtering sends only the relevant subset to the LLM each turn, reducing token cost and noise.

from continuum.agent.config import AgentConfig, ToolAttentionConfig

agent = BaseAgent(
    name="commerce-agent",
    instructions="...",
    mcp_servers=[shop_server],    # 50+ tools
    config=AgentConfig(
        tool_attention=ToolAttentionConfig(
            enabled=True,
            max_tools=10,        # send at most 10 tools per turn
        )
    ),
)

Tool Context Injection

Some tools return a value (e.g. session_id, cart_token) that subsequent tool calls need as input. ToolContextState captures these values automatically and injects them into later calls, with no changes to the agent prompt.

from continuum.tools import MCPServerStreamableHttp

# Capture session_id from the login tool result, inject into every subsequent call
shop_server = MCPServerStreamableHttp(
    url="https://shop.example.com/mcp",
    tool_context={
        "capture": {"login": "session_id"},   # tool name → result field to capture
        "inject":  {"session_id": "session_id"}, # param name → captured key
    },
)

Run Artifacts

MCP tool responses can contain rich structured data (widgets, tables, charts). Access them via response.run_artifacts.

response = await runner.run(agent, "Show me the product catalog.")

if response.run_artifacts:
    for artifact_id, artifact in response.run_artifacts.items():
        widget_meta = artifact.get("meta")       # widget template
        structured  = artifact.get("structured_content")
        text        = artifact.get("text_content")

Long-term Memory

Continuum uses mem0 + Milvus (default) or Qdrant for persistent semantic memory. Facts are automatically extracted from conversations and stored as embeddings.

from continuum.agent.config import AgentMemoryConfig
from continuum.memory.scopes import MemoryScope

agent = BaseAgent(
    name="memory-agent",
    instructions="Remember user preferences.",
    memory_config=AgentMemoryConfig(
        search_memories=True,
        search_scope=MemoryScope.USER,
        search_limit=5,
        store_memories=True,
        store_scope=MemoryScope.USER,
        broadcast_learnings=True,  # share useful facts with other agents
    ),
)

Controlling what gets stored

Use extraction_prompt to tell mem0 exactly which facts to extract. Use pre_store_filter to remove PII or irrelevant facts after they are stored.

# extraction_prompt, override mem0's default extraction logic
AgentMemoryConfig(
    store_memories=True,
    extraction_prompt=(
        "Only extract long-term facts about the user's pets, animal preferences, "
        "and dietary needs. Do NOT store transient actions like adding to cart or searches."
    ),
)

# pre_store_filter, runs after storage; facts not returned are deleted
def remove_pii(facts: list[str]) -> list[str]:
    # Case-insensitive match so "Credit Card" is caught as well
    return [f for f in facts if "credit card" not in f.lower()]

AgentMemoryConfig(store_memories=True, pre_store_filter=remove_pii)

Memory management API

from continuum.core.container import get_container

memory_client = get_container().memory_client

# View all memories for a user
memories = await memory_client.get_all(user_id="user-123")
for m in memories:
    print(m.memory, m.id)

# Search by query
results = await memory_client.search("pet preferences", user_id="user-123")

# Delete a specific memory
await memory_client.delete(memory_id="abc-123")

# Delete all memories for a user (GDPR right-to-forget)
await memory_client.delete_all(user_id="user-123")

Privacy tip: Consider exposing memory management in your frontend so users can view and delete what the AI remembers about them. This matters for GDPR compliance, user trust, and the overall experience.

Memory Scopes

Scope	Isolation	Use case
`USER`	Per user_id	User preferences, personal context
`AGENT`	Per agent name	Agent-specific domain knowledge
`SHARED`	Cross-user, cross-agent	Global facts, product knowledge
`CONVERSATION`	Per conversation_id	Ephemeral conversation context

IntelligentMemoryClient

A drop-in replacement for MemoryClient that adds importance scoring, time-based decay, entity extraction, and user profiles. Low-relevance or stale memories are down-weighted before being injected into the prompt.

from continuum.memory import IntelligentMemoryClient, IntelligenceConfig

memory = IntelligentMemoryClient(
    intelligence_config=IntelligenceConfig(
        enable_scoring=True,       # LLM scores each memory at store time
        enable_decay=True,         # recent memories get a relevance boost
        prune_threshold=0.15,      # delete memories below this score
    )
)

# Accepts strings, list of strings, or message dicts
await memory.add(
    [{"role": "user", "content": "I prefer dark mode."}],
    user_id="user-123",
)

results = await memory.search("user preferences", user_id="user-123")

add() accepts three forms: a plain string, a list of strings, or a list of {"role", "content"} message dicts. The messages-style form is recommended because it gives mem0 more context for fact extraction.

Sessions (Redis)

Short-term conversation history is stored in Redis. AgentRunner handles loading and saving automatically; you only use SessionClient directly when you need to manage sessions outside a run (e.g. building a chat history UI, clearing history, debugging).

from continuum.session import SessionClient

session = SessionClient()

# Create or resume a session
session_id = await session.get_or_create_session(user_id="user-123")

# Read history (e.g. to display in a chat UI)
messages = await session.get_conversation_history(session_id)

# Clear messages but keep the session
await session.clear_session(session_id)

# Delete the session entirely
await session.delete_session(session_id)

Temporal Integration

Build durable workflows that survive process restarts, with automatic retries and audit trails.

from continuum.temporal import TemporalClient
from continuum.temporal.workflows import AgentWorkflow
from continuum.temporal.types import AgentStep, ApprovalStep, ParallelStep

steps = [
    AgentStep(agent_name="researcher", input="Analyze market trends"),
    ApprovalStep(
        description="Review analysis before proceeding",
        approvers=["manager@acme.com"],
        timeout=86400,  # 24 hours
    ),
    AgentStep(agent_name="writer"),  # receives researcher output
]

client = TemporalClient()
handle = await client.execute_workflow(AgentWorkflow, {"steps": steps})

Step Types

Step	Purpose
`AgentStep`	Run a registered agent
`ApprovalStep`	Pause for human approval (email notification)
`ParallelStep`	Run multiple agents concurrently
`ConditionalStep`	Branch based on a condition agent's output
`WaitStep`	Delay execution (1 second to 7 days)

Loop Workflow (Temporal)

For iterative agentic work that must survive restarts, use LoopAgentWorkflow. The loop runs on Temporal and persists state between iterations.

from continuum.temporal.workflows import LoopAgentWorkflow

handle = await client.execute_workflow(
    LoopAgentWorkflow,
    {
        "agent_name": "refinement-agent",
        "initial_input": "Draft a product description.",
        "max_iterations": 5,
        "termination_condition": "output_match",
        "termination_pattern": "APPROVED",
    },
)

Human-in-the-Loop

Workflows pause at ApprovalStep and send a notification to approvers. The workflow resumes only when approved, or auto-approves after timeout.

ApprovalStep(
    description="Approve the generated email before sending to customers",
    approvers=["alice@company.com", "bob@company.com"],
    timeout=3600,           # 1 hour, then auto-expires
    auto_approve_if="low_risk",  # skip approval if condition agent returns this
)

Integrations

Continuum calls LLM providers directly via their official SDKs.

Provider Routing

Continuum routes to providers automatically based on the model string prefix; no configuration is needed.

Model name prefix	Provider	Examples
`claude-…` or `anthropic/…`	Anthropic	`claude-sonnet-4-5`, `claude-opus-4-5`
`gemini/…`	Google Gemini	`gemini/gemini-2.0-flash`, `gemini/gemini-1.5-pro`
anything else	OpenAI	`gpt-4o`, `gpt-4o-mini`, `gpt-5`, `o3-mini`
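
The prefix rules in the table can be expressed as a small lookup. This is an illustrative sketch only: resolve_provider is a hypothetical function name, not part of Continuum's API, and the returned provider labels are placeholders.

```python
def resolve_provider(model: str) -> str:
    """Map a model string to a provider label using the prefix rules in the table above."""
    if model.startswith(("claude-", "anthropic/")):
        return "anthropic"
    if model.startswith("gemini/"):
        return "gemini"
    # Everything else goes to OpenAI: gpt-*, o3-mini, and so on.
    return "openai"
```

Checking the specific prefixes first and defaulting last matches the "anything else" row.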

OpenAI

Default provider. Any model string without a claude-, anthropic/, or gemini/ prefix routes here, for example gpt-* or o3-mini.

agent = BaseAgent(name="gpt-agent", instructions="...", model="gpt-4o")

Model	Context	Notes
`gpt-5`	400k	Latest, highest capability
`gpt-4o`	128k	Strong overall quality
`gpt-4o-mini`	128k	Default model, fast & cheap
`o3-mini`	200k	Reasoning model

Required env: OPENAI_API_KEY, also used by mem0's default embedder.

Anthropic

Route by using any claude-* or anthropic/... prefixed model string.

agent = BaseAgent(name="claude-agent", instructions="...", model="claude-sonnet-4-5")

Model	Notes
`claude-sonnet-4-5`	Recommended, best balance
`claude-opus-4-5`	Highest capability
`claude-haiku-4-5`	Fastest & cheapest

Required env: ANTHROPIC_API_KEY

Google Gemini

Route by using a gemini/-prefixed model string.

agent = BaseAgent(name="gemini-agent", instructions="...", model="gemini/gemini-2.5-flash")

Model	Notes
`gemini/gemini-2.5-flash`	Fast, cost-effective
`gemini/gemini-2.0-pro`	Best Gemini quality
`gemini/gemini-1.5-pro`	1M context window

Required env: GEMINI_API_KEY

Azure OpenAI

from continuum.agent import BaseAgent
from continuum.llm import LLMConfig
import os

config = LLMConfig(
    model="azure/gpt-4o",
    api_key=os.environ["AZURE_API_KEY"],
    api_base=os.environ["AZURE_API_BASE"],
    api_version=os.environ["AZURE_API_VERSION"],
)
agent = BaseAgent(name="azure-agent", instructions="...", llm_config=config)

The azure/ prefix routes to OpenAI's SDK with your Azure endpoint. Required env: AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION.

Automatic Fallback

When LLM_ENABLE_FALLBACK=true (default), any provider error transparently retries on FALLBACK_LLM_MODEL. Set it to a different provider to get cross-provider resilience:

# .env, primary OpenAI, fallback to Gemini
DEFAULT_LLM_MODEL=gpt-4o
FALLBACK_LLM_MODEL=gemini/gemini-1.5-flash
LLM_ENABLE_FALLBACK=true
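
The retry-on-fallback behaviour can be pictured with a short sketch. Continuum does this for you when the env vars above are set, so complete_with_fallback and fake_call are hypothetical names used only to illustrate the control flow, and the failing primary is simulated.

```python
import asyncio
from typing import Awaitable, Callable

CallModel = Callable[[str, str], Awaitable[str]]  # (model, prompt) -> completion text


async def complete_with_fallback(
    call_model: CallModel,
    prompt: str,
    primary: str,
    fallback: str,
    enable_fallback: bool = True,
) -> str:
    """Try the primary model; on any error, retry once on the fallback model."""
    try:
        return await call_model(primary, prompt)
    except Exception:
        if not enable_fallback:
            raise
        return await call_model(fallback, prompt)


async def fake_call(model: str, prompt: str) -> str:
    # Stand-in provider call: the primary always fails, to exercise the fallback path.
    if model == "gpt-4o":
        raise RuntimeError("simulated provider outage")
    return f"{model}: ok"


result = asyncio.run(
    complete_with_fallback(fake_call, "hi", "gpt-4o", "gemini/gemini-1.5-flash")
)
```

Pointing the fallback at a different provider, as in the .env above, is what turns a provider-wide outage into a successful retry.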

Langfuse Tracing

All agent runs, LLM calls, tool invocations, and memory operations are automatically traced to Langfuse. No code changes required.

# docker-compose.yml ships Langfuse at http://localhost:3000
# .env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=http://localhost:3000

Each response.trace_id is a direct Langfuse trace link for debugging.

Tracing Decorators

Three decorators create Langfuse spans for custom functions, all imported from continuum.observability.

@observe, generic span

Add custom spans to any async function:

from continuum.observability import observe

@observe("preprocess_input")
async def preprocess(text: str) -> str:
    return text.strip().lower()

@trace_tool, for tool / API functions

from continuum.observability import trace_tool

@trace_tool("search_products")
async def search_products(query: str):
    # creates a tool-type span with input/output captured
    return await db.search(query)

@trace_agent, for custom agent wrappers

from continuum.observability import trace_agent

@trace_agent("my-specialist")
async def run_specialist(input: str):
    # span tagged as agent-type in Langfuse
    ...

Decorator	Langfuse span type	Best for
`@observe`	span	Generic business logic
`@trace_tool`	tool	Tool / external API call functions
`@trace_agent`	agent	Custom agent wrapper functions

Qdrant

# .env
VECTOR_STORE_PROVIDER=qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=orchestrator_memories

For Qdrant Cloud, also set QDRANT_API_KEY.

Milvus (default)

# .env (default, no change needed if using docker compose)
VECTOR_STORE_PROVIDER=milvus
MILVUS_HOST=localhost
MILVUS_PORT=19530

For Zilliz Cloud, set MILVUS_TOKEN and point MILVUS_HOST to your cloud endpoint.

Continuum Suite · Proprietary

Aura

The Smart Inference layer. A cost-aware, classifier-driven router that picks the optimal model per prompt, so your agents hit one OpenAI-compatible endpoint while Aura dispatches across 250+ models on 45+ providers, with per-1M-token pricing, a budget ledger, and dynamic output caps in the request path.

OpenAI-compatible · Sub-millisecond overhead · Classifier · Router · Budget · 250+ models

Proprietary. Aura is a proprietary Continuum product. The reference below shows how agents wire into the gateway. For an endpoint, a virtual key, and access, reach out to the Shyftlabs team.


TL;DR. Point Continuum at the gateway with SMART_GATEWAY_URL, set model="auto" on the agent, and the gateway picks the cheapest model that meets the prompt's quality threshold. Switch tiers per agent with gateway_mode="strict" | "modest" | "quality".

How it works

Every POST /v1/chat/completions request flows through a fixed middleware pipeline. Each stage is independently observable and skipped cleanly when its feature flag is off.

# Request lifecycle (Continuum agent → Aura gateway → provider)

inbound request
  ─► requireValidKey            # bearer → virtualKey lookup, fail-closed 401
  ─► requestValidator           # content-type, custom-host checks
  ─► hooks (pre)                # plugin pre-hooks (PII, prompt-shield…)
  ─► memoryCache                # semantic cache (Redis, opt-in)
  ─► classifier                 # stamp complexity + domain
  ─► budget                     # pre-flight cost + reservation
  ─► router                     # model namespace → candidate list → top pick
  ─► provider handler           # native API call (OpenAI / Anthropic / Google / …)
  ─► response transform         # normalise to OpenAI shape
  ─► hooks (post)               # plugin post-hooks
  ─► observability              # flush trace to Langfuse
client ◄┘

The classifier and the router consume two signals: the model namespace from the request body, and the metadata block (session_id, trace_id, optional complexity / domain overrides).

Routing modes

Three preset modes resolve to three quality tiers. The mode is picked per-agent via gateway_mode and per-request by the model field (auto/cheap, auto/mid, auto/quality).

Mode	Tier	Optimises for	Typical pick
`strict`	cheap	lowest cost	smallest model that clears the capability gate (gpt-4o-mini / gemini-flash / haiku class)
`modest`	mid	quality / cost balance	mid-tier model with the best quality-to-cost ratio (claude-sonnet / gpt-4o class)
`quality`	quality	highest quality	top-tier candidates available in the registry

Capped by the registry. The router can only pick from src/services/router/registry.json. Models not in the registry are unreachable via auto; pin them explicitly with <provider>/<model_id> if needed.

Wire Continuum to the Gateway

Continuum routes all LLM calls through Aura when SMART_GATEWAY_URL is set. There is nothing else to import or subclass; GatewayProvider automatically replaces the per-provider clients.

Environment variables

# Continuum side, .env
SMART_GATEWAY_URL=https://continuum.shyftops.io/v1     # gateway base URL
SMART_GATEWAY_API_KEY=your-smart-gateway-api-key       # bearer (matches a virtual key)
SMART_GATEWAY_DEFAULT_MODE=modest              # strict | modest | quality

Virtual key (bearer)

The gateway authenticates the client by bearer and looks up the upstream provider key, budget, and allowed-models from conf.integrations[].

// Smart Inference side, conf.json (excerpt)
{
  "integrations": [
    {
      "provider": "anthropic",
      "slug": "dev_team_anthropic",
      "bearer_token": "your-smart-gateway-api-key",
      "credentials": { "api_key_env": "ANTHROPIC_API_KEY" },
      "budget_usd": 100,
      "allowed_models": [
        "auto", "auto/cheap", "auto/mid", "auto/quality",
        "openai/gpt-4o-mini",
        "anthropic/claude-haiku-4-5-20251001",
        "anthropic/claude-opus-4-7",
        "google/gemini-2.0-flash"
      ]
    }
  ]
}

Process env vs .env file. npm run start:node does not dotenv-load the gateway. Either set -a; source .env; set +a before starting, or pin secrets in docker-compose.yaml's environment: block. A virtual key whose api_key_env resolves to undefined is silently dropped from the index, and every request to it returns 401.

Model namespace

The model field in the request body is parsed by modelResolver.ts. Five grammars are supported:

Form	Meaning
auto	gateway-wide auto-routing, default tier
auto/<tier>	gateway-wide, specific tier (cheap/mid/quality)
<provider>/auto	provider-scoped auto-routing, default tier
<provider>/auto/<tier>	provider-scoped, specific tier
<provider>/<model_id>	explicit pin, bypasses model selection

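The five forms in the table above can be told apart with a few string checks. The sketch below is a Python illustration of that grammar, not a port of modelResolver.ts: ModelIntent and parse_model are hypothetical names, and rejecting a bare model id with no provider is an assumption made only because the table lists no such form.

```python
from dataclasses import dataclass
from typing import Optional

TIERS = {"cheap", "mid", "quality"}


@dataclass(frozen=True)
class ModelIntent:
    provider: Optional[str] = None   # None means gateway-wide
    tier: Optional[str] = None       # None means the default tier
    model_id: Optional[str] = None   # set only for an explicit pin


def parse_model(model: str) -> ModelIntent:
    parts = model.split("/")
    if not all(parts):
        raise ValueError(f"empty segment in model string: {model!r}")

    # Strip an optional provider scope in front of "auto".
    provider = None
    if len(parts) >= 2 and parts[0] != "auto" and parts[1] == "auto":
        provider, parts = parts[0], parts[1:]

    if parts[0] == "auto":
        if len(parts) == 1:
            return ModelIntent(provider=provider)
        if len(parts) == 2 and parts[1] in TIERS:
            return ModelIntent(provider=provider, tier=parts[1])
        raise ValueError(f"unknown tier in model string: {model!r}")

    if len(parts) >= 2:
        # Explicit pin: everything after the provider is the model id.
        return ModelIntent(provider=parts[0], model_id="/".join(parts[1:]))

    raise ValueError(f"unrecognised model string: {model!r}")
```

For example, parse_model("anthropic/auto/quality") yields a provider-scoped intent on the quality tier, while parse_model("openai/gpt-4o-mini") yields an explicit pin with no tier.
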
Agent code

From the agent's perspective, nothing changes. Set model to a routing intent and optionally pick a gateway_mode.

from continuum.agent import BaseAgent, AgentRunner

# Auto-routing, gateway picks the model per turn.
agent = BaseAgent(
    name="shop-assistant",
    instructions="You are a friendly pet shop assistant.",
    model="auto",
    gateway_mode="modest",    # "strict" | "modest" | "quality"
)

response = await AgentRunner().run(agent, "Show me dog leashes", user_id="alice")
print(response.content)

For multi-agent flows, mix and match modes per role:

# Triage routes fast; specialist reasons carefully.
triage    = BaseAgent(name="triage", model="auto", gateway_mode="strict")
specialist = BaseAgent(name="specialist", model="auto", gateway_mode="quality")

Classifier output

When smart_inference.classifier.enabled = true (or by default in conf.defaults.json), every prompt is tagged with complexity and domain. The router uses these to filter candidates before mode/tier ranking.

Field	Values	Source
`complexity`	`simple` · `medium` · `complex`	classifier LLM (gpt-4o-mini by default) with rule fallback
`domain`	`general` · `code` · `health` · `math` · `analysis` · `reasoning` · `finance`	classifier LLM

You can override either via metadata:

{
  "model": "auto",
  "messages": [{"role": "user", "content": "…"}],
  "metadata": {
    "session_id": "s-001",
    "complexity": "complex",
    "domain": "code"
  }
}

Response headers

The gateway echoes its routing decision on every response. Useful for tracing and dashboards.

Header	Type	Example
x-aura-router-mode	string	`modest`
x-aura-router-complexity	string	`medium`
x-aura-router-domain	string	`general`
x-aura-router-picker	string	`category` · `pareto`
x-aura-router-pool-size-before	number	`20`
x-aura-router-pool-size-after	number	`14`
x-aura-router-attempts	list	`claude-sonnet-4-6@anthropic:200`
x-aura-router-handover	string	`none` · `injected`
x-aura-budget-state	string	`ok` · `warn`
x-aura-cache-status	string	`HIT` · `MISS` · `DISABLED`
x-aura-trace-id	uuid	`3a7f1b9e…`
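
For dashboards it can help to turn these headers into a structured record. The sketch below is one way to do that under stated assumptions: the attempts value is read as `<model>@<provider>:<status>` based on the single example in the table, a comma separator between multiple attempts is a guess, and parse_attempts and routing_summary are hypothetical helper names.

```python
def parse_attempts(value: str) -> list[dict]:
    """Split an x-aura-router-attempts value into model / provider / status entries.

    Assumes `<model>@<provider>:<status>` per entry and comma-separated entries.
    """
    attempts = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        target, _, status = entry.rpartition(":")
        model, _, provider = target.partition("@")
        attempts.append({"model": model, "provider": provider, "status": int(status)})
    return attempts


def routing_summary(headers: dict[str, str]) -> dict:
    """Collect the routing decision from response headers (names are case-insensitive)."""
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "mode": h.get("x-aura-router-mode"),
        "complexity": h.get("x-aura-router-complexity"),
        "domain": h.get("x-aura-router-domain"),
        "attempts": parse_attempts(h.get("x-aura-router-attempts", "")),
        "cache_hit": h.get("x-aura-cache-status") == "HIT",
    }


summary = routing_summary({
    "X-Aura-Router-Mode": "modest",
    "X-Aura-Router-Complexity": "medium",
    "X-Aura-Router-Domain": "general",
    "X-Aura-Router-Attempts": "claude-sonnet-4-6@anthropic:200",
    "X-Aura-Cache-Status": "MISS",
})
```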

Tips & gotchas

Handover: when the router picks a different model between turns in the same session_id, the gateway injects a one-line system note so the new model keeps prior context. Always pass metadata.session_id for multi-turn agents.
Cooldown: a 429 or 503 from a candidate temporarily removes it from the pool (default 15s, honoring upstream Retry-After up to 5 min).
Semantic cache: near-duplicate prompts with cosine similarity ≥ 0.95 count as hits. Streams and tool calls bypass the cache by design. Enable via conf.cache.semantic.enabled = true.
Streaming tool calls: current versions of Aura forward provider chunks 1:1. If you stream and use tools, aggregate tool_call argument fragments client-side until finish_reason="tool_calls".
Budgets: gated by budgets.enabled. When on, requests are pre-checked against the bearer's budget_usd and the integration's session/project caps.
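
For the streaming tool-call tip, a client-side aggregator might look like the sketch below. It assumes chunks arrive as plain dicts in the OpenAI chat-completions streaming shape (choices[0].delta.tool_calls with an index per call); collect_tool_calls is a hypothetical helper and the sample chunks are invented.

```python
import json


def collect_tool_calls(chunks) -> list[dict]:
    """Merge streamed tool_call fragments, keyed by index, until finish_reason == "tool_calls"."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        choice = chunk["choices"][0]
        for frag in choice.get("delta", {}).get("tool_calls") or []:
            call = calls.setdefault(frag["index"], {"id": None, "name": None, "arguments": ""})
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]
            call["arguments"] += fn.get("arguments") or ""  # JSON arrives in pieces
        if choice.get("finish_reason") == "tool_calls":
            break
    return [calls[i] for i in sorted(calls)]


# Invented sample stream: one tool call whose JSON arguments are split across two chunks.
chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "search_products", "arguments": '{"que'}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": 'ry": "leash"}'}}]},
        "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
calls = collect_tool_calls(chunks)
args = json.loads(calls[0]["arguments"])  # parse only once the call is complete
```

Parsing the arguments only after the final chunk avoids JSON errors on partial fragments.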

Continuum Research

The mechanisms behind reliable agents at scale: what makes Continuum more than a thin LLM wrapper. Each topic links a runtime concern to the module that implements it, with citations to the source files.

Internals · Design notes · Production lessons

Design philosophy

Continuum is a runtime, not a framework. Four principles guide every module:

Code-first, no YAML. Agents are Python dataclasses. The compiler is your friend.
Async-native. Every I/O hop is non-blocking: LLM, MCP tools, Redis, vector store, Langfuse.
Protocol-based abstractions. Replace any layer (LLM, memory, session, observability) by swapping the protocol implementation. No deep inheritance trees.
Trace everything. Auto-tracing isn't a feature, it's a constraint. If a behaviour can't be traced, it shouldn't ship.

Tool Attention

An agent with 50+ MCP tools dilutes the LLM's function-calling accuracy and burns tokens on schemas it never uses. Tool Attention is a top-k semantic promotion mechanism: every turn, only the most relevant tools are sent to the LLM.

Field	Type	Default	Effect
k	int	3	How many tools to promote per turn
min_tools	int	5	Skip attention when the agent has fewer tools than this
NEED_TOOL fallback	enum	auto	If the LLM signals a missing tool, expand the candidate set and retry

Optional dependency. Tool Attention requires sentence-transformers, which is not installed by default. Install it with pip install -e ".[embeddings]" or pip install sentence-transformers. If it is not installed, Tool Attention is silently disabled and all tools are sent to the LLM instead.

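from continuum import AgentConfig, BaseAgent
from continuum.tools.tool_attention.config import ToolAttentionConfig

# large_tool_set: the agent's tool definitions, defined elsewhere (30+ tools)
agent = BaseAgent(
    name="ops",
    instructions="…",
    tools=large_tool_set,
    config=AgentConfig(
        # promote the 5 most relevant tools per turn; skip attention below 10 tools
        tool_attention=ToolAttentionConfig(k=5, min_tools=10),
    ),
)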

Side effect. Tool Attention reduces prompt tokens 30–60% on tool-heavy agents, often paying for itself in latency before quality even enters the equation.
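
To make the top-k promotion concrete, here is a minimal sketch of the idea. It is not Continuum's implementation: the bag-of-words embed function is a toy stand-in for a sentence-transformers embedding, and promote_tools plus the sample tool descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def promote_tools(query: str, tools: dict[str, str], k: int = 3, min_tools: int = 5) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    if len(tools) < min_tools:
        return list(tools)  # too few tools to be worth ranking: send them all
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:k]


# Hypothetical tool catalogue: name -> description.
tools = {
    "search_products": "search the product catalog by keyword",
    "add_to_cart": "add a product to the shopping cart",
    "view_cart": "show the items in the shopping cart",
    "checkout": "place the order and pay for the cart",
    "get_weather": "get the weather forecast for a city",
    "send_email": "send an email message to a contact",
}
selected = promote_tools("what is in my shopping cart", tools, k=2)
# selected == ["view_cart", "add_to_cart"]
```

Only the schemas of the selected tools would be sent to the LLM for that turn, which is where the prompt-token savings come from.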

Context compression

When the running message array approaches the model's context window, Continuum compresses older turns into a summary while preserving the most recent N exchanges verbatim.

Env	Default	Meaning
CONTEXT_MANAGEMENT_ENABLED	true	Master switch
CONTEXT_COMPRESSION_THRESHOLD	0.8	Trigger at 80% of model's context window
CONTEXT_KEEP_RECENT_MESSAGES	10	Recent turns kept verbatim

Compression runs synchronously between LLM calls. The summary is stored as a system message; the original turns are dropped from the message array but kept in session history (Redis) for audit.
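
The trigger-and-split behaviour described above can be sketched like this. It is a simplified illustration under stated assumptions: tokens are approximated by whitespace-separated words, the summarizer is passed in as a callable, and maybe_compress is a hypothetical name rather than a Continuum API.

```python
from typing import Callable

Message = dict[str, str]


def count_tokens(messages: list[Message]) -> int:
    """Crude estimate: whitespace-separated words (stand-in for a real tokenizer)."""
    return sum(len(m["content"].split()) for m in messages)


def maybe_compress(
    messages: list[Message],
    context_window: int,
    summarize: Callable[[list[Message]], str],
    threshold: float = 0.8,   # mirrors CONTEXT_COMPRESSION_THRESHOLD
    keep_recent: int = 10,    # mirrors CONTEXT_KEEP_RECENT_MESSAGES
) -> list[Message]:
    """Replace older turns with one system summary once the threshold is reached."""
    if count_tokens(messages) < threshold * context_window:
        return messages  # still comfortably inside the window
    if len(messages) <= keep_recent:
        return messages  # nothing older than the protected tail
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": "Summary of earlier turns: " + summarize(older)}
    return [summary, *recent]


# 12 messages of 5 words each = 60 "tokens"; 80% of a 70-token window is 56.
history = [{"role": "user", "content": "one two three four five"} for _ in range(12)]
compressed = maybe_compress(
    history,
    context_window=70,
    summarize=lambda older: f"{len(older)} earlier messages",  # stub summarizer
)
```

In a real run the summarizer would be an LLM call, and the dropped turns would remain available in session history for audit.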

Instruction modifiers

Dynamic, code-driven prompt augmentation. A modifier is a callable (prompt, ctx) → prompt that runs after template variables are resolved but before the LLM call.

def tier_aware(prompt: str, ctx: RunContext) -> str:
    tier = ctx.metadata.get("user_tier", "free")
    if tier == "enterprise":
        return prompt + "\n\nThis is an enterprise user. Prioritise SLA."
    return prompt

agent = BaseAgent(
    name="support",
    instructions="You are helping {user_name}.",
    template_vars={"user_name": "Alice"},
    instruction_modifiers=[tier_aware],
)
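
The ordering (template variables first, modifiers second) can be illustrated with a small self-contained sketch. Ctx is a hypothetical stand-in for RunContext that carries only metadata, and render is an illustrative helper, not the way BaseAgent is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ctx:
    """Hypothetical stand-in for RunContext: only carries metadata."""
    metadata: dict[str, str] = field(default_factory=dict)


Modifier = Callable[[str, Ctx], str]


def render(instructions: str, template_vars: dict[str, str],
           modifiers: list[Modifier], ctx: Ctx) -> str:
    prompt = instructions.format(**template_vars)  # 1. resolve template variables
    for modify in modifiers:                       # 2. apply modifiers in list order
        prompt = modify(prompt, ctx)
    return prompt


def tier_aware(prompt: str, ctx: Ctx) -> str:
    if ctx.metadata.get("user_tier", "free") == "enterprise":
        return prompt + "\n\nThis is an enterprise user. Prioritise SLA."
    return prompt


final = render(
    "You are helping {user_name}.",
    {"user_name": "Alice"},
    [tier_aware],
    Ctx({"user_tier": "enterprise"}),
)
```

Because each modifier receives the output of the previous one, list order matters when several modifiers append or rewrite text.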

Smart layer · model_tier routing

As an alternative to Aura's gateway-side router, Continuum can route inline via RouterAgent(routing_strategy="model_tier"). A tier classifier (small LLM or heuristic) reads each prompt and dispatches to one of several pre-defined model tiers.

When SMART_LAYER_ENABLED=true and the strategy is model_tier, the router:

Runs the tier classifier on the prompt → returns cheap / mid / quality
Looks up the tier's pinned model from RouterConfig.tier_models
Falls back to FALLBACK_LLM_MODEL if the picked tier is unhealthy

When SMART_LAYER_ENABLED=false, the router falls back to standard LLM routing.
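
The classify, look up, fall back sequence can be sketched in a few lines. Everything here is illustrative: the model names are placeholders, heuristic_tier is a toy regex-and-length classifier, and route is a hypothetical function rather than RouterAgent's actual code.

```python
import re
from typing import Callable

# Placeholder model names; a real deployment would pin these in its router config.
TIER_MODELS = {"cheap": "small-model", "mid": "medium-model", "quality": "large-model"}
FALLBACK_MODEL = "fallback-model"


def heuristic_tier(prompt: str) -> str:
    """Toy classifier: keyword match first, then prompt length."""
    if re.search(r"\b(prove|refactor|architecture|analy[sz]e)\b", prompt, re.I):
        return "quality"
    return "cheap" if len(prompt.split()) < 12 else "mid"


def route(
    prompt: str,
    classify: Callable[[str], str] = heuristic_tier,
    healthy: Callable[[str], bool] = lambda model: True,
) -> str:
    tier = classify(prompt)        # 1. classify the prompt into a tier
    model = TIER_MODELS[tier]      # 2. look up the tier's pinned model
    return model if healthy(model) else FALLBACK_MODEL  # 3. fall back if unhealthy
```

Passing the classifier and the health check as callables keeps the routing step testable without any network access.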

Tier classifiers

Three classifier backends ship out of the box. Choose via LLM_ROUTE_TIER_CLASSIFIER.

Classifier	Where it runs	Best for
`light_only`	regex + length heuristics	zero-latency baseline; never calls an LLM
`qwen`	HuggingFace Router API, `Qwen3-4B-Instruct`	cloud routing without spinning a model
`qwen_local`	local OpenAI-compatible endpoint (MLX, vLLM)	air-gapped or zero-cost

Set LLM_ROUTE_TIER_CLASSIFIER_HEURISTIC_SHORTCUT=false to skip the keyword shortcut and always run the classifier LLM.

Handoff history transfer modes

When agent A hands off to agent B, the prior conversation is rewritten for B's context. Four modes control how much history travels:

Mode	What B sees	Use when
`FULL`	Verbatim message array	Specialist needs all detail
`SUMMARY`	LLM-generated summary + open question	Long conversations, narrow specialist
`RECENT_N`	Last N turns only	Topic just changed; old context is noise
`HYBRID`	Summary of older turns + last N verbatim	Default, best balance for most flows

Continuum also tracks handoff depth, detects cycles (A→B→A→B), and emits HANDOFF_RETURN events when control returns to a parent agent.
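
The four modes can be expressed as one small function. This is a sketch under assumptions: TransferMode and transfer_history are hypothetical names, the summarizer is supplied by the caller, and n is assumed to be at least 1.

```python
from enum import Enum
from typing import Callable

Message = dict[str, str]


class TransferMode(Enum):
    FULL = "full"
    SUMMARY = "summary"
    RECENT_N = "recent_n"
    HYBRID = "hybrid"


def transfer_history(
    history: list[Message],
    mode: TransferMode,
    summarize: Callable[[list[Message]], str],
    n: int = 4,  # assumed >= 1
) -> list[Message]:
    """Build the message list the receiving agent sees."""
    if mode is TransferMode.FULL:
        return list(history)
    if mode is TransferMode.RECENT_N:
        return history[-n:]
    if mode is TransferMode.SUMMARY:
        return [{"role": "system", "content": summarize(history)}]
    # HYBRID: summarise everything except the last n turns, keep those verbatim.
    older, recent = history[:-n], history[-n:]
    if not older:
        return list(recent)
    return [{"role": "system", "content": summarize(older)}, *recent]


turns = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
stub = lambda msgs: f"summary of {len(msgs)} turns"  # stand-in for an LLM summarizer
hybrid = transfer_history(turns, TransferMode.HYBRID, stub, n=2)
```

With six turns and n=2, the HYBRID result is one summary message covering the first four turns followed by the last two turns unchanged.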

Memory isolation scopes

Long-term memory in Continuum has four orthogonal scopes. Choose a scope per agent, separately for search and for store.

Scope	Keyed by	Reach	Multi-tenant safe?
`USER`	user_id	Per-user across all agents	✓ default
`AGENT`	agent_name	Per-agent across all users	One-way, shared by users
`SHARED`	shared_key	Global knowledge base	No, explicitly shared
`CONVERSATION`	conversation_id	Ephemeral, single thread	✓
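
One way to picture the table is as a partition key derived from the scope. The sketch below is purely illustrative: Scope, MemoryContext, and partition_key are hypothetical names, and the key format is an assumption rather than Continuum's storage layout.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    USER = "user"
    AGENT = "agent"
    SHARED = "shared"
    CONVERSATION = "conversation"


@dataclass(frozen=True)
class MemoryContext:
    user_id: str
    agent_name: str
    shared_key: str
    conversation_id: str


def partition_key(scope: Scope, ctx: MemoryContext) -> str:
    """Pick the identifier that each scope is keyed by."""
    value = {
        Scope.USER: ctx.user_id,
        Scope.AGENT: ctx.agent_name,
        Scope.SHARED: ctx.shared_key,
        Scope.CONVERSATION: ctx.conversation_id,
    }[scope]
    return f"{scope.value}:{value}"


alice = MemoryContext("alice", "shop-assistant", "catalog", "conv-1")
bob = MemoryContext("bob", "shop-assistant", "catalog", "conv-2")
```

Two users of the same agent get different USER keys but the same AGENT key, which is why AGENT scope is shared across users.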

IntelligentMemoryClient

A drop-in replacement for the standard MemoryClient that adds three behaviours:

Adaptive extraction: uses a per-agent extraction prompt to filter what gets stored (e.g. "only pet preferences, never one-off cart actions").
Relevance re-ranking: re-ranks recalled memories using a small LLM scorer before injecting them into the prompt.
Deduplication: merges semantically equivalent memories to prevent drift.

PII filtering on memory writes

Before a memory is stored, Continuum can run it through a PII scrubber that redacts emails, phone numbers, SSNs, and credit cards. Configure via the memory client:

from continuum.memory import MemoryClient, PIIPolicy

memory = MemoryClient(pii_policy=PIIPolicy.REDACT)
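
For intuition, a regex-based scrubber covering the four categories might look like the sketch below. It is not the logic behind PIIPolicy.REDACT: the patterns are deliberately simple (US-style phone and SSN formats, 16-digit cards), and redact_pii and the placeholder tokens are hypothetical.

```python
import re

# Order matters: cards are matched before phones so a card number is not
# partially consumed by the looser phone pattern.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Email alice@example.com or call 415-555-0134. SSN 123-45-6789, card 4111 1111 1111 1111."
scrubbed = redact_pii(sample)
# scrubbed == "Email [EMAIL] or call [PHONE]. SSN [SSN], card [CARD]."
```

Simple patterns like these miss international formats and can over-match, so a production scrubber typically pairs them with validation such as a Luhn check for card numbers.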

Run artifacts

MCP tools can return structuredContent: JSON payloads alongside their text output. Continuum captures these as run artifacts, exposing them on AgentResponse.artifacts for the application to consume (UI widgets, downstream pipelines, audit logs).

response = await runner.run(agent, "checkout", user_id="alice")
for art in response.artifacts:
    print(art.tool_name, art.structured_content)
    # prints e.g.: checkout {'order_id': 'ORD-92151', 'total_cents': 699}

Evaluation framework

Continuum ships with two opt-in eval stacks, DeepEval and RAGAS (pip install -e ".[eval]"), plus an EvaluatorAgent:

DeepEval: criterion-based evaluation with customisable metrics (faithfulness, correctness, toxicity).
RAGAS: RAG-specific metrics (context precision, context recall, answer relevance).
EvaluatorAgent: a specialised BaseAgent whose job is grading other agents' outputs.

Golden datasets from Langfuse

Build regression test sets directly from production traces. The continuum.evaluation.golden module pulls traces matching a filter (e.g. tagged good_response in the Langfuse UI) and materialises them as a pytest dataset.

from continuum.evaluation import build_golden_dataset

dataset = await build_golden_dataset(
    project="shop-assistant",
    tags=["good_response"],
    since="2026-04-01",
)
# Use with pytest-asyncio + DeepEval as a CI gate.

Example · Pet Shop Assistant

An end-to-end walkthrough: an agent backed by an MCP shop server, routed through Aura, with Langfuse tracing and Redis-backed sessions. Copy the snippets in order and you'll have a working chat UI in under 10 minutes.

BaseAgent · MCP tools · Aura · FastAPI · Web UI

Source. The complete code lives at playground/gateway-local-shop/. This page is the annotated tour.

Architecture

┌──────────────┐    HTTP    ┌────────────────┐    OpenAI    ┌──────────────────┐
│  Browser     │ ─────────► │  FastAPI UI    │              │  Aura            │
│  /chat       │   POST     │  (web.py :8081)│              │  Gateway :8787   │
└──────────────┘            └────────┬───────┘              └────────┬─────────┘
                                     │                               │
                                     │ Continuum AgentRunner         │ pick model
                                     ▼                               ▼
                            ┌────────────────┐              ┌──────────────────┐
                            │  BaseAgent     │  ◄── tools ──│  OpenAI /        │
                            │  (agent.py)    │              │  Anthropic /     │
                            └────────┬───────┘              │  Google          │
                                     │                      └──────────────────┘
                                     │ MCP StreamableHTTP
                                     ▼
                            ┌────────────────┐              ┌──────────────────┐
                            │  Shop server   │              │  Langfuse :3005  │
                            │  (server.py    │              │  ◄── traces ──── │
                            │   :8888)       │              └──────────────────┘
                            └────────────────┘

Session/memory:  Redis (:6380)  ·  Milvus (:19530)

1 · Prerequisites

Python 3.13 with a venv
Docker / Docker Compose
Node 22 LTS for the gateway build
At least one provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, or GEMINI_API_KEY)

2 · Spin up infra

Continuum stack: Redis, Milvus, Langfuse, ClickHouse, Postgres, MinIO

cd continuum
docker compose up -d redis-sdk milvus milvus-etcd postgres clickhouse minio langfuse-web langfuse-worker

Aura gateway: build, then run from Docker (sharing Continuum's Langfuse Redis)

cd ../continuum-backend-smart-inference
nvm use 22 && npm install && npm run build
docker compose up -d gateway

3 · Configure env

Continuum (continuum/.env): point at the gateway and the local Langfuse.

# LLM / routing
SMART_GATEWAY_URL=http://localhost:8787/v1
SMART_GATEWAY_API_KEY=your-smart-gateway-api-key
SMART_GATEWAY_DEFAULT_MODE=modest

# Observability
LANGFUSE_ENABLED=true
LANGFUSE_HOST=http://localhost:3005           # continuum's docker-compose maps Langfuse to 3005
LANGFUSE_PUBLIC_KEY=pk-lf-…
LANGFUSE_SECRET_KEY=sk-lf-…

# Sessions + memory
SESSION_REDIS_PORT=6380
VECTOR_STORE_PROVIDER=milvus
MILVUS_PORT=19530

Aura (continuum-backend-smart-inference/.env): real provider keys.

OPENAI_API_KEY=sk-…
ANTHROPIC_API_KEY=sk-ant-…
GEMINI_API_KEY=AIza…

# From inside docker, reach the host's Langfuse
LANGFUSE_PUBLIC_KEY=pk-lf-…
LANGFUSE_SECRET_KEY=sk-lf-…
LANGFUSE_BASE_URL=http://host.docker.internal:3005

4 · MCP shop server

A FastMCP server exposes 5 tools (search, get, add-to-cart, view-cart, checkout) and 3 resources. The agent talks to it over StreamableHTTP.

# playground/gateway-local-shop/server.py (excerpt)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-shop")

PRODUCTS = [
    {"id": "p1", "name": "Dog Food (Dry) 5kg", "price": 29.99, "animal": "dog"},
    # … more products …
]
_carts: dict[str, list] = {}

@mcp.tool()
def search_products(query: str = "", animal: str = "") -> list:
    """Filter products by query / animal."""
    q = query.lower()
    return [
        p for p in PRODUCTS
        if (not animal or p["animal"] == animal)   # empty filter matches everything
        and (not q or q in p["name"].lower())      # case-insensitive name match
    ]

@mcp.tool()
def add_to_cart(session_id: str, product_id: str, quantity: int = 1) -> dict:
    """Add a product to the session's cart and return the new cart size."""
    cart = _carts.setdefault(session_id, [])
    cart.append({"product_id": product_id, "quantity": quantity})
    return {"cart_size": len(cart)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(mcp.streamable_http_app(), host="0.0.0.0", port=8888)

python playground/gateway-local-shop/server.py     # MCP server on :8888

5 · Agent + config

The agent connects to the MCP server, defines memory + session policy, and uses Aura auto-routing.

# playground/gateway-local-shop/agent.py (excerpt)
from continuum import AgentConfig, AgentMemoryConfig, AgentMemoryScope, \
    AgentRunner, BaseAgent, MCPServerStreamableHttp, ToolExecutor
from continuum.tools.tool_attention.config import ToolAttentionConfig
from continuum.tools.types import ToolContextConfig, ToolContextVariable

mcp_server = MCPServerStreamableHttp(
    params={"url": "http://localhost:8888/mcp"},
    context_config=ToolContextConfig(
        variables=[ToolContextVariable(name="session_id",
                  inject_into=["add_to_cart", "view_cart", "checkout"])]
    ),
)
await mcp_server.connect()

executor = ToolExecutor({mcp_server: None})
await executor.initialize()

agent = BaseAgent(
    name="shop-assistant",
    instructions="You are a friendly pet shop assistant.",
    model="auto",                        # gateway picks
    gateway_mode=None,                    # falls back to SMART_GATEWAY_DEFAULT_MODE
    tools=executor.get_tool_definitions(),
    tool_executor=executor,
    memory_config=AgentMemoryConfig(
        search_memories=True, store_memories=True,
        search_scope=AgentMemoryScope.USER,
        store_scope=AgentMemoryScope.USER,
    ),
    config=AgentConfig(
        max_turns=3,
        log_to_session=True,
        tool_attention=ToolAttentionConfig(k=3, min_tools=3),
    ),
)

6 · CLI or Web UI

Two ways to drive it. The CLI is one file and stops at Ctrl-C; the Web UI is a FastAPI page with a ChatGPT-style chat.

CLI loop

# cli.py
import asyncio

from continuum import AgentRunner

from agent import agent, executor

async def main():
    runner = AgentRunner(tool_executor=executor)
    runner.register_agent(agent)
    while True:
        msg = input("you ❭ ").strip()
        if not msg:  # an empty line also ends the session
            break
        r = await runner.run(agent, msg, user_id="alice", conversation_id="local")
        print("agent ❭", r.content)

asyncio.run(main())

FastAPI Web UI

# web.py (excerpt): POST /chat returns the agent's reply as JSON
@app.post("/chat")
async def chat(req: ChatRequest):
    response = await shop_agent.chat(
        req.message,
        user_id=req.user_id,
        conversation_id=req.conversation_id,
    )
    return {"response": response}

python playground/gateway-local-shop/web.py    # open http://localhost:8081

7 · Run a conversation

Type any of these into the chat; the agent picks the right tool automatically.

Prompt	What the agent does
show me dog toys	`search_products(query="toys", animal="dog")` → product list
add the tennis balls to my cart	`add_to_cart(session_id, product_id="p5")`
what's in my cart?	`view_cart(session_id)` → table with totals
checkout please	`checkout(session_id)` → order id + receipt

Observe. Open http://localhost:3005 → your Continuum project. Every turn appears as a Langfuse trace with the user/assistant messages, every tool call as a span, and the picked model + latency on the generation.

Make it yours

Swap the domain: replace the MCP server with one that exposes your tools (Slack, Jira, internal APIs). The agent code doesn't change.
Tighten the tier: set gateway_mode="strict" to drop latency to ~2–4s/turn (cheap-tier models).
Persist memories: keep memory_config on and the agent will remember per-user facts across sessions.
Add a workflow: wrap the agent in a RouterAgent with a billing-specialist sibling and a triage agent.

Community

Continuum is built at Shyftlabs. Contributions, examples, and feedback are welcome.

Contributing

Fork & clone: work in a feature branch
Install dev dependencies
```
pip install -e ".[dev,temporal,eval]"
```
Write tests: unit + integration, no mocking the database
```
pytest tests/ -v
```
Lint & type-check
```
ruff check src/
mypy src/
```
Submit a PR: describe what changed and why, and reference the relevant docs section

Integration tests over mocks: Continuum integration tests hit real services (Redis, Qdrant/Milvus). Run docker compose up -d before running the test suite.

Coming soon · Continuum Suite · Proprietary

Provenance

The trust layer for agents in production. Prompt versioning, guardrails, PII redaction, and policy, so every decision your agents make is governed, observable, and audit-ready.

Prompt Management · Guardrails · PII Redaction · Policy & Governance · Audit Trails · Access Control

Request early access →

Need governance sooner? Talk to the team at Shyftlabs. Provenance is rolling out to design partners first.