AI Providers

Compare OpenAI, Anthropic, and local wllama — and configure the right one for your agent.

Overview

AgentOp supports three AI providers for powering your agents. Each provider offers different capabilities, pricing models, and deployment options. This guide will help you choose the right provider and configure it properly for your use case.

Provider Comparison

Feature	OpenAI	Anthropic	Local (wllama)
API Key Required	Yes	Yes	No
Cost	Pay per token	Pay per token	Free
Privacy	Data sent to OpenAI	Data sent to Anthropic	100% local, no data sent
Internet Required	Yes (API calls)	Yes (API calls)	First download only
Model Size	N/A (cloud)	N/A (cloud)	0.4–7.5GB download (recommended model ~2.6GB)
Response Speed	Fast (API latency)	Fast (API latency)	Depends on device
Function Calling	Full support	Full support	Full support via GBNF grammar constraints
Best For	Production apps, complex tasks	Long context, detailed responses	Privacy, offline, no costs

OpenAI Provider

Overview

OpenAI provides the GPT series of models, known for strong general-purpose capabilities and broad knowledge. Best suited for production applications requiring fast, reliable AI responses.

Supported Models

GPT-4o: Most capable model with vision, function calling, and structured outputs
GPT-4o-mini: Fast, affordable model for lighter tasks (recommended for most users)
GPT-4-turbo: Previous generation flagship, still highly capable
GPT-3.5-turbo: Legacy model, still useful for simple tasks at lower cost

Getting an API Key

Visit platform.openai.com
Create an account (requires phone verification)
Add billing information (credit card required)
Navigate to API Keys section
Click "Create new secret key"
Copy the key (shown only once!)
Set usage limits for security

Security Best Practice

Create separate API keys for each agent and set monthly spending limits. This limits damage if a key is compromised.

Configuration in AgentOp

When creating your agent, select "OpenAI" as the provider and choose a model. AgentOp automatically injects the required langchain_openai package and configures the LLM connection — you do not need to write any import or setup code yourself:

# You do NOT need to write this — AgentOp injects it automatically:
#   from langchain_openai import ChatOpenAI
#   llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
#
# Your Python code only defines tool functions and get_tool_schemas().
# The platform handles model setup, API key encryption, and tool dispatch.
#
# Default model: gpt-4o-mini
# API key: encrypted client-side and embedded in the HTML file

Cost Estimation

GPT-4o-mini: ~$0.15 per million input tokens, ~$0.60 per million output tokens
GPT-4o: ~$2.50 per million input tokens, ~$10.00 per million output tokens
GPT-3.5-turbo: ~$0.50 per million input tokens, ~$1.50 per million output tokens

Note: Prices are approximate and change over time. Check OpenAI's pricing page for current rates.
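
To turn the per-million rates above into a budget figure, multiply each token count by its rate and divide by one million. The sketch below does this for the three models listed. The function name `estimate_cost` and the example workload (1,000 requests of 1,500 input and 500 output tokens each) are illustrative assumptions, and the prices are the approximate figures quoted in this guide, not live rates.

```python
# Approximate USD prices per million tokens, copied from the list above.
# (input_price, output_price); check OpenAI's pricing page before relying on them.
OPENAI_PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = OPENAI_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests, each 1,500 input + 500 output tokens.
    requests = 1_000
    for model in OPENAI_PRICES:
        cost = estimate_cost(model, 1_500 * requests, 500 * requests)
        print(f"{model}: ${cost:.3f}")
```

Under these assumptions the workload comes to roughly $0.53 on GPT-4o-mini, $1.50 on GPT-3.5-turbo, and $8.75 on GPT-4o.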

Pros

Excellent general-purpose performance
Fast response times via API
Large ecosystem and community support
Strong function calling capabilities
Vision support in GPT-4o models

Cons

Requires API key and billing setup
Pay-per-use pricing can add up
Data sent to OpenAI servers (privacy consideration)
Internet connection required
Rate limits apply

Anthropic Provider (Claude)

Overview

Anthropic's Claude models are known for their strong instruction-following, safety features, and ability to handle very long contexts. Excellent choice for detailed responses and document analysis.

Supported Models

Claude 3.5 Sonnet: Most capable model with excellent reasoning (recommended)
Claude 3 Opus: Previous flagship, extremely capable but slower and more expensive
Claude 3 Haiku: Fast, cost-effective for simpler tasks

Getting an API Key

Visit console.anthropic.com
Create an account
Add billing information
Navigate to API Keys section
Click "Create Key"
Name your key and copy it
Set usage limits as needed

Configuration in AgentOp

When creating your agent, select "Anthropic" as the provider. AgentOp automatically injects the required langchain_anthropic and related LangChain packages — you do not need to write any import or setup code:

# You do NOT need to write this — AgentOp injects it automatically:
#   from langchain_anthropic import ChatAnthropic
#   llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", ...)
#
# Your Python code only defines tool functions and get_tool_schemas().
# Default model: claude-3-5-sonnet-20241022
# API key: encrypted client-side and embedded in the HTML file

Cost Estimation

Claude 3.5 Sonnet: ~$3.00 per million input tokens, ~$15.00 per million output tokens
Claude 3 Haiku: ~$0.25 per million input tokens, ~$1.25 per million output tokens
Claude 3 Opus: ~$15.00 per million input tokens, ~$75.00 per million output tokens

Note: Prices are approximate. Check Anthropic's pricing page for current rates.
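
Because Claude is often chosen for long-context work, it can help to price a single large request rather than a stream of small ones. The sketch below ranks the three listed models by the cost of one call. The helper names and the example sizes (a 200,000-token document with a 2,000-token summary) are assumptions for illustration, and the rates are the approximate ones quoted above.

```python
# Approximate USD prices per million tokens, copied from the list above.
CLAUDE_PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "Claude 3 Opus": (15.00, 75.00),
}


def single_call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request with the given token counts."""
    input_price, output_price = CLAUDE_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [(m, single_call_cost(m, input_tokens, output_tokens)) for m in CLAUDE_PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical job: summarize a 200K-token document into about 2K tokens.
    for model, cost in rank_by_cost(200_000, 2_000):
        print(f"{model}: ${cost:.4f}")
```

With these figures a full-context summary costs roughly $0.05 on Haiku, $0.63 on 3.5 Sonnet, and $3.15 on Opus, so input tokens dominate the bill for document analysis.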

Pros

Excellent at following complex instructions
Very large context windows (up to 200K tokens)
Strong safety and ethical alignment
Detailed, well-reasoned responses
Good for document analysis and summarization

Cons

More expensive than OpenAI for similar capabilities
Requires API key and billing
Data sent to Anthropic servers
Internet connection required
Smaller ecosystem than OpenAI

Local (wllama) Provider

Overview — Run an LLM Locally in the Browser

wllama runs large language models directly in the browser using llama.cpp compiled to WebAssembly (WASM) — no server required, no API key, nothing sent to the cloud. The model downloads once (0.4–7.5 GB depending on the model — the recommended Qwen 3 4B is ~2.6 GB), is cached locally, and then runs on the user's GPU via WebGPU for every subsequent session — including completely offline. A WebGPU-capable GPU is required for usable performance.

AgentOp agents using the Local provider combine Python tool functions running via Pyodide (Python compiled to WebAssembly) with WllamaAgentManager for grammar-constrained tool calling and wllama for inference. GBNF grammars constrain the model's token sampling to produce valid JSON tool calls — no free-form text parsing needed. All of this runs inside the browser tab with no network requests after the initial model download.
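
To make the idea of grammar-constrained tool calling concrete, here is a minimal sketch of how a GBNF grammar for a JSON tool call could be generated. It is not AgentOp's actual grammar: the `{"name": ..., "arguments": {...}}` shape, the function name `build_tool_call_grammar`, and the simplified JSON rules (strings without escape sequences) are all assumptions made for illustration.

```python
import re

# Static GBNF rules for a small JSON subset: flat or nested objects,
# strings without escapes, numbers, and the three JSON literals.
_JSON_RULES = r'''
object ::= "{" ws ( pair ( ws "," ws pair )* )? ws "}"
pair ::= string ws ":" ws value
value ::= string | number | object | "true" | "false" | "null"
string ::= "\"" [^"\\]* "\""
number ::= "-"? [0-9]+ ( "." [0-9]+ )?
ws ::= [ \t\n]*
'''


def build_tool_call_grammar(tool_names: list[str]) -> str:
    """Build a GBNF grammar that only accepts calls to the given tools."""
    if not tool_names:
        raise ValueError("at least one tool name is required")
    for name in tool_names:
        # Restrict names to identifiers so they cannot break out of the literal.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"invalid tool name: {name!r}")

    # Each alternative is a GBNF string literal containing a quoted JSON string.
    alternatives = " | ".join(f'"\\"{name}\\""' for name in tool_names)
    root = (
        'root ::= "{" ws "\\"name\\"" ws ":" ws tool-name ws "," ws '
        '"\\"arguments\\"" ws ":" ws object ws "}"'
    )
    return "\n".join([root, f"tool-name ::= {alternatives}", _JSON_RULES.strip()]) + "\n"


if __name__ == "__main__":
    print(build_tool_call_grammar(["get_weather", "add_numbers"]))
```

Because the sampler can only pick tokens that keep the output inside the grammar, the model cannot emit an unknown tool name or malformed JSON, which is why no free-form parsing step is needed afterwards.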

Supported Models (GGUF)

Six model families are available, all with full function-calling support via GBNF grammar:

Qwen 3 (Alibaba): 0.6B (~0.4GB), 1.7B (~1.1GB), 4B (~2.6GB, recommended default), 8B (~5.2GB), plus Qwen 2.5 Coder 7B (~4.7GB) for code-heavy agents
Llama (Meta): Llama 3.2 1B (~0.8GB), Llama 3.2 3B (~2.0GB), Llama 3.1 8B (~4.9GB)
Hermes (NousResearch): Hermes 3 Llama 3.2 3B (~2.0GB) — strong agentic tool calling
Phi (Microsoft): Phi 4 Mini 3.8B (~2.3GB), Phi 3.5 Mini 3.8B (~2.4GB)
Gemma 4 (Google): E2B MoE (~3.5GB), E4B MoE (~5.4GB), 12B (~7.5GB, heavy)
DeepSeek (reasoning): R1-0528 Qwen3 8B (~5.0GB), R1 Distill Qwen 7B (~4.7GB), R1 Distill Qwen 1.5B (~1.0GB)

All models are GGUF format (Q4_K_M quantization), loaded from HuggingFace by wllama at runtime. Each model runs with a context window sized for in-browser memory (8K tokens for most models by default). The agent's Context selector lets users with more (V)RAM raise it — up to 32K on most models — and in Auto mode the window is raised to 16K automatically when inference runs on a capable GPU (NVIDIA or Apple silicon).
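
The context-window rules described above can be summarized as a small decision function. This is a sketch of the described behavior, not AgentOp's code: the function name, the `"auto"` sentinel, the vendor strings, and the exact token counts (taking 8K, 16K, and 32K as multiples of 1,024) are assumptions, and per-model limits are ignored.

```python
DEFAULT_CONTEXT = 8 * 1024    # default for most models
AUTO_GPU_CONTEXT = 16 * 1024  # Auto mode on a capable GPU
MAX_CONTEXT = 32 * 1024       # upper bound on most models

# Vendors the text names as "capable" for the automatic raise.
CAPABLE_GPU_VENDORS = {"nvidia", "apple"}


def resolve_context_size(selection, gpu_vendor=None, running_on_gpu=False):
    """Return the context window (in tokens) for a Context selector value.

    `selection` is either the string "auto" or an explicit token count.
    """
    if selection == "auto":
        vendor = (gpu_vendor or "").lower()
        if running_on_gpu and vendor in CAPABLE_GPU_VENDORS:
            return AUTO_GPU_CONTEXT
        return DEFAULT_CONTEXT
    requested = int(selection)
    if requested <= 0:
        raise ValueError("context size must be positive")
    # Explicit choices are honored up to the assumed maximum.
    return min(requested, MAX_CONTEXT)
```

For example, `resolve_context_size("auto", "NVIDIA", running_on_gpu=True)` yields the 16K window, while the same call on CPU stays at 8K.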

WebGPU + WebAssembly

wllama uses llama.cpp compiled to WebAssembly, with model layers offloaded to the GPU via WebGPU. Chrome and Edge on a machine with a graphics card are recommended; Safari is WebGPU-accelerated on Apple Silicon Macs. A supported GPU is required for usable local inference — on hardware without one, use a cloud provider. In Auto mode, NVIDIA and Apple GPUs are used immediately; any other GPU (AMD, Intel, Qualcomm — discrete, integrated, or unified-memory) is tested once on first load and falls back to CPU automatically if it fails or stalls. The result is remembered on that device, and the Run on selector can always force GPU or CPU explicitly.
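
The Auto-mode behavior above amounts to a short decision procedure: honor an explicit Run on choice, trust NVIDIA and Apple GPUs, and probe any other GPU once while remembering the result. The sketch below models that flow. All names (`choose_backend`, the `remembered` dictionary, the `probe` callback) are hypothetical, the real logic runs in the browser rather than in Python, and stall detection is reduced here to the probe returning `False` or raising.

```python
from typing import Callable, Dict, Optional

TRUSTED_GPU_VENDORS = {"nvidia", "apple"}


def choose_backend(
    run_on: str,
    gpu_vendor: Optional[str],
    remembered: Dict[str, str],
    probe: Callable[[], bool],
) -> str:
    """Return "gpu" or "cpu" for the given Run on setting and hardware."""
    # An explicit choice always wins over Auto mode.
    if run_on in ("gpu", "cpu"):
        return run_on
    if gpu_vendor is None:
        return "cpu"

    vendor = gpu_vendor.lower()
    if vendor in TRUSTED_GPU_VENDORS:
        return "gpu"

    # Other GPUs are tested once; the outcome is stored per device.
    if vendor not in remembered:
        try:
            ok = probe()
        except Exception:
            ok = False
        remembered[vendor] = "gpu" if ok else "cpu"
    return remembered[vendor]
```

Passing the same `remembered` dictionary on later loads stands in for the per-device persistence the text describes, so the probe runs at most once per GPU.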

No Configuration Required

Simply select "Local (wllama)" as your provider when creating an agent. No API key needed!

# wllama handles inference via GBNF grammar-constrained tool calling
# Your Python code defines the tools, called via JS⇔Python bridge
# Model selection happens in the browser UI

# Default model: Qwen 3 4B (Q4_K_M) — best balance of speed and capability
# All models support function calling via GBNF grammar constraints

First-Time Setup

When a user opens your agent for the first time:

They select a GGUF model from available options
The model downloads from HuggingFace (0.4–7.5GB depending on the model)
The model is cached in the browser for future use
Agent is ready to use offline!

Large Download Size

Larger models can be several gigabytes (up to ~7.5GB for Gemma 4 12B). Warn users about the download on first use. After downloading, the model is cached and works offline.
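
When wording a download warning, a rough time estimate is often more useful to users than a size alone. The sketch below converts a model size and a connection speed into seconds. The function name is illustrative, 1GB is taken as 10^9 bytes, and real downloads will be slower because of protocol overhead and varying throughput.

```python
def estimate_download_seconds(size_gb: float, speed_mbps: float) -> float:
    """Estimate download time in seconds for a model of `size_gb` gigabytes
    over a connection of `speed_mbps` megabits per second."""
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    bits = size_gb * 1e9 * 8          # gigabytes -> bits
    return bits / (speed_mbps * 1e6)  # megabits/s -> bits/s


if __name__ == "__main__":
    # Sizes from this guide: recommended Qwen 3 4B and the largest listed model.
    for label, size in [("Qwen 3 4B", 2.6), ("Gemma 4 12B", 7.5)]:
        minutes = estimate_download_seconds(size, 50) / 60
        print(f"{label}: about {minutes:.0f} min at 50 Mbps")
```

At an assumed 50 Mbps, the recommended 2.6GB model takes about 7 minutes and the 7.5GB model about 20 minutes.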

Performance Considerations

Desktop/Laptop with a GPU: Good performance via WebGPU inference
No WebGPU-capable GPU: Impractically slow on CPU alone; use a cloud provider instead
Mobile: Not practical — phones rarely expose a GPU to WebGPU
Low-end / older GPUs: May struggle with larger 7–12B models; try the 0.6B–3B variants
Speed: 2-20 tokens/second depending on device and model size
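
To see what the 2 to 20 tokens/second range means in practice, divide the expected response length by the generation speed. The short sketch below does that; the function name and the 300-token example response are assumptions, and prompt processing time is not included.

```python
def generation_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response of the given length at a steady speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return response_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical 300-token answer at both ends of the quoted range.
    for speed in (2, 20):
        print(f"{speed} tok/s: {generation_seconds(300, speed):.0f} s")
```

A 300-token answer therefore takes about 150 seconds at the slow end and about 15 seconds at the fast end, which is why smaller models are the safer choice on modest hardware.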

Pros

Completely free - no API costs
100% private - data never leaves device
Works offline after initial download
No API key management
Full function calling support via GBNF grammar constraints
No rate limits

Cons

Initial model download required (0.4–7.5GB depending on the model)
Requires a WebGPU-capable GPU (not practical without one)
Performance varies by GPU
Smaller models produce lower-quality output than GPT-4 or Claude
Limited to browser environment

API Key Security

AgentOp uses client-side encryption to protect your API keys when embedding them in HTML files:

How It Works

When you download an agent, you're prompted to create an encryption password
Your API key is encrypted using AES-256 encryption in your browser
Only the encrypted key is embedded in the HTML file
When opening the agent, users enter the password to decrypt the key
Decryption happens entirely client-side - no keys sent to servers
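
Password-based AES-256 encryption needs a step that turns the password into a 256-bit key. This guide does not say how AgentOp does that, so the sketch below shows one common approach, PBKDF2 with HMAC-SHA256, using only Python's standard library. The choice of PBKDF2, the 16-byte salt, and the iteration count are assumptions, and the AES step itself is omitted because the standard library has no AES implementation.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (256-bit) key from a password with PBKDF2-HMAC-SHA256.

    The same password, salt, and iteration count always produce the same key,
    so the salt must be stored next to the ciphertext (it is not secret).
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


if __name__ == "__main__":
    salt = os.urandom(16)  # fresh random salt per encrypted key
    key = derive_key("correct horse battery staple", salt)
    print(len(key) * 8, "bit key derived")
```

The derived key would then be passed to an authenticated AES mode such as AES-GCM. Because the key depends entirely on the password, a weak password undermines the whole scheme, which is the reason for the strong-password advice in this section.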

Security Best Practices

Use a strong, unique encryption password
Create separate API keys for each agent
Set spending limits on all API keys
Regularly rotate API keys
Never share agents with embedded keys publicly
Consider using Local (wllama) for public agents

Choosing the Right Provider

Use OpenAI if you need:

Production-ready, reliable AI
Fast response times
Vision capabilities (GPT-4o)
Broad general knowledge
Cost-effective solutions (GPT-4o-mini)

Use Anthropic if you need:

Very long context handling (200K tokens)
Extremely detailed responses
Document analysis and summarization
Strong ethical alignment
Complex instruction following

Use Local (wllama) if you need:

Zero API costs
Complete privacy and data control
Offline functionality
No API key management
Public distribution without key exposure

Switching Providers

You can change providers for any agent by editing it and selecting a different provider option. Your Python tool functions (async def + get_tool_schemas()) work identically across all three providers — no code changes are needed when switching. AgentOp automatically configures the correct LangChain packages and infrastructure for each provider behind the scenes.
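
As an illustration of provider-independent agent code, the sketch below defines one async tool function and a `get_tool_schemas()` function. The `async def` plus `get_tool_schemas()` convention comes from this guide, but the schema layout shown here (a JSON-Schema-style `parameters` object, as used by common function-calling APIs) is an assumption, and `add_numbers` is a made-up tool. Consult the API Reference for the exact format AgentOp expects.

```python
import asyncio


async def add_numbers(a: float, b: float) -> float:
    """Tool function: return the sum of two numbers."""
    return a + b


def get_tool_schemas() -> list[dict]:
    """Describe the available tools so the platform can expose them to the model.

    NOTE: this schema shape is an assumed, JSON-Schema-style layout.
    """
    return [
        {
            "name": "add_numbers",
            "description": "Add two numbers and return the result.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First addend"},
                    "b": {"type": "number", "description": "Second addend"},
                },
                "required": ["a", "b"],
            },
        }
    ]


if __name__ == "__main__":
    # Local smoke test; in an agent, the platform dispatches the call instead.
    print(asyncio.run(add_numbers(2, 3)))
```

Nothing in this code names a provider or imports an LLM client, which is what allows the same file to run unchanged under OpenAI, Anthropic, or Local (wllama).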

Next Steps

Building Agents →

API Reference →

FAQ →

Create an Agent →