AI Provider Configuration
Choose and configure the AI backend for your agents
Overview
AgentOp supports three AI providers for powering your agents. Each provider offers different capabilities, pricing models, and deployment options. This guide will help you choose the right provider and configure it properly for your use case.
Provider Comparison
| Feature | OpenAI | Anthropic | Local (wllama) |
|---|---|---|---|
| API Key Required | Yes | Yes | No |
| Cost | Pay per token | Pay per token | Free |
| Privacy | Data sent to OpenAI | Data sent to Anthropic | 100% local, no data sent |
| Internet Required | Yes (API calls) | Yes (API calls) | First download only |
| Model Size | N/A (cloud) | N/A (cloud) | 4-8GB download |
| Response Speed | Fast (API latency) | Fast (API latency) | Depends on device |
| Function Calling | Full support | Full support | Full support via GBNF grammar constraints |
| Best For | Production apps, complex tasks | Long context, detailed responses | Privacy, offline, no costs |
OpenAI Provider
Overview
OpenAI provides the GPT series of models, known for strong general-purpose capabilities and broad knowledge. It is best suited to production applications that need fast, reliable AI responses.
Supported Models
- GPT-4o: Most capable model with vision, function calling, and structured outputs
- GPT-4o-mini: Fast, affordable model for lighter tasks (recommended for most users)
- GPT-4-turbo: Previous generation flagship, still highly capable
- GPT-3.5-turbo: Legacy model, still usable for simple tasks (note that GPT-4o-mini is cheaper per token by the estimates in this guide)
Getting an API Key
- Visit platform.openai.com
- Create an account (requires phone verification)
- Add billing information (credit card required)
- Navigate to API Keys section
- Click "Create new secret key"
- Copy the key (shown only once!)
- Set usage limits for security
Security Best Practice
Create separate API keys for each agent and set monthly spending limits. This limits damage if a key is compromised.
Configuration in AgentOp
When creating your agent, select "OpenAI" as the provider and choose a model.
AgentOp automatically injects the required langchain_openai package
and configures the LLM connection — you do not need to write any import or
setup code yourself:
# You do NOT need to write this — AgentOp injects it automatically:
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
#
# Your Python code only defines tool functions and get_tool_schemas().
# The platform handles model setup, API key encryption, and tool dispatch.
#
# Default model: gpt-4o-mini
# API key: encrypted client-side and embedded in the HTML file
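As a rough illustration of the "tool functions plus get_tool_schemas()" split described above, a tool module might look like the sketch below. The tool name `get_weather`, its canned return value, and the schema layout (modeled on the JSON-Schema style commonly used for function calling) are assumptions for illustration only; this guide does not specify the exact schema format AgentOp expects.

```python
import asyncio


async def get_weather(city: str) -> dict:
    """Hypothetical tool: return canned weather data for a city."""
    # A real tool would call an API; this placeholder keeps the sketch offline.
    return {"city": city, "forecast": "sunny", "temp_c": 21}


def get_tool_schemas() -> list:
    """Describe the tools above. The schema layout is an assumption."""
    return [
        {
            "name": "get_weather",
            "description": "Return a short weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        }
    ]


if __name__ == "__main__":
    print(asyncio.run(get_weather("Lisbon")))
```

Note that nothing in the sketch imports a model client or handles an API key, which matches the division of labor described above.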
Cost Estimation
- GPT-4o-mini: ~$0.15 per million input tokens, ~$0.60 per million output tokens
- GPT-4o: ~$2.50 per million input tokens, ~$10.00 per million output tokens
- GPT-3.5-turbo: ~$0.50 per million input tokens, ~$1.50 per million output tokens
Note: Prices are approximate and change over time. Check OpenAI's pricing page for current rates.
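To turn the per-million-token figures above into a per-workload estimate, the arithmetic can be sketched as follows. The prices are the approximate values listed in this section, and the helper name `estimate_cost` is made up for this example.

```python
# Approximate USD prices per million tokens (input, output), from the list above.
PRICES_PER_MILLION = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500K output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.2f}")
```

For that example workload the estimate comes to about $0.60 on GPT-4o-mini, $1.75 on GPT-3.5-turbo, and $10.00 on GPT-4o.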
Pros
- Excellent general-purpose performance
- Fast response times via API
- Large ecosystem and community support
- Strong function calling capabilities
- Vision support in GPT-4o models
Cons
- Requires API key and billing setup
- Pay-per-use pricing can add up
- Data sent to OpenAI servers (privacy consideration)
- Internet connection required
- Rate limits apply
Anthropic Provider (Claude)
Overview
Anthropic's Claude models are known for their strong instruction-following, safety features, and ability to handle very long contexts. They are an excellent choice for detailed responses and document analysis.
Supported Models
- Claude 3.5 Sonnet: Most capable model with excellent reasoning (recommended)
- Claude 3 Opus: Previous flagship, extremely capable but slower and more expensive
- Claude 3 Haiku: Fast, cost-effective for simpler tasks
Getting an API Key
- Visit console.anthropic.com
- Create an account
- Add billing information
- Navigate to API Keys section
- Click "Create Key"
- Name your key and copy it
- Set usage limits as needed
Configuration in AgentOp
When creating your agent, select "Anthropic" as the provider. AgentOp
automatically injects the required langchain_anthropic and related
LangChain packages — you do not need to write any import or setup code:
# You do NOT need to write this — AgentOp injects it automatically:
# from langchain_anthropic import ChatAnthropic
# llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", ...)
#
# Your Python code only defines tool functions and get_tool_schemas().
# Default model: claude-3-5-sonnet-20241022
# API key: encrypted client-side and embedded in the HTML file
Cost Estimation
- Claude 3.5 Sonnet: ~$3.00 per million input tokens, ~$15.00 per million output tokens
- Claude 3 Haiku: ~$0.25 per million input tokens, ~$1.25 per million output tokens
- Claude 3 Opus: ~$15.00 per million input tokens, ~$75.00 per million output tokens
Note: Prices are approximate. Check Anthropic's pricing page for current rates.
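When setting a usage limit, it can help to work backwards from a budget to a request count. The sketch below does that with the approximate prices listed above; the function name, the $20 budget, and the per-request token counts are illustrative assumptions.

```python
# Approximate USD prices per million tokens (input, output), from the list above.
PRICES_PER_MILLION = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "claude-3-opus": (15.00, 75.00),
}


def requests_within_budget(model: str, budget_usd: float,
                           input_tokens: int, output_tokens: int) -> int:
    """Return how many whole requests of the given size fit in the budget."""
    in_price, out_price = PRICES_PER_MILLION[model]
    cost_per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return int(budget_usd // cost_per_request)


if __name__ == "__main__":
    # Example: $20 budget, 10,000 input and 1,000 output tokens per request.
    for model in PRICES_PER_MILLION:
        print(model, requests_within_budget(model, 20.0, 10_000, 1_000))
```

With those example numbers, the budget covers roughly 444 requests on Claude 3.5 Sonnet, 5333 on Claude 3 Haiku, and 88 on Claude 3 Opus.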
Pros
- Excellent at following complex instructions
- Very large context windows (up to 200K tokens)
- Strong safety and ethical alignment
- Detailed, well-reasoned responses
- Good for document analysis and summarization
Cons
- Higher per-token prices than comparable OpenAI models (for example, Claude 3.5 Sonnet vs. GPT-4o in the estimates in this guide)
- Requires API key and billing
- Data sent to Anthropic servers
- Internet connection required
- Smaller ecosystem than OpenAI
Local (wllama) Provider
Overview — Run an LLM Locally in the Browser
wllama runs large language models directly in the browser using llama.cpp compiled to WebAssembly (WASM) — no server required, no API key, nothing sent to the cloud. The model downloads once (4–8 GB for Q4 GGUF variants), is cached locally, and then runs entirely on the user's CPU via WASM for every subsequent session — including completely offline.
AgentOp agents using the Local provider combine Python tool functions running via Pyodide (Python compiled to WebAssembly) with WllamaAgentManager for grammar-constrained tool calling and wllama for inference. GBNF grammars constrain the model's token sampling to produce valid JSON tool calls — no free-form text parsing needed. All of this runs inside the browser tab with no network requests after the initial model download.
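To make the grammar-constrained idea concrete, the sketch below builds a small GBNF grammar that only admits a JSON object naming one of the known tools. It is not the grammar WllamaAgentManager actually uses (that is not shown in this guide); the `{"name": ..., "arguments": {...}}` shape, the function name, and the simplified string rule (no escape sequences) are assumptions for illustration.

```python
def build_tool_call_grammar(tool_names: list) -> str:
    """Return a GBNF grammar accepting {"name": <tool>, "arguments": {...}}.

    Assumes tool names are plain identifiers (no quotes or backslashes).
    """
    if not tool_names:
        raise ValueError("at least one tool name is required")
    # Each alternative is a quoted JSON string literal, e.g. "\"get_weather\"".
    name_alternatives = " | ".join(f'"\\"{name}\\""' for name in tool_names)
    lines = [
        r'root ::= "{" ws "\"name\"" ws ":" ws name ws "," ws "\"arguments\"" ws ":" ws object ws "}"',
        f"name ::= {name_alternatives}",
        r'object ::= "{" ws ( pair ( ws "," ws pair )* )? ws "}"',
        r'pair ::= string ws ":" ws value',
        r'value ::= string | number | "true" | "false" | "null"',
        # Simplified: strings may not contain quotes or backslashes.
        r'string ::= "\"" [^"\\]* "\""',
        r'number ::= "-"? [0-9]+ ( "." [0-9]+ )?',
        r"ws ::= [ \t\n]*",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_tool_call_grammar(["get_weather", "add"]))
```

Because the `name` rule lists only registered tools, a sampler constrained by such a grammar cannot emit a call to a tool that does not exist, which is why no free-form text parsing is needed.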
Supported Models (GGUF)
- Hermes-2-Pro-Mistral-7B-Q4_K_M: 7B model with function calling via grammar (recommended)
- Hermes-3-Llama-3.1-8B-Q4_K_M: Hermes-tuned Llama 3.1, 8B parameters
- Llama-3.2-3B-Instruct-Q4_K_M: Meta's compact Llama 3.2, 3B parameters
- Llama-3.1-8B-Instruct-Q4_K_M: Meta's Llama 3.1, 8B parameters
- Qwen2.5-7B-Instruct-Q4_K_M: Alibaba Qwen 2.5, 7B parameters
- Qwen3-0.6B-Q4_K_M: Alibaba Qwen 3, 0.6B ultra-lightweight
- Phi-3.5-mini-instruct-Q4_K_M: Microsoft Phi 3.5 Mini, 3.8B parameters
- Gemma-4-E2B-it-Q4_K_M: Google Gemma 4, 2.3B effective parameters (recommended compact)
- Gemma-4-E4B-it-Q4_K_M: Google Gemma 4, 4.5B effective parameters
All models are GGUF format, loaded from HuggingFace by wllama at runtime.
WebAssembly — No WebGPU Required
wllama uses llama.cpp compiled to single-thread WASM. It works in all modern browsers without WebGPU — Chrome, Firefox, Safari, Edge. No special browser flags needed.
No Configuration Required
Simply select "Local (wllama)" as your provider when creating an agent. No API key needed!
# wllama handles inference via GBNF grammar-constrained tool calling
# Your Python code defines the tools, called via JS⇔Python bridge
# Model selection happens in the browser UI
# Default model: Hermes-2-Pro-Mistral-7B-Q4_K_M
# All models support function calling via GBNF grammar constraints
First-Time Setup
When a user opens your agent for the first time:
- They select a GGUF model from available options
- The model downloads from HuggingFace (4-8GB for Q4 variants)
- Model is cached in browser for future use
- Agent is ready to use offline!
Large Download Size
The 7-8B models are 4-8GB in size; the smaller models listed above are proportionally smaller downloads. Warn users about the download on first use. After downloading, the model is cached and works offline.
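A quick way to phrase that warning is to estimate the download time for the user's connection. The sketch below is plain arithmetic; the function name and the 50 Mbit/s example speed are assumptions.

```python
def download_seconds(size_gb: float, mbit_per_s: float) -> float:
    """Seconds needed to download size_gb (decimal GB) at mbit_per_s megabits/s."""
    size_bits = size_gb * 1e9 * 8          # gigabytes -> bits
    return size_bits / (mbit_per_s * 1e6)  # bits / (bits per second)


if __name__ == "__main__":
    for size_gb in (4, 8):
        minutes = download_seconds(size_gb, 50) / 60
        print(f"{size_gb} GB at 50 Mbit/s: about {minutes:.0f} min")
```

At 50 Mbit/s this works out to roughly 11 minutes for a 4GB model and 21 minutes for an 8GB model, ignoring protocol overhead.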
Performance Considerations
- Desktop/Laptop: Good performance via CPU WASM inference
- Mobile: Works but will be slower on most devices
- Low-end devices: May struggle with larger 7-8B models; try the smaller models (roughly 0.6-4B parameters) listed above
- Speed: 2-20 tokens/second depending on device and model size
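The 2-20 tokens/second range translates into very different wait times. A minimal sketch of the arithmetic, assuming a hypothetical 300-token reply and ignoring prompt-processing time:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate output_tokens at a steady tokens_per_second rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for speed in (2, 20):
        print(f"{speed} tok/s: {generation_seconds(300, speed):.0f} s for 300 tokens")
```

A 300-token reply takes about 150 seconds at the slow end of the range and about 15 seconds at the fast end, which is worth keeping in mind when choosing a default model for low-end devices.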
Pros
- Completely free - no API costs
- 100% private - data never leaves device
- Works offline after initial download
- No API key management
- Full function calling support via GBNF grammar constraints
- No rate limits
Cons
- Large initial download (4-8GB)
- CPU-based inference (slower than GPU-backed API models)
- Performance varies by device
- Smaller models generally produce lower-quality output than GPT-4o or Claude
- Limited to browser environment
API Key Security
AgentOp uses client-side encryption to protect your API keys when embedding them in HTML files:
How It Works
- When you download an agent, you're prompted to create an encryption password
- Your API key is encrypted with AES-256 in your browser, using a key derived from that password
- Only the encrypted key is embedded in the HTML file
- When opening the agent, users enter the password to decrypt the key
- Decryption happens entirely client-side - no keys sent to servers
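The steps above do not say how the password becomes an AES-256 key. One common approach is a password-based key derivation function; the sketch below uses PBKDF2-HMAC-SHA256 from Python's standard library purely as an illustration. The choice of PBKDF2, the salt size, and the iteration count are assumptions, not a description of AgentOp's implementation (which runs in the browser).

```python
import hashlib
import os


def derive_aes256_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (256-bit) key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


if __name__ == "__main__":
    salt = os.urandom(16)  # random per file; stored next to the ciphertext, not secret
    key = derive_aes256_key("correct horse battery staple", salt)
    print(len(key) * 8, "bit key")
```

Whatever derivation is used, the embedded key is only as hard to recover as the password is to guess, which is why a strong, unique encryption password matters.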
Security Best Practices
- Use a strong, unique encryption password
- Create separate API keys for each agent
- Set spending limits on all API keys
- Regularly rotate API keys
- Never share agents with embedded keys publicly
- Consider using Local (wllama) for public agents
Choosing the Right Provider
Use OpenAI if you need:
- Production-ready, reliable AI
- Fast response times
- Vision capabilities (GPT-4o)
- Broad general knowledge
- Cost-effective solutions (GPT-4o-mini)
Use Anthropic if you need:
- Very long context handling (200K tokens)
- Extremely detailed responses
- Document analysis and summarization
- Strong ethical alignment
- Complex instruction following
Use Local (wllama) if you need:
- Zero API costs
- Complete privacy and data control
- Offline functionality
- No API key management
- Public distribution without key exposure
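The guidance above can be condensed into a small decision helper. This is only one reading of the lists: the function name, the flags, and especially the priority order (hard constraints such as offline use first, then vision, then long context) are assumptions for illustration.

```python
def suggest_provider(*, offline: bool = False, public_distribution: bool = False,
                     strict_privacy: bool = False, vision: bool = False,
                     long_context: bool = False) -> str:
    """Suggest a provider from a few requirement flags (illustrative only)."""
    # Requirements that rule out cloud calls or embedded keys point to Local.
    if offline or public_distribution or strict_privacy:
        return "Local (wllama)"
    # Vision is listed above only under OpenAI (GPT-4o).
    if vision:
        return "OpenAI"
    # Very long documents are listed as an Anthropic strength.
    if long_context:
        return "Anthropic"
    # Otherwise default to the general-purpose, cost-effective option.
    return "OpenAI"


if __name__ == "__main__":
    print(suggest_provider(long_context=True))
    print(suggest_provider(public_distribution=True))
```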
Switching Providers
You can change providers for any agent by editing it and selecting a different provider option.
Your Python tool functions (async def + get_tool_schemas()) work
identically across all three providers — no code changes are needed when switching.
AgentOp automatically configures the correct LangChain packages and infrastructure for each
provider behind the scenes.
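One way to see why tool code can stay unchanged is that, whichever provider produces it, a tool call ultimately reduces to a tool name plus arguments. The sketch below shows a provider-agnostic dispatcher under that assumption; the `add` tool, the `TOOLS` registry, `dispatch_tool_call`, and the `{"name", "arguments"}` JSON shape are hypothetical and not AgentOp's internal API.

```python
import asyncio
import json


async def add(a: float, b: float) -> float:
    """Hypothetical tool: add two numbers."""
    return a + b


# Registry mapping tool names to async tool functions.
TOOLS = {"add": add}


async def dispatch_tool_call(raw_call: str):
    """Parse a JSON tool call and await the matching tool function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return await TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    raw = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
    print(asyncio.run(dispatch_tool_call(raw)))
```

The dispatcher never inspects which model generated `raw_call`, so swapping the provider leaves both the tool functions and the dispatch logic untouched.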