Cloud Providers Setup
While local models offer privacy and offline capability, cloud LLM providers bring tremendous power and convenience. Cloud APIs give you access to the most capable models without requiring powerful local hardware. The models stay current as providers continuously improve them. You never worry about running out of local resources, and you can scale workloads far beyond what your machine could handle alone.
The tradeoff, of course, is that your data leaves your machine and travels to the provider's infrastructure. You pay per request rather than having unlimited local inference. Internet connectivity becomes a hard requirement. But for many use cases—especially when you need maximum capability or are working with workloads that exceed local resources—cloud providers are the right choice.
Anthropic (Claude)
Anthropic's Claude models are known for strong reasoning, nuanced understanding, and a helpful conversational style. Claude particularly excels at complex analysis, coding assistance, and maintaining context over long conversations. The Sonnet tier (configured below as claude-sonnet-4-6) balances speed, cost, and intelligence well enough to suit most production use cases.
Getting Your API Key
Before you can use Anthropic's models, you need an API key that authorizes your requests. Visit console.anthropic.com and create an account if you don't already have one. Once logged in, navigate to the "API Keys" section of the console. You'll see options to create new keys and manage existing ones.
Click "Create Key" to generate a new API key. Anthropic will display the key once—copy it immediately because you won't be able to see it again. API keys follow the format sk-ant- followed by a long string of characters. Treat this key like a password: anyone with access to it can make requests on your behalf and incur charges to your account.
Store the key securely. The recommended approach is to set it as an environment variable in your shell configuration file:
# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="sk-ant-api03-..."
# Reload your shell configuration (use ~/.zshrc if your shell is zsh)
source ~/.bashrc
Setting it in your shell configuration ensures the variable is available every time you open a terminal. The daemon inherits this environment when it starts, giving it access to the key without storing it in configuration files that might be shared or committed to version control.
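Before restarting anything, it can help to confirm that the variable is actually visible to new processes. The following is a minimal Python sketch, not part of Lattice: the helper name check_api_key is hypothetical, and the sk-ant- prefix check simply mirrors the key format described above. It reports a status without printing the key itself.

```python
import os


def check_api_key(name: str, expected_prefix: str = "", env=None) -> str:
    """Return 'missing', 'unexpected-prefix', or 'ok' without revealing the key."""
    # `env` defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value:
        return "missing"
    if expected_prefix and not value.startswith(expected_prefix):
        return "unexpected-prefix"
    return "ok"


# Hypothetical usage: check the Anthropic key in the current environment.
print("ANTHROPIC_API_KEY:", check_api_key("ANTHROPIC_API_KEY", "sk-ant-"))
```

Run it from a freshly opened terminal: a result of "missing" there usually means the export line was not added to the shell configuration file that the terminal actually loads.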
Configuring Lattice
With your API key secured in an environment variable, configure Lattice to use Anthropic as a provider. Edit your configuration file at ~/.lovelace/lattice/config.toml:
[orchestrator]
default_provider = "anthropic"
[providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-sonnet-4-6"
enabled = true
The api_key_env field tells Lattice to read the API key from the ANTHROPIC_API_KEY environment variable. This level of indirection keeps credentials out of configuration files. The default_model specifies which Claude model to use when you don't explicitly request a different one. The enabled field activates this provider for use.
After saving your configuration, restart the daemon to load the new settings:
lattice-ctl daemon stop
lattice-ctl daemon start
Verify the provider is configured correctly and accessible:
lattice-ctl provider test anthropic
If the test succeeds, Lattice can authenticate with Anthropic and make requests. If it fails, check that your API key is set correctly and that you have internet connectivity.
Understanding Claude Models
Anthropic offers several Claude models, each optimized for different priorities. Claude Sonnet (claude-sonnet-4-6) provides the best balance for most users—it's fast, capable, and reasonably priced. Use Sonnet as your default unless you have specific needs that require a different tier.
Claude Opus (claude-opus-4-8) represents the most powerful option, delivering the highest quality reasoning and output. Opus excels at extremely complex tasks, nuanced analysis, and scenarios where absolute quality matters more than speed or cost. However, it's significantly more expensive per request and slower to respond than Sonnet.
Claude Haiku (claude-haiku-4-5) optimizes for speed and cost. Haiku responds quickly and costs less per request, making it ideal for simple tasks, high-volume workloads, or scenarios where speed trumps sophistication. Use Haiku for straightforward tasks like classification, simple summarization, or cases where you need rapid responses.
All Claude models support a 200,000-token context window, allowing them to work with extremely long inputs or maintain extensive conversation history. This large context makes Claude particularly suitable for analyzing lengthy documents, maintaining complex multi-turn conversations, or handling tasks that require substantial reference material.
Managing Costs
Cloud APIs charge per request based on token usage—both input tokens (what you send) and output tokens (what the model generates). Costs add up quickly if you're not mindful of usage patterns.
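The arithmetic behind a token-based bill is simple enough to sketch. In the example below the function name estimate_cost is hypothetical, and the prices (3.00 and 15.00 dollars per million input and output tokens) are placeholders chosen for illustration, not any provider's actual rates; substitute the numbers from your provider's pricing page.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the cost in dollars of one request, given prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder prices, NOT real rates: $3.00 per million input, $15.00 per million output.
per_request = estimate_cost(2_000, 500, 3.00, 15.00)
print(f"one request:    ${per_request:.4f}")
print(f"1,000 requests: ${per_request * 1_000:.2f}")
```

Note that in this example the 500 output tokens cost more than the 2,000 input tokens, which is why limiting response length can matter as much as trimming prompts.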
Monitor your spending through the Anthropic console's usage dashboard. Set up billing alerts to notify you when spending approaches your budget limits. Many users find it helpful to set monthly spending caps that prevent runaway costs from unexpected usage spikes.
Different tasks warrant different models based on their cost-performance tradeoffs. Use Haiku for simple, high-volume tasks where sophisticated reasoning isn't required. Reserve Sonnet for general-purpose work requiring solid capabilities. Only escalate to Opus when you need maximum quality and the cost is justified by the task's importance.
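One way to apply this tiering in your own scripts is a small lookup that defaults to the general-purpose model. This is only a sketch: the model identifiers are the ones named in this guide, while the tier labels and the pick_model helper are hypothetical.

```python
# Model identifiers come from this guide; the tier labels are hypothetical.
MODEL_BY_TIER = {
    "simple": "claude-haiku-4-5",       # classification, simple summarization
    "general": "claude-sonnet-4-6",     # everyday work
    "max_quality": "claude-opus-4-8",   # only when the task justifies the cost
}


def pick_model(tier: str) -> str:
    """Return the model for a tier, falling back to the general-purpose default."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["general"])


print(pick_model("simple"))
```

Defaulting unknown tiers to Sonnet rather than Opus keeps a typo in a tier label from silently routing work to the most expensive model.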
Keep prompts concise and focused to minimize token usage. Long, rambling prompts waste tokens without improving output quality. Similarly, when you don't need lengthy responses, specify desired output length to avoid paying for verbose answers you don't need.
OpenAI
Get API Key
- Go to platform.openai.com
- Sign up or log in
- Go to "API Keys"
- Click "Create new secret key"
- Copy the key (starts with sk-)
Configuration
Environment Variable:
export OPENAI_API_KEY="sk-..."
Config File:
# ~/.lovelace/lattice/config.toml
[orchestrator]
default_provider = "openai"
[providers.openai]
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-4"
enabled = true
Restart and test:
lattice-ctl daemon stop
lattice-ctl daemon start
lattice-ctl provider test openai
Available Models
lattice-ctl provider models openai
GPT-4 Turbo
- Model: gpt-4-turbo - Latest GPT-4 with improvements
- 128K context window
GPT-4
- Model: gpt-4 - Original GPT-4
- 8K context window
GPT-3.5 Turbo
- Model: gpt-3.5-turbo - Fast and capable
- 16K context window
- Lower cost
Google Gemini
Get API Key
- Go to ai.google.dev
- Click "Get API Key"
- Follow setup instructions
- Copy your API key
Configuration
Environment Variable:
export GOOGLE_API_KEY="..."
Config File:
# ~/.lovelace/lattice/config.toml
[orchestrator]
default_provider = "google"
[providers.google]
api_key_env = "GOOGLE_API_KEY"
default_model = "gemini-pro"
enabled = true
Restart and test:
lattice-ctl daemon stop
lattice-ctl daemon start
lattice-ctl provider test google
Multiple Providers
Configure multiple providers and switch between them:
# ~/.lovelace/lattice/config.toml
[orchestrator]
default_provider = "anthropic" # Default
[providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-sonnet-4-6"
enabled = true
[providers.openai]
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-4-turbo"
enabled = true
[providers.google]
api_key_env = "GOOGLE_API_KEY"
default_model = "gemini-pro"
enabled = true
Switch providers when starting the daemon:
# Start daemon with specific provider
lattice-ctl daemon start --provider openai --model gpt-4-turbo
# Or change in config and restart
lattice-ctl daemon stop
# Edit config.toml: default_provider = "openai"
lattice-ctl daemon start
API Key Security
Best Practices
✅ DO:
- Use environment variables
- Add to .env or shell config
- Never commit to git
- Rotate keys regularly
❌ DON'T:
- Hardcode in config files
- Commit to version control
- Share keys publicly
- Use same key everywhere
Storage Locations
# Shell config (recommended)
~/.bashrc
~/.zshrc
# Lattice config (reference env var only)
~/.lovelace/lattice/config.toml
# Project-specific (not committed)
.env
Example .bashrc:
# ~/.bashrc or ~/.zshrc
# Lattice API Keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
# Reload after editing:
# source ~/.bashrc
Rate Limits & Quotas
Anthropic
- Free tier: Limited requests/month
- Paid: Based on usage tier
- Check: console.anthropic.com
OpenAI
- Free trial: $5 credit
- Pay-as-you-go: Based on tokens
- Check: platform.openai.com/usage
Google
- Free tier: Generous limits
- Paid: Based on usage
- Check: ai.google.dev
Error Handling
Invalid API key
lattice-ctl provider test anthropic
# Error: Invalid API key
# Fix:
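echo "$ANTHROPIC_API_KEY" # Check it is set correctly (this prints the key, so avoid shared screens)
export ANTHROPIC_API_KEY="sk-ant-..." # Set if missing
lattice-ctl daemon stop # Restart so the daemon inherits the updated environment
lattice-ctl daemon start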
Rate limit exceeded
# Error: Rate limit exceeded
# Solutions:
# 1. Wait and retry
# 2. Upgrade tier (check provider console)
# 3. Switch to different provider temporarily
lattice-ctl daemon start --provider openai
Network timeout
# Error: Request timeout
# Check internet connection
ping anthropic.com
# Increase timeout in ~/.lovelace/lattice/config.toml
[daemon]
ipc_timeout_ms = 30000 # 30 seconds
Cost Management
Monitor Usage
Anthropic:
- Dashboard: console.anthropic.com → Usage
OpenAI:
- Dashboard: platform.openai.com → Usage
Google:
- Dashboard: ai.google.dev → Usage
Set Limits
Configure in provider dashboards to prevent runaway costs:
- Monthly spending limits
- Rate limits
- Email alerts
Optimize Costs
- Use appropriate models:
  - Haiku (claude-haiku-4-5) / GPT-3.5 Turbo for simple tasks
  - Sonnet (claude-sonnet-4-6) / GPT-4 for complex tasks
  - Opus (claude-opus-4-8) only when maximum quality is required
- Local models for development:
  # Use Ollama locally, cloud in production
  [orchestrator]
  default_provider = "ollama" # Development
  # default_provider = "anthropic" # Production
- Monitor token usage:
  # Check request sizes in logs
  lattice-ctl daemon start --foreground
Next Steps
- Local Models: add offline capability
- Config File Reference: complete TOML schema
- Environment Variables: all available env vars
OpenAI
OpenAI pioneered widely accessible large language models and continues to push the frontier with GPT-4 and its variants. OpenAI's models excel at creative tasks, code generation, and general-purpose reasoning. The extensive ecosystem around OpenAI's API and its broad model selection make it a versatile choice for many scenarios.
Obtaining Your API Key
Navigate to platform.openai.com and sign in with your OpenAI account, or create one if you're new to the platform. Once authenticated, go to the "API Keys" section from your account dashboard. You'll see a list of any existing keys and an option to create new ones.
Click "Create new secret key" to generate a fresh API key. OpenAI displays the complete key immediately after creation—this is your only opportunity to copy it. The key starts with sk- followed by a long alphanumeric string. Store this securely, as anyone with access can charge requests to your account.
Set the API key in your shell environment for secure, persistent access:
export OPENAI_API_KEY="sk-..."
Add this export to your shell configuration file (~/.bashrc or ~/.zshrc) to ensure it's available in every new terminal session.
Configuring Lattice for OpenAI
Edit your Lattice configuration to include OpenAI as a provider:
[orchestrator]
default_provider = "openai"
[providers.openai]
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-4-turbo"
enabled = true
This configuration tells Lattice to read the API key from your environment variable, use GPT-4 Turbo as the default model, and enable OpenAI for use. After saving the configuration file, restart the daemon and test connectivity:
lattice-ctl daemon stop
lattice-ctl daemon start
lattice-ctl provider test openai
Successful test output confirms that Lattice can authenticate with OpenAI and make requests.
Navigating OpenAI's Model Lineup
OpenAI offers several model tiers, each with different capabilities and price points. GPT-4 Turbo (gpt-4-turbo) represents the latest iteration of GPT-4 with improved performance and a larger 128K context window. It balances capability with reasonable cost and speed, making it suitable for most production use cases.
The original GPT-4 (gpt-4) model remains available with an 8K context window. While slightly less capable than Turbo, it's well-tested and reliable for tasks that don't require the extended context or latest improvements.
GPT-3.5 Turbo (gpt-3.5-turbo) offers a cost-effective option for less demanding tasks. It's significantly faster and cheaper than GPT-4 variants while still handling many common use cases competently. Use GPT-3.5 Turbo for simple classification, straightforward summarization, or high-volume tasks where cost matters more than maximum quality.
Managing OpenAI Costs
OpenAI's pricing is based on token consumption—both input and output tokens count toward your bill. The platform provides detailed usage tracking through the usage dashboard at platform.openai.com/usage, where you can see daily and monthly spending broken down by model.
Set spending limits in your account settings to prevent unexpected charges. OpenAI allows both hard limits (which stop requests when reached) and soft limits (which trigger notifications). Configure these based on your budget and risk tolerance.
Choose models appropriately for each task. GPT-3.5 Turbo costs a fraction of what GPT-4 does—using it for simple tasks dramatically reduces expenses. Reserve GPT-4 for complex reasoning, creative work, or tasks where quality justifies the additional cost.
Google Gemini
Google's Gemini models bring multimodal capabilities and Google's AI research prowess to the LLM landscape. Gemini excels at tasks requiring integration of text and other modalities, and offers competitive pricing with generous free tiers for development use.
Getting Started with Gemini
Visit ai.google.dev to access Google's AI platform. If you don't have a Google Cloud account, you'll need to create one—the process is straightforward and Google provides free credits for new users. Once you're logged in, navigate to the API keys section and click "Get API Key."
Google generates an API key immediately. Copy this key for use in your Lattice configuration. Unlike the other providers, Google's API keys don't follow a standard prefix pattern, so verify you've copied the complete key.
Set the API key in your environment:
export GOOGLE_API_KEY="your-api-key-here"
Configuring Gemini in Lattice
Add Gemini to your provider configuration:
[orchestrator]
default_provider = "google"
[providers.google]
api_key_env = "GOOGLE_API_KEY"
default_model = "gemini-pro"
enabled = true
Restart the daemon and verify connectivity:
lattice-ctl daemon stop
lattice-ctl daemon start
lattice-ctl provider test google
Understanding Gemini's Offerings
Gemini Pro represents Google's capable general-purpose model, handling a wide range of tasks with good performance. It supports text generation, analysis, and reasoning tasks competently. Gemini's multimodal capabilities—processing text, images, and other inputs together—make it particularly valuable for tasks requiring integration across different data types.
Google's pricing for Gemini is competitive with generous free tiers that make it accessible for development and experimentation. Check ai.google.dev for current pricing and tier details, as Google frequently updates offerings.
Using Multiple Providers Simultaneously
Lattice doesn't limit you to a single provider—configure all the providers you want access to and switch between them as needs dictate. Having multiple providers configured gives you flexibility to choose the best tool for each job, provides redundancy if one provider has outages, and lets you balance costs across different services.
A multi-provider configuration looks like this:
[orchestrator]
default_provider = "anthropic" # Default for general use
[providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-sonnet-4-6"
enabled = true
[providers.openai]
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-4-turbo"
enabled = true
[providers.google]
api_key_env = "GOOGLE_API_KEY"
default_model = "gemini-pro"
enabled = true
With this setup, Anthropic serves as your default provider for most work. When you need OpenAI's specific strengths or want to use Gemini's multimodal capabilities, those providers are available. You can specify which provider to use for specific sessions or agents, overriding the default when appropriate.
Securing Your API Keys
API keys are credentials that authorize spending on your account, so protecting them is crucial. Never commit API keys to version control—they should live only in environment variables or secure secret management systems. Add patterns like .env and files containing keys to your .gitignore to prevent accidental commits.
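As an extra safeguard alongside .gitignore, a quick scan for key-shaped strings can catch a credential before it is committed. The sketch below is an assumption-laden illustration, not a complete secret scanner: the sk- and sk-ant- prefixes come from the key formats described in this guide, while the minimum length of 20 further characters and the helper name find_key_like_lines are guesses made for the example.

```python
import re

# Prefixes are taken from this guide; the length threshold is an assumption
# chosen so that placeholders such as "sk-..." are not reported.
KEY_PATTERN = re.compile(r"\bsk-(?:ant-)?[A-Za-z0-9_-]{20,}")


def find_key_like_lines(text: str) -> list[int]:
    """Return 1-based line numbers that contain something shaped like an API key."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if KEY_PATTERN.search(line)
    ]
```

The function returns line numbers rather than the matched text so that the scan itself never echoes a secret. Keys without a known prefix, such as the Google keys described in this guide, will not be detected by this pattern.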
Rotate keys periodically as a security best practice. Most providers let you create multiple keys—generate a new one, update your configuration to use it, verify everything works, then revoke the old key. This rotation limits exposure if a key is ever compromised.
Store keys in your shell configuration files (~/.bashrc, ~/.zshrc) for local development. For production deployments, use proper secret management systems like environment variable injection, secret managers, or configuration management tools that handle credentials securely.
Never share API keys directly. If you need to collaborate with others, each person should obtain their own keys from the provider. Sharing keys makes it impossible to track who's responsible for usage and complicates revocation if someone leaves the project or a key is compromised.
Understanding Rate Limits
Cloud providers impose rate limits to prevent abuse and ensure fair resource allocation. These limits typically restrict how many requests you can make per minute, per day, or per month. When you exceed rate limits, providers return error responses instead of processing your requests.
Anthropic's rate limits vary by usage tier—new accounts have lower limits than established, high-volume customers. You can see your current limits in the Anthropic console and request increases if needed. OpenAI follows a similar tiered approach, with limits increasing as you demonstrate responsible usage over time.
When building applications that might hit rate limits, implement retry logic with exponential backoff. If a request fails due to rate limiting, wait progressively longer before retrying—this prevents hammering the API and gives the limit window time to reset. Most provider client libraries include retry logic, but verify this when using direct API access.
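A minimal sketch of retry with exponential backoff follows. It is independent of any provider SDK: RateLimitError is a hypothetical stand-in for whatever exception your client raises on a rate-limit response, and the default delays are illustrative values rather than recommendations from any provider.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call request(); on RateLimitError wait 1s, 2s, 4s, ... (capped) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter exists so the function can be exercised without real waiting; in normal use leave it at its default.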
Distributing load across multiple providers helps work around rate limits organically. If you're pushing the limits on Anthropic, some requests can go to OpenAI or Google instead. This natural load balancing keeps all providers within their limits while maintaining total throughput.
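One simple way to realize this is ordered fallback: try the preferred provider first and move to the next when it is unavailable. The sketch below assumes you supply one callable per provider; ProviderUnavailable and send_with_fallback are hypothetical names, and this is not how Lattice itself routes requests.

```python
class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider is rate limited or down."""


def send_with_fallback(providers, prompt):
    """Try each (name, send) pair in order and return (name, response) from the first success."""
    errors = {}
    for name, send in providers:
        try:
            return name, send(prompt)
        except ProviderUnavailable as exc:
            errors[name] = str(exc)  # remember why this provider was skipped
    raise RuntimeError(f"all providers failed: {errors}")
```

Keep in mind that different providers return different styles of output, so fallback suits tasks that tolerate some variation in responses.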
Monitoring Usage and Costs
Every cloud provider offers usage dashboards where you can track request volumes, token consumption, and resulting costs. Check these dashboards regularly to ensure spending aligns with expectations and to catch anomalies early.
Set up billing alerts in each provider's console. Configure alerts at levels that give you early warning—perhaps at 50%, 75%, and 90% of your monthly budget. These notifications prevent surprise bills and give you time to investigate unexpected usage spikes.
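If you also track spending in your own tooling, the same thresholds are easy to check locally. This is a sketch under assumptions: the function name crossed_thresholds is hypothetical, and the spend figure would have to come from your provider's usage dashboard or export.

```python
def crossed_thresholds(spent: float, budget: float, thresholds=(0.50, 0.75, 0.90)) -> list[float]:
    """Return the alert thresholds (as fractions of budget) that spending has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    return [t for t in thresholds if ratio >= t]


# Example: 80 spent of a 100 budget has passed the 50% and 75% alerts.
print(crossed_thresholds(80, 100))
```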
Track which parts of your application or which workflows consume the most tokens. This attribution helps optimize costs by identifying inefficient prompts, unnecessary requests, or opportunities to use cheaper models without sacrificing quality. Many providers offer detailed breakdowns by API key, making it easy to track usage across different projects or environments.
Troubleshooting Cloud Provider Issues
When cloud provider requests fail, several common issues are worth checking first. Invalid or missing API keys cause authentication failures—verify the key is set correctly with echo $ANTHROPIC_API_KEY or the equivalent for your provider. Ensure the daemon restarted after you set the key, as it needs to inherit the environment variable.
Rate limit errors indicate you're exceeding allowed request volumes. Check your usage in the provider's dashboard to confirm you're hitting limits. If legitimate usage exceeds your limits, contact the provider to request an increase. For development spikes that cause temporary limit issues, spread requests over time or route some of them to another configured provider.
Network timeouts suggest connectivity problems between your machine and the provider. Test basic connectivity with curl to the provider's API endpoint. If that works but Lattice still times out, check your IPC timeout settings—increase ipc_timeout_ms in the daemon configuration for slow network conditions.
Model not found errors mean you're requesting a model that doesn't exist or you don't have access to. Verify the model name matches exactly what the provider expects—model names are case-sensitive and version-specific. Check your API tier to ensure you have access to the requested model.
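Because names must match exactly, a small check can distinguish a capitalization slip from a genuinely unknown model. In this sketch the KNOWN_MODELS table lists only the identifiers mentioned in this guide and is not an authoritative catalog; check_model is a hypothetical helper.

```python
# Only the identifiers mentioned in this guide; not an authoritative list.
KNOWN_MODELS = {
    "anthropic": {"claude-sonnet-4-6", "claude-opus-4-8", "claude-haiku-4-5"},
    "openai": {"gpt-4-turbo", "gpt-4", "gpt-3.5-turbo"},
    "google": {"gemini-pro"},
}


def check_model(provider: str, model: str) -> str:
    """Return 'ok', a 'did you mean' hint for a case mismatch, or 'unknown'."""
    known = KNOWN_MODELS.get(provider, set())
    if model in known:
        return "ok"
    for candidate in sorted(known):
        if candidate.lower() == model.lower():
            return f"did you mean {candidate!r}?"
    return "unknown"


print(check_model("anthropic", "Claude-Sonnet-4-6"))
```

A result of "unknown" does not prove the model is invalid, only that it is not in this local table; the provider's own model listing remains the source of truth.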
Next Steps
Cloud provider configuration is one piece of the Lattice puzzle. The local models guide covers setting up Ollama and LM Studio for offline, private AI. The configuration overview explains how all configuration pieces fit together. For testing and troubleshooting providers, see the provider CLI commands guide. Finally, return to the Lattice overview to understand how cloud and local providers complement each other in a complete AI setup.