Configuration Guide
This guide covers all configuration options available in the Semantic Router, from basic setup to advanced customization for production deployments.
Configuration File Structure
The main configuration file is located at config/config.yaml. Here's the complete structure:
# config/config.yaml
router:
  # Server configuration
  host: "0.0.0.0"
  port: 50051
  log_level: "info"  # debug, info, warn, error
  # Model paths and configuration
  models:
    category_classifier: "./models/category_classifier_modernbert-base_model"
    pii_detector: "./models/pii_classifier_modernbert-base_model"
    jailbreak_guard: "./models/jailbreak_classifier_modernbert-base_model"
    intent_classifier: "./models/intent_classifier_modernbert-base_model"
  # Backend model endpoints
  endpoints:
    endpoint1:
      url: "http://192.168.12.90:11434"
      model_type: "math"
      model_name: "llama2-math-7b"
      cost_per_token: 0.002
      max_tokens: 4096
      timeout: 300
      health_check_path: "/health"
    endpoint2:
      url: "http://192.168.12.91:11434"
      model_type: "creative"
      model_name: "llama2-creative-13b"
      cost_per_token: 0.005
      max_tokens: 8192
      timeout: 600
    endpoint3:
      url: "http://192.168.12.92:11434"
      model_type: "code"
      model_name: "codellama-34b"
      cost_per_token: 0.008
      max_tokens: 4096
      timeout: 300
    general_endpoint:
      url: "http://192.168.12.93:11434"
      model_type: "general"
      model_name: "llama2-70b"
      cost_per_token: 0.015
      max_tokens: 4096
      timeout: 300
  # Classification configuration
  classification:
    confidence_threshold: 0.75
    fallback_model: "general"
    enable_ensemble: false
    ensemble_weights: [0.6, 0.4]  # If ensemble enabled
  # Security settings
  security:
    enable_pii_detection: true
    enable_jailbreak_guard: true
    pii_action: "block"  # block, mask, allow
    jailbreak_action: "block"  # block, flag, allow
    pii_confidence_threshold: 0.8
    jailbreak_confidence_threshold: 0.3  # Low threshold for safety
  # Semantic cache configuration
  cache:
    enabled: true
    cache_type: "memory"  # memory, redis, hybrid
    similarity_threshold: 0.85
    ttl_seconds: 3600
    max_entries: 10000
    cleanup_interval: 300
    # Redis configuration (if cache_type: redis or hybrid)
    redis:
      host: "localhost"
      port: 6379
      password: ""
      database: 0
  # Tools configuration
  tools:
    auto_selection: true
    max_tools: 5
    relevance_threshold: 0.6
    tools_database_path: "./config/tools_db.json"
  # Monitoring and metrics
  monitoring:
    enable_metrics: true
    metrics_port: 9090
    enable_tracing: false
    jaeger_endpoint: "http://localhost:14268/api/traces"
  # Performance tuning
  performance:
    max_concurrent_requests: 100
    request_timeout: 30
    classification_timeout: 5
    enable_batching: false
    batch_size: 10
    batch_timeout: 100  # milliseconds
Detailed Configuration Options
Server Configuration
router:
  host: "0.0.0.0"            # Bind address (0.0.0.0 for all interfaces)
  port: 50051                # gRPC server port
  log_level: "info"          # Logging level: debug, info, warn, error
  max_message_size: 4194304  # 4MB max message size
Model Configuration
Model Paths
models:
  category_classifier: "./models/category_classifier_modernbert-base_model"
  pii_detector: "./models/pii_classifier_modernbert-base_model"
  jailbreak_guard: "./models/jailbreak_classifier_modernbert-base_model"
  intent_classifier: "./models/intent_classifier_modernbert-base_model"
  # Optional: Custom model configurations
  custom_models:
    legal_classifier: "./models/legal_classifier_model"
    medical_classifier: "./models/medical_classifier_model"
Endpoint Configuration
Each endpoint represents a backend LLM that can handle requests:
endpoints:
  my_endpoint:
    url: "http://my-model-server:8080"  # Backend URL
    model_type: "specialized_domain"    # Category this model handles
    model_name: "my-custom-model-v1"    # Model identifier
    cost_per_token: 0.001               # Cost in dollars per token
    max_tokens: 2048                    # Maximum tokens for this model
    timeout: 300                        # Request timeout in seconds
    health_check_path: "/health"        # Health check endpoint
    headers:                            # Custom headers
      Authorization: "Bearer token123"
      X-Custom-Header: "value"
    retry_count: 3                      # Number of retries on failure
    circuit_breaker:                    # Circuit breaker configuration
      failure_threshold: 5
      reset_timeout: 60
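To illustrate what the `circuit_breaker` settings could mean in practice, here is a minimal Python sketch. It assumes that the breaker opens after `failure_threshold` consecutive failures and lets a trial request through once `reset_timeout` seconds have passed. The `CircuitBreaker` class and its method names are hypothetical and are not taken from the router's code.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=60, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds (assumed unit)
        self.clock = clock  # injectable so behaviour can be checked without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # After the cooldown, let a trial request through (half-open state).
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

With the values shown above, the fifth consecutive failure would stop traffic to the endpoint for 60 seconds, after which a single success closes the circuit again.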
Classification Settings
Fine-tune how the router makes routing decisions:
classification:
  # Global confidence threshold for routing decisions
  confidence_threshold: 0.75
  # Fallback model when confidence is low
  fallback_model: "general"
  # Per-category confidence thresholds
  category_thresholds:
    mathematics: 0.85  # Require high confidence for math routing
    creative: 0.70     # Allow lower confidence for creative
    code: 0.80         # High confidence for code generation
  # Ensemble classification (multiple models voting)
  enable_ensemble: false
  ensemble_models: ["model1", "model2", "model3"]
  ensemble_weights: [0.5, 0.3, 0.2]
  # Advanced options
  enable_confidence_calibration: true
  calibration_temperature: 1.5
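The interaction between the global threshold, the per-category thresholds, and the fallback model can be sketched as follows. This is an illustration under stated assumptions, not the router's implementation: it assumes a per-category threshold overrides the global one, that a score equal to the threshold passes, and that the classifier returns one confidence per category. The function name `choose_model` is hypothetical.

```python
def choose_model(scores, config):
    """Return the category to route to, or the fallback when confidence is too low.

    scores: mapping of category name -> classifier confidence in [0, 1].
    config: the `classification` section as a plain dict.
    """
    category, confidence = max(scores.items(), key=lambda item: item[1])
    per_category = config.get("category_thresholds", {})
    # Assumption: a per-category threshold takes precedence over the global one.
    threshold = per_category.get(category, config["confidence_threshold"])
    if confidence >= threshold:
        return category
    return config["fallback_model"]


classification = {
    "confidence_threshold": 0.75,
    "fallback_model": "general",
    "category_thresholds": {"mathematics": 0.85, "creative": 0.70, "code": 0.80},
}
```

Under these assumptions, a query classified as `mathematics` with confidence 0.80 falls back to `general`, while a `creative` query at 0.72 is routed to the creative model.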
Security Configuration
Configure PII detection and jailbreak protection:
security:
  # PII Detection
  enable_pii_detection: true
  pii_action: "block"  # block, mask, allow
  pii_confidence_threshold: 0.8
  pii_entity_types: ["PERSON", "EMAIL", "PHONE", "SSN", "LOCATION"]
  # Custom PII patterns (regex)
  custom_pii_patterns:
    credit_card: '\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
    api_key: '\b[A-Za-z0-9]{32,}\b'
  # Jailbreak Protection
  enable_jailbreak_guard: true
  jailbreak_action: "block"  # block, flag, allow
  jailbreak_confidence_threshold: 0.3  # Low threshold for safety
  # Additional security measures
  rate_limiting:
    enabled: true
    requests_per_minute: 60
    burst_size: 10
  ip_whitelist:
    enabled: false
    allowed_ips: ["192.168.1.0/24", "10.0.0.0/8"]
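To see what the two `custom_pii_patterns` above actually match, the following Python sketch applies them with the standard `re` module and replaces each hit with a placeholder, roughly what a `mask` action might do. The `mask_pii` helper and the placeholder format are assumptions made for illustration; the router's own masking behaviour may differ.

```python
import re

# The two regexes from custom_pii_patterns, copied verbatim.
CUSTOM_PII_PATTERNS = {
    "credit_card": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
    "api_key": r"\b[A-Za-z0-9]{32,}\b",
}


def mask_pii(text, patterns=CUSTOM_PII_PATTERNS):
    """Replace every match with a [LABEL] placeholder and report which labels fired."""
    found = []
    for label, pattern in patterns.items():
        text, count = re.subn(pattern, f"[{label.upper()}]", text)
        if count:
            found.append(label)
    return text, found
```

Note that the `api_key` pattern is broad: it matches any run of 32 or more alphanumeric characters, which includes hex digests and many opaque IDs, so expect false positives if you use it unchanged.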
Cache Configuration
Configure semantic caching for performance:
cache:
  enabled: true
  cache_type: "memory"  # memory, redis, hybrid
  # Similarity settings
  similarity_threshold: 0.85      # Cosine similarity threshold
  similarity_algorithm: "cosine"  # cosine, euclidean, dot_product
  # Memory cache settings
  max_entries: 10000
  ttl_seconds: 3600
  cleanup_interval: 300
  # Redis cache settings (if cache_type: redis or hybrid)
  redis:
    host: "localhost"
    port: 6379
    password: "mypassword"
    database: 0
    pool_size: 10
    connection_timeout: 5
  # Cache warming
  enable_cache_warming: false
  warm_up_queries: ["common query 1", "common query 2"]
  # Cache analytics
  enable_cache_metrics: true
  log_cache_performance: true
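The roles of `similarity_threshold`, `ttl_seconds`, and `max_entries` can be sketched with a small in-memory cache. This is a simplified illustration, assuming that queries are compared by the cosine similarity of their embeddings and that the oldest entry is evicted when the cache is full. The `SemanticCache` class is hypothetical, and the embeddings are plain lists of floats supplied by the caller.

```python
import math
import time


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embeddings."""

    def __init__(self, similarity_threshold=0.85, ttl_seconds=3600,
                 max_entries=10000, clock=time.monotonic):
        self.similarity_threshold = similarity_threshold
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self.entries = []  # (embedding, response, stored_at), oldest first

    def put(self, embedding, response):
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)  # assumed policy: evict the oldest entry
        self.entries.append((embedding, response, self.clock()))

    def get(self, embedding):
        now = self.clock()
        best_response, best_score = None, self.similarity_threshold
        for stored, response, stored_at in self.entries:
            if now - stored_at > self.ttl_seconds:
                continue  # expired; a periodic cleanup pass would remove it
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_response, best_score = response, score
        return best_response
```

A higher `similarity_threshold` returns fewer but more precise cache hits; a lower one raises the hit rate at the risk of serving an answer to a question that only looks similar.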
Tools Configuration
Configure automatic tool selection:
tools:
  auto_selection: true
  max_tools: 5
  relevance_threshold: 0.6
  # Tools database
  tools_database_path: "./config/tools_db.json"
  # Tool categories and weights
  tool_categories:
    calculation:
      weight: 1.0
      max_tools: 3
    web_search:
      weight: 0.8
      max_tools: 2
    file_operations:
      weight: 0.9
      max_tools: 2
  # Custom tool scoring
  custom_scoring:
    enable_semantic_scoring: true
    enable_keyword_scoring: true
    enable_category_scoring: true
    weights: [0.4, 0.4, 0.2]  # semantic, keyword, category
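One plausible reading of `custom_scoring.weights`, `relevance_threshold`, and `max_tools` is a weighted sum followed by a cut-off and a cap. The sketch below shows that reading in Python. The function `select_tools` and the shape of its input are hypothetical, and the per-category `weight` and `max_tools` values from `tool_categories` are deliberately left out to keep the example short.

```python
def select_tools(candidates, weights=(0.4, 0.4, 0.2),
                 relevance_threshold=0.6, max_tools=5):
    """Rank tools by a weighted sum of semantic, keyword, and category scores.

    candidates: mapping of tool name -> (semantic, keyword, category) scores in [0, 1].
    """
    scored = []
    for name, parts in candidates.items():
        total = sum(w * s for w, s in zip(weights, parts))
        if total >= relevance_threshold:
            scored.append((total, name))
    scored.sort(reverse=True)  # highest combined score first
    return [name for _, name in scored[:max_tools]]


tools = {
    "calculator": (0.9, 0.8, 1.0),   # combined score about 0.88
    "web_search": (0.5, 0.4, 0.0),   # combined score about 0.36, below the threshold
    "file_reader": (0.7, 0.6, 0.5),  # combined score about 0.62
}
```

With the default weights, only `calculator` and `file_reader` clear the 0.6 threshold in this example.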
Environment-Specific Configurations
Development Configuration
# config/development.yaml
router:
  log_level: "debug"
  classification:
    confidence_threshold: 0.5  # Lower for testing
  security:
    enable_pii_detection: false  # Disable for testing
    enable_jailbreak_guard: false
  cache:
    ttl_seconds: 300  # Shorter cache for development
  monitoring:
    enable_metrics: true
    enable_tracing: true
Production Configuration
# config/production.yaml
router:
  log_level: "warn"
  classification:
    confidence_threshold: 0.8  # Higher for production
    enable_ensemble: true
  security:
    enable_pii_detection: true
    enable_jailbreak_guard: true
    pii_action: "block"
    jailbreak_action: "block"
    rate_limiting:
      enabled: true
      requests_per_minute: 1000
  cache:
    cache_type: "redis"
    ttl_seconds: 7200  # Longer cache
  performance:
    max_concurrent_requests: 1000
    enable_batching: true
  monitoring:
    enable_metrics: true
    enable_tracing: true
Testing Configuration
# config/testing.yaml
router:
  log_level: "debug"
  endpoints:
    mock_endpoint:
      url: "http://localhost:8080/mock"
      model_type: "general"
  classification:
    confidence_threshold: 0.1  # Very low for testing all paths
  security:
    enable_pii_detection: true
    pii_action: "flag"  # Don't block in tests
    enable_jailbreak_guard: true
    jailbreak_action: "flag"
  cache:
    enabled: false  # Disable cache for consistent test results
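The environment files above list only the settings that differ from the main configuration. This guide does not say how they are combined with config/config.yaml; if they are meant as overrides layered on top of the base file (an assumption), the merge could look like the following Python sketch. The `deep_merge` helper is hypothetical and operates on already-parsed dictionaries.

```python
def deep_merge(base, override):
    """Return a new dict in which values from `override` replace or extend `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"router": {"log_level": "info",
                   "cache": {"enabled": True, "ttl_seconds": 3600}}}
development = {"router": {"log_level": "debug",
                          "cache": {"ttl_seconds": 300}}}
effective = deep_merge(base, development)
```

Here `effective` keeps `cache.enabled` from the base file while taking `log_level` and `ttl_seconds` from the development overrides.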
Dynamic Configuration Updates
Hot Reloading
Enable configuration hot reloading for production environments:
router:
  config:
    enable_hot_reload: true
    reload_interval: 60      # Check for changes every 60 seconds
    reload_signal: "SIGHUP"  # Signal to trigger reload
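A polling-based reload, as suggested by `reload_interval`, is commonly built on the file's modification time. The Python sketch below shows only the change-detection step and assumes that an unchanged modification time means an unchanged file. The `ConfigWatcher` class is hypothetical; how the router actually detects changes is not described in this guide.

```python
import os


class ConfigWatcher:
    """Hypothetical watcher that reports when a file's modification time changes."""

    def __init__(self, path, read_mtime=lambda p: os.stat(p).st_mtime_ns):
        self.path = path
        self.read_mtime = read_mtime  # injectable so the logic can be tested without files
        self.last_mtime = read_mtime(path)

    def changed(self):
        """Return True once for each observed change of the modification time."""
        mtime = self.read_mtime(self.path)
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False
```

A reload loop would call `changed()` every `reload_interval` seconds and re-read the file when it returns `True`; a `SIGHUP` handler could trigger the same re-read immediately.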
Configuration Management
Use environment variables for sensitive values:
# Environment variables
export ROUTER_REDIS_PASSWORD="secure_password"
export ROUTER_API_KEY="your_api_key"
export ROUTER_LOG_LEVEL="info"
# In config file
router:
  redis:
    password: "${ROUTER_REDIS_PASSWORD}"
  api:
    key: "${ROUTER_API_KEY}"
  log_level: "${ROUTER_LOG_LEVEL:info}"  # Default to "info"
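The `${NAME}` and `${NAME:default}` placeholders shown above can be expanded with a single regular expression. The following Python sketch illustrates the substitution rule as it appears in the example; the `expand_env` function is hypothetical, and raising an error for a missing variable without a default is an assumption about the desired behaviour.

```python
import os
import re

# Matches ${NAME} and ${NAME:default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand_env(value, env=os.environ):
    """Replace ${NAME} and ${NAME:default} placeholders in a string."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(substitute, value)
```

Failing loudly on an unset variable avoids starting the router with an empty password or API key by accident.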
Configuration Validation
Built-in Validation
The router validates its configuration on startup. To check a file without starting the server, use the validation flags:
# Test configuration
./bin/router -config config/config.yaml -validate-only
# Check specific section
./bin/router -config config/config.yaml -validate-section=endpoints
Configuration Schema
Use JSON Schema validation:
# Install schema validator
npm install -g ajv-cli
# Validate configuration
ajv validate -s config/schema.json -d config/config.yaml
Advanced Configuration Patterns
Multi-Tenant Configuration
# config/multi-tenant.yaml
router:
  tenants:
    tenant_a:
      classification:
        confidence_threshold: 0.8
      endpoints: ["endpoint1", "endpoint2"]
      security:
        enable_pii_detection: true
    tenant_b:
      classification:
        confidence_threshold: 0.6
      endpoints: ["endpoint3", "endpoint4"]
      security:
        enable_pii_detection: false
Load Balancing Configuration
router:
  endpoints:
    math_cluster:
      type: "cluster"
      load_balancing: "round_robin"  # round_robin, weighted, least_connections
      members:
        - url: "http://math1:8080"
          weight: 1
        - url: "http://math2:8080"
          weight: 2
        - url: "http://math3:8080"
          weight: 1
      health_check:
        enabled: true
        interval: 30
        timeout: 5
        healthy_threshold: 2
        unhealthy_threshold: 3
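To show how the member `weight` values could influence request distribution, here is a minimal weighted round-robin sketch in Python. It assumes that a member with weight 2 should receive twice as many requests per cycle as a member with weight 1. The `weighted_round_robin` function is hypothetical and ignores health checks.

```python
import itertools


def weighted_round_robin(members):
    """Return an endless iterator of URLs in which each member appears `weight` times per cycle."""
    schedule = [m["url"] for m in members for _ in range(m.get("weight", 1))]
    return itertools.cycle(schedule)


members = [
    {"url": "http://math1:8080", "weight": 1},
    {"url": "http://math2:8080", "weight": 2},
    {"url": "http://math3:8080", "weight": 1},
]
picker = weighted_round_robin(members)
first_cycle = [next(picker) for _ in range(4)]
```

This simple expansion sends consecutive requests to the same heavy member. Production balancers often interleave the schedule so that load is spread more evenly within a cycle.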
A/B Testing Configuration
router:
  experiments:
    model_comparison:
      enabled: true
      traffic_split: 0.1  # 10% to experimental model
      control_endpoint: "endpoint1"
      experimental_endpoint: "endpoint2"
      metrics_collection: true
  feature_flags:
    enable_new_classifier: false
    enable_advanced_caching: true
    enable_multi_model_routing: false
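A `traffic_split` of 0.1 is often implemented by hashing a stable request or user identifier, so that the same caller always lands in the same group. The Python sketch below shows that approach; it is an assumption about how such a split could work, not a description of the router. The `pick_endpoint` function and the `request_id` argument are hypothetical.

```python
import hashlib


def pick_endpoint(request_id, experiment):
    """Deterministically assign a request to the control or experimental endpoint."""
    if not experiment.get("enabled", False):
        return experiment["control_endpoint"]
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < experiment["traffic_split"]:
        return experiment["experimental_endpoint"]
    return experiment["control_endpoint"]


experiment = {
    "enabled": True,
    "traffic_split": 0.1,
    "control_endpoint": "endpoint1",
    "experimental_endpoint": "endpoint2",
}
```

Hashing instead of drawing a random number keeps assignments reproducible, which makes it easier to compare metrics between the two groups.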
Configuration Best Practices
1. Security Best Practices
# Use strong security settings in production
security:
  enable_pii_detection: true
  pii_action: "block"
  enable_jailbreak_guard: true
  jailbreak_action: "block"
  # Enable rate limiting
  rate_limiting:
    enabled: true
    requests_per_minute: 100
  # Use IP whitelisting if applicable
  ip_whitelist:
    enabled: true
    allowed_ips: ["trusted_network/24"]  # Placeholder: replace with your CIDR range
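The `rate_limiting` settings (`requests_per_minute`, together with the `burst_size` shown in the Security Configuration section) map naturally onto a token bucket. The following Python sketch shows that algorithm under the assumption that `burst_size` is the bucket capacity and `requests_per_minute` is the refill rate. The `TokenBucket` class is hypothetical; the router may use a different limiter.

```python
import time


class TokenBucket:
    """Hypothetical limiter: refills at requests_per_minute, holds at most burst_size tokens."""

    def __init__(self, requests_per_minute=100, burst_size=10, clock=time.monotonic):
        self.rate = requests_per_minute / 60.0  # tokens added per second
        self.capacity = burst_size
        self.tokens = float(burst_size)  # start with a full bucket
        self.clock = clock
        self.updated_at = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill for the time that has passed, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a real deployment you would keep one bucket per client (for example per IP address or API key) rather than a single global bucket.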
2. Performance Best Practices
# Optimize for performance
performance:
  max_concurrent_requests: 500
  enable_batching: true
  batch_size: 20
cache:
  enabled: true
  cache_type: "redis"  # Use Redis for distributed caching
  max_entries: 50000
  ttl_seconds: 3600
classification:
  confidence_threshold: 0.75  # Balance accuracy and speed
3. Monitoring Best Practices
# Comprehensive monitoring
monitoring:
  enable_metrics: true
  enable_tracing: true
  enable_logging: true
  # Detailed metrics collection
  detailed_metrics:
    classification_latency: true
    cache_performance: true
    security_events: true
    endpoint_health: true
Troubleshooting Configuration
Common Configuration Issues
- Invalid YAML syntax
- Missing model files
- Unreachable endpoints
- Port conflicts
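Two of the issues listed above, missing model files and port conflicts, can be caught before startup with a small preflight check on the parsed configuration. The Python sketch below is a hypothetical helper, not part of the router. It assumes that the sections are nested under `router`, and it only detects a clash between the configured `port` and `metrics_port`, not a port that another process already holds.

```python
import os


def preflight(config, path_exists=os.path.exists):
    """Return a list of human-readable problems found in a parsed router config."""
    problems = []
    router = config.get("router", {})
    for name, path in router.get("models", {}).items():
        # Only plain string entries are treated as model paths.
        if isinstance(path, str) and not path_exists(path):
            problems.append(f"model '{name}' not found at {path}")
    ports = [router.get("port"), router.get("monitoring", {}).get("metrics_port")]
    ports = [p for p in ports if p is not None]
    if len(ports) != len(set(ports)):
        problems.append("router port and metrics_port must differ")
    return problems
```

An empty list means neither problem was found; YAML syntax errors and unreachable endpoints need separate checks.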
Configuration Debugging
Enable debug logging for configuration issues:
# Run with verbose configuration logging
./bin/router -config config/config.yaml -log-level debug -config-debug
Next Steps
- API Reference: Detailed API documentation
- Architecture Guide: Understand the system design and monitoring
- Installation Guide: Deployment setup and requirements
For more advanced configuration options, refer to the specific component documentation or join our community discussions.