Common issues, error scenarios, and solutions for the py_rt algorithmic trading system.
- Quick Diagnostics
- Service-Specific Issues
- Network and Connectivity
- API and Authentication
- Performance Issues
- Data and Configuration
- Emergency Scenarios
# 1. Check all service status
./scripts/health_check.sh
# 2. View recent logs for errors
sudo journalctl -u 'trading-*' --priority=err --since "10 minutes ago"
# 3. Check system resources
top -b -n 1 | head -20
df -h
# 4. Verify ZeroMQ connections
netstat -an | grep -E '(5555|5556|5557|5558)'
# 5. Check Prometheus metrics
curl -s http://localhost:9090/metrics | grep -E "(error|failed|disconnect)"

| Error Pattern | Location | Common Cause |
|---|---|---|
| Connection refused | Any service | Service not running or wrong port |
| WebSocket error | Market Data | API key issue or network problem |
| Risk check failed | Risk Manager | Position limit exceeded |
| Order rejected | Execution Engine | Insufficient buying power or invalid order |
| Timeout | Any service | Network latency or overloaded service |
| Parse error | Any service | Configuration file syntax error |
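The table above can also be applied mechanically when scanning long logs. The following is a minimal Python sketch, not part of py_rt: the patterns, locations, and causes are copied from the table, while the matching rule (case-insensitive substring search, first match wins) and the function name `classify` are assumptions made for illustration.

```python
# Patterns, locations, and causes come from the error-pattern table.
# The matching logic (case-insensitive substring, first match wins)
# is an assumption for illustration.
ERROR_PATTERNS = [
    ("connection refused", "Any service", "Service not running or wrong port"),
    ("websocket error", "Market Data", "API key issue or network problem"),
    ("risk check failed", "Risk Manager", "Position limit exceeded"),
    ("order rejected", "Execution Engine", "Insufficient buying power or invalid order"),
    ("timeout", "Any service", "Network latency or overloaded service"),
    ("parse error", "Any service", "Configuration file syntax error"),
]


def classify(log_line):
    """Return (location, cause) for the first matching pattern, else None."""
    lowered = log_line.lower()
    for pattern, location, cause in ERROR_PATTERNS:
        if pattern in lowered:
            return location, cause
    return None


if __name__ == "__main__":
    sample = "ERROR market_data: WebSocket connection failed: Connection refused"
    print(classify(sample))
```

A line that matches nothing returns `None`, which is a hint to read the surrounding log context rather than rely on the table.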
Symptoms:
ERROR market_data: WebSocket connection failed: Connection refused
ERROR market_data: Failed to authenticate with Alpaca
Diagnosis:
# Check service status
sudo systemctl status trading-market-data
# Check API credentials
grep APCA_API_KEY .env
# Test WebSocket connection manually
wscat -c wss://stream.data.alpaca.markets/v2/iex
Solutions:
- Invalid API credentials:
# Verify credentials with Alpaca API
curl -H "APCA-API-KEY-ID: $APCA_API_KEY_ID" \
-H "APCA-API-SECRET-KEY: $APCA_API_SECRET_KEY" \
https://paper-api.alpaca.markets/v2/account
# If 401 Unauthorized, regenerate keys at alpaca.markets
- Network connectivity:
# Test DNS resolution
nslookup stream.data.alpaca.markets
# Test connectivity
telnet stream.data.alpaca.markets 443
# Check firewall
sudo ufw status
- Service binding issue:
# Check if port is already in use
sudo lsof -i :5555
# Kill conflicting process
sudo kill -9 <PID>
# Restart service
sudo systemctl restart trading-market-data
Symptoms:
WARN market_data: Message processing latency high: 250ms
Diagnosis:
# Check current latency
curl -s http://localhost:9090/metrics | grep market_data_latency_seconds
# Check message queue depth
curl -s http://localhost:9090/metrics | grep market_data_queue_depth
Solutions:
- CPU bottleneck:
# Check CPU usage
top -b -n 1 | grep market-data
# Increase process priority
sudo renice -n -5 -p $(pgrep market-data)
- Message queue backup:
# Restart service to clear queue
sudo systemctl restart trading-market-data
# Reduce subscribed symbols if too many
nano config/system.json
# Reduce symbols list to 5-10 most active
- Network issues:
# Check network latency to Alpaca
ping stream.data.alpaca.markets
# Check packet loss
mtr -c 100 stream.data.alpaca.markets
Symptoms:
- No quotes received for specific symbols
- Gaps in price data
Diagnosis:
# Check subscription status
sudo journalctl -u trading-market-data -n 100 | grep -i subscribe
# Check message counts per symbol
curl -s http://localhost:9090/metrics | grep market_data_messages_by_symbol
Solutions:
- Symbol not subscribed:
# Add symbol to config
nano config/system.json
# Add to "symbols": ["AAPL", "MSFT", "NEW_SYMBOL"]
# Restart market data service
sudo systemctl restart trading-market-data
- Symbol suspended or halted:
# Check symbol status via Alpaca API
curl -H "APCA-API-KEY-ID: $APCA_API_KEY_ID" \
-H "APCA-API-SECRET-KEY: $APCA_API_SECRET_KEY" \
https://paper-api.alpaca.markets/v2/assets/AAPL
Symptoms:
ERROR risk_manager: Risk check failed: Position limit exceeded
ERROR risk_manager: Risk check failed: Max daily loss reached
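To make the two failure messages above concrete, here is a minimal Python sketch of the kind of pre-trade check that could produce them. It is not the risk manager's actual implementation (that service is written in Rust). The field names `max_shares`, `max_notional_per_position`, and `max_daily_loss` follow the keys this guide mentions for `config/risk_limits.toml`; the default values and the `check_order` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    # Field names mirror keys mentioned for config/risk_limits.toml;
    # the default values are made up for illustration.
    max_shares: int = 1000
    max_notional_per_position: float = 50_000.0
    max_daily_loss: float = 5_000.0


def check_order(limits, current_shares, order_shares, price, daily_pnl):
    """Return None if the order passes, else a failure reason string."""
    # A realized loss at or beyond the limit blocks all further orders.
    if daily_pnl <= -limits.max_daily_loss:
        return "Max daily loss reached"
    # Evaluate the position as it would be after the order fills.
    new_shares = abs(current_shares + order_shares)
    if new_shares > limits.max_shares:
        return "Position limit exceeded"
    if new_shares * price > limits.max_notional_per_position:
        return "Position limit exceeded"
    return None


if __name__ == "__main__":
    limits = RiskLimits()
    print(check_order(limits, 900, 200, 10.0, 0.0))
```

Reading a failure this way helps decide between the two remedies: reduce the position, or raise the limit that was hit.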
Diagnosis:
# Check current risk metrics
curl -s http://localhost:9090/metrics | grep -E "(risk_|position_)"
# Check circuit breaker status
curl -s http://localhost:9090/metrics | grep circuit_breaker_active
# View risk violations
sudo journalctl -u trading-risk-manager -n 100 | grep "Risk check failed"
Solutions:
- Position limit exceeded:
# Check current positions
python scripts/check_positions.py
# Close positions if needed
python scripts/close_position.py --symbol AAPL
# Or adjust limits
nano config/risk_limits.toml
# Increase max_shares or max_notional_per_position
sudo systemctl restart trading-risk-manager
- Daily loss limit reached:
# Check current P&L
python scripts/check_pnl.py
# If circuit breaker activated incorrectly, deactivate
curl -X POST http://localhost:8080/api/v1/circuit-breaker/deactivate
# Adjust threshold if too conservative
nano config/risk_limits.toml
# Increase max_daily_loss
sudo systemctl restart trading-risk-manager
- Concentration limit:
# Check position concentration
python scripts/check_concentration.py
# Diversify positions or adjust limits
nano config/risk_limits.toml
# Increase max_concentration_percent
Symptoms:
WARN risk_manager: No market data received for 30 seconds
ERROR risk_manager: Cannot calculate position value without market data
Diagnosis:
# Check ZeroMQ subscription
netstat -an | grep 5555
# Verify market data service is publishing
python scripts/monitor_zmq.py --port 5555
# Check risk manager logs
sudo journalctl -u trading-risk-manager -f
Solutions:
- ZeroMQ connection issue:
# Verify ZeroMQ configuration
grep zmq_subscribe_address config/system.json
# Should be: "tcp://127.0.0.1:5555"
# Restart risk manager
sudo systemctl restart trading-risk-manager
- Market data service not running:
# Start market data service first
sudo systemctl start trading-market-data
sleep 5
# Then start risk manager
sudo systemctl start trading-risk-manager
Symptoms:
ERROR execution_engine: Failed to submit order: API rate limit exceeded
ERROR execution_engine: Order submission timeout
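A rate-limit error like the first message above usually means orders are leaving faster than the API allows. One common client-side guard is a token bucket. The Python sketch below illustrates the idea only; it is not the execution engine's code (which is Rust), and the class name, the `clock` parameter, and the choice to size the bucket at one second of tokens are assumptions.

```python
import time


class TokenBucket:
    """Allow at most `rate_per_second` acquisitions per second on average."""

    def __init__(self, rate_per_second, capacity=None, clock=time.monotonic):
        self.rate = float(rate_per_second)
        # By default the bucket holds one second's worth of tokens (assumption).
        self.capacity = float(capacity if capacity is not None else rate_per_second)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_second=100)
    allowed = sum(bucket.try_acquire() for _ in range(250))
    print(f"{allowed} of 250 immediate submissions allowed")
```

Lowering the rate trades throughput for fewer rejected submissions, which is the same trade-off as lowering a configured per-second limit.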
Diagnosis:
# Check order submission rate
curl -s http://localhost:9090/metrics | grep execution_orders_submitted_total
# Check API rate limiting
curl -s http://localhost:9090/metrics | grep execution_rate_limit_errors
# View recent order failures
sudo journalctl -u trading-execution-engine -n 50 | grep -i "failed\|error"
Solutions:
- API rate limit exceeded:
# Reduce order submission rate
nano config/system.json
# Adjust: "rate_limit_per_second": 100 (from 200)
# Enable order batching
nano rust/execution-engine/src/main.rs
# Implement order batching logic
# Restart execution engine
sudo systemctl restart trading-execution-engine
- Network timeout:
# Increase timeout values
nano config/system.json
# Increase: "retry_delay_ms": 2000
# Check network connectivity to Alpaca
ping paper-api.alpaca.markets
# Test API directly
curl -H "APCA-API-KEY-ID: $APCA_API_KEY_ID" \
-H "APCA-API-SECRET-KEY: $APCA_API_SECRET_KEY" \
https://paper-api.alpaca.markets/v2/orders
- Insufficient buying power:
# Check account status
curl -H "APCA-API-KEY-ID: $APCA_API_KEY_ID" \
-H "APCA-API-SECRET-KEY: $APCA_API_SECRET_KEY" \
https://paper-api.alpaca.markets/v2/account | jq '.buying_power'
# Reduce order sizes
nano config/risk_limits.toml
# Decrease: max_order_value
Symptoms:
- Orders filled at prices significantly different from expected
- Metrics show high slippage values
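To judge whether a fill price is "significantly different", it helps to express slippage as a number. The Python sketch below uses one common convention, signed basis points relative to the expected price, with positive values meaning a worse fill. The function name and the sign convention are assumptions; the `execution_slippage` metric exposed by the system may be defined differently.

```python
def slippage_bps(expected_price, fill_price, side):
    """Signed slippage in basis points; positive means a worse fill."""
    if expected_price <= 0:
        raise ValueError("expected_price must be positive")
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    # Paying more hurts a buyer; receiving less hurts a seller.
    direction = 1.0 if side == "buy" else -1.0
    return direction * (fill_price - expected_price) / expected_price * 10_000


if __name__ == "__main__":
    print(round(slippage_bps(100.0, 100.5, "buy"), 2))
    print(round(slippage_bps(100.0, 99.5, "sell"), 2))
```

Normalizing by price makes slippage comparable across symbols that trade at very different price levels.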
Diagnosis:
# Check slippage metrics
curl -s http://localhost:9090/metrics | grep execution_slippage
# Review recent fills
python scripts/analyze_fills.py --since "1 hour ago"
Solutions:
- Using market orders in low liquidity:
# Switch to limit orders
nano config/system.json
# Change default order type to limit
# Add limit price calculation
# Price = mid_price + (spread * max_slippage_tolerance)
- Large order size:
# Implement TWAP or VWAP execution
python scripts/enable_algo_execution.py --algo twap
# Reduce order size
nano config/risk_limits.toml
# Decrease: max_order_size
- Trading during volatile periods:
# Avoid trading during market open/close
nano config/risk_limits.toml
# Set blackout_periods = ["09:30-09:45", "15:45-16:00"]
Symptoms:
ERROR signal_bridge: Failed to load ONNX model: File not found
ERROR signal_bridge: Model inference error: Invalid input shape
Diagnosis:
# Check model file
ls -lh models/
# Verify model path in config
grep model_path config/system.json
# Test model loading
python scripts/test_model_loading.py models/trading_model.onnx
Solutions:
- Model file not found:
# Verify model exists
ls -l models/trading_model.onnx
# Copy model from backup
cp backups/models/trading_model.onnx models/
# Restart signal bridge
sudo systemctl restart trading-signal-bridge
- Model version incompatibility:
# Re-export model with correct ONNX version
python scripts/export_model.py --onnx-version 14
# Update Rust ONNX runtime version
cd rust/signal-bridge
cargo update -p ort
# Rebuild and restart
cargo build --release
sudo systemctl restart trading-signal-bridge
- Input shape mismatch:
# Check model input requirements
python scripts/inspect_onnx_model.py models/trading_model.onnx
# Ensure feature vector matches expected shape
nano config/system.json
# Verify "features" list matches model input
Symptoms:
ERROR common: Failed to bind ZeroMQ socket: Address already in use
Diagnosis:
# Check what's using the port
sudo lsof -i :5555
# Check ZeroMQ socket status
ss -tan | grep -E '(5555|5556|5557|5558)'
Solutions:
- Port already in use:
# Kill process using the port
sudo kill -9 $(sudo lsof -t -i:5555)
# Or change port in configuration
nano config/system.json
# Change: "zmq_publish_address": "tcp://127.0.0.1:5560"
- Permission denied:
# Use ports above 1024 (no root required)
# Or add capability to binary
sudo setcap cap_net_bind_service=+ep rust/target/release/market-data
Symptoms:
- Cannot connect to external APIs
- WebSocket connections timeout
Diagnosis:
# Check firewall status
sudo ufw status
# Check iptables rules
sudo iptables -L -n
# Test connection
telnet paper-api.alpaca.markets 443
Solutions:
# Allow outbound HTTPS connections
sudo ufw allow out 443/tcp
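# Allow Alpaca API endpoints (ufw rules need an IP address, not a hostname; resolve it first)
ALPACA_IP=$(dig +short paper-api.alpaca.markets | tail -n 1)
sudo ufw allow out to "$ALPACA_IP" port 443 proto tcp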
# Reload firewall
sudo ufw reload
Symptoms:
WARN execution_engine: API request took 5000ms (expected <1000ms)
Diagnosis:
# Measure latency to Alpaca
ping -c 100 paper-api.alpaca.markets | tail -1
# Trace route
traceroute paper-api.alpaca.markets
# Check network interface errors
ifconfig | grep -E "(errors|dropped)"
Solutions:
- Network congestion:
# Check bandwidth usage
iftop -i eth0
# Prioritize trading traffic (QoS)
sudo tc qdisc add dev eth0 root handle 1: htb default 12
sudo tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
- Sub-optimal routing:
# Consider using VPN closer to exchange
# Or co-locate server in AWS us-east-1 (closer to Alpaca)
Symptoms:
ERROR execution_engine: 401 Unauthorized
ERROR market_data: Authentication failed: Invalid API key
Diagnosis:
# Test API credentials
curl -v -H "APCA-API-KEY-ID: $APCA_API_KEY_ID" \
-H "APCA-API-SECRET-KEY: $APCA_API_SECRET_KEY" \
https://paper-api.alpaca.markets/v2/account
# Check environment variables
env | grep APCA_
Solutions:
- Invalid or expired keys:
# Regenerate API keys at alpaca.markets dashboard
# Update .env file
nano .env
# APCA_API_KEY_ID=new_key
# APCA_API_SECRET_KEY=new_secret
# Restart all services
./scripts/restart_trading_system.sh
- Keys not loaded:
# Ensure .env is sourced
source .env
# Verify systemd service loads environment
sudo systemctl edit trading-market-data
# Add under [Service]: EnvironmentFile=/opt/RustAlgorithmTrading/.env
# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart trading-market-data
Symptoms:
- Orders submitted to wrong environment
- Cannot find positions in expected account
Diagnosis:
# Check current configuration
grep APCA_API_BASE_URL .env
grep paper_trading config/system.json
# Verify account type
curl -H "APCA-API-KEY-ID: $APCA_API_KEY_ID" \
-H "APCA-API-SECRET-KEY: $APCA_API_SECRET_KEY" \
$APCA_API_BASE_URL/v2/account | jq '.account_number'
Solutions:
# Ensure paper trading is enabled
nano .env
# APCA_API_BASE_URL=https://paper-api.alpaca.markets
nano config/system.json
# "paper_trading": true
# NEVER switch to live trading without proper testing
# and explicit configuration change
Symptoms:
WARN kernel: Out of memory: Killed process <PID> (market-data)
Diagnosis:
# Check memory usage
free -h
ps aux --sort=-%mem | head -10
# Check service-specific memory
sudo systemctl status trading-market-data | grep Memory
# Monitor over time
watch -n 1 'ps aux | grep -E "(market-data|risk-manager|execution-engine)"'
Solutions:
- Memory leak:
# Check for memory leak in logs
sudo journalctl -u trading-market-data | grep -i "memory\|leak"
# Restart service to clear
sudo systemctl restart trading-market-data
# Update to latest version (may contain fixes)
git pull
cd rust
cargo build --release
sudo systemctl restart 'trading-*'
- Insufficient system memory:
# Add swap space
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Or upgrade server memory
- Message queue buildup:
# Reduce message retention
nano rust/market-data/src/main.rs
# Set ZMQ_SNDHWM and ZMQ_RCVHWM to lower values
# Rebuild and restart
cd rust
cargo build --release
sudo systemctl restart trading-market-data
Symptoms:
- Services consuming >90% CPU
- System unresponsive
Diagnosis:
# Identify CPU-intensive process
top -b -n 1 | head -20
# Check per-thread CPU usage
top -H -p $(pgrep market-data)
# Profile with perf
sudo perf record -p $(pgrep market-data) -g -- sleep 10
sudo perf report
Solutions:
- Tight loop or busy-wait:
# Check for recent code changes
git log --oneline -10
# Revert to last known good version
git checkout v1.0.0
cd rust
cargo build --release
- Excessive logging:
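# Reduce log level (change from debug)
# Note: a variable exported in your shell is not inherited by systemd services;
# set it in the environment file the units load
nano .env
# RUST_LOG=info
# Restart services
sudo systemctl restart 'trading-*'
- Too many symbols: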
# Reduce subscribed symbols
nano config/system.json
# Keep only 5-10 most liquid symbols
sudo systemctl restart trading-market-data
Symptoms:
- Orders taking >1 second to submit
- High execution latency in metrics
Diagnosis:
# Check execution latency
curl -s http://localhost:9090/metrics | grep execution_latency_seconds
# Profile execution path
sudo strace -c -p $(pgrep execution-engine)
Solutions:
- Network latency:
# Co-locate closer to exchange
# Use lower-latency network provider
# Enable TCP fast open
sudo sysctl -w net.ipv4.tcp_fastopen=3
- Synchronous API calls:
# Ensure async execution
# Check Rust code uses Tokio async/await properly
cd rust/execution-engine
cargo build --release --features async-execution
- CPU throttling:
# Disable CPU frequency scaling
sudo cpupower frequency-set --governor performance
Symptoms:
ERROR common: Failed to parse config: expected value at line 45 column 5
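The line and column in a message like this point at where the parser gave up, which is often just after the real mistake. As a rough cross-check, the Python sketch below reports the position where Python's own `json` module fails. The function name is hypothetical, and Python's reported position and wording can differ from what the Rust services log for the same file.

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are 1-based positions of the failure.
        return exc.lineno, exc.colno, exc.msg
    return None


if __name__ == "__main__":
    # Missing quotes around a string value, one of the common mistakes.
    broken = '{\n  "symbol": AAPL\n}'
    print(find_json_error(broken))
    print(find_json_error('{"symbol": "AAPL"}'))
```

When the reported position looks correct, inspect the preceding line as well: a trailing comma or a missing closing quote is typically detected only at the next token.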
Diagnosis:
# Validate JSON syntax
jq . config/system.json
# Validate TOML syntax
python -c "import toml; toml.load('config/risk_limits.toml')"
Solutions:
# Fix JSON syntax errors
# Common issues:
# - Trailing commas
# - Missing quotes
# - Unescaped special characters
# Restore from backup if needed
cp config/system.json.bak config/system.json
# Verify after fix
jq . config/system.json > /dev/null && echo "Valid JSON"
Symptoms:
ERROR position_tracker: Failed to deserialize position data
ERROR signal_bridge: Cannot read model file
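Failures like these often trace back to a truncated or corrupted file. The Python sketch below checks a file against a known-good SHA-256 digest, in the same spirit as `sha256sum`. The function names are illustrative, and where the trusted digest comes from (for example, a checksum recorded at backup time) is an assumption.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for large model files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <file> <expected-sha256>
    if len(sys.argv) == 3:
        print("OK" if matches(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

A mismatch indicates the file on disk differs from the trusted copy and should be restored rather than repaired in place.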
Diagnosis:
# Check file integrity
sha256sum data/positions.dat
sha256sum models/trading_model.onnx
# Compare with checksums from backup
Solutions:
# Restore from backup
cp backups/data/positions.dat data/
cp backups/models/trading_model.onnx models/
# Verify checksums match
sha256sum -c checksums.txt
# Restart services
sudo systemctl restart 'trading-*'
Immediate Actions:
# 1. Activate emergency stop
./scripts/emergency_stop.sh
# 2. Check system resources
top
df -h
# 3. Kill hung processes if needed
sudo killall -9 market-data risk-manager execution-engine signal-bridge
# 4. Check for kernel panic
dmesg | tail -50
# 5. Reboot if necessary
sudo reboot
Immediate Actions:
# 1. Activate circuit breaker
curl -X POST http://localhost:8080/api/v1/circuit-breaker/activate
# 2. Cancel all open orders
python scripts/cancel_all_orders.py
# 3. Liquidate positions if needed
python scripts/liquidate_positions.py --confirm
# 4. Review P&L
python scripts/check_pnl.py --detailed
# 5. Investigate cause
sudo journalctl -u 'trading-*' --since "1 hour ago" | grep -i "error\|loss"
# 6. Document incident
./scripts/create_incident_report.sh
Immediate Actions:
# 1. Check WebSocket connection
sudo journalctl -u trading-market-data -n 100 | grep -i "websocket\|connect"
# 2. Verify API status
curl https://status.alpaca.markets/
# 3. Restart market data service
sudo systemctl restart trading-market-data
# 4. If still failing, switch to backup feed
nano config/system.json
# Switch websocket_url to backup
# 5. Halt trading if no data available
curl -X POST http://localhost:8080/api/v1/circuit-breaker/activate
# Run diagnostic collection script
./scripts/collect_diagnostics.sh
# This creates: diagnostics-<timestamp>.tar.gz containing:
# - Service logs
# - Configuration files (with secrets redacted)
# - Metrics snapshots
# - System information
# - Recent errors
- GitHub Issues: https://github.com/SamoraDC/RustAlgorithmTrading/issues
- Documentation: https://github.com/SamoraDC/RustAlgorithmTrading/docs
- Email: davi.samora@example.com
- System information: uname -a
- Version: git describe --tags
- Logs: sudo journalctl -u 'trading-*' --since "1 hour ago"
- Metrics snapshot: curl localhost:9090/metrics
- Configuration (redact secrets): config/system.json
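Redacting secrets by hand is easy to get wrong, so a small script can help before sharing a configuration file. The Python sketch below masks any value whose key name looks sensitive. The marker list and the `redact` function are assumptions, not part of py_rt; check the output manually, because a secret stored under an unexpected key name would not be caught.

```python
import json

# Substrings that mark a key as sensitive. This list is an assumption;
# extend it to match the keys actually present in your configuration.
SENSITIVE_MARKERS = ("key", "secret", "token", "password")
REDACTED = "***REDACTED***"


def redact(value, key=""):
    """Return a copy of a parsed JSON value with sensitive entries masked."""
    # Mask the whole value (even nested objects) under a sensitive key.
    if any(marker in key.lower() for marker in SENSITIVE_MARKERS):
        return REDACTED
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


if __name__ == "__main__":
    config = {"symbols": ["AAPL", "MSFT"], "api_key": "abc123", "paper_trading": True}
    print(json.dumps(redact(config), indent=2))
```

The function builds a new structure instead of modifying its input, so the original configuration stays usable by the running services.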
- Deployment Guide - Initial setup and deployment
- Operations Guide - Day-to-day operations
- Architecture Documentation - System design
- API Documentation - API reference