Building Deterministic Systems with Non-Deterministic AI
The current AI hype cycle has led many organizations to approach every problem with “let’s add AI to it.” While AI capabilities are genuinely impressive, treating AI as the solution rather than a tool often leads to unreliable, unpredictable systems that fail in production. The key insight is this: AI should be a component within a deterministic system, not the system itself.
In this post, I’ll share architectural patterns for building reliable systems that leverage AI’s strengths while compensating for its weaknesses - with practical examples from cybersecurity operations.
The Problem with AI-First Architecture
Non-Deterministic by Nature
Large Language Models are fundamentally probabilistic. The same prompt with identical context can yield different outputs across invocations. This is acceptable for creative tasks but problematic for:
- Security decisions - Inconsistent threat classifications lead to missed attacks or alert fatigue
- Compliance workflows - Auditors expect reproducible, explainable decisions
- Automated responses - Unpredictable actions can cause outages or security incidents
- Integration points - Downstream systems expect consistent data formats
The Hallucination Problem
AI models confidently generate plausible-sounding but incorrect information. In a security context, this could mean:
- Fabricated CVE numbers that don’t exist
- Incorrect remediation steps that introduce new vulnerabilities
- False attribution to threat actors
- Imaginary network indicators that waste analyst time
Context Window Limitations
AI agents lose context over long operations. A 30-minute incident investigation might “forget” critical findings from earlier analysis, leading to incomplete or contradictory conclusions.
The Solution: AI as a Tool, Not the Brain
The fundamental shift is treating AI as one component in a larger deterministic system - similar to how you’d use any other tool with known limitations.
```
┌─────────────────────────────────────────────────────────────┐
│              Deterministic Orchestration Layer              │
│       (State machine, workflow engine, rules engine)        │
└─────────┬─────────────────┬─────────────────┬───────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
    ┌───────────┐     ┌───────────┐     ┌───────────┐
    │ AI Agent  │     │ Rule-Based│     │ External  │
    │  (Tool)   │     │   Logic   │     │   APIs    │
    └───────────┘     └───────────┘     └───────────┘
          │                 │                 │
          └─────────────────┴─────────────────┘
                            │
                            ▼
                 ┌───────────────────┐
                 │ Validation Layer  │
                 │  (Schema, rules)  │
                 └───────────────────┘
```
Core Principles
- Deterministic orchestration - The overall workflow is controlled by predictable logic
- AI for specific subtasks - Use AI only where its strengths apply
- Validation boundaries - Every AI output passes through validation before action
- Human checkpoints - Critical decisions require human approval
- Fallback paths - Graceful degradation when AI fails or produces invalid output
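As a minimal sketch of these principles (the function names and the severity schema here are illustrative, not from any specific library), every AI call can be funneled through one deterministic wrapper that enforces the validation boundary and the fallback path:

```python
from typing import Callable


def run_ai_step(ai_call: Callable[[], dict],
                is_valid: Callable[[dict], bool],
                fallback: dict) -> dict:
    """Deterministic wrapper: call the AI, validate, fall back on any failure."""
    try:
        result = ai_call()
    except Exception:
        return fallback      # Fallback path: AI unavailable or timed out
    if not is_valid(result):
        return fallback      # Validation boundary: reject out-of-schema output
    return result


# Simulated AI tool that hallucinates an out-of-schema severity
def flaky_ai() -> dict:
    return {"severity": "apocalyptic"}


in_schema = lambda r: r.get("severity") in {"low", "medium", "high"}
safe_default = {"severity": "medium"}

print(run_ai_step(flaky_ai, in_schema, safe_default))  # {'severity': 'medium'}
```

Whatever the AI returns - or fails to return - the orchestration layer only ever sees values that satisfy the schema.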
Pattern 1: The Validator Pattern
Never trust AI output directly. Wrap every AI interaction with validation.
Example: Threat Intelligence Enrichment
```python
from dataclasses import dataclass
from enum import Enum


class RiskLevel(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    UNKNOWN = "unknown"


@dataclass
class ThreatAssessment:
    indicator: str
    risk_level: RiskLevel
    category: str
    confidence: float
    sources: list[str]


class ThreatIntelService:
    """Deterministic wrapper around AI-powered threat analysis."""

    VALID_CATEGORIES = {
        "malware", "phishing", "c2", "scanner", "tor_exit",
        "botnet", "spam", "unknown"
    }

    def __init__(self, ai_agent, threat_db):
        self.ai_agent = ai_agent
        self.threat_db = threat_db  # Deterministic source of truth

    def assess_indicator(self, indicator: str) -> ThreatAssessment:
        # Step 1: Check deterministic sources first
        known_threat = self.threat_db.lookup(indicator)
        if known_threat:
            return known_threat  # Deterministic result

        # Step 2: Use AI for unknown indicators
        ai_result = self.ai_agent.analyze(indicator)

        # Step 3: Validate AI output against the schema
        validated = self._validate_ai_result(ai_result, indicator)

        # Step 4: Apply business rules
        return self._apply_rules(validated)

    def _validate_ai_result(self, ai_result: dict, indicator: str) -> ThreatAssessment:
        """Force AI output into a valid, deterministic structure."""
        # Validate risk level - default to UNKNOWN if invalid
        try:
            risk_level = RiskLevel(ai_result.get("risk_level", "").lower())
        except ValueError:
            risk_level = RiskLevel.UNKNOWN

        # Validate category - reject hallucinated categories
        category = ai_result.get("category", "unknown").lower()
        if category not in self.VALID_CATEGORIES:
            category = "unknown"

        # Validate confidence is a proper float
        try:
            confidence = float(ai_result.get("confidence", 0))
            confidence = max(0.0, min(1.0, confidence))  # Clamp to [0, 1]
        except (TypeError, ValueError):
            confidence = 0.0

        # Validate sources exist and are reasonable
        sources = ai_result.get("sources", [])
        if not isinstance(sources, list):
            sources = []
        sources = [s for s in sources if isinstance(s, str) and len(s) < 500]

        return ThreatAssessment(
            indicator=indicator,
            risk_level=risk_level,
            category=category,
            confidence=confidence,
            sources=sources,
        )

    def _apply_rules(self, assessment: ThreatAssessment) -> ThreatAssessment:
        """Apply deterministic business rules on top of the AI assessment."""
        # Rule: Private IPs are always low risk
        if self._is_private_ip(assessment.indicator):
            assessment.risk_level = RiskLevel.LOW
            assessment.category = "internal"
            assessment.confidence = 1.0

        # Rule: Known-good domains override the AI assessment
        if self._is_known_good(assessment.indicator):
            assessment.risk_level = RiskLevel.LOW
            assessment.confidence = 1.0

        # Rule: Low-confidence AI results get downgraded
        if assessment.confidence < 0.5 and assessment.risk_level == RiskLevel.CRITICAL:
            assessment.risk_level = RiskLevel.HIGH

        return assessment
```
Key points:
- AI is consulted only after deterministic sources fail
- Every AI output field is validated and sanitized
- Business rules override AI decisions when appropriate
- Invalid AI output results in safe defaults, not errors
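To see the validator at work end to end, here is a self-contained condensation of the `_validate_ai_result` logic, fed a deliberately hallucinated response (the sample input is invented for illustration):

```python
from enum import Enum


class RiskLevel(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    UNKNOWN = "unknown"


VALID_CATEGORIES = {"malware", "phishing", "c2", "scanner",
                    "tor_exit", "botnet", "spam", "unknown"}


def sanitize(ai_result: dict) -> dict:
    """Coerce an arbitrary AI response into the fixed schema."""
    try:
        risk = RiskLevel(str(ai_result.get("risk_level", "")).lower())
    except ValueError:
        risk = RiskLevel.UNKNOWN               # Hallucinated level -> UNKNOWN
    category = str(ai_result.get("category", "unknown")).lower()
    if category not in VALID_CATEGORIES:
        category = "unknown"                   # Hallucinated category rejected
    try:
        confidence = max(0.0, min(1.0, float(ai_result.get("confidence", 0))))
    except (TypeError, ValueError):
        confidence = 0.0
    return {"risk_level": risk, "category": category, "confidence": confidence}


# A plausible-looking but invalid AI response
result = sanitize({"risk_level": "severe", "category": "quantum_worm", "confidence": 7})
print(result["risk_level"], result["category"], result["confidence"])
# RiskLevel.UNKNOWN unknown 1.0
```

Every field the AI invented is forced back into the schema; nothing downstream ever has to handle "severe" or a confidence of 7.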
Pattern 2: The State Machine Pattern
Use explicit state machines to control workflow progression, with AI handling specific transitions.
Example: Automated Incident Response
```python
from enum import Enum, auto


class IncidentState(Enum):
    DETECTED = auto()
    TRIAGING = auto()
    INVESTIGATING = auto()
    CONTAINING = auto()
    AWAITING_APPROVAL = auto()
    REMEDIATING = auto()
    VALIDATING = auto()
    CLOSED = auto()
    ESCALATED = auto()


class IncidentStateMachine:
    """Deterministic state machine with AI-assisted transitions."""

    # Define valid state transitions
    TRANSITIONS = {
        IncidentState.DETECTED: [IncidentState.TRIAGING, IncidentState.ESCALATED],
        IncidentState.TRIAGING: [IncidentState.INVESTIGATING, IncidentState.CLOSED, IncidentState.ESCALATED],
        IncidentState.INVESTIGATING: [IncidentState.CONTAINING, IncidentState.CLOSED, IncidentState.ESCALATED],
        IncidentState.CONTAINING: [IncidentState.AWAITING_APPROVAL, IncidentState.ESCALATED],
        IncidentState.AWAITING_APPROVAL: [IncidentState.REMEDIATING, IncidentState.ESCALATED],
        IncidentState.REMEDIATING: [IncidentState.VALIDATING, IncidentState.ESCALATED],
        IncidentState.VALIDATING: [IncidentState.CLOSED, IncidentState.INVESTIGATING],
    }

    # States that require human approval
    APPROVAL_REQUIRED = {IncidentState.AWAITING_APPROVAL}

    # Maximum time in any state before auto-escalation
    STATE_TIMEOUTS = {
        IncidentState.TRIAGING: 300,        # 5 minutes
        IncidentState.INVESTIGATING: 1800,  # 30 minutes
        IncidentState.CONTAINING: 600,      # 10 minutes
    }

    def __init__(self, incident_id: str, ai_agent, human_interface):
        self.incident_id = incident_id
        self.state = IncidentState.DETECTED
        self.ai_agent = ai_agent
        self.human_interface = human_interface
        self.history = []
        self.context = {}

    def transition(self, target_state: IncidentState, reason: str) -> bool:
        """Attempt a state transition with validation."""
        # Validate that the transition is allowed
        if target_state not in self.TRANSITIONS.get(self.state, []):
            self._log(f"Invalid transition: {self.state} -> {target_state}")
            return False

        # Check whether approval is required
        if target_state in self.APPROVAL_REQUIRED:
            if not self.human_interface.get_approval(self.incident_id, reason):
                self._log(f"Transition to {target_state} denied by human")
                return False

        # Execute the transition
        self.history.append({
            "from": self.state,
            "to": target_state,
            "reason": reason,
            "timestamp": self._now(),
        })
        self.state = target_state
        return True

    def run_triage(self) -> IncidentState:
        """AI-assisted triage with a deterministic outcome."""
        # Enter TRIAGING first, so the severity-based transitions below are legal
        self.transition(IncidentState.TRIAGING, "Automated triage started")

        # Gather facts deterministically
        facts = self._gather_incident_facts()

        # Ask the AI for a severity assessment
        ai_assessment = self.ai_agent.assess_severity(facts)

        # Validate the AI response
        severity = self._validate_severity(ai_assessment)

        # Deterministic routing based on severity
        if severity == "critical":
            self.transition(IncidentState.ESCALATED, "Critical severity - immediate escalation")
        elif severity == "false_positive":
            self.transition(IncidentState.CLOSED, "AI triage: false positive")
        else:
            self.transition(IncidentState.INVESTIGATING, f"AI triage: {severity} severity")

        return self.state

    def _validate_severity(self, ai_assessment: dict) -> str:
        """Force AI output into a valid severity category."""
        valid_severities = {"critical", "high", "medium", "low", "false_positive"}
        severity = ai_assessment.get("severity", "").lower()
        if severity not in valid_severities:
            # Default to medium if the AI gives an invalid response
            return "medium"
        return severity
```
Key points:
- State transitions are explicitly defined and enforced
- AI cannot bypass the state machine
- Human approval gates exist for critical transitions
- Timeouts prevent workflows from stalling
- AI failures result in safe defaults, not system failures
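A stripped-down sketch of the same idea (a subset of the states above, with illustrative names) shows why the AI cannot bypass the machine - an illegal jump is simply refused:

```python
from enum import Enum, auto


class State(Enum):
    DETECTED = auto()
    TRIAGING = auto()
    INVESTIGATING = auto()
    CLOSED = auto()
    ESCALATED = auto()


TRANSITIONS = {
    State.DETECTED: {State.TRIAGING, State.ESCALATED},
    State.TRIAGING: {State.INVESTIGATING, State.CLOSED, State.ESCALATED},
    State.INVESTIGATING: {State.CLOSED, State.ESCALATED},
}


class Incident:
    def __init__(self):
        self.state = State.DETECTED
        self.history = []

    def transition(self, target: State, reason: str) -> bool:
        if target not in TRANSITIONS.get(self.state, set()):
            return False                  # Illegal jump rejected, state unchanged
        self.history.append((self.state, target, reason))
        self.state = target
        return True


incident = Incident()
# An AI suggesting "just close it" straight from DETECTED is refused:
print(incident.transition(State.CLOSED, "AI says false positive"))   # False
print(incident.transition(State.TRIAGING, "begin triage"))           # True
print(incident.transition(State.CLOSED, "triage: false positive"))   # True
```

However confidently the AI argues for a shortcut, the transition table is the final authority, and the history list preserves an audit trail of every step actually taken.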
Pattern 3: The Consensus Pattern
For high-stakes decisions, require agreement between multiple sources before acting.
Example: Automated Blocking Decision
```python
from dataclasses import dataclass


@dataclass
class BlockingDecision:
    should_block: bool
    confidence: float
    sources_agree: int
    sources_total: int
    reasoning: list[str]


class ConsensusBlockingService:
    """Require multi-source agreement before automated blocking."""

    CONFIDENCE_THRESHOLD = 0.8
    MINIMUM_SOURCES = 2

    def __init__(self, ai_agent, rule_engine, threat_feeds: list):
        self.ai_agent = ai_agent
        self.rule_engine = rule_engine
        self.threat_feeds = threat_feeds

    def evaluate_for_blocking(self, indicator: str) -> BlockingDecision:
        votes = []
        reasoning = []

        # Source 1: Deterministic rule engine
        rule_result = self.rule_engine.evaluate(indicator)
        if rule_result.matched:
            votes.append(("rules", True, 1.0))
            reasoning.append(f"Rule match: {rule_result.rule_name}")
        else:
            votes.append(("rules", False, 1.0))

        # Source 2: Threat intelligence feeds (deterministic)
        for feed in self.threat_feeds:
            feed_result = feed.lookup(indicator)
            if feed_result:
                votes.append((feed.name, True, feed_result.confidence))
                reasoning.append(f"{feed.name}: {feed_result.category}")
            else:
                votes.append((feed.name, False, 0.5))

        # Source 3: AI assessment (non-deterministic, weighted lower)
        ai_result = self.ai_agent.assess_threat(indicator)
        ai_vote = self._parse_ai_vote(ai_result)
        votes.append(("ai_agent", ai_vote.should_block, ai_vote.confidence * 0.7))
        if ai_vote.should_block:
            reasoning.append(f"AI assessment: {ai_vote.reason}")

        # Calculate consensus
        return self._calculate_consensus(votes, reasoning)

    def _calculate_consensus(self, votes: list, reasoning: list) -> BlockingDecision:
        block_votes = [v for v in votes if v[1]]  # Votes to block

        if len(block_votes) < self.MINIMUM_SOURCES:
            return BlockingDecision(
                should_block=False,
                confidence=0.0,
                sources_agree=len(block_votes),
                sources_total=len(votes),
                reasoning=["Insufficient consensus for blocking"],
            )

        # Weighted confidence
        total_confidence = sum(v[2] for v in block_votes) / len(votes)

        return BlockingDecision(
            should_block=total_confidence >= self.CONFIDENCE_THRESHOLD,
            confidence=total_confidence,
            sources_agree=len(block_votes),
            sources_total=len(votes),
            reasoning=reasoning,
        )
```
Key points:
- AI is one vote among many, not the sole decision maker
- AI confidence is weighted lower than deterministic sources
- Minimum agreement threshold prevents single-source decisions
- Full reasoning trail for audit and review
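The consensus arithmetic itself is small enough to show standalone. This sketch mirrors `_calculate_consensus` with the same defaults (two-source minimum, 0.8 confidence threshold); the vote values are invented for illustration:

```python
def consensus(votes: list[tuple[str, bool, float]],
              min_sources: int = 2,
              threshold: float = 0.8) -> tuple[bool, float]:
    """votes: (source_name, should_block, weighted_confidence)."""
    block_votes = [v for v in votes if v[1]]
    if len(block_votes) < min_sources:
        return False, 0.0                 # Never block on a single source
    confidence = sum(w for _, _, w in block_votes) / len(votes)
    return confidence >= threshold, confidence


# Rule engine and a threat feed agree; the AI's vote is down-weighted by 0.7
votes = [("rules", True, 1.0), ("feed_a", True, 0.9), ("ai_agent", True, 0.9 * 0.7)]
print(consensus(votes))                   # (True, 0.843...)

# The AI alone, however confident, cannot trigger a block
print(consensus([("rules", False, 1.0), ("ai_agent", True, 0.7)]))  # (False, 0.0)
```

The 0.7 multiplier on the AI vote means an AI-only "block" opinion can tip a close call but can never carry the decision by itself.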
Pattern 4: The Sandbox Pattern
Let AI operate freely within constrained boundaries, then validate before committing.
Example: AI-Generated Detection Rules
```python
class DetectionRuleSandbox:
    """Allow AI to generate detection rules; validate before deployment."""

    # Constraints on what AI-generated rules can do
    MAX_RULE_COMPLEXITY = 10  # Maximum number of conditions
    ALLOWED_FIELDS = {"src_ip", "dst_ip", "port", "protocol", "user", "process"}
    FORBIDDEN_ACTIONS = {"delete", "modify", "execute"}

    def __init__(self, ai_agent, rule_validator, test_environment):
        self.ai_agent = ai_agent
        self.rule_validator = rule_validator
        self.test_env = test_environment

    def generate_and_validate_rule(self, threat_description: str) -> dict:
        # Step 1: AI generates a candidate rule
        candidate = self.ai_agent.generate_detection_rule(threat_description)

        # Step 2: Syntax validation
        if not self.rule_validator.is_valid_syntax(candidate):
            return {"status": "rejected", "reason": "Invalid syntax"}

        # Step 3: Constraint validation
        constraint_check = self._check_constraints(candidate)
        if not constraint_check["passed"]:
            return {"status": "rejected", "reason": constraint_check["reason"]}

        # Step 4: Test against known samples
        test_result = self._test_in_sandbox(candidate)
        if test_result["false_positive_rate"] > 0.01:
            return {"status": "rejected", "reason": "Excessive false positives"}
        if test_result["detection_rate"] < 0.8:
            return {"status": "rejected", "reason": "Insufficient detection rate"}

        # Step 5: Queue for human review (never auto-deploy)
        return {
            "status": "pending_review",
            "rule": candidate,
            "test_results": test_result,
            "requires_approval": True,
        }

    def _check_constraints(self, rule: dict) -> dict:
        """Enforce deterministic constraints on AI-generated rules."""
        # Check complexity
        conditions = rule.get("conditions", [])
        if len(conditions) > self.MAX_RULE_COMPLEXITY:
            return {"passed": False, "reason": "Rule too complex"}

        # Check that only allowed fields are used
        used_fields = self._extract_fields(rule)
        invalid_fields = used_fields - self.ALLOWED_FIELDS
        if invalid_fields:
            return {"passed": False, "reason": f"Invalid fields: {invalid_fields}"}

        # Check for forbidden actions
        actions = rule.get("actions", [])
        for action in actions:
            if action.get("type") in self.FORBIDDEN_ACTIONS:
                return {"passed": False, "reason": f"Forbidden action: {action['type']}"}

        return {"passed": True}

    def _test_in_sandbox(self, rule: dict) -> dict:
        """Test the rule against known malicious and benign samples."""
        # Run against known malicious samples
        malicious_results = self.test_env.run_against_malicious(rule)

        # Run against known benign samples
        benign_results = self.test_env.run_against_benign(rule)

        return {
            "detection_rate": malicious_results["detected"] / malicious_results["total"],
            "false_positive_rate": benign_results["flagged"] / benign_results["total"],
            "samples_tested": malicious_results["total"] + benign_results["total"],
        }
```
Key points:
- AI operates within strict constraints
- Multiple validation layers before any real-world impact
- Testing against known samples provides objective metrics
- Human approval required before production deployment
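The constraint gate can be illustrated standalone. This sketch mirrors `_check_constraints` with the same limits; the candidate rule is an invented example of the kind of output an AI might produce:

```python
MAX_RULE_COMPLEXITY = 10
ALLOWED_FIELDS = {"src_ip", "dst_ip", "port", "protocol", "user", "process"}
FORBIDDEN_ACTIONS = {"delete", "modify", "execute"}


def check_constraints(rule: dict) -> tuple[bool, str]:
    """Deterministic gate an AI-generated rule must pass before sandbox testing."""
    conditions = rule.get("conditions", [])
    if len(conditions) > MAX_RULE_COMPLEXITY:
        return False, "rule too complex"
    invalid = {c.get("field") for c in conditions} - ALLOWED_FIELDS
    if invalid:
        return False, f"invalid fields: {sorted(invalid)}"
    for action in rule.get("actions", []):
        if action.get("type") in FORBIDDEN_ACTIONS:
            return False, f"forbidden action: {action['type']}"
    return True, "ok"


# An AI-generated rule that tries to execute a script on match is rejected
candidate = {
    "conditions": [{"field": "src_ip", "op": "eq", "value": "203.0.113.7"}],
    "actions": [{"type": "execute", "script": "respond.sh"}],
}
print(check_constraints(candidate))   # (False, 'forbidden action: execute')
```

However clever the generated rule, it cannot reach the sandbox tests - let alone production - with an action outside the allowlist.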
Decision Framework: When to Use AI
Not every task benefits from AI. Use this framework:
| Task Characteristic | AI Suitability | Example |
|---|---|---|
| Requires creativity/synthesis | High | Summarizing incident findings |
| Pattern recognition in unstructured data | High | Analyzing malware behavior |
| Strict correctness required | Low | Firewall rule generation |
| Explainability required for compliance | Low | Access control decisions |
| High volume, low stakes | Medium | Log triage |
| Low volume, high stakes | Low | Production deployments |
| Well-defined, deterministic logic | Avoid AI | IP allowlist management |
The “Intern Test”
A useful heuristic: Would you let a smart but inexperienced intern do this task unsupervised?
- Yes, with review -> AI can do it with validation
- Yes, independently -> You probably don’t need AI
- No, too risky -> Don’t let AI do it autonomously
Implementation Checklist
Before deploying AI in your system, verify:
- AI outputs are validated against a strict schema
- Invalid AI responses result in safe defaults, not errors
- Deterministic sources are consulted before AI
- Business rules can override AI decisions
- Human approval gates exist for high-impact actions
- Full audit trail of AI decisions and reasoning
- Timeouts and circuit breakers for AI calls
- Fallback paths when AI is unavailable
- Testing with adversarial/edge-case inputs
- Monitoring for AI accuracy drift over time
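For the timeout and circuit-breaker item, one possible shape (a sketch, not any specific library's API) is a breaker that counts consecutive AI failures and short-circuits to the fallback while open:

```python
import time


class AICircuitBreaker:
    """After repeated AI failures, short-circuit to the fallback for a while."""

    def __init__(self, max_failures: int = 3, reset_after: float = 60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # Set when the circuit opens

    def call(self, ai_fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback               # Circuit open: skip the AI entirely
            self.opened_at = None             # Half-open: give the AI one retry
            self.failures = 0
        try:
            result = ai_fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback                   # Safe default, never an error
        self.failures = 0
        return result


def unavailable_ai():
    raise TimeoutError("model endpoint timed out")


breaker = AICircuitBreaker(max_failures=2)
print(breaker.call(unavailable_ai, "safe_default"))       # failure 1 -> fallback
print(breaker.call(unavailable_ai, "safe_default"))       # failure 2 opens circuit
print(breaker.call(lambda: "ai_answer", "safe_default"))  # circuit open -> fallback
```

While the circuit is open, the system keeps producing its deterministic safe defaults instead of hammering a failing model endpoint.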
Conclusion
AI is a powerful tool, but it’s still a tool. The organizations getting the most value from AI are those treating it as a component within well-architected systems, not as a magical solution that replaces engineering discipline.
The patterns in this post - Validator, State Machine, Consensus, and Sandbox - provide practical approaches to harnessing AI’s strengths while maintaining the reliability and predictability that production systems demand.
In cybersecurity, where the cost of errors can be severe, this disciplined approach isn’t optional - it’s essential. Use AI to augment human decision-making and automate well-defined subtasks, but keep deterministic systems in control of the overall workflow.
The future isn’t AI replacing reliable systems. It’s AI making reliable systems smarter.