Building Deterministic Systems with Non-Deterministic AI


The current AI hype cycle has led many organizations to approach every problem with “let’s add AI to it.” While AI capabilities are genuinely impressive, treating AI as the solution rather than a tool often leads to unreliable, unpredictable systems that fail in production. The key insight is this: AI should be a component within a deterministic system, not the system itself.

In this post, I’ll share architectural patterns for building reliable systems that leverage AI’s strengths while compensating for its weaknesses - with practical examples from cybersecurity operations.

The Problem with AI-First Architecture

Non-Deterministic by Nature

Large Language Models are fundamentally probabilistic. The same prompt with identical context can yield different outputs across invocations. This is acceptable for creative tasks but problematic for:

  • Security decisions - Inconsistent threat classifications lead to missed attacks or alert fatigue
  • Compliance workflows - Auditors expect reproducible, explainable decisions
  • Automated responses - Unpredictable actions can cause outages or security incidents
  • Integration points - Downstream systems expect consistent data formats

The Hallucination Problem

AI models confidently generate plausible-sounding but incorrect information. In a security context, this could mean:

  • Fabricated CVE numbers that don’t exist
  • Incorrect remediation steps that introduce new vulnerabilities
  • False attribution to threat actors
  • Imaginary network indicators that waste analyst time
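A cheap first defense against fabricated identifiers is format-checking followed by an existence check against a trusted source. A minimal sketch (the `known_cves` set stands in for a real CVE database lookup, which is an assumption here):

```python
import re

# Official CVE ID format: CVE-YYYY-NNNN, with a 4+ digit sequence number
CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")

def validate_cve_ids(ai_output: list[str], known_cves: set[str]) -> list[str]:
    """Keep only IDs that are well-formed AND exist in a trusted database."""
    well_formed = [c for c in ai_output if CVE_PATTERN.match(c)]
    return [c for c in well_formed if c in known_cves]
```

A well-formed but non-existent ID (a classic hallucination) is rejected just as firmly as an obviously malformed one.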

Context Window Limitations

AI agents lose context over long operations. A 30-minute incident investigation might “forget” critical findings from earlier analysis, leading to incomplete or contradictory conclusions.
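One mitigation is to keep validated findings in durable storage outside the model's context, and re-inject only a compact summary on each call. A minimal sketch (names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class InvestigationLog:
    """Durable store for validated findings; survives any AI context loss."""
    findings: list[str] = field(default_factory=list)

    def record(self, finding: str) -> None:
        if finding not in self.findings:  # deduplicate repeated observations
            self.findings.append(finding)

    def context_summary(self, max_items: int = 20) -> str:
        """Compact, re-injectable summary of everything confirmed so far."""
        return "\n".join(f"- {f}" for f in self.findings[-max_items:])
```

The orchestrator, not the model, owns the investigation record; the AI only ever sees a fresh summary of it.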

The Solution: AI as a Tool, Not the Brain

The fundamental shift is treating AI as one component in a larger deterministic system - similar to how you’d use any other tool with known limitations.

┌─────────────────────────────────────────────────────────────┐
│              Deterministic Orchestration Layer              │
│         (State machine, workflow engine, rules engine)      │
└─────────┬─────────────────┬─────────────────┬───────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
    ┌───────────┐     ┌───────────┐     ┌───────────┐
    │  AI Agent │     │ Rule-Based│     │  External │
    │  (Tool)   │     │   Logic   │     │   APIs    │
    └───────────┘     └───────────┘     └───────────┘
          │                 │                 │
          └─────────────────┴─────────────────┘
                            │
                            ▼
                  ┌───────────────────┐
                  │  Validation Layer │
                  │  (Schema, rules)  │
                  └───────────────────┘

Core Principles

  1. Deterministic orchestration - The overall workflow is controlled by predictable logic
  2. AI for specific subtasks - Use AI only where its strengths apply
  3. Validation boundaries - Every AI output passes through validation before action
  4. Human checkpoints - Critical decisions require human approval
  5. Fallback paths - Graceful degradation when AI fails or produces invalid output

Pattern 1: The Validator Pattern

Never trust AI output directly. Wrap every AI interaction with validation.

Example: Threat Intelligence Enrichment

from dataclasses import dataclass
from enum import Enum

class RiskLevel(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    UNKNOWN = "unknown"

@dataclass
class ThreatAssessment:
    indicator: str
    risk_level: RiskLevel
    category: str
    confidence: float
    sources: list[str]

class ThreatIntelService:
    """Deterministic wrapper around AI-powered threat analysis."""
    
    VALID_CATEGORIES = {
        "malware", "phishing", "c2", "scanner", "tor_exit", 
        "botnet", "spam", "unknown"
    }
    
    def __init__(self, ai_agent, threat_db):
        self.ai_agent = ai_agent
        self.threat_db = threat_db  # Deterministic source of truth
    
    def assess_indicator(self, indicator: str) -> ThreatAssessment:
        # Step 1: Check deterministic sources first
        known_threat = self.threat_db.lookup(indicator)
        if known_threat:
            return known_threat  # Deterministic result
        
        # Step 2: Use AI for unknown indicators
        ai_result = self.ai_agent.analyze(indicator)
        
        # Step 3: Validate AI output against schema
        validated = self._validate_ai_result(ai_result, indicator)
        
        # Step 4: Apply business rules
        final_result = self._apply_rules(validated)
        
        return final_result
    
    def _validate_ai_result(self, ai_result: dict, indicator: str) -> ThreatAssessment:
        """Force AI output into valid, deterministic structure."""
        
        # Validate risk level - default to UNKNOWN if invalid
        # (str() guards against non-string values, which would otherwise
        # raise AttributeError on .lower() instead of being caught)
        try:
            risk_level = RiskLevel(str(ai_result.get("risk_level", "")).lower())
        except ValueError:
            risk_level = RiskLevel.UNKNOWN
        
        # Validate category - reject hallucinated categories
        category = str(ai_result.get("category", "unknown")).lower()
        if category not in self.VALID_CATEGORIES:
            category = "unknown"
        
        # Validate confidence is a proper float
        try:
            confidence = float(ai_result.get("confidence", 0))
            confidence = max(0.0, min(1.0, confidence))  # Clamp to [0, 1]
        except (TypeError, ValueError):
            confidence = 0.0
        
        # Validate sources exist and are reasonable
        sources = ai_result.get("sources", [])
        if not isinstance(sources, list):
            sources = []
        sources = [s for s in sources if isinstance(s, str) and len(s) < 500]
        
        return ThreatAssessment(
            indicator=indicator,
            risk_level=risk_level,
            category=category,
            confidence=confidence,
            sources=sources
        )
    
    def _apply_rules(self, assessment: ThreatAssessment) -> ThreatAssessment:
        """Apply deterministic business rules on top of AI assessment."""
        
        # Rule: Private IPs are always low risk
        if self._is_private_ip(assessment.indicator):
            assessment.risk_level = RiskLevel.LOW
            assessment.category = "internal"
            assessment.confidence = 1.0
        
        # Rule: Known good domains override AI assessment
        if self._is_known_good(assessment.indicator):
            assessment.risk_level = RiskLevel.LOW
            assessment.confidence = 1.0
        
        # Rule: Low confidence AI results get downgraded
        if assessment.confidence < 0.5 and assessment.risk_level == RiskLevel.CRITICAL:
            assessment.risk_level = RiskLevel.HIGH
        
        return assessment

Key points:

  • AI is consulted only after deterministic sources fail
  • Every AI output field is validated and sanitized
  • Business rules override AI decisions when appropriate
  • Invalid AI output results in safe defaults, not errors
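The `_is_private_ip` and `_is_known_good` helpers referenced above are deterministic by design. A possible sketch using the standard library (the allowlist contents are illustrative):

```python
import ipaddress

KNOWN_GOOD_DOMAINS = {"example.com", "internal.corp"}  # illustrative allowlist

def is_private_ip(indicator: str) -> bool:
    """True only for valid IP addresses in private (RFC 1918 etc.) ranges."""
    try:
        return ipaddress.ip_address(indicator).is_private
    except ValueError:
        return False  # not an IP address at all -> not a private IP

def is_known_good(indicator: str) -> bool:
    """Exact-match lookup against a curated allowlist; never fuzzy matching."""
    return indicator.lower() in KNOWN_GOOD_DOMAINS
```

Note that both checks fail closed: anything unparseable or unlisted is simply not trusted.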

Pattern 2: The State Machine Pattern

Use explicit state machines to control workflow progression, with AI handling specific transitions.

Example: Automated Incident Response

from enum import Enum, auto

class IncidentState(Enum):
    DETECTED = auto()
    TRIAGING = auto()
    INVESTIGATING = auto()
    CONTAINING = auto()
    AWAITING_APPROVAL = auto()
    REMEDIATING = auto()
    VALIDATING = auto()
    CLOSED = auto()
    ESCALATED = auto()

class IncidentStateMachine:
    """Deterministic state machine with AI-assisted transitions."""
    
    # Define valid state transitions
    TRANSITIONS = {
        IncidentState.DETECTED: [IncidentState.TRIAGING, IncidentState.ESCALATED],
        IncidentState.TRIAGING: [IncidentState.INVESTIGATING, IncidentState.CLOSED, IncidentState.ESCALATED],
        IncidentState.INVESTIGATING: [IncidentState.CONTAINING, IncidentState.CLOSED, IncidentState.ESCALATED],
        IncidentState.CONTAINING: [IncidentState.AWAITING_APPROVAL, IncidentState.ESCALATED],
        IncidentState.AWAITING_APPROVAL: [IncidentState.REMEDIATING, IncidentState.ESCALATED],
        IncidentState.REMEDIATING: [IncidentState.VALIDATING, IncidentState.ESCALATED],
        IncidentState.VALIDATING: [IncidentState.CLOSED, IncidentState.INVESTIGATING],
    }
    
    # States that require human approval
    APPROVAL_REQUIRED = {IncidentState.AWAITING_APPROVAL}
    
    # Maximum time in any state before auto-escalation
    STATE_TIMEOUTS = {
        IncidentState.TRIAGING: 300,      # 5 minutes
        IncidentState.INVESTIGATING: 1800, # 30 minutes
        IncidentState.CONTAINING: 600,     # 10 minutes
    }
    
    def __init__(self, incident_id: str, ai_agent, human_interface):
        self.incident_id = incident_id
        self.state = IncidentState.DETECTED
        self.ai_agent = ai_agent
        self.human_interface = human_interface
        self.history = []
        self.context = {}
    
    def transition(self, target_state: IncidentState, reason: str) -> bool:
        """Attempt state transition with validation."""
        
        # Validate transition is allowed
        if target_state not in self.TRANSITIONS.get(self.state, []):
            self._log(f"Invalid transition: {self.state} -> {target_state}")
            return False
        
        # Check if approval required
        if target_state in self.APPROVAL_REQUIRED:
            if not self.human_interface.get_approval(self.incident_id, reason):
                self._log(f"Transition to {target_state} denied by human")
                return False
        
        # Execute transition
        self.history.append({
            "from": self.state,
            "to": target_state,
            "reason": reason,
            "timestamp": self._now()
        })
        self.state = target_state
        return True
    
    def run_triage(self) -> IncidentState:
        """AI-assisted triage with deterministic outcome."""
        
        # Gather facts deterministically
        facts = self._gather_incident_facts()
        
        # Ask AI for severity assessment
        ai_assessment = self.ai_agent.assess_severity(facts)
        
        # Validate AI response
        severity = self._validate_severity(ai_assessment)
        
        # Deterministic routing based on severity
        if severity == "critical":
            self.transition(IncidentState.ESCALATED, "Critical severity - immediate escalation")
        elif severity == "false_positive":
            self.transition(IncidentState.CLOSED, "AI triage: false positive")
        else:
            self.transition(IncidentState.INVESTIGATING, f"AI triage: {severity} severity")
        
        return self.state
    
    def _validate_severity(self, ai_assessment: dict) -> str:
        """Force AI output into valid severity category."""
        valid_severities = {"critical", "high", "medium", "low", "false_positive"}
        severity = str(ai_assessment.get("severity", "")).lower()
        
        if severity not in valid_severities:
            # Default to medium if AI gives invalid response
            return "medium"
        
        return severity

Key points:

  • State transitions are explicitly defined and enforced
  • AI cannot bypass the state machine
  • Human approval gates exist for critical transitions
  • Timeouts prevent workflows from stalling
  • AI failures result in safe defaults, not system failures

Pattern 3: The Consensus Pattern

For high-stakes decisions, require agreement between multiple sources before acting.

Example: Automated Blocking Decision

from dataclasses import dataclass

@dataclass
class BlockingDecision:
    should_block: bool
    confidence: float
    sources_agree: int
    sources_total: int
    reasoning: list[str]

class ConsensusBlockingService:
    """Require multi-source agreement before automated blocking."""
    
    CONFIDENCE_THRESHOLD = 0.8
    MINIMUM_SOURCES = 2
    
    def __init__(self, ai_agent, rule_engine, threat_feeds: list):
        self.ai_agent = ai_agent
        self.rule_engine = rule_engine
        self.threat_feeds = threat_feeds
    
    def evaluate_for_blocking(self, indicator: str) -> BlockingDecision:
        votes = []
        reasoning = []
        
        # Source 1: Deterministic rule engine
        rule_result = self.rule_engine.evaluate(indicator)
        if rule_result.matched:
            votes.append(("rules", True, 1.0))
            reasoning.append(f"Rule match: {rule_result.rule_name}")
        else:
            votes.append(("rules", False, 1.0))
        
        # Source 2: Threat intelligence feeds (deterministic)
        for feed in self.threat_feeds:
            feed_result = feed.lookup(indicator)
            if feed_result:
                votes.append((feed.name, True, feed_result.confidence))
                reasoning.append(f"{feed.name}: {feed_result.category}")
            else:
                votes.append((feed.name, False, 0.5))
        
        # Source 3: AI assessment (non-deterministic, weighted lower)
        ai_result = self.ai_agent.assess_threat(indicator)
        ai_vote = self._parse_ai_vote(ai_result)
        votes.append(("ai_agent", ai_vote.should_block, ai_vote.confidence * 0.7))
        if ai_vote.should_block:
            reasoning.append(f"AI assessment: {ai_vote.reason}")
        
        # Calculate consensus
        return self._calculate_consensus(votes, reasoning)
    
    def _calculate_consensus(self, votes: list, reasoning: list) -> BlockingDecision:
        block_votes = [v for v in votes if v[1]]  # Votes to block
        
        if len(block_votes) < self.MINIMUM_SOURCES:
            return BlockingDecision(
                should_block=False,
                confidence=0.0,
                sources_agree=len(block_votes),
                sources_total=len(votes),
                reasoning=["Insufficient consensus for blocking"]
            )
        
        # Weighted confidence
        total_confidence = sum(v[2] for v in block_votes) / len(votes)
        
        return BlockingDecision(
            should_block=total_confidence >= self.CONFIDENCE_THRESHOLD,
            confidence=total_confidence,
            sources_agree=len(block_votes),
            sources_total=len(votes),
            reasoning=reasoning
        )

Key points:

  • AI is one vote among many, not the sole decision maker
  • AI confidence is weighted lower than deterministic sources
  • Minimum agreement threshold prevents single-source decisions
  • Full reasoning trail for audit and review
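The consensus arithmetic in `_calculate_consensus` can be checked in isolation. A standalone restatement with a worked example, using the same `(source, should_block, confidence)` vote tuples as the class above:

```python
CONFIDENCE_THRESHOLD = 0.8
MINIMUM_SOURCES = 2

def consensus(votes: list[tuple[str, bool, float]]) -> bool:
    """Block only if enough sources agree AND weighted confidence is high."""
    block_votes = [v for v in votes if v[1]]
    if len(block_votes) < MINIMUM_SOURCES:
        return False  # never block on a single source's say-so
    weighted = sum(v[2] for v in block_votes) / len(votes)
    return weighted >= CONFIDENCE_THRESHOLD

# Two deterministic sources agree at full confidence; the AI agrees,
# but its vote is already down-weighted to 0.7:
votes = [("rules", True, 1.0), ("feed_a", True, 1.0), ("ai_agent", True, 0.7)]
# weighted = (1.0 + 1.0 + 0.7) / 3 = 0.9 -> block
```

Note how an enthusiastic AI plus one weak feed still falls well short of the threshold.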

Pattern 4: The Sandbox Pattern

Let AI operate freely within constrained boundaries, then validate before committing.

Example: AI-Generated Detection Rules

class DetectionRuleSandbox:
    """Allow AI to generate detection rules, validate before deployment."""
    
    # Constraints on what AI-generated rules can do
    MAX_RULE_COMPLEXITY = 10  # Maximum number of conditions
    ALLOWED_FIELDS = {"src_ip", "dst_ip", "port", "protocol", "user", "process"}
    FORBIDDEN_ACTIONS = {"delete", "modify", "execute"}
    
    def __init__(self, ai_agent, rule_validator, test_environment):
        self.ai_agent = ai_agent
        self.rule_validator = rule_validator
        self.test_env = test_environment
    
    def generate_and_validate_rule(self, threat_description: str) -> dict:
        # Step 1: AI generates candidate rule
        candidate = self.ai_agent.generate_detection_rule(threat_description)
        
        # Step 2: Syntax validation
        if not self.rule_validator.is_valid_syntax(candidate):
            return {"status": "rejected", "reason": "Invalid syntax"}
        
        # Step 3: Constraint validation
        constraint_check = self._check_constraints(candidate)
        if not constraint_check["passed"]:
            return {"status": "rejected", "reason": constraint_check["reason"]}
        
        # Step 4: Test against known samples
        test_result = self._test_in_sandbox(candidate)
        if test_result["false_positive_rate"] > 0.01:
            return {"status": "rejected", "reason": "Excessive false positives"}
        if test_result["detection_rate"] < 0.8:
            return {"status": "rejected", "reason": "Insufficient detection rate"}
        
        # Step 5: Queue for human review (never auto-deploy)
        return {
            "status": "pending_review",
            "rule": candidate,
            "test_results": test_result,
            "requires_approval": True
        }
    
    def _check_constraints(self, rule: dict) -> dict:
        """Enforce deterministic constraints on AI-generated rules."""
        
        # Check complexity
        conditions = rule.get("conditions", [])
        if len(conditions) > self.MAX_RULE_COMPLEXITY:
            return {"passed": False, "reason": "Rule too complex"}
        
        # Check only allowed fields are used
        used_fields = self._extract_fields(rule)
        invalid_fields = used_fields - self.ALLOWED_FIELDS
        if invalid_fields:
            return {"passed": False, "reason": f"Invalid fields: {invalid_fields}"}
        
        # Check no forbidden actions
        actions = rule.get("actions", [])
        for action in actions:
            if action.get("type") in self.FORBIDDEN_ACTIONS:
                return {"passed": False, "reason": f"Forbidden action: {action['type']}"}
        
        return {"passed": True}
    
    def _test_in_sandbox(self, rule: dict) -> dict:
        """Test rule against known malicious and benign samples."""
        
        # Run against known malicious samples
        malicious_results = self.test_env.run_against_malicious(rule)
        
        # Run against known benign samples
        benign_results = self.test_env.run_against_benign(rule)
        
        return {
            "detection_rate": malicious_results["detected"] / max(malicious_results["total"], 1),
            "false_positive_rate": benign_results["flagged"] / max(benign_results["total"], 1),
            "samples_tested": malicious_results["total"] + benign_results["total"]
        }

Key points:

  • AI operates within strict constraints
  • Multiple validation layers before any real-world impact
  • Testing against known samples provides objective metrics
  • Human approval required before production deployment
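The `_extract_fields` helper above was left undefined. Assuming rules store conditions as `{"field": ..., "op": ..., "value": ...}` dicts (an assumption about the rule schema, not a given), a sketch:

```python
def extract_fields(rule: dict) -> set[str]:
    """Collect every field name referenced by a rule's conditions.
    Assumes conditions are dicts with a "field" key (illustrative schema)."""
    return {c["field"] for c in rule.get("conditions", []) if "field" in c}
```

Comparing this set against `ALLOWED_FIELDS` is what keeps an AI-generated rule from reaching into data it was never meant to touch.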

Decision Framework: When to Use AI

Not every task benefits from AI. Use this framework:

Task Characteristic                        AI Suitability   Example
Requires creativity/synthesis              High             Summarizing incident findings
Pattern recognition in unstructured data   High             Analyzing malware behavior
Strict correctness required                Low              Firewall rule generation
Explainability required for compliance     Low              Access control decisions
High volume, low stakes                    Medium           Log triage
Low volume, high stakes                    Low              Production deployments
Well-defined, deterministic logic          Avoid AI         IP allowlist management

The “Intern Test”

A useful heuristic: Would you let a smart but inexperienced intern do this task unsupervised?

  • Yes, with review -> AI can do it with validation
  • Yes, independently -> You probably don’t need AI
  • No, too risky -> Don’t let AI do it autonomously

Implementation Checklist

Before deploying AI in your system, verify:

  • AI outputs are validated against a strict schema
  • Invalid AI responses result in safe defaults, not errors
  • Deterministic sources are consulted before AI
  • Business rules can override AI decisions
  • Human approval gates exist for high-impact actions
  • Full audit trail of AI decisions and reasoning
  • Timeouts and circuit breakers for AI calls
  • Fallback paths when AI is unavailable
  • Testing with adversarial/edge-case inputs
  • Monitoring for AI accuracy drift over time

Conclusion

AI is a powerful tool, but it’s still a tool. The organizations getting the most value from AI are those treating it as a component within well-architected systems, not as a magical solution that replaces engineering discipline.

The patterns in this post - Validator, State Machine, Consensus, and Sandbox - provide practical approaches to harnessing AI’s strengths while maintaining the reliability and predictability that production systems demand.

In cybersecurity, where the cost of errors can be severe, this disciplined approach isn’t optional - it’s essential. Use AI to augment human decision-making and automate well-defined subtasks, but keep deterministic systems in control of the overall workflow.

The future isn’t AI replacing reliable systems. It’s AI making reliable systems smarter.