Architecture
1. Core entities
Minimal shape. Full column lists live in the migration; only load-bearing fields are named here.
alerts                   raw ingest from adapter; AI-triaged
cases                    investigation unit; one run at a time
case_runs                a single AI execution span against a case
case_events              ordered event inbox per case (immutable)
proposals                AI-proposed actions awaiting human gate
execution_log            append-only audit of all meaningful actions
notes                    markdown / evidence blocks
iocs                     typed artifacts; carry external_context
case_iocs, case_assets   bridge tables
case_links               related-case edges (shared IOC / asset / rule)
case_outbox              outbound work for executors and exports

Every content-bearing row carries tenant_id, visibility, and created_at. RLS applies per tenancy.
2. Visibility model
Classes (enum):
mssp_only       default; internal reasoning, raw tool output, hypotheses
customer_safe   approved for customer view
system          lifecycle and state-change events, always visible
tool_output     classified per-tool at registration time

Rules:
- visibility is a column on every user-visible row (messages, notes, proposals, tool_output records, timeline entries, facts-panel fields).
- Default on insert is mssp_only. Promotion to customer_safe is an explicit operation.
- Customer portal queries filter at the RLS policy layer, not at render. A customer-viewer session cannot read mssp_only rows even via raw SQL.
- Proposals have field-level visibility: {action, outcome} may be customer_safe while {rationale, blast_radius} stays mssp_only. Rendered as two projections.
- Every visibility promotion emits an execution_log entry with the actor and rationale.
Default-deny-promotion: policies may downgrade visibility but may not upgrade without an explicit action by an authorized principal.
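The two-projection rule for proposals can be sketched as follows. This is a minimal illustration of the projection logic only; the spec enforces visibility at the RLS policy layer, which this application-side code does not replace. The role names and the FIELD_VISIBILITY / READABLE maps are assumptions, and only the four proposal field names come from the rules above.

```python
from typing import Any, Dict

# Hypothetical per-field visibility map for a proposal.
FIELD_VISIBILITY: Dict[str, str] = {
    "action": "customer_safe",
    "outcome": "customer_safe",
    "rationale": "mssp_only",
    "blast_radius": "mssp_only",
}

# Visibility classes each viewer role may read (role names are assumptions).
READABLE = {
    "mssp_analyst": {"mssp_only", "customer_safe", "system"},
    "customer_viewer": {"customer_safe", "system"},
}


def project_proposal(proposal: Dict[str, Any], viewer_role: str) -> Dict[str, Any]:
    """Return only the fields the viewer may see; unmapped fields default to mssp_only."""
    allowed = READABLE[viewer_role]
    return {
        name: value
        for name, value in proposal.items()
        if FIELD_VISIBILITY.get(name, "mssp_only") in allowed
    }


proposal = {
    "action": "isolate_host",
    "outcome": "pending",
    "rationale": "beaconing to a known C2 address",
    "blast_radius": "1 workstation",
}
customer_view = project_proposal(proposal, "customer_viewer")
analyst_view = project_proposal(proposal, "mssp_analyst")
```

Defaulting unmapped fields to mssp_only mirrors the default-on-insert rule: a newly added field stays internal until someone classifies it.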
3. Run lifecycle
States:
active            run consuming events and taking steps
waiting_on_gate   a proposal is pending; run does not mutate state
halted_budget     budget exceeded; requires analyst resume
paused            analyst-paused
completed         case closed
failed            unrecoverable error; requires analyst resume or restart

Transitions:
active → waiting_on_gate    on proposal created (status = proposed)
waiting_on_gate → active    on proposal approved/rejected (new event)
active → halted_budget      on budget exceeded
halted_budget → active      on analyst resume (grants new budget)
active → paused             on analyst pause
paused → active             on analyst resume
active → completed          on case close
* → failed                  on uncaught error, preserved for diagnosis

Invariants:
- At most one run per case in state active | waiting_on_gate | halted_budget | paused. Enforced via a partial unique index on case_runs(case_id) WHERE status IN (...).
- Budget counters on the run: tokens_used, dollars_used, tool_calls_used, wall_clock_ms. Enforced server-side; soft warn at 75%, hard halt at 100%.
- A waiting_on_gate run does not process inbox events except gate-resolution events (proposal_approved / proposal_rejected).
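The transition table and the gate and budget invariants can be expressed as a small state machine. This is a sketch, not the actual runtime: the trigger labels (proposal_created, analyst_resume, and so on) are hypothetical names for the conditions listed above, and treating paused and halted runs as not consuming events is an assumption.

```python
from typing import Dict, Tuple

# (current state, trigger) -> next state, copied from the transition list.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("active", "proposal_created"): "waiting_on_gate",
    ("waiting_on_gate", "proposal_resolved"): "active",
    ("active", "budget_exceeded"): "halted_budget",
    ("halted_budget", "analyst_resume"): "active",
    ("active", "analyst_pause"): "paused",
    ("paused", "analyst_resume"): "active",
    ("active", "case_closed"): "completed",
}

GATE_EVENTS = {"proposal_approved", "proposal_rejected"}


def next_state(state: str, trigger: str) -> str:
    if trigger == "uncaught_error":
        return "failed"  # "* -> failed" applies from any state
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {trigger}") from None


def may_process(state: str, event_kind: str) -> bool:
    """A waiting_on_gate run consumes only gate-resolution events."""
    if state == "waiting_on_gate":
        return event_kind in GATE_EVENTS
    return state == "active"  # assumption: other states consume nothing


def budget_status(used: float, limit: float) -> str:
    """Soft warn at 75% of the limit, hard halt at 100%."""
    if used >= limit:
        return "halt"
    if used >= 0.75 * limit:
        return "warn"
    return "ok"
```

Keeping the table as data makes it straightforward to assert in tests that no transition exists outside the documented list.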
4. Event inbox, ordering, coalescing, idempotency
All incoming work for a case lands in case_events:
event_id             uuid PK
case_id              FK
run_id               FK nullable
seq                  bigint, case-scoped monotonic (sequence)
kind                 enum (alert_ingested, tool_result,
                     proposal_approved, proposal_rejected,
                     analyst_message, analyst_correction,
                     budget_warning, external_signal, ...)
payload              jsonb
causation_event_id   uuid nullable (which event caused this one)
correlation_id       uuid (spans a causally-related fan-out)
idempotency_key      text unique per case
created_at           timestamptz

Rules:
- seq is issued by a case-scoped sequence on insert. Consumers read strictly in seq order.
- idempotency_key is unique per case_id. Duplicate insert is silently dropped (return the existing row).
- Coalescing: before insert, events matching (case_id, kind, payload.signature, window) merge into a single row. Signature is kind-specific (alert: fingerprint of IOC + rule + asset; tool_result: tool_id + params hash).
- causation_event_id links cause → effect for replay. correlation_id groups events from a single external trigger or analyst action.
- Events are immutable. Updates express as follow-on events.
Burst example: 100 similar host alerts in 5 minutes coalesce into one alert_ingested event carrying an asset_ids: [...] list. The run processes it once.
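The burst example can be reproduced with an in-memory stand-in for one case's inbox. This is a sketch under stated assumptions: each raw alert is taken to arrive with a precomputed, opaque signature, the window is measured from the first alert in a group, and the idempotency key format is invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str
    payload: dict
    idempotency_key: str


@dataclass
class CaseInbox:
    """In-memory stand-in for one case's slice of case_events."""
    events: List[Event] = field(default_factory=list)
    by_key: Dict[str, Event] = field(default_factory=dict)

    def insert(self, kind: str, payload: dict, idempotency_key: str) -> Event:
        existing = self.by_key.get(idempotency_key)
        if existing is not None:
            return existing  # duplicate: dropped, existing row returned
        event = Event(len(self.events) + 1, kind, payload, idempotency_key)
        self.events.append(event)
        self.by_key[idempotency_key] = event
        return event


def coalesce(raw_alerts: List[dict], window_s: float) -> List[dict]:
    """Merge alerts sharing a signature within window_s of the group's first alert."""
    groups: List[dict] = []
    open_group: Dict[str, dict] = {}
    for alert in sorted(raw_alerts, key=lambda a: a["ts"]):
        group = open_group.get(alert["signature"])
        if group is None or alert["ts"] - group["first_ts"] > window_s:
            group = {"signature": alert["signature"], "first_ts": alert["ts"], "asset_ids": []}
            open_group[alert["signature"]] = group
            groups.append(group)
        if alert["asset_id"] not in group["asset_ids"]:
            group["asset_ids"].append(alert["asset_id"])
    return groups


# 100 alerts, one every 3 seconds, all sharing one signature.
raw = [{"signature": "sig-a", "asset_id": f"host-{i}", "ts": i * 3.0} for i in range(100)]
merged = coalesce(raw, window_s=300.0)

inbox = CaseInbox()
key = f"alert:{merged[0]['signature']}:{merged[0]['first_ts']}"
first = inbox.insert("alert_ingested", {"asset_ids": merged[0]["asset_ids"]}, key)
again = inbox.insert("alert_ingested", {"asset_ids": merged[0]["asset_ids"]}, key)
```

Coalescing runs before insert, so the stored event is never modified afterwards, which keeps the sketch consistent with the immutability rule.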
5. Proposal lifecycle and execution contract
States:
draft         being composed by the AI
proposed      submitted to human gate
approved      human approved (with typed reason if required)
rejected      human rejected (reason required)
executing     outbox picked up; executor running
executed      action complete, result recorded
rolled_back   post-execution reversal (rare, analyst-initiated)
failed        executor error

Idempotency:
proposal.idempotency_key = sha256(case_id || action_type || canonical_json(params))

Duplicate proposals within an active window (default 15 minutes) are rejected at insert. Guarantees the AI cannot double-fire even under re-run.
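A minimal sketch of the key derivation and the duplicate window follows. Two details are assumptions rather than part of the spec: canonical_json is taken to mean sorted keys with no insignificant whitespace, and a unit-separator delimiter is placed between the concatenated fields so they cannot run together. ProposalWindow is a hypothetical in-memory stand-in for the insert-time check.

```python
import hashlib
import json
from typing import Dict


def canonical_json(params: dict) -> str:
    """Deterministic JSON: sorted keys, compact separators."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def proposal_idempotency_key(case_id: str, action_type: str, params: dict) -> str:
    # The spec writes plain concatenation (||); the delimiter is an assumption.
    material = "\x1f".join([case_id, action_type, canonical_json(params)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class ProposalWindow:
    """Rejects a key that was accepted less than window_s seconds ago."""

    def __init__(self, window_s: float = 15 * 60) -> None:
        self.window_s = window_s
        self._accepted_at: Dict[str, float] = {}

    def submit(self, key: str, now: float) -> bool:
        last = self._accepted_at.get(key)
        if last is not None and now - last < self.window_s:
            return False  # duplicate within the active window
        self._accepted_at[key] = now
        return True


k1 = proposal_idempotency_key("case-1", "isolate_host", {"host": "ws-042", "mode": "full"})
k2 = proposal_idempotency_key("case-1", "isolate_host", {"mode": "full", "host": "ws-042"})

window = ProposalWindow()
accepted = window.submit(k1, now=0.0)
rejected = window.submit(k2, now=60.0)
accepted_later = window.submit(k1, now=901.0)
```

Because the params are canonicalized before hashing, a re-run that emits the same action with keys in a different order still collides with the original proposal.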
Gate behavior:
- On proposed: run transitions to waiting_on_gate.
- On approved: insert row in case_outbox with kind = 'execute_proposal', idempotency_key = proposal.idempotency_key. Emit proposal_approved into case_events. Run resumes.
- On rejected: emit proposal_rejected with reason into case_events. Run resumes. No outbox row.

Execution:
- Separate executor worker consumes case_outbox and performs the action.
- On success: records execute_proposal_result into case_events, updates proposal → executed, writes execution_log entry.
- On failure: records error, updates proposal → failed, writes execution_log entry. The run may propose a retry.
- Exactly-once via idempotency_key: outbox rows with duplicate keys are rejected. Executor workers claim rows with a lease (e.g., FOR UPDATE SKIP LOCKED).
The AI run does not execute side effects inline. Everything goes through the outbox.
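The duplicate-key rejection and single-claimer behavior can be exercised locally with SQLite. This is a stand-in, not the production mechanism: SQLite has no FOR UPDATE SKIP LOCKED, so a conditional UPDATE plays the role of the lease here. The status and claimed_by columns and the status values are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE case_outbox (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        idempotency_key TEXT NOT NULL UNIQUE,
        status TEXT NOT NULL DEFAULT 'pending',
        claimed_by TEXT
    )
    """
)


def enqueue(kind: str, key: str) -> bool:
    """Insert an outbox row; a duplicate idempotency_key is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO case_outbox (kind, idempotency_key) VALUES (?, ?)",
                (kind, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def claim(row_id: int, worker: str) -> bool:
    """Only one worker can flip a row from pending to in_flight."""
    with conn:
        cur = conn.execute(
            "UPDATE case_outbox SET status = 'in_flight', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, row_id),
        )
    return cur.rowcount == 1


first = enqueue("execute_proposal", "key-abc")
duplicate = enqueue("execute_proposal", "key-abc")
won = claim(1, "worker-a")
lost = claim(1, "worker-b")
```

The second claimer sees zero affected rows and can move on, which matches the "one succeeding, one no-oping" outcome the outbox design calls for.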
6. Execution log schema and invariants
Append-only, separate from conversation:
log_id         uuid PK
case_id        FK
run_id         FK nullable
actor_kind     enum (ai, human, system, executor)
actor_id       text
kind           enum (tool_call, proposal_state_change,
               approval, override, visibility_promotion,
               correction_applied, policy_bound,
               export_emitted, ...)
subject_type   enum (case, proposal, ioc, asset, note, ...)
subject_id     text
before         jsonb nullable
after          jsonb nullable
versions       jsonb (model_id, prompt_version, template_version,
               policy_version at time of action)
ts             timestamptz default now()

Invariants:
- No UPDATE or DELETE permitted from app roles. Only INSERT + SELECT. Enforced at the Postgres role-grant layer.
- Every proposal state change, every tool call, every approval, every analyst override of an AI decision, every visibility change, every correction, every outbox dispatch writes a row.
- versions captures the stack that produced the action. Required for reproducibility and post-hoc calibration.
- The conversation is a rendered view of a subset of events; it is not audit. Destroying or compacting conversation does not destroy audit.
7. Facts-panel authority and correction flow
Structured case state (hypotheses, IOCs, assets, timeline summary, confidence, active directives) is a reducer output over case_events. It is never directly mutated by conversation.
Rules:
- Conversation messages do not write structured state.
- AI updates to structured state happen via AI-emitted events (hypothesis_updated, ioc_added, asset_linked).
- Analyst edits in the facts panel emit analyst_correction events. The reducer applies them. The AI consumes the correction as the next inbox event and re-reasons from the corrected state.
- The facts panel is eventually consistent with case_events. A materialized projection (table or view) is maintained; reads can hit it directly.
- Direct corrections to the execution log are forbidden; corrections express as new events plus a pointer to the corrected one.
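The reducer idea can be sketched as a fold over events in seq order. The three AI-emitted event kinds and analyst_correction come from the rules above; the payload shapes (id, text, value, asset_id, field, remove) are hypothetical and chosen only to keep the example small.

```python
from typing import Any, Dict, Iterable


def reduce_facts(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold case_events, in seq order, into a facts-panel projection."""
    facts: Dict[str, Any] = {"hypotheses": {}, "iocs": [], "assets": []}
    for ev in sorted(events, key=lambda e: e["seq"]):
        kind, p = ev["kind"], ev["payload"]
        if kind == "hypothesis_updated":
            facts["hypotheses"][p["id"]] = p["text"]
        elif kind == "ioc_added":
            if p["value"] not in facts["iocs"]:
                facts["iocs"].append(p["value"])
        elif kind == "asset_linked":
            if p["asset_id"] not in facts["assets"]:
                facts["assets"].append(p["asset_id"])
        elif kind == "analyst_correction":
            # Hypothetical correction payloads: remove an IOC, or reset a hypothesis.
            if p["field"] == "iocs" and p["remove"] in facts["iocs"]:
                facts["iocs"].remove(p["remove"])
            elif p["field"] == "hypotheses":
                facts["hypotheses"][p["id"]] = p["text"]
        # Any other kind (e.g. analyst_message) never touches structured state.
    return facts


events = [
    {"seq": 1, "kind": "hypothesis_updated", "payload": {"id": "h1", "text": "credential phishing"}},
    {"seq": 2, "kind": "ioc_added", "payload": {"value": "203.0.113.7"}},
    {"seq": 3, "kind": "analyst_message", "payload": {"text": "this looks benign to me"}},
    {"seq": 4, "kind": "analyst_correction", "payload": {"field": "iocs", "remove": "203.0.113.7"}},
    {"seq": 5, "kind": "analyst_correction",
     "payload": {"field": "hypotheses", "id": "h1", "text": "benign marketing email"}},
]
facts = reduce_facts(events)
```

The analyst_message at seq 3 leaves the projection untouched, while the two corrections change it only by being appended as new events.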
8. Tool capability taxonomy
Every tool is registered with a capability class, a default approval policy, and a cost model.
Capability classes:
read_local                 inspect SocTalk state only
read_external_silent       no target footprint (feeds, cached intel, vector)
read_external_attributed   trace at target (SIEM query, EDR read)
write_sandbox              footprint without target mutation (detonation)
write_external             target state change (block, isolate, notify)

Default approval policy per class:
read_local                 → autonomous
read_external_silent       → autonomous
read_external_attributed   → analyst_approve
write_sandbox              → analyst_approve
write_external             → typed_reason

Per-tool cost model: {tokens_est, dollars_est, wall_ms_est, footprint}. The run budget tracks the sum.
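A registration record along these lines might look like the sketch below. The class-to-policy mapping and the cost-model field names are taken from the lists above; the tool names, the footprint values, and the planned_cost helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_APPROVAL: Dict[str, str] = {
    "read_local": "autonomous",
    "read_external_silent": "autonomous",
    "read_external_attributed": "analyst_approve",
    "write_sandbox": "analyst_approve",
    "write_external": "typed_reason",
}


@dataclass(frozen=True)
class CostModel:
    tokens_est: int
    dollars_est: float
    wall_ms_est: int
    footprint: str


@dataclass(frozen=True)
class Tool:
    name: str
    capability: str
    cost: CostModel

    def __post_init__(self) -> None:
        if self.capability not in DEFAULT_APPROVAL:
            raise ValueError(f"unknown capability class: {self.capability}")

    @property
    def approval(self) -> str:
        """Default approval policy implied by the capability class."""
        return DEFAULT_APPROVAL[self.capability]


def planned_cost(tools: List[Tool]) -> Dict[str, float]:
    """Sum the estimates the run budget would be charged for these calls."""
    return {
        "tokens": sum(t.cost.tokens_est for t in tools),
        "dollars": sum(t.cost.dollars_est for t in tools),
        "wall_ms": sum(t.cost.wall_ms_est for t in tools),
    }


# Hypothetical tools, one per interesting class.
case_lookup = Tool("case_lookup", "read_local", CostModel(200, 0.0, 50, "none"))
siem_query = Tool("siem_query", "read_external_attributed", CostModel(1500, 0.25, 4000, "target_trace"))
isolate_host = Tool("isolate_host", "write_external", CostModel(300, 0.5, 10000, "target_mutation"))
total = planned_cost([case_lookup, siem_query, isolate_host])
```

Deriving the approval policy from the class, rather than storing it per tool, keeps the default in one place; per-scope overrides would then layer on top through policy precedence.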
9. Policy precedence
Policies are merged in this order; entries later in the list override earlier ones:
1. install default (shipped in chart, read-only in v1)
2. tenant override (MSSP sets per customer)
3. case template (phishing, ransomware, etc.)
4. case-local override (set for this one case by analyst)

For each policy key (tool approval, auto-close, visibility promotion, response templates, budget), the effective value is taken from the deepest scope that defines it.

Invariants:
- Visibility promotion is never set to permissive by default at install scope. Default is "explicit promotion required."
- A tenant policy cannot override an install-level hard cap (e.g., max_tokens_per_case).
- Case-local overrides are scoped to the case and do not persist to future cases.
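The deepest-scope-wins rule, with the hard-cap exception, can be sketched as a layered merge. Assumptions: the scope names and policy key names are illustrative, the HARD_CAPS set contains only the spec's one example, and a deeper value for a hard cap is silently ignored here (rejecting it at write time would be an equally plausible reading).

```python
from typing import Any, Dict

# Shallowest to deepest scope.
SCOPES = ["install", "tenant", "case_template", "case_local"]

# Install-level hard caps; max_tokens_per_case is the spec's example.
HARD_CAPS = {"max_tokens_per_case"}


def effective_policy(layers: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge policy layers so the deepest scope defining a key wins."""
    install = layers.get("install", {})
    result: Dict[str, Any] = {}
    for scope in SCOPES:
        for key, value in layers.get(scope, {}).items():
            if scope != "install" and key in HARD_CAPS and key in install:
                continue  # deeper scopes cannot override an install hard cap
            result[key] = value
    return result


layers = {
    "install": {
        "max_tokens_per_case": 200_000,
        "auto_close_enabled": True,
        "visibility_promotion": "explicit",
    },
    "tenant": {"max_tokens_per_case": 500_000, "auto_close_enabled": False},
    "case_template": {"auto_close_threshold": 0.95},
    "case_local": {"auto_close_enabled": True},
}
effective = effective_policy(layers)
```

Since the merge is computed per case from its layers, a case-local override never leaks into another case's effective policy.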
10. Auto-close / reopen semantics
Auto-close for high-confidence FPs:
Trigger:
  AI assessment = fp, confidence ≥ policy.auto_close_threshold
  AND policy.auto_close_enabled is true for the tenant
  AND no active directive prevents auto-close
Action:
  case.status = 'auto_closed_fp'
  case.reopen_window_until = now() + policy.reopen_window
  case.reopen_signature = {
    ioc_fingerprints: [...],
    asset_ids: [...],
    time_window: {start, end}
  }
  run transitions to completed
  execution_log row written

Reopen:
Trigger:
  new case_events row with kind ∈ {alert_ingested, external_signal}
  whose signature intersects a case's reopen_signature
  where case.status = 'auto_closed_fp'
  AND now() < case.reopen_window_until
Action:
  case.status = 'active'
  emit reopened event into case_events
  new run created
  execution_log row written
  conversation receives a system message noting the reopen

Kill switch:
- IntegrationConfig.auto_close_enabled per tenant (default: on).
- CaseTemplate.auto_close_disabled per case type.
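The two triggers reduce to a pair of predicates. This is a sketch under assumptions: "intersects" is read as sharing at least one IOC fingerprint or one asset id, the time_window part of the signature is not consulted, timestamps are plain floats, and the dictionary shapes are hypothetical.

```python
from typing import Any, Dict


def should_auto_close(assessment: str, confidence: float, policy: Dict[str, Any],
                      template_disabled: bool = False,
                      directive_blocks: bool = False) -> bool:
    """Auto-close trigger, including both kill switches."""
    return bool(
        assessment == "fp"
        and confidence >= policy["auto_close_threshold"]
        and policy["auto_close_enabled"]   # per-tenant switch
        and not template_disabled          # per-case-type switch
        and not directive_blocks
    )


def signature_intersects(event_sig: Dict[str, Any], reopen_sig: Dict[str, Any]) -> bool:
    """Assumed meaning of 'intersects': a shared IOC fingerprint or asset id."""
    shared_iocs = set(event_sig["ioc_fingerprints"]) & set(reopen_sig["ioc_fingerprints"])
    shared_assets = set(event_sig["asset_ids"]) & set(reopen_sig["asset_ids"])
    return bool(shared_iocs or shared_assets)


def should_reopen(case: Dict[str, Any], event: Dict[str, Any], now: float) -> bool:
    return (
        case["status"] == "auto_closed_fp"
        and now < case["reopen_window_until"]
        and event["kind"] in {"alert_ingested", "external_signal"}
        and signature_intersects(event["signature"], case["reopen_signature"])
    )


policy = {"auto_close_threshold": 0.9, "auto_close_enabled": True}
case = {
    "status": "auto_closed_fp",
    "reopen_window_until": 1000.0,
    "reopen_signature": {
        "ioc_fingerprints": ["fp-1"],
        "asset_ids": ["host-7"],
        "time_window": {"start": 0.0, "end": 100.0},
    },
}
event = {
    "kind": "alert_ingested",
    "signature": {"ioc_fingerprints": ["fp-9"], "asset_ids": ["host-7"]},
}
```

With these values, should_reopen(case, event, now=500.0) holds because host-7 appears on both sides, and it stops holding once now reaches reopen_window_until.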
11. TheHive export contract (outbox-based, one-way)
Mirror cases, IOCs, and selected notes outbound to TheHive when the tenant has thehive_export_enabled set. Never accept inbound changes.
Outbox row (in case_outbox):
id                uuid PK
kind              'export.thehive.case' | 'export.thehive.ioc' | ...
external_system   'thehive'
external_ref      TheHive object id (filled on first successful mirror)
object_type       case | ioc | note
object_id         internal subject id
idempotency_key   sha256(object_type || object_id || state_hash)
payload           jsonb
export_status     pending | in_flight | succeeded | failed | skipped
attempts          int
last_error        text nullable
next_attempt_at   timestamptz
created_at, updated_at

Rules:
- State change on a mirrored object enqueues an export row with a fresh idempotency_key (incorporates the state hash).
- Worker claims with FOR UPDATE SKIP LOCKED. On success, records external_ref (creating or updating on TheHive side as needed) and writes execution_log.
- Inbound webhooks from TheHive are accepted only for read-only dashboard cases (not v1). Any attempt to accept inbound state is explicitly rejected and logged.
- No reconciliation loop: TheHive is a downstream mirror; SocTalk is the source of truth.
- Failed exports retry with exponential backoff up to a cap; permanent failure surfaces on the integrations health panel.
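The export key and retry schedule can be sketched as below. Assumptions: state_hash is taken to be a SHA-256 over canonical JSON of the mirrored state, a unit-separator delimiter stands in for the spec's plain concatenation, and "up to a cap" is read as a cap on the delay. The base and cap values are placeholders, not spec values.

```python
import hashlib
import json


def state_hash(state: dict) -> str:
    """Hash of the object's mirrored state (canonical JSON is an assumption)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def export_idempotency_key(object_type: str, object_id: str, state: dict) -> str:
    # sha256(object_type || object_id || state_hash); the delimiter is an assumption.
    material = "\x1f".join([object_type, object_id, state_hash(state)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def next_delay_s(attempts: int, base_s: float = 30.0, cap_s: float = 3600.0) -> float:
    """Exponential backoff: base * 2^(attempts - 1), capped at cap_s."""
    return min(cap_s, base_s * 2 ** max(0, attempts - 1))


unchanged_a = export_idempotency_key("case", "c-17", {"status": "active", "severity": 3})
unchanged_b = export_idempotency_key("case", "c-17", {"severity": 3, "status": "active"})
changed = export_idempotency_key("case", "c-17", {"status": "closed", "severity": 3})
```

Since the key incorporates the state hash, re-enqueuing an unchanged object yields the same key and is dropped as a duplicate, while any state change produces a fresh key and a new export row.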
12. Mandatory tests and invariants
Test suite (unit + integration) must cover:
- Execution log immutability. UPDATE and DELETE against execution_log from the app role fail at the Postgres layer.
- Single active run per case. Concurrent attempts to create a second active run fail with a unique-constraint violation.
- Proposal idempotency. Submitting two proposals with the same idempotency key within the window: the second is rejected.
- Gate-pause behavior. A run with a proposed proposal does not consume non-gate events from its inbox.
- Outbox exactly-once. Two workers claiming the same outbox row result in one succeeding, one no-oping.
- Visibility enforcement. A customer-viewer session cannot select mssp_only rows from any table, even with raw SQL.
- Visibility promotion logged. Every promotion from mssp_only to customer_safe produces an execution_log row.
- Correction flow. Analyst correction event produces a new event that the reducer applies; the facts-panel projection reflects the correction.
- Auto-close reopen. An event matching a reopen_signature within the window reopens the case and starts a new run.
- TheHive export idempotency. Re-running an export for an object whose state has not changed is a no-op (same idempotency_key).
- Tool approval policy. A write_external tool call without a typed_reason approval cannot reach the executor.
- Policy precedence. Case-local override wins over tenant which wins over install for the same policy key.
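As an illustration of the shape such a test could take, here is the single-active-run check written against SQLite, which also supports partial unique indexes. This is a local stand-in only: the real suite would run against Postgres, and the index name, column set, and test function name are assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE case_runs ("
        "run_id INTEGER PRIMARY KEY, case_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    # Partial unique index: at most one live run per case.
    conn.execute(
        "CREATE UNIQUE INDEX one_live_run_per_case ON case_runs (case_id) "
        "WHERE status IN ('active', 'waiting_on_gate', 'halted_budget', 'paused')"
    )
    return conn


def test_single_live_run_per_case() -> int:
    conn = make_db()
    conn.execute("INSERT INTO case_runs (case_id, status) VALUES ('c1', 'active')")
    try:
        conn.execute("INSERT INTO case_runs (case_id, status) VALUES ('c1', 'paused')")
    except sqlite3.IntegrityError:
        pass  # expected: second live run violates the unique index
    else:
        raise AssertionError("second live run should have been rejected")
    # A finished run does not block anything, and other cases are unaffected.
    conn.execute("INSERT INTO case_runs (case_id, status) VALUES ('c1', 'completed')")
    conn.execute("INSERT INTO case_runs (case_id, status) VALUES ('c2', 'active')")
    return conn.execute("SELECT COUNT(*) FROM case_runs").fetchone()[0]


row_count = test_single_live_run_per_case()
```

The concurrent variant of this test needs two real connections racing on Postgres; the sequential form above only confirms that the index definition rejects what it should.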
13. Out of this spec
- Component models, visual behavior, command-bar parsing → the conversation UI workstream.
- Campaign correlation, scoring, cross-tenant mechanics → the campaigns workstream.
- Prompt library, LLM tool registry contents, model-version policy → the LLM runtime workstream, when we get there.
