
Sizing Profile for Pilot Installs

Reference profiles

Two reference host sizes for this release.

small-dev

Intended for: development, demos, single-tenant POC.

| Resource | Value |
|---|---|
| CPU | 4 vCPU |
| RAM | 16 GB |
| Disk | 100 GB SSD |
| Max tenants | 1–2 |
| SocTalk control plane reserved | ~2 GB RAM, 1 vCPU |
| Per-tenant budget | ~6–8 GB RAM, 1–1.5 vCPU |

Boot times are slower on this profile; the <30-minute time-to-OSS-stack-healthy SLO applies.

pilot-prod

Intended for: MSSP running real pilot customers, 3–5 tenants.

| Resource | Value |
|---|---|
| CPU | 8 vCPU |
| RAM | 32 GB |
| Disk | 500 GB SSD |
| Max tenants | 3–5 |
| SocTalk control plane reserved | ~3 GB RAM, 1–2 vCPU |
| Per-tenant budget | ~5–7 GB RAM, 1–1.5 vCPU |

Boot times should meet the <15-minute time-to-OSS-stack-healthy SLO.

Per-tenant footprint (estimates)

These are starting-point values for ResourceQuota and LimitRange in the tenant chart. Pre-release validation measures actuals, and those measurements replace these estimates in the final values.

| Component | RAM request | RAM limit | CPU request | CPU limit | Disk (PVC) |
|---|---|---|---|---|---|
| Wazuh manager | 512 MB | 1 GB | 200 m | 500 m | 20 GB |
| Wazuh indexer (OpenSearch fork) | 2 GB (heap 1 GB) | 4 GB (heap 2 GB) | 500 m | 2000 m | 50 GB |
| Wazuh dashboard | 512 MB | 1 GB | 100 m | 500 m | — |
| Filebeat | 128 MB | 256 MB | 50 m | 200 m | — |
| TheHive | 1 GB | 2 GB | 300 m | 1000 m | — |
| Cassandra (TheHive backing) | 2 GB | 4 GB | 500 m | 1500 m | 30 GB |
| Cortex | 768 MB | 1.5 GB | 200 m | 800 m | — |
| Cortex ElasticSearch | 1 GB | 2 GB | 300 m | 1000 m | 20 GB |
| SocTalk adapter | 128 MB | 256 MB | 50 m | 200 m | — |
| Per-tenant total | ~8 GB | ~16 GB | ~2.2 vCPU | ~7.7 vCPU | ~120 GB |
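As a sanity check, the totals row can be reproduced by summing the per-component figures. The sketch below simply restates the table as data (the component keys are shorthand, not actual chart names) and adds it up:

```python
# Per-component planning estimates from the table above (not measured values):
# (ram_request_MB, ram_limit_MB, cpu_request_m, cpu_limit_m, disk_GB)
components = {
    "wazuh-manager":        (512,  1024, 200,  500, 20),
    "wazuh-indexer":        (2048, 4096, 500, 2000, 50),
    "wazuh-dashboard":      (512,  1024, 100,  500,  0),
    "filebeat":             (128,   256,  50,  200,  0),
    "thehive":              (1024, 2048, 300, 1000,  0),
    "cassandra":            (2048, 4096, 500, 1500, 30),
    "cortex":               (768,  1536, 200,  800,  0),
    "cortex-elasticsearch": (1024, 2048, 300, 1000, 20),
    "soctalk-adapter":      (128,   256,  50,  200,  0),
}

ram_req = sum(c[0] for c in components.values()) / 1024  # ~8 GB
ram_lim = sum(c[1] for c in components.values()) / 1024  # ~16 GB
cpu_req = sum(c[2] for c in components.values()) / 1000  # ~2.2 vCPU
cpu_lim = sum(c[3] for c in components.values()) / 1000  # ~7.7 vCPU
disk_gb = sum(c[4] for c in components.values())         # ~120 GB

print(f"{ram_req:.1f} GB req / {ram_lim:.1f} GB lim, "
      f"{cpu_req:.1f} / {cpu_lim:.1f} vCPU, {disk_gb} GB disk")
```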

Note: limits are burst ceilings; sustained usage is closer to requests. Running 3 tenants on an 8-vCPU / 32 GB / 500 GB host means:

  • RAM: ~24 GB of requests (fits), ~48 GB of limits (requires careful overcommit tuning).
  • CPU: ~6.6 vCPU of requests (fits with control plane), bursts share total.
  • Disk: ~360 GB of tenant PVCs (fits with margin for control plane + SocTalk DB).

This is why pilot-prod caps at 5 tenants; beyond 5, memory limits start bumping into node capacity even accounting for overcommit.

Max-tenants-per-node formula

Approximation:

max_tenants = floor((node_total_RAM - control_plane_RAM - safety_margin) / per_tenant_RAM_request)
  • control_plane_RAM: 2 GB (small-dev) or 3 GB (pilot-prod) for SocTalk + Postgres + ingress controller + Cilium + cert-manager.
  • safety_margin: 10% of node RAM for K8s system pods, CNI, DNS, monitoring.
  • per_tenant_RAM_request: 8 GB baseline.

For 32 GB pilot-prod: floor((32 - 3 - 3.2) / 8) = floor(25.8 / 8) = 3 guaranteed tenants without overcommit. With overcommit, 4–5 is safe for typical alert volumes.
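The same approximation as a small helper, handy for checking other host sizes (the function name and defaults are illustrative, not part of SocTalk tooling):

```python
import math

def max_tenants(node_ram_gb: float, control_plane_gb: float,
                per_tenant_request_gb: float = 8.0,
                safety_margin_frac: float = 0.10) -> int:
    """Guaranteed (non-overcommitted) tenant count for a single node."""
    safety_margin_gb = node_ram_gb * safety_margin_frac
    usable_gb = node_ram_gb - control_plane_gb - safety_margin_gb
    return math.floor(usable_gb / per_tenant_request_gb)

print(max_tenants(16, 2))  # small-dev:  floor((16 - 2 - 1.6) / 8) = 1
print(max_tenants(32, 3))  # pilot-prod: floor((32 - 3 - 3.2) / 8) = 3
```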

Disk sizing drivers

The dominant disk consumer is the Wazuh indexer (stores indexed events). Ingest rate determines growth:

| Alert rate | Daily index size (rough) | Retention 30 days | Retention 90 days |
|---|---|---|---|
| 10 alerts/sec sustained | ~5 GB/day | 150 GB | 450 GB |
| 1 alert/sec sustained | ~500 MB/day | 15 GB | 45 GB |
| 100 alerts/day | ~10 MB/day | 300 MB | 900 MB |

Tenant PVC sizes in the chart default to 50 GB for the Wazuh indexer; MSSPs override per-tenant for high-volume customers.

Retention policy defaults to 30 days of hot data in the indexer; older data is deleted or archived (this release doesn't implement hot→cold tiering; a future release adds it).
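Back-solving the table gives roughly 6 KB of indexed data per sustained alert (fixed index overhead dominates at very low volumes, which is why the 100 alerts/day row doesn't scale linearly). A rough helper under that assumption, to be replaced once pre-release validation yields measured figures:

```python
def indexer_disk_gb(alerts_per_sec: float, retention_days: int,
                    bytes_per_alert: int = 6_000) -> float:
    """Rough Wazuh-indexer disk footprint for a sustained alert rate.

    bytes_per_alert (~6 KB) is inferred from the sizing table above,
    not measured; swap in a real value after the validation spike.
    """
    daily_bytes = alerts_per_sec * 86_400 * bytes_per_alert
    return daily_bytes * retention_days / 1e9  # decimal GB

print(indexer_disk_gb(10, 30))  # ~155 GB (table: ~150 GB)
print(indexer_disk_gb(1, 90))   # ~47 GB  (table: ~45 GB)
```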

Sizing gates

Pre-provisioning check

When an MSSP operator creates a new tenant, the SocTalk controller runs a sanity check:

```
available_RAM = node.allocatable.memory
                - sum(ns.resourceQuota.requests.memory for ns in existing_tenant_namespaces)
                - control_plane_reserve

if (new_tenant.resourceQuota.requests.memory > available_RAM):
    refuse with "insufficient cluster capacity for new tenant"
    or
    prompt MSSP: "this will overcommit; proceed? [y/N]"
```

This gate is softer in this release (warn rather than hard-fail) since MSSPs may intentionally overcommit for light-use customers.
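A minimal sketch of how that check could look with the official Kubernetes Python client, assuming a client version that exports kubernetes.utils.parse_quantity (the function, namespace list, and reserve figure are placeholders, not the actual controller code):

```python
from kubernetes import client, config
from kubernetes.utils import parse_quantity

def available_tenant_ram_gb(tenant_namespaces: list[str],
                            control_plane_reserve_gb: float = 3.0) -> float:
    """Allocatable RAM (GB) left for a new tenant, per the gate above."""
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Single-node install: sum allocatable memory over the (one) node.
    allocatable = sum(
        parse_quantity(node.status.allocatable["memory"])
        for node in v1.list_node().items
    )

    # Memory requests already reserved by existing tenant ResourceQuotas.
    reserved = sum(
        parse_quantity(rq.spec.hard.get("requests.memory", "0"))
        for ns in tenant_namespaces
        for rq in v1.list_namespaced_resource_quota(ns).items
    )

    return float(allocatable - reserved) / 1e9 - control_plane_reserve_gb

if available_tenant_ram_gb(["tenant-acme"]) < 8.0:  # 8 GB baseline request
    print("warning: this will overcommit; proceed? [y/N]")
```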

Per-tenant LimitRange enforcement

Every tenant namespace has a LimitRange:

```yaml
apiVersion: v1
kind: LimitRange
metadata: { name: tenant-limits, namespace: tenant-acme }
spec:
  limits:
    - type: Container
      default:
        memory: "2Gi"
        cpu: "500m"
      defaultRequest:
        memory: "256Mi"
        cpu: "100m"
      max:
        memory: "6Gi"
        cpu: "2"
```

This prevents an accidentally misconfigured pod from requesting 30 GB and starving the node.

Profiles beyond pilot-prod

Documented but not validated in this release:

| Profile | CPU | RAM | Disk | Max tenants |
|---|---|---|---|---|
| mid-host | 16 vCPU | 64 GB | 1 TB | 10–15 |
| large-host | 32 vCPU | 128 GB | 2 TB | 25–30 |
| multi-node cluster | 3 nodes × large-host | — | — | 50+ (a future release's multi-install is recommended instead) |

Recommendation for MSSPs growing past pilot-prod capacity:

  • This release: add a second host and run a second SocTalk install (the schema supports this; tooling is manual).
  • A future release: multi-install automation in the Cloud layer.
  • A future release: clustered K3s with proper scheduling across nodes.

Measurement plan (pre-release validation)

The spike produces real numbers to replace the estimates in the per-tenant footprint table above:

  1. Deploy soctalk-tenant with one tenant on k3d (dev-harness).
  2. Idle measurement: take a kubectl top pod -n tenant-acme snapshot (see the capture sketch after this list).
  3. Load test: inject 10 alerts/sec for 10 minutes; measure peak.
  4. Stop load; measure ~5 minutes later for "warm-idle" numbers.
  5. Repeat with three tenants in parallel to observe interference.
  6. Update this document's tables with measured values.
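A minimal capture sketch for steps 2–4, assuming plain kubectl top sampling (the namespace, cadence, and output file are illustrative; the real dev-harness tooling may differ):

```python
import csv, subprocess, time

def snapshot(namespace: str = "tenant-acme"):
    """One `kubectl top pod` sample as a list of (pod, cpu, memory) tuples."""
    out = subprocess.run(
        ["kubectl", "top", "pod", "-n", namespace, "--no-headers"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split()[:3]) for line in out.splitlines() if line.strip()]

# Sample every 30 s for 10 minutes and dump to CSV for the footprint tables.
with open("tenant-acme-usage.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t_seconds", "pod", "cpu", "memory"])
    for i in range(20):
        for pod, cpu, mem in snapshot():
            writer.writerow([i * 30, pod, cpu, mem])
        time.sleep(30)
```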
