Running in Production
Security hardening
Container security
The mc-operator container image is built on Ubuntu Noble Chiseled, an ultra-minimal, distroless-style image with:
- No shell, no package manager, no curl/wget
- Minimal attack surface: only the ASP.NET Core runtime and direct dependencies
- Non-root execution by default (built-in `appuser`, UID 1654)
The Helm chart enforces the security context at the pod level:

    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      fsGroup: 1001

The operator requires cluster-scoped RBAC to watch MinecraftServer resources across namespaces and manage apps/statefulsets. The Helm chart creates a ClusterRole with exactly the permissions needed and nothing more. Review charts/mc-operator/templates/clusterrole.yaml for the exact grants.
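One way to review those grants is to render only the ClusterRole template before installing, then spot-check a permission against the live cluster afterwards. The following is a minimal sketch, not the project's documented procedure: the function names are hypothetical, and the service account name `mc-operator` is an assumption that should be replaced with whatever the chart actually creates.

```shell
# Render only the ClusterRole template from the chart so the grants can be read offline.
render_clusterrole() {
  helm template mc-operator oci://ghcr.io/danihengeveld/charts/mc-operator \
    --namespace mc-operator-system \
    --show-only templates/clusterrole.yaml
}

# Ask the API server whether the operator's service account may perform a verb.
# $1 = verb, $2 = resource, $3 = service account name (assumed default: mc-operator)
operator_can_i() {
  kubectl auth can-i "$1" "$2" --all-namespaces \
    --as="system:serviceaccount:mc-operator-system:${3:-mc-operator}"
}
```

For example, `operator_can_i list statefulsets.apps` should answer `yes`, while a verb the operator has no reason to need, such as `operator_can_i delete nodes`, should answer `no`.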
Webhook TLS
The Helm chart automatically generates self-signed TLS certificates for the webhook service and injects the caBundle into the webhook configurations. Certificates are valid for 10 years and persist across helm upgrade operations.
For environments that require certificate rotation or integration with an organizational PKI, you can use cert-manager or another certificate management solution alongside the operator.
High availability
Multiple operator replicas
The operator supports running multiple replicas with built-in Kubernetes leader election. Leader election is automatically enabled when deploying more than one replica:

    replicaCount: 2

Only one replica is the active leader at any time. Passive replicas take over within seconds of leader failure. There is no cache to warm up: the operator is stateless between reconciles.
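As a rough sketch, the replica count can also be changed on an existing release from the command line, and the current leader inspected afterwards. The function names are hypothetical, and the second function assumes the leader election is backed by a Lease object in the operator namespace, which the text above does not confirm.

```shell
# Scale the operator to N replicas through the chart's replicaCount value,
# keeping every other previously supplied value unchanged.
# $1 = desired replica count
set_operator_replicas() {
  helm upgrade mc-operator oci://ghcr.io/danihengeveld/charts/mc-operator \
    --namespace mc-operator-system \
    --reuse-values \
    --set replicaCount="$1"
}

# List Lease objects in the operator namespace; the HOLDER column shows
# which replica currently leads (assumes Lease-based leader election).
show_operator_leader() {
  kubectl get leases --namespace mc-operator-system
}
```

Because no `--version` is passed, `set_operator_replicas 2` would also move the release to the latest chart version; pin `--version` if that is not wanted.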
Operator reliability
The operator uses a 5-minute requeue on success (drift detection) and a 30-second requeue on error. This means:
- Spec changes are applied within seconds
- Transient errors (e.g. Kubernetes API throttling) self-heal quickly
- Continuous reconciliation catches out-of-band changes to child resources
Monitoring
The operator does not yet export Prometheus metrics directly. You can monitor the operator through:
- Pod logs: The operator logs reconcile cycles, errors, and phase transitions.
- Kubernetes events: Check `kubectl get events -n mc-operator-system`.
- MinecraftServer status: `kubectl get mcs -n minecraft` shows Phase and Ready columns.

    # Watch all servers across all namespaces
    kubectl get minecraftservers -A -w

    # Inspect a specific server's status
    kubectl describe minecraftserver paper-survival -n minecraft

Backups
mc-operator does not implement automated backups in v1. Recommended approaches:
PVC snapshots with Velero
Velero can take VolumeSnapshot-backed PVC backups on a schedule:

    velero backup create minecraft-backup \
      --include-namespaces minecraft \
      --snapshot-volumes=true

Manual backup via kubectl cp
For a running server, you can copy world data out while the server is paused:

    # Pause the server first
    kubectl patch mcs paper-survival -n minecraft \
      --type merge -p '{"spec": {"replicas": 0}}'

    # Copy world data
    kubectl exec -n minecraft data-paper-survival-0 -c minecraft -- \
      tar czf - /data > backup-$(date +%Y%m%d).tar.gz

    # Resume
    kubectl patch mcs paper-survival -n minecraft \
      --type merge -p '{"spec": {"replicas": 1}}'

Resource sizing guidelines
| Players | CPU Request | Memory Request | JVM Heap |
|---|---|---|---|
| 1–5 | 250m | 1.5Gi | 1G max |
| 5–20 | 500m | 2.5Gi | 2G max |
| 20–50 | 1–2 | 5Gi | 4G max |
| 50+ | 2–4 | 8Gi+ | 6–8G max |
These are rough guidelines. Profile your specific server (plugins, world size, chunk loading patterns) for accurate sizing.
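For scripted provisioning, the table can be encoded as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and assigning the overlapping boundaries (5, 20, 50 players) to the smaller tier is an assumption, not something the guidelines specify.

```shell
# Print the guideline row (CPU request, memory request, JVM max heap)
# for an expected peak player count. $1 = player count (integer).
# Boundary values (5, 20, 50) are assigned to the smaller tier by assumption.
suggest_sizing() {
  if [ "$1" -le 5 ]; then
    echo "cpu=250m memory=1.5Gi heap=1G"
  elif [ "$1" -le 20 ]; then
    echo "cpu=500m memory=2.5Gi heap=2G"
  elif [ "$1" -le 50 ]; then
    echo "cpu=1-2 memory=5Gi heap=4G"
  else
    echo "cpu=2-4 memory=8Gi+ heap=6-8G"
  fi
}
```

The output is a starting point to feed into a values file or manifest template; profiling should still decide the final numbers.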
Storage class selection
Use a high-throughput, low-latency StorageClass for Minecraft world data. World I/O is write-heavy (chunk saving).
By default the operator creates PVCs with ReadWriteOnce access mode. When spec.prePull: true is set, the PVC is created with ReadWriteMany so a short-lived pre-pull Job can mount the data volume simultaneously with the running server pod during version upgrades. If you enable pre-pull, ensure your StorageClass supports ReadWriteMany:

    prePull: true
    storage:
      storageClassName: "premium-rwx" # Cloud-provider SSD class with RWX support
      size: "30Gi"

Common StorageClass options with ReadWriteMany support:
- GKE: Filestore (`ReadWriteMany`) or standard Persistent Disk in single-node clusters
- EKS: EFS (with the EFS CSI driver)
- AKS: Azure Files (`azurefile-csi`)
- On-prem: Longhorn, Rook-Ceph (CephFS), NFS-based CSI drivers
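To confirm which access mode and StorageClass a data volume actually received, the PVCs can be listed with custom columns. This is a small sketch; the function name is hypothetical and the default namespace `minecraft` is taken from the examples in this guide.

```shell
# List PVCs in a namespace with their access modes and StorageClass,
# to confirm whether a data volume was provisioned as RWO or RWX.
# $1 = namespace (defaults to minecraft)
list_pvc_access_modes() {
  kubectl get pvc --namespace "${1:-minecraft}" \
    -o custom-columns='NAME:.metadata.name,MODES:.spec.accessModes,CLASS:.spec.storageClassName'
}
```

A PVC still stuck in `Pending` after enabling pre-pull is often a sign that the chosen StorageClass cannot provision `ReadWriteMany` volumes; `kubectl describe pvc` shows the provisioner's error in that case.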
Namespace strategy
Deploy each server environment to its own namespace:

    kubectl create namespace minecraft-production
    kubectl create namespace minecraft-staging

This provides:
- Clear resource isolation
- Namespace-scoped RBAC if needed
- Easy cost attribution via namespace labels
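The namespace setup can be made repeatable with a helper that creates the namespace idempotently and attaches a label for cost attribution. This is a sketch under assumptions: the function name and the label key `cost-center` are hypothetical, so substitute whatever key your cost tooling reads.

```shell
# Create a namespace if it does not exist yet, then attach a cost label.
# $1 = namespace, $2 = label value; the label key "cost-center" is hypothetical.
create_labeled_namespace() {
  # Render the manifest client-side and apply it, so reruns do not fail
  # with "already exists".
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
  kubectl label namespace "$1" "cost-center=$2" --overwrite
}
```

For example, `create_labeled_namespace minecraft-production games` can be run repeatedly without error.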
Upgrade strategy
Upgrading the operator

    helm upgrade mc-operator oci://ghcr.io/danihengeveld/charts/mc-operator \
      --version <new-version> \
      --namespace mc-operator-system

Operator upgrades do not affect running Minecraft servers. The reconciler is non-destructive by default: it updates child resources only when the spec changes.
Upgrading the CRD
CRD upgrades may include schema changes. Always apply the new CRD before upgrading the operator:

    kubectl apply -f https://raw.githubusercontent.com/danihengeveld/mc-operator/v<version>/manifests/crd/minecraftservers.yaml
    helm upgrade mc-operator ...

Updating Minecraft server versions

    kubectl patch minecraftserver paper-survival -n minecraft \
      --type merge -p '{"spec": {"server": {"version": "1.21.0"}}}'

When `spec.prePull: true` is set, the operator detects the image change and begins a zero-downtime upgrade sequence:
1. A short-lived `batch/v1` pre-pull Job is created on the server’s current node. It mounts the data PVC and runs the itzg startup scripts with a fake `java` stub. This downloads the new server jar to the PVC while the old server is still running, then exits 0.
2. Once the Job completes, the StatefulSet rolling update is applied. Because both the OCI image layers and the server jar are already present, the new pod starts almost immediately.
The server status message shows "Pre-pulling image: <image>" during step 1. You can watch the upgrade:

    kubectl get minecraftserver paper-survival -n minecraft -w
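Besides watching the MinecraftServer resource, the child resources can be followed directly. The sketch below lists the Jobs in the namespace (where the pre-pull Job would appear) and then blocks until the StatefulSet rollout completes. The function name is hypothetical, and it assumes the StatefulSet carries the same name as the server, which may not hold for your installation.

```shell
# Follow a pre-pull upgrade: list Jobs in the namespace, then block until
# the StatefulSet rollout finishes or the timeout expires.
# $1 = namespace, $2 = StatefulSet name (assumed equal to the server name)
follow_upgrade() {
  kubectl get jobs --namespace "$1"
  kubectl rollout status "statefulset/$2" --namespace "$1" --timeout=10m
}
```

For example, `follow_upgrade minecraft paper-survival` returns a non-zero exit code if the rollout does not finish within ten minutes, which makes it usable as a gate in a deployment script.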