Day-2 Operations
This page covers what you need to run Lunar Hub after the initial install: upgrades, secret rotation, observability, and uninstall. For the per-version upgrade notes (what values changed, what to migrate), see the chart README's Upgrading section.
Upgrading
helm repo update
helm upgrade lunar earthly/lunar \
  --namespace lunar \
  -f values.yaml
The Hub runs migrations automatically on every startup. Migrations are forward-only and an interrupted one is safe to retry.
Before every upgrade:
Back up Postgres. Migrations are forward-only, so the only way to roll back is to restore the database from this backup.
Preview the chart changes. The helm-diff plugin is worth installing: helm diff upgrade lunar earthly/lunar -f values.yaml shows exactly what Kubernetes resources will change.
Read the version-specific notes in the chart README. Minor version bumps occasionally require values changes.
Pin your new image tags in values.yaml. Don't rely on the chart's default tags.
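The first two checklist items can be sketched as a small pre-flight function. This is an illustration, not part of the chart: LUNAR_DATABASE_URL is a hypothetical variable holding your Postgres connection URI, and the sketch assumes pg_dump and the helm-diff plugin are already installed.

```shell
# Pre-upgrade sketch. Assumptions (not from the chart): LUNAR_DATABASE_URL is a
# hypothetical variable holding a Postgres connection URI, and the helm-diff
# plugin is installed.
pre_upgrade_check() {
  backup_file="lunar-$(date +%Y%m%d-%H%M%S).dump"

  # 1. Back up Postgres in custom format so pg_restore can roll it back.
  pg_dump --format=custom --file="$backup_file" "$LUNAR_DATABASE_URL" || return 1

  # 2. Refresh the chart index, then preview what the upgrade would change.
  helm repo update || return 1
  helm diff upgrade lunar earthly/lunar --namespace lunar -f values.yaml
}
```

Call pre_upgrade_check from the directory that holds values.yaml and review the diff before running the actual upgrade.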
The Hub runs as a single replica today. Kubernetes' default rolling-update strategy briefly runs two Hubs (old and new) during an upgrade; the new Hub blocks on migrations while the old one still holds DB connections. Plan for about one minute of Hub API unavailability during an upgrade. CI agents, collectors, policies, and catalogers should retry transparently across this window.
Rotating secrets
Lunar's secrets split into two categories — user-managed (you create and rotate them) and chart-managed (auto-generated on first install, preserved across upgrades via helm.sh/resource-policy: keep). See Required secrets in the chart README for the full list and naming rules.
Hub auth token (<release>-auth-token)
Used by the CLI and CI agents to authenticate to the Hub. The Hub accepts a single token today, so rotation requires a coordinated cutover. To force a regeneration, delete the secret and re-run helm upgrade; the chart then generates a fresh token.
Both the Hub and the Operator read this token from the same secret (the Operator uses it as OPERATOR_HUB_TOKEN to call the Hub), so restart both. Retrieve the new token with kubectl -n lunar get secret lunar-auth-token -o jsonpath='{.data.token}' | base64 -d, then update every CI agent's LUNAR_HUB_TOKEN and every developer's CLI config. Active builds authenticated with the old token will fail mid-run and need to be retried. Dual-token support is on the roadmap.
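As a rough sketch, the rotation described above could look like the function below for a release named lunar in the lunar namespace. The Operator deployment name lunar-operator is an assumption for illustration; check kubectl -n lunar get deployments for the real name.

```shell
# Auth-token rotation sketch for release "lunar" in namespace "lunar".
# Assumption: the Operator deployment is named lunar-operator (hypothetical).
rotate_auth_token() {
  # Delete the chart-managed secret so the next upgrade regenerates it.
  kubectl -n lunar delete secret lunar-auth-token || return 1
  helm upgrade lunar earthly/lunar --namespace lunar -f values.yaml || return 1

  # Both consumers read the token from the same secret, so restart both.
  kubectl -n lunar rollout restart deployment/lunar-hub || return 1
  kubectl -n lunar rollout restart deployment/lunar-operator || return 1

  # Print the new token for distribution to CI agents and CLI users.
  kubectl -n lunar get secret lunar-auth-token -o jsonpath='{.data.token}' | base64 -d
}
```

Wrapping the steps in a function means nothing destructive runs until you call rotate_auth_token deliberately, ideally in a window where no builds are in flight.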
To pin to an externally-managed token instead, set hub.auth.secretName in values to a secret you control — the chart will consume it instead of generating its own.
GitHub App private key (lunar-github-app)
GitHub supports multiple active private keys per App, so you can rotate with zero downtime:
Generate a new key in GitHub → Apps → your App → Private keys → Generate a private key. Download the new PEM.
Update the Kubernetes secret in place, matching the install-time encoding (the Hub expects a base64-encoded PEM inside the secret value).
Restart the Hub: kubectl -n lunar rollout restart deployment/lunar-hub.
Verify webhooks still deliver successfully (GitHub → any repo → Settings → Webhooks → Recent Deliveries).
Delete the old key in GitHub.
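The secret update and Hub restart from the steps above might be scripted roughly as follows. The key name private-key inside the secret is a placeholder chosen for this sketch; use whatever key name your install created.

```shell
# GitHub App key rotation sketch. Assumption: the secret stores the PEM under
# a key named "private-key" (hypothetical; match your install-time key name).
update_github_app_key() {
  pem_file="$1"

  # The Hub expects the PEM base64-encoded inside the secret value, so encode
  # it once here; Kubernetes adds its own base64 layer when storing it.
  encoded_pem=$(base64 < "$pem_file" | tr -d '\n')

  # Re-render the secret and apply it over the existing one.
  kubectl -n lunar create secret generic lunar-github-app \
    --from-literal=private-key="$encoded_pem" \
    --dry-run=client -o yaml | kubectl -n lunar apply -f - || return 1

  kubectl -n lunar rollout restart deployment/lunar-hub
}
```

Note that this re-renders the whole secret, so if your install stored other keys in lunar-github-app, add matching --from-literal flags or they will be dropped.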
GitHub webhook secret (<release>-github-webhook)
Changing the webhook secret requires re-registering every repo's webhook. The easier path is to delete the secret, let the chart regenerate it, and then re-pull your primary config to trigger re-registration.
During the window between Hub restart and lunar hub pull, per-repo webhooks from GitHub are still signed with the old secret, and the new Hub rejects them with a 500. GitHub does not auto-retry failed webhook deliveries — any events in that window are lost unless you manually redeliver them from Settings → Webhooks → Recent Deliveries on the affected repos. Minimize the gap by running lunar hub pull immediately after the Hub restart comes up healthy.
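To keep that window short, the sequence can be chained so the config pull runs as soon as the Hub rollout reports healthy. The sketch below assumes a release named lunar; any arguments your setup passes to lunar hub pull are not shown on this page and are left out.

```shell
# Webhook-secret rotation sketch for release "lunar" in namespace "lunar".
rotate_webhook_secret() {
  # Delete the chart-managed secret and let the upgrade regenerate it.
  kubectl -n lunar delete secret lunar-github-webhook || return 1
  helm upgrade lunar earthly/lunar --namespace lunar -f values.yaml || return 1

  # Restart the Hub and block until the new pod is ready.
  kubectl -n lunar rollout restart deployment/lunar-hub || return 1
  kubectl -n lunar rollout status deployment/lunar-hub --timeout=180s || return 1

  # Re-pull the config right away to trigger webhook re-registration.
  # Add whatever arguments your environment normally uses.
  lunar hub pull
}
```

After it finishes, check Recent Deliveries on a few busy repos and redeliver anything that failed during the gap.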
Database password (lunar-db)
Standard Kubernetes secret rotation. If your Postgres supports multiple active passwords (e.g. RDS password grace), rotate without downtime:
Set the new password in Postgres (keeping the old one active).
Update the lunar-db secret.
Restart the Hub.
Revoke the old password.
If your Postgres doesn't support multiple active passwords, expect ~1 minute of Hub unavailability during the rotation.
Other secrets
A few secrets don't get their own recipe above:
| Secret | Managed by | Consumed by |
| --- | --- | --- |
| <release>-grafana-admin | Chart | lunar-grafana deployment |
| Hub licence JWT (example lunar-hub-licence) | You | Hub |
| Per-scope runtime secrets (hub.secrets.{collector,cataloger,policy}.secretName) | You | Hub |
The rotation shape follows the same split as the sections above:
Chart-managed (Grafana admin): kubectl delete secret <name> → helm upgrade to regenerate → restart the consumer deployment.
User-managed (hub licence JWT, runtime secrets): update the secret in place (kubectl apply or kubectl create --dry-run=client -o yaml | kubectl apply -f -) → restart the consumer(s).
To find what consumes an arbitrary secret:
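One crude way to do this, offered as a sketch rather than the chart's own tooling, is to scan each Deployment's manifest for a reference to the secret name. The function name is made up for this example, and the text match can produce false positives (for instance a ConfigMap or container that shares the secret's name).

```shell
# Sketch: list Deployments in a namespace whose manifest mentions a secret.
# Matches "secretName: <secret>" (volumes) and "name: <secret>" (secretKeyRef
# and secretRef). Text matching only, so treat the result as a hint.
find_secret_consumers() {
  ns="$1"
  secret="$2"
  for d in $(kubectl -n "$ns" get deployments -o name); do
    if kubectl -n "$ns" get "$d" -o yaml |
      grep -Eq "(secretName|name): *\"?${secret}\"?\$"; then
      echo "$d"
    fi
  done
}
```

For example, find_secret_consumers lunar lunar-auth-token prints one line per Deployment that references the auth-token secret.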
Observability
Logs
The Hub and Operator log in structured JSON by default (logging.format: json). Log level defaults to info; raise to debug temporarily by editing values.yaml and running helm upgrade.
The Hub and Operator don't ship logs anywhere themselves — they stream to stdout. To get them into your log aggregator, point a cluster-level log shipper (Fluent Bit, Vector, Datadog Agent, etc.) at the lunar namespace.
Telemetry to Earthly
Hub tenant identity and telemetry destinations come from the signed licence JWT, not from HUB_TENANT_ID, HUB_ELASTIC_*, or HUB_OTEL_* environment variables.
The chart handles the mount and HUB_LICENCE_FILE wiring for you. Keep (or override) the licence secret reference in values:
Rotation flow:
Request a new JWT from Earthly.
Update the Kubernetes secret value (hub-licence.jwt) in place.
The Hub re-validates every 5 minutes and also enforces the exact exp boundary.
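The in-place update in the flow above could be done along these lines. The sketch assumes the licence secret is named lunar-hub-licence (the example name used on this page) and lives in the lunar namespace; adjust both if your values reference a different secret.

```shell
# Licence rotation sketch. Assumption: the secret is named lunar-hub-licence
# in namespace lunar and stores the JWT under the key hub-licence.jwt.
update_licence() {
  jwt_file="$1"

  # Re-render the secret from the new JWT file and apply it over the old one.
  kubectl -n lunar create secret generic lunar-hub-licence \
    --from-file=hub-licence.jwt="$jwt_file" \
    --dry-run=client -o yaml | kubectl -n lunar apply -f -
}
```

Usage would be update_licence ./new-licence.jwt, where the file path is whatever you saved the new JWT as.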
Metrics
The Hub exports request-duration histograms (HTTP and gRPC) via OTLP when the active licence includes telemetry.otel.endpoint and telemetry.otel.token.
If telemetry.otel is omitted, the Hub falls back to writing metrics to a local temp file.
TLS for OTLP is not implemented yet, so OTLP export is unencrypted today.
The Hub does not export distributed traces today — only metrics. Trace support is on the roadmap.
Diagnostics bundle
The Hub can produce a diagnostics bundle (Postgres queue state, recent error logs, slow-query stats) for support requests. The bundle gathers extra Postgres telemetry when the pg_stat_statements extension is enabled — without it the bundle still works, just with less query-performance data.
To enable it:
Amazon RDS / Aurora: add pg_stat_statements to shared_preload_libraries in the DB parameter group, then reboot the instance.
Self-managed: add pg_stat_statements to shared_preload_libraries in postgresql.conf and restart, then run CREATE EXTENSION pg_stat_statements; as a superuser.
The Hub does not require this extension — it's only used by the diagnostics path.
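Once the server has been restarted with the library preloaded, the extension step and a quick check can be sketched as below. The function name is invented for this example, and the connection URI you pass must belong to a role allowed to create extensions.

```shell
# Sketch: create the extension (idempotently) and show the preload setting.
# Argument: a Postgres connection URI for a sufficiently privileged role.
enable_pg_stat_statements() {
  db_url="$1"
  psql "$db_url" --tuples-only --no-align \
    -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;" \
    -c "SHOW shared_preload_libraries;"
}
```

If the second command's output does not list pg_stat_statements, the preload step has not taken effect yet and the server still needs the configuration change and a restart.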
Scaling
Lunar Hub runs as a single replica today. Vertical scaling (CPU, memory, Postgres connection pool via HUB_DB_MAX_OPEN_CONNS / HUB_DB_MAX_POOL_CONNS) is the way to handle more load. Horizontal scaling of the Hub is not yet supported.
Run throughput scales independently — raise operator.maxConcurrent and size run pod resources accordingly. See operator.scriptContainerSpec* in the chart README.
Prioritizing core services over script pods
By default the Operator creates script pods (collectors, policies, catalogers) at the cluster's default priority — the same priority as the Hub, Operator, and Grafana. When those script pods share a node with the core services, a burst of runs can drive the node into memory pressure and the kubelet may evict a core pod when it should be shedding an ephemeral script instead.
This matters in two setups:
Single-namespace installs — the Operator runs scripts in the same namespace (and usually the same node group) as the Hub, so core and ephemeral workloads compete for the same memory.
Shared scratch namespaces or node groups — scripts land on nodes shared with other workloads, and you want Lunar's ephemeral pods to yield first under contention.
operator.scriptPodPriorityClassName (chart >= 2.3.0) sets the PriorityClass on every script pod the Operator creates. Point it at a class whose value sits below your core services, and the kubelet evicts scripts first under pressure — the scheduler also won't preempt a higher-priority pod to make room for a script.
Setup
Create the PriorityClass. The chart references it by name but does not create it.
Point the Operator at it in values.yaml.
Upgrade and restart the Operator. Only script pods created after the restart inherit the class.
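The three setup steps might look like this end to end. The class name lunar-scripts, the value -10, and the Operator deployment name lunar-operator are all placeholders picked for this sketch; only operator.scriptPodPriorityClassName comes from the chart.

```shell
# Script-pod priority sketch. Assumptions: class name "lunar-scripts", value
# -10, and deployment name "lunar-operator" are illustrative, not chart defaults.
setup_script_priority() {
  # 1. Create the cluster-scoped PriorityClass before anything references it.
  kubectl apply -f - <<'EOF' || return 1
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: lunar-scripts
value: -10
globalDefault: false
description: "Ephemeral Lunar script pods; evicted before core services."
EOF

  # 2. Point the Operator at the class. The values.yaml equivalent is:
  #      operator:
  #        scriptPodPriorityClassName: lunar-scripts
  helm upgrade lunar earthly/lunar --namespace lunar -f values.yaml \
    --set operator.scriptPodPriorityClassName=lunar-scripts || return 1

  # 3. Restart the Operator so newly created script pods pick up the class.
  kubectl -n lunar rollout restart deployment/lunar-operator
}
```

Prefer committing the setting to values.yaml over the --set flag for anything long-lived, so the next upgrade does not silently drop it.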
Leave the Hub, Operator, and Grafana on the cluster default priority — don't set a class on them. The gap between their 0 and the script class's negative value is what orders eviction; you lower scripts rather than raise core services. In a shared scratch node group, pick a value below whatever neighbouring workloads run at so Lunar's scripts are the first thing to go.
PriorityClasses are cluster-scoped, and a pod that references one that doesn't exist is rejected at admission — so the Operator will fail to launch scripts until the class is applied. Create the class before (or in the same change as) the helm upgrade, and delete it only once no workload references it.
Priority orders who gets evicted first — it is not a resource cap. Keep your script pod requests/limits set (see operator.scriptContainerSpec* in the chart README) so a single run can't consume a whole node before the kubelet reacts. Priority decides the order; requests and limits decide the ceiling.
Backup and disaster recovery
Lunar's authoritative state lives in the systems you already back up:
Postgres: use your existing Postgres backup process. Everything the Hub cares about (components, policies, run history, queue state) is here.
S3 buckets: enable object versioning on both buckets and, if compliance requires, cross-region replication. Losing resource archives forces the cache of catalogers, collectors, and policies to be rebuilt; losing log archives means lost history. Neither impedes Hub operation.
Hub PVC: the Hub's PVC at /var/lib/lunar holds the local git repo cache and materialized run bundles. It is regenerable; the Hub repopulates it on startup from Postgres and S3. You do not need to back it up.
Kubernetes secrets: keep the GitHub App PEM, DB credentials, auth token, and webhook secret in your organization's secret-management system of record. Losing them means re-provisioning.
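If the buckets live in AWS S3, versioning can be switched on with the AWS CLI, roughly as follows. The function name and the bucket names in the usage line are placeholders; S3-compatible stores may need a different tool or an endpoint override.

```shell
# Sketch: enable object versioning on each bucket passed as an argument.
# Assumes AWS S3 and an AWS CLI already configured with suitable credentials.
enable_bucket_versioning() {
  for bucket in "$@"; do
    aws s3api put-bucket-versioning \
      --bucket "$bucket" \
      --versioning-configuration Status=Enabled || return 1
  done
}

# Usage (bucket names are hypothetical):
#   enable_bucket_versioning my-lunar-resources my-lunar-logs
```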
Uninstalling
Running helm uninstall removes all resources the chart created: the Hub and Operator deployments, services, ingress, RBAC, and service accounts.
Not removed automatically:
The lunar namespace itself (and, if used, the Operator's execution namespace).
Kubernetes secrets you created manually (e.g. lunar-db, lunar-github-app, lunar-hub-licence).
Chart-managed secrets (<release>-auth-token, <release>-github-webhook, <release>-grafana-admin). These carry helm.sh/resource-policy: keep so a re-install reuses the same values, by design. Delete them explicitly if you want fresh credentials on the next install.
The Hub's PVC (<release>-hub by default). Delete with kubectl -n lunar delete pvc -l app.kubernetes.io/instance=lunar if you want the state gone.
Your Postgres database, S3 buckets, or their contents.
The GitHub App. Webhooks previously registered by this Hub are tagged with the Hub's instance ID (earthly-lunar-<tenant-id-from-licence> as a URL fragment); they remain on your repos until you remove them manually or install a fresh Hub with the same tenant ID, which will clean up its own stale webhooks on next config pull.
To fully tear down:
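A teardown along these lines covers the in-cluster leftovers listed above for a release named lunar. It is a sketch based on the example names on this page: adjust the secret names to match your install, and remember that a separate Operator execution namespace, Postgres, S3, the GitHub App, and any registered webhooks are outside its reach.

```shell
# Full teardown sketch for release "lunar" in namespace "lunar".
# Secret names follow the examples on this page; adjust to your install.
teardown_lunar() {
  helm uninstall lunar --namespace lunar || return 1

  # Chart-managed secrets survive uninstall because of the keep policy.
  kubectl -n lunar delete secret --ignore-not-found \
    lunar-auth-token lunar-github-webhook lunar-grafana-admin

  # Secrets you created manually.
  kubectl -n lunar delete secret --ignore-not-found \
    lunar-db lunar-github-app lunar-hub-licence

  # The Hub's PVC, then the namespace itself.
  kubectl -n lunar delete pvc -l app.kubernetes.io/instance=lunar
  kubectl delete namespace lunar --ignore-not-found
}
```

Keeping it in a function means the destructive commands only run when teardown_lunar is called explicitly.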
Getting help
For enterprise onboarding or production sizing guidance, contact the Earthly team.