# Day-2 Operations

This page covers what you need to run Lunar Hub after the initial install: upgrades, secret rotation, observability, and uninstall. For the per-version upgrade notes (what values changed, what to migrate), see the [chart README's Upgrading section](https://github.com/earthly/charts/blob/main/README.md#upgrading).

## Upgrading

```bash
helm repo update
helm upgrade lunar earthly/lunar \
  --namespace lunar \
  -f values.yaml
```

The Hub runs migrations automatically on every startup. Migrations are forward-only and an interrupted one is safe to retry.

**Before every upgrade:**

1. **Back up Postgres.** Migrations are forward-only, so rolling back the Hub also means restoring the database from this backup.
2. **Preview the chart changes.** The [`helm-diff`](https://github.com/databus23/helm-diff) plugin is worth installing: `helm diff upgrade lunar earthly/lunar -n lunar -f values.yaml` shows exactly which Kubernetes resources will change.
3. **Read the version-specific notes** in the [chart README](https://github.com/earthly/charts/blob/main/README.md#upgrading). Minor version bumps occasionally require values changes.
4. **Pin your new image tags** in `values.yaml`. Don't rely on the chart's default tags.
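
The first two checklist steps can be scripted. The sketch below is a hypothetical helper, not part of the chart: it assumes `pg_dump`, `helm`, and the `helm-diff` plugin are on your `PATH`, and that you pass a libpq connection URI for the Hub database. It deliberately stops before `helm upgrade` so you can read the diff and the version notes first.

```bash
# Hypothetical pre-upgrade helper (not shipped with the chart).
# Usage: lunar_pre_upgrade "postgres://user@host:5432/lunar" [values.yaml]
lunar_pre_upgrade() {
  local db_url="$1"
  local values="${2:-values.yaml}"
  local backup
  backup="lunar-$(date +%Y%m%d-%H%M%S).dump"

  # 1. Back up Postgres first: migrations are forward-only, so this dump is
  #    the rollback path. Abort if the dump fails.
  pg_dump --format=custom --file="$backup" "$db_url" || return 1
  echo "backup written to $backup"

  # 2. Refresh the chart index and preview what the upgrade would change.
  helm repo update || return 1
  helm diff upgrade lunar earthly/lunar --namespace lunar -f "$values"
}
```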

{% hint style="warning" %}
The Hub runs as a single replica today. The Deployment's default rolling-update strategy briefly runs two Hubs (old and new) during an upgrade, and the new Hub blocks on migrations while the old one still holds DB connections. Plan for about a minute of Hub API unavailability during an upgrade. CI agents, collectors, policies, and catalogers should retry transparently across this window.
{% endhint %}

## Rotating secrets

Lunar's secrets split into two categories — **user-managed** (you create and rotate them) and **chart-managed** (auto-generated on first install, preserved across upgrades via `helm.sh/resource-policy: keep`). See [Required secrets](https://github.com/earthly/charts/blob/main/README.md#required-secrets) in the chart README for the full list and naming rules.

### Hub auth token (`<release>-auth-token`)

Used by the CLI and CI agents to authenticate to the Hub. The Hub accepts a single token today — rotation requires a coordinated cutover. To force a regeneration, delete the secret and re-run `helm upgrade` (the chart will generate a fresh token):

```bash
kubectl -n lunar delete secret lunar-auth-token
helm upgrade lunar earthly/lunar -n lunar -f values.yaml
kubectl -n lunar rollout restart deployment/lunar-hub
kubectl -n lunar rollout restart deployment/lunar-operator
```

Both the Hub and the Operator read this token from the same secret (the Operator uses it as `OPERATOR_HUB_TOKEN` to call the Hub), so restart both. Retrieve the new token with `kubectl -n lunar get secret lunar-auth-token -o jsonpath='{.data.token}' | base64 -d`, then update every CI agent's `LUNAR_HUB_TOKEN` and every developer's CLI config. Active builds authenticated with the old token will fail mid-run and need to be retried. Dual-token support is on the roadmap.

To use an externally managed token instead, set `hub.auth.secretName` in values to a secret you control; the chart then consumes it instead of generating its own.
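
A minimal sketch of switching to an externally managed token, assuming your secret stores the token under a `token` key (the key the chart-generated secret uses, per the retrieval command above); check the chart README for the exact key the chart expects. The helper name is hypothetical. It expects `hub.auth.secretName` to already be set in `values.yaml` so later upgrades keep using the secret, and it restarts both the Hub and the Operator because both read the token.

```bash
# Hypothetical helper. Before running it, set hub.auth.secretName to the
# same secret name in values.yaml so later upgrades keep using it.
# Usage: use_external_hub_token my-hub-token "$NEW_TOKEN"
use_external_hub_token() {
  local secret="$1" token="$2"

  # Create or update the secret you own (idempotent).
  kubectl -n lunar create secret generic "$secret" \
    --from-literal=token="$token" \
    --dry-run=client -o yaml | kubectl -n lunar apply -f - || return 1

  # Re-render the chart with the updated values.yaml.
  helm upgrade lunar earthly/lunar -n lunar -f values.yaml || return 1

  # The Hub and the Operator both read this token, so restart both.
  kubectl -n lunar rollout restart deployment/lunar-hub &&
    kubectl -n lunar rollout restart deployment/lunar-operator
}
```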

### GitHub App private key (`lunar-github-app`)

GitHub supports multiple active private keys per App — rotate with zero downtime:

1. **Generate a new key** in GitHub → **Apps → your App → Private keys → Generate a private key**. Download the new PEM.
2. **Update the Kubernetes secret** in place, matching the install-time encoding (the Hub expects a base64-encoded PEM inside the secret value):

   ```bash
   kubectl -n lunar create secret generic lunar-github-app \
     --from-literal=private-key="$(base64 < path/to/new-key.pem | tr -d '\n')" \
     --dry-run=client -o yaml | kubectl apply -f -
   ```
3. **Restart the Hub** — `kubectl -n lunar rollout restart deployment/lunar-hub`.
4. **Verify** webhooks still deliver successfully (GitHub → any repo → **Settings → Webhooks → Recent Deliveries**).
5. **Delete the old key** in GitHub.
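
Before deleting the old key in step 5, it can be worth confirming that the secret really decodes to a PEM. The sketch below is a hypothetical check that assumes the `private-key` key name and the layered encoding from step 2: Kubernetes base64-encodes secret data, and the value itself is a base64-encoded PEM, so it is decoded twice.

```bash
# Hypothetical check: succeeds only if lunar-github-app holds a
# base64-encoded PEM under the "private-key" key, as created in step 2.
check_github_app_key() {
  # go-template's index function handles the hyphenated key name.
  kubectl -n lunar get secret lunar-github-app \
      -o go-template='{{index .data "private-key"}}' |
    base64 -d |   # undo the Kubernetes Secret encoding
    base64 -d |   # undo the base64 wrapping the Hub expects
    grep -q -- '-----BEGIN .*PRIVATE KEY-----'
}
```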

### GitHub webhook secret (`<release>-github-webhook`)

Changing the webhook secret requires re-registering every repo's webhook. The simplest path is to delete the secret so the chart regenerates it, then re-pull your primary config to trigger re-registration:

```bash
kubectl -n lunar delete secret lunar-github-webhook
helm upgrade lunar earthly/lunar -n lunar -f values.yaml
kubectl -n lunar rollout restart deployment/lunar-hub

# Re-pull triggers webhook re-registration on every repo in your config.
lunar hub pull github://your-org/your-config-repo@main
```

During the window between the Hub restart and `lunar hub pull`, per-repo webhooks from GitHub are still signed with the old secret, and the new Hub rejects them with a `500`. **GitHub does not auto-retry failed webhook deliveries**, so any events in that window are lost unless you manually redeliver them from **Settings → Webhooks → Recent Deliveries** on the affected repos. Keep the gap short by running `lunar hub pull` as soon as the restarted Hub is healthy.
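
If events were dropped during that window, redelivery can be scripted with the GitHub CLI against the REST API's webhook-delivery endpoints. This is a hypothetical sketch, not a Lunar command: it assumes `gh` is authenticated with admin access to the repo, finds Lunar's hook by the `#earthly-lunar-` URL fragment described under Uninstalling, and only looks at the first page of recent deliveries. Run it once per affected repo, after `lunar hub pull` has re-registered the webhooks.

```bash
# Hypothetical helper: redeliver failed deliveries for Lunar's webhook(s)
# on one repo. Usage: redeliver_failed_lunar_webhooks your-org/your-repo
redeliver_failed_lunar_webhooks() {
  local repo="$1" hook id
  # Lunar-registered hooks carry an "#earthly-lunar-..." URL fragment.
  for hook in $(gh api "repos/$repo/hooks" \
      --jq '.[] | select((.config.url // "") | contains("#earthly-lunar-")) | .id'); do
    # Deliveries the Hub rejected (e.g. the 500s above); first page only.
    for id in $(gh api "repos/$repo/hooks/$hook/deliveries" \
        --jq '.[] | select(.status_code >= 400) | .id'); do
      gh api -X POST "repos/$repo/hooks/$hook/deliveries/$id/attempts" \
        > /dev/null && echo "redelivered $repo delivery $id"
    done
  done
}
```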

### Database password (`lunar-db`)

Standard Kubernetes secret rotation. If your Postgres supports multiple active passwords (e.g. RDS password grace), rotate without downtime:

1. Set the new password in Postgres (keeping the old one active).
2. Update the `lunar-db` secret.
3. Restart the Hub.
4. Revoke the old password.

If your Postgres doesn't support multiple active passwords, expect \~1 minute of Hub unavailability during the rotation.
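
For the single-password case, the rotation can be sketched as one hypothetical helper. It assumes `psql` and `kubectl` are available, that you connect with a role allowed to `ALTER ROLE`, and that the `lunar-db` secret stores the password under a `password` key; that key name is an assumption, so check your secret first. Patching only that key leaves any other keys in the secret untouched.

```bash
# Hypothetical helper for a Postgres without multiple active passwords.
# Usage: rotate_lunar_db_password "$ADMIN_DB_URL" lunar_role "$NEW_PASSWORD"
rotate_lunar_db_password() {
  local admin_url="$1" role="$2" new_pw="$3" encoded

  # 1. Change the password. psql expands :"role" as a quoted identifier and
  #    :'pw' as a quoted literal, so no manual escaping is needed.
  psql "$admin_url" -X -v ON_ERROR_STOP=1 -v role="$role" -v pw="$new_pw" \
    <<'SQL' || return 1
ALTER ROLE :"role" WITH PASSWORD :'pw';
SQL

  # 2. Patch only the (assumed) "password" key of the secret.
  encoded=$(printf '%s' "$new_pw" | base64 | tr -d '\n')
  kubectl -n lunar patch secret lunar-db --type merge \
    -p "{\"data\":{\"password\":\"$encoded\"}}" || return 1

  # 3. Restart the Hub and wait for it to come back.
  kubectl -n lunar rollout restart deployment/lunar-hub &&
    kubectl -n lunar rollout status deployment/lunar-hub --timeout=5m
}
```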

### Other secrets

A few secrets don't get their own recipe above:

| Secret                                                                            | Managed by | Consumed by                |
| --------------------------------------------------------------------------------- | ---------- | -------------------------- |
| `<release>-grafana-admin`                                                         | Chart      | `lunar-grafana` deployment |
| Hub licence JWT (e.g. `lunar-hub-licence`)                                        | You        | Hub                        |
| Per-scope runtime secrets (`hub.secrets.{collector,cataloger,policy}.secretName`) | You        | Hub                        |

The rotation shape follows the same split as the sections above:

* **Chart-managed** (Grafana admin): `kubectl delete secret <name>` → `helm upgrade` to regenerate → restart the consumer deployment.
* **User-managed** (Hub licence JWT, runtime secrets): update the secret in place (`kubectl apply` or `kubectl create --dry-run=client -o yaml | kubectl apply -f -`) → restart the consumer(s).

To find what consumes an arbitrary secret:

```bash
kubectl -n lunar get deployments -o yaml | grep -B2 '<secret-name>'
```

## Observability

### Logs

The Hub and Operator log in structured JSON by default (`logging.format: json`). Log level defaults to `info`; raise to `debug` temporarily by editing `values.yaml` and running `helm upgrade`.

The Hub and Operator don't ship logs anywhere themselves — they stream to stdout. To get them into your log aggregator, point a cluster-level log shipper (Fluent Bit, Vector, Datadog Agent, etc.) at the `lunar` namespace.
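
For a short debugging session, the level can also be raised from the command line. This hypothetical sketch assumes the level lives at `logging.level`, alongside the documented `logging.format` key; confirm the exact key in the chart's values reference. Because the override is passed with `--set`, the next plain `helm upgrade -f values.yaml` should restore the level from `values.yaml`.

```bash
# Hypothetical helper: set the Hub/Operator log level for this release
# revision. "logging.level" is an assumed key; check the values reference.
# Usage: lunar_log_level debug   (later: lunar_log_level info)
lunar_log_level() {
  local level="${1:?usage: lunar_log_level debug|info}"
  helm upgrade lunar earthly/lunar -n lunar -f values.yaml \
    --set logging.level="$level"
}
```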

### Telemetry to Earthly

Hub tenant identity and telemetry destinations come from the signed licence JWT, not from `HUB_TENANT_ID`, `HUB_ELASTIC_*`, or `HUB_OTEL_*` environment variables.

The chart handles the mount and `HUB_LICENCE_FILE` wiring for you. Keep (or override) the licence secret reference in values:

```yaml
hub:
  licence:
    secretName: lunar-hub-licence
    secretKey: hub-licence.jwt
    # optional; defaults to /var/run/secrets/lunar/hub-licence.jwt
    # filePath: /var/run/secrets/lunar/hub-licence.jwt
```

Rotation flow:

* Request a new JWT from Earthly.
* Update the Kubernetes secret value (`hub-licence.jwt`) in place.
* The Hub re-validates the licence every 5 minutes and also enforces the exact `exp` boundary, so install the new JWT before the current one expires.
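
Because the Hub enforces `exp` exactly, it helps to check the new JWT's expiry before installing it. The sketch below is a hypothetical pair of helpers. `jwt_exp` decodes the base64url payload (restoring the stripped padding) without verifying the signature, which is enough to read a date but is not validation; it also assumes `exp` is an integer NumericDate. `rotate_hub_licence` updates the secret in place using the `lunar-hub-licence` name and `hub-licence.jwt` key from the values above.

```bash
# Hypothetical helper: print the "exp" claim (Unix seconds) of a JWT.
# Reads the payload only; it does NOT verify the signature.
jwt_exp() {
  local payload
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops "=" padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d |
    grep -o '"exp": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: replace the licence JWT in place.
# Usage: rotate_hub_licence ./new-hub-licence.jwt
rotate_hub_licence() {
  local jwt_file="$1"
  echo "new licence expires at Unix time $(jwt_exp "$(cat "$jwt_file")")"
  kubectl -n lunar create secret generic lunar-hub-licence \
    --from-file=hub-licence.jwt="$jwt_file" \
    --dry-run=client -o yaml | kubectl -n lunar apply -f -
}
```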

### Metrics

The Hub exports request-duration histograms (HTTP and gRPC) via OTLP when the active licence includes `telemetry.otel.endpoint` and `telemetry.otel.token`.

If `telemetry.otel` is omitted, the Hub falls back to writing metrics to a local temp file.

Secure (TLS) OTLP export is not implemented yet, so metrics are sent to the OTLP endpoint over an unencrypted connection today.

{% hint style="info" %}
The Hub does not export distributed traces today — only metrics. Trace support is on the roadmap.
{% endhint %}

### Diagnostics bundle

The Hub can produce a diagnostics bundle (Postgres queue state, recent error logs, slow-query stats) for support requests. The bundle gathers extra Postgres telemetry when the `pg_stat_statements` extension is enabled — without it the bundle still works, just with less query-performance data.

To enable it:

* **Amazon RDS / Aurora:** add `pg_stat_statements` to `shared_preload_libraries` in the DB parameter group, reboot the instance, then run `CREATE EXTENSION pg_stat_statements;` as a user with the `rds_superuser` role.
* **Self-managed:** add `pg_stat_statements` to `shared_preload_libraries` in `postgresql.conf` and restart, then `CREATE EXTENSION pg_stat_statements;` as a superuser.

The Hub does not require this extension — it's only used by the diagnostics path.
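
To confirm the extension is active before generating a bundle, a sketch like the following queries Postgres directly. It is a hypothetical helper, not part of Lunar; it assumes `psql` can reach the Hub database with the URI you pass, and it uses only standard catalog queries.

```bash
# Hypothetical check that pg_stat_statements is preloaded and created.
# Usage: pg_stat_statements_ready "postgres://user@host:5432/lunar"
pg_stat_statements_ready() {
  local url="$1" preload installed
  # -X skips ~/.psqlrc; -A and -t print bare values with no headers.
  preload=$(psql "$url" -X -A -t -c 'SHOW shared_preload_libraries') || return 1
  installed=$(psql "$url" -X -A -t -c \
    "SELECT count(*) FROM pg_extension WHERE extname = 'pg_stat_statements'") || return 1

  case "$preload" in
    *pg_stat_statements*) ;;
    *) echo "pg_stat_statements is not in shared_preload_libraries"; return 1 ;;
  esac
  if [ "$installed" != 1 ]; then
    echo "run CREATE EXTENSION pg_stat_statements; in this database"
    return 1
  fi
  echo "pg_stat_statements is enabled"
}
```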

## Scaling

Lunar Hub runs as a single replica today. Vertical scaling (CPU, memory, Postgres connection pool via `HUB_DB_MAX_OPEN_CONNS` / `HUB_DB_MAX_POOL_CONNS`) is the way to handle more load. Horizontal scaling of the Hub is not yet supported.

Run throughput scales independently — raise `operator.maxConcurrent` and size run pod resources accordingly. See [`operator.scriptContainerSpec*`](https://github.com/earthly/charts/blob/main/README.md#values-reference) in the chart README.

## Prioritizing core services over script pods

By default the Operator creates script pods (collectors, policies, catalogers) at the cluster's default priority — the same priority as the Hub, Operator, and Grafana. When those script pods share a node with the core services, a burst of runs can drive the node into memory pressure and the kubelet may evict a *core* pod when it should be shedding an ephemeral script instead.

This matters in two setups:

* **Single-namespace installs** — the Operator runs scripts in the same namespace (and usually the same node group) as the Hub, so core and ephemeral workloads compete for the same memory.
* **Shared scratch namespaces or node groups** — scripts land on nodes shared with other workloads, and you want Lunar's ephemeral pods to yield first under contention.

`operator.scriptPodPriorityClassName` (chart `>= 2.3.0`) sets the [PriorityClass](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/) on every script pod the Operator creates. Point it at a class whose value sits *below* your core services. Under node pressure the kubelet weighs priority when choosing what to evict (after checking whether each pod exceeds its resource requests), so scripts go before core services. The scheduler also won't preempt a higher-priority pod to make room for a script.

### Setup

1. **Create the PriorityClass.** The chart references it by name but does not create it.

   ```yaml
   apiVersion: scheduling.k8s.io/v1
   kind: PriorityClass
   metadata:
     name: lunar-script
   value: -10           # negative → below the default (0) your core pods use
   globalDefault: false
   description: Ephemeral Lunar script pods; evicted before core services.
   ```

   ```bash
   kubectl apply -f lunar-script-priorityclass.yaml
   ```
2. **Point the Operator at it** in `values.yaml`:

   ```yaml
   operator:
     scriptPodPriorityClassName: lunar-script
   ```
3. **Upgrade and restart the Operator** — only script pods created *after* the restart inherit the class:

   ```bash
   helm upgrade lunar earthly/lunar -n lunar -f values.yaml
   kubectl -n lunar rollout restart deployment/lunar-operator
   ```

Leave the Hub, Operator, and Grafana on the cluster default priority — don't set a class on them. The gap between their `0` and the script class's negative value is what orders eviction; you lower scripts rather than raise core services. In a shared scratch node group, pick a value below whatever neighbouring workloads run at so Lunar's scripts are the first thing to go.
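
To confirm the setup took effect, a sketch like the following checks that the class exists with a negative value and lists the priority each pod actually received. It is a hypothetical helper; in a split-namespace install, pass the Operator's execution namespace as the second argument so the listing covers the script pods.

```bash
# Hypothetical check for the script PriorityClass setup.
# Usage: check_script_priority [class-name] [namespace]
check_script_priority() {
  local class="${1:-lunar-script}" ns="${2:-lunar}" value
  # The class must exist, or the Operator's script pods are rejected.
  value=$(kubectl get priorityclass "$class" -o jsonpath='{.value}') || return 1
  if [ "$value" -ge 0 ]; then
    echo "$class has value $value; expected a value below the default 0" >&2
    return 1
  fi
  # Script pods created after the Operator restart should show the class.
  kubectl -n "$ns" get pods --no-headers \
    -o custom-columns='NAME:.metadata.name,CLASS:.spec.priorityClassName,PRIORITY:.spec.priority'
}
```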

{% hint style="warning" %}
PriorityClasses are **cluster-scoped**, and a pod that references one that doesn't exist is rejected at admission — so the Operator will fail to launch scripts until the class is applied. Create the class *before* (or in the same change as) the `helm upgrade`, and delete it only once no workload references it.
{% endhint %}

{% hint style="info" %}
Priority orders *who gets evicted first* — it is not a resource cap. Keep your script pod requests/limits set (see [`operator.scriptContainerSpec*`](https://github.com/earthly/charts/blob/main/README.md#values-reference) in the chart README) so a single run can't consume a whole node before the kubelet reacts. Priority decides the order; requests and limits decide the ceiling.
{% endhint %}

## Backup and disaster recovery

Lunar's authoritative state lives in the systems you already back up:

* **Postgres** — use your existing Postgres backup process. Everything the Hub cares about (components, policies, run history, queue state) is here.
* **S3 buckets** — enable object versioning on both buckets (resource archives and log archives) and, if compliance requires it, cross-region replication. Losing the resource archives forces a rebuild of the cached catalogers, collectors, and policies; losing the log archives means losing the archived logs. Neither stops the Hub from operating.
* **Hub PVC** — the Hub's PVC at `/var/lib/lunar` holds the local git repo cache and materialized run bundles. It is regenerable; the Hub repopulates it on startup from Postgres and S3. You do not need to back it up.
* **Kubernetes secrets** — keep the GitHub App PEM, DB credentials, auth token, and webhook secret in your organization's secret-management system of record. Losing them means re-provisioning.
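
Versioning can be turned on with the AWS CLI. This hypothetical sketch assumes AWS credentials with `s3:PutBucketVersioning` and `s3:GetBucketVersioning` on both buckets; the bucket names in the usage line are placeholders for whatever you configured for resource and log archives.

```bash
# Hypothetical helper: enable and confirm versioning on each bucket passed.
# Usage: enable_bucket_versioning my-lunar-resources my-lunar-logs
enable_bucket_versioning() {
  local bucket status
  for bucket in "$@"; do
    aws s3api put-bucket-versioning --bucket "$bucket" \
      --versioning-configuration Status=Enabled || return 1
    # Read the setting back; "Enabled" confirms the change.
    status=$(aws s3api get-bucket-versioning --bucket "$bucket" \
      --query Status --output text) || return 1
    echo "$bucket: $status"
  done
}
```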

## Uninstalling

```bash
helm uninstall lunar --namespace lunar
```

Helm removes the resources the chart created, such as the Hub and Operator deployments, services, ingress, RBAC, and service accounts, except for those listed below.

**Not removed automatically:**

* The `lunar` namespace itself (and, if used, the Operator's execution namespace).
* Kubernetes secrets you created manually (e.g. `lunar-db`, `lunar-github-app`, `lunar-hub-licence`).
* Chart-managed secrets (`<release>-auth-token`, `<release>-github-webhook`, `<release>-grafana-admin`). These carry `helm.sh/resource-policy: keep` so a re-install reuses the same values — by design. Delete them explicitly if you want fresh credentials on the next install.
* The Hub's PVC (`<release>-hub` by default). Delete with `kubectl -n lunar delete pvc -l app.kubernetes.io/instance=lunar` if you want the state gone.
* Your Postgres database, S3 buckets, or their contents.
* The GitHub App. Webhooks previously registered by this Hub are tagged with the Hub's instance ID (`earthly-lunar-<tenant-id-from-licence>` as a URL fragment); they remain on your repos until you remove them manually or install a fresh Hub with the same tenant ID, which will clean up its own stale webhooks on next config pull.
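
To remove those stale webhooks yourself, a hypothetical sketch using the GitHub CLI: it assumes `gh` is authenticated with admin access to each repo and identifies Lunar's hooks by the `#earthly-lunar-` URL fragment described above. It deletes every matching hook on the repo, so only run it once no Hub should be receiving events there.

```bash
# Hypothetical helper: delete Lunar-registered webhooks from one repo.
# Usage: remove_lunar_webhooks your-org/your-repo
remove_lunar_webhooks() {
  local repo="$1" hook
  for hook in $(gh api "repos/$repo/hooks" \
      --jq '.[] | select((.config.url // "") | contains("#earthly-lunar-")) | .id'); do
    gh api -X DELETE "repos/$repo/hooks/$hook" > /dev/null &&
      echo "deleted hook $hook on $repo"
  done
}
```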

To fully tear down:

```bash
helm uninstall lunar --namespace lunar
kubectl -n lunar delete pvc -l app.kubernetes.io/instance=lunar
kubectl -n lunar delete secret lunar-db lunar-github-app \
  lunar-hub-licence \
  lunar-auth-token lunar-github-webhook lunar-grafana-admin \
  --ignore-not-found
kubectl delete namespace lunar

# Externally:
#   - Drop the Postgres database
#   - Delete or empty the S3 buckets
#   - Uninstall the GitHub App from your organization
```

## Getting help

* [Chart values reference](https://github.com/earthly/charts/blob/main/README.md#values-reference)
* [Chart source](https://github.com/earthly/charts)
* [Lunar source](https://github.com/earthly/lunar)
* For enterprise onboarding or production sizing guidance, [contact the Earthly team](https://earthly.dev/earthly-lunar/demo).


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs-lunar.earthly.dev/install/lunar-hub/hub-day2.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.