home-services/plan.md

# ai-gateway — Implementation Plan

This plan describes the implementation of a new Go microservice, `ai-gateway`, in the `home-services` monorepo (`gitea.nik4nao.com/nik/home-services`). It centralizes all AI/LLM logic behind a gRPC API so callers (`discord-bot`, `alexa-bridge`) remain thin transport adapters with zero AI knowledge.

---

## 1. Goals & Non-Goals

### Goals
- New gRPC service `ai-gateway` listening on `:50052`.
- Owns **all** AI logic: Ollama connection, prompt construction, LLM intent parsing, dispatch to `ha-gateway`.
- Callers send raw user text via `QueryRequest`; receive a human-readable reply in `QueryResponse`.
- mTLS client authentication when calling `ha-gateway` (ha-gateway requires mTLS).
- Hexagonal architecture, matching the existing `ha-gateway` layout.
- Structured logging via `slog`, OTel OTLP gRPC traces/metrics.
- Deployed to the `home-services` namespace on K3s.

### Non-Goals
- No auth on `ai-gateway`'s own inbound gRPC surface in this iteration (in-cluster only; match current `ha-gateway` posture).
- No streaming responses — unary only.
- No conversation memory — each `Query` is stateless.
- No new Home Assistant features beyond what `ha-gateway` already exposes (LightService + EntityService).

---

## 2. Repository Layout

All paths are relative to the `home-services` repo root.

```
proto/
  ai/v1/ai.proto                          # NEW

gen/
  ai/v1/                                  # NEW (generated; committed)
    ai.pb.go
    ai_grpc.pb.go

services/
  ai-gateway/                             # NEW
    go.mod
    cmd/
      ai-gateway/
        main.go
    config/
      config.go
    domain/
      prompt.go
      service.go
      intent.go
    adapters/
      inbound/
        grpc/
          server.go
      outbound/
        ollama/
          client.go
        hagateway/
          client.go
    internal/
      observability/
        logging.go
        otel.go
    Dockerfile
    .dockerignore
  discord-bot/                            # MODIFIED
    adapters/outbound/aigateway/client.go # NEW
    (remove any direct Ollama code if present)
```

Also update:
- `go.work`: add `./services/ai-gateway` to the `use` list; the `replace` directive for the shared `gen` module is covered in §14.
- `buf.gen.yaml` / `buf.yaml` — include the new `ai/v1` proto package.

---

## 3. Proto Definition

### File: `proto/ai/v1/ai.proto`

```proto
syntax = "proto3";

package ai.v1;

option go_package = "gitea.nik4nao.com/nik/home-services/gen/ai/v1;aiv1";

// AIService accepts free-form natural language queries and returns a
// human-readable reply. It encapsulates LLM prompting, intent parsing,
// and dispatch to downstream services (e.g. ha-gateway).
service AIService {
  rpc Query(QueryRequest) returns (QueryResponse);
}

message QueryRequest {
  // Raw user text, e.g. "turn on the living room light".
  string text = 1;

  // Optional caller identifier for logging/tracing (e.g. "discord-bot").
  string source = 2;
}

message QueryResponse {
  // Human-readable reply to show the user.
  string reply = 1;

  // Parsed intent name, if any. Empty if no actionable intent was detected.
  string intent = 2;

  // True if an action was dispatched to a downstream service.
  bool action_taken = 3;
}
```

### Generation
- Run `buf generate` from repo root.
- Commit `gen/ai/v1/*.pb.go` and `gen/ai/v1/*_grpc.pb.go` (per existing convention — `gen/` is committed to avoid CI codegen dependency).

---

## 4. Configuration (`services/ai-gateway/config/config.go`)

Load from environment. Use `os.Getenv` with defaults (matches existing ha-gateway style — no new dep).

| Env Var                       | Default                                               | Purpose                                          |
| ----------------------------- | ----------------------------------------------------- | ------------------------------------------------ |
| `GRPC_LISTEN_ADDR`            | `:50052`                                              | Inbound gRPC bind address                        |
| `OLLAMA_URL`                  | `http://192.168.7.96:11434`                           | Ollama HTTP API (direct LAN IP; no K8s Service)  |
| `OLLAMA_MODEL`                | `llama3`                                              | Model name                                       |
| `OLLAMA_TIMEOUT`              | `30s`                                                 | HTTP timeout for Ollama calls                    |
| `HA_GATEWAY_ADDR`             | `ha-gateway.home-services.svc.cluster.local:50051`    | ha-gateway gRPC endpoint                         |
| `HA_GATEWAY_TLS_CA_FILE`      | `/etc/ai-gateway/tls/ca.crt`                          | CA cert that signed ha-gateway's server cert     |
| `HA_GATEWAY_TLS_CERT_FILE`    | `/etc/ai-gateway/tls/tls.crt`                         | ai-gateway's client cert (for mTLS)              |
| `HA_GATEWAY_TLS_KEY_FILE`     | `/etc/ai-gateway/tls/tls.key`                         | ai-gateway's client key                          |
| `HA_GATEWAY_SERVER_NAME`      | `ha-gateway.home-services.svc.cluster.local`          | SNI / cert verification name                     |
| `LOG_LEVEL`                   | `info`                                                | `debug`/`info`/`warn`/`error`                    |
| `LOG_FORMAT`                  | `json`                                                | `json` or `text`                                 |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317` | OTLP gRPC endpoint |
| `OTEL_SERVICE_NAME`           | `ai-gateway`                                          | Service name for traces/metrics                  |

Provide a `Config` struct with a `Load()` function returning `(Config, error)`. Validate required files exist at startup.

---

## 5. Domain Layer

### `domain/intent.go`

Define the intent contract the LLM must produce:

```go
package domain

type Intent struct {
    Name    string            `json:"intent"`   // e.g. "turn_on_light", "turn_off_light", "none"
    Entity  string            `json:"entity"`   // e.g. "living_room" (friendly name or entity_id)
    Params  map[string]string `json:"params"`   // optional, e.g. {"brightness":"80"}
    Reply   string            `json:"reply"`    // what to say back to the user
}

const (
    IntentNone         = "none"
    IntentTurnOnLight  = "turn_on_light"
    IntentTurnOffLight = "turn_off_light"
    IntentListEntities = "list_entities"
)
```

### `domain/prompt.go`

Build the Ollama prompt. The system prompt MUST instruct the model to return **only** a single JSON object matching the `Intent` schema. No markdown fences, no prose.

```go
package domain

import "fmt"

const systemPrompt = `You are a home automation assistant. Given a user request, respond with a single JSON object and nothing else — no markdown, no code fences, no explanation.

Schema:
{
  "intent": "turn_on_light" | "turn_off_light" | "list_entities" | "none",
  "entity": "<friendly_name_or_empty>",
  "params": { "<key>": "<value>" },
  "reply":  "<short human-readable reply>"
}

Rules:
- If the request is not actionable, use intent="none" and put the conversational answer in "reply".
- Always include all four fields. Use "" or {} for empty values.
- Do not wrap the JSON in backticks.`

func BuildPrompt(userText string) string {
    return fmt.Sprintf("%s\n\nUser: %s", systemPrompt, userText)
}
```

### `domain/service.go`

The orchestrator. Depends on two ports (interfaces) defined here:

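```go
package domain

import (
    "context"
    "encoding/json"
    "fmt"
    "log/slog"
    "strings"
)

// LLMClient is the outbound port to the language model.
type LLMClient interface {
    Generate(ctx context.Context, prompt string) (string, error)
}

// HAClient is the outbound port to ha-gateway.
type HAClient interface {
    TurnOnLight(ctx context.Context, entity string, params map[string]string) error
    TurnOffLight(ctx context.Context, entity string) error
    ListEntities(ctx context.Context) ([]string, error)
}

type Service struct {
    llm LLMClient
    ha  HAClient
    log *slog.Logger
}

func NewService(llm LLMClient, ha HAClient, log *slog.Logger) *Service {
    return &Service{llm: llm, ha: ha, log: log}
}

type QueryResult struct {
    Reply       string
    Intent      string
    ActionTaken bool
}

func (s *Service) Query(ctx context.Context, text string) (QueryResult, error) {
    // 1-2. Build the prompt and call the LLM; a failure here is returned to the caller.
    raw, err := s.llm.Generate(ctx, BuildPrompt(text))
    if err != nil {
        return QueryResult{}, fmt.Errorf("llm generate: %w", err)
    }

    // 3. Decode the intent. A parse failure is logged at warn and answered, not returned.
    var in Intent
    if err := json.Unmarshal([]byte(raw), &in); err != nil {
        s.log.WarnContext(ctx, "unparseable LLM output", "text", text, "raw", raw, "err", err)
        return QueryResult{Reply: "I didn't understand that."}, nil
    }

    // 4. Dispatch on the intent name.
    res := QueryResult{Reply: in.Reply, Intent: in.Name}
    switch in.Name {
    case IntentTurnOnLight:
        err = s.ha.TurnOnLight(ctx, in.Entity, in.Params)
    case IntentTurnOffLight:
        err = s.ha.TurnOffLight(ctx, in.Entity)
    case IntentListEntities:
        var names []string
        if names, err = s.ha.ListEntities(ctx); err == nil {
            res.Reply = strings.Join(names, ", ") // placeholder formatting
        }
    default: // "none" or an unknown intent: pass the reply through
        return res, nil
    }
    if err != nil {
        s.log.ErrorContext(ctx, "ha-gateway dispatch failed", "intent", in.Name, "err", err)
        res.Reply, res.ActionTaken = "I couldn't reach Home Assistant right now.", false
        return res, nil
    }

    // 5. The action reached ha-gateway.
    res.ActionTaken = true
    return res, nil
}
```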

**Error handling:**
- LLM call failure → return error; inbound adapter maps to gRPC `Unavailable`.
- JSON parse failure → do NOT return an error; reply with a friendly "I didn't understand" message and log the raw LLM output together with the original user text at `warn` (not `error`).
- HA dispatch failure → log at `error`, return reply "I couldn't reach Home Assistant right now."; `ActionTaken=false`.

---

## 6. Outbound Adapters

### `adapters/outbound/ollama/client.go`

- Plain `net/http.Client` with configured timeout.
- POST to `{OLLAMA_URL}/api/generate` with body:
  ```json
  { "model": "<OLLAMA_MODEL>", "prompt": "<prompt>", "stream": false }
  ```
- Decode JSON response, return the `response` field as a string.
- Implement `domain.LLMClient`.
- Wrap the HTTP client with OTel instrumentation (`otelhttp.NewTransport`).

### `adapters/outbound/hagateway/client.go`

This is the mTLS-critical piece.

- Construct a `*grpc.ClientConn` to `HA_GATEWAY_ADDR` with TLS credentials built from the three cert files:
  ```go
  import (
      "crypto/tls"
      "crypto/x509"
      "errors"
      "fmt"
      "os"

      "google.golang.org/grpc/credentials"
  )

  // loadTLSCredentials builds mTLS client credentials for dialing ha-gateway.
  func loadTLSCredentials(caFile, certFile, keyFile, serverName string) (credentials.TransportCredentials, error) {
      caPEM, err := os.ReadFile(caFile)
      if err != nil {
          return nil, fmt.Errorf("read ca: %w", err)
      }
      cp := x509.NewCertPool()
      if !cp.AppendCertsFromPEM(caPEM) {
          return nil, errors.New("failed to append CA cert")
      }
      clientCert, err := tls.LoadX509KeyPair(certFile, keyFile)
      if err != nil {
          return nil, fmt.Errorf("load client keypair: %w", err)
      }
      return credentials.NewTLS(&tls.Config{
          Certificates: []tls.Certificate{clientCert}, // presented to ha-gateway
          RootCAs:      cp,                            // verifies ha-gateway's server cert
          ServerName:   serverName,
          MinVersion:   tls.VersionTLS13,
      }), nil
  }
  ```
- Use `grpc.NewClient(addr, grpc.WithTransportCredentials(creds), grpc.WithStatsHandler(otelgrpc.NewClientHandler()))`.
- Wrap the generated ha-gateway clients (`LightServiceClient`, `EntityServiceClient`) to satisfy `domain.HAClient`.
- Expose a `Close()` method for graceful shutdown.

**Cert source:** the cert files will be projected into the pod via a Kubernetes `Secret` mounted at `/etc/ai-gateway/tls/`. See deployment manifest below. Issuing the cert is covered in §10.

---

## 7. Inbound Adapter

### `adapters/inbound/grpc/server.go`

- Implements `aiv1.AIServiceServer`.
- `Query(ctx, req)` → calls `domain.Service.Query(ctx, req.Text)` → maps `QueryResult` to `QueryResponse`.
- Attach the OTel stats handler as a server option: `grpc.StatsHandler(otelgrpc.NewServerHandler())`.
- Attach a slog unary interceptor that logs method, duration, caller `source`, and error code.
- Register reflection service only if `LOG_LEVEL=debug` (convenience for `grpcurl`).

---

## 8. Observability (`internal/observability/`)

Copy the pattern from `ha-gateway`:

### `logging.go`
- `NewLogger(level, format string) *slog.Logger` returns a logger backed by `slog.NewJSONHandler` or `slog.NewTextHandler` (selected by `format`), writing to `os.Stdout` at the given `level`.

### `otel.go`
- `InitOTel(ctx, endpoint, serviceName) (shutdown func(context.Context) error, err error)`.
- Uses `otlptracegrpc` + `otlpmetricgrpc` exporters, insecure credentials (in-cluster).
- Sets global `TracerProvider` and `MeterProvider`.
- Resource attributes: `service.name`, `service.namespace=home-services`.

---

## 9. Entry Point (`cmd/ai-gateway/main.go`)

Standard startup sequence:

1. Load config.
2. Build logger.
3. Init OTel; defer shutdown.
4. Build Ollama client.
5. Build ha-gateway client (mTLS); defer `Close()`.
6. Build domain service.
7. Build gRPC server with interceptors, register `AIService`.
8. Listen on `GRPC_LISTEN_ADDR`.
9. Handle `SIGINT`/`SIGTERM` for graceful shutdown: call `server.GracefulStop()`, fall back to `server.Stop()` if it has not returned within 10s, then run OTel shutdown.

---

## 10. TLS / mTLS Plumbing

`ha-gateway` requires mTLS. `ai-gateway` needs a client certificate signed by the same CA that ha-gateway trusts.

### Approach: cert-manager + internal-ca-issuer

Create a `Certificate` resource for `ai-gateway` (file: `manifests/home-services/ai-gateway-client-cert.yaml`):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ai-gateway-client
  namespace: home-services
spec:
  secretName: ai-gateway-client-tls
  duration: 2160h      # 90d
  renewBefore: 360h    # 15d
  subject:
    organizations: [home-services]
  commonName: ai-gateway
  usages:
    - client auth
  issuerRef:
    name: internal-ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io
```

**Important:** use `internal-ca-issuer` (the CA issuer), **never** `internal-ca` (the bootstrap self-signed issuer). This matches the homelab convention.

The resulting secret `ai-gateway-client-tls` contains `tls.crt`, `tls.key`, and `ca.crt` — mount all three.

### Verify ha-gateway's CA trust
Confirm ha-gateway's server TLS config trusts `internal-ca-issuer`'s CA (it should, since both use the same cluster CA). If ha-gateway uses a separate client-auth CA, adjust the issuer accordingly.

---

## 11. Kubernetes Manifest

### File: `manifests/home-services/ai-gateway.yaml`

Single file with `---` separators per repo convention.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ai-gateway
  namespace: home-services
spec:
  selector: { app: ai-gateway }
  ports:
    - name: grpc
      port: 50052
      targetPort: 50052
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-gateway
  namespace: home-services
spec:
  replicas: 1
  selector: { matchLabels: { app: ai-gateway } }
  template:
    metadata:
      labels: { app: ai-gateway }
    spec:
      containers:
        - name: ai-gateway
          image: gitea.nik4nao.com/nik/ai-gateway:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 50052
              name: grpc
          env:
            - { name: GRPC_LISTEN_ADDR, value: ":50052" }
            - { name: OLLAMA_URL, value: "http://192.168.7.96:11434" }
            - { name: OLLAMA_MODEL, value: "llama3" }
            - { name: HA_GATEWAY_ADDR, value: "ha-gateway.home-services.svc.cluster.local:50051" }
            - { name: HA_GATEWAY_TLS_CA_FILE,   value: "/etc/ai-gateway/tls/ca.crt" }
            - { name: HA_GATEWAY_TLS_CERT_FILE, value: "/etc/ai-gateway/tls/tls.crt" }
            - { name: HA_GATEWAY_TLS_KEY_FILE,  value: "/etc/ai-gateway/tls/tls.key" }
            - { name: HA_GATEWAY_SERVER_NAME,   value: "ha-gateway.home-services.svc.cluster.local" }
            - { name: LOG_LEVEL,  value: "info" }
            - { name: LOG_FORMAT, value: "json" }
            - { name: OTEL_EXPORTER_OTLP_ENDPOINT,
                value: "otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317" }
            - { name: OTEL_SERVICE_NAME, value: "ai-gateway" }
          volumeMounts:
            - name: tls
              mountPath: /etc/ai-gateway/tls
              readOnly: true
          readinessProbe:
            tcpSocket: { port: 50052 }
            initialDelaySeconds: 3
            periodSeconds: 10
          livenessProbe:
            tcpSocket: { port: 50052 }
            initialDelaySeconds: 10
            periodSeconds: 20
      volumes:
        - name: tls
          secret:
            secretName: ai-gateway-client-tls
      imagePullSecrets:
        - name: gitea-registry
```

No resource `limits`/`requests` yet — matches current repo convention (memory limits not yet enforced on pods).

---

## 12. discord-bot Changes

### New: `services/discord-bot/adapters/outbound/aigateway/client.go`
- gRPC client to `ai-gateway.home-services.svc.cluster.local:50052`, **plaintext** (no auth on ai-gateway's inbound surface yet).
- Exposes `Query(ctx, text string) (reply string, err error)`.
- Inject into existing command handler.

### Removed / simplified
- If `discord-bot` currently contains any direct Ollama calls, remove them.
- Slash command handler for free-form queries simply calls `aigateway.Query(ctx, msg.Content)` and posts the returned reply.
- Event-notification path (existing Discord → notify flow) is untouched.

### Config additions to discord-bot
- `AI_GATEWAY_ADDR` (default `ai-gateway.home-services.svc.cluster.local:50052`).

---

## 13. CI / Build

### `services/ai-gateway/Dockerfile`
Multi-stage build matching existing services:

```dockerfile
FROM golang:1.26 AS build
WORKDIR /src
COPY go.work go.work.sum ./
COPY gen ./gen
COPY services/ai-gateway ./services/ai-gateway
WORKDIR /src/services/ai-gateway
RUN CGO_ENABLED=0 go build -o /out/ai-gateway ./cmd/ai-gateway

FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/ai-gateway /ai-gateway
USER nonroot:nonroot
ENTRYPOINT ["/ai-gateway"]
```

### Gitea Actions workflow
Mirror the existing `ha-gateway` workflow:
- Trigger on pushes touching `services/ai-gateway/**`, `gen/ai/**`, or `proto/ai/**`.
- `docker buildx` multiarch build (`linux/amd64,linux/arm64`).
- Push to `gitea.nik4nao.com/nik/ai-gateway:latest` and `:${{ github.sha }}`.
- Use the Gitea API token (`read:package` + `write:package`) as registry password — **not** the account password.
- Remember: buildkit CA must be injected each run (existing runner pattern).

---

## 14. Workspace Wiring

### `go.work` — add line:
```
use ./services/ai-gateway
```
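In `services/ai-gateway/go.mod`, carry over the `replace gitea.nik4nao.com/nik/home-services/gen => …` directive the other services use. The path is relative to the module directory, so with the layout in §2 it should resolve to `../../gen` from `services/ai-gateway/`, not `../gen`.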

### `services/ai-gateway/go.mod` dependencies
- `google.golang.org/grpc`
- `google.golang.org/protobuf`
- `go.opentelemetry.io/otel`
- `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`
- `go.opentelemetry.io/otel/sdk`
- `go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc`
- `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp`

---

## 15. Testing

### Unit tests (`services/ai-gateway/domain/service_test.go`)
- Fake `LLMClient` returning canned JSON strings for each intent.
- Fake `HAClient` recording calls.
- Assert:
  - Valid `turn_on_light` JSON → `HAClient.TurnOnLight` called with correct entity; reply matches.
  - Invalid JSON → graceful reply, no panic, no HA call.
  - `intent="none"` → no HA call; reply passed through.
  - HA call returning error → reply contains "couldn't reach Home Assistant"; `ActionTaken=false`.

### Integration smoke test (manual, post-deploy)
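```bash
# From inside the cluster. Server reflection is only registered when LOG_LEVEL=debug (§7);
# otherwise pass the schema explicitly with: -import-path proto -proto ai/v1/ai.proto
grpcurl -plaintext -d '{"text":"turn on the living room light","source":"manual"}' \
  ai-gateway.home-services.svc.cluster.local:50052 ai.v1.AIService/Query
```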

### mTLS verification
```bash
# Should succeed (using mounted cert):
kubectl exec -n home-services deploy/ai-gateway -- /ai-gateway --selftest  # if implemented
# Or attach an ephemeral debug container (kubectl debug) and inspect with openssl; the distroless image ships no shell.
```
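
If the `--selftest` flag is implemented, one check it could run is whether the mounted client cert chains to the mounted CA and carries the client-auth usage. A rough sketch using only the standard library follows. The function name `verifyClientCert` is hypothetical, and the check assumes the leaf is the first PEM block in `tls.crt` and is signed directly by the CA (no intermediates).

```go
package main

import (
    "crypto/x509"
    "encoding/pem"
    "errors"
    "fmt"
    "os"
)

// verifyClientCert checks that the first certificate in certPEM chains to a CA
// in caPEM, is within its validity period, and is usable for client auth.
func verifyClientCert(caPEM, certPEM []byte) error {
    roots := x509.NewCertPool()
    if !roots.AppendCertsFromPEM(caPEM) {
        return errors.New("no CA certificates found in ca.crt")
    }
    block, _ := pem.Decode(certPEM)
    if block == nil || block.Type != "CERTIFICATE" {
        return errors.New("tls.crt does not start with a PEM certificate")
    }
    leaf, err := x509.ParseCertificate(block.Bytes)
    if err != nil {
        return fmt.Errorf("parse tls.crt: %w", err)
    }
    _, err = leaf.Verify(x509.VerifyOptions{
        Roots:     roots,
        KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
    })
    if err != nil {
        return fmt.Errorf("verify chain: %w", err)
    }
    return nil
}

func main() {
    // Paths match the defaults in §4; the directory only exists inside the pod.
    const dir = "/etc/ai-gateway/tls"
    ca, caErr := os.ReadFile(dir + "/ca.crt")
    cert, certErr := os.ReadFile(dir + "/tls.crt")
    if err := errors.Join(caErr, certErr); err != nil {
        fmt.Println("selftest: cannot read certs:", err)
        return // a real --selftest would exit non-zero here
    }
    if err := verifyClientCert(ca, cert); err != nil {
        fmt.Println("selftest failed:", err)
        return
    }
    fmt.Println("selftest ok")
}
```

This only validates the local files; it does not prove that ha-gateway trusts the same CA, which still needs the handshake check in §18.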

---

## 16. Rollout Order

Implement in this order. Each step should compile and tests should pass before the next.

1. **Proto + gen** — add `proto/ai/v1/ai.proto`, run `buf generate`, commit `gen/ai/v1/`.
2. **Scaffold** — create `services/ai-gateway/` with `go.mod`, `main.go` (stub), update `go.work`.
3. **Domain** — `intent.go`, `prompt.go`, `service.go` + unit tests with fakes.
4. **Ollama adapter** — HTTP client, manual curl-based validation against `192.168.7.96:11434`.
5. **ha-gateway adapter** — mTLS dial, wrap generated clients, satisfy `domain.HAClient`.
6. **Inbound gRPC adapter** — server, interceptors.
7. **Observability** — logging + OTel init.
8. **Entry point** — wire everything in `cmd/ai-gateway/main.go`.
9. **Dockerfile + CI** — build and push image to Gitea registry.
10. **Cert-manager Certificate** — apply `ai-gateway-client-cert.yaml`; verify `ai-gateway-client-tls` secret is created.
11. **Deployment manifest** — apply `ai-gateway.yaml`; verify pod ready, logs clean, `grpcurl` smoke test passes.
12. **discord-bot update** — add `aigateway` outbound adapter, remove any direct Ollama usage, redeploy.
13. **End-to-end test** — issue a Discord slash command, observe:
    - Discord → ai-gateway → Ollama → ai-gateway → ha-gateway (mTLS) → HA → reply back.
    - Traces visible in Tempo, logs in Loki, metrics in Prometheus.

---

## 17. Open Questions / Deferred

- **Auth on ai-gateway's inbound surface:** currently none. Revisit when `alexa-bridge` lands — Alexa path is public-ingress, so ai-gateway may eventually need mTLS inbound too.
- **Intent schema evolution:** if the set of intents grows meaningfully, consider moving the schema into the proto (enum + oneof) rather than free-form JSON. For now, JSON keeps the LLM prompt simple.
- **Conversation memory:** out of scope. If needed later, add a per-`source` session store (Valkey in `home-services`).
- **Prompt templates per model:** `llama3` works with the current system prompt. If swapping to a smaller model, prompt may need tuning — keep `BuildPrompt` easy to override via config.

---

## 18. Acceptance Criteria

- [ ] `ai-gateway` pod runs ready in `home-services` namespace.
- [ ] `grpcurl` smoke test (§15) returns a structured `QueryResponse` for a light command.
- [ ] Light actually turns on/off in Home Assistant when tested end-to-end.
- [ ] ha-gateway logs show mTLS handshake succeeded with CN=`ai-gateway`.
- [ ] Traces for a full Discord query show spans from all three services in a single trace: `discord-bot` → `ai-gateway` → `ha-gateway`.
- [ ] `discord-bot` contains no direct references to `OLLAMA_URL` or Ollama HTTP client code.
- [ ] Unit tests pass in CI; Docker image builds multiarch.