- Implemented new gRPC service `AIService` in `proto/ai/v1/ai.proto` for handling natural language queries.
- Generated Go code for the gRPC service and messages in `gen/ai/v1/`.
- Created `services/ai-gateway/` directory structure with necessary files for the service.
- Added configuration loading and structured logging.
- Implemented domain logic for intent parsing and interaction with Home Assistant.
- Established outbound adapters for Ollama and Home Assistant with mTLS support.
- Updated `go.work` to include the new service and maintain existing dependencies.
- Modified `discord-bot` to use the new `ai-gateway` for AI interactions.
- Added deployment manifest for Kubernetes and CI/CD configuration for building and deploying the service.
ai-gateway — Implementation Plan
This plan describes the implementation of a new Go microservice, ai-gateway, in the home-services monorepo (gitea.nik4nao.com/nik/home-services). It centralizes all AI/LLM logic behind a gRPC API so callers (discord-bot, alexa-bridge) remain thin transport adapters with zero AI knowledge.
1. Goals & Non-Goals
Goals
- New gRPC service `ai-gateway` listening on `:50052`.
- Owns all AI logic: Ollama connection, prompt construction, LLM intent parsing, dispatch to `ha-gateway`.
- Callers send raw user text via `QueryRequest`; receive a human-readable reply in `QueryResponse`.
- mTLS client authentication when calling `ha-gateway` (ha-gateway requires mTLS).
- Hexagonal architecture, matching the existing `ha-gateway` layout.
- Structured logging via `slog`, OTel OTLP gRPC traces/metrics.
- Deployed to the `home-services` namespace on K3s.
Non-Goals
- No auth on `ai-gateway`'s own inbound gRPC surface in this iteration (in-cluster only; matches the current `ha-gateway` posture).
- No streaming responses; unary only.
- No conversation memory; each `Query` is stateless.
- No new Home Assistant features beyond what `ha-gateway` already exposes (LightService + EntityService).
2. Repository Layout
All paths are relative to the home-services repo root.
proto/
  ai/v1/ai.proto                            # NEW
gen/
  ai/v1/                                    # NEW (generated; committed)
    ai.pb.go
    ai_grpc.pb.go
services/
  ai-gateway/                               # NEW
    go.mod
    cmd/
      ai-gateway/
        main.go
    config/
      config.go
    domain/
      prompt.go
      service.go
      intent.go
    adapters/
      inbound/
        grpc/
          server.go
      outbound/
        ollama/
          client.go
        hagateway/
          client.go
    internal/
      observability/
        logging.go
        otel.go
    Dockerfile
    .dockerignore
  discord-bot/                              # MODIFIED
    adapters/outbound/aigateway/client.go   # NEW
    (remove any direct Ollama code if present)
Also update:
- `go.work`: add `./services/ai-gateway` and keep the `replace` directive to `../gen`.
- `buf.gen.yaml` / `buf.yaml`: include the new `ai/v1` proto package.
3. Proto Definition
File: proto/ai/v1/ai.proto
syntax = "proto3";
package ai.v1;
option go_package = "gitea.nik4nao.com/nik/home-services/gen/ai/v1;aiv1";
// AIService accepts free-form natural language queries and returns a
// human-readable reply. It encapsulates LLM prompting, intent parsing,
// and dispatch to downstream services (e.g. ha-gateway).
service AIService {
  rpc Query(QueryRequest) returns (QueryResponse);
}
message QueryRequest {
  // Raw user text, e.g. "turn on the living room light".
  string text = 1;
  // Optional caller identifier for logging/tracing (e.g. "discord-bot").
  string source = 2;
}
message QueryResponse {
  // Human-readable reply to show the user.
  string reply = 1;
  // Parsed intent name, if any. Empty if no actionable intent was detected.
  string intent = 2;
  // True if an action was dispatched to a downstream service.
  bool action_taken = 3;
}
Generation
- Run `buf generate` from repo root.
- Commit `gen/ai/v1/*.pb.go` and `gen/ai/v1/*_grpc.pb.go` (per existing convention: `gen/` is committed to avoid a CI codegen dependency).
4. Configuration (services/ai-gateway/config/config.go)
Load from environment. Use os.Getenv with defaults (matches existing ha-gateway style — no new dep).
| Env Var | Default | Purpose |
|---|---|---|
| `GRPC_LISTEN_ADDR` | `:50052` | Inbound gRPC bind address |
| `OLLAMA_URL` | `http://192.168.7.96:11434` | Ollama HTTP API (direct LAN IP; no K8s Service) |
| `OLLAMA_MODEL` | `llama3` | Model name |
| `OLLAMA_TIMEOUT` | `30s` | HTTP timeout for Ollama calls |
| `HA_GATEWAY_ADDR` | `ha-gateway.home-services.svc.cluster.local:50051` | ha-gateway gRPC endpoint |
| `HA_GATEWAY_TLS_CA_FILE` | `/etc/ai-gateway/tls/ca.crt` | CA cert that signed ha-gateway's server cert |
| `HA_GATEWAY_TLS_CERT_FILE` | `/etc/ai-gateway/tls/tls.crt` | ai-gateway's client cert (for mTLS) |
| `HA_GATEWAY_TLS_KEY_FILE` | `/etc/ai-gateway/tls/tls.key` | ai-gateway's client key |
| `HA_GATEWAY_SERVER_NAME` | `ha-gateway.home-services.svc.cluster.local` | SNI / cert verification name |
| `LOG_LEVEL` | `info` | `debug`/`info`/`warn`/`error` |
| `LOG_FORMAT` | `json` | `json` or `text` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317` | OTLP gRPC endpoint |
| `OTEL_SERVICE_NAME` | `ai-gateway` | Service name for traces/metrics |
Provide a Config struct with a Load() function returning (Config, error). Validate required files exist at startup.
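A minimal sketch of how `Load()` could be split, assuming a small `envOr` helper and a separate `Validate` step. The field names, `envOr`, `FromEnv`, and `Validate` are illustrative assumptions rather than part of the plan, and only a subset of the variables from the table is shown.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Config holds a subset of the settings from the environment table.
// Field names are hypothetical; the plan only fixes the env var names.
type Config struct {
	GRPCListenAddr string
	OllamaURL      string
	OllamaModel    string
	OllamaTimeout  time.Duration
	HAGatewayAddr  string
	TLSCAFile      string
	TLSCertFile    string
	TLSKeyFile     string
}

// envOr returns the value of key, or def when the variable is unset or empty.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// FromEnv reads the environment and applies defaults without touching the filesystem.
func FromEnv() (Config, error) {
	timeout, err := time.ParseDuration(envOr("OLLAMA_TIMEOUT", "30s"))
	if err != nil {
		return Config{}, fmt.Errorf("parse OLLAMA_TIMEOUT: %w", err)
	}
	return Config{
		GRPCListenAddr: envOr("GRPC_LISTEN_ADDR", ":50052"),
		OllamaURL:      envOr("OLLAMA_URL", "http://192.168.7.96:11434"),
		OllamaModel:    envOr("OLLAMA_MODEL", "llama3"),
		OllamaTimeout:  timeout,
		HAGatewayAddr:  envOr("HA_GATEWAY_ADDR", "ha-gateway.home-services.svc.cluster.local:50051"),
		TLSCAFile:      envOr("HA_GATEWAY_TLS_CA_FILE", "/etc/ai-gateway/tls/ca.crt"),
		TLSCertFile:    envOr("HA_GATEWAY_TLS_CERT_FILE", "/etc/ai-gateway/tls/tls.crt"),
		TLSKeyFile:     envOr("HA_GATEWAY_TLS_KEY_FILE", "/etc/ai-gateway/tls/tls.key"),
	}, nil
}

// Validate checks that the three TLS files exist, so a missing Secret mount
// fails at startup instead of on the first ha-gateway call.
func (c Config) Validate() error {
	for _, p := range []string{c.TLSCAFile, c.TLSCertFile, c.TLSKeyFile} {
		if _, err := os.Stat(p); err != nil {
			return fmt.Errorf("required file %q: %w", p, err)
		}
	}
	return nil
}

// Load combines FromEnv and Validate into the (Config, error) shape described above.
func Load() (Config, error) {
	cfg, err := FromEnv()
	if err != nil {
		return Config{}, err
	}
	if err := cfg.Validate(); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := Load()
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("listening on %s, model %s\n", cfg.GRPCListenAddr, cfg.OllamaModel)
}
```

Keeping `FromEnv` free of filesystem access makes the defaults easy to unit test outside the cluster, where the TLS mount does not exist.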
5. Domain Layer
domain/intent.go
Define the intent contract the LLM must produce:
package domain
type Intent struct {
	Name   string            `json:"intent"` // e.g. "turn_on_light", "turn_off_light", "none"
	Entity string            `json:"entity"` // e.g. "living_room" (friendly name or entity_id)
	Params map[string]string `json:"params"` // optional, e.g. {"brightness":"80"}
	Reply  string            `json:"reply"`  // what to say back to the user
}
const (
	IntentNone         = "none"
	IntentTurnOnLight  = "turn_on_light"
	IntentTurnOffLight = "turn_off_light"
	IntentListEntities = "list_entities"
)
domain/prompt.go
Build the Ollama prompt. The system prompt MUST instruct the model to return only a single JSON object matching the Intent schema. No markdown fences, no prose.
package domain
import "fmt"
const systemPrompt = `You are a home automation assistant. Given a user request, respond with a single JSON object and nothing else — no markdown, no code fences, no explanation.
Schema:
{
  "intent": "turn_on_light" | "turn_off_light" | "list_entities" | "none",
  "entity": "<friendly_name_or_empty>",
  "params": { "<key>": "<value>" },
  "reply": "<short human-readable reply>"
}
Rules:
- If the request is not actionable, use intent="none" and put the conversational answer in "reply".
- Always include all four fields. Use "" or {} for empty values.
- Do not wrap the JSON in backticks.`
func BuildPrompt(userText string) string {
	return fmt.Sprintf("%s\n\nUser: %s", systemPrompt, userText)
}
domain/service.go
The orchestrator. Depends on two ports (interfaces) defined here:
package domain

import (
	"context"
	"log/slog"
)

// LLMClient is the outbound port for the language model backend.
type LLMClient interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// HAClient is the outbound port for ha-gateway.
type HAClient interface {
	TurnOnLight(ctx context.Context, entity string, params map[string]string) error
	TurnOffLight(ctx context.Context, entity string) error
	ListEntities(ctx context.Context) ([]string, error)
}

type Service struct {
	llm LLMClient
	ha  HAClient
	log *slog.Logger
}

func NewService(llm LLMClient, ha HAClient, log *slog.Logger) *Service {
	return &Service{llm: llm, ha: ha, log: log}
}

type QueryResult struct {
	Reply       string
	Intent      string
	ActionTaken bool
}

func (s *Service) Query(ctx context.Context, text string) (QueryResult, error) {
	// 1. BuildPrompt(text)
	// 2. s.llm.Generate(ctx, prompt)
	// 3. json.Unmarshal into Intent
	//    - On unmarshal error: log at warn, return reply = "I didn't understand that."
	// 4. switch intent.Name:
	//      turn_on_light  -> s.ha.TurnOnLight(...)
	//      turn_off_light -> s.ha.TurnOffLight(...)
	//      list_entities  -> s.ha.ListEntities(...); format into reply
	//      none / default -> reply = intent.Reply
	// 5. Return QueryResult
	panic("not implemented") // placeholder so the outline compiles
}
Error handling:
- LLM call failure → return error; inbound adapter maps to gRPC `Unavailable`.
- JSON parse failure → do NOT error; return a friendly "I didn't understand" reply and log the raw LLM output, together with the original text, at `warn` (not `error`).
- HA dispatch failure → log at `error`, return reply "I couldn't reach Home Assistant right now."; `ActionTaken=false`.
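The orchestration steps and error rules above can be sketched as follows. This is one possible shape, not the definitive implementation: the types restate the domain contract so the block is self-contained, `list_entities` is omitted for brevity, and `fakeLLM`, `fakeHA`, `runQuery`, and `haDownReply` are hypothetical names introduced only for the demonstration. Leaving `Intent` empty on a parse failure is also an assumption.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log/slog"
)

// Trimmed restatement of the domain contract (ListEntities left out).
type Intent struct {
	Name   string            `json:"intent"`
	Entity string            `json:"entity"`
	Params map[string]string `json:"params"`
	Reply  string            `json:"reply"`
}
type LLMClient interface {
	Generate(ctx context.Context, prompt string) (string, error)
}
type HAClient interface {
	TurnOnLight(ctx context.Context, entity string, params map[string]string) error
	TurnOffLight(ctx context.Context, entity string) error
}
type QueryResult struct {
	Reply, Intent string
	ActionTaken   bool
}
type Service struct {
	llm LLMClient
	ha  HAClient
	log *slog.Logger
}

const haDownReply = "I couldn't reach Home Assistant right now."

func (s *Service) Query(ctx context.Context, text string) (QueryResult, error) {
	// The real service would pass BuildPrompt(text); the raw text stands in here.
	raw, err := s.llm.Generate(ctx, text)
	if err != nil {
		return QueryResult{}, fmt.Errorf("llm generate: %w", err)
	}
	var in Intent
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		// Parse failures are not errors: warn with the raw output and answer politely.
		s.log.Warn("unparseable LLM output", "raw", raw, "text", text)
		return QueryResult{Reply: "I didn't understand that."}, nil
	}
	res := QueryResult{Reply: in.Reply, Intent: in.Name}
	var haErr error
	switch in.Name {
	case "turn_on_light":
		haErr = s.ha.TurnOnLight(ctx, in.Entity, in.Params)
	case "turn_off_light":
		haErr = s.ha.TurnOffLight(ctx, in.Entity)
	default: // "none" or unrecognised: pass the model's reply through
		return res, nil
	}
	if haErr != nil {
		s.log.Error("ha dispatch failed", "intent", in.Name, "err", haErr)
		res.Reply = haDownReply
		return res, nil
	}
	res.ActionTaken = true
	return res, nil
}

// fakeLLM returns a canned string; fakeHA records calls and can be told to fail.
type fakeLLM struct{ out string }

func (f fakeLLM) Generate(context.Context, string) (string, error) { return f.out, nil }

type fakeHA struct {
	fail  bool
	calls []string
}

func (f *fakeHA) record(call string) error {
	f.calls = append(f.calls, call)
	if f.fail {
		return errors.New("ha unavailable")
	}
	return nil
}
func (f *fakeHA) TurnOnLight(_ context.Context, e string, _ map[string]string) error {
	return f.record("on:" + e)
}
func (f *fakeHA) TurnOffLight(_ context.Context, e string) error { return f.record("off:" + e) }

// runQuery wires the fakes together and returns the result plus the recorded HA calls.
func runQuery(llmOut string, haFails bool) (QueryResult, []string) {
	ha := &fakeHA{fail: haFails}
	svc := &Service{llm: fakeLLM{out: llmOut}, ha: ha, log: slog.New(slog.NewTextHandler(io.Discard, nil))}
	res, _ := svc.Query(context.Background(), "turn on the living room light")
	return res, ha.calls
}

func main() {
	res, calls := runQuery(`{"intent":"turn_on_light","entity":"living_room","params":{},"reply":"Turning on the living room light."}`, false)
	fmt.Println(res.Reply, res.ActionTaken, calls)
}
```

Only the LLM failure surfaces as a Go error; the other two paths return a normal `QueryResult`, so callers always have something to show the user.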
6. Outbound Adapters
adapters/outbound/ollama/client.go
- Plain `net/http.Client` with configured timeout.
- POST to `{OLLAMA_URL}/api/generate` with body: `{ "model": "<OLLAMA_MODEL>", "prompt": "<prompt>", "stream": false }`
- Decode JSON response, return the `response` field as a string.
- Implement `domain.LLMClient`.
- Wrap the HTTP client with OTel instrumentation (`otelhttp.NewTransport`).
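A standard-library-only sketch of the adapter described in the list above. The `Client`, `New`, `generateRequest`, and `generateResponse` names are assumptions, the `otelhttp` transport is left out to keep the block dependency-free, and `demo` runs against an in-process stand-in server rather than a real Ollama instance.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Client is a hypothetical adapter satisfying the plan's domain.LLMClient port.
type Client struct {
	baseURL    string
	model      string
	httpClient *http.Client
}

func New(baseURL, model string, timeout time.Duration) *Client {
	return &Client{baseURL: baseURL, model: model, httpClient: &http.Client{Timeout: timeout}}
}

// generateRequest and generateResponse carry only the fields named in the plan.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

// Generate POSTs the prompt to {baseURL}/api/generate and returns the "response" field.
func (c *Client) Generate(ctx context.Context, prompt string) (string, error) {
	body, err := json.Marshal(generateRequest{Model: c.model, Prompt: prompt, Stream: false})
	if err != nil {
		return "", fmt.Errorf("encode request: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/api/generate", bytes.NewReader(body))
	if err != nil {
		return "", fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.httpClient.Do(req)
	if err != nil {
		return "", fmt.Errorf("call ollama: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama returned status %d", resp.StatusCode)
	}
	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	return out.Response, nil
}

// demo exercises Generate against a local stand-in that echoes the prompt back.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in generateRequest
		if r.Method != http.MethodPost || r.URL.Path != "/api/generate" ||
			json.NewDecoder(r.Body).Decode(&in) != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		json.NewEncoder(w).Encode(generateResponse{Response: "echo: " + in.Prompt})
	}))
	defer srv.Close()
	return New(srv.URL, "llama3", 5*time.Second).Generate(context.Background(), "hello")
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

Passing the request context through `http.NewRequestWithContext` lets a cancelled gRPC call abort the in-flight Ollama request before the client timeout expires.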
adapters/outbound/hagateway/client.go
This is the mTLS-critical piece.
- Construct a `*grpc.ClientConn` to `HA_GATEWAY_ADDR` with TLS credentials built from the three cert files:

      func loadTLSCredentials(caFile, certFile, keyFile, serverName string) (credentials.TransportCredentials, error) {
          caPEM, err := os.ReadFile(caFile)
          if err != nil {
              return nil, fmt.Errorf("read ca: %w", err)
          }
          cp := x509.NewCertPool()
          if !cp.AppendCertsFromPEM(caPEM) {
              return nil, errors.New("failed to append CA cert")
          }
          clientCert, err := tls.LoadX509KeyPair(certFile, keyFile)
          if err != nil {
              return nil, fmt.Errorf("load client keypair: %w", err)
          }
          return credentials.NewTLS(&tls.Config{
              Certificates: []tls.Certificate{clientCert},
              RootCAs:      cp,
              ServerName:   serverName,
              MinVersion:   tls.VersionTLS13,
          }), nil
      }

- Use `grpc.NewClient(addr, grpc.WithTransportCredentials(creds), grpc.WithStatsHandler(otelgrpc.NewClientHandler()))`.
- Wrap the generated ha-gateway clients (`LightServiceClient`, `EntityServiceClient`) to satisfy `domain.HAClient`.
- Expose a `Close()` method for graceful shutdown.
Cert source: the cert files will be projected into the pod via a Kubernetes Secret mounted at /etc/ai-gateway/tls/. See deployment manifest below. Issuing the cert is covered in §10.
7. Inbound Adapter
adapters/inbound/grpc/server.go
- Implements `aiv1.AIServiceServer`.
- `Query(ctx, req)` → calls `domain.Service.Query(ctx, req.Text)` → maps `QueryResult` to `QueryResponse`.
- Attach OTel interceptor: `grpc.StatsHandler(otelgrpc.NewServerHandler())`.
- Attach a slog unary interceptor that logs method, duration, caller `source`, and error code.
- Register reflection service only if `LOG_LEVEL=debug` (convenience for `grpcurl`).
8. Observability (internal/observability/)
Copy the pattern from ha-gateway:
logging.go
- `NewLogger(level, format string) *slog.Logger` returning a logger backed by either `slog.NewJSONHandler` or `slog.NewTextHandler` wrapping `os.Stdout`.
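A minimal sketch of `NewLogger` using only `log/slog`. The `parseLevel` and `newLoggerTo` helpers are assumptions added for illustration, and falling back to `info` and JSON for unrecognised values is a design choice of this sketch, not something the plan specifies.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// parseLevel maps LOG_LEVEL values to slog levels, defaulting to info.
func parseLevel(s string) slog.Level {
	switch strings.ToLower(s) {
	case "debug":
		return slog.LevelDebug
	case "warn":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

// newLoggerTo builds a logger writing to w; taking the writer as a parameter
// lets tests capture output in a buffer.
func newLoggerTo(w io.Writer, level, format string) *slog.Logger {
	opts := &slog.HandlerOptions{Level: parseLevel(level)}
	if strings.ToLower(format) == "text" {
		return slog.New(slog.NewTextHandler(w, opts))
	}
	return slog.New(slog.NewJSONHandler(w, opts))
}

// NewLogger matches the signature in the plan and writes to os.Stdout.
func NewLogger(level, format string) *slog.Logger {
	return newLoggerTo(os.Stdout, level, format)
}

func main() {
	logger := NewLogger("debug", "json")
	logger.Debug("logger ready", "service", "ai-gateway")
}
```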
otel.go
- `InitOTel(ctx, endpoint, serviceName) (shutdown func(context.Context) error, err error)`.
- Uses `otlptracegrpc` + `otlpmetricgrpc` exporters, insecure credentials (in-cluster).
- Sets global `TracerProvider` and `MeterProvider`.
- Resource attributes: `service.name`, `service.namespace=home-services`.
9. Entry Point (cmd/ai-gateway/main.go)
Standard startup sequence:
- Load config.
- Build logger.
- Init OTel; defer shutdown.
- Build Ollama client.
- Build ha-gateway client (mTLS); defer `Close()`.
- Build domain service.
- Build gRPC server with interceptors, register `AIService`.
- Listen on `GRPC_LISTEN_ADDR`.
- Handle `SIGINT`/`SIGTERM` for graceful shutdown: `server.GracefulStop()` with a 10s timeout, then OTel shutdown.
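`GracefulStop()` takes no timeout of its own, so the 10s bound in the last step needs a small wrapper. The sketch below is one way to do it with plain functions standing in for `(*grpc.Server).GracefulStop` and `(*grpc.Server).Stop`; `stopWithTimeout` is a hypothetical helper, and the signal wiring (for example via `signal.NotifyContext`) is left out so the example terminates on its own.

```go
package main

import (
	"fmt"
	"time"
)

// stopWithTimeout runs graceful in the background and waits up to d for it to
// return. If the deadline passes first it calls force and reports false.
// In the real service, graceful would be server.GracefulStop and force server.Stop.
func stopWithTimeout(graceful, force func(), d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		graceful()
		close(done)
	}()

	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-done:
		return true
	case <-timer.C:
		force()
		return false
	}
}

func main() {
	// A server that drains immediately finishes within the deadline.
	fast := stopWithTimeout(func() {}, func() {}, time.Second)

	// A server that drains too slowly is stopped forcefully.
	slow := stopWithTimeout(
		func() { time.Sleep(200 * time.Millisecond) },
		func() { fmt.Println("deadline exceeded, forcing stop") },
		20*time.Millisecond,
	)

	fmt.Println(fast, slow)
}
```

With a real gRPC server, calling `Stop` also unblocks the pending `GracefulStop`, so the background goroutine exits before the OTel shutdown runs.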
10. TLS / mTLS Plumbing
ha-gateway requires mTLS. ai-gateway needs a client certificate signed by the same CA that ha-gateway trusts.
Approach: cert-manager + internal-ca-issuer
Create a Certificate resource for ai-gateway (file: manifests/home-services/ai-gateway-client-cert.yaml):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ai-gateway-client
  namespace: home-services
spec:
  secretName: ai-gateway-client-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations: [home-services]
  commonName: ai-gateway
  usages:
    - client auth
  issuerRef:
    name: internal-ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io
Important: use internal-ca-issuer (the CA issuer), never internal-ca (the bootstrap self-signed issuer). This matches the homelab convention.
The resulting secret ai-gateway-client-tls contains tls.crt, tls.key, and ca.crt — mount all three.
Verify ha-gateway's CA trust
Confirm ha-gateway's server TLS config trusts internal-ca-issuer's CA (it should, since both use the same cluster CA). If ha-gateway uses a separate client-auth CA, adjust the issuer accordingly.
11. Kubernetes Manifest
File: manifests/home-services/ai-gateway.yaml
Single file with --- separators per repo convention.
apiVersion: v1
kind: Service
metadata:
  name: ai-gateway
  namespace: home-services
spec:
  selector: { app: ai-gateway }
  ports:
    - name: grpc
      port: 50052
      targetPort: 50052
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-gateway
  namespace: home-services
spec:
  replicas: 1
  selector: { matchLabels: { app: ai-gateway } }
  template:
    metadata:
      labels: { app: ai-gateway }
    spec:
      containers:
        - name: ai-gateway
          image: gitea.nik4nao.com/nik/ai-gateway:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 50052
              name: grpc
          env:
            - { name: GRPC_LISTEN_ADDR, value: ":50052" }
            - { name: OLLAMA_URL, value: "http://192.168.7.96:11434" }
            - { name: OLLAMA_MODEL, value: "llama3" }
            - { name: HA_GATEWAY_ADDR, value: "ha-gateway.home-services.svc.cluster.local:50051" }
            - { name: HA_GATEWAY_TLS_CA_FILE, value: "/etc/ai-gateway/tls/ca.crt" }
            - { name: HA_GATEWAY_TLS_CERT_FILE, value: "/etc/ai-gateway/tls/tls.crt" }
            - { name: HA_GATEWAY_TLS_KEY_FILE, value: "/etc/ai-gateway/tls/tls.key" }
            - { name: HA_GATEWAY_SERVER_NAME, value: "ha-gateway.home-services.svc.cluster.local" }
            - { name: LOG_LEVEL, value: "info" }
            - { name: LOG_FORMAT, value: "json" }
            - { name: OTEL_EXPORTER_OTLP_ENDPOINT,
                value: "otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317" }
            - { name: OTEL_SERVICE_NAME, value: "ai-gateway" }
          volumeMounts:
            - name: tls
              mountPath: /etc/ai-gateway/tls
              readOnly: true
          readinessProbe:
            tcpSocket: { port: 50052 }
            initialDelaySeconds: 3
            periodSeconds: 10
          livenessProbe:
            tcpSocket: { port: 50052 }
            initialDelaySeconds: 10
            periodSeconds: 20
      volumes:
        - name: tls
          secret:
            secretName: ai-gateway-client-tls
      imagePullSecrets:
        - name: gitea-registry
No resource limits/requests yet — matches current repo convention (memory limits not yet enforced on pods).
12. discord-bot Changes
New: services/discord-bot/adapters/outbound/aigateway/client.go
- gRPC client to `ai-gateway.home-services.svc.cluster.local:50052`, plaintext (no auth on ai-gateway's inbound surface yet).
- Exposes `Query(ctx, text string) (reply string, err error)`.
- Inject into existing command handler.
Removed / simplified
- If `discord-bot` currently contains any direct Ollama calls, remove them.
- Slash command handler for free-form queries simply calls `aigateway.Query(ctx, msg.Content)` and posts the returned reply.
- Event-notification path (existing Discord → notify flow) is untouched.
Config additions to discord-bot
- `AI_GATEWAY_ADDR` (default `ai-gateway.home-services.svc.cluster.local:50052`).
13. CI / Build
services/ai-gateway/Dockerfile
Multi-stage build matching existing services:
FROM golang:1.26 AS build
WORKDIR /src
COPY go.work go.work.sum ./
COPY gen ./gen
COPY services/ai-gateway ./services/ai-gateway
WORKDIR /src/services/ai-gateway
RUN CGO_ENABLED=0 go build -o /out/ai-gateway ./cmd/ai-gateway
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/ai-gateway /ai-gateway
USER nonroot:nonroot
ENTRYPOINT ["/ai-gateway"]
Gitea Actions workflow
Mirror the existing ha-gateway workflow:
- Trigger on pushes touching `services/ai-gateway/**`, `gen/ai/**`, or `proto/ai/**`.
- `docker buildx` multiarch build (`linux/amd64`, `linux/arm64`).
- Push to `gitea.nik4nao.com/nik/ai-gateway:latest` and `:${{ github.sha }}`.
- Use the Gitea API token (`read:package` + `write:package`) as registry password, not the account password.
- Remember: buildkit CA must be injected each run (existing runner pattern).
14. Workspace Wiring
go.work — add line:
use ./services/ai-gateway
Keep the existing replace gitea.nik4nao.com/nik/home-services/gen => ../gen in services/ai-gateway/go.mod.
services/ai-gateway/go.mod dependencies
- `google.golang.org/grpc`
- `google.golang.org/protobuf`
- `go.opentelemetry.io/otel`
- `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`
- `go.opentelemetry.io/otel/sdk`
- `go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc`
- `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp`
15. Testing
Unit tests (services/ai-gateway/domain/service_test.go)
- Fake `LLMClient` returning canned JSON strings for each intent.
- Fake `HAClient` recording calls.
- Assert:
  - Valid `turn_on_light` JSON → `HAClient.TurnOnLight` called with correct entity; reply matches.
  - Invalid JSON → graceful reply, no panic, no HA call.
  - `intent="none"` → no HA call; reply passed through.
  - HA call returning error → reply contains "couldn't reach Home Assistant"; `ActionTaken=false`.
Integration smoke test (manual, post-deploy)
# From inside the cluster:
grpcurl -plaintext -d '{"text":"turn on the living room light","source":"manual"}' \
ai-gateway.home-services.svc.cluster.local:50052 ai.v1.AIService/Query
mTLS verification
# Should succeed (using mounted cert):
kubectl exec -n home-services deploy/ai-gateway -- /ai-gateway --selftest # if implemented
# Or inspect via openssl from within the pod if distroless allows a debug sidecar.
16. Rollout Order
Implement in this order. Each step should compile and tests should pass before the next.
- Proto + gen: add `proto/ai/v1/ai.proto`, run `buf generate`, commit `gen/ai/v1/`.
- Scaffold: create `services/ai-gateway/` with `go.mod`, `main.go` (stub), update `go.work`.
- Domain: `intent.go`, `prompt.go`, `service.go` + unit tests with fakes.
- Ollama adapter: HTTP client, manual curl-based validation against `192.168.7.96:11434`.
- ha-gateway adapter: mTLS dial, wrap generated clients, satisfy `domain.HAClient`.
- Inbound gRPC adapter: server, interceptors.
- Observability: logging + OTel init.
- Entry point: wire everything in `cmd/ai-gateway/main.go`.
- Dockerfile + CI: build and push image to Gitea registry.
- Cert-manager Certificate: apply `ai-gateway-client-cert.yaml`; verify `ai-gateway-client-tls` secret is created.
- Deployment manifest: apply `ai-gateway.yaml`; verify pod ready, logs clean, `grpcurl` smoke test passes.
- discord-bot update: add `aigateway` outbound adapter, remove any direct Ollama usage, redeploy.
- End-to-end test: issue a Discord slash command, observe:
  - Discord → ai-gateway → Ollama → ai-gateway → ha-gateway (mTLS) → HA → reply back.
  - Traces visible in Tempo, logs in Loki, metrics in Prometheus.
17. Open Questions / Deferred
- Auth on ai-gateway's inbound surface: currently none. Revisit when `alexa-bridge` lands; the Alexa path is public-ingress, so ai-gateway may eventually need mTLS inbound too.
- Intent schema evolution: if the set of intents grows meaningfully, consider moving the schema into the proto (enum + oneof) rather than free-form JSON. For now, JSON keeps the LLM prompt simple.
- Conversation memory: out of scope. If needed later, add a per-`source` session store (Valkey in `home-services`).
- Prompt templates per model: `llama3` works with the current system prompt. If swapping to a smaller model, the prompt may need tuning; keep `BuildPrompt` easy to override via config.
18. Acceptance Criteria
- `ai-gateway` pod runs ready in `home-services` namespace.
- `grpcurl` smoke test (§15) returns a structured `QueryResponse` for a light command.
- Light actually turns on/off in Home Assistant when tested end-to-end.
- ha-gateway logs show mTLS handshake succeeded with CN=`ai-gateway`.
- Traces for a full Discord query show three spans: `discord-bot` → `ai-gateway` → `ha-gateway`.
- `discord-bot` contains no direct references to `OLLAMA_URL` or Ollama HTTP client code.
- Unit tests pass in CI; Docker image builds multiarch.