Bonus: Talk to Your Model & Wrap-Up

Your genai-demo namespace now runs Docker Model Runner as a Pod, serving the ai/smollm2 model behind a Service called docker-model-runner. It exposes an OpenAI-compatible API, so you can talk to it exactly as you would the OpenAI API - except it's running on your own Kubernetes cluster.

💬 Send the model a prompt

Just as you did in the Services section, you'll launch a throwaway Pod to make a request from inside the cluster - this time a chat completion:

kubectl run ask -n genai-demo --rm -i --restart=Never --image=curlimages/curl -- \
  -s http://docker-model-runner/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"ai/smollm2","messages":[{"role":"user","content":"In one short sentence, what is Kubernetes?"}]}'

After a moment you'll get back a JSON response with the model's answer in choices[0].message.content - generated by an LLM running in a Pod somewhere on your 3-node cluster. 🤯

Tip

Change the content text and run it again to ask anything you like. The model is small (fast, but not genius-level) - it's here to prove the architecture, not to win a Turing test.
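If you'd rather see only the answer than the full JSON, the request body and the choices[0].message.content lookup can be sketched in a few lines of Python. This is a minimal illustration, not part of the workshop files: the helper names build_chat_payload and extract_answer are hypothetical, and the sample response is a trimmed-down stand-in for what the model returns.

```python
import json


def build_chat_payload(model: str, prompt: str) -> str:
    """Return the JSON body for a single-turn chat completion request."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })


def extract_answer(response_body: str) -> str:
    """Return choices[0].message.content from a chat completion response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Same body as the -d argument in the kubectl command above.
    print(build_chat_payload("ai/smollm2", "In one short sentence, what is Kubernetes?"))

    # Hypothetical, trimmed-down response used only to show the lookup;
    # a real response carries more fields than this.
    sample = '{"choices":[{"message":{"role":"assistant","content":"(model answer here)"}}]}'
    print(extract_answer(sample))
```

The payload helper produces the same structure you passed to curl with -d, so changing the prompt is a one-argument change.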

🔌 How the app finds the model

Remember the models block in the Compose file? Compose Bridge turned that into wiring you'd otherwise do by hand. Look again at compose-bridge/k8s/base/webchat-deployment.yaml - these env vars were injected automatically:

env:
  - name: MODEL_URL
    value: "http://docker-model-runner/engines/v1/"
  - name: MODEL_NAME
    value: "ai/smollm2"

A real frontend would read MODEL_URL and MODEL_NAME from its environment and call the model - no hard-coded addresses, thanks to the same Service-and-DNS pattern you learned in the Services section.
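As a rough sketch of what that could look like, the Python below reads the two variables and posts a chat completion using only the standard library. It is an assumption about how a frontend might be written, not the webchat app's actual code: the function names chat_endpoint and ask and the 60-second timeout are hypothetical choices for illustration.

```python
import json
import os
import urllib.request


def chat_endpoint(base_url: str) -> str:
    """Append the chat completions path to MODEL_URL without doubling slashes."""
    return base_url.rstrip("/") + "/chat/completions"


def ask(prompt: str, timeout: float = 60.0) -> str:
    """Send one user message to the model named in the environment."""
    # Both variables are injected into the Deployment by Compose Bridge.
    base_url = os.environ["MODEL_URL"]
    model = os.environ["MODEL_NAME"]

    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")

    request = urllib.request.Request(
        chat_endpoint(base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The Service name in MODEL_URL resolves through cluster DNS,
    # so this call only works from inside the cluster.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

With the values shown above, chat_endpoint resolves to http://docker-model-runner/engines/v1/chat/completions, the same URL the curl Pod called. Pointing the app at a different model or endpoint then only means changing the env vars, not the code.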

🧹 Clean up

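When you're done exploring, tidy up both apps. The first command removes the Compose Bridge model-runner overlay; the second removes everything you applied from k8s/: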

kubectl delete -k compose-bridge/k8s/overlays/model-runner
kubectl delete -f k8s/

🏁 You did it - the whole journey

On a real Kubernetes cluster, from scratch, you:

📦 Built an image and ran it in a Pod
🚀 Kept it healthy and self-healing with a Deployment
🔌 Gave it stable networking and load balancing with a Service
📈 Scaled it, did a zero-downtime rolling update, and rolled back
🌐 Exposed it to the world with an Ingress
🌉 Turned a compose.yaml into manifests with Compose Bridge
🤖 Ran a real LLM on Kubernetes and queried it

That's a genuinely complete picture of how apps - including AI apps - run on Kubernetes.

🔭 Where to go next

Run it on Docker Desktop - enable Kubernetes in Settings → Kubernetes, then run kubectl apply -f k8s/ and docker compose bridge convert on your own machine. Everything here works there too.
ConfigMaps & Secrets - externalize configuration and credentials instead of baking them into images.
Persistent storage - you already used a PersistentVolumeClaim for the model; explore PersistentVolumes for your own stateful apps.
Health & resources - tune liveness/readiness probes and CPU/memory requests and limits.
Helm - package and template manifests for reuse across environments.

Thanks for spending time with Kubernetes 101 - now go deploy something. 🐳