Bonus: Talk to Your Model & Wrap-Up
Your genai-demo namespace now runs Docker Model Runner as a Pod, serving the ai/smollm2 model behind a Service called docker-model-runner. It exposes an OpenAI-compatible API, so you can talk to it exactly like you would the OpenAI API - but it's running on your Kubernetes cluster.
Send the model a prompt
Just as you did in the Services section, you'll launch a throwaway pod to make a request from inside the cluster - this time a chat completion:
kubectl run ask -n genai-demo --rm -i --restart=Never --image=curlimages/curl -- \
-s http://docker-model-runner/engines/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"ai/smollm2","messages":[{"role":"user","content":"In one short sentence, what is Kubernetes?"}]}'
After a moment you'll get back a JSON response with the model's answer in choices[0].message.content - generated by an LLM running in a Pod somewhere on your 3-node cluster.
Tip
Change the content text and run it again to ask anything you like. The model is small (fast, but not genius-level) - it's here to prove the architecture, not to win a Turing test.
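If you want only the answer rather than the whole JSON document, a few lines of Python can pull it out. This is a hypothetical helper, not part of the workshop code, and the sample payload is an abbreviated, made-up response that only mimics the OpenAI chat-completion shape; a real reply carries more fields and different text.

```python
import json

# Abbreviated, made-up response in the OpenAI chat-completion shape.
# A real reply has more fields (id, usage, ...) and real model text.
SAMPLE_RESPONSE = """
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "<model answer goes here>"}
    }
  ]
}
"""


def extract_answer(raw_json: str) -> str:
    """Return choices[0].message.content from a chat-completion response."""
    body = json.loads(raw_json)
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    print(extract_answer(SAMPLE_RESPONSE))
```

The explicit check for an empty choices list is a design choice for the sketch: it turns a confusing IndexError into a readable error if the server returns something unexpected.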
How the app finds the model
Remember the models block in the Compose file? Compose Bridge turned that into wiring you'd otherwise do by hand. Look again at compose-bridge/k8s/base/webchat-deployment.yaml - these env vars were injected automatically:
env:
  - name: MODEL_URL
    value: "http://docker-model-runner/engines/v1/"
  - name: MODEL_NAME
    value: "ai/smollm2"
A real frontend would read MODEL_URL and MODEL_NAME from its environment and call the model - no hard-coded addresses, thanks to the same Service-and-DNS pattern you learned in the Services section.
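As a rough illustration of that pattern, here is a minimal Python sketch of what such a frontend's model call might look like. It is an assumption, not the webchat app's actual code: the helper names build_chat_request and ask are hypothetical, and the only things taken from the manifest are the MODEL_URL and MODEL_NAME variables. It uses only the standard library and appends chat/completions to MODEL_URL, matching the endpoint used in the curl request earlier.

```python
import json
import os
import urllib.request


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completion request from MODEL_URL and MODEL_NAME."""
    # Both values come from the environment, so no address is hard-coded.
    base_url = os.environ["MODEL_URL"]   # e.g. http://docker-model-runner/engines/v1/
    model = os.environ["MODEL_NAME"]     # e.g. ai/smollm2
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the model and return the text of its first choice."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Inside the cluster, calling ask("In one short sentence, what is Kubernetes?") would send the same request as the curl command. Outside the cluster the docker-model-runner name does not resolve, so only build_chat_request can be exercised locally.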
Clean up
When you're done exploring, tidy up both apps:
kubectl delete -k compose-bridge/k8s/overlays/model-runner
kubectl delete -f k8s/
You did it - the whole journey
On a real Kubernetes cluster, from scratch, you:
- Built an image and ran it in a Pod
- Kept it healthy and self-healing with a Deployment
- Gave it stable networking and load balancing with a Service
- Scaled it, did a zero-downtime rolling update, and rolled back
- Exposed it to the world with an Ingress
- Turned a compose.yaml into manifests with Compose Bridge
- Ran a real LLM on Kubernetes and queried it
That's a genuinely complete picture of how apps - including AI apps - run on Kubernetes.
Where to go next
- Run it on Docker Desktop - enable Kubernetes in Settings → Kubernetes, then run kubectl apply -f k8s/ and docker compose bridge convert on your own machine. Everything here works there too.
- ConfigMaps & Secrets - externalize configuration and credentials instead of baking them into images.
- Persistent storage - you already used a PersistentVolumeClaim for the model; explore PersistentVolumes for your own stateful apps.
- Health & resources - tune liveness/readiness probes and CPU/memory requests and limits.
- Helm - package and template manifests for reuse across environments.
Thanks for spending time with Kubernetes 101 - now go deploy something.