
Bonus: Talk to Your Model & Wrap-Up

Your genai-demo namespace now runs Docker Model Runner as a Pod, serving the ai/smollm2 model behind a Service called docker-model-runner. It exposes an OpenAI-compatible API, so you can talk to it exactly like you would the OpenAI API - but it's running on your Kubernetes cluster.

💬 Send the model a prompt

Just as you did in the Services section, you'll launch a throwaway pod to make a request from inside the cluster - this time a chat completion:

kubectl run ask -n genai-demo --rm -i --restart=Never --image=curlimages/curl -- \
  -s http://docker-model-runner/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"ai/smollm2","messages":[{"role":"user","content":"In one short sentence, what is Kubernetes?"}]}'

After a moment you'll get back a JSON response with the model's answer in choices[0].message.content - generated by an LLM running in a Pod somewhere on your 3-node cluster. 🤯
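
If you want to pull just the answer out of that response in your own code, a minimal Python sketch might look like the following. The sample payload is illustrative and trimmed to the fields read here; it is not captured output, and a real response carries additional fields. The function name extract_answer is hypothetical.

```python
import json

# Illustrative response body, trimmed to the fields read below.
# This is NOT real model output; a real response includes more fields.
SAMPLE_RESPONSE = """
{
  "model": "ai/smollm2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Kubernetes orchestrates containers."
      }
    }
  ]
}
"""


def extract_answer(response_text: str) -> str:
    """Return the assistant's reply from an OpenAI-style chat completion body."""
    body = json.loads(response_text)
    choices = body.get("choices") or []
    if not choices:
        # Fail loudly instead of raising a bare IndexError further down.
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    print(extract_answer(SAMPLE_RESPONSE))
```

Checking for an empty choices list up front gives a clearer error if the server returns an error body instead of a completion.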

Tip

Change the content text and run it again to ask anything you like. The model is small (fast, but not genius-level) - it's here to prove the architecture, not to win a Turing test.

🔌 How the app finds the model

Remember the models block in the Compose file? Compose Bridge turned that into wiring you'd otherwise do by hand. Look again at compose-bridge/k8s/base/webchat-deployment.yaml - these env vars were injected automatically:

env:
  - name: MODEL_URL
    value: "http://docker-model-runner/engines/v1/"
  - name: MODEL_NAME
    value: "ai/smollm2"

A real frontend would read MODEL_URL and MODEL_NAME from its environment and call the model - no hard-coded addresses, thanks to the same Service-and-DNS pattern you learned in the Services section.
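
As a rough sketch of that pattern, here is how a Python frontend might build the same chat completion request from those two variables. This is an assumption about how such an app could be written, not the workshop's actual frontend code; the function names build_chat_request and ask are hypothetical, and only the standard library is used.

```python
import json
import os
import urllib.request


def build_chat_request(prompt, env=None):
    """Build a chat completion request from MODEL_URL and MODEL_NAME."""
    env = os.environ if env is None else env
    base_url = env["MODEL_URL"]   # e.g. http://docker-model-runner/engines/v1/
    model = env["MODEL_NAME"]     # e.g. ai/smollm2

    # Tolerate a trailing slash on MODEL_URL when appending the endpoint path.
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=60.0):
    """Send the prompt to the model and return the reply text.

    Only works where the Service name resolves, i.e. inside the cluster.
    """
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Because the address comes from the environment, the same code would work unchanged if the model Service were renamed or moved; only the injected values change.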

🧹 Clean up

When you're done exploring, tidy up both apps:

kubectl delete -k compose-bridge/k8s/overlays/model-runner
kubectl delete -f k8s/

🏁 You did it - the whole journey

On a real Kubernetes cluster, from scratch, you:

  • 📦 Built an image and ran it in a Pod
  • 🚀 Kept it healthy and self-healing with a Deployment
  • 🔌 Gave it stable networking and load balancing with a Service
  • 📈 Scaled it, did a zero-downtime rolling update, and rolled back
  • 🌐 Exposed it to the world with an Ingress
  • 🌉 Turned a compose.yaml into manifests with Compose Bridge
  • 🤖 Ran a real LLM on Kubernetes and queried it

That's a genuinely complete picture of how apps - including AI apps - run on Kubernetes.

🔭 Where to go next

  • Run it on Docker Desktop - enable Kubernetes in Settings → Kubernetes, then run kubectl apply -f k8s/ and docker compose bridge convert on your own machine. Everything here works there too.
  • ConfigMaps & Secrets - externalize configuration and credentials instead of baking them into images.
  • Persistent storage - you already used a PersistentVolumeClaim for the model; explore PersistentVolumes for your own stateful apps.
  • Health & resources - tune liveness/readiness probes and CPU/memory requests and limits.
  • Helm - package and template manifests for reuse across environments.

Thanks for spending time with Kubernetes 101 - now go deploy something. 🐳