Dev.to · about 3 hours ago · 2 min read · General

Solving the GPU Pinning Saga and Gemma's Meta-Commentary

Our team fixed several issues with GPU pinning and with Gemma's meta-commentary. A bug in LiteLLM 1.89.2 sent requests to the default port, and we had to strip the global assignment so that the per-model api_base overrides could take effect. We also fixed problems with Ollama instance pinning, scheduled-task processes, and offsite backups. In addition, we hardened our guards against Gemma's meta-commentary dialect and moved our publishing funnel to cohort-based stats in Grafana. The system is now significantly quieter.
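
The precedence the api_base fix relies on can be sketched in a few lines of Python. This is only an illustration of the intended behavior (a per-model api_base wins over any global value, and the default port is the last resort), not LiteLLM's actual internals or the authors' configuration. The names ModelRoute, resolve_api_base, and find_unpinned, the model names, and ports 11435 and 11436 are all hypothetical; 11434 is Ollama's default port.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ollama listens on port 11434 unless configured otherwise.
DEFAULT_OLLAMA_BASE = "http://localhost:11434"


@dataclass(frozen=True)
class ModelRoute:
    """Hypothetical routing entry: a model name plus an optional pinned endpoint."""
    name: str
    api_base: Optional[str] = None  # per-model override, e.g. one Ollama instance per GPU


def resolve_api_base(route: ModelRoute, global_api_base: Optional[str] = None) -> str:
    """Return the endpoint a request for this route should hit.

    Intended precedence: per-model override, then global value, then default.
    """
    if route.api_base:
        return route.api_base
    if global_api_base:
        return global_api_base
    return DEFAULT_OLLAMA_BASE


def find_unpinned(routes: List[ModelRoute]) -> List[str]:
    """List the models that would silently fall back to a global or default endpoint."""
    return [r.name for r in routes if not r.api_base]


if __name__ == "__main__":
    # Hypothetical setup: two Ollama instances, one per GPU, plus one unpinned model.
    routes = [
        ModelRoute("gemma-gpu0", "http://localhost:11435"),
        ModelRoute("gemma-gpu1", "http://localhost:11436"),
        ModelRoute("fallback-model"),
    ]
    for r in routes:
        print(r.name, "->", resolve_api_base(r, global_api_base=DEFAULT_OLLAMA_BASE))
    print("unpinned:", find_unpinned(routes))
```

A check like find_unpinned is cheap to run at startup, since a model that quietly lands on the default port is exactly the failure described above.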

#gpu #gemma #lite-llm #ollama #grafana
