Solving the GPU Pinning Saga and Gemma's Meta-Commentary

Our team fixed a string of issues around GPU pinning and Gemma's meta-commentary. A bug in LiteLLM 1.89.2 sent requests to the default port; we had to strip the global assignment before the per-model api_base overrides would take effect. We also fixed problems with Ollama instance pinning, scheduled-task processes, and offsite backups. In addition, we hardened our guards against Gemma's meta-commentary dialect and moved our publishing funnel to cohort-based stats in Grafana. The system is now significantly quieter.
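To make the per-model api_base idea concrete, here is a minimal sketch of a sanity check over a model list. It assumes entries shaped like LiteLLM's `model_list` (`model_name` plus `litellm_params`), with one Ollama instance per GPU. The model names, ports 11435 and 11436, and the helper `find_unpinned` are hypothetical and not taken from the original setup; 11434 is Ollama's stock port, and treating it as "the default port" is also an assumption.

```python
from urllib.parse import urlparse

DEFAULT_OLLAMA_PORT = 11434  # Ollama's stock port (assumed to be "the default port")

# Hypothetical entries: one Ollama instance per GPU, each on its own port.
MODEL_LIST = [
    {
        "model_name": "gemma-gpu0",
        "litellm_params": {
            "model": "ollama/gemma",  # placeholder model id
            "api_base": "http://localhost:11435",
        },
    },
    {
        "model_name": "gemma-gpu1",
        "litellm_params": {
            "model": "ollama/gemma",
            "api_base": "http://localhost:11436",
        },
    },
]


def find_unpinned(model_list):
    """Return names of models that lack an api_base or point at the default port."""
    unpinned = []
    for entry in model_list:
        api_base = entry.get("litellm_params", {}).get("api_base")
        if not api_base or urlparse(api_base).port == DEFAULT_OLLAMA_PORT:
            unpinned.append(entry["model_name"])
    return unpinned


if __name__ == "__main__":
    print(find_unpinned(MODEL_LIST))
```

A check like this only validates the configuration itself; it would not catch a library-level override of the kind described above, so it is best paired with a look at which port each instance actually receives traffic on.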
