The standard guidelines for building large language models (LLMs) optimize only for training costs and ignore inference costs ...