Solid breakdown of the abstraction trap. I worked on a project where we tried building on GPT APIs for regulatory compliance and hit a wall within two months when costs ballooned past budget. The routing point is often overlooked: you don't need frontier models for every prompt when a tuned small model cuts cost by 80%. The benchmarking idea is also key, because public evals rarely map to real use cases.
Awesome example, thanks for sharing your experience.
Curious, did you end up moving to a multi-model setup?