economics

Inference Cost

The per-request execution cost incurred when a trained model processes real user workloads

Tags: inference cost, LLM pricing, token cost, cost per request

What is inference cost?

Inference cost is the cost of running a model after training is complete, when handling real prompts and generating outputs.

How is it measured?

In API settings, it is usually tracked through per-token pricing, with separate rates for input (prompt) and output (generated) tokens.
In self-hosted deployments, teams estimate it by amortizing hardware depreciation, power usage, and operations overhead across request volume.
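The two measurement approaches above can be sketched as simple formulas. A minimal example, with hypothetical per-million-token prices and hypothetical deployment figures (none of these numbers come from a real provider):

```python
def api_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Per-request cost under API pricing, with prices quoted per 1M tokens."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m


def self_hosted_cost_per_request(monthly_hardware: float, monthly_power: float,
                                 monthly_ops: float, monthly_requests: int) -> float:
    """Per-request cost for a self-hosted deployment: total monthly spend
    (hardware depreciation + power + operations) amortized over request volume."""
    return (monthly_hardware + monthly_power + monthly_ops) / monthly_requests


# Hypothetical API rates: $3 per 1M input tokens, $15 per 1M output tokens.
# A request with a 2,000-token prompt and a 500-token completion:
print(api_cost(2_000, 500, 3.00, 15.00))        # 0.0135 (about 1.4 cents)

# Hypothetical self-hosted figures: $4,000 hardware depreciation,
# $600 power, $1,400 ops per month, serving 1,000,000 requests.
print(self_hosted_cost_per_request(4_000, 600, 1_400, 1_000_000))  # 0.006
```

Comparing the two numbers for a representative workload is a common way to decide between API and self-hosted serving.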

Why does it matter?

Inference cost directly affects pricing strategy, feature scope, and unit economics, making it a core business metric for AI products.
