economics

Inference Cost

The per-request execution cost incurred when a trained model processes real user workloads

Tags: inference cost, LLM pricing, token cost, cost per request

What is inference cost?

Inference cost is the cost of running a model after training is complete, when handling real prompts and generating outputs.

How is it measured?

In API settings, it is usually tracked through per-token pricing, with separate rates for input (prompt) and output (generated) tokens.
In self-hosted deployments, teams estimate it by amortizing hardware depreciation, power usage, and operations overhead across request volume.
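The two measurement approaches above can be sketched as simple formulas. A minimal example, with hypothetical per-million-token prices and hypothetical deployment figures (none of these numbers come from a real provider):

```python
def api_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Per-request cost under API pricing, with prices quoted per 1M tokens."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m


def self_hosted_cost_per_request(monthly_hardware: float, monthly_power: float,
                                 monthly_ops: float, monthly_requests: int) -> float:
    """Per-request cost for a self-hosted deployment: total monthly spend
    (hardware depreciation + power + operations) amortized over request volume."""
    return (monthly_hardware + monthly_power + monthly_ops) / monthly_requests


# Hypothetical API rates: $3 per 1M input tokens, $15 per 1M output tokens.
# A request with a 2,000-token prompt and a 500-token completion:
print(api_cost(2_000, 500, 3.00, 15.00))        # 0.0135 (about 1.4 cents)

# Hypothetical self-hosted figures: $4,000 hardware depreciation,
# $600 power, $1,400 ops per month, serving 1,000,000 requests.
print(self_hosted_cost_per_request(4_000, 600, 1_400, 1_000_000))  # 0.006
```

Comparing the two numbers for a representative workload is a common way to decide between API and self-hosted serving.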

Why does it matter?

Inference cost directly affects pricing strategy, feature scope, and unit economics, making it a core business metric for AI products.
