Scaling Laws Calculator

Cost

The budget available to train the model.

$

Training time

The time available to train the model.

day(s)

# GPUs

The number of GPUs used for training.

GPU Type

The type of GPU used for training.

GPU Utilization

The percentage of the theoretical peak FLOP/s actually achieved during training.

%

GPU Price

The cost of renting a single GPU for one hour.

$/hour
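
How the cost, training time, GPU count, and GPU price fields relate can be sketched as follows. This is a minimal sketch that assumes cost is simply GPU-hours times the hourly rental price; the function names are hypothetical and the calculator may account for cost differently.

```python
HOURS_PER_DAY = 24.0


def rental_cost_usd(num_gpus: int, price_per_gpu_hour: float, training_days: float) -> float:
    """Total rental cost, assuming every GPU is billed for the whole run."""
    gpu_hours = num_gpus * training_days * HOURS_PER_DAY
    return gpu_hours * price_per_gpu_hour


def affordable_training_days(budget_usd: float, num_gpus: int, price_per_gpu_hour: float) -> float:
    """Inverse of rental_cost_usd: how many days a fixed budget buys."""
    return budget_usd / (num_gpus * price_per_gpu_hour * HOURS_PER_DAY)


if __name__ == "__main__":
    # Illustrative values only: 8 GPUs at $2.00 per GPU-hour for 10 days.
    print(rental_cost_usd(8, 2.0, 10.0))
    print(affordable_training_days(3840.0, 8, 2.0))
```

Fixing any three of the four quantities determines the fourth, which is presumably why the fields update together.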

GPU FLOP/s

The theoretical maximum FLOP/s of a single GPU at your desired precision.

TFLOP/s

Total FLOP/s

The total effective FLOP/s of all GPUs (FLOP/s per GPU × number of GPUs × GPU utilization).

TFLOP/s
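
The product in the description above can be written out directly. This is a minimal sketch; the function name is hypothetical, and utilization is taken as a percentage to match the GPU Utilization field.

```python
def total_effective_tflops(peak_tflops_per_gpu: float, num_gpus: int, utilization_percent: float) -> float:
    """Effective cluster throughput in TFLOP/s.

    FLOP/s per GPU x number of GPUs x GPU utilization, with utilization
    converted from a percentage to a fraction.
    """
    return peak_tflops_per_gpu * num_gpus * (utilization_percent / 100.0)


if __name__ == "__main__":
    # Illustrative values only: 8 GPUs with a 312 TFLOP/s peak at 40% utilization.
    print(total_effective_tflops(312.0, 8, 40.0))
```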

Total FLOPs

The total FLOPs used to train the model (total effective FLOP/s × training time).

TFLOPs
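
The total compute follows from the effective throughput and the training time. This is a minimal sketch that assumes the cluster sustains its effective throughput for the whole run; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def total_training_tflops(total_tflops_per_s: float, training_days: float) -> float:
    """Total training compute in TFLOPs: throughput (TFLOP/s) x duration (s)."""
    return total_tflops_per_s * training_days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative values only: 1000 TFLOP/s sustained for one day.
    tflops = total_training_tflops(1000.0, 1.0)
    print(tflops)         # in TFLOPs
    print(tflops * 1e12)  # the same quantity in FLOPs
```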

# params

The number of non-embedding parameters in the model.

# tokens

The number of tokens used to train the model.
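
One common way to turn a compute budget into a parameter count and a token count is sketched below. It rests on two assumptions that the page does not state: the widely used approximation C ≈ 6·N·D for training FLOPs, and a fixed tokens-per-parameter ratio (20 is a frequently quoted rule of thumb). The calculator may use a different rule, and the function name is hypothetical.

```python
import math


def compute_optimal_split(total_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into (n_params, n_tokens).

    Assumes C = 6 * N * D and D = tokens_per_param * N, which gives
    N = sqrt(C / (6 * tokens_per_param)). Both are assumptions, not
    something the calculator page confirms.
    """
    n_params = math.sqrt(total_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Illustrative budget of 1.2e20 FLOPs (note: FLOPs, not TFLOPs).
    n, d = compute_optimal_split(1.2e20)
    print(f"params ~ {n:.3e}, tokens ~ {d:.3e}")
```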

Loss

The estimated final loss of the model.

Loss
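
A final loss estimate can be sketched from the parameter and token counts with a parametric scaling law. The form L(N, D) = E + A/N^α + B/D^β and the constants below are the fitted values reported by Hoffmann et al. (2022); using them here is an assumption, since the page does not say which scaling law or constants the calculator applies. That fit may also count parameters differently from the non-embedding convention used above.

```python
# Assumed constants: parametric fit from Hoffmann et al. (2022), not
# confirmed as what this calculator uses.
E = 1.69       # irreducible loss
A = 406.4      # parameter-count coefficient
B = 410.7      # token-count coefficient
ALPHA = 0.34   # parameter-count exponent
BETA = 0.28    # token-count exponent


def estimated_loss(n_params: float, n_tokens: float) -> float:
    """Estimated final loss L(N, D) = E + A / N**ALPHA + B / D**BETA."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


if __name__ == "__main__":
    # Illustrative values only: 1e9 parameters trained on 2e10 tokens.
    print(round(estimated_loss(1e9, 2e10), 3))
```

Under this form the loss decreases as either count grows and approaches E from above, so the estimate never falls below the irreducible term.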