Scaling Laws Calculator
Cost ($)
The budget available for training the model.
Training time (days)
The wall-clock time available for training the model.
# GPUs
The number of GPUs used for training.
GPU Type
The type of GPU used for training.
GPU Utilization (%)
The percentage of the GPU's theoretical peak FLOP/s actually achieved during training.
GPU Price ($/hour)
The cost of renting a single GPU for one hour.
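The Cost, Training time, # GPUs, and GPU Price fields are linked by simple arithmetic: renting N GPUs at p dollars per GPU-hour for t days costs N × p × 24t dollars. The sketch below shows that relation and its inverse (how many days a budget buys). It assumes every GPU is billed for the full run with no other overheads; the function names and the $2/hour price are hypothetical, for illustration only.

```python
def training_cost(num_gpus: int, price_per_gpu_hour: float, days: float) -> float:
    """Rental cost in dollars, assuming every GPU is billed for every hour of the run."""
    hours = days * 24
    return num_gpus * price_per_gpu_hour * hours


def days_for_budget(budget: float, num_gpus: int, price_per_gpu_hour: float) -> float:
    """Inverse of training_cost: how many days of training a dollar budget pays for."""
    hourly_burn = num_gpus * price_per_gpu_hour  # dollars spent per wall-clock hour
    return budget / hourly_burn / 24


# Example: 64 GPUs at a hypothetical $2 per GPU-hour for 10 days.
print(training_cost(64, 2.0, 10))        # 30720.0
print(days_for_budget(30720, 64, 2.0))   # 10.0
```

Whichever of budget or time runs out first limits the run, so a calculator like this can take the smaller of the two durations.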
GPU FLOP/s (TFLOP/s)
The theoretical maximum FLOP/s of a single GPU at your desired precision.
Total FLOP/s (TFLOP/s)
The total effective FLOP/s of all GPUs (FLOP/s per GPU × number of GPUs × GPU utilization).
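The Total FLOP/s formula above translates directly into code. In this minimal sketch the utilization is entered as a percentage, as in the form, and the result stays in TFLOP/s. The 312 TFLOP/s input is only an illustrative peak (NVIDIA's quoted dense BF16 figure for the A100), and the function name is hypothetical.

```python
def total_flops_per_second(gpu_tflops: float, num_gpus: int, utilization_pct: float) -> float:
    """Effective cluster throughput in TFLOP/s.

    gpu_tflops:      theoretical peak of one GPU, in TFLOP/s
    num_gpus:        number of GPUs in the run
    utilization_pct: share of peak actually achieved, as a percentage (e.g. 40 for 40%)
    """
    return gpu_tflops * num_gpus * utilization_pct / 100


# Example: 64 GPUs with a 312 TFLOP/s peak running at 40% utilization.
print(total_flops_per_second(312, 64, 40))  # 7987.2
```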
Total FLOPs (TFLOPs)
The total number of floating-point operations used to train the model (total effective FLOP/s × training duration in seconds).
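Turning the effective throughput into a total compute budget is a unit conversion: TFLOP/s to FLOP/s (× 10^12) and days to seconds (× 86,400). The sketch below assumes the cluster sustains the same effective throughput for the whole run. It returns raw FLOPs; divide by 1e12 to get the TFLOPs the form displays. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def total_flops(effective_tflops: float, days: float) -> float:
    """Total training compute in FLOPs (not TFLOPs), assuming constant throughput."""
    flops_per_second = effective_tflops * 1e12  # TFLOP/s -> FLOP/s
    return flops_per_second * days * SECONDS_PER_DAY


# Example: 7987.2 effective TFLOP/s sustained for 10 days.
flops = total_flops(7987.2, 10)
print(f"{flops:.3e} FLOPs")  # 6.901e+21 FLOPs
```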
# params
The number of non-embedding parameters in the model.
# tokens
The number of tokens used to train the model.
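A common way to tie # params (N) and # tokens (D) to the compute budget C is the approximation C ≈ 6ND from Kaplan et al. (2020), which also counts N as non-embedding parameters. The factor 6 is roughly 2 FLOPs per parameter per token for the forward pass plus 4 for the backward pass. Whether this calculator uses exactly this rule is an assumption. The sketch below solves it in both directions, and its function names are hypothetical.

```python
def tokens_for_budget(total_flops: float, num_params: float) -> float:
    """Tokens trainable within a FLOP budget, assuming C ≈ 6 * N * D."""
    return total_flops / (6 * num_params)


def flops_for_run(num_params: float, num_tokens: float) -> float:
    """Approximate training compute in FLOPs, assuming C ≈ 6 * N * D."""
    return 6 * num_params * num_tokens


# Example: a model with 1e9 non-embedding parameters and a 6e21 FLOP budget.
print(tokens_for_budget(6e21, 1e9))  # 1000000000000.0 (1e12 tokens)
```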
Loss
The estimated final loss of the model.
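The form does not say which scaling law produces the loss estimate. One widely used choice is the parametric fit from Hoffmann et al. (2022), L(N, D) = E + A / N^α + B / D^β, with their fitted constants E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28. The sketch below uses those constants as defaults. Treat it as an assumption, not as this calculator's formula: the constants were fit on one dataset, and their parameter-counting convention may differ from the non-embedding count used here. The function name is hypothetical.

```python
def estimated_loss(num_params: float, num_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Parametric loss fit L(N, D) = E + A / N**alpha + B / D**beta.

    E is the irreducible loss; the two power-law terms shrink as the
    parameter count N and the token count D grow. Defaults are the
    Hoffmann et al. (2022) fitted constants (an assumption here).
    """
    return E + A / num_params**alpha + B / num_tokens**beta


# Example: 70e9 parameters trained on 1.4e12 tokens.
print(f"{estimated_loss(70e9, 1.4e12):.2f}")  # roughly 1.94
```

Passing the constants as keyword arguments makes it easy to swap in a different fit if the calculator is calibrated against another one.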