Error in section A3
#126, opened by Gusanidas
In the ratio between t_comm and t_compute for data parallel, the term model_params should not be there. t_comm should depend linearly on model_params (which is not stated), and t_compute depends on it linearly as well, so the term cancels. We then end up with the same ratio as in the FSDP case below, for slightly different reasons.
As a common-sense check, it doesn't really make sense for the t_comm/t_compute ratio in data parallel to be model_params times (possibly billions of times) bigger than for FSDP.
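To make the cancellation concrete, here is a rough sketch. The formulas in it are my own assumptions rather than the ones from section A3: I take the gradient all-reduce to move about 2 * model_params * bytes_per_param bytes per GPU, and forward plus backward to cost about 6 FLOPs per parameter per token. The function name and the hardware numbers are also just illustrative.

```python
import math


def comm_to_compute_ratio(model_params, tokens_per_gpu, peak_flops,
                          bandwidth_bytes_per_s, bytes_per_param=2):
    """Hypothetical helper: t_comm / t_compute for plain data parallel."""
    # Assumption: the gradient all-reduce moves ~2 * model_params * bytes_per_param bytes.
    t_comm = 2 * model_params * bytes_per_param / bandwidth_bytes_per_s
    # Assumption: forward + backward cost ~6 FLOPs per parameter per token.
    t_compute = 6 * tokens_per_gpu * model_params / peak_flops
    return t_comm / t_compute


# Illustrative hardware numbers, not measurements.
hardware = dict(tokens_per_gpu=4096, peak_flops=1e15, bandwidth_bytes_per_s=1e11)

ratio_1b = comm_to_compute_ratio(1e9, **hardware)
ratio_70b = comm_to_compute_ratio(70e9, **hardware)

# Both t_comm and t_compute scale linearly with model_params, so the ratio matches.
print(math.isclose(ratio_1b, ratio_70b, rel_tol=1e-9))
```

Under these assumptions the ratio depends only on tokens per GPU, peak FLOPs, and bandwidth, not on the model size.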