Batch accumulation
Batch accumulation, also known as gradient accumulation, is a technique for reducing the GPU memory requirements of machine learning training. In a normal training step, a full batch is run through the model, the gradients with respect to each parameter are computed, and a single parameter update is performed. If the batch size is too large, the activations and gradients for that batch can exceed available VRAM and cause an "out-of-memory" error. With gradient accumulation, the large batch is split into smaller micro-batches; the gradients from each micro-batch are accumulated in place, and the parameter update is applied only after all micro-batches have been processed. This trades time for memory: training behaves as if a GPU with more VRAM were available, at the cost of a longer training time.
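The equivalence between one large-batch step and several accumulated micro-batch steps can be sketched in plain Python. This is a minimal illustration, assuming a one-parameter linear model `y = w * x` with mean-squared-error loss; the function names (`grad_mse`, `accumulated_grad`) are illustrative, not from any library.

```python
def grad_mse(w, xs, ys):
    """Mean gradient of (w*x - y)^2 with respect to w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients in one running total.

    Only one micro-batch of gradients is "live" at a time, which is
    what saves memory in a real training framework.
    """
    total = 0.0
    for i in range(0, len(xs), micro_batch_size):
        mx = xs[i:i + micro_batch_size]
        my = ys[i:i + micro_batch_size]
        # Weight each micro-batch gradient by its share of the full batch
        # so the accumulated result matches the full-batch mean gradient.
        total += grad_mse(w, mx, my) * len(mx) / len(xs)
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(full, accum)  # the two gradients match, so the update is identical
```

Because the accumulated gradient equals the full-batch gradient, the optimizer performs the same update it would have made with the large batch, just spread over several smaller forward/backward passes.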
![](https://cdn.prod.website-files.com/649f003940a53a75a2e42068/6569a51b8fdc8fd942734f04_1db855a7-51af-4656-9f08-40d017f22537.png)