Tensor Parallelism

Tensor parallelism is a technique for distributing a single large model across multiple GPUs by sharding individual weight tensors. For instance, when multiplying an input tensor by the first weight tensor, the weight tensor is split column-wise, each GPU multiplies the same input by its own column shard, and the partial outputs are then gathered from the GPUs and concatenated to produce the final result, as illustrated below.
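The following is a minimal sketch of this column-wise scheme. It simulates the shards on a single device for clarity (in practice each shard would live on its own GPU and the concatenation would be a cross-GPU gather); the shapes and variable names are illustrative assumptions, not part of any particular library's API.

```python
import torch

torch.manual_seed(0)
batch, d_in, d_out, n_gpus = 4, 8, 16, 2  # hypothetical sizes

x = torch.randn(batch, d_in)   # input tensor, replicated on every GPU
w = torch.randn(d_in, d_out)   # full weight tensor

# Split the weight tensor column-wise: one shard of d_out // n_gpus
# columns per GPU.
shards = torch.chunk(w, n_gpus, dim=1)

# Each GPU multiplies the same input by its own weight shard.
partial_outputs = [x @ shard for shard in shards]

# Gather the partial outputs and concatenate them along the column
# dimension to form the final result.
y = torch.cat(partial_outputs, dim=1)

# The sharded computation matches the unsharded multiplication.
assert torch.allclose(y, x @ w)
```

Because each output column depends on only one weight column, the column-wise split requires no communication during the multiplication itself; the GPUs only need to exchange their partial outputs at the end.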

[Figure: column-wise tensor parallelism. Image courtesy of Anton Lozkhov; source: HuggingFace.]
