from typing import Any, Dict, Optional, Union

import torch
import torch.distributed

from .parallel_state import get_tp_group


def tensor_model_parallel_all_reduce(input_: torch.Tensor) -> torch.Tensor:
    """All-reduce the input tensor across model parallel group."""
    return get_tp_group().all_reduce(input_)
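
# Illustrative usage sketch (not part of the original module). Assumes the
# tensor-parallel group has already been initialized by the engine's
# distributed setup; ``hidden_states`` and ``weight_shard`` are hypothetical
# per-rank tensors (e.g. from a row-parallel linear layer):
#
#     partial_out = torch.matmul(hidden_states, weight_shard)
#     hidden_states = tensor_model_parallel_all_reduce(partial_out)
#
# After the call, every rank in the tensor-parallel group holds the
# element-wise sum of the per-rank partial results.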


def tensor_model_parallel_all_gather(input_: torch.Tensor,
                                     dim: int = -1) -> torch.Tensor:
    """All-gather the input tensor across model parallel group."""
    return get_tp_group().all_gather(input_, dim)
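
# Illustrative usage sketch (not part of the original module). With a
# tensor-parallel size of N, each rank holds one shard along the gathered
# dimension (``local_logits`` is a hypothetical per-rank shard):
#
#     full_logits = tensor_model_parallel_all_gather(local_logits, dim=-1)
#
# Every rank receives the concatenation of all N shards, so the output is
# N times larger than the input along ``dim``.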


def tensor_model_parallel_gather(input_: torch.Tensor,
                                 dst: int = 0,
                                 dim: int = -1) -> Optional[torch.Tensor]:
    """Gather the input tensor across model parallel group."""
    return get_tp_group().gather(input_, dst, dim)
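
# Illustrative usage sketch (not part of the original module). Unlike the
# all-gather above, only the destination rank is expected to receive the
# gathered tensor; the other ranks get None, hence the Optional return type:
#
#     gathered = tensor_model_parallel_gather(local_logits, dst=0, dim=-1)
#     if gathered is not None:
#         process_on_driver(gathered)  # hypothetical dst-rank-only work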


def broadcast_tensor_dict(tensor_dict: Optional[Dict[Any, Union[torch.Tensor,
                                                                 Any]]] = None,
                          src: int = 0):
    """Broadcast the input tensor dictionary across model parallel group."""
    if not torch.distributed.is_initialized():
        # Single-process / uninitialized case: nothing to broadcast.
        return tensor_dict
    return get_tp_group().broadcast_tensor_dict(tensor_dict, src)
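
# Illustrative usage sketch (not part of the original module). The source
# rank passes the dictionary, the other ranks pass None (the default), and
# each rank gets an equivalent dictionary back; ``input_ids`` is a
# hypothetical tensor, and non-tensor values are allowed as well per the
# ``Union[torch.Tensor, Any]`` value type:
#
#     # on the source rank (src=0 selects the first tensor-parallel rank)
#     out = broadcast_tensor_dict({"input_ids": input_ids, "step": 3}, src=0)
#     # on every other rank
#     out = broadcast_tensor_dict(src=0)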