The files in this folder are ported from Megatron-LM. We only keep the code that is used in inference.