Files in this folder: __init__.py, communication_op.py, custom_all_reduce.py, parallel_state.py, README.md, utils.py

The files in this folder are ported from Megatron-LM. We keep only the code that is used in inference.
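
For context, below is a minimal sketch of what a Megatron-LM-style tensor-parallel communication op typically looks like. The function name and structure here are illustrative assumptions, not a verbatim copy of communication_op.py (which, among other things, can dispatch to the custom all-reduce kernels in custom_all_reduce.py):

```python
import torch
import torch.distributed as dist


def tensor_model_parallel_all_reduce(input_: torch.Tensor) -> torch.Tensor:
    """All-reduce a tensor across the tensor-model-parallel group.

    Illustrative sketch only: the real code looks up the TP process
    group from parallel_state rather than using the default group.
    """
    # With a single GPU (TP=1) there is nothing to reduce.
    if not dist.is_initialized() or dist.get_world_size() == 1:
        return input_
    # In-place sum-reduction of the partial results held by each rank.
    dist.all_reduce(input_)
    return input_
```

In tensor-parallel inference, each rank computes a partial result of a row-parallel linear layer; an all-reduce like the one sketched above sums those partials so every rank ends up with the full output activation.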