Directory listing: vllm/vllm/model_executor/layers (latest commit: 2024-01-26 23:53:17 -08:00)
Name                         Latest commit                                                        Date
quantization/                AWQ: Up to 2.66x higher throughput (#2566)                           2024-01-26 23:53:17 -08:00
triton_kernel/               [Experimental] Prefix Caching Support (#1669)                        2024-01-17 16:32:10 -08:00
__init__.py                  Change the name to vLLM (#150)                                       2023-06-17 03:07:40 -07:00
activation.py                Add PyTorch-native implementation of custom layers (#1898)           2023-12-02 21:18:40 -08:00
attention.py                 [Experimental] Prefix Caching Support (#1669)                        2024-01-17 16:32:10 -08:00
layernorm.py                 Add PyTorch-native implementation of custom layers (#1898)           2023-12-02 21:18:40 -08:00
linear.py                    fix weigit loading for GQA with TP (#2379)                           2024-01-15 15:43:59 -08:00
rejection_sampler.py         [Speculative decoding 1/9] Optimized rejection sampler (#2336)       2024-01-09 15:38:41 -08:00
rotary_embedding.py          Add PyTorch-native implementation of custom layers (#1898)           2023-12-02 21:18:40 -08:00
sampler.py                   [Experimental] Add multi-LoRA support (#1804)                        2024-01-23 15:26:37 -08:00
vocab_parallel_embedding.py  [Experimental] Add multi-LoRA support (#1804)                        2024-01-23 15:26:37 -08:00