Directory listing: vllm/vllm/model_executor/layers (latest commit: 2024-01-26 23:53:17 -08:00)
Name                         Latest commit                                                        Date
quantization/                AWQ: Up to 2.66x higher throughput (#2566)                           2024-01-26 23:53:17 -08:00
triton_kernel/               [Experimental] Prefix Caching Support (#1669)                        2024-01-17 16:32:10 -08:00
__init__.py                  Change the name to vLLM (#150)                                       2023-06-17 03:07:40 -07:00
activation.py                Add PyTorch-native implementation of custom layers (#1898)           2023-12-02 21:18:40 -08:00
attention.py                 [Experimental] Prefix Caching Support (#1669)                        2024-01-17 16:32:10 -08:00
layernorm.py                 Add PyTorch-native implementation of custom layers (#1898)           2023-12-02 21:18:40 -08:00
linear.py                    fix weigit loading for GQA with TP (#2379)                           2024-01-15 15:43:59 -08:00
rejection_sampler.py         [Speculative decoding 1/9] Optimized rejection sampler (#2336)       2024-01-09 15:38:41 -08:00
rotary_embedding.py          Add PyTorch-native implementation of custom layers (#1898)           2023-12-02 21:18:40 -08:00
sampler.py                   [Experimental] Add multi-LoRA support (#1804)                        2024-01-23 15:26:37 -08:00
vocab_parallel_embedding.py  [Experimental] Add multi-LoRA support (#1804)                        2024-01-23 15:26:37 -08:00