vllm/vllm/model_executor/layers
Name                        Last commit message                                             Last commit date
fused_moe/                  [Minor] Fix type annotation in fused moe (#3045)                2024-02-26 19:44:29 -08:00
quantization/               Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)      2024-02-28 21:52:23 -08:00
triton_kernel/              Enable GQA support in the prefix prefill kernels (#3007)        2024-02-27 01:14:31 -08:00
__init__.py                 Change the name to vLLM (#150)                                  2023-06-17 03:07:40 -07:00
activation.py               Optimize GeGLU layer in Gemma (#2975)                           2024-02-21 20:17:52 -08:00
attention.py                Enable GQA support in the prefix prefill kernels (#3007)        2024-02-27 01:14:31 -08:00
layernorm.py                Revert "Refactor llama family models (#2637)" (#2851)           2024-02-13 09:24:59 -08:00
linear.py                   Remove hardcoded device="cuda" to support more devices (#2503)  2024-02-01 15:46:39 -08:00
rejection_sampler.py        [Speculative decoding 1/9] Optimized rejection sampler (#2336)  2024-01-09 15:38:41 -08:00
rotary_embedding.py         [Fix] Fix assertion on YaRN model len (#2984)                   2024-02-23 12:57:48 -08:00
sampler.py                  [Neuron] Support inference with transformers-neuronx (#2569)    2024-02-28 09:34:34 -08:00
vocab_parallel_embedding.py Remove hardcoded device="cuda" to support more devices (#2503)  2024-02-01 15:46:39 -08:00