vllm/vllm/model_executor/layers
Name                        Last commit message                                             Last commit date
fused_moe/                  [Minor] Fix type annotation in fused moe (#3045)                2024-02-26 19:44:29 -08:00
quantization/               Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)      2024-02-28 21:52:23 -08:00
triton_kernel/              Enable GQA support in the prefix prefill kernels (#3007)        2024-02-27 01:14:31 -08:00
__init__.py                 Change the name to vLLM (#150)                                  2023-06-17 03:07:40 -07:00
activation.py               Optimize GeGLU layer in Gemma (#2975)                           2024-02-21 20:17:52 -08:00
attention.py                Enable GQA support in the prefix prefill kernels (#3007)        2024-02-27 01:14:31 -08:00
layernorm.py                Revert "Refactor llama family models (#2637)" (#2851)           2024-02-13 09:24:59 -08:00
linear.py                   Remove hardcoded device="cuda" to support more devices (#2503)  2024-02-01 15:46:39 -08:00
rejection_sampler.py        [Speculative decoding 1/9] Optimized rejection sampler (#2336)  2024-01-09 15:38:41 -08:00
rotary_embedding.py         [Fix] Fix assertion on YaRN model len (#2984)                   2024-02-23 12:57:48 -08:00
sampler.py                  [Neuron] Support inference with transformers-neuronx (#2569)    2024-02-28 09:34:34 -08:00
vocab_parallel_embedding.py Remove hardcoded device="cuda" to support more devices (#2503)  2024-02-01 15:46:39 -08:00