vllm/vllm/model_executor (latest commit: 2024-02-29 00:51:48 -08:00)
Name                      Last commit                                                          Date
layers/                   Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)           2024-02-28 21:52:23 -08:00
models/                   Support starcoder2 architecture (#3089)                              2024-02-29 00:51:48 -08:00
parallel_utils/           chore(vllm): codespell for spell checking (#2820)                    2024-02-21 18:56:01 -08:00
__init__.py               [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28 09:34:34 -08:00
input_metadata.py         Support FP8-E5M2 KV Cache (#2279)                                    2024-01-28 16:43:54 -08:00
model_loader.py           [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28 09:34:34 -08:00
neuron_model_loader.py    [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28 09:34:34 -08:00
sampling_metadata.py      [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28 09:34:34 -08:00
utils.py                  [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28 09:34:34 -08:00
weight_utils.py           Use revision when downloading the quantization config file (#2697)   2024-02-01 15:41:58 -08:00
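These modules are reached through vLLM's public API rather than imported directly: constructing an engine invokes model_loader (or neuron_model_loader on AWS Neuron) to build an architecture from models/, and each decoding step fills in the input/sampling metadata for the forward pass. A minimal sketch of that path follows; the checkpoint name and sampling values are illustrative assumptions, not taken from the listing above.

    # Sketch: exercising model_executor via vLLM's public entry point.
    # LLM() internally calls model_loader.get_model() to instantiate an
    # architecture from models/ (e.g. starcoder2, added in #3089).
    # "bigcode/starcoder2-3b" and the sampling values are assumptions
    # chosen for illustration.
    from vllm import LLM, SamplingParams

    llm = LLM(model="bigcode/starcoder2-3b")
    params = SamplingParams(temperature=0.0, max_tokens=64)
    outputs = llm.generate(["def fibonacci(n):"], params)
    print(outputs[0].outputs[0].text)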