vllm/vllm/model_executor
Name                          Last commit                                                           Date
layers/                       Fix lint (#3388)                                                      2024-03-13
models/                       Support Mistral Model Inference with transformers-neuronx (#3153)    2024-03-11
parallel_utils/               Re-enable the 80 char line width limit (#3305)                        2024-03-10
__init__.py                   [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28
guided_decoding.py            Re-enable the 80 char line width limit (#3305)                        2024-03-10
guided_logits_processors.py   Re-enable the 80 char line width limit (#3305)                        2024-03-10
input_metadata.py             Support FP8-E5M2 KV Cache (#2279)                                     2024-01-28
model_loader.py               [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28
neuron_model_loader.py        Re-enable the 80 char line width limit (#3305)                        2024-03-10
sampling_metadata.py          Re-enable the 80 char line width limit (#3305)                        2024-03-10
utils.py                      [Neuron] Support inference with transformers-neuronx (#2569)         2024-02-28
weight_utils.py               Move model filelocks from /tmp/ to ~/.cache/vllm/locks/ dir (#3241)  2024-03-08
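For context, these modules are normally exercised through vLLM's public API rather than imported directly: model_loader.py resolves a checkpoint's architecture to an implementation in models/, which is built from the kernels in layers/ and fed weights via weight_utils.py. A minimal sketch against the public API of this era (v0.3.x) follows; the model name and sampling values are illustrative, not taken from this listing.

```python
# Minimal sketch of the offline-inference path (vLLM ~v0.3.x). The model
# name and sampling values below are illustrative assumptions.
from vllm import LLM, SamplingParams

# Constructing LLM triggers model_loader.py -> models/ -> weight_utils.py.
llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() runs the forward pass through the kernels in layers/.
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```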
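guided_decoding.py and guided_logits_processors.py implement guided decoding (regex- and JSON-schema-constrained output, built on Outlines) as logits processors. The same hook is exposed on SamplingParams; below is a hedged toy example that bans a single token id, assuming the logits-processor signature used at the time, (token_ids, logits) -> logits. The real processors instead mask logits according to a regex or schema state machine.

```python
# Toy logits processor illustrating the hook that
# guided_logits_processors.py plugs into. BANNED_TOKEN_ID and the model
# name are illustrative assumptions.
from typing import List

import torch
from vllm import LLM, SamplingParams

BANNED_TOKEN_ID = 42  # hypothetical token id to forbid

def ban_token(token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
    # Called once per decode step with the tokens generated so far and the
    # raw logits; returning -inf makes the token unselectable.
    logits[BANNED_TOKEN_ID] = -float("inf")
    return logits

llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(max_tokens=32, logits_processors=[ban_token])
print(llm.generate(["Count to five:"], params)[0].outputs[0].text)
```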