vllm/vllm/model_executor
2024-09-05 11:09:46 -04:00
guided_decoding [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) 2024-09-04 13:18:13 -07:00
layers Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165) 2024-09-05 11:09:46 -04:00
model_loader [Neuron] Adding support for adding/ overriding neuron configuration a… (#8062) 2024-09-04 16:33:43 -07:00
models [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029) 2024-09-05 12:48:10 +00:00
__init__.py [Performance] Optimize e2e overheads: Reduce python allocations (#7162) 2024-08-08 21:34:28 -07:00
custom_op.py [XPU] fallback to native implementation for xpu custom op (#7670) 2024-08-20 00:26:09 -07:00
parameter.py [Misc] Update GPTQ to use vLLMParameters (#7976) 2024-09-03 17:21:44 -04:00
pooling_metadata.py [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
sampling_metadata.py [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
utils.py [Hardware][Neuron] Refactor neuron support (#3471) 2024-03-22 01:22:17 +00:00