vllm/model_executor at db3bf7c991cd1a0297d1a8ba501e59cfa226c337 - vllm

Michael Goin 2ee45281a5 Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165)		2024-09-05 11:09:46 -04:00
..
guided_decoding	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)	2024-09-04 13:18:13 -07:00
layers	Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165)	2024-09-05 11:09:46 -04:00
model_loader	[Neuron] Adding support for adding/ overriding neuron configuration a… (#8062)	2024-09-04 16:33:43 -07:00
models	[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)	2024-09-05 12:48:10 +00:00
__init__.py	[Performance] Optimize e2e overheads: Reduce python allocations (#7162)	2024-08-08 21:34:28 -07:00
custom_op.py	[XPU] fallback to native implementation for xpu custom op (#7670)	2024-08-20 00:26:09 -07:00
parameter.py	[Misc] Update `GPTQ` to use `vLLMParameters` (#7976)	2024-09-03 17:21:44 -04:00
pooling_metadata.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
sampling_metadata.py	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)	2024-08-18 17:57:20 -07:00
utils.py	[Hardware][Neuron] Refactor neuron support (#3471)	2024-03-22 01:22:17 +00:00