vllm/vllm/model_executor
2024-08-25 11:51:20 +00:00
guided_decoding [misc][core] lazy import outlines (#7831) 2024-08-24 00:51:38 -07:00
layers [Misc] Update marlin to use vLLMParameters (#7803) 2024-08-23 14:30:52 -04:00
model_loader Fix ShardedStateLoader for vllm fp8 quantization (#7708) 2024-08-22 08:25:04 -04:00
models [Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783) 2024-08-25 11:51:20 +00:00
__init__.py [Performance] Optimize e2e overheads: Reduce python allocations (#7162) 2024-08-08 21:34:28 -07:00
custom_op.py [XPU] fallback to native implementation for xpu custom op (#7670) 2024-08-20 00:26:09 -07:00
parameter.py [Misc] update fp8 to use vLLMParameter (#7437) 2024-08-22 08:36:18 -04:00
pooling_metadata.py [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
sampling_metadata.py [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
utils.py [Hardware][Neuron] Refactor neuron support (#3471) 2024-03-22 01:22:17 +00:00