vllm/vllm/model_executor
2024-08-15 13:10:22 -07:00
guided_decoding Support for guided decoding for offline LLM (#6878) 2024-08-04 03:12:09 +00:00
layers [Misc] Revert compressed-tensors code reuse (#7521) 2024-08-14 15:07:37 -07:00
model_loader [Bugfix] Fix default weight loading for scalars (#7534) 2024-08-15 13:10:22 -07:00
models [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
__init__.py [Performance] Optimize e2e overheads: Reduce python allocations (#7162) 2024-08-08 21:34:28 -07:00
custom_op.py [hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102) 2024-08-13 00:16:42 -07:00
parameter.py [Misc] Update gptq_marlin to use new vLLMParameters (#7281) 2024-08-13 14:30:11 -04:00
pooling_metadata.py [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
sampling_metadata.py [Performance] Optimize e2e overheads: Reduce python allocations (#7162) 2024-08-08 21:34:28 -07:00
utils.py [Hardware][Neuron] Refactor neuron support (#3471) 2024-03-22 01:22:17 +00:00