vllm/model_executor at e165528778d1bfeb8e9bd8a33d6cd64fb6c78e4e - vllm

Michael Goin 21313e09e3 [Bugfix] Fix default weight loading for scalars (#7534)		2024-08-15 13:10:22 -07:00
guided_decoding	Support for guided decoding for offline LLM (#6878)	2024-08-04 03:12:09 +00:00
layers	[Misc] Revert `compressed-tensors` code reuse (#7521)	2024-08-14 15:07:37 -07:00
model_loader	[Bugfix] Fix default weight loading for scalars (#7534)	2024-08-15 13:10:22 -07:00
models	[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)	2024-08-14 17:55:42 +00:00
__init__.py	[Performance] Optimize e2e overheads: Reduce python allocations (#7162)	2024-08-08 21:34:28 -07:00
custom_op.py	[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)	2024-08-13 00:16:42 -07:00
parameter.py	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281)	2024-08-13 14:30:11 -04:00
pooling_metadata.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
sampling_metadata.py	[Performance] Optimize e2e overheads: Reduce python allocations (#7162)	2024-08-08 21:34:28 -07:00
utils.py	[Hardware][Neuron] Refactor neuron support (#3471)	2024-03-22 01:22:17 +00:00