vllm/layers at 26a68d5d7e7dd47c7d8538a326493c8a171f5016 - vllm

ElizaWszola d081da0064 [Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-28 18:19:40 -07:00
fused_moe	[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)	2024-09-28 18:19:40 -07:00
mamba	[Kernel] Fullgraph and opcheck tests (#8479)	2024-09-25 08:35:52 -06:00
quantization	[CI/Build] Update models tests & examples (#8874)	2024-09-28 09:54:35 -07:00
__init__.py	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00
activation.py	[Hardware][intel GPU] bump up ipex version to 2.3 (#8365)	2024-09-13 16:54:34 -07:00
layernorm.py	[Hardware][intel GPU] bump up ipex version to 2.3 (#8365)	2024-09-13 16:54:34 -07:00
linear.py	[Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434)	2024-09-17 08:09:12 -07:00
logits_processor.py	[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410)	2024-08-13 05:33:41 +00:00
pooler.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
rejection_sampler.py	[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)	2024-09-22 12:34:14 -07:00
resampler.py	[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)	2024-09-05 12:48:10 +00:00
rotary_embedding.py	[Misc] Use RoPE cache for MRoPE (#8396)	2024-09-11 23:13:14 -07:00
sampler.py	[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047)	2024-09-24 17:29:56 -07:00
spec_decode_base_sampler.py	[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)	2024-09-22 12:34:14 -07:00
typical_acceptance_sampler.py	Fix typical acceptance sampler with correct recovered token ids (#8562)	2024-09-23 12:32:27 -07:00
vocab_parallel_embedding.py	[Misc] Update `GPTQ` to use `vLLMParameters` (#7976)	2024-09-03 17:21:44 -04:00