vllm/layers at 2cd402e1692417b7645e4ece11bc2ab91072f47c - vllm

Robert Shaw 2cd402e169 [ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921) Co-authored-by: Robert Shaw <rshaw@neuralmagic>		2024-06-28 18:43:49 +00:00
fused_moe	Unmark fused_moe config json file as executable (#5960)	2024-06-28 06:36:12 -07:00
ops	[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)	2024-04-22 21:32:44 -07:00
quantization	[ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921)	2024-06-28 18:43:49 +00:00
__init__.py	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00
activation.py	[Kernel][CPU] Add Quick `gelu` to CPU (#5717)	2024-06-21 06:39:40 +00:00
layernorm.py	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
linear.py	[ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921)	2024-06-28 18:43:49 +00:00
logits_processor.py	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
pooler.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
rejection_sampler.py	[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131)	2024-06-17 21:29:09 -05:00
rotary_embedding.py	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
sampler.py	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
spec_decode_base_sampler.py	[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131)	2024-06-17 21:29:09 -05:00
typical_acceptance_sampler.py	[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131)	2024-06-17 21:29:09 -05:00
vocab_parallel_embedding.py	[Bugfix] Fix embedding to support 2D inputs (#5829)	2024-06-26 00:15:22 -07:00