vllm/layers at 80ca1e6a3a28a0373dc00c5b4fe956c16de952fa - vllm

sroy745 80ca1e6a3a [Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348)		2024-07-01 00:33:05 -07:00
fused_moe	[Kernel] Raise an exception in MoE kernel if the batch size is larger then 65k (#5939)	2024-06-29 21:04:20 +08:00
ops	[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)	2024-04-22 21:32:44 -07:00
quantization	[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)	2024-06-30 20:07:34 -07:00
__init__.py	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00
activation.py	[Kernel][CPU] Add Quick `gelu` to CPU (#5717)	2024-06-21 06:39:40 +00:00
layernorm.py	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
linear.py	[ Misc ] Refactor w8a8 to use `process_weights_after_load` (Simplify Weight Loading) (#5940)	2024-06-30 23:06:27 +00:00
logits_processor.py	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
pooler.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
rejection_sampler.py	[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348)	2024-07-01 00:33:05 -07:00
rotary_embedding.py	Support Deepseek-V2 (#4650)	2024-06-28 13:24:57 -07:00
sampler.py	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
spec_decode_base_sampler.py	[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348)	2024-07-01 00:33:05 -07:00
typical_acceptance_sampler.py	[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348)	2024-07-01 00:33:05 -07:00
vocab_parallel_embedding.py	[Bugfix] Fix embedding to support 2D inputs (#5829)	2024-06-26 00:15:22 -07:00