vllm/vllm/model_executor/layers
Robert Shaw 2cd402e169
[ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-28 18:43:49 +00:00
Name | Last commit | Date
fused_moe | Unmark fused_moe config json file as executable (#5960) | 2024-06-28 06:36:12 -07:00
ops | [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) | 2024-04-22 21:32:44 -07:00
quantization | [ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921) | 2024-06-28 18:43:49 +00:00
__init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
activation.py | [Kernel][CPU] Add Quick gelu to CPU (#5717) | 2024-06-21 06:39:40 +00:00
layernorm.py | [Model] Add Gemma 2 (#5908) | 2024-06-27 13:33:56 -07:00
linear.py | [ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921) | 2024-06-28 18:43:49 +00:00
logits_processor.py | [Model] Add Gemma 2 (#5908) | 2024-06-27 13:33:56 -07:00
pooler.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00
rejection_sampler.py | [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) | 2024-06-17 21:29:09 -05:00
rotary_embedding.py | [Model] Add Gemma 2 (#5908) | 2024-06-27 13:33:56 -07:00
sampler.py | [Hardware][Intel] OpenVINO vLLM backend (#5379) | 2024-06-28 13:50:16 +00:00
spec_decode_base_sampler.py | [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) | 2024-06-17 21:29:09 -05:00
typical_acceptance_sampler.py | [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) | 2024-06-17 21:29:09 -05:00
vocab_parallel_embedding.py | [Bugfix] Fix embedding to support 2D inputs (#5829) | 2024-06-26 00:15:22 -07:00