vllm/vllm/model_executor/layers
Latest commit: 7e0861bd0b by Sage Moore, 2024-08-01 11:11:24 -07:00
[CI/Build] Update PyTorch to 2.4.0 (#6951)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
| Name | Last commit | Date |
| --- | --- | --- |
| fused_moe | [Bugfix] Allow vllm to still work if triton is not installed. (#6786) | 2024-07-29 14:51:27 -07:00 |
| ops | [CI/Build] Update PyTorch to 2.4.0 (#6951) | 2024-08-01 11:11:24 -07:00 |
| quantization | Support W4A8 quantization for vllm (#5218) | 2024-07-31 07:55:21 -06:00 |
| __init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| activation.py | [Doc] Add Nemotron to supported model docs (#6843) | 2024-07-26 17:32:44 -04:00 |
| layernorm.py | [Model] Add Gemma 2 (#5908) | 2024-06-27 13:33:56 -07:00 |
| linear.py | Fix ReplicatedLinear weight loading (#6793) | 2024-07-25 19:24:58 -07:00 |
| logits_processor.py | [TPU] Support collective communications in XLA devices (#6813) | 2024-07-27 01:45:57 +00:00 |
| pooler.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| rejection_sampler.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| rotary_embedding.py | [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) | 2024-07-26 14:33:42 -04:00 |
| sampler.py | [Bugfix] Allow vllm to still work if triton is not installed. (#6786) | 2024-07-29 14:51:27 -07:00 |
| spec_decode_base_sampler.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| typical_acceptance_sampler.py | [Bugfix] Make spec. decode respect per-request seed. (#6034) | 2024-07-18 19:22:08 -07:00 |
| vocab_parallel_embedding.py | [ Misc ] fbgemm checkpoints (#6559) | 2024-07-20 09:36:57 -07:00 |