vllm/vllm/model_executor/layers
Latest commit: 7e0861bd0b by Sage Moore, 2024-08-01 11:11:24 -07:00
[CI/Build] Update PyTorch to 2.4.0 (#6951)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
| Name | Last commit | Date |
| --- | --- | --- |
| fused_moe | [Bugfix] Allow vllm to still work if triton is not installed. (#6786) | 2024-07-29 14:51:27 -07:00 |
| ops | [CI/Build] Update PyTorch to 2.4.0 (#6951) | 2024-08-01 11:11:24 -07:00 |
| quantization | Support W4A8 quantization for vllm (#5218) | 2024-07-31 07:55:21 -06:00 |
| __init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| activation.py | [Doc] Add Nemotron to supported model docs (#6843) | 2024-07-26 17:32:44 -04:00 |
| layernorm.py | [Model] Add Gemma 2 (#5908) | 2024-06-27 13:33:56 -07:00 |
| linear.py | Fix ReplicatedLinear weight loading (#6793) | 2024-07-25 19:24:58 -07:00 |
| logits_processor.py | [TPU] Support collective communications in XLA devices (#6813) | 2024-07-27 01:45:57 +00:00 |
| pooler.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| rejection_sampler.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| rotary_embedding.py | [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) | 2024-07-26 14:33:42 -04:00 |
| sampler.py | [Bugfix] Allow vllm to still work if triton is not installed. (#6786) | 2024-07-29 14:51:27 -07:00 |
| spec_decode_base_sampler.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| typical_acceptance_sampler.py | [Bugfix] Make spec. decode respect per-request seed. (#6034) | 2024-07-18 19:22:08 -07:00 |
| vocab_parallel_embedding.py | [ Misc ] fbgemm checkpoints (#6559) | 2024-07-20 09:36:57 -07:00 |