vllm/vllm/model_executor/layers
Latest commit: 2024-11-01 12:15:05 -07:00
Name | Last commit message | Last commit date
fused_moe/ | [torch.compile] directly register custom op (#9896) | 2024-10-31 21:56:09 -07:00
mamba/ | [Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189) | 2024-10-16 12:12:43 -04:00
quantization/ | [Bugfix/Core] Flashinfer k_scale and v_scale (#9861) | 2024-11-01 12:15:05 -07:00
__init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
activation.py | [Kernel] add kernel for FATReLU (#9610) | 2024-10-24 16:18:27 +08:00
layernorm.py | [Model] FalconMamba Support (#9325) | 2024-10-21 12:50:16 -04:00
linear.py | [Hardware][CPU] Support AWQ for CPU backend (#7515) | 2024-10-09 10:28:08 -06:00
logits_processor.py | [V1] Implement vLLM V1 [1/N] (#9289) | 2024-10-22 01:24:07 -07:00
pooler.py | [Model] Support math-shepherd-mistral-7b-prm model (#9697) | 2024-10-30 09:33:42 -07:00
rejection_sampler.py | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00
resampler.py | [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029) | 2024-09-05 12:48:10 +00:00
rotary_embedding.py | [torch.compile] Fine-grained CustomOp enabling mechanism (#9300) | 2024-10-17 18:36:37 +00:00
sampler.py | [CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) | 2024-10-16 22:55:59 +00:00
spec_decode_base_sampler.py | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00
typical_acceptance_sampler.py | Fix typical acceptance sampler with correct recovered token ids (#8562) | 2024-09-23 12:32:27 -07:00
vocab_parallel_embedding.py | [Bugfix] Fix lm_head weights tying with lora for llama (#9227) | 2024-10-10 21:11:56 +08:00