| Name | Last commit message | Last commit date |
| --- | --- | --- |
| `fused_moe` | [torch.compile] directly register custom op (#9896) | 2024-10-31 21:56:09 -07:00 |
| `mamba` | [Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189) | 2024-10-16 12:12:43 -04:00 |
| `quantization` | [Bugfix/Core] Flashinfer k_scale and v_scale (#9861) | 2024-11-01 12:15:05 -07:00 |
| `__init__.py` | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| `activation.py` | [Kernel] add kernel for FATReLU (#9610) | 2024-10-24 16:18:27 +08:00 |
| `layernorm.py` | [Model] FalconMamba Support (#9325) | 2024-10-21 12:50:16 -04:00 |
| `linear.py` | [Hardware][CPU] Support AWQ for CPU backend (#7515) | 2024-10-09 10:28:08 -06:00 |
| `logits_processor.py` | [V1] Implement vLLM V1 [1/N] (#9289) | 2024-10-22 01:24:07 -07:00 |
| `pooler.py` | [Model] Support math-shepherd-mistral-7b-prm model (#9697) | 2024-10-30 09:33:42 -07:00 |
| `rejection_sampler.py` | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00 |
| `resampler.py` | [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029) | 2024-09-05 12:48:10 +00:00 |
| `rotary_embedding.py` | [torch.compile] Fine-grained CustomOp enabling mechanism (#9300) | 2024-10-17 18:36:37 +00:00 |
| `sampler.py` | [CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) | 2024-10-16 22:55:59 +00:00 |
| `spec_decode_base_sampler.py` | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00 |
| `typical_acceptance_sampler.py` | Fix typical acceptance sampler with correct recovered token ids (#8562) | 2024-09-23 12:32:27 -07:00 |
| `vocab_parallel_embedding.py` | [Bugfix] Fix lm_head weights tying with lora for llama (#9227) | 2024-10-10 21:11:56 +08:00 |
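
For orientation, `layernorm.py` in this directory holds vLLM's RMSNorm layer, the normalization used by the Llama-family and Mamba-style models referenced in the commits above. Below is a minimal, unfused PyTorch sketch of plain RMSNorm for reference only; it is not vLLM's actual class or fused CUDA kernel, and the name `RMSNormSketch` is made up here.

```python
import torch
from torch import nn


class RMSNormSketch(nn.Module):
    """Reference RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight.

    Illustrative only -- vLLM's layernorm.py dispatches to a fused CUDA
    kernel with a different API; this sketch just shows the math.
    """

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the hidden dimension,
        # then apply the learned per-channel scale.
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        x = x * torch.rsqrt(variance + self.eps)
        return x * self.weight


if __name__ == "__main__":
    layer = RMSNormSketch(hidden_size=8)
    out = layer(torch.randn(2, 8))
    print(out.shape)  # torch.Size([2, 8])
```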