| fused_moe | [Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741) | 2024-09-28 18:19:40 -07:00 |
| mamba | [Kernel] Fullgraph and opcheck tests (#8479) | 2024-09-25 08:35:52 -06:00 |
| quantization | [CI/Build] Update models tests & examples (#8874) | 2024-09-28 09:54:35 -07:00 |
| __init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| activation.py | [Hardware][intel GPU] bump up ipex version to 2.3 (#8365) | 2024-09-13 16:54:34 -07:00 |
| layernorm.py | [Hardware][intel GPU] bump up ipex version to 2.3 (#8365) | 2024-09-13 16:54:34 -07:00 |
| linear.py | [Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434) | 2024-09-17 08:09:12 -07:00 |
| logits_processor.py | [Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) | 2024-08-13 05:33:41 +00:00 |
| pooler.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| rejection_sampler.py | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00 |
| resampler.py | [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029) | 2024-09-05 12:48:10 +00:00 |
| rotary_embedding.py | [Misc] Use RoPE cache for MRoPE (#8396) | 2024-09-11 23:13:14 -07:00 |
| sampler.py | [Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047) | 2024-09-24 17:29:56 -07:00 |
| spec_decode_base_sampler.py | [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) | 2024-09-22 12:34:14 -07:00 |
| typical_acceptance_sampler.py | Fix typical acceptance sampler with correct recovered token ids (#8562) | 2024-09-23 12:32:27 -07:00 |
| vocab_parallel_embedding.py | [Misc] Update GPTQ to use vLLMParameters (#7976) | 2024-09-03 17:21:44 -04:00 |