vllm/vllm/attention (last commit: 2024-07-05 10:41:01 -07:00)
Name          Last commit                                                                                              Last commit date
backends/     [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)                                       2024-07-04 16:35:51 -07:00
ops/          [Bugfix] Add verbose error if scipy is missing for blocksparse attention (#5695)                         2024-07-05 10:41:01 -07:00
__init__.py   [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)    2024-05-15 14:00:10 +09:00
layer.py      [Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (#5936)                          2024-06-28 21:12:40 +00:00
selector.py   [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)                                        2024-07-04 16:35:51 -07:00
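
For orientation, the sketch below shows how these pieces are typically imported; it is an assumption about the public names around this commit (Attention in layer.py, the abstract interfaces under backends/, get_attn_backend in selector.py), and exact signatures vary between vLLM releases, so treat it as an illustration rather than a reference.

```python
# Illustrative sketch only: import paths believed correct for vLLM around
# mid-2024; exports and signatures change between releases.
from vllm.attention import Attention                    # layer.py: attention nn.Module used by model code
from vllm.attention.backends.abstract import (           # backends/: abstract interfaces each backend implements
    AttentionBackend,
    AttentionMetadata,
)
from vllm.attention.selector import get_attn_backend     # selector.py: picks a concrete backend at runtime

# ops/ holds the custom attention kernels (e.g. paged and blocksparse
# attention helpers) that the concrete backends call into.
```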