vllm/attention at aa48e502fba074a3c3afeeba0267d0f9e9f205db - vllm


Michael Goin d59eb98489 [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)	2024-07-12 10:47:17 +08:00
backends	[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888)	2024-07-08 17:12:15 +00:00
ops	[Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)	2024-07-12 10:47:17 +08:00
__init__.py	[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)	2024-05-15 14:00:10 +09:00
layer.py	[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888)	2024-07-08 17:12:15 +00:00
selector.py	[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)	2024-07-12 01:32:06 +00:00