squall/vllm
vllm/vllm/attention @ aa48e502fb
Michael Goin | d59eb98489 | [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343) | 2024-07-12 10:47:17 +08:00
Name | Last commit | Date
backends | [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) | 2024-07-08 17:12:15 +00:00
ops | [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343) | 2024-07-12 10:47:17 +08:00
__init__.py | [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) | 2024-05-15 14:00:10 +09:00
layer.py | [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) | 2024-07-08 17:12:15 +00:00
selector.py | [Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351) | 2024-07-12 01:32:06 +00:00