# CacheFlow

## Installation

```bash
pip install psutil numpy torch transformers
pip install flash-attn  # This may take up to 10 minutes.
pip install -e .
```
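
Both flash-attn and the custom kernels under `csrc` are compiled against CUDA, so installation assumes a machine with an NVIDIA GPU and a working CUDA toolkit. As a quick sanity check (an addition, not part of the original README), you can confirm PyTorch sees the GPU before starting the server:

```python
# Optional sanity check (not from the original README): flash-attn and
# CacheFlow's kernels in csrc/ are CUDA code, so a CUDA-capable GPU is
# assumed to be required here.
import torch

assert torch.cuda.is_available(), "A CUDA-capable GPU is required."
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A100-SXM4-40GB"
```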

## Run

```bash
python server.py
```