Woosuk Kwon 64e0e38314 Add cache watermark to avoid frequent cache eviction (#11)	2023-03-29 16:38:48 -07:00
cacheflow	Add cache watermark to avoid frequent cache eviction (#11)	2023-03-29 16:38:48 -07:00
csrc	Add miscellaneous updates (#8)	2023-03-13 13:48:38 -07:00
playground	FastAPI-based working frontend (#10)	2023-03-29 14:48:56 +08:00
tests/kernels	Use FlashAttention for `multi_query_kv_attention` (#4)	2023-03-01 21:13:08 -08:00
.gitignore	Add gitignore	2023-02-16 07:47:21 +00:00
README.md	FastAPI-based working frontend (#10)	2023-03-29 14:48:56 +08:00
setup.py	Implement `single_query_cached_kv_attention` kernel (#3)	2023-03-01 15:02:19 -08:00
simple_server.py	FastAPI-based working frontend (#10)	2023-03-29 14:48:56 +08:00

CacheFlow

Installation

pip install psutil numpy torch transformers
pip install flash-attn # This may take up to 10 minutes.
pip install -e .
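
After the install commands finish, it can be useful to confirm that the packages are visible to the current Python interpreter. The following is a minimal sketch, not part of CacheFlow: the helper name `find_missing` is hypothetical, and the module list assumes the usual import names for the packages above (`flash-attn` is imported as `flash_attn`, and the editable install is assumed to expose the `cacheflow` package).

```python
import importlib.util

# Import names for the packages installed above (an assumption for this sketch).
# The pip package "flash-attn" is imported as "flash_attn".
REQUIRED_MODULES = [
    "psutil",
    "numpy",
    "torch",
    "transformers",
    "flash_attn",
    "cacheflow",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located by the import system."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This check only verifies that each module can be found. It does not verify that the compiled kernels load or that a GPU is available.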

To run the simple server, start a Ray head node and then launch the script:

ray start --head
python simple_server.py

To list the available arguments for simple_server.py, run:

python simple_server.py --help

For the FastAPI frontend, install the following additional dependencies:

pip install fastapi uvicorn

To start the server:

ray start --head
python -m cacheflow.http_frontend.fastapi_frontend

To test the server:

python -m cacheflow.http_frontend.test_cli_client

For the Gradio web server, install the following additional dependency:

pip install gradio

Start the FastAPI frontend, then launch the Gradio web server from a second terminal:

python -m cacheflow.http_frontend.fastapi_frontend
# In another terminal
python -m cacheflow.http_frontend.gradio_webserver
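
If opening a second terminal is inconvenient, the two commands above could be driven from a single script. The following is a rough sketch under stated assumptions, not something shipped with CacheFlow: the helpers `build_command` and `launch_servers` are hypothetical, and the fixed `startup_delay` is a guess at how long the frontend needs before the web server can connect to it.

```python
import subprocess
import sys
import time

# Module paths taken from the commands above.
FRONTEND_MODULE = "cacheflow.http_frontend.fastapi_frontend"
WEBSERVER_MODULE = "cacheflow.http_frontend.gradio_webserver"


def build_command(module):
    """Build the `python -m <module>` command using the current interpreter."""
    return [sys.executable, "-m", module]


def launch_servers(startup_delay=5.0):
    """Start the frontend, wait, start the web server, and block until it exits.

    `startup_delay` (seconds) is an assumed value, not a documented requirement.
    """
    frontend = subprocess.Popen(build_command(FRONTEND_MODULE))
    try:
        time.sleep(startup_delay)
        webserver = subprocess.Popen(build_command(WEBSERVER_MODULE))
        try:
            webserver.wait()
        finally:
            # Make sure the web server is stopped, e.g. after Ctrl+C.
            webserver.terminate()
    finally:
        # Always stop the frontend when the web server exits.
        frontend.terminate()
```

Calling `launch_servers()` runs both processes and stops the frontend once the web server exits. The block only defines the helpers, so nothing is started until that call is made.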