CacheFlow
Installation
pip install psutil numpy torch transformers
pip install flash-attn  # This may take up to 10 minutes.
pip install -e .
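After installation, a quick check can confirm that the packages above are importable. The sketch below is not part of CacheFlow: `find_missing` is a hypothetical helper, and it assumes that flash-attn is importable as `flash_attn` and that the editable install exposes a `cacheflow` package.

```python
import importlib.util

# Import names for the packages installed above. flash-attn is assumed to be
# importable as `flash_attn`; `cacheflow` is assumed to come from `pip install -e .`.
REQUIRED = ["psutil", "numpy", "torch", "transformers", "flash_attn", "cacheflow"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken parents or modules without a spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as torch.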
Test simple server
ray start --head
python simple_server.py
To list the arguments that simple_server.py accepts, run:
python simple_server.py --help
FastAPI server
Install the following additional dependencies:
pip install fastapi uvicorn
To start the server:
ray start --head
python -m cacheflow.http_frontend.fastapi_frontend
To test the server:
python -m cacheflow.http_frontend.test_cli_client
Gradio web server
Install the following additional dependencies:
pip install gradio
Start the FastAPI server first, then the Gradio web server:
python -m cacheflow.http_frontend.fastapi_frontend
# In another terminal
python -m cacheflow.http_frontend.gradio_webserver
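The Gradio web server is launched from a second terminal once the FastAPI frontend is running. One way to avoid starting it too early is to poll the frontend's TCP port first. This is a minimal sketch, not part of CacheFlow: `wait_for_port` is a hypothetical helper, and the host and port in the usage comment are placeholders, since this README does not state which address the frontend listens on.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example with a placeholder port (replace with the frontend's actual port):
#   wait_for_port("localhost", 8000)
```

A successful TCP connection only shows that a process is listening; it does not confirm that the model has finished loading.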