# vllm/requirements.txt

ninja  # For faster builds.
psutil
ray >= 2.9
sentencepiece  # Required for LLaMA tokenizer.
numpy
torch == 2.1.2
transformers >= 4.38.0  # Required for Gemma.
xformers == 0.0.23.post1  # Required for CUDA 12.1.
fastapi
uvicorn[standard]
pydantic >= 2.0  # Required for OpenAI server.
prometheus_client >= 0.18.0
pynvml == 11.5.0
triton >= 2.1.0
cupy-cuda12x == 12.1.0  # Required for CUDA graphs. CUDA 11.8 users should install cupy-cuda11x instead.
# Change history (from git blame):
# ninja, psutil: Specify python package dependencies in requirements.txt (#78), 2023-05-08 07:30:43 +08:00
# ray: Update Ray version requirements (#2636), 2024-01-29 06:27:22 +08:00
# sentencepiece, numpy: Specify python package dependencies in requirements.txt (#78), 2023-05-08 07:30:43 +08:00
# torch: Pin PyTorch & xformers versions (#2155), 2023-12-17 17:46:54 +08:00
# transformers: Upgrade transformers to v4.38.0 (#2965), 2024-02-22 01:38:03 +08:00
# xformers: [Minor] Fix xformers version (#2158), 2023-12-17 18:28:02 +08:00
# fastapi: Specify python package dependencies in requirements.txt (#78), 2023-05-08 07:30:43 +08:00
# uvicorn[standard]: Use standard extras for uvicorn (#1166), 2023-09-28 08:41:36 +08:00
# pydantic: migrate pydantic from v1 to v2 (#2531), 2024-01-22 08:05:56 +08:00
# prometheus_client: Restrict prometheus_client >= 0.18.0 to prevent errors when importing pkgs (#3070), 2024-02-28 13:38:26 +08:00
# pynvml: Implement custom all reduce kernels (#2192), 2024-01-28 04:46:35 +08:00
# triton: Require triton >= 2.1.0 (#2746), Co-authored-by: yangrui1 <yangrui@lanjingren.com>, 2024-02-05 15:07:36 +08:00
# cupy-cuda12x: Fix docker python version (#2845), 2024-02-15 02:17:57 +08:00