From d1a3b52f17b914c93bf740654387b566a7330687 Mon Sep 17 00:00:00 2001
From: Tri Dao
Date: Mon, 17 Jul 2023 23:17:47 -0700
Subject: [PATCH] Add instruction about limiting number of ninja jobs

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index e4151a6..536dbe7 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,14 @@ Alternatively you can compile from source:
 python setup.py install
 ```
 
+If your machine has less than 96 GB of RAM and many CPU cores, `ninja` might
+run too many parallel compilation jobs and exhaust the available RAM. To
+limit the number of parallel compilation jobs, set the environment
+variable `MAX_JOBS`:
+```
+MAX_JOBS=4 pip install flash-attn --no-build-isolation
+```
+
 Interface: `src/flash_attention_interface.py`
 
 FlashAttention-2 currently supports:
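
The `MAX_JOBS=4` above is a fixed example value. A minimal sketch for picking a value sized to your machine instead, assuming (as a rule of thumb not stated in the patch) that each parallel compilation job can peak at roughly 10 GB of RAM:

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: size MAX_JOBS from total RAM,
# assuming each parallel compilation job may need about 10 GB at its peak.
ram_gb=96                  # replace with your machine's total RAM in GB
jobs=$(( ram_gb / 10 ))    # one job per ~10 GB of RAM
[ "$jobs" -lt 1 ] && jobs=1  # always allow at least one job
echo "MAX_JOBS=$jobs pip install flash-attn --no-build-isolation"
```

The 10 GB-per-job figure is an assumption; if compilation still runs out of memory, lower `MAX_JOBS` further.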