Tri Dao
7a3bd55f1a
[Gen] Fix decode function not using top_p during iterative decoding
2023-08-26 15:14:41 -07:00
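The top_p fix above concerns nucleus sampling during iterative decoding: keep the smallest set of tokens whose cumulative probability reaches p, then renormalize. A minimal sketch of that filtering step (illustrative only — the function name and plain-list probabilities are assumptions, not the repo's actual tensor-based implementation):

```python
def top_p_filter(probs, top_p):
    """Zero out tokens outside the nucleus and renormalize.

    probs: list of token probabilities summing to ~1.0.
    top_p: keep the smallest set of tokens whose cumulative
           probability (in descending order) reaches top_p.
    """
    if top_p <= 0.0 or top_p >= 1.0:
        return probs[:]  # filtering disabled
    # Visit tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]
```

The bug class the commit addresses is easy to hit: the filter is applied on the first step but skipped on subsequent steps of the decoding loop, so later tokens are sampled from the full distribution.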
Tri Dao
847abe653c
[Gen] Refactor decode function a bit
2023-08-26 14:47:25 -07:00
Tri Dao
371e20658c
[GPT] Test generation when passing in multiple tokens
2023-08-26 13:56:41 -07:00
Tri Dao
c000c3a2c0
[GPT] Move more tests to test_gpt.py
2023-08-26 13:00:40 -07:00
Tri Dao
a2974e850a
Change causal for CrossAttention in mha.py to align to bottom right
2023-08-26 12:57:33 -07:00
Tri Dao
9b713872ea
[GPT] Move GPT and OPT generation tests to test_{gpt,opt}.py
2023-08-26 12:55:02 -07:00
Tri Dao
73bd3f3bbb
Move pyproject.toml to flash-attn and tests dir to avoid PEP 517
2023-08-25 15:05:28 -07:00
Aman Gupta Karmani
b4b6e90334
add benchmark for xformers fa2 wrapper (#492)
2023-08-25 14:10:05 -07:00
Tri Dao
45ba93cd96
Add newlines to README
2023-08-24 23:54:13 -07:00
Tri Dao
9e5e8bc91e
Change causal mask to be aligned to bottom-right instead of top-left
2023-08-24 23:41:07 -07:00
BoxiangW
e07aa036db
Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
2023-08-24 16:42:34 -07:00
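Several commits above change the causal mask from top-left to bottom-right alignment, which matters when the key/value sequence is longer than the query sequence (e.g. decoding with a KV cache). Under bottom-right alignment, query i may attend key j iff j <= i + (seqlen_k - seqlen_q). A small sketch of that rule (illustrative boolean-list version, not the kernel's implementation):

```python
def bottom_right_causal_mask(seqlen_q, seqlen_k):
    """Return a seqlen_q x seqlen_k grid where True means "may attend".

    The diagonal is anchored at the bottom-right corner: query i sees
    key j iff j <= i + (seqlen_k - seqlen_q). For seqlen_q == seqlen_k
    this reduces to the usual lower-triangular causal mask.
    """
    offset = seqlen_k - seqlen_q
    return [[j <= i + offset for j in range(seqlen_k)] for i in range(seqlen_q)]
```

With top-left alignment, a single new query token against a long cache would see only the first key; bottom-right alignment lets it see the entire cache, which is the behavior decoding needs.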
Aman Gupta Karmani
e0b09891c6
add llama support to GPTPreTrainedModel.from_pretrained (#479)
2023-08-24 16:31:16 -07:00
Tri Dao
6711b3bc40
Bump version to 2.0.9
2023-08-22 00:21:14 -07:00
Tri Dao
ef6d8c75d9
[GPT] Fix loading weights from HF hub
2023-08-21 22:56:02 -07:00
GAOXinyu
a8c35b4f57
FEAT: add code supporting baichuan-inc/Baichuan-7B (#425)
2023-08-21 11:05:06 -07:00
Xuechen Li
25d6b1dbcb
handle uneven heads across ranks when combining state_dicts; resolves #467 (#468)
...
* q
* add comment.
2023-08-20 14:57:34 -07:00
Tri Dao
d431f16751
Import torch before flash_attn_2_cuda
2023-08-19 21:07:33 -07:00
Tri Dao
0e8c46ae08
Run isort and black on test files
2023-08-18 20:59:35 -07:00
Xuechen Li
7fcd3e6a04
map custom model state_dict back to huggingface format (#465)
...
* fix name.
* set inv function.
* add map back function.
* handle gqa.
* add type annotation to avoid confusion.
* fix docstr.
* test inverse remap logic.
2023-08-18 20:51:39 -07:00
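The remapping work in #465 converts between the repo's custom parameter names and HuggingFace's, in both directions ("add map back function", "set inv function", "test inverse remap logic"). A generic sketch of prefix-based key remapping with an inverse (the prefix names below are hypothetical examples, not the actual mapping used by the repo):

```python
def remap_state_dict(state_dict, key_map):
    """Rename state_dict keys by first matching prefix in key_map."""
    out = {}
    for name, tensor in state_dict.items():
        new_name = name
        for src, dst in key_map.items():
            if name.startswith(src):
                new_name = dst + name[len(src):]
                break
        out[new_name] = tensor
    return out


def inverse_key_map(key_map):
    """Invert the prefix map so the remap can be undone."""
    return {dst: src for src, dst in key_map.items()}
```

Round-tripping through `remap_state_dict` with the inverse map should recover the original keys, which is what a "test inverse remap logic" check verifies. (Weight-layout transforms such as GQA head permutations need an explicit inverse function as well; a pure key rename like this is not enough on its own.)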
Tri Dao
f1a73d0740
Run isort and black on python files
2023-08-18 14:22:11 -07:00
Tri Dao
cbb4cf5f46
Don't need to set TORCH_CUDA_ARCH_LIST in setup.py
2023-08-18 14:18:54 -07:00
Xuechen Li
bb4cded17b
support when num_heads is not divisible by world_size; resolves #459 (#461)
...
* unequal rank.
* trim.
* enable passing in number of heads for each rank.
* simplify.
* simplify.
* cleanup.
* fix col parallel.
* fix bug with row parallel.
* fit out proj.
* refac.
* fix sharding logic.
* refac sharding.
* refac.
* support multiple of.
* make fn reusable.
* fix bug in dimensions.
* scaffold.
* test uneven heads.
* fix test by adding barrier.
* refac.
* reuse code.
* clean up.
2023-08-18 14:10:35 -07:00
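#461 and the related FusedDense change allow tensor-parallel layers to split attention heads unevenly when num_heads is not divisible by world_size. The standard way to do this is to give the first `num_heads % world_size` ranks one extra head so rank counts differ by at most one; a sketch (illustrative, assumed scheme — the repo's actual sharding logic lives in its Row/ColumnParallelLinear code):

```python
def heads_per_rank(num_heads, world_size):
    """Distribute heads across ranks, differing by at most one.

    Earlier ranks receive the remainder heads, e.g. 33 heads over
    8 ranks -> [5, 4, 4, 4, 4, 4, 4, 4].
    """
    base, extra = divmod(num_heads, world_size)
    return [base + (1 if r < extra else 0) for r in range(world_size)]
```

Every downstream shape (the column-parallel QKV projection, the row-parallel output projection, and checkpoint sharding/combining) then has to consume these per-rank counts instead of assuming `num_heads // world_size`, which is why the PR touches column-parallel, row-parallel, and out-proj code.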
Tri Dao
ada4710d70
[ViT] Run black on vit.py
2023-08-17 17:45:09 -07:00
Tri Dao
a81900d4c1
[ViT] Minor fix so it runs
2023-08-17 17:25:34 -07:00
Tri Dao
4b661a569d
[GPT] Run black on gpt.py
2023-08-16 23:47:50 -07:00
Tri Dao
bec5b3d374
[MHA] Run black on mha.py
2023-08-16 23:47:13 -07:00
Tri Dao
cb0daccc41
[FusedDense] Allow Row/ColumnParallelLinear to have uneven split
2023-08-16 23:43:35 -07:00
Tri Dao
bcfa7c9751
[FusedDense] Run black on fused_dense.py
2023-08-16 23:41:36 -07:00
Tri Dao
2286d7cea7
Bump to v2.0.8
2023-08-16 15:13:12 -07:00
Tri Dao
c65b5106ac
Fix Bwd NaN for varlen when seqlen_q >> seqlen_k and causal
2023-08-16 15:12:36 -07:00
Xuechen Li
0f7853c6a1
enable loading hf llama checkpoints for training (#446)
...
* prelim.
* add hf conversion fn.
* mlp.
* change name.
* fix bug.
* inverse permute.
* change comment.
* revert style changes.
* fix.
* add doc.
* revert.
* enable load safe.
* fix safe load.
* fix import.
* fix typing-related lints.
* fix ckpt loading logic.
* make single gpu work.
* test with parallel.
* ckpt format.
* enable pretrained state dict.
* remove unused imports.
* remove unused.
* mark idea related.
2023-08-15 08:33:15 -07:00
Tri Dao
c60851a825
Bump to v2.0.7
2023-08-14 14:55:35 -07:00
Aman Gupta Karmani
aab603af4f
fix binary wheel installation when nvcc is not available (#448)
2023-08-14 14:54:26 -07:00
Tri Dao
f8dccfc90a
[CI] Fix MATRIX_CUDA_VERSION check
2023-08-14 10:27:26 -07:00
Tri Dao
9c531bdc0a
Use single thread compilation for cuda12.1, torch2.1 to avoid OOM CI
2023-08-14 10:03:31 -07:00
Tri Dao
67ae6fd74b
Bump to v2.0.6
2023-08-13 16:52:48 -07:00
Tri Dao
2ddeaa406c
Fix wheel building
2023-08-13 16:48:47 -07:00
Tri Dao
d8ec6a2f13
Merge branch 'piercefreeman-feature/demo-wheels'
...
* piercefreeman-feature/demo-wheels: (25 commits)
Install standard non-wheel package
Remove release creation
Build wheel on each push
Isolate 2.0.0 & cuda12
Clean setup.py imports
Remove builder project
Bump version
Add notes to github action workflow
Add torch dependency to final build
Exclude cuda erroring builds
Exclude additional disallowed matrix params
Full version matrix
Add CUDA 11.7
Release is actually unsupported
echo OS version
Temp disable deploy
OS version build numbers
Restore full build matrix
Refactor and clean of setup.py
Strip cuda name from torch version
...
2023-08-13 16:09:38 -07:00
Tri Dao
3c458cff77
Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels
...
* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention : (25 commits)
Install standard non-wheel package
Remove release creation
Build wheel on each push
Isolate 2.0.0 & cuda12
Clean setup.py imports
Remove builder project
Bump version
Add notes to github action workflow
Add torch dependency to final build
Exclude cuda erroring builds
Exclude additional disallowed matrix params
Full version matrix
Add CUDA 11.7
Release is actually unsupported
echo OS version
Temp disable deploy
OS version build numbers
Restore full build matrix
Refactor and clean of setup.py
Strip cuda name from torch version
...
2023-08-13 16:03:51 -07:00
Tri Dao
dbd7923782
Prepare for Cutlass 3.2
2023-08-13 15:24:32 -07:00
Tri Dao
c5e87b11e9
Bump to v2.0.5
2023-08-13 13:55:04 -07:00
Tri Dao
3524e13c11
Update to Cutlass 3.1
2023-08-13 13:53:17 -07:00
Pierce Freeman
6ef3bd800e
Install standard non-wheel package
2023-08-10 20:12:20 -07:00
Pierce Freeman
ecc6535443
Remove release creation
2023-08-10 19:56:24 -07:00
Pierce Freeman
bc6d4992f2
Build wheel on each push
2023-08-10 19:55:52 -07:00
Pierce Freeman
565615c603
Isolate 2.0.0 & cuda12
2023-08-10 19:54:29 -07:00
Tri Dao
364a5b4a71
[MLP] Change the check for out_features being None
2023-08-10 00:04:38 -07:00
Tri Dao
d30f2e1cd5
Bump to v2.0.4
2023-08-01 09:01:07 -07:00
Tri Dao
1c41d2b0e5
Fix race condition in bwd (overwriting sK)
2023-08-01 09:00:10 -07:00
Tri Dao
a4e5d1eddd
Bump to v2.0.3
2023-07-31 17:49:23 -07:00