Commit Graph

17 Commits

Author SHA1 Message Date
Tri Dao
dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead 2023-09-18 15:29:06 -07:00
Tri Dao
ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 2023-09-16 01:20:16 -07:00
Tri Dao
a86442f0f3 [Gen] Use flash_attn_with_kvcache in generation 2023-09-07 08:24:43 -07:00
Tri Dao
fd20f16a4e Support cache_seqlens being integer 2023-09-05 11:27:48 -07:00
Tri Dao
913922cac5 [Gen] Refactor decoding function 2023-09-04 17:01:38 -07:00
dan_the_3rd
011ec323d6
Support MQA + MP for decoding (#490)
Co-authored-by: danthe3rd <danthe3rd>
2023-08-30 10:29:54 -07:00
Tri Dao
9f42cb6e7a [Gen] Clone logits before returning when cg=True 2023-08-27 23:19:58 -07:00
Tri Dao
f8aea6ead0 [GPT] Generalize last_token_only arg to num_last_tokens 2023-08-26 20:47:53 -07:00
Tri Dao
371e20658c [GPT] Test generation when passing in multiple tokens 2023-08-26 13:56:41 -07:00
Tri Dao
c000c3a2c0 [GPT] Move more tests to test_gpt.py 2023-08-26 13:00:40 -07:00
Tri Dao
9b713872ea [GPT] Move GPT and OPT generation tests to test_{gpt,opt}.py 2023-08-26 12:55:02 -07:00
Tri Dao
0e8c46ae08 Run isort and black on test files 2023-08-18 20:59:35 -07:00
Tri Dao
4d87e4d875 Implement GPT-J 2023-03-22 16:16:58 -07:00
Tri Dao
88173a1aaf [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP 2023-01-17 18:12:27 -08:00
Tri Dao
ff34123bd4 Reorder LN in Block, support OPT 2023-01-15 22:14:31 -08:00
Tri Dao
63670fd84a Implement generation for GPT 2022-12-27 21:01:50 -08:00
Tri Dao
9d797d8848 Support loading GPT2 weights from Huggingface 2022-12-27 11:22:48 -08:00