| Author | Commit | Message | Date |
|---|---|---|---|
| Tri Dao | 85b51d61ee | Bump version to 1.0.7 | 2023-05-30 14:18:44 -07:00 |
| Tri Dao | 48bc6eacd6 | [Gen] Add rotary base as an argument to FT attention kernel | 2023-05-30 13:38:34 -07:00 |
| Kirthi Shankar Sivamani | dd9c3a1fc2 | bump to v1.0.6; Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> | 2023-05-26 17:44:10 -07:00 |
| Max H. Gerlach | 31f78a9814 | Allow adding an optional local version to the package version | 2023-05-19 17:27:41 +02:00 |
| Federico Berto | 69f5f7d0a2 | [BugFix] cannot unpack non-iterable NoneType object | 2023-05-07 03:07:44 +09:00 |
| Federico Berto | 3889ba168b | [BugFix] cannot unpack non-iterable NoneType object | 2023-05-07 03:07:30 +09:00 |
| Tri Dao | a9a4b4e4f2 | [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm | 2023-05-04 23:39:43 -07:00 |
| Tri Dao | fcab93b43a | [Gen] Minor tweak to allocate_inference_cache | 2023-04-21 11:56:47 -07:00 |
| Tri Dao | ba2fe7f378 | [Gen] Move allocate_inference_cache to within the model | 2023-04-20 18:15:12 -07:00 |
| Tri Dao | 3da42d24b1 | [GPT] Add option to only return the logit for the last token | 2023-04-20 17:21:08 -07:00 |
| Tri Dao | 311d6606bf | [Gen] Fix FT kernel smem size, CG when batch size changed | 2023-04-20 17:03:13 -07:00 |
| Tri Dao | 96d10f6545 | Implement LLaMa | 2023-04-18 21:51:35 -07:00 |
| Tri Dao | b630aef53f | Implement GatedMlp | 2023-04-18 03:37:14 -07:00 |
| Tri Dao | ac3b684cdb | Have a separate nn.Dropout module in SelfAttention module | 2023-04-17 22:34:05 -07:00 |
| Kirthi Shankar Sivamani | a0997bc77c | Merge branch 'HazyResearch:main' into enable_cuda_graph_capture | 2023-04-14 21:45:37 -07:00 |
| Tri Dao | 605655bc66 | [Gen] Fix FT kernel when using CG | 2023-04-14 16:50:01 -07:00 |
| Kirthi Shankar Sivamani | 081c2b012a | Merge branch 'HazyResearch:main' into enable_cuda_graph_capture | 2023-04-13 19:36:45 -07:00 |
| Tri Dao | 1c9ef9b399 | [Gen] Measure prompt processing + decoding time, not just decoding | 2023-04-13 15:39:56 -07:00 |
| Tri Dao | 6f6e9a9aaf | [FusedDense] Enable sqrelu activation in FusedMLP | 2023-04-13 15:29:32 -07:00 |
| Kirthi Shankar Sivamani | 7d25a4ec4f | Handle FlashAttnQKVPackedSplitFunc by making rng_state optional in backward; Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> | 2023-04-13 06:25:52 +00:00 |
| Kirthi Shankar Sivamani | 315fd31f0c | Merge branch 'HazyResearch:main' into enable_cuda_graph_capture | 2023-04-12 22:42:24 -07:00 |
| Zhiyuan Chen | 8c42415664 | make mlp hidden_features defaults to 4*in_features | 2023-04-13 11:08:21 +08:00 |
| Kirthi Shankar Sivamani | 31018c5fa0 | Support CUDA graph capture; Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> | 2023-04-12 16:53:22 -07:00 |
| Tri Dao | d6fc860573 | Merge pull request #147 from ksivaman/add_deterministic_execution_option; Add option for deterministic execution | 2023-03-31 17:32:50 -04:00 |
| Tri Dao | 393882bc08 | [LayerNorm] Implement LN with parallel residual, support dim 8k | 2023-03-31 14:23:45 -07:00 |
| Kirthi Shankar Sivamani | b6aa059bbf | Add option for deterministic execution | 2023-03-30 18:23:35 -07:00 |
| Tri Dao | 993d12448e | Implement GPT-NeoX | 2023-03-29 01:21:25 -07:00 |
| Tri Dao | f5d0fbd468 | [FT] Fix FT's single query attention for bf16 hdim128 rotary | 2023-03-28 21:27:00 -07:00 |
| Tri Dao | 4d87e4d875 | Implement GPT-J | 2023-03-22 16:16:58 -07:00 |
| Tri Dao | 5d079fdd7a | [Triton] Fix benchmark_causal, mention Triton version | 2023-03-22 00:51:16 -07:00 |
| Tri Dao | dc08ea1c33 | Support H100 for other CUDA extensions | 2023-03-15 16:59:27 -07:00 |
| Vik Paruchuri | 3165398074 | Remove unused kwargs in flashattention | 2023-03-15 10:36:19 -07:00 |
| Tri Dao | e45a46a5b7 | [Rotary] Implement GPT-J style (interleaved) rotary | 2023-03-14 14:35:53 -07:00 |
| Tri Dao | 78b7a1dc18 | [OPT] Load fp16 weights on CPU before moving to GPU | 2023-01-22 17:01:32 -08:00 |
| Tri Dao | eb33e587e9 | [LayerNorm] Rename x1 -> residual | 2023-01-19 13:07:27 -08:00 |
| Tri Dao | f68d41ec77 | [Gen] Add OPT to generation test | 2023-01-17 19:59:06 -08:00 |
| Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 2023-01-17 18:12:27 -08:00 |
| Tri Dao | 780e8eeabb | [ViT] Support timm checkpoint, add tests | 2023-01-16 01:20:34 -08:00 |
| Tri Dao | 2ec7d3f72c | Merge pull request #105 from jamaliki/patch-1; Change default dropout value in documentation | 2023-01-15 23:01:20 -08:00 |
| Tri Dao | ef085cfcda | [ViT] Fix extra norm_0, use new LN order in Block | 2023-01-15 22:58:56 -08:00 |
| Tri Dao | ff34123bd4 | Reorder LN in Block, support OPT | 2023-01-15 22:14:31 -08:00 |
| Tri Dao | 7c2191542a | [Gen] Make generation work with Tensor Parallel | 2023-01-15 11:34:27 -08:00 |
| Kiarash Jamali | 41cb909741 | Change default dropout value in documentation; Documentation says default is 0.1, but the code has attention_dropout default at 0.0 | 2023-01-13 10:50:07 +00:00 |
| Tri Dao | f95c2fc108 | [Gen] Remove commented code | 2023-01-07 19:06:39 -08:00 |
| Tri Dao | b48599002a | [Gen] Add timing option | 2023-01-07 19:05:09 -08:00 |
| Tri Dao | 0938298e4c | [Gen] Adjust shape of kv_cache when using FT | 2023-01-07 17:27:54 -08:00 |
| Tri Dao | e02fd588aa | [Gen] Implement top-k and top-p sampling | 2023-01-07 17:00:02 -08:00 |
| Tri Dao | 11be742aa3 | [Gen] Test generation with rotary embedding | 2023-01-07 14:37:54 -08:00 |
| Tri Dao | 8d9674ed08 | Merge pull request #102 from Lamikins/main; fixed cross attention typeerror | 2023-01-07 13:56:20 -08:00 |
| Tri Dao | 93383bd55b | [TP] Implement TensorParallel without sequence parallel | 2023-01-07 13:45:22 -08:00 |