Tri Dao | 8c20cfef49 | [Rotary] Support qkv block layout from GQA | 2024-09-11 10:39:58 -07:00
Tri Dao | c7f32a8409 | [CrossEntropy] Support precomputed LSE | 2024-09-08 09:24:43 -07:00
Tri Dao | d79f9b41a8 | [CrossEntropy] Use online softmax to simplify implementation | 2024-08-24 17:40:39 -07:00
lancerts | 22339db185 | remove an unused import (#960) | 2024-05-23 11:12:31 -07:00
Tri Dao | ec6d22143b | [CrossEntropy] Change ignored_index -> ignore_index | 2024-04-26 10:50:41 -07:00
Curtis "Fjord" Hawthorne | d8aacc510c | return z_loss (#768) | 2024-01-21 15:23:41 -08:00
Tri Dao | 08124c8f9c | [CrossEntropy] Implement logit_scale option | 2023-12-16 18:39:37 -08:00
Tri Dao | aaa1474129 | [CrossEntropy] Simplify the case of large vocab with Tensor Parallel | 2023-11-19 23:19:36 -08:00
Shijie | abf04a56e1 | fix flash ce mp large vocab (#673) | 2023-11-19 23:01:07 -08:00
Tri Dao | c79de85ffa | [CrossEntropy] Fix triton cross_entropy_loss IMA for >=2B elements | 2023-10-24 00:17:34 -07:00
Tri Dao | 5400fdc4ac | [CE] Implement CrossEntropyLoss in Triton | 2023-09-15 20:05:28 -07:00