4 Commits

Author SHA1 Message Date
dan_the_3rd
c3f2a632aa [ft_attention] Fix for seqlen=8136 (#488) 2023-08-28 10:00:22 -07:00
When seqlen=8136, `smem_sz = 48840`, and apparently starting the kernel returns an `invalid argument` CUDA error.

`48840 < 48 * 1024` but apparently it's still above the limit somehow..?
Tested on A100
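One possible explanation, not confirmed by the commit itself: on CUDA devices, shared memory that a kernel declares statically counts toward the same per-block budget as the dynamic allocation, so the default 48 KiB limit can be exceeded even when `smem_sz` alone is below it. A minimal Python sketch of that arithmetic follows. The helper `fits_default_limit` and the 512-byte static figure are hypothetical; the kernel's actual static usage is not stated here.

```python
# Default per-block shared-memory budget without an explicit opt-in (48 KiB).
DEFAULT_SMEM_LIMIT = 48 * 1024  # 49152 bytes


def fits_default_limit(dynamic_bytes: int, static_bytes: int = 0) -> bool:
    """Return True if static + dynamic shared memory fits within 48 KiB.

    Assumption: static and dynamic shared memory share one budget.
    """
    return dynamic_bytes + static_bytes <= DEFAULT_SMEM_LIMIT


smem_sz = 48840  # dynamic size quoted in the commit message
headroom = DEFAULT_SMEM_LIMIT - smem_sz  # bytes left over for static usage

print(headroom)                                       # 312
print(fits_default_limit(smem_sz))                    # True
print(fits_default_limit(smem_sz, static_bytes=512))  # False (512 is hypothetical)
```

Under this assumption only 312 bytes of headroom remain at seqlen=8136, so any static shared memory beyond that would push the launch over the default limit. If that is the cause, a larger allocation would need an opt-in through `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`.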
Tri Dao
62e9814466 [Rotary] Make sure frequency calculation is in fp32 2023-07-02 16:39:39 -07:00
Tri Dao
311d6606bf [Gen] Fix FT kernel smem size, CG when batch size changed 2023-04-20 17:03:13 -07:00
Tri Dao
a01d1213d7 [Gen] Add kernel from FasterTransformer for benchmarking 2023-01-03 17:37:43 -08:00