Latest commit:

* Add custom ops for compatibility with PT Compile (see the sketch after this list)
* Add support for varlen functions too
* Add version checks for pytorch API
* Fix PT compile interfaces so it works e2e
* Make sure PT < 2.4 runs fine
* Fix python mistake
* Fix all the autograd magic issues
* Typo on head_dim
* Fix deterministic test failures, remove unneeded detaches()
* Remove test requires_grad
* Resolve all the pytorch versioning issues
* C++ and python refactor to improve padding management for torch.compile()
* Add improvements suggested by @anijain2305
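Several of the bullets above concern registering the kernels as PyTorch custom ops so torch.compile treats them as opaque calls, with version gating so PyTorch < 2.4 still works. The snippet below is a minimal, hypothetical sketch of that general pattern, not the flash-attn code: the op name `mylib::attention_fwd` and the `_attention_reference` helper are made up for illustration, and `scaled_dot_product_attention` stands in for the real fused kernel.

```python
# Sketch only: shows the custom-op + version-check pattern, assuming hypothetical names.
import torch

_torch_version = tuple(int(x) for x in torch.__version__.split(".")[:2])


def _attention_reference(q, k, v):
    # Stand-in for the real fused CUDA kernel; SDPA is used purely for illustration.
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)


if _torch_version >= (2, 4):
    # torch.library.custom_op (new in PyTorch 2.4) registers the function as an
    # opaque op, so torch.compile records a single node instead of tracing into it.
    @torch.library.custom_op("mylib::attention_fwd", mutates_args=())
    def attention_fwd(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return _attention_reference(q, k, v)

    # A fake (meta) kernel tells the compiler the output shape/dtype without
    # running the real kernel.
    @attention_fwd.register_fake
    def _(q, k, v):
        return torch.empty_like(q)
else:
    # On PyTorch < 2.4 the custom-op API is unavailable, so fall back to the
    # plain eager function.
    attention_fwd = _attention_reference

# Usage is identical in both branches: out = attention_fwd(q, k, v)
```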
| Name |
|---|
| composable_kernel@a9b170b541 |
| cutlass@756c351b49 |
| flash_attn |
| flash_attn_ck |
| ft_attention |
| fused_dense_lib |
| fused_softmax |
| layer_norm |
| rotary |
| xentropy |