* adding files for fp8 changes.
* removed contiguous check.
* enable all tests except odd-seq-lengths, which currently crash.
* undid clang formatting.
* change to correct tile size for headdim=128.
* fixed odd-seq-len-k.
* minor formatting.
* minor reformatting.
---------
Co-authored-by: Tri Dao <tridao@users.noreply.github.com>