* ex42: Fused MHA imported from xFormers
* Remove std:: references
* Support K>128 in the example
* Support causal option
* Support a different head size for V and a different sequence length for KV
* Update FLOPS counter
* Remove bit_cast
* Fix build: replace M_LOG2E
* Add documentation
* Revert "Remove bit_cast"
This reverts commit 9662fa86bb7c57c1a015ac0bf52cb52940fbbf80.
* Add explicit casts to int32_t for the Windows build
Co-authored-by: danthe3rd <danthe3rd>