* Add custom ops for compatibility with PT Compile
* Add support for varlen functions too
* Add version checks for pytorch API
* Fix PT compile interfaces so it works e2e
* Make sure PT < 2.4 runs fine
* Fix python mistake
* Fix all the autograd magic issues
* typo on head_dim
* Fix deterministic test failures, remove unneeded detaches()
* remove test requires_grad
* Resolve all the pytorch versioning issues
* C++ and python refactor to improve padding management for torch.compile()
* Add improvements suggested by @anijain2305