This CUDA extension implements optimized cross-entropy loss, adapted from Apex's
[Xentropy](https://github.com/NVIDIA/apex/tree/master/apex/contrib/xentropy).
We make it work for bfloat16 and support in-place backward to save memory.
It has only been tested on A100s.
To install:
```sh
cd csrc/xentropy && pip install .
```
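
As a rough usage sketch, the extension is most easily reached through the `CrossEntropyLoss` wrapper in `flash_attn/losses/cross_entropy.py` (linked below) rather than the raw CUDA bindings. The `inplace_backward` argument name below is an assumption based on the in-place backward feature described above; check the module for the actual interface.

```python
# Minimal usage sketch, not the definitive interface: assumes the
# CrossEntropyLoss wrapper in flash_attn.losses.cross_entropy exposes an
# `inplace_backward` flag matching the in-place backward described above.
import torch
from flash_attn.losses.cross_entropy import CrossEntropyLoss

loss_fn = CrossEntropyLoss(inplace_backward=True)

# bfloat16 logits of shape (batch * seqlen, vocab_size)
logits = torch.randn(8, 32000, device="cuda", dtype=torch.bfloat16, requires_grad=True)
labels = torch.randint(0, 32000, (8,), device="cuda")

loss = loss_fn(logits, labels)
loss.backward()  # backward reuses the logits storage for the gradient, saving memory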
As of 2023-09-15, this extension is no longer used in the FlashAttention repo.
We've instead switched to a Triton-based
[implementation](https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/ops/triton/cross_entropy.py).
See the CrossEntropyLoss [module](https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/losses/cross_entropy.py) for more details.