[Docs] Mention FasterTransformer integration

Tri Dao 2022-12-05 00:34:09 -08:00
parent 4a6eaa9f27
commit a84d07283c

@@ -50,6 +50,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
   uses FlashAttention as part of their approach to speed up Transformer
   inference (up to 5.3x on BERT).
+- Nvidia's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) is a
+  state-of-the-art Transformer inference library. As of version
+  [5.2](https://github.com/NVIDIA/FasterTransformer/commit/b672f49e256ba7a2d4fc9691d270b60b7fc1a2ff),
+  FlashAttention is used as a component of FasterTransformer to speed up GPT
+  inference.
 - [Kernl](https://github.com/ELS-RD/kernl) is a library for fast Transformer
   inference. They use FlashAttention as part of their
   [approach](https://twitter.com/pommedeterre33/status/1585284221014245377) to