From ece8f05d09f539e1412dec17905e60f062126aef Mon Sep 17 00:00:00 2001
From: Tri Dao
Date: Thu, 15 Dec 2022 19:44:59 -0800
Subject: [PATCH] [Docs] Mention PubMedGPT

---
 usage.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/usage.md b/usage.md
index fd26668..5b6f24c 100644
--- a/usage.md
+++ b/usage.md
@@ -45,6 +45,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
 
 ## Language model training & inference
 
+- [PubMedGPT 2.7B](https://crfm.stanford.edu/2022/12/15/pubmedgpt.html), a
+  domain-specific LLM for biomedicine, by Stanford CRFM, trained on
+  [MosaicML](https://www.mosaicml.com/blog/introducing-pubmed-gpt) Cloud. Just
+  using FlashAttention nearly halves the total training time.
+
 - Meta's [AITemplate](https://ai.facebook.com/blog/gpu-inference-engine-nvidia-amd-open-source/)
   uses FlashAttention as part of their approach to speed up Transformer