flash-attention/flash_attn/utils
Xuechen Li 0f7853c6a1
enable loading hf llama checkpoints for training (#446)
* prelim.
* add hf conversion fn.
* mlp.
* change name.
* fix bug.
* inverse permute.
* change comment.
* revert style changes.
* fix.
* add doc.
* revert.
* enable safe load.
* fix safe load.
* fix import.
* fix typing-related lints.
* fix ckpt loading logic.
* make single gpu work.
* test with parallel.
* ckpt format.
* enable pretrained state dict.
* remove unused imports.
* remove unused.
* mark idea related.
2023-08-15 08:33:15 -07:00
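
The commit above adds conversion between Hugging Face LLaMA checkpoints and flash-attn's GPT model. Below is a minimal sketch of that flow, assuming the helper names (llama_config_to_gpt2_config and remap_state_dict_hf_llama in flash_attn.models.llama, state_dict_from_pretrained in flash_attn.utils.pretrained) match the repo at this commit; the checkpoint name is only an illustrative example.

    import torch
    from transformers import LlamaConfig

    from flash_attn.models.gpt import GPTLMHeadModel
    from flash_attn.models.llama import llama_config_to_gpt2_config, remap_state_dict_hf_llama
    from flash_attn.utils.pretrained import state_dict_from_pretrained

    model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical example checkpoint

    # Translate the HF LLaMA config into the GPT2-style config flash-attn's GPT model expects.
    config = llama_config_to_gpt2_config(LlamaConfig.from_pretrained(model_name))

    # Fetch the HF state dict (safetensors or .bin), then remap parameter names and
    # layouts (e.g. the QKV permutation) to flash-attn's naming scheme.
    state_dict = state_dict_from_pretrained(model_name, dtype=torch.float16)
    state_dict = remap_state_dict_hf_llama(state_dict, config)

    model = GPTLMHeadModel(config, dtype=torch.float16)
    model.load_state_dict(state_dict)
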
__init__.py      Add __init__.py files to subdirectories for installation   2022-11-17 16:55:44 -08:00
benchmark.py     [Benchmark] Add script to benchmark FlashAttention          2023-07-28 00:26:52 -10:00
distributed.py   [TP] Implement TensorParallel without sequence parallel     2023-01-07 13:45:22 -08:00
generation.py    [Gen] Minor tweak to allocate_inference_cache               2023-04-21 11:56:47 -07:00
pretrained.py    enable loading hf llama checkpoints for training (#446)     2023-08-15 08:33:15 -07:00