* uneql rank. * trim. * enable passing in number of heads for each rank. * simplify. * simplify. * cleanup. * fix col parallel. * fix bug with row parallel. * fit out proj. * refac. * fix sharding logic. * refac sharding. * refac. * support multiple of. * make fn reuseable. * fix bug in dimensions. * scaffold. * test uneven heads. * fix test by adding barrier. * refac. * reuse code. * clean up. |
||
|---|---|---|
| .. | ||
| layers | ||
| losses | ||
| models | ||
| modules | ||
| ops | ||
| utils | ||
| __init__.py | ||
| bert_padding.py | ||
| flash_attn_interface.py | ||
| flash_attn_triton_og.py | ||
| flash_attn_triton.py | ||
| flash_blocksparse_attention.py | ||
| flash_blocksparse_attn_interface.py | ||
| fused_softmax.py | ||