ferdinand.mom | 0360ec0d2a | use hf_transfer, which improves download time by 3x | 2024-12-18 14:51:14 +00:00
ferdinand.mom | 75cd0d77f9 | download safetensors at config creation time. If we do it during training, barrier() may time out while waiting for the download | 2024-12-17 15:46:16 +00:00
ferdinand.mom | b57b8277d1 | breaking: add new version of init on meta device, but it leaks memory | 2024-12-17 15:46:16 +00:00
ferdinand.mom | 859650a2c0 | breaking: refactor loading of big models to only download safetensors files | 2024-12-17 15:46:09 +00:00
ferdinand.mom | 9d4f0ee4ff | fix requirements to avoid drop in throughput | 2024-11-04 14:33:07 +00:00
ferdinand.mom | 6f6bc1945a | add wandb support | 2024-09-25 14:19:16 +00:00
ferdinand.mom | 7a57407c54 | breaking: socketStartConnect: Connect to <ip address> failed : Software caused connection abort | 2024-09-23 21:14:48 +00:00
ferdinand.mom | c36d415b47 | add training and generate for pp | 2024-09-19 14:06:46 +00:00