flash-attention/flash_attn/utils/pretrained.py

import torch
from transformers.utils import WEIGHTS_NAME
from transformers.utils.hub import cached_file


def state_dict_from_pretrained(model_name, device=None):
    # Download (or reuse the cached copy of) the checkpoint's pytorch_model.bin
    # from the Hugging Face hub and load it, mapping tensors onto `device`.
    return torch.load(cached_file(model_name, WEIGHTS_NAME), map_location=device)
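

# A minimal usage sketch, assuming a checkpoint that ships a pytorch_model.bin
# weights file; "gpt2" is only an illustrative model name, not part of this module.
if __name__ == "__main__":
    state_dict = state_dict_from_pretrained("gpt2", device="cpu")
    print(f"loaded {len(state_dict)} tensors; first key: {next(iter(state_dict))}")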