Optimizers

2021-04-11  三方斜阳

1. torch.optim:

import torch.optim as optim

# An optimizer is built from an iterable of parameters plus optimizer-specific options:
optimizer = optim.Adam(model.parameters(), lr=1e-3)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']  # apply weight decay to all parameters except bias, gamma and beta
optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
        'weight_decay': 0.01},
        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
        'weight_decay': 0.0}]
# AdamW expects the per-group key 'weight_decay' (not 'weight_decay_rate');
# it can be imported from torch.optim, or from transformers in older HuggingFace code
from torch.optim import AdamW
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)

# The same pattern with the parameter names used by current HuggingFace models:
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
            {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
            {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
            ]
optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
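To see what this grouping does in practice, here is a minimal, self-contained sketch (the Toy module and its parameter names are made up for illustration and are not part of the original snippet): parameters whose names contain any of the no_decay substrings land in the group with weight_decay=0.0, everything else in the group with weight_decay=0.01.

import torch.nn as nn
from torch.optim import AdamW

class Toy(nn.Module):  # stand-in for a real transformer layer
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        self.LayerNorm = nn.LayerNorm(8)  # attribute named as in BERT so the name filter matches

model = Toy()
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},
    {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0},
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)
for i, group in enumerate(optimizer.param_groups):
    print(i, group['weight_decay'], [p.shape for p in group['params']])
# group 0: linear.weight (decayed); group 1: linear.bias, LayerNorm.weight, LayerNorm.bias (no decay)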

for input, target in dataset:
    optimizer.zero_grad()           # zero the gradients
    output = model(input)
    loss = loss_fn(output, target)  # compute the loss
    loss.backward()                 # backpropagate to compute gradients
    optimizer.step()                # update the parameters
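Putting the pieces together, here is a runnable sketch of the same loop with a dummy regression model and random data (the model, loss and dataset below are illustrative placeholders, not from the original post):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)                       # toy model
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# dummy dataset: 32 (input, target) pairs
dataset = [(torch.randn(10), torch.randn(1)) for _ in range(32)]

for epoch in range(3):
    for input, target in dataset:
        optimizer.zero_grad()                  # clear gradients from the previous step
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()                        # compute gradients
        optimizer.step()                       # apply the update
    print(epoch, loss.item())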

2. Saving the model:

# Option 1: save only the parameters (state_dict)
torch.save(the_model.state_dict(), PATH)

# Option 2 (HuggingFace-style): save the weights and config into a directory,
# then reload them with from_pretrained
output_dir = './models/'
output_model_file = os.path.join(output_dir, WEIGHTS_NAME)
output_config_file = os.path.join(output_dir, CONFIG_NAME)

torch.save(model.state_dict(), output_model_file)
model.config.to_json_file(output_config_file)
model = model_class.from_pretrained(output_dir).to(device)  # model_class is the matching model class, e.g. BertModel

# Option 3: save and load the entire model object
torch.save(the_model, PATH)
the_model = torch.load(PATH)
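The load side of Option 1 uses load_state_dict; TheModelClass below is a placeholder for whichever model class was used when saving:

the_model = TheModelClass(*args, **kwargs)      # placeholder: instantiate the same model class used when saving
the_model.load_state_dict(torch.load(PATH))
the_model.eval()                                # switch to inference mode before evaluating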

Reference:

https://ptorch.com/docs/1/optim#how-to-use-an-optimizer
