Microsoft AutoML NNI Automated Hyperparameter Tuning: Pitfall Notes (continuously updated)

2021-05-04, by AI信仰者

module 'nni.retiarii.nn.pytorch.nn' has no attribute 'Hardsigmoid'
module 'torch.nn' has no attribute 'Hardsigmoid'
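Fix: comment out the line that raises the error. This error usually indicates that the installed PyTorch predates nn.Hardsigmoid, so upgrading PyTorch is an alternative.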
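Before commenting anything out, it can help to confirm that the attribute really is missing from the installed package. The following is a minimal sketch using only the standard library; the helper name module_has is hypothetical and not part of NNI or PyTorch.

```python
import importlib


def module_has(module_name, attr):
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself is not installed in this environment.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Reports whether the local PyTorch build ships the layer NNI expects.
    print("torch.nn.Hardsigmoid available:", module_has("torch.nn", "Hardsigmoid"))
```

If this prints False for torch.nn, the error comes from the PyTorch version rather than from NNI itself.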

RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 8.00 GiB total capacity; 6.72 GiB already allocated; 0 bytes free; 6.73 GiB reserved in total by PyTorch)
Fix:
1. Do not run several GPU-memory-hungry programs at the same time.
2. Switch to a GPU with more video memory.
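To judge which of the two fixes applies, it helps to read the numbers in the message: here a request of only 14 MiB fails because 0 bytes are free on an 8 GiB card. The sketch below parses a message of exactly the shape quoted above; the wording differs between PyTorch versions, so the regular expression is an assumption tied to this one format, and parse_oom is a hypothetical helper name.

```python
import re

# Matches the message layout quoted in these notes; other PyTorch
# versions phrase the error differently and will simply not match.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+ \w+) "
    r"\(GPU (?P<gpu>\d+); (?P<total>[\d.]+ \w+) total capacity; "
    r"(?P<allocated>[\d.]+ \w+) already allocated; "
    r"(?P<free>[\d.]+ \w+) free; "
    r"(?P<reserved>[\d.]+ \w+) reserved"
)

UNITS = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def to_bytes(text):
    """Convert a string such as '6.72 GiB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def parse_oom(message):
    """Return the sizes in an OOM message as bytes, or None if it does not match."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    gpu = int(fields.pop("gpu"))
    parsed = {name: to_bytes(value) for name, value in fields.items()}
    parsed["gpu"] = gpu
    return parsed
```

When "free" is zero while "allocated" is well below "total", other processes (or other concurrent trials) are probably holding the rest of the memory, which points at fix 1 rather than fix 2.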

'"nvidia_pyindex uninstall"' is not recognized as an internal or external command, operable program or batch file
Fix:

Package nvidia-dali is now deprecated. Please install nvidia-dali-cuda100 instead:
Fix:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
For more information, go to https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html#installing-prebuilt-dali-packages
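When several Python installations coexist, a bare pip may install into a different interpreter than the one that runs NNI. The sketch below builds the same command through the current interpreter; the index URL and package name come from the deprecation message above, while the helper name dali_install_command is hypothetical.

```python
import sys

# Index URL and package name are taken from the deprecation message above.
NVIDIA_INDEX = "https://developer.download.nvidia.com/compute/redist"


def dali_install_command(package="nvidia-dali-cuda100"):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install",
            "--extra-index-url", NVIDIA_INDEX, package]


if __name__ == "__main__":
    # Print only; pass the list to subprocess.run(...) to actually install.
    print(" ".join(dali_install_command()))
```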

TypeError: 'torch._C.Node' object is not subscriptable

How do you run an experiment without nnictl create --config, that is, without a configuration file?

name 'YAML' is not defined

Fix: in common.py, change YAML() to yaml.

Please set "use_active_gpu"
Fix: add the following code to the main script, or drop the yml file entirely and run the py file directly.

from pathlib import Path
from nni.experiment import Experiment

# search_space must already be defined as a dict in NNI's search-space format.
experiment = Experiment(['local', 'remote'])
experiment.config.experiment_name = 'test'
experiment.config.trial_concurrency = 3
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
# Directory containing this script (and the trial code).
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
# training_service[0] is 'local', training_service[1] is 'remote'.
experiment.config.training_service[0].use_active_gpu = True
experiment.config.training_service[1].reuse_mode = True
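The snippet above uses search_space without defining it. A minimal sketch of what it could look like is shown here; the hyperparameter names (lr, batch_size, momentum) and their ranges are assumptions for illustration and must match whatever mnist.py actually reads. The check_search_space helper is hypothetical and only verifies the shape of each entry.

```python
# Hypothetical search space in NNI's dict format: each entry names a
# sampling strategy ("_type") and its arguments ("_value").
search_space = {
    "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    "momentum": {"_type": "uniform", "_value": [0.0, 1.0]},
}


def check_search_space(space):
    """Return the names of entries that lack '_type' or '_value'."""
    return [name for name, spec in space.items()
            if not {"_type", "_value"} <= set(spec)]


if __name__ == "__main__":
    # An empty list means every entry has both required keys.
    print(check_search_space(search_space))
```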

'Trial job wQQiw status changed from WAITING to FAILED'
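This usually means something is wrong with the main script that the trial runs, or with the command used to launch it. For example: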

trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0

Change the command to python, because python3 may not be available on your PATH:

trial:
  command: python main.py
  codeDir: .
  gpuNum: 0
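To find out which interpreter name works on a given machine, and to see the real traceback behind a FAILED trial, the trial command can be run by hand outside NNI. The following is a minimal sketch using only the standard library; pick_python and dry_run are hypothetical helper names, and main.py is the script name from the config above. Run this way, the script receives no parameters from a tuner, so it needs usable defaults.

```python
import shutil
import subprocess
import sys


def pick_python(candidates=("python3", "python"), which=shutil.which):
    """Return the first interpreter name found on PATH, else this interpreter's path."""
    for name in candidates:
        if which(name) is not None:
            return name
    return sys.executable


def dry_run(argv, cwd="."):
    """Run a trial command directly and return (exit_code, stderr_text)."""
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    interpreter = pick_python()
    print("interpreter to use in the trial command:", interpreter)
    code, stderr = dry_run([interpreter, "main.py"])
    if code != 0:
        # This is the error NNI hides behind "WAITING to FAILED".
        print(stderr)
```

Whatever name pick_python reports is the one to put in the trial command.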
