Microsoft AutoML NNI automated hyperparameter tuning: pitfall notes (continuously updated)
module 'nni.retiarii.nn.pytorch.nn' has no attribute 'Hardsigmoid'
module 'torch.nn' has no attribute 'Hardsigmoid'
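Solution: comment out the line that raises the error. The message indicates that the installed PyTorch does not provide nn.Hardsigmoid, so upgrading PyTorch may also resolve it.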
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 8.00 GiB total capacity; 6.72 GiB already allocated; 0 bytes free; 6.73 GiB reserved in total by PyTorch)
Solution:
1. Do not run several GPU-memory-hungry programs at the same time.
2. Switch to a GPU with more memory.
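If neither option is available, a common workaround that is not part of the notes above is to shrink the batch size until the step fits in memory. The sketch below is only an illustration under stated assumptions: run_with_smaller_batches and fake_train_step are hypothetical names, and the fake step simulates the out-of-memory error so the example runs without a GPU.

```python
def run_with_smaller_batches(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after each CUDA OOM error.

    Returns a tuple (batch_size_that_worked, result_of_train_step).
    """
    while True:
        try:
            return batch_size, train_step(batch_size)
        except RuntimeError as err:
            # PyTorch reports CUDA OOM as a RuntimeError containing this phrase.
            # Any other error, or an OOM at the minimum batch size, is re-raised.
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


# Hypothetical stand-in for a real training step:
# it pretends that anything above 16 samples does not fit on the GPU.
def fake_train_step(batch_size):
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory. Tried to allocate 14.00 MiB")
    return "ok"


used_batch_size, result = run_with_smaller_batches(fake_train_step, 64)
print(used_batch_size, result)
```

In a real training loop it is usually worth calling torch.cuda.empty_cache() before retrying, so that memory cached by the failed attempt is released first.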
'"nvidia_pyindex uninstall"' is not recognized as an internal or external command, operable program or batch file
Solution:
Package nvidia-dali is now deprecated. Please install nvidia-dali-cuda100 instead:
Solution:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
For more information, go to https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html#installing-prebuilt-dali-packages
TypeError: 'torch._C.Node' object is not subscriptable
How can an experiment be run without nnictl create --config, that is, without a configuration file?
name 'YAML' is not defined
In common.py, change YAML() to yaml.
Please set "use_active_gpu"
Add the following code to the main script, or drop the yml file altogether and run the py file directly:
from pathlib import Path
from nni.experiment import Experiment

# search_space must already be defined (a dict describing the hyperparameters to tune)
experiment = Experiment(['local', 'remote'])
experiment.config.experiment_name = 'test'
experiment.config.trial_concurrency = 3
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service[0].use_active_gpu = True
experiment.config.training_service[1].reuse_mode = True
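The snippet above uses search_space without defining it. A minimal sketch of such a dictionary in NNI's search space format follows; the hyperparameter names (lr, batch_size, momentum) and their ranges are assumptions chosen for illustration, not values from these notes.

```python
# Each entry maps a hyperparameter name to a sampling strategy ("_type")
# and the arguments of that strategy ("_value").
search_space = {
    # Learning rate sampled log-uniformly between 0.0001 and 0.1
    "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    # Batch size picked from a fixed set of candidates
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    # Momentum sampled uniformly between 0 and 1
    "momentum": {"_type": "uniform", "_value": [0.0, 1.0]},
}

for name, spec in search_space.items():
    print(name, spec["_type"], spec["_value"])
```

The trial script then has to read the sampled values itself, typically through nni.get_next_parameter().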
'Trial job wQQiw status changed from WAITING to FAILED'
This is usually caused by a problem in the main script being run.
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
Change the command to python, because python3 may not be available on your PATH:
trial:
  command: python main.py
  codeDir: .
  gpuNum: 0
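To find out which interpreter name actually resolves on a machine before writing it into the trial command, a small check like the one below can help. It is a sketch using only the standard library; pick_python_command is a hypothetical helper name, and main.py is the script name from the config above.

```python
import shutil
import sys


def pick_python_command(candidates=("python3", "python")):
    """Return the first candidate found on PATH, else the current interpreter."""
    for name in candidates:
        # shutil.which returns None when the command is not on PATH
        if shutil.which(name):
            return name
    # Fall back to the interpreter running this script,
    # quoted in case its path contains spaces.
    return f'"{sys.executable}"'


trial_command = f"{pick_python_command()} main.py"
print(trial_command)
```

The resulting string can be used as the trial command in the yml file or assigned to experiment.config.trial_command.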