Terminal for Slurm, part 1

2023-05-06  李时刻

If you don't have root access to run commands with sudo, you can still work with a modified copy of the slurm.conf configuration file by following these steps:

  1. Copy the slurm.conf file to a location where you have write access, such as your home directory.

    cp /etc/slurm/slurm.conf ~/slurm.conf
    
  2. Open the slurm.conf file for editing using a text editor such as nano, vim, or gedit.

    nano ~/slurm.conf
    
  3. Modify the various configuration parameters to suit your needs.

  4. Save the modified slurm.conf file and exit the text editor.

  5. Set the SLURM_CONF environment variable to point to the modified configuration file.

    export SLURM_CONF=~/slurm.conf
    
  6. Verify that the SLURM_CONF environment variable is set correctly by running the following command:

    echo $SLURM_CONF
    
  7. Run the Slurm commands as usual, such as:

    srun -N 2 --ntasks-per-node=4 hostname
    sinfo
    sacctmgr list users
    

    These commands, respectively, launch a job that runs the hostname command on two nodes with four tasks per node, display the status of nodes and partitions in the cluster, and list the users in the accounting database.
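Steps 1, 5, and 6 above can be collected into a small shell function. This is a minimal sketch, not part of Slurm: the function name use_private_slurm_conf and its two arguments are hypothetical, and it assumes the system configuration lives at /etc/slurm/slurm.conf as in step 1.

```shell
# Hypothetical helper (not a Slurm command): copy slurm.conf to a writable
# location and point SLURM_CONF at the copy for the current shell session.
use_private_slurm_conf() {
    src="${1:-/etc/slurm/slurm.conf}"   # system config path, as assumed in step 1
    dest="${2:-$HOME/slurm.conf}"       # private, editable copy

    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    cp "$src" "$dest" || return 1

    # Export so that Slurm commands started from this shell see the copy.
    export SLURM_CONF="$dest"
    echo "SLURM_CONF=$SLURM_CONF"
}
```

The function has to be defined and called in the interactive shell itself (for example by sourcing the file that contains it), because a variable exported inside a separate script would not reach the calling shell.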

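Note that exporting SLURM_CONF affects only the current shell session and the commands started from it. To make the variable persistent, add the export line to your shell startup file, such as .bashrc or .zshrc. Also keep in mind that a private copy changes only what the client commands read; the Slurm daemons keep using the configuration they were started with, so edits to the copy do not change how the cluster itself behaves.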
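One way to make the setting persistent without adding the same line twice is sketched below. The function name persist_slurm_conf is hypothetical, and the sketch assumes the private copy is at ~/slurm.conf as in the steps above.

```shell
# Hypothetical helper: append the export line to a startup file only if
# that exact line is not already present.
persist_slurm_conf() {
    rc_file="${1:-$HOME/.bashrc}"
    # Single quotes keep $HOME unexpanded, so it is resolved at shell startup.
    line='export SLURM_CONF="$HOME/slurm.conf"'
    grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}
```

Running persist_slurm_conf with no arguments targets ~/.bashrc; pass another path, such as ~/.zshrc, for a different shell.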

To check whether Slurm is installed on your system, print the installed version from a terminal:

$ sinfo --version

This command prints the version of Slurm that is currently installed. If Slurm is not installed, the shell reports that the command was not found.
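The two outcomes can be told apart explicitly by first testing whether sinfo is on the PATH. This is a small sketch; the function name check_slurm_client and the messages it prints are made up for illustration.

```shell
# Hypothetical helper: report the Slurm version, or explain why it is unavailable.
check_slurm_client() {
    if command -v sinfo >/dev/null 2>&1; then
        # sinfo exists; it may still fail, for example when no configuration is found.
        sinfo --version || echo "sinfo is installed but exited with an error"
    else
        echo "sinfo not found in PATH; the Slurm client tools may not be installed"
    fi
}

check_slurm_client
```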

You can also check the status of the compute node daemon (slurmd) and the controller daemon (slurmctld) by running the following command:

$ systemctl status slurmd slurmctld

This command shows whether each daemon is running. slurmctld normally runs on the controller node and slurmd on the compute nodes, so on a given machine only one of them may be present. A daemon reported as active (running) is installed and started on that machine.
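For a shorter summary than the full status output, systemctl is-active prints a single state word per unit. The loop below is a sketch; the function name report_slurm_units is hypothetical, and "unknown" is simply the placeholder used here when systemctl is unavailable or prints nothing.

```shell
# Hypothetical helper: print one line per Slurm daemon with its systemd state.
report_slurm_units() {
    for unit in slurmd slurmctld; do
        # is-active exits non-zero for anything but "active"; "|| true" keeps going.
        state=$(systemctl is-active "$unit" 2>/dev/null || true)
        echo "$unit: ${state:-unknown}"
    done
}

report_slurm_units
```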

If a Slurm command fails with an error saying that it cannot find its configuration, Slurm is either not installed correctly or not configured. The message appears because, with no configuration file available, Slurm falls back to a DNS SRV lookup to find a configuration source, and that lookup fails as well.

To troubleshoot this, check that the configuration file slurm.conf exists in the expected directory, /etc/slurm, and that its permissions allow it to be read.
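The existence and permission checks can be scripted as below. This is a sketch under the assumption that the file is either the one named by SLURM_CONF or /etc/slurm/slurm.conf; the function name check_slurm_conf is hypothetical.

```shell
# Hypothetical helper: verify that a slurm.conf exists and is readable.
check_slurm_conf() {
    # Explicit argument first, then SLURM_CONF, then the path used in this article.
    conf="${1:-${SLURM_CONF:-/etc/slurm/slurm.conf}}"
    if [ ! -e "$conf" ]; then
        echo "missing: $conf"
        return 1
    fi
    if [ ! -r "$conf" ]; then
        echo "not readable by $(id -un): $conf"
        return 1
    fi
    # Show owner and permission bits for a manual check.
    ls -l "$conf"
}
```

Calling check_slurm_conf with no arguments checks the file the client commands would most likely pick up in this setup.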

You may also need to check the Slurm service and make sure it is running using the following command:

$ systemctl status slurmd

If the service is not running, you can start it using:

$ sudo systemctl start slurmd

If you have made any changes to the slurm.conf file, you should restart the Slurm service using the following command:

$ sudo systemctl restart slurmd

After that, you can try running the sinfo --version command again to verify that Slurm is installed and running properly.
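The choice between starting and restarting can be made from the current state of the unit. The sketch below only prints the command it would suggest and runs nothing with sudo; the function name next_slurmd_command is hypothetical.

```shell
# Hypothetical helper: suggest "restart" if slurmd is active, otherwise "start".
next_slurmd_command() {
    if systemctl is-active --quiet slurmd 2>/dev/null; then
        echo "sudo systemctl restart slurmd"
    else
        echo "sudo systemctl start slurmd"
    fi
}

next_slurmd_command
```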

If systemctl --user status slurmd reports "Unit slurmd.service could not be found", systemd has no user-level unit named slurmd.service for your account. This usually means that Slurm was installed system-wide, so its unit is a system unit rather than a user unit, or that no systemd unit was installed for slurmd at all.

To resolve this issue, you can try the following steps:

  1. Check if Slurm was installed under your user account or system-wide. You can check by running the following command:

    $ which slurmd
    

    If the command returns a path that starts with /usr, then Slurm is installed system-wide and you may need administrator privileges to manage it. If the command returns a path that starts with /home, then Slurm was installed under your user account.

  2. If Slurm was installed system-wide, you should run the systemctl status slurmd command with administrator privileges:

    $ sudo systemctl status slurmd
    

    This will show you the status of the slurmd daemon and whether it is running or not.

  3. If Slurm was installed under your user account, you can start the slurmd daemon by running the following command:

    $ slurmd -Dvvv
    

    This runs slurmd in the foreground with verbose logging, so any error messages are printed directly to the terminal.

    Alternatively, if a user-level systemd unit for slurmd exists, you can start it with:

    $ systemctl --user start slurmd
    

    If slurmd was already running, you may need to stop it first before you can start it again:

    $ systemctl --user stop slurmd
    

After performing these steps, running systemctl --user status slurmd should show you the status of the slurmd daemon and whether it is running correctly or not.
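The decision in the steps above can be sketched as two small checks: where the slurmd binary lives, and whether systemd knows a slurmd unit at the system or the user level. Both function names are hypothetical, and the /usr and /home prefixes follow the rule of thumb from step 1 rather than any guarantee made by Slurm.

```shell
# Hypothetical helper: classify an install by the path of the slurmd binary.
classify_slurmd_path() {
    case "$1" in
        "")      echo "not found" ;;
        /home/*) echo "user install" ;;
        /usr/*)  echo "system-wide install" ;;
        *)       echo "other location" ;;
    esac
}

# Hypothetical helper: report which systemd scope has a slurmd.service unit.
slurmd_unit_scope() {
    if systemctl cat slurmd.service >/dev/null 2>&1; then
        echo "system"
    elif systemctl --user cat slurmd.service >/dev/null 2>&1; then
        echo "user"
    else
        echo "none"
    fi
}

classify_slurmd_path "$(command -v slurmd 2>/dev/null || true)"
slurmd_unit_scope
```

A result of "system" points to step 2, "user" to the systemctl --user commands in step 3, and "none" to running slurmd directly.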

It seems that Slurm is installed under your user account.

You can start the slurmd daemon by running the following command:

$ slurmd -Dvvv

This runs slurmd in the foreground with verbose logging, so any errors or informational messages are printed directly to the terminal.
