Terminal for Slurm, part 1
If you don't have root access to run commands with sudo, you can still work with your own copy of the slurm.conf configuration file by following these steps:
- Copy the slurm.conf file to a location where you have write access, such as your home directory:
  $ cp /etc/slurm/slurm.conf ~/slurm.conf
- Open the slurm.conf file in a text editor such as nano, vim, or gedit:
  $ nano ~/slurm.conf
- Modify the configuration parameters to suit your needs.
- Save the modified slurm.conf file and exit the text editor.
- Set the SLURM_CONF environment variable to the absolute path of the modified configuration file:
  $ export SLURM_CONF="$HOME/slurm.conf"
- Verify that the SLURM_CONF environment variable is set correctly:
  $ echo $SLURM_CONF
- Run the Slurm commands as usual, such as:
  $ srun -N 2 --ntasks-per-node=4 hostname
  $ sinfo
  $ sacctmgr list users
These commands, respectively, launch a job that runs the hostname command as 4 tasks on each of 2 nodes (8 tasks in total), display the status of the nodes and partitions in the cluster, and list the users in the accounting database.
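The copy and SLURM_CONF steps above can be wrapped in a small shell function. This is a minimal sketch: the function name use_private_slurm_conf is hypothetical, and /etc/slurm/slurm.conf is assumed to be where your site keeps the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a slurm.conf to a writable location and point
# the Slurm client commands at the copy via SLURM_CONF.
# Usage: use_private_slurm_conf [source] [destination]
use_private_slurm_conf() {
  local src="${1:-/etc/slurm/slurm.conf}"   # assumed system location
  local dest="${2:-$HOME/slurm.conf}"

  if [ ! -r "$src" ]; then
    echo "cannot read $src" >&2
    return 1
  fi

  cp "$src" "$dest" || return 1
  # Affects only this shell and the processes started from it.
  export SLURM_CONF="$dest"
  echo "SLURM_CONF=$SLURM_CONF"
}
```

Because the function uses export, source the file that defines it (rather than running it as a separate script) so that SLURM_CONF is set in your current shell; then edit ~/slurm.conf and run sinfo as usual.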
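Note that setting the SLURM_CONF environment variable only affects the current shell session and the processes started from it. If you want to make the variable persistent, add the export line to your shell startup file, such as .bashrc or .zshrc. Also keep in mind that SLURM_CONF only changes the configuration read by the Slurm commands you run; the running slurmctld and slurmd daemons keep using their own configuration, so editing your copy does not change cluster-wide settings.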
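Making the variable persistent amounts to appending one export line to a startup file. A minimal sketch, assuming bash and ~/slurm.conf as the copied file; the function name persist_slurm_conf is hypothetical, and it skips the append if the exact line is already present so repeated runs do not pile up duplicates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an export of SLURM_CONF to a shell startup
# file, unless exactly the same line is already there.
# Usage: persist_slurm_conf [startup_file] [conf_path]
persist_slurm_conf() {
  local rcfile="${1:-$HOME/.bashrc}"
  local conf="${2:-$HOME/slurm.conf}"
  local line="export SLURM_CONF=\"$conf\""

  # grep -qxF: quiet, whole-line, fixed-string match
  if [ -f "$rcfile" ] && grep -qxF "$line" "$rcfile"; then
    return 0
  fi
  printf '%s\n' "$line" >> "$rcfile"
}
```

For zsh, pass ~/.zshrc as the first argument. The change applies to new shells; run source ~/.bashrc to pick it up in the current one.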
To check whether Slurm is installed on your system, print the installed Slurm version with the following command in your terminal:
$ sinfo --version
This prints the version of Slurm that is currently installed. If Slurm is not installed, or sinfo is not on your PATH, the shell reports that the command cannot be found.
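In a script, it is useful to check that sinfo exists before calling it, so a missing installation produces a clear message. A minimal sketch; the function name check_slurm_version is hypothetical, and only the standard command -v builtin and sinfo --version are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the Slurm client tools are on PATH
# and, if so, print their version.
check_slurm_version() {
  if ! command -v sinfo >/dev/null 2>&1; then
    echo "sinfo not found on PATH; Slurm client tools may not be installed" >&2
    return 1
  fi
  sinfo --version
}
```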
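You can also check the status of the Slurm node daemon (slurmd) and the controller daemon (slurmctld) by running the following command:
$ systemctl status slurmd slurmctld
This shows whether each daemon is running. On a multi-node cluster, slurmctld normally runs only on the controller node and slurmd on the compute nodes, so on any one machine you should expect only the daemons assigned to it. If the expected daemons are reported as active (running), Slurm is installed and has started correctly on this machine.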
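For a quick yes/no answer per daemon, systemctl is-active can replace the full status output. A minimal sketch; the function name check_slurm_daemons is hypothetical, and it returns nonzero if either daemon is not active, which is expected on machines that only run one of them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whether each Slurm daemon is active according
# to systemd. On a multi-node cluster, slurmctld usually runs only on the
# controller and slurmd only on compute nodes, so one may legitimately be
# "not active" on a given machine.
check_slurm_daemons() {
  local unit status=0
  for unit in slurmctld slurmd; do
    # is-active --quiet sets only the exit status (0 means active).
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: not active"
      status=1
    fi
  done
  return "$status"
}
```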
The error message suggests that Slurm is not correctly installed or configured and cannot find its configuration file. When no slurm.conf is found and SLURM_CONF is not set, Slurm tries to locate its configuration source with a DNS SRV lookup (the mechanism used by Slurm's "configless" mode), and the error occurs because that lookup fails too.
You can troubleshoot this by checking that the Slurm configuration file slurm.conf is located in the expected directory (commonly /etc/slurm) and has permissions that allow it to be read.
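The location and permission checks can be combined into one helper. A minimal sketch, assuming /etc/slurm/slurm.conf as the default path; the real default depends on how Slurm was built or packaged, and the function name check_slurm_conf is hypothetical. It honors SLURM_CONF if set, the same way the client commands do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which slurm.conf would be used and whether it is
# readable. /etc/slurm/slurm.conf is an assumed default; your build or
# package may use a different directory.
check_slurm_conf() {
  local conf="${SLURM_CONF:-/etc/slurm/slurm.conf}"
  if [ -r "$conf" ]; then
    echo "readable: $conf"
    ls -l "$conf"   # shows owner and permission bits
  else
    echo "missing or unreadable: $conf" >&2
    return 1
  fi
}
```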
You may also need to check that the Slurm service is running, using the following command:
$ systemctl status slurmd
If the service is not running, you can start it using:
$ sudo systemctl start slurmd
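If you have made any changes to the slurm.conf file, restart the Slurm daemons so that they read the new configuration. On a compute node, restart slurmd:
$ sudo systemctl restart slurmd
On the controller node, restart slurmctld as well:
$ sudo systemctl restart slurmctld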
After that, run sinfo --version again to confirm that Slurm is installed, and run sinfo without options to confirm that it can read the configuration and reach the controller.
If you get the error message "Unit slurmd.service could not be found" when running systemctl --user status slurmd, it means that systemd has no user-level unit named slurmd.service for your account, so slurmd is not being managed as a service under your user account.
This could be due to one of the following reasons:
- Slurm was not set up as a service under your user account, but was instead installed system-wide, which requires administrator privileges to manage.
- The slurmd daemon was never started, or was stopped after installation.
To resolve this issue, you can try the following steps:
- Check whether Slurm was installed under your user account or system-wide by running:
  $ which slurmd
  If the command returns a path that starts with /usr, Slurm is installed system-wide and you may need administrator privileges to manage it. If it returns a path that starts with /home, Slurm was installed under your user account.
- If Slurm was installed system-wide, query the system-wide unit instead (without --user). Viewing the status usually works without sudo, but starting or restarting the daemon requires it:
  $ sudo systemctl status slurmd
  This shows the status of the slurmd daemon and whether it is running.
- If Slurm was installed under your user account, you can start the slurmd daemon directly:
  $ slurmd -Dvvv
  This runs slurmd in the foreground (-D) with verbose logging (-vvv), so any error messages are printed to your terminal. Alternatively, if a user-level slurmd.service unit exists for your account, you can start it with:
  $ systemctl --user start slurmd
  If slurmd was already running, you may need to stop it first before you can start it again:
  $ systemctl --user stop slurmd
After performing these steps, running systemctl --user status slurmd should show the status of the slurmd daemon and whether it is running correctly.
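The which slurmd check in the steps above can be turned into a small helper that applies the same /usr versus /home rule of thumb. This is a sketch: the function name slurmd_install_scope is hypothetical, and the rule is only a heuristic, so paths under other prefixes such as /opt are reported as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess whether slurmd is a system-wide or per-user
# install from its path (/usr/... vs /home/...). Pass a path explicitly,
# or let it look up slurmd on PATH.
slurmd_install_scope() {
  local path
  path="${1:-$(command -v slurmd || true)}"
  case "$path" in
    "")       echo "not-found" ;;
    /usr/*)   echo "system-wide" ;;
    /home/*)  echo "user" ;;
    *)        echo "unknown" ;;
  esac
}
```

Note that command -v (like which) only searches your PATH. slurmd is often installed in an sbin directory such as /usr/sbin, which may not be on a regular user's PATH; in that case the helper reports not-found even though Slurm is installed.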
It seems that Slurm is installed under your user account.
You can start the slurmd daemon by running the following command:
$ slurmd -Dvvv
This runs slurmd in the foreground (-D) with verbose logging (-vvv), so any errors or informational messages are printed to your terminal.
Alternatively, if a user-level slurmd.service unit exists for your account, you can start slurmd with:
$ systemctl --user start slurmd
If slurmd was already running, you may need to stop it first before you can start it again:
$ systemctl --user stop slurmd
After performing these steps, running systemctl --user status slurmd should show the status of the slurmd daemon and whether it is running correctly.
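The systemctl --user commands above only work if a user-level slurmd.service unit exists. A minimal sketch of creating one, assuming systemd's standard per-user unit directory ~/.config/systemd/user; the function name write_user_slurmd_unit and the unit contents are hypothetical, not something Slurm ships. Whether slurmd can do useful work without root privileges depends on your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal systemd *user* unit for slurmd so that
# "systemctl --user start slurmd" has a unit to start.
# Usage: write_user_slurmd_unit [slurmd_path] [unit_dir]
write_user_slurmd_unit() {
  local slurmd_bin="${1:-$(command -v slurmd || true)}"
  local unit_dir="${2:-$HOME/.config/systemd/user}"

  if [ -z "$slurmd_bin" ]; then
    echo "slurmd not found; pass its path as the first argument" >&2
    return 1
  fi

  mkdir -p "$unit_dir" || return 1
  # -D keeps slurmd in the foreground so systemd can supervise it directly.
  cat > "$unit_dir/slurmd.service" <<EOF
[Unit]
Description=Slurm node daemon (per-user instance)

[Service]
ExecStart=$slurmd_bin -D

[Install]
WantedBy=default.target
EOF
}
```

After writing the unit, run systemctl --user daemon-reload so systemd picks it up, then systemctl --user start slurmd and systemctl --user status slurmd as described above.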