Sample Python Jobs
Instead of using the system-wide python, it is recommended that you create your own local python conda environment. With this setup, you can install any specific packages or versions of python that you need, and have full administrative control over your python installations. A nice cheat sheet of conda commands can be found here.
Creating a Python Virtual Environment
Begin by logging in to rcfcluster
as described in the beginning of this guide.
To use conda, first load the corresponding miniconda
module:
module load miniconda
Initialize your shell to use conda:
conda init bash
This step modifies your ~/.bashrc
file, and only needs to be done once. Then, activate the base environment:
source ~/.bashrc
This will be reflected in your shell prompt:
(base) user@rcfcluster:~$
To create a new environment in your home directory, type the following command (where “py37” is simply whatever you would like to name the environment, and “python=3.7” specifies the version of python you’d like to install):
conda create --name py37 python=3.7
You will only need to complete this step once. By default, the environment will be installed in your ~/.conda/envs
folder, which is accessible to all nodes in the cluster.
Note: although you will only need to initialize your shell once, the step source ~/.bashrc
must be completed each session.
Activating / Using the Python Environment
Before you can activate your conda environment, you must make sure the miniconda module is loaded and that you have sourced your .bashrc
file. These two commands need to be run once per login:
module load miniconda
source ~/.bashrc
Then, to enter the virtual environment, use the following command (substituting py37
with whatever you chose to name your environment in the previous section):
conda activate py37
If the environment has been activated successfully, you should now see your command line prompt prefaced with (py37)
, indicating that you have entered the environment.
To see the list of packages and versions installed in your active environment, enter:
conda list
You can then install any additional needed packages, such as “numpy”, with:
conda install numpy
You can continue to install packages at any time. Just note that you must always run the conda install
command from within the virtual environment.
To exit the virtual environment, type conda deactivate
at the command line. You can also close the environment simply by logging off the cluster.
Note: during any of these steps, you might come across the following errors:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
which is thrown if you forget to source ~/.bashrc
. There is no need to run conda init bash
again; simply running source ~/.bashrc
will fix the issue.
Another common error is:
conda: command not found
in which case you probably forgot to load the conda module! Enter module load miniconda
and then try again to activate the environment.
Submitting a Python Script to Demonstrate Parallel Processing (within a single node)
Below is an example of a python script (gridsearch.py
) that can be run on the cluster:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
import time
time_start = time.time()
# Read in data. (Since this is an example, we will instead create random data):
n, m = 1000, 10
train_x = np.random.rand(n,m)
train_y = np.ones((n,))
train_y[0:n//2] = 0
np.random.shuffle(train_y)
print('Train X =', train_x.shape)
print('Train y =', train_y.shape)
# Do cross validation for a SVM model:
# Choose the parameter space to explore:
kernels = ['rbf', 'poly', 'linear']
Cs = np.arange(0.001, 10, 0.5)
gammas = [0.5, 0.75, 1.0, 1.25, 1.5]
# Run gridsearch - Note the n_jobs=-1 option allows sklearn gridsearch
# to use as many processors in parallel as are available
gridsearch = GridSearchCV(SVC(degree=3), cv=5,
                          param_grid={"kernel": kernels,
                                      "C": Cs,
                                      "gamma": gammas},
                          scoring='accuracy', refit=True, n_jobs=-1)
gridsearch.fit(train_x, train_y)
# Print output (will be saved into the output file specified in your sbatch script)
print("GridSearchCV Out of Sample Error (accuracy) for each model:")
for mean, params in zip(gridsearch.cv_results_['mean_test_score'], gridsearch.cv_results_['params']):
    print("%0.6f %r" % (mean, params))
print()
print("Best model found during grid search:", gridsearch.best_estimator_, "(Accuracy =",gridsearch.best_score_,")")
# Print run-time info:
time_end = time.time()
print('Computation time: '+str(round(time_end-time_start,2))+' seconds.')
To run this script, submit either an interactive job or an sbatch script to Slurm. Be sure to activate your virtual environment before executing the python command. Interactive job example:
srun --time=0:10:00 --mem-per-cpu=4G --cpus-per-task=4 --pty bash
module load miniconda
conda activate py37
python ~/path/to/gridsearch.py
Sbatch job example:
sbatch run_gridsearch.sh
where the file run_gridsearch.sh
reads:
#!/bin/bash
#SBATCH --time=0:10:00
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=4
# Activate conda environment:
module load miniconda
eval "$(conda shell.bash hook)"
conda activate py37
# Run the script (edit the path below to the location of the gridsearch.py script):
python /path/to/gridsearch.py
where the line eval "$(conda shell.bash hook)"
initializes the shell to use conda. Forgetting to add this line will lead to the same error CommandNotFoundError
mentioned above, asking you to run conda init bash
.
Utilizing Batch Scripts to Submit Multiple Jobs in Parallel
While the above example demonstrates how to launch a single python job on a compute node, you will receive the most benefit by running multiple jobs in parallel. We can adapt the above example so that we are submitting multiple batch scripts, each exploring a different parameter space.
EXAMPLE SCRIPT COMING SOON!
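Until that example is posted, here is a rough sketch of the idea (the driver script submit_gridsearch.sh and the command-line argument handling are hypothetical, not part of this guide's examples). Each call to sbatch submits an independent job, so a simple shell loop can launch one job per kernel, assuming gridsearch.py is modified to read a single kernel name from sys.argv[1] and run_gridsearch.sh passes "$1" through to the python command:
#!/bin/bash
# submit_gridsearch.sh -- hypothetical driver script that submits one job per SVM kernel.
# Assumes gridsearch.py reads a kernel name from sys.argv[1] and that the last line of
# run_gridsearch.sh has been changed to: python /path/to/gridsearch.py "$1"
for kernel in rbf poly linear; do
    sbatch --job-name=gridsearch_${kernel} run_gridsearch.sh ${kernel}
done
Because each submission receives its own allocation of CPUs and memory, the three parameter subspaces are explored concurrently on the cluster.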
More Examples
To take advantage of parallel computing (within a single compute node), you can use the “Parallel” function from the python “joblib” package. An example of such a python script (example.py
) is included below:
from joblib import Parallel, delayed
import numpy as np
import time
import sys
# inputs
number_of_simulations = 10 #100000
time_length = 1000
num_cores = int(sys.argv[1]) # Must input number of cores
# function to produce a single random walk of time length T
def random_noise(T):
    data = []
    for t in range(T):
        data.append(np.random.normal(0,1))
    return np.array(data)
# Parallel compute each random walk
time_start = time.time()
output = Parallel(n_jobs=num_cores)(delayed(random_noise)(time_length) for n in range(number_of_simulations))
output_size = np.matrix(output).shape
time_end = time.time()
# Print information
print('Your data size is '+str(output_size)+' with '+str(output_size[0])+' random noise time-series with time length '+str(output_size[1])+'.')
print('Computation time: '+str(round(time_end-time_start,2))+' seconds using '+str(num_cores)+' cores.')
Credit to University of California Merced Research Computing Facility for this sample script, which is available at: http://hpcwiki.ucmerced.edu/knowledgebase/writing-slurm-job-scripts/
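To submit this script with Slurm you can adapt the sbatch pattern shown earlier. The sketch below (with a hypothetical file name run_example.sh and an illustrative core count of 4) passes the allocated CPU count to the script through Slurm's $SLURM_CPUS_PER_TASK environment variable:
#!/bin/bash
#SBATCH --time=0:10:00
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=4
# Activate conda environment:
module load miniconda
eval "$(conda shell.bash hook)"
conda activate py37
# example.py expects the number of cores as its first argument (edit the path below):
python /path/to/example.py $SLURM_CPUS_PER_TASK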
The python example above utilizes the Parallel function from joblib. However, this parallelization is limited to a single computer and doesn’t take advantage of the full capabilities of the cluster. To write jobs that can run across multiple nodes, we have to use something called the Message Passing Interface (MPI). Documentation on this is still in the works, and will hopefully be available soon!
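In the meantime, a minimal sketch of the MPI approach using the mpi4py package is shown below (this is an assumption-based illustration, not an official example from this guide; it assumes mpi4py has been installed into your conda environment, e.g. with conda install mpi4py). Each MPI process (rank) runs a copy of the script and can be placed on a different node, and the processes coordinate by passing messages:
# mpi_hello.py -- hypothetical example; requires mpi4py in your environment.
from mpi4py import MPI

comm = MPI.COMM_WORLD          # communicator containing all MPI processes
rank = comm.Get_rank()         # this process's ID (0, 1, 2, ...)
size = comm.Get_size()         # total number of MPI processes

# Each rank could work on its own slice of a larger problem here.
print('Hello from rank %d of %d' % (rank, size))

# Example of message passing: gather one value from every rank onto rank 0.
values = comm.gather(rank ** 2, root=0)
if rank == 0:
    print('Gathered values on rank 0:', values)
Such a script would typically be launched with srun inside a job allocation (for example, srun --ntasks=8 python mpi_hello.py), so that Slurm starts one MPI process per task, possibly spread across several nodes.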