Automating Everything

Workflow

Coding

  • Prototyping: jupyter-lab
  • Packaging: vscode

Big Jobs

  1. Set up code on my machine
  2. Prototype the pipeline with jupyter-lab + vscode
  3. Set up the organization on the cal1 server
  4. Prototype job scripts on slurm (see the sketch after this list)
  5. Sync files across cal1 <---> jean-zay and/or gricad
  6. Run mega-jobs on slurm
  7. Sync results back across cal1 <---> jean-zay and/or gricad
  8. Visualize, play with, and host results on the cal1 server
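
For step 4, a minimal sketch of a slurm job script. The account matches the salloc example later in this page; the time limit, log paths, and script name are illustrative placeholders, not the real project values.

#!/bin/bash
#SBATCH --job-name=proto             # job name shown in the queue
#SBATCH --account=cli@cpu            # account from the salloc example below
#SBATCH --nodes=1                    # single node
#SBATCH --ntasks-per-node=1          # one task
#SBATCH --cpus-per-task=16           # CPUs for that task
#SBATCH --time=01:00:00              # placeholder wall-clock limit
#SBATCH --output=logs/%j.log         # slurm stdout log (see Organization below)
#SBATCH --error=errs/%j.err          # slurm error log

# run the pipeline (hypothetical script name)
srun python scripts/run_pipeline.py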

File Structure

data/project/
logs/project/
project/
config/
credentials/

project: this is where all of your source code lives. It should be under version control (e.g. git) so that you can track all of the changes.

data/project: this should contain all of your data. It can be a symbolic link or a mounted drive. It’s important that all of the files within the data drive have the same structure across machines, so that the same paths work everywhere; a sketch of setting this up is below.
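
A minimal sketch of creating this skeleton, where /mnt/datadrive/project is a placeholder for wherever the data actually lives:

# create the project skeleton
mkdir -p project config credentials logs/project

# point data/project at the mounted data drive (placeholder path)
mkdir -p data
ln -s /mnt/datadrive/project data/project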


Syncing Files Across Servers

Projects

You should be using git and GitHub. This is the best way to make sure all changes are captured and you have the entire history.

# add files to be committed
git add file1 file2

# create a commit message
git commit -m "commit message"

# push to remote server
git push origin master

# pull from remote server
git pull origin master

Preprocessing

# interactive allocation: 1 node, 1 task, 16 CPUs, on the cli CPU account
salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=16 --account=cli@cpu
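
Once the allocation is granted, run the preprocessing inside it; the script name here is a hypothetical placeholder:

# run a preprocessing script on the allocated resources
srun python scripts/preprocess.py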

Data

cal1 <---> gricad

Big Data Transfer

# -a archive mode, -v verbose, -x stay on one filesystem, -H preserve hard links
rsync -avxH /path/to/project/data login@scp.univ-grenoble-alpes.fr:/path/to/project/data

Light Data Transfer

rsync -avxH /path/to/project/data login@cargo.univ-grenoble-alpes.fr:/path/to/project/data
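
Either way, it’s worth previewing the transfer before running it for real:

# --dry-run lists what would be copied without transferring anything
rsync -avxH --dry-run /path/to/project/data login@cargo.univ-grenoble-alpes.fr:/path/to/project/data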

cal1 <---> jean-zay

Logs

Light Data Transfer

RSYNC

# pull wandb test logs from jean-zay to the cal1 workdir
rsync -avxH jean_zay:/gpfswork/rech/cli/uvo53rl/test_logs/wandb/ /mnt/meom/workdir/johnsonj/test_logs/wandb/

Functions

# pull wandb logs from jean-zay to the cal1 workdir
function pull_wandb_changes(){
    rsync -avxH jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/ /mnt/meom/workdir/johnsonj/logs/wandb/
}
# push wandb logs from the cal1 workdir to jean-zay
function push_wandb_changes(){
    rsync -avxH /mnt/meom/workdir/johnsonj/logs/wandb/ jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/
}
# upload pending runs to the wandb server
function sync_wandb_changes(){
    wandb sync
}
# sync offline runs
wandb sync --include-offline /mnt/meom/workdir/johnsonj/logs/wandb/offline-*
wandb sync --include-offline /mnt/meom/workdir/johnsonj/logs/wandb/offline-run-20220601_065448-2m11j69u
# make directory for slurm logs
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/logs ]; then
  mkdir -p /gpfsscratch/rech/cli/uvo53rl/logs;
fi

# make directory for wandb logs
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/wandb ]; then
  mkdir -p /gpfsscratch/rech/cli/uvo53rl/wandb;
fi

# make directory for slurm error logs
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/errs ]; then
  mkdir -p /gpfsscratch/rech/cli/uvo53rl/errs;
fi

# make directory for job configurations
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/jobs ]; then
  mkdir -p /gpfsscratch/rech/cli/uvo53rl/jobs;
fi

# make dot directories (.conda with cloned environments, .cache) on scratch
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/.conda ]; then
  mkdir -p /gpfsscratch/rech/cli/uvo53rl/.conda &&
  conda create --prefix=/gpfsscratch/rech/cli/uvo53rl/.conda/envs/jaxtf_gpu_py39 --clone jax_gpu_py39 &&
  conda create --prefix=/gpfsscratch/rech/cli/uvo53rl/.conda/envs/jaxtftorch_gpu_py39 --clone jax_gpu_py39;
fi
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/.cache ]; then
  mkdir -p /gpfsscratch/rech/cli/uvo53rl/.cache;
fi

SCP

Often you’ll have coworkers who can’t access the server, or who don’t use it (or don’t want to learn how to use it) effectively. They might ask you to help them transfer some files. One way to do it is with the scp command; the commands I use are below.

Forward Transfer

scp -r test jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/

Inverse Transfer

scp -r jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/test ./ 



Organization

  1. Create a project directory - where our code lives
  2. Create a bin directory - where we put all of our executables
  3. Create $WORKDIR - the work directory on the server
  4. Create $LOGDIR - the log directory, $WORKDIR/logs
  5. Create the necessary subdirectories (logs, jobs, errs)

Example

# ===================
# Custom directories
# ===================
# work directory
export WORKDIR=/mnt/meom/workdir/johnsonj
# log directory
export LOGDIR=$WORKDIR/logs

Step 1: Ensure $WORKDIR is set.

Check if it exists in the environment.

printenv WORKDIR

Make sure to add it to the .bashrc or .profile.

# add this to the .profile
export WORKDIR=/mnt/meom/workdir/username

Check again if it exists.

# check if exists (it should now)
printenv WORKDIR

Step 2: Ensure $LOGDIR is set.

Check if it exists in the environment.

printenv LOGDIR

Make sure to add it to the .bashrc or .profile.

# add this to the .profile
export LOGDIR=$WORKDIR/logs

Check again if it exists.

# check if exists (it should now)
printenv LOGDIR

Step 3: Create necessary directories

This is so that we can save logs, errors and job configurations. This will be helpful for automating things later. I like to have these available:

$LOGDIR/logs
$LOGDIR/jobs
$LOGDIR/errs
  • logs - a subdirectory which will hold all of the slurm log files.
  • errs - a subdirectory which will hold all of the slurm error log files.
  • jobs - a subdirectory which will hold all of the current job configurations.
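
All three can be created in one shot, assuming $LOGDIR is set as above:

# create the logs, jobs, and errs subdirectories in one go
mkdir -p $LOGDIR/{logs,jobs,errs}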

Space

Check Project Space

# summary
idr_quota_user
# more detail
idr_quota_project 

Check Home Space

# summary
idrquota -m -t Gio
# more detail
du -h --max-depth=1 $HOME

Check Work Space

# summary
idrquota -w -t Gio
# more detail
du -h --max-depth=1 $WORK

We need to move everything to the work drive; otherwise, we run out of home disk space really quickly. The usual candidates are below, with a move-and-symlink sketch after the list.

  • workdir
  • .cache
  • .local
  • .ipython
  • .keras
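
A minimal sketch of the move-and-symlink pattern, assuming $WORK is set (shown for .cache; repeat for the others):

# move the directory onto the work drive
mv $HOME/.cache $WORK/.cache
# leave a symlink behind so tools still find it at the old path
ln -s $WORK/.cache $HOME/.cache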