Automating Everything
Workflow

Coding

- Prototype: jupyter-lab
- Packaging: vscode
Big Jobs

- Set up code on my machine
- Prototype the pipeline with jupyter-lab + vscode
- Set up organization on the cal1 server
- Prototype job scripts on slurm (a minimal sketch follows this list)
- Sync files across cal1 <---> jean-zay and/or gricad
- Run mega-jobs on slurm
- Sync files across cal1 <---> jean-zay and/or gricad
- Visualize, play with, and host results on the cal1 server
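The job scripts themselves are not shown on this page, so here is a minimal slurm sketch of what a prototype job script could look like. The job name, time limit, and main.py are placeholders; the account string mirrors the salloc example further down; the logs/ and errs/ paths follow the directory layout described under Organization below.

#!/bin/bash
#SBATCH --job-name=prototype       # hypothetical job name
#SBATCH --account=cli@cpu          # same account as the salloc example below
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=01:00:00            # placeholder time limit
#SBATCH --output=logs/%x_%j.out    # slurm log file
#SBATCH --error=errs/%x_%j.err     # slurm error log file

# run the (hypothetical) pipeline script
srun python main.py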
File Structure

- data/project
- logs/project
- project/
- config/
- credentials/

project/: this is where all of your source code lives. It should be under version control (e.g. git) so that you can track all of the changes.

data/project: this should contain all of your data. It can be a symbolic link or a mounted drive. It is important that all of the subsequent files within the data drive have the same file structure across servers.
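Since the data directory can be a symbolic link, a minimal sketch of that setup might look like the following (both paths are placeholders; adapt them to your own machines):

# keep the heavy data on the work drive and link it into the project tree
ln -s /mnt/meom/workdir/username/data/project ~/project/data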
Syncing Files Across Servers
Projects

You should be using git and GitHub. This is the best way to make sure that all changes are captured and that you have the entire history.
# add files to be committed
git add file1 file2
# create a commit message
git commit -m "commit message"
# push to remote server
git push origin master
# pull from remote server
git pull origin master

PreProcessing
# request an interactive allocation for preprocessing
salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=16 --account=cli@cpu

Data
cal1 <---> gricad

Big Data Transfer

rsync -avxH /path/to/project/data login@scp.univ-grenoble-alpes.fr:/path/to/project/data

Light Data Transfer

rsync -avxH /path/to/project/data login@cargo.univ-grenoble-alpes.fr:/path/to/project/data

cal1 <---> jean-zay
Logs

Light Data Transfer

RSYNC

rsync -avxH jean_zay:/gpfswork/rech/cli/uvo53rl/test_logs/wandb/ /mnt/meom/workdir/johnsonj/test_logs/wandb/

Functions
function pull_wandb_changes(){
rsync -avxH jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/ /mnt/meom/workdir/johnsonj/logs/wandb/
}
function push_wandb_changes(){
rsync -avxH /mnt/meom/workdir/johnsonj/logs/wandb/ jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/
}
function sync_wandb_changes(){
wandb sync
}
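These helpers can be made available in every shell by sourcing them from the .bashrc or .profile; a minimal sketch, where the file name and location are hypothetical:

# in the .bashrc / .profile (file name is hypothetical)
source ~/bin/wandb_sync.sh

# then, from any shell
pull_wandb_changes
push_wandb_changes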
# sync all offline runs
wandb sync --include-offline /mnt/meom/workdir/johnsonj/logs/wandb/offline-*
# sync a single offline run
wandb sync --include-offline /mnt/meom/workdir/johnsonj/logs/wandb/offline-run-20220601_065448-2m11j69u

# make directory for slurm logs
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/logs ]; then
mkdir -p /gpfsscratch/rech/cli/uvo53rl/logs;
fi
# make directory for wandb logs
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/wandb ]; then
mkdir -p /gpfsscratch/rech/cli/uvo53rl/wandb;
fi
# make directory for error logs
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/errs ]; then
mkdir -p /gpfsscratch/rech/cli/uvo53rl/errs;
fi
# make directory for job configurations
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/jobs ]; then
mkdir -p /gpfsscratch/rech/cli/uvo53rl/jobs;
fi
# make dot directories and clone the conda environments
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/.conda ]; then
mkdir -p /gpfsscratch/rech/cli/uvo53rl/.conda &&
conda create --prefix=/gpfsscratch/rech/cli/uvo53rl/.conda/envs/jaxtf_gpu_py39 --clone jax_gpu_py39 &&
conda create --prefix=/gpfsscratch/rech/cli/uvo53rl/.conda/envs/jaxtftorch_gpu_py39 --clone jax_gpu_py39;
fi
if [ ! -d /gpfsscratch/rech/cli/uvo53rl/.cache ]; then
mkdir -p /gpfsscratch/rech/cli/uvo53rl/.cache;
fi

SCP
A lot of the time you will have coworkers who cannot access the server, or who do not use it (or do not want to learn how to use it) effectively. So they might ask you to help them transfer some files. One way to do it is to use the scp command. The command I use is below.
Forward Transfer

scp -r test jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/

Inverse Transfer

scp -r jean_zay:/gpfswork/rech/cli/uvo53rl/logs/wandb/test ./

Other Resources:
Organization

- Create a project directory - where our code lives
- Create a bin directory - where we put all of our executables
- Create $WORKDIR
- Create $LOGDIR
- Create the necessary subdirectories (logs, jobs, errs)
Example
# ===================
# Custom directories
# ===================
# work directory
export WORKDIR=/mnt/meom/workdir/johnsonj
# log directory
export LOGDIR=$WORKDIR/logs

Step 1: Ensure $WORKDIR is set.
Check if it exists in the environment.
printenv WORKDIR

Make sure to add it to the .bashrc or .profile.
# add this to the .profile
export WORKDIR=/mnt/meom/workdir/username

Check again if it exists.
# check if exists (it should now)
printenv WORKDIR

Step 2: Ensure $LOGDIR is set.
Check if it exists in the environment.
printenv LOGDIR

Make sure to add it to the .bashrc or .profile.
# add this to the .profile
export LOGDIR=$WORKDIR/logs

Check again if it exists.
# check if exists (it should now)
printenv LOGDIR

Step 3: Create necessary directories
This is so that we can save logs, errors, and job configurations, which will be helpful for automating things later. I like to have these available (a sketch for creating them follows this list):
- $LOGDIR/logs
- $LOGDIR/jobs
- $LOGDIR/errs

logs - a subdirectory within $LOGDIR which will hold all of the slurm log files.
errs - a subdirectory which will hold all of the slurm error log files.
jobs - a subdirectory which will hold all of the current job configurations.
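A minimal sketch for creating these directories, assuming $LOGDIR has already been exported as shown above:

# create the log, job, and error directories if they do not exist
for d in logs jobs errs; do
    if [ ! -d $LOGDIR/$d ]; then
        mkdir -p $LOGDIR/$d;
    fi
done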
Space
Check Project Space
# summary
idr_quota_user
# more detail
idr_quota_project

Check Home Space
# summary
idrquota -m -t Gio
# more detail
du -h --max-depth=1 $HOME

Check Work Space
# summary
idrquota -w -t Gio
# more detail
du -h --max-depth=1 $WORK

Symbolic Links
We need to move everything to the work drive and leave symbolic links behind in $HOME; otherwise, we run out of disk space really quickly. The directories I link are listed below (a sketch of the pattern follows the list).
- workdir
- .cache
- .local
- .ipython
- .keras
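A minimal sketch of the pattern, using .cache as an example (it assumes $WORKDIR is set as in the Organization section; the same idea applies to the other directories):

# move the directory out of $HOME onto the work drive and link it back
mv ~/.cache $WORKDIR/.cache
ln -s $WORKDIR/.cache ~/.cache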