Reference
Julia clusters
Distributed.addprocs — Function
addprocs(template, ninstances[; kwargs...])
Add Azure scale set instances, where template is either a dictionary produced by the AzManagers.build_sstemplate method or a string naming a template stored in ~/.azmanagers/templates_scaleset.json.
Keyword arguments
- `subscriptionid=template["subscriptionid"]` if it exists, or `AzManagers._manifest["subscriptionid"]` otherwise.
- `resourcegroup=template["resourcegroup"]` if it exists, or `AzManagers._manifest["resourcegroup"]` otherwise.
- `sigimagename=""` The name of the SIG image.[1]
- `sigimageversion=""` The version of the `sigimagename`.[1]
- `imagename=""` The name of the image (alternative to `sigimagename` and `sigimageversion`, used for development work).
- `osdisksize=60` The size of the OS disk in GB.
- `customenv=false` If true, send the current project environment to the workers, where it will be instantiated.
- `session=AzSession(;lazy=true)` The Azure session used for authentication.
- `group="cbox"` The name of the Azure scale set. If the scale set does not yet exist, it will be created.
- `overprovision=true` Use Azure scale-set overprovisioning?
- `ppi=1` The number of Julia processes to start per Azure scale set instance.
- `julia_num_threads="$(Threads.nthreads()),$(Threads.nthreads(:interactive))"` Set the number of Julia threads for the workers.[2]
- `omp_num_threads=get(ENV, "OMP_NUM_THREADS", 1)` Set the number of OpenMP threads to run on each worker.
- `exename="$(Sys.BINDIR)/julia"` Name of the Julia executable.
- `exeflags=""` Set additional command-line start-up flags for Julia workers. For example, `--heap-size-hint=1G`.
- `env=Dict()` Each dictionary entry is an environment variable set on the worker before Julia starts, e.g. `env=Dict("OMP_PROC_BIND"=>"close")`.
- `nretry=20` Number of retries for HTTP REST calls to Azure services.
- `verbose=0` Verbose flag used in HTTP requests.
- `save_cloud_init_failures=false` Set to true to copy cloud-init logs (/var/log/cloud-init-output.log) from workers that fail to join the cluster.
- `show_quota=false` After various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging/understanding Azure quotas.
- `user=AzManagers._manifest["ssh_user"]` ssh user.
- `spot=false` Use Azure SPOT VMs for the scale set.
- `maxprice=-1` Set the maximum price per hour for a VM in the scale set. `-1` uses the market price.
- `spot_base_regular_priority_count=0` If spot is true, only start adding spot machines once this many non-spot machines have been added.
- `spot_regular_percentage_above_base` If spot is true, then when adding new machines (above `spot_base_regular_priority_count`), use regular (non-spot) priority for this percentage of new machines.
- `waitfor=false` Wait for the cluster to be provisioned before returning, or return control to the caller immediately.[3]
- `mpi_ranks_per_worker=0` Set the number of MPI ranks per Julia worker.[4]
- `mpi_flags="-bind-to core:$(ENV["OMP_NUM_THREADS"]) -map-by numa"` Extra flags to pass to mpirun (has effect when `mpi_ranks_per_worker>0`).
- `nvidia_enable_ecc=true` On NVIDIA machines, ensure that ECC is set to `true` or `false` for all GPUs.[5]
- `nvidia_enable_mig=false` On NVIDIA machines, ensure that MIG is set to `true` or `false` for all GPUs.[5]
- `hyperthreading=nothing` Turn hyperthreading on or off on supported machine sizes. The default uses the setting in the template. To override the template setting, use `true` (on) or `false` (off).
- `use_lvm=false` For SKUs that have one or more NVMe disks, combine all disks into a single mount point, /scratch, rather than /scratch, /scratch1, /scratch2, etc.
Notes
[1] If addprocs is called from an Azure VM, then the default imagename, imageversion are the image/version the VM was built with; otherwise, it is the latest version of the image specified in the scale-set template.
[2] Interactive threads are supported beginning in version 1.9 of Julia. For earlier versions, the default for julia_num_threads is Threads.nthreads().
[3] waitfor=false reflects the fact that the cluster manager is dynamic. After the call to addprocs returns, use workers() to monitor the size of the cluster.
[4] This is intended for use with Devito. In particular, it allows Devito to gain performance by using MPI for domain decomposition within a single VM. If mpi_ranks_per_worker=0, then MPI is not used on the Julia workers. This feature makes use of package extensions, so you need to ensure that `using MPI` appears somewhere in your calling script.
[5] This may result in a reboot of the VMs.
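Note [3] implies that callers who need a minimum cluster size must poll for it themselves. The following is a minimal sketch of such a poll, using only `nworkers()` from the Distributed standard library. `wait_for_workers` is a hypothetical helper and not part of AzManagers, and the template name "myscaleset" in the usage comment is a placeholder.

```julia
using Distributed

# Hypothetical helper (not part of AzManagers): poll until at least `n` workers
# have joined the cluster, or until `timeout` seconds have elapsed.
# Returns true if the target size was reached and false on timeout.
# Note that nworkers() is 1 in a fresh session with no workers added.
function wait_for_workers(n::Integer; timeout::Real=600, poll::Real=5)
    deadline = time() + timeout
    while nworkers() < n
        time() > deadline && return false
        sleep(poll)
    end
    return true
end

# Typical use, assuming a scale-set template named "myscaleset" has been saved:
#   using AzManagers
#   addprocs("myscaleset", 4)    # returns immediately because waitfor=false
#   wait_for_workers(4) || @warn "cluster is still smaller than requested"
```

Passing `waitfor=true` to addprocs is the built-in alternative; a poll like this is mainly useful when the calling script can start useful work with fewer machines than requested.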
AzManagers.preempted — Function
ispreempted, notbefore = preempted([id=myid()|id="instanceid"])
Check whether the machine id::Int has received an Azure spot preempt message. Returns (true, notbefore) if a preempt message has been received and (false, "") otherwise. notbefore is the date/time before which the machine is guaranteed to still exist.
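One way to use the returned pair is to check for a preempt message between units of work and stop cleanly when one arrives. The sketch below is written against any function that returns `(ispreempted, notbefore)`, so it runs without Azure. `run_until_preempted` and its arguments are hypothetical names introduced for illustration, and passing `AzManagers.preempted` as `check` is an assumption about how you would wire it up on a spot worker.

```julia
# Hypothetical helper (not part of AzManagers). `check` is any zero-argument
# function that returns (ispreempted, notbefore), as preempted() is documented
# to do. `step!` performs one unit of work given the step index.
# Returns the number of steps completed before a preempt message was seen.
function run_until_preempted(step!, check; maxsteps::Integer=100)
    for i in 1:maxsteps
        ispreempted, notbefore = check()
        if ispreempted
            @warn "preempt message received; machine is guaranteed until $notbefore"
            return i - 1
        end
        step!(i)
    end
    return maxsteps
end

# On an Azure spot worker one might call (assumption, requires AzManagers):
#   run_until_preempted(i -> process_chunk(i), AzManagers.preempted)
# where process_chunk is your own function.
```

Checking between steps rather than inside them keeps the work function free of Azure-specific logic, at the cost of reacting only at step boundaries.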
Detached service
AzManagers.addproc — Function
addproc(template[; name="", basename="cbox", subscriptionid="myid", resourcegroup="mygroup", nretry=10, verbose=0, session=AzSession(;lazy=true), sigimagename="", sigimageversion="", imagename="", detachedservice=true])
Create a VM and return a named tuple (name, ip, resourcegroup, subscriptionid), where name is the name of the VM and ip is its IP address. resourcegroup and subscriptionid denote where the VM resides on Azure.
Parameters
- `name=""` Name for the VM. If it is not an empty string, then the next parameter (`basename`) is ignored.
- `basename="cbox"` Base name for the VM; a random suffix is appended to ensure uniqueness.
- `subscriptionid=template["subscriptionid"]` if it exists, or `AzManagers._manifest["subscriptionid"]` otherwise.
- `resourcegroup=template["resourcegroup"]` if it exists, or `AzManagers._manifest["resourcegroup"]` otherwise.
- `session=AzSession(;lazy=true)` Session used for OAuth2 authentication.
- `sigimagename=""` Azure shared image gallery image to use for the VM (defaults to the template's image).
- `sigimageversion=""` Azure shared image gallery image version to use for the VM (defaults to latest).
- `imagename=""` Azure image name used as an alternative to `sigimagename` and `sigimageversion` (used for development work).
- `osdisksize=60` Size of the OS disk in GB.
- `customenv=false` If true, send the current project environment to the workers, where it will be instantiated.
- `nretry=10` Maximum number of retries for retryable REST call failures.
- `verbose=0` Verbosity flag passed to HTTP.jl methods.
- `show_quota=false` After various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging/understanding Azure quotas.
- `julia_num_threads="$(Threads.nthreads()),$(Threads.nthreads(:interactive))"` Set the number of Julia threads for the detached process.[1]
- `omp_num_threads=get(ENV, "OMP_NUM_THREADS", 1)` Set the `OMP_NUM_THREADS` environment variable before starting the detached process.
- `exename="$(Sys.BINDIR)/julia"` Name of the Julia executable.
- `env=Dict()` Dictionary of environment variables that will be exported before starting the detached process.
- `detachedservice=true` Start the detached service, allowing for RESTful remote code execution.
- `use_lvm=false` For SKUs that have one or more NVMe disks, combine all disks into a single mount point, /scratch, rather than /scratch, /scratch1, /scratch2, etc.
Notes
[1] Interactive threads are supported beginning in version 1.9 of Julia. For earlier versions, the default for julia_num_threads is Threads.nthreads().
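The version-dependent default described in the note can be written out explicitly. This is a sketch of the documented behavior only, not necessarily how AzManagers computes it internally; `default_julia_num_threads` is a hypothetical name.

```julia
# Hypothetical helper mirroring the documented default for julia_num_threads:
# "default,interactive" thread counts on Julia 1.9 and later, and just the
# default thread count on earlier versions.
function default_julia_num_threads()
    if VERSION >= v"1.9"
        return "$(Threads.nthreads()),$(Threads.nthreads(:interactive))"
    else
        return string(Threads.nthreads())
    end
end

println(default_julia_num_threads())
```

The resulting string has the same form that Julia's `--threads` command-line option accepts, so a value such as "4,1" requests four default threads and one interactive thread.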
AzManagers.@detachat — Macro
@detachat myvm begin ... end
Run code on an Azure VM.
Example
using AzManagers
myvm = addproc("myvm")
job = @detachat myvm begin
@info "I'm running detached"
end
read(job)
wait(job)
rmproc(myvm)
AzManagers.variablebundle — Function
variablebundle(:key)
Retrieve a variable from a variable bundle. See variablebundle! for more information.
AzManagers.variablebundle! — Function
variablebundle!(;kwargs...)
Define variables that will be passed to a detached job.
Example
using AzManagers
variablebundle!(;x=1)
myvm = addproc("myvm")
myjob = @detachat myvm begin
write(stdout, "my variable is $(variablebundle(:x))\n")
end
wait(myjob)
read(myjob)
Base.read — Function
read(job[;stdio=stdout])
Returns the stdout from a detached job.
AzManagers.rmproc — Function
rmproc(vm[; session=AzSession(;lazy=true), verbose=0, nretry=10])
Delete the VM that was created using the addproc method.
Parameters
- `session=AzSession(;lazy=true)` Azure session for OAuth2 authentication.
- `verbose=0` Verbosity flag passed to HTTP.jl methods.
- `nretry=10` Maximum number of retries for retryable REST calls.
- `show_quota=false` After various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging/understanding Azure quotas.
AzManagers.status — Function
status(job)
Returns the status of a detached job.
Base.wait — Function
wait(job[;stdio=stdout])
Blocks until the detached job, job, is complete.
Configuration
AzManagers.build_nictemplate — Function
AzManagers.build_nictemplate(nic_name; kwargs...)
Returns a dictionary for a NIC template that can be passed to the addproc method or written to the AzManagers.jl configuration files.
Required keyword arguments
- `subscriptionid` Azure subscription.
- `resourcegroup_vnet` Azure resource group that holds the virtual network that the NIC attaches to.
- `vnet` Azure virtual network for the NIC to attach to.
- `subnet` Azure sub-network name.
- `location` Location of the Azure data center that the NIC corresponds to.
Optional keyword arguments
- `accelerated=true` Use accelerated networking (not all VM sizes support accelerated networking).
AzManagers.build_sstemplate — Function
AzManagers.build_sstemplate(name; kwargs...)
Returns a dictionary that is an Azure scale-set template for use in addprocs or for saving to the ~/.azmanagers folder.
Required keyword arguments
- `subscriptionid` Azure subscription.
- `admin_username` ssh user for the scale-set virtual machines.
- `location` Azure data-center location.
- `resourcegroup` Azure resource group.
- `imagegallery` Azure image gallery that contains the VM image.
- `imagename` Azure image.
- `vnet` Azure virtual network for the scale set.
- `subnet` Azure virtual subnet for the scale set.
- `skuname` Azure VM type.
Optional keyword arguments
- `subscriptionid_image` Azure subscription corresponding to the image gallery, defaults to `subscriptionid`.
- `resourcegroup_vnet` Azure resource group corresponding to the virtual network, defaults to `resourcegroup`.
- `resourcegroup_image` Azure resource group corresponding to the image gallery, defaults to `resourcegroup`.
- `osdisksize=60` Disk size in GB for the operating system disk.
- `skutier = "Standard"` Azure SKU tier.
- `datadisks=[]` List of data disks to create and attach.[1]
- `tempdisk = "sudo mkdir -m 777 /mnt/scratch; ln -s /mnt/scratch /scratch"` cloud-init commands used to mount or link to the temporary disk.
- `tags = Dict("azure_tag_name" => "some_tag_value")` Optional tags argument for the resource.
- `encryption_at_host = false` Optional argument for enabling encryption at host.
Notes
[1] Each datadisk is a Dictionary. For example,
Dict("createOption"=>"Empty", "diskSizeGB"=>1023, "managedDisk"=>Dict("storageAccountType"=>"PremiumSSD_LRS"))
or, to accept the defaults,
Dict("diskSizeGB"=>1023)
The first example above is populated with the default options. So, if datadisks=[Dict()], then the default options will be included.
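The way a partial datadisk dictionary picks up the defaults can be pictured as a merge over the fully populated example. The sketch below only illustrates the documented behavior and is not AzManagers' actual implementation; `DATADISK_DEFAULTS` and `with_defaults` are hypothetical names, and the merge is shallow, so a user-supplied "managedDisk" entry replaces the default one as a whole.

```julia
# Defaults copied from the fully populated example in the note above.
const DATADISK_DEFAULTS = Dict{String,Any}(
    "createOption" => "Empty",
    "diskSizeGB" => 1023,
    "managedDisk" => Dict("storageAccountType" => "PremiumSSD_LRS"),
)

# Hypothetical helper: entries in `disk` override the defaults (shallow merge).
with_defaults(disk::AbstractDict) = merge(DATADISK_DEFAULTS, disk)

# An empty Dict, as in datadisks=[Dict()], yields every default.
all_defaults = with_defaults(Dict())

# Overriding only the size keeps the other default entries.
smaller = with_defaults(Dict("diskSizeGB" => 512))
```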
AzManagers.build_vmtemplate — Function
AzManagers.build_vmtemplate(vm_name; kwargs...)
Returns a dictionary for a virtual machine template that can be passed to the addproc method or written to the AzManagers.jl configuration files.
Required keyword arguments
- `subscriptionid` Azure subscription.
- `admin_username` ssh user for the virtual machine.
- `location` Azure data center location.
- `resourcegroup` Azure resource group where the VM will reside.
- `imagegallery` Azure shared image gallery name.
- `imagename` Azure image name that is in the shared image gallery.
- `vmsize` Azure VM type, e.g. "Standard_D8s_v3".
Optional keyword arguments
- `resourcegroup_vnet` Azure resource group containing the virtual network, defaults to `resourcegroup`.
- `subscriptionid_image` Azure subscription containing the image gallery, defaults to `subscriptionid`.
- `resourcegroup_image` Azure resource group containing the image gallery, defaults to `resourcegroup`.
- `nicname = "cbox-nic"` Name of the NIC for this VM.
- `osdisksize = 60` Size in GB of the OS disk.
- `datadisks=[]` Additional data disks to attach.[1]
- `tempdisk = "sudo mkdir -m 777 /mnt/scratch\nln -s /mnt/scratch /scratch"` cloud-init commands used to mount or link to the temporary disk.
- `tags = Dict("azure_tag_name" => "some_tag_value")` Optional tags argument for the resource.
- `encryption_at_host = false` Optional argument for enabling encryption at host.
- `default_nic = ""` Optional argument for inserting "default_nic" as a key.
Notes
[1] Each datadisk is a Dictionary. For example,
Dict("createOption"=>"Empty", "diskSizeGB"=>1023, "managedDisk"=>Dict("storageAccountType"=>"PremiumSSD_LRS"))
The above example is populated with the default options. So, if datadisks=[Dict()], then the default options will be included.
AzManagers.write_manifest — Function
AzManagers.write_manifest(;resourcegroup="", subscriptionid="", ssh_user="", ssh_public_key_file="~/.ssh/azmanagers_rsa.pub", ssh_private_key_file="~/.ssh/azmanagers_rsa")
Write an AzManagers manifest file (~/.azmanagers/manifest.json). The manifest file contains information specific to your Azure account.
AzManagers.save_template_nic — Function
AzManagers.save_template_nic(nic_name, template)
Save template::Dict generated by AzManagers.build_nictemplate to ~/.azmanagers/templates_nic.json.
AzManagers.save_template_scaleset — Function
AzManagers.save_template_scaleset(scalesetname, template)
Save template::Dict generated by AzManagers.build_sstemplate to ~/.azmanagers/templates_scaleset.json.
AzManagers.save_template_vm — Function
AzManagers.save_template_vm(vm_name, template)
Save template::Dict generated by AzManagers.build_vmtemplate to ~/.azmanagers/templates_vm.json.
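Taken together, the configuration functions suggest a one-time setup along the following lines. This is a sketch that uses only the functions and keyword arguments documented above. Every value is a placeholder to replace with your own Azure details, and running it requires AzManagers to be installed and writes files under ~/.azmanagers.

```julia
using AzManagers

# Placeholder values throughout; substitute your own Azure details.
AzManagers.write_manifest(;
    resourcegroup = "my-resource-group",
    subscriptionid = "00000000-0000-0000-0000-000000000000",
    ssh_user = "myuser")

# Build a scale-set template from the documented required keyword arguments.
template = AzManagers.build_sstemplate("myscaleset";
    subscriptionid = "00000000-0000-0000-0000-000000000000",
    admin_username = "myuser",
    location = "southcentralus",
    resourcegroup = "my-resource-group",
    imagegallery = "mygallery",
    imagename = "myimage",
    vnet = "my-vnet",
    subnet = "my-subnet",
    skuname = "Standard_D8s_v3")

# Store the template under the name that addprocs will later refer to.
AzManagers.save_template_scaleset("myscaleset", template)
```

Once the template is saved, `addprocs("myscaleset", n)` can refer to it by name, as described under Distributed.addprocs.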