
Facebook AI recently released Submitit to help other researchers run their experiments on a Slurm cluster. Slurm is an open-source, highly scalable job-scheduling system for clusters that is widely used in both industry and academia.

At Facebook AI Research (FAIR), researchers train neural networks on a Slurm-administered cluster with thousands of GPUs.

Submitit simplifies scheduling an experiment on the cluster and collecting its results and logs. It also lets researchers switch easily from small-scale experiments on their own machine to large-scale experiments on the cluster.

With Submitit, researchers can work in a language they are familiar with, analyze results more easily, and schedule further experiments. The open-source version of Submitit will also make it easier for researchers to release and share open-source code. If no cluster is found, Submitit automatically falls back to running experiments locally, so third parties can clone the open-source code of a FAIR paper and start running small experiments immediately.

Submitit is directly integrated into several of Facebook AI’s open-source Python projects, including Nevergrad and Hydra. Nevergrad is a derivative-free optimization platform that can be used to optimize hyperparameters of neural network training. The main method of Nevergrad optimization can take an optional Executor parameter, so that the optimization can run in parallel locally or on a cluster using concurrent.futures, submitit, or dask.distributed.
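The optional-executor pattern described above can be illustrated with a small sketch that uses only the Python standard library. This is not Nevergrad's API or its optimization algorithm: `random_search` and `loss` are hypothetical names, and plain random search stands in for a derivative-free optimizer. The sketch assumes the executor follows the `concurrent.futures` interface, meaning a `submit` method that returns a future with a `result` method.

```python
import random
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Optional, Tuple


def loss(learning_rate: float) -> float:
    # Hypothetical objective: a stand-in for a validation loss
    # that happens to be smallest at learning_rate = 0.1.
    return (learning_rate - 0.1) ** 2


def random_search(
    objective: Callable[[float], float],
    budget: int,
    executor: Optional[Executor] = None,
    seed: int = 0,
) -> Tuple[float, float]:
    """Evaluate `budget` random candidates and return (best_candidate, best_loss).

    If `executor` is given, evaluations are submitted to it and may run in
    parallel; otherwise they run one after another in the current process.
    """
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(budget)]

    if executor is None:
        losses = [objective(c) for c in candidates]
    else:
        # Submit every candidate first, then collect results in the same order.
        futures = [executor.submit(objective, c) for c in candidates]
        losses = [f.result() for f in futures]

    best = min(range(budget), key=losses.__getitem__)
    return candidates[best], losses[best]


# Serial run on the local machine.
serial_best = random_search(loss, budget=32)

# Same search, with evaluations dispatched through an executor.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_best = random_search(loss, budget=32, executor=pool)
```

Because the search only relies on `submit` and `result`, the thread pool here could in principle be replaced by any executor with that interface, which is the idea behind running the same optimization locally or on a cluster.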

Hydra is a framework for configuring complex applications. It supports sweeps over application parameters, and these sweeps can now run on Slurm.
