r"""
``torch.distributed.launch`` is a module that spawns multiple distributed
training processes on each of the training nodes.

.. warning::

    This module is going to be deprecated in favor of :ref:`torchrun <launcher-api>`.
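
    For example, a single-node launch via this module can typically be replaced
    by an equivalent ``torchrun`` command. A minimal sketch, assuming two GPUs
    and a training script that reads ``LOCAL_RANK`` from the environment::

        torchrun --nproc-per-node=2 YOUR_TRAINING_SCRIPT.py --arg1 --arg2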

The utility can be used for single-node distributed training, in which one or
more processes per node will be spawned. The utility can be used for either
CPU training or GPU training. If the utility is used for GPU training,
each distributed process will be operating on a single GPU. This can yield a
significant improvement in single-node training performance. It can also be
used in multi-node distributed training, by spawning multiple processes on
each node, to improve multi-node distributed training performance as well.
This will especially be beneficial for systems with multiple InfiniBand
interfaces that have direct-GPU support, since all of them can be utilized for
aggregated communication bandwidth.

In both cases of single-node or multi-node distributed training, this utility
will launch the given number of processes per node (``--nproc-per-node``). If
used for GPU training, this number needs to be less than or equal to the number
of GPUs on the current system (``nproc_per_node``), and each process will
operate on a single GPU, from *GPU 0 to GPU (nproc_per_node - 1)*.

**How to use this module:**

1. Single-Node multi-process distributed training

::

    python -m torch.distributed.launch --nproc-per-node=NUM_GPUS_YOU_HAVE
               YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
               arguments of your training script)

2. Multi-Node multi-process distributed training (e.g. two nodes)

Node 1: *(IP: 192.168.1.1, and has a free port: 1234)*

::

    python -m torch.distributed.launch --nproc-per-node=NUM_GPUS_YOU_HAVE
               --nnodes=2 --node-rank=0 --master-addr="192.168.1.1"
               --master-port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
               and all other arguments of your training script)

Node 2:

::

    python -m torch.distributed.launch --nproc-per-node=NUM_GPUS_YOU_HAVE
               --nnodes=2 --node-rank=1 --master-addr="192.168.1.1"
               --master-port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
               and all other arguments of your training script)

3. To look up what optional arguments this module offers:

::

    python -m torch.distributed.launch --help

**Important Notices:**

1. This utility and multi-process distributed (single-node or multi-node) GPU
training currently achieves the best performance only with the NCCL distributed
backend. Thus, the NCCL backend is the recommended backend to use for GPU
training.
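
As a minimal sketch (not part of this utility), the backend can be selected
based on whether CUDA is available, falling back to Gloo for CPU-only training::

    >>> # xdoctest: +SKIP
    >>> import torch
    >>> backend = "nccl" if torch.cuda.is_available() else "gloo"
    >>> torch.distributed.init_process_group(backend=backend, init_method="env://")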

2. In your training program, you must parse the command-line argument
``--local-rank=LOCAL_PROCESS_RANK``, which will be provided by this module.
If your training program uses GPUs, you should ensure that your code only
runs on the GPU device of LOCAL_PROCESS_RANK. This can be done by:

Parsing the ``local_rank`` argument

::

    >>> # xdoctest: +SKIP
    >>> import argparse
    >>> import torch
    >>> parser = argparse.ArgumentParser()
    >>> parser.add_argument("--local-rank", type=int)
    >>> args = parser.parse_args()

Setting your device to the local rank, using either

::

    >>> torch.cuda.set_device(args.local_rank)  # before your code runs

or

::

    >>> with torch.cuda.device(args.local_rank):
    >>>     # your code to run
    >>>     ...

3. In your training program, you are supposed to call the following function
at the beginning to start the distributed backend. It is strongly recommended
that ``init_method=env://``. Other init methods (e.g. ``tcp://``) may work,
but ``env://`` is the one that is officially supported by this module.

::

    >>> torch.distributed.init_process_group(backend='YOUR BACKEND',
    >>>                                      init_method='env://')

4. In your training program, you can either use regular distributed functions
or use the :func:`torch.nn.parallel.DistributedDataParallel` module. If your
training program uses GPUs for training and you would like to use the
:func:`torch.nn.parallel.DistributedDataParallel` module,
here is how to configure it.

::

    >>> model = torch.nn.parallel.DistributedDataParallel(model,
    >>>                                                    device_ids=[args.local_rank],
    >>>                                                    output_device=args.local_rank)

Please ensure that the ``device_ids`` argument is set to the only GPU device id
on which your code will operate. This is generally the local rank of the
process. In other words, ``device_ids`` needs to be ``[args.local_rank]``,
and ``output_device`` needs to be ``args.local_rank`` in order to use this
utility.
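
Putting notices 2-4 together, a minimal sketch of a training script intended to
be launched by this module might look like the following (the model here is a
placeholder, not something this utility provides)::

    >>> # xdoctest: +SKIP
    >>> import argparse
    >>> import torch
    >>>
    >>> parser = argparse.ArgumentParser()
    >>> parser.add_argument("--local-rank", type=int, default=0)
    >>> args = parser.parse_args()
    >>>
    >>> # bind this process to its GPU and join the process group
    >>> torch.cuda.set_device(args.local_rank)
    >>> torch.distributed.init_process_group(backend="nccl", init_method="env://")
    >>>
    >>> model = torch.nn.Linear(10, 10).cuda(args.local_rank)  # placeholder model
    >>> model = torch.nn.parallel.DistributedDataParallel(
    >>>     model, device_ids=[args.local_rank], output_device=args.local_rank
    >>> )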

5. Another way to pass ``local_rank`` to the subprocesses is via the environment
variable ``LOCAL_RANK``. This behavior is enabled when you launch the script
with ``--use-env``. You must adjust the subprocess example above to replace
``args.local_rank`` with ``os.environ['LOCAL_RANK']``; the launcher
will not pass ``--local-rank`` when you specify this flag.
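
For example, a minimal sketch of reading the local rank from the environment
instead of from the command line::

    >>> # xdoctest: +SKIP
    >>> import os
    >>> import torch
    >>> local_rank = int(os.environ["LOCAL_RANK"])
    >>> torch.cuda.set_device(local_rank)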

.. warning::

    ``local_rank`` is NOT globally unique: it is only unique per process
    on a machine. Thus, don't use it to decide if you should, e.g.,
    write to a networked filesystem. See
    https://github.com/pytorch/pytorch/issues/12042 for an example of
    how things can go wrong if you don't do this correctly.
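
    A minimal sketch of guarding such writes with the global rank instead
    (assuming the process group has already been initialized)::

        >>> # xdoctest: +SKIP
        >>> if torch.distributed.get_rank() == 0:
        >>>     # only the globally-first process writes the checkpoint
        >>>     torch.save(model.state_dict(), "checkpoint.pt")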
"""

import logging
import warnings

from torch.distributed.run import get_args_parser, run

logger = logging.getLogger(__name__)


def parse_args(args):
    parser = get_args_parser()
    parser.add_argument(
        "--use-env",
        "--use_env",
        default=False,
        action="store_true",
        help="Use environment variable to pass "
        "'local rank'. For legacy reasons, the default value is False. "
        "If set to True, the script will not pass "
        "--local-rank as an argument, and will instead set LOCAL_RANK.",
    )
    return parser.parse_args(args)


def launch(args):
    if args.no_python and not args.use_env:
        raise ValueError(
            "When using the '--no-python' flag,"
            " you must also set the '--use-env' flag."
        )
    run(args)


def main(args=None):
    warnings.warn(
        "The module torch.distributed.launch is deprecated\n"
        "and will be removed in a future release. Use torchrun.\n"
        "Note that --use-env is set by default in torchrun.\n"
        "If your script expects the `--local-rank` argument to be set, please\n"
        "change it to read from `os.environ['LOCAL_RANK']` instead. See\n"
        "https://pytorch.org/docs/stable/distributed.html#launch-utility for\n"
        "further instructions.\n",
        FutureWarning,
    )
    args = parse_args(args)
    launch(args)


if __name__ == "__main__":
    main()