.. _transforms:

Transforming and augmenting images
==================================

.. currentmodule:: torchvision.transforms

Torchvision supports common computer vision transformations in the
``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
can be used to transform or augment data for training or inference of different
tasks (image classification, detection, segmentation, video classification).

.. code:: python

    # Image Classification
    import torch
    from torchvision.transforms import v2

    H, W = 32, 32
    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

    transforms = v2.Compose([
        v2.RandomResizedCrop(size=(224, 224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    img = transforms(img)

.. code:: python

    # Detection (re-using imports and transforms from above)
    from torchvision import tv_tensors

    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
    boxes = torch.randint(0, H // 2, size=(3, 4))
    boxes[:, 2:] += boxes[:, :2]
    boxes = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H, W))

    # The same transforms can be used!
    img, boxes = transforms(img, boxes)
    # And you can pass arbitrary input structures
    output_dict = transforms({"image": img, "boxes": boxes})

Transforms are typically passed as the ``transform`` or ``transforms`` argument
to the :ref:`Datasets <datasets>`.
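
For instance, here is a minimal sketch that plugs a v2 pipeline into
:class:`~torchvision.datasets.FakeData`, chosen only because it generates random
PIL images on the fly, so nothing needs to be downloaded. The pipeline itself is
illustrative; any dataset that accepts a ``transform`` argument is used the same way.

.. code:: python

    import torch
    from torchvision import datasets
    from torchvision.transforms import v2

    transforms = v2.Compose([
        v2.ToImage(),  # PIL image -> tv_tensors.Image with dtype uint8
        v2.RandomHorizontalFlip(p=0.5),
        v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float32 [0, 1]
    ])

    # FakeData yields (PIL image, int label) pairs and applies `transform` to the image
    dataset = datasets.FakeData(size=4, image_size=(3, 32, 32), num_classes=10, transform=transforms)

    img, label = dataset[0]
    print(type(img).__name__, tuple(img.shape), img.dtype, label)
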

Start here
----------

Whether you're new to Torchvision transforms, or you're already experienced with
them, we encourage you to start with
:ref:`` in
order to learn more about what can be done with the new v2 transforms.

Then, browse the sections below on this page for general information and
performance tips. The available transforms and functionals are listed in the
:ref:`API reference <v2_api_ref>`.

More information and tutorials can also be found in our :ref:`example gallery
<gallery>`, e.g. :ref:``
or :ref:``.

.. _conventions:

Supported input types and conventions
-------------------------------------

Most transformations accept both `PIL <>`_ images
and tensor inputs. Both CPU and CUDA tensors are supported.
The results of both backends (PIL or Tensors) should be very
close. In general, we recommend relying on the tensor backend :ref:`for
performance <transforms_perf>`. The :ref:`conversion transforms
<conversion_transforms>` may be used to convert to and from PIL images, or for
converting dtypes and ranges.

Tensor images are expected to be of shape ``(C, H, W)``, where ``C`` is the
number of channels, and ``H`` and ``W`` refer to height and width. Most
transforms support batched tensor input. A batch of Tensor images is a tensor of
shape ``(N, C, H, W)``, where ``N`` is the number of images in the batch. The
:ref:`v2 <v1_or_v2>` transforms generally accept an arbitrary number of leading
dimensions ``(..., C, H, W)`` and can handle batched images or batched videos.
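
As a quick sketch (the shapes below are arbitrary), a single v2
:class:`~torchvision.transforms.v2.Resize` instance can be applied to a single
image, a batch of images, and a batch of videos, because only the trailing
``(H, W)`` dimensions are resized while the leading dimensions are kept:

.. code:: python

    import torch
    from torchvision.transforms import v2

    resize = v2.Resize(size=(16, 16), antialias=True)

    single = torch.randint(0, 256, size=(3, 32, 32), dtype=torch.uint8)        # (C, H, W)
    batch = torch.randint(0, 256, size=(8, 3, 32, 32), dtype=torch.uint8)      # (N, C, H, W)
    videos = torch.randint(0, 256, size=(2, 4, 3, 32, 32), dtype=torch.uint8)  # (N, T, C, H, W)

    for x in (single, batch, videos):
        # Only H and W change; every leading dimension is preserved
        print(tuple(x.shape), "->", tuple(resize(x).shape))
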

.. _range_and_dtype:

Dtype and expected value range
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1]``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype. Typically, images of dtype
``torch.uint8`` are expected to have values in ``[0, 255]``.

Use :class:`~torchvision.transforms.v2.ToDtype` to convert both the dtype and
range of the inputs.
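
The sketch below shows the effect of the ``scale`` parameter of
:class:`~torchvision.transforms.v2.ToDtype` on a tiny ``uint8`` tensor (the
pixel values are made up for the example): with ``scale=True`` values are mapped
between ``[0, 255]`` and ``[0, 1]``, while the default ``scale=False`` only
changes the dtype.

.. code:: python

    import torch
    from torchvision.transforms import v2

    img_uint8 = torch.tensor([[[0, 51, 255]]], dtype=torch.uint8)  # shape (1, 1, 3)

    # scale=True: uint8 values in [0, 255] become float32 values in [0, 1]
    img_float = v2.ToDtype(torch.float32, scale=True)(img_uint8)

    # scale=False (the default): only the dtype changes, values stay in [0, 255]
    img_cast = v2.ToDtype(torch.float32)(img_uint8)

    # scale=True in the other direction: float32 [0, 1] back to uint8 [0, 255]
    img_back = v2.ToDtype(torch.uint8, scale=True)(img_float)

    print(img_float, img_cast, img_back, sep="\n")
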

.. _v1_or_v2:

V1 or V2? Which one should I use?
---------------------------------

**TL;DR** We recommend using the ``torchvision.transforms.v2`` transforms
instead of those in ``torchvision.transforms``. They're faster and they can do
more things. Just change the import and you should be good to go.

In Torchvision 0.15 (March 2023), we released a new set of transforms available
in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
advantages compared to the v1 ones (in ``torchvision.transforms``):

- They can transform images **but also** bounding boxes, masks, or videos. This
  provides support for tasks beyond image classification: detection, segmentation,
  video classification, etc. See
  :ref:``
  and :ref:``.
- They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
  and :class:`~torchvision.transforms.v2.MixUp`. See
  :ref:``.
- They're :ref:`faster <transforms_perf>`.
- They support arbitrary input structures (dicts, lists, tuples, etc.).
- Future improvements and features will be added to the v2 transforms only.
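
A minimal sketch of :class:`~torchvision.transforms.v2.CutMix` on an
already-batched classification input (random data; 10 classes chosen
arbitrarily). CutMix expects a batch of images together with a 1D tensor of
integer labels, and returns mixed images along with soft labels:

.. code:: python

    import torch
    from torchvision.transforms import v2

    NUM_CLASSES = 10
    images = torch.rand(4, 3, 32, 32)                  # a batch of float images (N, C, H, W)
    labels = torch.randint(0, NUM_CLASSES, size=(4,))  # integer class indices, shape (N,)

    cutmix = v2.CutMix(num_classes=NUM_CLASSES)
    mixed_images, mixed_labels = cutmix(images, labels)

    # Labels become (N, NUM_CLASSES): each row blends two one-hot labels
    print(mixed_images.shape, mixed_labels.shape)
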

These transforms are **fully backward compatible** with the v1 ones, so if
you're already using transforms from ``torchvision.transforms``, all you need to
do is update the import to ``torchvision.transforms.v2``. In terms of
output, there might be negligible differences due to implementation differences.
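
To illustrate the import change, the sketch below builds the same pipeline from
both namespaces and compares the outputs on a random ``uint8`` tensor. It
deliberately uses ``RandomHorizontalFlip(p=1.0)`` so the flip is always applied,
and sticks to exact operations (cropping and flipping); transforms that
interpolate, such as resizing, are where small numerical differences can appear.

.. code:: python

    import torch
    from torchvision import transforms as v1
    from torchvision.transforms import v2

    img = torch.randint(0, 256, size=(3, 32, 32), dtype=torch.uint8)

    # Same pipeline written against each namespace: only the prefix changes
    pipeline_v1 = v1.Compose([v1.CenterCrop(16), v1.RandomHorizontalFlip(p=1.0)])
    pipeline_v2 = v2.Compose([v2.CenterCrop(16), v2.RandomHorizontalFlip(p=1.0)])

    out_v1 = pipeline_v1(img)
    out_v2 = pipeline_v2(img)
    print(torch.equal(out_v1, out_v2))
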

.. note::

    The v2 transforms are still BETA, but at this point we do not expect
    disruptive changes to be made to their public APIs. We're planning to make
    them fully stable in version 0.17. Please submit any feedback you may have
    `here <>`_.

.. _transforms_perf:

Performance considerations
--------------------------

We recommend the following guidelines to get the best performance out of the
transforms:

- Rely on the v2 transforms from ``torchvision.transforms.v2``
- Use tensors instead of PIL images
- Use ``torch.uint8`` dtype, especially for resizing
- Resize with bilinear or bicubic mode

This is what a typical transform pipeline could look like:

.. code:: python

    import torch
    from torchvision.transforms import v2

    transforms = v2.Compose([
        v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
        v2.ToDtype(torch.uint8, scale=True),  # optional, most inputs are already uint8 at this point
        # ...
        v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
        # ...
        v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

The above should give you the best performance in a typical training environment
that relies on the :class:`` with ``num_workers > 0``.

Transforms tend to be sensitive to the input strides / memory format. Some
transforms will be faster with channels-first images while others prefer
channels-last. Like ``torch`` operators, most transforms will preserve the
memory format of the input, but this may not always be respected due to
implementation details. You may want to experiment a bit if you're chasing the
very best performance. Using :func:`torch.compile` on individual transforms may
also help factor out the memory format variable (e.g. on
:class:`~torchvision.transforms.v2.Normalize`). Note that we're talking about
**memory format**, not :ref:`tensor shape <conventions>`.

Note that resize transforms like :class:`~torchvision.transforms.v2.Resize`
and :class:`~torchvision.transforms.v2.RandomResizedCrop` typically prefer
channels-last input and tend **not** to benefit from :func:`torch.compile` at
this time.
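
One way to run such an experiment is a small timing loop with
:mod:`torch.utils.benchmark`, sketched below. The batch size, image sizes and
choice of transforms are arbitrary, and the timings depend entirely on your
hardware and library versions. For an elementwise transform like
:class:`~torchvision.transforms.v2.Normalize`, switching the memory format
changes only the strides, not the output values.

.. code:: python

    import torch
    import torch.utils.benchmark as benchmark
    from torchvision.transforms import v2

    normalize = v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    resize = v2.Resize(size=(112, 112), antialias=True)

    batch = torch.rand(8, 3, 224, 224)  # float input, as Normalize expects
    batch_cl = batch.contiguous(memory_format=torch.channels_last)  # same values, different strides


    def mean_time_ms(transform, inpt, number=10):
        """Average wall-clock time of ``transform(inpt)``, in milliseconds."""
        timer = benchmark.Timer(stmt="transform(inpt)", globals={"transform": transform, "inpt": inpt})
        return timer.timeit(number).mean * 1e3


    for name, inpt in [("channels-first", batch), ("channels-last", batch_cl)]:
        for transform in (normalize, resize):
            print(f"{type(transform).__name__:<10} {name:<15} {mean_time_ms(transform, inpt):.2f} ms")

    # Whether the output keeps the channels-last layout is not guaranteed
    print(normalize(batch_cl).is_contiguous(memory_format=torch.channels_last))
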

.. _functional_transforms:

Transform classes, functionals, and kernels
-------------------------------------------

Transforms are available as classes like
:class:`~torchvision.transforms.v2.Resize`, but also as functionals like
:func:`~torchvision.transforms.v2.functional.resize` in the
``torchvision.transforms.v2.functional`` namespace.
This is very much like the :mod:`torch.nn` package which defines both classes
and functional equivalents in :mod:`torch.nn.functional`.

The functionals support PIL images, pure tensors, or :ref:`TVTensors
<tv_tensors>`, e.g. both ``resize(image_tensor)`` and ``resize(boxes)`` are
valid.
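
For example, here is a minimal sketch calling the same functional on a plain
image tensor and on :class:`~torchvision.tv_tensors.BoundingBoxes` (the box
coordinates are made up). The functional picks the right implementation based
on the input type, and the boxes keep their metadata, with ``canvas_size``
updated to the new size:

.. code:: python

    import torch
    from torchvision import tv_tensors
    from torchvision.transforms.v2 import functional as F

    image_tensor = torch.randint(0, 256, size=(3, 32, 32), dtype=torch.uint8)
    boxes = tv_tensors.BoundingBoxes([[0, 0, 16, 16]], format="XYXY", canvas_size=(32, 32))

    resized_image = F.resize(image_tensor, size=[64, 64], antialias=True)  # pixels are resampled
    resized_boxes = F.resize(boxes, size=[64, 64])  # coordinates are rescaled

    print(resized_image.shape, resized_boxes, resized_boxes.canvas_size)
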

.. note::

    Random transforms like :class:`~torchvision.transforms.v2.RandomCrop` will
    randomly sample some parameter each time they're called. Their functional
    counterpart (:func:`~torchvision.transforms.v2.functional.crop`) does not do
    any kind of random sampling and thus has a slightly different
    parametrization. The ``get_params()`` method of the transform class
    can be used to perform parameter sampling when using the functional APIs.
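
A minimal sketch of that pattern: sample the crop parameters once with
``get_params()``, then apply them with
:func:`~torchvision.transforms.v2.functional.crop` to both an image and a mask
so they are cropped identically. The mask is derived from the image here only so
the two results can be compared:

.. code:: python

    import torch
    from torchvision import tv_tensors
    from torchvision.transforms import v2
    from torchvision.transforms.v2 import functional as F

    img = torch.randint(0, 256, size=(3, 32, 32), dtype=torch.uint8)
    mask = tv_tensors.Mask((img[0:1] > 127).to(torch.uint8))  # (1, H, W), derived from the image

    # Sample the random crop parameters once...
    top, left, height, width = v2.RandomCrop.get_params(img, output_size=(16, 16))

    # ...then apply exactly the same crop to both inputs with the functional
    img_crop = F.crop(img, top=top, left=left, height=height, width=width)
    mask_crop = F.crop(mask, top=top, left=left, height=height, width=width)
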

The ``torchvision.transforms.v2.functional`` namespace also contains what we
call the "kernels". These are the low-level functions that implement the
core functionalities for specific types, e.g. ``resize_bounding_boxes`` or
``resized_crop_mask``. They are public, although not documented. Check the
`code
<>`_
to see which ones are available (note that those starting with a leading
underscore are **not** public!). Kernels are only really useful if you want
:ref:`torchscript support <transforms_torchscript>` for types like bounding
boxes or masks.
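
As a sketch, the ``resize_bounding_boxes`` kernel can be called directly on a
plain tensor. Because a plain tensor carries no metadata, the canvas size is
passed in explicitly and the new canvas size is returned alongside the boxes.
Since kernels are undocumented, treat the exact keyword names used here as an
assumption to check against the source of your installed version:

.. code:: python

    import torch
    from torchvision.transforms.v2 import functional as F

    boxes = torch.tensor([[0, 0, 16, 16]])  # plain tensor in XYXY format, no BoundingBoxes metadata

    # Assumed signature: takes canvas_size explicitly, returns (resized_boxes, new_canvas_size)
    resized_boxes, new_canvas_size = F.resize_bounding_boxes(boxes, canvas_size=(32, 32), size=[64, 64])
    print(resized_boxes, new_canvas_size)
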

.. _transforms_torchscript:

Torchscript support
-------------------

Most transform classes and functionals support torchscript. For composing
transforms, use :class:`torch.nn.Sequential` instead of
:class:`~torchvision.transforms.v2.Compose`:

.. code:: python

    import torch
    # v1 classes; scripting a v2 class also yields its v1 equivalent (see below)
    from torchvision.transforms import CenterCrop, Normalize

    transforms = torch.nn.Sequential(
        CenterCrop(10),
        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(transforms)

.. warning::

    v2 transforms support torchscript, but if you call ``torch.jit.script()`` on
    a v2 **class** transform, you'll actually end up with its (scripted) v1
    equivalent. This may lead to slightly different results between the
    scripted and eager executions due to implementation differences between v1
    and v2.

    If you really need torchscript support for the v2 transforms, we recommend
    scripting the **functionals** from the
    ``torchvision.transforms.v2.functional`` namespace to avoid surprises.

Also note that the functionals only support torchscript for pure tensors, which
are always treated as images. If you need torchscript support for other types
like bounding boxes or masks, you can rely on the :ref:`low-level kernels
<functional_transforms>`.
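
A minimal sketch of both recommendations: script the
:func:`~torchvision.transforms.v2.functional.resize` functional for pure tensor
images, and script the ``resize_bounding_boxes`` kernel for boxes held in a
plain tensor. The sizes and coordinates are arbitrary, and the kernel's keyword
names are assumed from the current source:

.. code:: python

    import torch
    from torchvision.transforms.v2 import functional as F

    # Script the functional (not the Resize class) and the bounding box kernel
    scripted_resize = torch.jit.script(F.resize)
    scripted_resize_boxes = torch.jit.script(F.resize_bounding_boxes)

    img = torch.randint(0, 256, size=(3, 32, 32), dtype=torch.uint8)  # pure tensor: treated as an image
    boxes = torch.tensor([[0, 0, 16, 16]])  # plain tensor in XYXY format

    out_img = scripted_resize(img, size=[64, 64], antialias=True)
    out_boxes, out_canvas_size = scripted_resize_boxes(boxes, canvas_size=(32, 32), size=[64, 64])
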

For any custom transformations to be used with ``torch.jit.script``, they should
be derived from ``torch.nn.Module``.
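
For instance, a custom transform written as a ``torch.nn.Module`` with a
type-annotated ``forward()`` can be scripted together with built-in transforms.
``AddGaussianNoise`` below is a hypothetical example written for illustration,
not a torchvision transform:

.. code:: python

    import torch
    from torchvision import transforms as v1


    class AddGaussianNoise(torch.nn.Module):
        """Hypothetical custom transform: adds Gaussian noise to a float image in [0, 1]."""

        def __init__(self, std: float = 0.05):
            super().__init__()
            self.std = std

        def forward(self, img: torch.Tensor) -> torch.Tensor:
            # Add noise, then clamp back to the expected [0, 1] float range
            return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)


    transforms = torch.nn.Sequential(
        AddGaussianNoise(std=0.05),
        v1.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # v1 class, scriptable as-is
    )
    scripted = torch.jit.script(transforms)
    out = scripted(torch.rand(3, 32, 32))
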

See also: :ref:``.

.. _v2_api_ref:

V2 API reference - Recommended
------------------------------

Geometry
^^^^^^^^

Resizing
""""""""
