.. _transforms:

Transforming and augmenting images
==================================

.. currentmodule:: torchvision.transforms

Torchvision supports common computer vision transformations in the
``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
can be used to transform or augment data for training or inference of different
tasks (image classification, detection, segmentation, video classification).

.. code:: python

    # Image Classification
    import torch
    from torchvision.transforms import v2

    H, W = 32, 32
    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

    transforms = v2.Compose([
        v2.RandomResizedCrop(size=(224, 224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    img = transforms(img)

.. code:: python

    # Detection (re-using imports and transforms from above)
    from torchvision import tv_tensors

    img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
    boxes = torch.randint(0, H // 2, size=(3, 4))
    boxes[:, 2:] += boxes[:, :2]
    boxes = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H, W))

    # The same transforms can be used!
    img, boxes = transforms(img, boxes)
    # And you can pass arbitrary input structures
    output_dict = transforms({"image": img, "boxes": boxes})

Transforms are typically passed as the ``transform`` or ``transforms`` argument
to the :ref:`Datasets <datasets>`.

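
As a minimal sketch of passing a pipeline to a dataset, the snippet below uses
``torchvision.datasets.FakeData``, which generates random PIL images, so nothing
needs to be downloaded. The dataset size, image size, and the name ``preprocess``
are arbitrary choices made for this illustration.

.. code:: python

    import torch
    from torchvision import datasets
    from torchvision.transforms import v2

    preprocess = v2.Compose([
        v2.ToImage(),                           # PIL image -> tensor image
        v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float32 [0, 1]
    ])

    # The pipeline is applied to each image when a sample is fetched.
    dataset = datasets.FakeData(
        size=4, image_size=(3, 32, 32), num_classes=10, transform=preprocess
    )

    img, target = dataset[0]
    print(img.shape, img.dtype, target)
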
Start here
----------

Whether you're new to Torchvision transforms, or you're already experienced with
them, we encourage you to start with
:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py` in
order to learn more about what can be done with the new v2 transforms.

Then, browse the sections below on this page for general information and
performance tips. The available transforms and functionals are listed in the
:ref:`API reference <v2_api_ref>`.

More information and tutorials can also be found in our :ref:`example gallery
<gallery>`, e.g. :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`
or :ref:`sphx_glr_auto_examples_transforms_plot_custom_transforms.py`.

.. _conventions:

Supported input types and conventions
-------------------------------------

Most transformations accept both `PIL <https://pillow.readthedocs.io>`_ images
and tensor inputs. Both CPU and CUDA tensors are supported.
The results of the two backends (PIL or tensors) should be very
close. In general, we recommend relying on the tensor backend :ref:`for
performance <transforms_perf>`. The :ref:`conversion transforms
<conversion_transforms>` may be used to convert to and from PIL images, or for
converting dtypes and ranges.

Tensor images are expected to be of shape ``(C, H, W)``, where ``C`` is the
number of channels, and ``H`` and ``W`` refer to height and width. Most
transforms support batched tensor input. A batch of tensor images is a tensor of
shape ``(N, C, H, W)``, where ``N`` is the number of images in the batch. The
:ref:`v2 <v1_or_v2>` transforms generally accept an arbitrary number of leading
dimensions ``(..., C, H, W)`` and can handle batched images or batched videos.

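
The shape conventions above can be illustrated with a short sketch. The tensor
sizes and the choice of :class:`~torchvision.transforms.v2.Resize` are
arbitrary; the point is only that the last three dimensions are treated as
``(C, H, W)`` and any leading dimensions are carried through.

.. code:: python

    import torch
    from torchvision.transforms import v2

    resize = v2.Resize(size=(16, 16), antialias=True)

    image = torch.randint(0, 256, size=(3, 32, 32), dtype=torch.uint8)         # (C, H, W)
    batch = torch.randint(0, 256, size=(4, 3, 32, 32), dtype=torch.uint8)      # (N, C, H, W)
    videos = torch.randint(0, 256, size=(2, 5, 3, 32, 32), dtype=torch.uint8)  # (N, T, C, H, W)

    # Only the trailing (H, W) dimensions change.
    print(resize(image).shape)
    print(resize(batch).shape)
    print(resize(videos).shape)
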
.. _range_and_dtype:

Dtype and expected value range
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1]``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype. Typically, images of dtype
``torch.uint8`` are expected to have values in ``[0, 255]``.

Use :class:`~torchvision.transforms.v2.ToDtype` to convert both the dtype and
range of the inputs.

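
A small sketch of the difference between converting the dtype alone and
converting the dtype together with the value range. The three-pixel input is
made up for illustration.

.. code:: python

    import torch
    from torchvision.transforms import v2

    img_uint8 = torch.tensor([[[0, 128, 255]]], dtype=torch.uint8)  # shape (1, 1, 3)

    # scale=True also maps the range: [0, 255] -> [0, 1]
    scaled = v2.ToDtype(torch.float32, scale=True)(img_uint8)

    # Without scale=True only the dtype changes and the values stay in [0, 255],
    # which is outside the range expected for float images.
    unscaled = v2.ToDtype(torch.float32)(img_uint8)

    print(scaled)
    print(unscaled)
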
.. _v1_or_v2:

V1 or V2? Which one should I use?
---------------------------------

**TL;DR** We recommend using the ``torchvision.transforms.v2`` transforms
instead of those in ``torchvision.transforms``. They're faster and they can do
more things. Just change the import and you should be good to go.

In Torchvision 0.15 (March 2023), we released a new set of transforms available
in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
advantages compared to the v1 ones (in ``torchvision.transforms``):

- They can transform images **but also** bounding boxes, masks, or videos. This
  provides support for tasks beyond image classification: detection, segmentation,
  video classification, etc. See
  :ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
  and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
- They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
  and :class:`~torchvision.transforms.v2.MixUp`. See
  :ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py`.
- They're :ref:`faster <transforms_perf>`.
- They support arbitrary input structures (dicts, lists, tuples, etc.).
- Future improvements and features will be added to the v2 transforms only.

These transforms are **fully backward compatible** with the v1 ones, so if
you're already using transforms from ``torchvision.transforms``, all you need to
do is update the import to ``torchvision.transforms.v2``. In terms of
output, there might be negligible differences due to implementation differences.

.. note::

  The v2 transforms are still BETA, but at this point we do not expect
  disruptive changes to be made to their public APIs. We're planning to make
  them fully stable in version 0.17. Please submit any feedback you may have
  `here <https://github.com/pytorch/vision/issues/6753>`_.

.. _transforms_perf:

Performance considerations
--------------------------

We recommend the following guidelines to get the best performance out of the
transforms:

- Rely on the v2 transforms from ``torchvision.transforms.v2``
- Use tensors instead of PIL images
- Use ``torch.uint8`` dtype, especially for resizing
- Resize with bilinear or bicubic mode

This is what a typical transform pipeline could look like:

.. code:: python

    import torch
    from torchvision.transforms import v2

    transforms = v2.Compose([
        v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
        v2.ToDtype(torch.uint8, scale=True),  # optional, most inputs are already uint8 at this point
        # ...
        v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
        # ...
        v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

The above should give you the best performance in a typical training environment
that relies on the :class:`torch.utils.data.DataLoader` with ``num_workers >
0``.

Transforms tend to be sensitive to the input strides / memory format. Some
transforms will be faster with channels-first images while others prefer
channels-last. Like ``torch`` operators, most transforms will preserve the
memory format of the input, but this may not always be respected due to
implementation details. You may want to experiment a bit if you're chasing the
very best performance. Using :func:`torch.compile` on individual transforms may
also help factor out the memory format variable (e.g. on
:class:`~torchvision.transforms.v2.Normalize`). Note that we're talking about
**memory format**, not :ref:`tensor shape <conventions>`.

Note that resize transforms like :class:`~torchvision.transforms.v2.Resize`
and :class:`~torchvision.transforms.v2.RandomResizedCrop` typically prefer
channels-last input and tend **not** to benefit from :func:`torch.compile` at
this time.

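
To make the distinction between memory format and tensor shape concrete, here
is a small sketch using plain ``torch`` only. The batch size is arbitrary, and
the snippet says nothing about which layout is faster for a given transform;
that still has to be measured.

.. code:: python

    import torch

    # (N, C, H, W) tensor in the default contiguous (channels-first) layout
    batch = torch.rand(8, 3, 224, 224)

    # Same logical tensor, stored channels-last in memory
    batch_cl = batch.contiguous(memory_format=torch.channels_last)

    # The shape and the values are identical...
    assert batch_cl.shape == batch.shape
    assert torch.equal(batch_cl, batch)

    # ...only the strides, i.e. the physical layout, differ.
    print(batch.stride())     # (150528, 50176, 224, 1)
    print(batch_cl.stride())  # (150528, 1, 672, 3)
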
.. _functional_transforms:

Transform classes, functionals, and kernels
-------------------------------------------

Transforms are available as classes like
:class:`~torchvision.transforms.v2.Resize`, but also as functionals like
:func:`~torchvision.transforms.v2.functional.resize` in the
``torchvision.transforms.v2.functional`` namespace.
This is very much like the :mod:`torch.nn` package which defines both classes
and functional equivalents in :mod:`torch.nn.functional`.

The functionals support PIL images, pure tensors, or :ref:`TVTensors
<tv_tensors>`, e.g. both ``resize(image_tensor)`` and ``resize(boxes)`` are
valid.

.. note::

  Random transforms like :class:`~torchvision.transforms.v2.RandomCrop` will
  randomly sample some parameter each time they're called. Their functional
  counterpart (:func:`~torchvision.transforms.v2.functional.crop`) does not do
  any kind of random sampling and thus has a slightly different
  parametrization. The ``get_params()`` class method of the transforms class
  can be used to perform parameter sampling when using the functional APIs.

The ``torchvision.transforms.v2.functional`` namespace also contains what we
call the "kernels". These are the low-level functions that implement the
core functionalities for specific types, e.g. ``resize_bounding_boxes`` or
``resized_crop_mask``. They are public, although not documented. Check the
`code
<https://github.com/pytorch/vision/blob/main/torchvision/transforms/v2/functional/__init__.py>`_
to see which ones are available (note that those starting with a leading
underscore are **not** public!). Kernels are only really useful if you want
:ref:`torchscript support <transforms_torchscript>` for types like bounding
boxes or masks.

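
The parameter sampling described in the note above can be sketched as follows.
The image and mask are random tensors made up for this example; the pattern is
to sample the crop parameters once with ``get_params()`` and then apply the
same crop to several inputs through the functional.

.. code:: python

    import torch
    from torchvision.transforms import v2
    from torchvision.transforms.v2 import functional as F

    img = torch.randint(0, 256, size=(3, 64, 64), dtype=torch.uint8)
    mask = torch.randint(0, 2, size=(1, 64, 64), dtype=torch.uint8)

    # Sample the random crop location once...
    top, left, height, width = v2.RandomCrop.get_params(img, output_size=(32, 32))

    # ...then apply exactly the same crop to both inputs.
    img_crop = F.crop(img, top, left, height, width)
    mask_crop = F.crop(mask, top, left, height, width)

    print(img_crop.shape, mask_crop.shape)
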
.. _transforms_torchscript:

Torchscript support
-------------------

Most transform classes and functionals support torchscript. For composing
transforms, use :class:`torch.nn.Sequential` instead of
:class:`~torchvision.transforms.v2.Compose`:

.. code:: python

    import torch
    from torchvision.transforms import CenterCrop, Normalize

    transforms = torch.nn.Sequential(
        CenterCrop(10),
        Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(transforms)

.. warning::

  v2 transforms support torchscript, but if you call ``torch.jit.script()`` on
  a v2 **class** transform, you'll actually end up with its (scripted) v1
  equivalent. This may lead to slightly different results between the
  scripted and eager executions due to implementation differences between v1
  and v2.

  If you really need torchscript support for the v2 transforms, we recommend
  scripting the **functionals** from the
  ``torchvision.transforms.v2.functional`` namespace to avoid surprises.

Also note that the functionals only support torchscript for pure tensors, which
are always treated as images. If you need torchscript support for other types
like bounding boxes or masks, you can rely on the :ref:`low-level kernels
<functional_transforms>`.

For any custom transformations to be used with ``torch.jit.script``, they should
be derived from ``torch.nn.Module``.

See also: :ref:`sphx_glr_auto_examples_others_plot_scripted_tensor_transforms.py`.

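
As a rough sketch of a custom transformation that can be scripted, the module
below derives from ``torch.nn.Module`` and uses only tensor operations.
``ScaleBrightness`` is a hypothetical class written for this example, not part
of Torchvision.

.. code:: python

    import torch
    from torch import Tensor, nn


    class ScaleBrightness(nn.Module):
        """Hypothetical transform: multiply a float image by a constant factor."""

        def __init__(self, factor: float):
            super().__init__()
            self.factor = factor

        def forward(self, img: Tensor) -> Tensor:
            # Float images are expected to stay in [0, 1], so clamp after scaling.
            return (img * self.factor).clamp(0.0, 1.0)


    pipeline = nn.Sequential(ScaleBrightness(1.5), ScaleBrightness(0.5))
    scripted_pipeline = torch.jit.script(pipeline)

    img = torch.rand(3, 8, 8)
    # Eager and scripted execution should agree.
    torch.testing.assert_close(scripted_pipeline(img), pipeline(img))
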
.. _v2_api_ref:

V2 API reference - Recommended
------------------------------

Geometry
^^^^^^^^

Resizing
""""""""

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.Resize
    v2.ScaleJitter
    v2.RandomShortestSize
    v2.RandomResize

Functionals

.. autosummary::
    :toctree: generated/
    :template: function.rst

    v2.functional.resize

Cropping
""""""""

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.RandomCrop
    v2.RandomResizedCrop
    v2.RandomIoUCrop
    v2.CenterCrop
    v2.FiveCrop
    v2.TenCrop

Functionals

.. autosummary::
    :toctree: generated/
    :template: function.rst

    v2.functional.crop
    v2.functional.resized_crop
    v2.functional.ten_crop
    v2.functional.center_crop
    v2.functional.five_crop

Others
""""""

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.RandomHorizontalFlip
    v2.RandomVerticalFlip
    v2.Pad
    v2.RandomZoomOut
    v2.RandomRotation
    v2.RandomAffine
    v2.RandomPerspective
    v2.ElasticTransform

Functionals

.. autosummary::
    :toctree: generated/
    :template: function.rst

    v2.functional.horizontal_flip
    v2.functional.vertical_flip
    v2.functional.pad
    v2.functional.rotate
    v2.functional.affine
    v2.functional.perspective
    v2.functional.elastic

Color
^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.ColorJitter
    v2.RandomChannelPermutation
    v2.RandomPhotometricDistort
    v2.Grayscale
    v2.RandomGrayscale
    v2.GaussianBlur
    v2.RandomInvert
    v2.RandomPosterize
    v2.RandomSolarize
    v2.RandomAdjustSharpness
    v2.RandomAutocontrast
    v2.RandomEqualize

Functionals

.. autosummary::
    :toctree: generated/
    :template: function.rst

    v2.functional.permute_channels
    v2.functional.rgb_to_grayscale
    v2.functional.to_grayscale
    v2.functional.gaussian_blur
    v2.functional.invert
    v2.functional.posterize
    v2.functional.solarize
    v2.functional.adjust_sharpness
    v2.functional.autocontrast
    v2.functional.adjust_contrast
    v2.functional.equalize
    v2.functional.adjust_brightness
    v2.functional.adjust_saturation
    v2.functional.adjust_hue
    v2.functional.adjust_gamma

Composition
^^^^^^^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.Compose
    v2.RandomApply
    v2.RandomChoice
    v2.RandomOrder

Miscellaneous
^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.LinearTransformation
    v2.Normalize
    v2.RandomErasing
    v2.Lambda
    v2.SanitizeBoundingBoxes
    v2.ClampBoundingBoxes
    v2.UniformTemporalSubsample

Functionals

.. autosummary::
    :toctree: generated/
    :template: function.rst

    v2.functional.normalize
    v2.functional.erase
    v2.functional.clamp_bounding_boxes
    v2.functional.uniform_temporal_subsample

.. _conversion_transforms:

Conversion
^^^^^^^^^^

.. note::

    Beware, some of these conversion transforms below will scale the values
    while performing the conversion, while some may not do any scaling. By
    scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
    255] range into [0, 1] (and vice-versa). See :ref:`range_and_dtype`.

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.ToImage
    v2.ToPureTensor
    v2.PILToTensor
    v2.ToPILImage
    v2.ToDtype
    v2.ConvertBoundingBoxFormat

Functionals

.. autosummary::
    :toctree: generated/
    :template: functional.rst

    v2.functional.to_image
    v2.functional.pil_to_tensor
    v2.functional.to_pil_image
    v2.functional.to_dtype
    v2.functional.convert_bounding_box_format

Deprecated

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.ToTensor
    v2.functional.to_tensor
    v2.ConvertImageDtype
    v2.functional.convert_image_dtype

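
The scaling caveat in the note above can be checked on a tiny input. This is
only a sketch: the 4x4 solid-colour PIL image is made up for illustration. It
shows a conversion that leaves the values untouched followed by one that
rescales them.

.. code:: python

    import torch
    from PIL import Image
    from torchvision.transforms import v2

    pil_img = Image.new("RGB", (4, 4), color=(255, 128, 0))

    # ToImage converts the container only: uint8 values stay in [0, 255].
    as_image = v2.ToImage()(pil_img)

    # ToDtype with scale=True converts the dtype and maps the range to [0, 1].
    as_float = v2.ToDtype(torch.float32, scale=True)(as_image)

    print(as_image.dtype, as_image.max())
    print(as_float.dtype, as_float.max())
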
Auto-Augmentation
^^^^^^^^^^^^^^^^^

`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.AutoAugment
    v2.RandAugment
    v2.TrivialAugmentWide
    v2.AugMix

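
A minimal sketch of mixing an auto-augmentation policy with other transforms.
The input is a random ``uint8`` tensor, and the surrounding transforms are
arbitrary choices for this example; ``AutoAugmentPolicy`` is imported from the
v1 namespace, where it is listed in the V1 API reference.

.. code:: python

    import torch
    from torchvision.transforms import AutoAugmentPolicy, v2

    img = torch.randint(0, 256, size=(3, 64, 64), dtype=torch.uint8)

    augment = v2.Compose([
        v2.RandomHorizontalFlip(p=0.5),
        v2.AutoAugment(policy=AutoAugmentPolicy.IMAGENET),  # policy learned on ImageNet
        v2.ToDtype(torch.float32, scale=True),
    ])

    out = augment(img)
    print(out.shape, out.dtype)
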
CutMix - MixUp
^^^^^^^^^^^^^^

CutMix and MixUp are special transforms that
are meant to be used on batches rather than on individual images, because they
combine pairs of images together. These can be used after the dataloader
(once the samples are batched), or as part of a collation function. See
:ref:`sphx_glr_auto_examples_transforms_plot_cutmix_mixup.py` for detailed usage examples.

.. autosummary::
    :toctree: generated/
    :template: class.rst

    v2.CutMix
    v2.MixUp

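
The collation-function usage mentioned above can be sketched as follows. The
number of classes, the random samples, and the name ``collate_fn`` are
assumptions made for this illustration; in practice ``collate_fn`` would be
passed to a :class:`torch.utils.data.DataLoader`.

.. code:: python

    import torch
    from torch.utils.data import default_collate
    from torchvision.transforms import v2

    NUM_CLASSES = 10

    cutmix = v2.CutMix(num_classes=NUM_CLASSES)
    mixup = v2.MixUp(num_classes=NUM_CLASSES)
    cutmix_or_mixup = v2.RandomChoice([cutmix, mixup])


    def collate_fn(batch):
        # Batch the samples first, then mix pairs of images within the batch.
        return cutmix_or_mixup(*default_collate(batch))


    # Four (image, label) samples, as a dataset would return them.
    samples = [(torch.rand(3, 32, 32), i % NUM_CLASSES) for i in range(4)]
    images, labels = collate_fn(samples)

    # Labels are no longer class indices but one soft label vector per image.
    print(images.shape, labels.shape)
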
Developer tools
^^^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated/
    :template: function.rst

    v2.functional.register_kernel

V1 API Reference
----------------

Geometry
^^^^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    Resize
    RandomCrop
    RandomResizedCrop
    CenterCrop
    FiveCrop
    TenCrop
    Pad
    RandomRotation
    RandomAffine
    RandomPerspective
    ElasticTransform
    RandomHorizontalFlip
    RandomVerticalFlip

Color
^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    ColorJitter
    Grayscale
    RandomGrayscale
    GaussianBlur
    RandomInvert
    RandomPosterize
    RandomSolarize
    RandomAdjustSharpness
    RandomAutocontrast
    RandomEqualize

Composition
^^^^^^^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    Compose
    RandomApply
    RandomChoice
    RandomOrder

Miscellaneous
^^^^^^^^^^^^^

.. autosummary::
    :toctree: generated/
    :template: class.rst

    LinearTransformation
    Normalize
    RandomErasing
    Lambda

Conversion
^^^^^^^^^^

.. note::

    Beware, some of these conversion transforms below will scale the values
    while performing the conversion, while some may not do any scaling. By
    scaling, we mean e.g. that a ``uint8`` -> ``float32`` would map the [0,
    255] range into [0, 1] (and vice-versa). See :ref:`range_and_dtype`.

.. autosummary::
    :toctree: generated/
    :template: class.rst

    ToPILImage
    ToTensor
    PILToTensor
    ConvertImageDtype

Auto-Augmentation
^^^^^^^^^^^^^^^^^

`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common Data Augmentation technique that can improve the accuracy of Image Classification models.
Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
The new transform can be used standalone or mixed-and-matched with existing transforms:

.. autosummary::
    :toctree: generated/
    :template: class.rst

    AutoAugmentPolicy
    AutoAugment
    RandAugment
    TrivialAugmentWide
    AugMix

Functional Transforms
^^^^^^^^^^^^^^^^^^^^^

.. currentmodule:: torchvision.transforms.functional

.. autosummary::
    :toctree: generated/
    :template: function.rst

    adjust_brightness
    adjust_contrast
    adjust_gamma
    adjust_hue
    adjust_saturation
    adjust_sharpness
    affine
    autocontrast
    center_crop
    convert_image_dtype
    crop
    equalize
    erase
    five_crop
    gaussian_blur
    get_dimensions
    get_image_num_channels
    get_image_size
    hflip
    invert
    normalize
    pad
    perspective
    pil_to_tensor
    posterize
    resize
    resized_crop
    rgb_to_grayscale
    rotate
    solarize
    ten_crop
    to_grayscale
    to_pil_image
    to_tensor
    vflip