.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-inverse_function_theorem:

==========================================
Using Inverse & Implicit Function Theorems
==========================================

Until now we have considered methods for computing derivatives that
work directly on the function being differentiated. However, this is
not always possible. For example, the function may only be computable
via an iterative algorithm, or there may be no explicit definition of
the function available. In this section we will see how we can use two
basic results from calculus to get around these difficulties.

Inverse Function Theorem
========================

Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
but evaluating :math:`f(x)` is not easy. Say it involves running an
iterative algorithm. You could try automatically differentiating the
iterative algorithm, but even if that is possible, it can become quite
expensive.

In some cases we get lucky, and computing the inverse of :math:`f(x)`
is an easy operation. In these cases, we can use the `Inverse Function
Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
compute the derivative exactly. Here is the key idea:

Assuming that :math:`y = f(x)` is continuously differentiable in a
neighborhood of a point :math:`x` and that :math:`Df(x)`, the Jacobian
of :math:`f` at :math:`x`, is invertible, applying the chain rule to
the identity :math:`f^{-1}(f(x)) = x` gives
:math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
:math:`f`, or equivalently :math:`Df(x) = (Df^{-1}(y))^{-1}`.

For example, let :math:`f(x) = e^x`. Now of course we know that
:math:`Df(x) = e^x`, but let's try and compute it via the Inverse
Function Theorem. For :math:`x > 0`, we have :math:`f^{-1}(y) = \log
y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, and therefore :math:`Df(x) =
(Df^{-1}(y))^{-1} = y = e^x`.
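
The following is a minimal sketch of this calculation as code, just to
make the identity concrete; it uses nothing but the standard library
and verifies that inverting the derivative of :math:`\log y`
reproduces :math:`e^x`.

.. code-block:: c++

   #include <cmath>
   #include <cstdio>

   int main() {
     const double x = 1.5;
     const double y = std::exp(x);    // y = f(x)
     const double df_inv = 1.0 / y;   // Df^{-1}(y) = 1/y
     const double df = 1.0 / df_inv;  // Df(x) = (Df^{-1}(y))^{-1}
     std::printf("Df(x) = %.12f, e^x = %.12f\n", df, std::exp(x));
     return 0;
   }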

You may be wondering why the above is true. A smoothly differentiable
function in a small neighborhood is well approximated by a linear
function. Indeed this is a good way to think about the Jacobian: it is
the matrix that best approximates the function linearly. Once you do
that, it is straightforward to see that *locally* :math:`f^{-1}(y)` is
best approximated linearly by the inverse of the Jacobian of
:math:`f(x)`.

Let us now consider a more practical example.

Geodetic Coordinate System Conversion
-------------------------------------

When working with data related to the Earth, one can use two different
coordinate systems: the familiar Latitude-Longitude-Altitude (LLA)
coordinate system of (latitude, longitude, height) tuples, or the `ECEF
<http://en.wikipedia.org/wiki/ECEF>`_ coordinate system. The former is
familiar but not terribly convenient analytically. The latter is a
Cartesian system, but not particularly intuitive. So systems that
process Earth related data have to go back and forth between these
coordinate systems.

The conversion between the LLA and the ECEF coordinate systems requires
a model of the Earth, the most commonly used one being `WGS84
<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
Going from the geodetic :math:`(\phi, \lambda, h)` to the ECEF
:math:`(X, Y, Z)` coordinates is easy.

.. math::

   \chi &= \sqrt{1 - e^2 \sin^2 \phi} \\
      X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda \\
      Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda \\
      Z &= \left( \frac{a(1 - e^2)}{\chi} + h \right) \sin \phi

Here :math:`a` and :math:`e^2` are constants defined by `WGS84
<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
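
As a concrete sketch of these formulas, here is what the forward
conversion might look like; the function name ``LLAToECEF`` is
hypothetical, and we assume :math:`\phi` and :math:`\lambda` are in
radians, :math:`h` is in meters, and :math:`a` and :math:`e^2` take
their WGS84 values.

.. code-block:: c++

   #include <cmath>
   #include <Eigen/Core>

   Eigen::Vector3d LLAToECEF(const Eigen::Vector3d& lla) {
     const double a = 6378137.0;          // WGS84 semi-major axis (meters).
     const double e2 = 6.69437999014e-3;  // WGS84 first eccentricity squared.
     const double phi = lla[0], lambda = lla[1], h = lla[2];
     const double chi = std::sqrt(1.0 - e2 * std::sin(phi) * std::sin(phi));
     return Eigen::Vector3d(
         (a / chi + h) * std::cos(phi) * std::cos(lambda),
         (a / chi + h) * std::cos(phi) * std::sin(lambda),
         (a * (1.0 - e2) / chi + h) * std::sin(phi));
   }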

Going from ECEF to LLA coordinates requires an iterative algorithm. So
to compute the derivative of this transformation we invoke the Inverse
Function Theorem as follows:

.. code-block:: c++

   Eigen::Vector3d ecef; // Fill some values

   // Iterative computation.
   Eigen::Vector3d lla = ECEFToLLA(ecef);

   // Analytic derivatives of the (easy) map LLA -> ECEF.
   Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
   bool invertible;
   Eigen::Matrix3d ecef_to_lla_jacobian;
   lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
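
``LLAToECEFJacobian`` above is left undefined; a minimal sketch of it,
obtained by differentiating the :math:`X, Y, Z` formulas with respect
to :math:`(\phi, \lambda, h)` under the same assumptions as the
``LLAToECEF`` sketch, might look like this:

.. code-block:: c++

   #include <cmath>
   #include <Eigen/Core>

   Eigen::Matrix3d LLAToECEFJacobian(const Eigen::Vector3d& lla) {
     const double a = 6378137.0;          // WGS84 semi-major axis (meters).
     const double e2 = 6.69437999014e-3;  // WGS84 first eccentricity squared.
     const double phi = lla[0], lambda = lla[1], h = lla[2];
     const double sp = std::sin(phi), cp = std::cos(phi);
     const double sl = std::sin(lambda), cl = std::cos(lambda);
     const double chi = std::sqrt(1.0 - e2 * sp * sp);
     const double n = a / chi + h;
     // d(a/chi)/dphi = a e^2 sin(phi) cos(phi) / chi^3.
     const double dn = a * e2 * sp * cp / (chi * chi * chi);
     Eigen::Matrix3d jacobian;
     // Columns: d/dphi, d/dlambda, d/dh. Rows: X, Y, Z.
     jacobian << dn * cp * cl - n * sp * cl, -n * cp * sl, cp * cl,
                 dn * cp * sl - n * sp * sl,  n * cp * cl, cp * sl,
                 (1.0 - e2) * dn * sp + (a * (1.0 - e2) / chi + h) * cp,
                 0.0, sp;
     return jacobian;
   }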

Implicit Function Theorem
=========================

Consider now the problem where we have two variables :math:`x \in
\mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
:math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
such that :math:`F(x,y) = 0`, and we wish to calculate the Jacobian of
:math:`y` with respect to :math:`x`. How do we do this?

If for a given value of :math:`(x,y)` the partial Jacobian
:math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
<https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
that there exists a neighborhood of :math:`x` and a function :math:`G`
such that :math:`y = G(x)` in this neighborhood. Differentiating
:math:`F(x,G(x)) = 0` gives us

.. math::

   D_1F(x,y) + D_2F(x,y)DG(x) &= 0 \\
                        DG(x) &= -(D_2F(x,y))^{-1} D_1F(x,y) \\
                        Dy(x) &= -(D_2F(x,y))^{-1} D_1F(x,y)

This means that we can compute the derivative of :math:`y` with
respect to :math:`x` by multiplying the Jacobian of :math:`F` w.r.t
:math:`x` by the negative of the inverse of the Jacobian of :math:`F`
w.r.t :math:`y`.
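
As a quick sanity check, take :math:`F(x, y) = x^2 + y^2 - 1` with
:math:`x, y \in \mathbb{R}`, i.e., the unit circle. Then

.. math::

   D_1F(x,y) &= 2x \\
   D_2F(x,y) &= 2y \\
       Dy(x) &= -\frac{x}{y},

which for :math:`y > 0` matches what we get by differentiating
:math:`y = \sqrt{1 - x^2}` directly, but without ever solving for
:math:`y`.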

Let's now consider two more substantial examples.

Roots of a Polynomial
---------------------

The first example we consider is a classic. Let :math:`p(x) = a_0 +
a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and we wish
to compute the derivative of its roots with respect to its
coefficients. There is no closed form formula for computing the roots
of a general degree :math:`n` polynomial; `Galois
<https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ and `Abel
<https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ proved that. There
are numerical algorithms, like computing the eigenvalues of the
`Companion Matrix
<https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
differentiating an eigenvalue solver does not seem like fun. But the
Implicit Function Theorem offers us a simple path.

If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
a_0 + a_1 x + \dots + a_n x^n = 0`. So,

.. math::

   D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n] \\
   D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x) \\
        Dx(\mathbf{a}) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]
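
As a sketch, here is this formula as code; the function name and its
calling convention are ours, and we assume :math:`x` is a simple root,
so that :math:`Dp(x) \neq 0`.

.. code-block:: c++

   #include <Eigen/Core>

   // Given coefficients a[0..n] of p(x) and a root x of p, returns
   // Dx(a), the derivative of the root w.r.t. each coefficient.
   Eigen::VectorXd RootJacobian(const Eigen::VectorXd& a, double x) {
     const int n = static_cast<int>(a.size()) - 1;  // Polynomial degree.
     // Dp(x) = sum_{k=1}^{n} k a_k x^{k-1}.
     double dp = 0.0;
     double x_power = 1.0;  // Holds x^{k-1}.
     for (int k = 1; k <= n; ++k) {
       dp += k * a[k] * x_power;
       x_power *= x;
     }
     // Dx(a) = -[1, x, x^2, ..., x^n] / Dp(x).
     Eigen::VectorXd jacobian(n + 1);
     x_power = 1.0;  // Now holds x^i.
     for (int i = 0; i <= n; ++i) {
       jacobian[i] = -x_power / dp;
       x_power *= x;
     }
     return jacobian;
   }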

Differentiating the Solution to an Optimization Problem
-------------------------------------------------------

Sometimes we are required to solve optimization problems inside
optimization problems, and this requires computing the derivative of
the optimal solution (or a fixed point) of an optimization problem
w.r.t its parameters.

Let :math:`\theta \in \mathbb{R}^m` be a vector, :math:`A(\theta) \in
\mathbb{R}^{k\times n}` be a matrix whose entries are a function of
:math:`\theta` with :math:`k \ge n`, and let :math:`b \in \mathbb{R}^k`
be a constant vector. Consider the linear least squares problem:

.. math::

   x^* = \arg \min_x \|A(\theta) x - b\|_2^2

How do we compute :math:`D_\theta x^*(\theta)`?

One approach would be to observe that :math:`x^*(\theta) =
(A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
this w.r.t :math:`\theta`. But this would require differentiating
through the inverse of the matrix
:math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
the Implicit Function Theorem instead.

The first step is to observe that :math:`x^*` satisfies the so-called
*normal equations*:

.. math::

   A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0

We will compute :math:`D_\theta x^*` column-wise, treating
:math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
of :math:`\theta` at a time. So using the normal equations, let's
define :math:`F(\theta_i, x^*) = A^\top(\theta_i)A(\theta_i)x^* -
A^\top(\theta_i)b = 0`, using which we can now compute:

.. math::

   D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top A x^* + A^\top D_{\theta_i}A x^* - D_{\theta_i} A^\top b = g_i \\
   D_2F(\theta_i, x^*) &= A^\top A \\
        Dx^*(\theta_i) &= -(A^\top A)^{-1} g_i \\
          Dx^*(\theta) &= -(A^\top A)^{-1} \left[g_1, \dots, g_m\right]

Observe that to compute :math:`Dx^*(\theta)` we only need the inverse
of :math:`A^\top A`, which we needed anyway to compute :math:`x^*`.
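
Here is a sketch of this computation using Eigen; the function name,
its signature, and the use of an LDLT factorization (rather than an
explicit inverse) are our choices, not a fixed API. The factorization
of :math:`A^\top A` is computed once and reused for :math:`x^*` and
for every column of the Jacobian.

.. code-block:: c++

   #include <vector>
   #include <Eigen/Cholesky>
   #include <Eigen/Core>

   // Given A(theta), b, and the per-coordinate derivatives
   // dA/dtheta_i, computes x^* and returns Dx^*(theta).
   Eigen::MatrixXd SolveAndDifferentiate(
       const Eigen::MatrixXd& A,
       const Eigen::VectorXd& b,
       const std::vector<Eigen::MatrixXd>& dA_dtheta,
       Eigen::VectorXd* x_star) {
     const Eigen::MatrixXd AtA = A.transpose() * A;
     const Eigen::LDLT<Eigen::MatrixXd> ldlt(AtA);
     // x^* solves the normal equations A^T A x = A^T b.
     *x_star = ldlt.solve(A.transpose() * b);
     const int m = static_cast<int>(dA_dtheta.size());
     Eigen::MatrixXd jacobian(A.cols(), m);
     for (int i = 0; i < m; ++i) {
       const Eigen::MatrixXd& dA = dA_dtheta[i];
       // g_i = D_theta_i A^T A x^* + A^T D_theta_i A x^* - D_theta_i A^T b.
       const Eigen::VectorXd g = dA.transpose() * (A * (*x_star)) +
                                 A.transpose() * (dA * (*x_star)) -
                                 dA.transpose() * b;
       // Column i of Dx^*(theta) = -(A^T A)^{-1} g_i, reusing the
       // factorization computed above.
       jacobian.col(i) = -ldlt.solve(g);
     }
     return jacobian;
   }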