.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================

We will now consider automatic differentiation. It is a technique that
can compute exact derivatives, fast, while requiring about the same
effort from the user as is needed to use numerical differentiation.

Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.
.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* parameters, T* residuals) const {
      const T b1 = parameters[0];
      const T b2 = parameters[1];
      const T b3 = parameters[2];
      const T b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

   private:
    const double x_;
    const double y_;
  };

  CostFunction* cost_function =
      new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
          new Rat43CostFunctor(x, y));
Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of the ``operator()``.

In the case of numeric differentiation it was

.. code-block:: c++

  bool operator()(const double* parameters, double* residuals) const;

and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

  template <typename T>
  bool operator()(const T* parameters, T* residuals) const;
So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.

==========================  =========
CostFunction                Time (ns)
==========================  =========
Rat43Analytic                     255
Rat43AnalyticOptimized             92
Rat43NumericDiffForward           262
Rat43NumericDiffCentral           517
Rat43NumericDiffRidders          3760
Rat43AutomaticDiff                129
==========================  =========
We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort that is required
to write the code for numeric differentiation, while being only
:math:`40\%` slower than hand-optimized analytical derivatives.

So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.
Dual Numbers & Jets
===================

.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.

Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.
Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.

For example, consider the function

.. math::

   f(x) = x^2 .

Then,

.. math::

   \begin{align}
   f(10 + \epsilon) &= (10 + \epsilon)^2\\
         &= 100 + 20 \epsilon + \epsilon^2\\
         &= 100 + 20 \epsilon
   \end{align}

Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed this generalizes to functions which are not
polynomial. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series

.. math::

   \begin{align}
   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon
   \end{align}

Here we are using the fact that :math:`\epsilon^2 = 0`.
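To see this work for a non-polynomial function, take :math:`f(x) =
e^x`. Expanding :math:`e^{a + \epsilon}` and dropping all terms in
:math:`\epsilon^2` and higher gives

.. math::

   f(a + \epsilon) = e^a e^{\epsilon} = e^a \left(1 + \epsilon +
   \frac{\epsilon^2}{2} + \cdots \right) = e^a + e^a \epsilon,

so the coefficient of :math:`\epsilon` is exactly :math:`Df(a) = e^a`.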
A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,

.. math::

   x = a + \sum_j v_{j} \epsilon_j

The summation notation gets tedious, so we will also just write

.. math::

   x = a + \mathbf{v}.

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that:

.. math::

   f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.

Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::

   f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i

So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then, the above expression would simplify to

.. math::

   f(x_1,..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i

and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
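For example, with :math:`n = 2`, consider :math:`f(x_1, x_2) = x_1
x_2` evaluated on :math:`x_1 = a_1 + \epsilon_1` and :math:`x_2 = a_2
+ \epsilon_2`:

.. math::

   f(x_1, x_2) = (a_1 + \epsilon_1)(a_2 + \epsilon_2) = a_1 a_2 + a_2
   \epsilon_1 + a_1 \epsilon_2,

since :math:`\epsilon_1 \epsilon_2 = 0`. The coefficients of
:math:`\epsilon_1` and :math:`\epsilon_2` are precisely the partial
derivatives :math:`D_1 f = a_2` and :math:`D_2 f = a_1`.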
Implementing Jets
-----------------

In order for the above to work in practice, we will need the ability
to evaluate an arbitrary function :math:`f` not just on real numbers
but also on dual numbers, but one does not usually evaluate functions
by evaluating their Taylor expansions.

This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.
.. code-block:: c++

  #include <cmath>

  #include "Eigen/Core"

  template<int N> struct Jet {
    double a;
    Eigen::Matrix<double, 1, N> v;

    // Constructors so that the operators below can build Jets. The
    // default constructor represents the real number zero.
    Jet() : a(0.0) { v.setZero(); }
    Jet(double a, const Eigen::Matrix<double, 1, N>& v) : a(a), v(v) {}
  };

  template<int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a + g.a, f.v + g.v);
  }

  template<int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a - g.a, f.v - g.v);
  }

  // The product rule: (fg)' = f'g + fg'.
  template<int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
  }

  // The quotient rule: (f/g)' = f'/g - fg'/g^2.
  template<int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
  }

  template <int N> Jet<N> exp(const Jet<N>& f) {
    return Jet<N>(std::exp(f.a), std::exp(f.a) * f.v);
  }

  // This is a simple implementation for illustration purposes, the
  // actual implementation of pow requires careful handling of a number
  // of corner cases.
  template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(std::pow(f.a, g.a),
                  g.a * std::pow(f.a, g.a - 1.0) * f.v +
                  std::pow(f.a, g.a) * std::log(f.a) * g.v);
  }

  // A complete implementation also needs mixed double/Jet overloads of
  // these operators; they are omitted here for brevity.
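As a quick sanity check, we can reproduce the dual number computation
from earlier with this ``Jet``: evaluating :math:`f(x) = x^2` at
:math:`x = 10` on a one-dimensional Jet recovers :math:`Df(10) = 20`
in the infinitesimal part. This snippet is a minimal sketch using only
the operators defined above.

.. code-block:: c++

  // x = 10 + epsilon, i.e., a = 10 and v = [1].
  Eigen::Matrix<double, 1, 1> dx;
  dx << 1.0;
  Jet<1> x(10.0, dx);

  Jet<1> y = x * x;
  // y.a == 100.0 and y.v[0] == 20.0: the value and the derivative of
  // x^2 at x = 10.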
With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:

.. code-block:: c++

  #include <memory>

  class Rat43Automatic : public ceres::SizedCostFunction<1, 4> {
   public:
    Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43Automatic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      // Just evaluate the residuals if Jacobians are not required.
      if (!jacobians) return (*functor_)(parameters[0], residuals);

      // Initialize the Jets: jets[i] = parameters[0][i] + epsilon_i.
      Jet<4> jets[4];
      for (int i = 0; i < 4; ++i) {
        jets[i].a = parameters[0][i];
        jets[i].v.setZero();
        jets[i].v[i] = 1.0;
      }

      Jet<4> result;
      (*functor_)(jets, &result);

      // Copy the values out of the Jet.
      residuals[0] = result.a;
      for (int i = 0; i < 4; ++i) {
        jacobians[0][i] = result.v[i];
      }
      return true;
    }

   private:
    std::unique_ptr<const Rat43CostFunctor> functor_;
  };
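Here is a small, hypothetical driver showing how this cost function
might be exercised; the data point and parameter values are made up
for illustration.

.. code-block:: c++

  // Evaluate the residual and Jacobian at an arbitrary point.
  Rat43Automatic cost_function(new Rat43CostFunctor(1.0, 16.1));

  double parameters[4] = {700.0, 5.0, 0.75, 1.3};
  double* parameter_blocks[1] = {parameters};
  double residuals[1];
  double jacobian[4];
  double* jacobian_blocks[1] = {jacobian};

  cost_function.Evaluate(parameter_blocks, residuals, jacobian_blocks);
  // residuals[0] now holds the residual and jacobian[i] the derivative
  // of the residual with respect to the i'th parameter.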
Indeed, this is essentially how :class:`AutoDiffCostFunction` works.

Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:
.. code-block:: c++

  struct Functor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
      return true;
    }
  };
Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the Jacobian:

.. math::

   y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1 y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2 y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}

we find that it is an indeterminate form at :math:`x_0 = 0, x_1 = 0`.
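We can see this concretely with the simplified ``Jet`` from the
previous section, extended with a ``sqrt`` overload (an addition made
for this sketch): at the origin the derivative of the square root
divides zero by zero, and the Jacobian is poisoned with NaNs.

.. code-block:: c++

  // d/dx sqrt(x) = 1 / (2 sqrt(x)), which is undefined at x = 0.
  template <int N> Jet<N> sqrt(const Jet<N>& f) {
    return Jet<N>(std::sqrt(f.a), f.v / (2.0 * std::sqrt(f.a)));
  }

  // x_0 = 0 + epsilon_0, x_1 = 0 + epsilon_1.
  Jet<2> x0, x1;
  x0.v[0] = 1.0;
  x1.v[1] = 1.0;

  Jet<2> norm = sqrt(x0 * x0 + x1 * x1);
  // norm.a == 0.0, but norm.v == [0, 0] / 0 == [NaN, NaN], so the
  // Jacobian of the residual is NaN at the origin.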
  220. There is no single solution to this problem. In some cases one needs
  221. to reason explicitly about the points where indeterminacy may occur
  222. and use alternate expressions using `L'Hôpital's rule
  223. <https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
  224. example some of the conversion routines in `rotation.h
  225. <https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_. In
  226. other cases, one may need to regularize the expressions to eliminate
  227. these points.
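As one illustration of the first approach (a sketch, not a prescribed
fix), the functor above could branch on the squared norm and return
the limiting residual at the origin. This assumes, as the full
``ceres::Jet`` in ``jet.h`` provides, a conversion from double via
``T(...)`` and comparison operators that act on the real part:

.. code-block:: c++

  struct GuardedFunctor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      const T squared_norm = x[0] * x[0] + x[1] * x[1];
      if (squared_norm > T(0.0)) {
        residual[0] = T(1.0) - sqrt(squared_norm);
      } else {
        // The limiting value of the residual at the origin. The
        // infinitesimal part of the Jet passes through as zero, which
        // is one (arbitrary but finite) choice for the Jacobian there.
        residual[0] = T(1.0);
      }
      return true;
    }
  };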