.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-analytical_derivatives:

====================
Analytic Derivatives
====================

Consider the problem of fitting the following curve (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
data:

.. math::
  y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,\dots,n`,
determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
fit this data.

This can be stated as the problem of finding the values of
:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
objective function [#f1]_:

.. math::
 \begin{align}
 E(b_1, b_2, b_3, b_4)
 &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
 &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
 \end{align}

To solve this problem using Ceres Solver, we need to define a
:class:`CostFunction` that computes the residual :math:`f` for a given
:math:`x` and :math:`y` and its derivatives with respect to
:math:`b_1, b_2, b_3` and :math:`b_4`.

Using elementary differential calculus, we can see that:

.. math::
 \begin{align}
 D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
 D_2 f(b_1, b_2, b_3, b_4; x,y) &=
 \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
 D_3 f(b_1, b_2, b_3, b_4; x,y) &=
 \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
 D_4 f(b_1, b_2, b_3, b_4; x,y) & = \frac{b_1 \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
 \end{align}

With these derivatives in hand, we can now implement the
:class:`CostFunction` as:

.. code-block:: c++

  class Rat43Analytic : public SizedCostFunction<1,4> {
   public:
    Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
    virtual ~Rat43Analytic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      const double b1 = parameters[0][0];
      const double b2 = parameters[0][1];
      const double b3 = parameters[0][2];
      const double b4 = parameters[0][3];

      residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
      jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
      jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
      jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
      return true;
    }

   private:
    const double x_;
    const double y_;
  };

This is tedious code, hard to read, and full of redundancy, so in
practice we will cache some sub-expressions to improve its efficiency,
which would give us something like:

.. code-block:: c++

  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
   public:
    Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
    virtual ~Rat43AnalyticOptimized() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      const double b1 = parameters[0][0];
      const double b2 = parameters[0][1];
      const double b3 = parameters[0][2];
      const double b4 = parameters[0][3];

      const double t1 = exp(b2 - b3 * x_);
      const double t2 = 1 + t1;
      const double t3 = pow(t2, -1.0 / b4);
      residuals[0] = b1 * t3 - y_;

      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double t4 = pow(t2, -1.0 / b4 - 1);
      jacobian[0] = t3;
      jacobian[1] = -b1 * t1 * t4 / b4;
      jacobian[2] = -x_ * jacobian[1];
      jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
      return true;
    }

   private:
    const double x_;
    const double y_;
  };
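
Either implementation can be plugged into a :class:`Problem` in the
usual way. The following is a minimal sketch; the data vectors
``x_data`` and ``y_data`` and the starting values for the parameter
block ``b`` are placeholders chosen for illustration, not part of the
original example:

.. code-block:: c++

  #include <iostream>
  #include <vector>
  #include "ceres/ceres.h"

  // Sketch: fit the Rat43 curve to the given data using the analytic
  // cost function defined above. The initial guess for b is an
  // illustrative placeholder.
  void SolveRat43(const std::vector<double>& x_data,
                  const std::vector<double>& y_data) {
    double b[4] = {100.0, 10.0, 1.0, 1.0};  // assumed starting point

    ceres::Problem problem;
    for (size_t i = 0; i < x_data.size(); ++i) {
      // One residual block per observation; passing nullptr for the loss
      // function gives plain (unweighted) least squares. Problem takes
      // ownership of the cost function by default.
      problem.AddResidualBlock(
          new Rat43AnalyticOptimized(x_data[i], y_data[i]), nullptr, b);
    }

    ceres::Solver::Options options;
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
    std::cout << summary.BriefReport() << "\n"
              << "b = " << b[0] << " " << b[1] << " "
              << b[2] << " " << b[3] << "\n";
  }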

What is the difference in performance of these two implementations?

========================== =========
CostFunction               Time (ns)
========================== =========
Rat43Analytic                    255
Rat43AnalyticOptimized            92
========================== =========

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
``Rat43Analytic``. This difference in run-time is not uncommon. To
get the best performance out of analytically computed derivatives, one
usually needs to optimize the code to account for common
sub-expressions.

When should you use analytical derivatives?
===========================================

#. The expressions are simple, e.g. mostly linear.

#. A computer algebra system like `Maple
   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
   <https://www.wolfram.com/mathematica/>`_, or `SymPy
   <http://www.sympy.org/en/index.html>`_ can be used to symbolically
   differentiate the objective function and generate the C++ to
   evaluate them.

#. Performance is of utmost concern and there is algebraic structure
   in the terms that you can exploit to get better performance than
   automatic differentiation.

   That said, getting the best performance out of analytical
   derivatives requires a non-trivial amount of work. Before going
   down this path, it is useful to measure the amount of time being
   spent evaluating the Jacobian as a fraction of the total solve time
   (see the sketch after this list), and remember that `Amdahl's Law
   <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.

#. There is no other way to compute the derivatives, e.g. you
   wish to compute the derivative of the root of a polynomial:

   .. math::
     a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0

   with respect to :math:`x` and :math:`y`. This requires the use of
   the `Inverse Function Theorem
   <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_; a sketch
   of the resulting expression also follows this list.

#. You love the chain rule and actually enjoy doing all the algebra by
   hand.
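
To check whether hand-optimizing the Jacobian is worth the effort (the
measurement suggested in the third item above), the per-phase timings
reported in :class:`Solver::Summary` are a reasonable starting point. A
minimal sketch, assuming a ``problem`` built as in the earlier example:

.. code-block:: c++

  ceres::Solver::Options options;
  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);

  // Fraction of the total solve time spent evaluating Jacobians. If this
  // is small, Amdahl's Law says a faster Jacobian cannot help much.
  const double jacobian_fraction =
      summary.jacobian_evaluation_time_in_seconds /
      summary.total_time_in_seconds;
  std::cout << "Jacobian evaluation: " << jacobian_fraction * 100.0
            << "% of solve time\n";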
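
For the polynomial root in the fourth item, the shape of the resulting
derivative can be seen by differentiating the defining equation
implicitly with respect to :math:`x`; this sketch assumes a simple
root, so that the denominator below does not vanish:

.. math::
  D_x z = -\frac{D_x a_3(x,y) z^3 + D_x a_2(x,y) z^2 + D_x a_1(x,y) z +
                 D_x a_0(x,y)}{3 a_3(x,y) z^2 + 2 a_2(x,y) z + a_1(x,y)}

with the analogous expression for :math:`D_y z`.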

.. rubric:: Footnotes

.. [#f1] The notion of best fit depends on the choice of the objective
         function used to measure the quality of fit, which in turn
         depends on the underlying noise process which generated the
         observations. Minimizing the sum of squared differences is
         the right thing to do when the noise is `Gaussian
         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
         that case the optimal value of the parameters is the `Maximum
         Likelihood Estimate
         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.