.. highlight:: c++

.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-gradient_problem_solver:

==================================
General Unconstrained Minimization
==================================

Modeling
========

:class:`FirstOrderFunction`
---------------------------

.. class:: FirstOrderFunction

   Instances of :class:`FirstOrderFunction` implement the evaluation of
   a function and its gradient.

   .. code-block:: c++

      class FirstOrderFunction {
       public:
        virtual ~FirstOrderFunction() {}
        virtual bool Evaluate(const double* const parameters,
                              double* cost,
                              double* gradient) const = 0;
        virtual int NumParameters() const = 0;
      };

.. function:: bool FirstOrderFunction::Evaluate(const double* const parameters, double* cost, double* gradient) const

   Evaluate the cost/value of the function. If ``gradient`` is not
   ``nullptr``, then evaluate the gradient too. If the evaluation is
   successful, return ``true``; otherwise return ``false``.

   ``cost`` is guaranteed to never be ``nullptr``; ``gradient`` can be
   ``nullptr``.

.. function:: int FirstOrderFunction::NumParameters() const

   Number of parameters in the domain of the function.
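
As a concrete illustration, a minimal implementation of
:class:`FirstOrderFunction` for the two dimensional Rosenbrock function
:math:`f(x, y) = (1 - x)^2 + 100 (y - x^2)^2` could look like the
following sketch (the class name ``Rosenbrock`` is purely illustrative
and not part of the Ceres API):

.. code-block:: c++

  class Rosenbrock final : public ceres::FirstOrderFunction {
   public:
    bool Evaluate(const double* parameters,
                  double* cost,
                  double* gradient) const override {
      const double x = parameters[0];
      const double y = parameters[1];

      cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
      if (gradient != nullptr) {
        // Analytic partial derivatives of the Rosenbrock function.
        gradient[0] = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x);
        gradient[1] = 200.0 * (y - x * x);
      }
      return true;
    }

    int NumParameters() const override { return 2; }
  };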

:class:`GradientProblem`
------------------------

.. class:: GradientProblem

   .. code-block:: c++

      class GradientProblem {
       public:
        explicit GradientProblem(FirstOrderFunction* function);
        GradientProblem(FirstOrderFunction* function,
                        Manifold* manifold);
        int NumParameters() const;
        int NumTangentParameters() const;
        bool Evaluate(const double* parameters, double* cost, double* gradient) const;
        bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
      };

   Instances of :class:`GradientProblem` represent general non-linear
   optimization problems that must be solved using just the value of the
   objective function and its gradient. Unlike the :class:`Problem`
   class, which can only be used to model non-linear least squares
   problems, instances of :class:`GradientProblem` are not restricted in
   the form of the objective function.

   Structurally :class:`GradientProblem` is a composition of a
   :class:`FirstOrderFunction` and optionally a :class:`Manifold`.

   The :class:`FirstOrderFunction` is responsible for evaluating the cost
   and gradient of the objective function.

   The :class:`Manifold` is responsible for going back and forth between the
   ambient space and the local tangent space. When a :class:`Manifold` is not
   provided, then the tangent space is assumed to coincide with the ambient
   Euclidean space that the gradient vector lives in.

   The constructor takes ownership of the :class:`FirstOrderFunction` and
   :class:`Manifold` objects passed to it.

.. function:: void Solve(const GradientProblemSolver::Options& options, const GradientProblem& problem, double* parameters, GradientProblemSolver::Summary* summary)

   Solve the given :class:`GradientProblem` using the values in
   ``parameters`` as the initial guess of the solution.
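
Putting these pieces together, minimizing the hypothetical
``Rosenbrock`` function sketched above could look like the following
(an illustrative sketch, not a prescribed workflow):

.. code-block:: c++

  double parameters[2] = {-1.2, 1.0};

  // The problem takes ownership of the FirstOrderFunction.
  ceres::GradientProblem problem(new Rosenbrock());

  ceres::GradientProblemSolver::Options options;
  options.minimizer_progress_to_stdout = true;

  ceres::GradientProblemSolver::Summary summary;
  ceres::Solve(options, problem, parameters, &summary);

  std::cout << summary.FullReport() << "\n";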

Solving
=======

:class:`GradientProblemSolver::Options`
---------------------------------------

.. class:: GradientProblemSolver::Options

   :class:`GradientProblemSolver::Options` controls the overall
   behavior of the solver. We list the various settings and their
   default values below.

.. function:: bool GradientProblemSolver::Options::IsValid(std::string* error) const

   Validates the values in the options struct and returns ``true`` on
   success. If there is a problem, the method returns ``false`` with
   ``error`` containing a textual description of the cause.
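
For example, one might validate the options before calling
:func:`Solve` as in the sketch below; the deliberately out-of-range
value is only for illustration and assumes that negative tolerances
fail validation:

.. code-block:: c++

  ceres::GradientProblemSolver::Options options;
  // Illustrative: a negative tolerance is expected to fail validation.
  options.function_tolerance = -1.0;

  std::string error;
  if (!options.IsValid(&error)) {
    std::cerr << "Invalid solver options: " << error << "\n";
  }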

.. member:: LineSearchDirectionType GradientProblemSolver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType GradientProblemSolver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType GradientProblemSolver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and
   ``HESTENES_STIEFEL``.

.. member:: int GradientProblemSolver::Options::max_lbfgs_rank

   Default: ``20``

   The L-BFGS Hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality is, however, bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will cost time and space
   complexity without the corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.
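
As a sketch, experimenting with a different rank is just a matter of
setting the corresponding options (the value ``10`` below is purely
illustrative):

.. code-block:: c++

  ceres::GradientProblemSolver::Options options;
  options.line_search_direction_type = ceres::LBFGS;
  options.max_lbfgs_rank = 10;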

.. member:: bool GradientProblemSolver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   Identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k

   .. math:: s_k = x_{k+1} - x_k

   Where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType GradientProblemSolver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double GradientProblemSolver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double GradientProblemSolver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double GradientProblemSolver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double GradientProblemSolver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int GradientProblemSolver::Options::max_num_line_search_step_size_iterations

   Default: ``20``

   Maximum number of trial step size iterations during each line
   search. If a step size satisfying the search conditions cannot be
   found within this number of trials, the line search will stop.

   As this is an 'artificial' constraint (one imposed by the user, not
   the underlying math), if ``WOLFE`` line search is being used, *and*
   points satisfying the Armijo sufficient (function) decrease
   condition have been found during the current search (in :math:`\leq`
   ``max_num_line_search_step_size_iterations`` trials), then the step
   size with the lowest function value which satisfies the Armijo
   condition will be returned as the new valid step, even though it
   does *not* satisfy the strong Wolfe conditions. This behaviour
   protects against early termination of the optimizer at a
   sub-optimal point.

.. member:: int GradientProblemSolver::Options::max_num_line_search_direction_restarts

   Default: ``5``

   Maximum number of restarts of the line search direction algorithm
   before terminating the optimization. Restarts of the line search
   direction algorithm occur when the current algorithm fails to
   produce a new descent direction. This typically indicates a
   numerical failure, or a breakdown in the validity of the
   approximations used.

.. member:: double GradientProblemSolver::Options::line_search_sufficient_curvature_decrease

   Default: ``0.9``

   The strong Wolfe conditions consist of the Armijo sufficient
   decrease condition, and an additional requirement that the
   step size be chosen s.t. the *magnitude* ('strong' Wolfe
   conditions) of the gradient along the search direction
   decreases sufficiently. Precisely, this second condition
   is that we seek a step size s.t.

   .. math:: \|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|

   Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
   of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.

.. member:: double GradientProblemSolver::Options::max_line_search_step_expansion

   Default: ``10.0``

   During the bracketing phase of a Wolfe line search, the step size
   is increased until either a point satisfying the Wolfe conditions
   is found, or an upper bound for a bracket containing a point
   satisfying the conditions is found. Precisely, at each iteration
   of the expansion:

   .. math:: \text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}

   By definition for expansion

   .. math:: \text{max_step_expansion} > 1.0

.. member:: int GradientProblemSolver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double GradientProblemSolver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: double GradientProblemSolver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} \leq \text{function_tolerance}

   where :math:`\Delta \text{cost}` is the change in objective
   function value (up or down) in the current iteration of the line search.

.. member:: double GradientProblemSolver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \|x - \Pi \boxplus(x, -g(x))\|_\infty \leq \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
   is the projection onto the bounds constraints and :math:`\boxplus`
   is the Plus operation for the manifold associated with the parameter
   vector.

.. member:: double GradientProblemSolver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| \leq (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed by the linear solver in
   the current iteration of the line search.

.. member:: LoggingType GradientProblemSolver::Options::logging_type

   Default: ``PER_MINIMIZER_ITERATION``

.. member:: bool GradientProblemSolver::Options::minimizer_progress_to_stdout

   Default: ``false``

   By default the :class:`Minimizer` progress is logged to ``STDERR``
   depending on the ``vlog`` level. If this flag is set to true, and
   :member:`GradientProblemSolver::Options::logging_type` is not
   ``SILENT``, the logging output is sent to ``STDOUT``.

   The progress display looks like

   .. code-block:: bash

      0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
      1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
      2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01

   Here

   #. ``f`` is the value of the objective function.
   #. ``d`` is the change in the value of the objective function if
      the step computed in this iteration is accepted.
   #. ``g`` is the max norm of the gradient.
   #. ``h`` is the change in the parameter vector.
   #. ``s`` is the optimal step length computed by the line search.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.

.. member:: std::vector<IterationCallback*> GradientProblemSolver::Options::callbacks

   Callbacks that are executed at the end of each iteration of the
   :class:`Minimizer`. They are executed in the order that they are
   specified in this vector. By default, parameter blocks are updated
   only at the end of the optimization, i.e., when the
   :class:`Minimizer` terminates. This behavior is controlled by
   :member:`GradientProblemSolver::Options::update_state_every_iteration`. If
   the user wishes to have access to the updated parameter blocks when
   their callbacks are executed, then set
   :member:`GradientProblemSolver::Options::update_state_every_iteration`
   to true.

   The solver does NOT take ownership of these pointers.

.. member:: bool GradientProblemSolver::Options::update_state_every_iteration

   Default: ``false``

   Normally the parameter vector is only updated when the solver
   terminates. Setting this to true updates it every iteration. This
   setting is useful when building an interactive application using
   Ceres and using an :class:`IterationCallback`.
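
As a sketch of how these two options work together, a hypothetical
callback that logs the cost at every iteration could look like the
following (``LoggingCallback`` is illustrative, not part of the Ceres
API):

.. code-block:: c++

  class LoggingCallback : public ceres::IterationCallback {
   public:
    ceres::CallbackReturnType operator()(
        const ceres::IterationSummary& summary) override {
      std::cout << "iteration: " << summary.iteration
                << " cost: " << summary.cost << "\n";
      return ceres::SOLVER_CONTINUE;
    }
  };

  // ...
  LoggingCallback callback;
  options.callbacks.push_back(&callback);
  // Needed only if the callback also inspects the parameter vector.
  options.update_state_every_iteration = true;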

:class:`GradientProblemSolver::Summary`
---------------------------------------

.. class:: GradientProblemSolver::Summary

   Summary of the various stages of the solver after termination.

.. function:: std::string GradientProblemSolver::Summary::BriefReport() const

   A brief one line description of the state of the solver after
   termination.

.. function:: std::string GradientProblemSolver::Summary::FullReport() const

   A full multiline description of the state of the solver after
   termination.

.. function:: bool GradientProblemSolver::Summary::IsSolutionUsable() const

   Whether the solution returned by the optimization algorithm can be
   relied on to be numerically sane. This will be the case if
   :member:`GradientProblemSolver::Summary::termination_type` is set to
   ``CONVERGENCE``, ``USER_SUCCESS`` or ``NO_CONVERGENCE``, i.e., either
   the solver converged by meeting one of the convergence tolerances,
   the user indicated that it had converged, or it ran to the maximum
   number of iterations or time.
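
A typical post-solve check might look like the following sketch:

.. code-block:: c++

  ceres::GradientProblemSolver::Summary summary;
  ceres::Solve(options, problem, parameters, &summary);

  if (!summary.IsSolutionUsable()) {
    std::cerr << "Solve failed: " << summary.message << "\n";
  } else {
    std::cout << summary.BriefReport() << "\n";
  }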

.. member:: TerminationType GradientProblemSolver::Summary::termination_type

   The cause of the minimizer terminating.

.. member:: std::string GradientProblemSolver::Summary::message

   Reason why the solver terminated.

.. member:: double GradientProblemSolver::Summary::initial_cost

   Cost of the problem (value of the objective function) before the
   optimization.

.. member:: double GradientProblemSolver::Summary::final_cost

   Cost of the problem (value of the objective function) after the
   optimization.

.. member:: std::vector<IterationSummary> GradientProblemSolver::Summary::iterations

   :class:`IterationSummary` for each minimizer iteration in order.

.. member:: int GradientProblemSolver::Summary::num_cost_evaluations

   Number of times the cost (and not the gradient) was evaluated.

.. member:: int GradientProblemSolver::Summary::num_gradient_evaluations

   Number of times the gradient (and the cost) were evaluated.

.. member:: double GradientProblemSolver::Summary::total_time_in_seconds

   Time (in seconds) spent in the solver.

.. member:: double GradientProblemSolver::Summary::cost_evaluation_time_in_seconds

   Time (in seconds) spent evaluating the cost.

.. member:: double GradientProblemSolver::Summary::gradient_evaluation_time_in_seconds

   Time (in seconds) spent evaluating the gradient vector.

.. member:: int GradientProblemSolver::Summary::num_parameters

   Number of parameters in the problem.

.. member:: int GradientProblemSolver::Summary::num_tangent_parameters

   Dimension of the tangent space of the problem. This is different
   from :member:`GradientProblemSolver::Summary::num_parameters` if a
   :class:`Manifold` object is used.

.. member:: LineSearchDirectionType GradientProblemSolver::Summary::line_search_direction_type

   Type of line search direction used.

.. member:: LineSearchType GradientProblemSolver::Summary::line_search_type

   Type of the line search algorithm used.

.. member:: LineSearchInterpolationType GradientProblemSolver::Summary::line_search_interpolation_type

   When performing line search, the degree of the polynomial used to
   approximate the objective function.

.. member:: NonlinearConjugateGradientType GradientProblemSolver::Summary::nonlinear_conjugate_gradient_type

   If the line search direction is ``NONLINEAR_CONJUGATE_GRADIENT``,
   then this indicates the particular variant of non-linear conjugate
   gradient used.

.. member:: int GradientProblemSolver::Summary::max_lbfgs_rank

   If the type of the line search direction is ``LBFGS``, then this
   indicates the rank of the Hessian approximation.