Automatic differentiation

A module dedicated to the implementation of functions and classes that relate to automatic differentiation.

Namespaces

 Differentiation
 
 Differentiation::AD
 

Detailed Description

A module dedicated to the implementation of functions and classes that relate to automatic differentiation.

Below we provide a very brief introduction as to what automatic differentiation is, what variations of this computational / numerical scheme exist, and how it is integrated within deal.II's framework.

Automatic differentiation

Automatic differentiation (commonly also referred to as algorithmic differentiation) is a numerical method that can be used to "automatically" compute the first, and possibly higher-order, derivatives of a function with respect to one or more input variables. Although this comes at a certain computational cost, the benefits of using such a tool can be significant. When used correctly, the derivatives of often complicated functions can be computed to very high accuracy. Although the exact accuracy achievable by these frameworks largely depends on their underlying mathematical formulation, some implementations compute with a precision on the order of machine accuracy. Note that this differs from classical numerical differentiation, whose accuracy depends on both the perturbation size and the chosen finite-difference scheme, and whose error is measurably larger than that of well-formulated automatic differentiation approaches.
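To make this last point concrete, the following self-contained snippet (illustrative only, not deal.II code; the test function is arbitrarily chosen) shows how the error of a simple forward-difference approximation of \(\frac{d}{dx} \exp(x)\) at \(x = 1\) depends on the perturbation size \(h\), whereas a well-formulated AD framework would return the derivative \(\exp(1)\) to machine accuracy:

#include <cmath>
#include <cstdio>

// The error of a forward-difference derivative approximation is governed
// by the competing truncation and round-off errors, and so cannot reach
// machine accuracy for any choice of the step size h.
int main()
{
  const double x     = 1.0;
  const double exact = std::exp(x); // d/dx exp(x) = exp(x)

  for (double h = 1e-1; h > 1e-14; h /= 100.0)
    {
      const double fd_derivative = (std::exp(x + h) - std::exp(x)) / h;
      std::printf("h = %8.1e , error = %8.1e\n",
                  h,
                  std::fabs(fd_derivative - exact));
    }

  return 0;
}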

Practical examples of auto-differentiation use within a finite-element context include the linearization of a nonlinear finite-element residual, the computation of tangents for nonlinear constitutive laws evaluated at continuum points, and the verification of hand-computed derivatives.

There are quite a number of implementations for auto-differentiable numbers. They primarily fall into two broad categories, namely source code transformation and operator overloading. The first method generates new, compilable code based on some input function that, when executed, returns the derivatives of the input function. The second exploits the capability of C++ operators to be overloaded for custom class types. A class that represents such an auto-differentiable number can therefore, following each mathematical operation performed on or with it, in principle evaluate and keep track of its value as well as that of its directional derivative(s). As the libraries exclusively implementing the source code transformation approach collectively describe highly specialized tools that are to be used as function preprocessors, they have no direct support within deal.II itself. The latter, however, represent specialized number types that can be supported through the use of template metaprogramming in the appropriate context. Given the examples presented above, this means that the FEValues class (and friends), as well as the Tensor and SymmetricTensor classes, should support calculations performed with these specialized numbers. (In theory an entire program could be made differentiable. This could be useful in, for example, the sensitivity analysis of solutions with respect to input parameters. However, to date this has not been tested.)
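As a minimal sketch of what this template support looks like in practice (assuming deal.II is configured with Trilinos so that the Sacado headers are available; the differentiated function itself is arbitrary), forward-mode Sacado numbers can be stored directly in a Tensor:

#include <deal.II/base/tensor.h>

#include <Sacado.hpp>

// The Tensor class is templated on its scalar type, and so can hold
// auto-differentiable numbers instead of plain doubles.
int main()
{
  using ADNumberType = Sacado::Fad::DFad<double>;

  const unsigned int dim    = 3;
  const unsigned int n_vars = 2;

  // Mark x and y as the two independent variables.
  const ADNumberType x(n_vars, 0, 3.0);
  const ADNumberType y(n_vars, 1, 4.0);

  // A tensor whose components carry both values and derivatives.
  dealii::Tensor<1, dim, ADNumberType> v;
  v[0] = x * y;
  v[1] = sin(x); // Sacado's overload, found via argument-dependent lookup
  v[2] = y;

  // A scalar, auto-differentiable result: the squared norm of v.
  ADNumberType f = 0.0;
  for (unsigned int i = 0; i < dim; ++i)
    f += v[i] * v[i];

  // f.val() is the value of f; f.dx(i) is its derivative with respect
  // to the i-th independent variable.
  const double df_dx = f.dx(0);
  const double df_dy = f.dx(1);
  (void)df_dx;
  (void)df_dy;

  return 0;
}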

Implementations of specialized frameworks based on operator overloading typically fall into one of three categories. In each, a customized data class represents the floating point value of an evaluated function and its derivative(s):

  1. those exploiting dual / complex-step / hyper-dual formulations (occasionally called tapeless methods),
  2. those utilizing taping strategies, and
  3. those using compile-time optimization through expression templates.

To provide some tentative insight into what these various implementations might look like in practice, we offer the following generic summary of these approaches:

  1. The first two tapeless approaches listed above (dual numbers and the complex-step method) use some variation of a truncated Taylor series, along with a particular choice of definition for the perturbation parameter, to compute function derivatives using a finite-difference based approach. The "dual" number constitutes the accumulated directional derivatives, computed simultaneously as the function values are evaluated; in the complex-step approach, the imaginary value effectively serves this purpose. The choice of the perturbation parameter determines the numerical qualities of the scheme, such as the influence of the truncation of the Taylor series: dual numbers do not contain any higher-order terms in their first derivative, while for the complex-step method these existent higher-order terms are neglected. It can be shown that neither of these methods is subject to subtractive cancellation errors and that, within their finite-difference scheme, they are not numerically sensitive to the internal step-size chosen for the numerical perturbation. The dual number approach thus produces exact first derivatives, while the complex-step approximation does not. The standard implementation of dual numbers, however, cannot yield exact values for second derivatives. Hyper-dual numbers take a different view of this idea, with numbers being represented in a form similar to quaternions (i.e. carrying additional non-real components) and the derivatives being computed from a high-order truncation of the Taylor series of all four components. The outcome is that, with the appropriate implementation, both first and second derivatives can be computed exactly. (A minimal sketch of such a dual number is given after this list.)
  2. With taped approaches, a specified subregion of code is selected as one for which all operations executed with active (marked) input variables are tracked and recorded in a data structure referred to as a tape. At the end of the taped region, the recorded function(s) may be re-evaluated by "replaying" the tape with a different set of input variables instead of recomputing the function directly. Assuming that the taped region represents a smooth function, arbitrarily high-order derivatives of the function can then be computed by referring to the code path tracked and stored on the tape. (This could perhaps be achieved, for example, through evaluation of the function around the point of interest.) There exist strategies to deal with situations where the taped function is not smooth at the evaluated point, or where it is not analytic. Furthermore, one might need to consider the case of branched functions, where the tape is no longer sequential, but rather forks off onto an evaluation path different from that of the originally recorded inputs.
  3. Methods based on expression templates leverage the computational graph (in this case, a directed acyclic graph (DAG)), constructed from the abstract syntax tree (AST), that resolves the function output from its input values. The outermost leaves of the tree represent the independent variables or constants, and are transformed by unary operators and connected by binary operators (in the simplest case). The operations performed on the function inputs are therefore known at compile time, and with that the associated derivative operations can be defined at the same time. The compiled output type returned by this operator need not be generic, but can rather be specialized based on the specific inputs (possibly carrying a differential history) given to that specific operator on the vertex of the DAG. In this way, a compile-time optimized set of instructions can be generated for the very specialized individual operations used to evaluate each intermediate result of the dependent function.
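To make the first of these categories concrete, the following self-contained sketch (illustrative only, and not taken from any of the supported libraries) implements a minimal dual number via operator overloading and uses it to differentiate the function \(f(\mathbf{x}) = x_{1} x_{2} + \sin(x_{1})\) that also appears in the discussion below:

#include <cmath>
#include <iostream>

// A minimal dual number: it stores a value and one directional derivative,
// and every overloaded operation propagates both simultaneously.
struct Dual
{
  double val; // function value
  double der; // directional derivative

  Dual(const double v, const double d = 0.0)
    : val(v)
    , der(d)
  {}
};

Dual operator+(const Dual &a, const Dual &b)
{
  // Sum rule: (a + b)' = a' + b'
  return Dual(a.val + b.val, a.der + b.der);
}

Dual operator*(const Dual &a, const Dual &b)
{
  // Product rule: (ab)' = a'b + ab'
  return Dual(a.val * b.val, a.der * b.val + a.val * b.der);
}

Dual sin(const Dual &a)
{
  // Chain rule: (sin a)' = cos(a) a'
  return Dual(std::sin(a.val), std::cos(a.val) * a.der);
}

int main()
{
  // f(x1, x2) = x1 * x2 + sin(x1), evaluated at (2, 3).
  // Seeding x1 with derivative 1 yields df/dx1 = x2 + cos(x1) exactly.
  const Dual x1(2.0, 1.0); // independent variable, seeded
  const Dual x2(3.0, 0.0);

  const Dual f = x1 * x2 + sin(x1);

  std::cout << "f = " << f.val << ", df/dx1 = " << f.der << std::endl;

  return 0;
}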

Each of these methods, of course, has its advantages and disadvantages, and one may be more appropriate than another for a given problem that is to be solved. Although the aforementioned implementation details (and others not discussed) may be hidden from the user, it remains important to understand the implications, run-time cost, and potential limitations of using any of these "black-box" auto-differentiable numbers.

In addition to the articles linked in the text above, the resources used to furnish the details supplied here include:

@InProceedings{Fike2011a,
  author    = {Fike, Jeffrey A. and Alonso, Juan J.},
  title     = {The Development of Hyper-Dual Numbers for Exact Second-Derivative Calculations},
  booktitle = {49th {AIAA} Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition},
  year      = {2011},
  volume    = {886},
  pages     = {124},
  month     = {jan},
  publisher = {American Institute of Aeronautics and Astronautics},
  doi       = {10.2514/6.2011-886}
}
@InCollection{Walther2009a,
  title     = {Getting Started with ADOL-C},
  author    = {Walther, Andrea and Griewank, Andreas},
  year      = {2009},
  booktitle = {Combinatorial Scientific Computing},
  pages     = {181--202}
}

Exploitation of the chain-rule

In the most practical sense, any of the above categories exploits the chain-rule to compute the total derivative of a composite function. To perform this action, they typically use one of two mechanisms to compute derivatives: forward-mode (or forward accumulation) auto-differentiation, or reverse-mode (or reverse accumulation) auto-differentiation.

As a point of interest, the optimal Jacobian accumulation, which performs a minimal set of computations, lies somewhere between these two limiting cases. Its computation for a general composite function remains an open problem in graph theory.

With the aid of the diagram below (it and some of the listed details courtesy of the Wikipedia article on automatic differentiation),

[Figure: Forward-mode automatic differentiation]
[Figure: Reverse-mode automatic differentiation]

representing the calculation of the function \(f (\mathbf{x}) = x_{1} \times x_{2} + \sin (x_{1})\), we will briefly describe what forward and reverse auto-differentiation are. Note that in the diagram, shown in text along the edges of the graph are the directional derivatives of the function \(w\) with respect to the \(i\)-th variable, represented by the notation \(\dot{w} = \dfrac{d w}{d x_{i}}\). The specific computations used to render the function value and its directional derivatives for this example are tabulated in the source article. For a second illustrative example, we refer the interested reader to this article.

Consider first that any composite function \(f(\mathbf{x})\), here represented as having two independent variables, can be dissected into a composition of its elementary functions

\[ f (\mathbf{x}) = f_{0} \circ f_{1} \circ f_{2} \circ \ldots \circ f_{n} (\mathbf{x}) \quad . \]

As was previously mentioned, if each of the primitive operations \(f_{n}\) is smooth and differentiable, then the chain-rule can be universally employed to compute the total derivative of \(f\), namely \(\dfrac{d f(\mathbf{x})}{d \mathbf{x}}\). What distinguishes the "forward" from the "reverse" mode is how the chain-rule is evaluated, but ultimately both compute the total derivative

\[ \dfrac{d f (\mathbf{x})}{d \mathbf{x}} = \dfrac{d f_{0}}{d f_{1}} \dfrac{d f_{1}}{d f_{2}} \dfrac{d f_{2}}{d f_{3}} \ldots \dfrac{d f_{n} (\mathbf{x})}{d \mathbf{x}} \quad . \]

In forward-mode, the chain-rule is computed naturally from the "inside out". The independent variables are therefore fixed, and each sub-function \(f'_{i} \vert_{f'_{i+1}}\) is computed recursively, with its result returned as input to the parent function. Encapsulating and fixing the order of operations using parentheses, this means that we compute

\[ \dfrac{d f (\mathbf{x})}{d \mathbf{x}} = \dfrac{d f_{0}}{d f_{1}} \left( \dfrac{d f_{1}}{d f_{2}} \left(\dfrac{d f_{2}}{d f_{3}} \left(\ldots \left( \dfrac{d f_{n} (\mathbf{x})}{d \mathbf{x}} \right)\right)\right)\right) \quad . \]

The computational complexity of a forward-sweep is proportional to that of the input function. However, for each directional derivative that is to be computed one sweep of the computational graph is required.
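For the example function above, \(f(\mathbf{x}) = x_{1} x_{2} + \sin(x_{1})\), a single forward sweep seeded with \(\dot{x}_{1} = 1, \dot{x}_{2} = 0\) propagates the derivative through each elementary operation,

\[ \dot{f} = \dot{x}_{1} x_{2} + x_{1} \dot{x}_{2} + \cos (x_{1}) \dot{x}_{1} = x_{2} + \cos (x_{1}) \quad , \]

which renders exactly the directional derivative \(\dfrac{d f}{d x_{1}}\); computing \(\dfrac{d f}{d x_{2}}\) requires a second sweep with the seed values \(\dot{x}_{1} = 0, \dot{x}_{2} = 1\).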

In reverse-mode, the chain-rule is computed somewhat unnaturally from the "outside in". The values of the dependent variables are first computed and fixed, and then the preceding differential operations are evaluated and multiplied in succession with the previous results, from left to right. Again, if we encapsulate and fix the order of operations using parentheses, this implies that the reverse calculation is performed by

\[ \dfrac{d f (\mathbf{x})}{d \mathbf{x}} = \left( \left( \left( \left( \left( \dfrac{d f_{0}}{d f_{1}} \right) \dfrac{d f_{1}}{d f_{2}} \right) \dfrac{d f_{2}}{d f_{3}} \right) \ldots \right) \dfrac{d f_{n} (\mathbf{x})}{d \mathbf{x}} \right) \quad . \]

The intermediate values \(\dfrac{d f_{i-1}}{d f_{i}}\) are known as adjoints, which must be computed and stored as the computational graph is traversed. However, for each dependent scalar function one sweep of the computational graph renders all directional derivatives at once.
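For the same example function, a single reverse sweep starting from the seed \(\bar{f} = 1\) accumulates the adjoints of both inputs at once,

\[ \bar{x}_{1} = \bar{f} \left( x_{2} + \cos (x_{1}) \right) = x_{2} + \cos (x_{1}) \quad , \quad \bar{x}_{2} = \bar{f} \, x_{1} = x_{1} \quad , \]

so that both \(\dfrac{d f}{d x_{1}}\) and \(\dfrac{d f}{d x_{2}}\) are available after one traversal of the computational graph.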

Overall, the efficiency of each mode is determined by the number of independent (input) variables and dependent (output) variables. If the outputs greatly exceed the inputs in number, then forward-mode can be shown to be more efficient than reverse-mode. The converse is true when the number of input variables greatly exceeds that of the output variables. This point may be used to help inform which number type is most suitable for the set of operations to be performed using automatic differentiation. For example, in many applications for which second derivatives are to be computed, it is appropriate to combine both reverse- and forward-modes. The former would then typically be used to calculate the first derivatives, and the latter the second derivatives.
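As an illustration of the underlying nesting principle, the following minimal sketch (assuming the Sacado headers from Trilinos are available) nests one forward-mode Sacado number inside another to recover exact second derivatives; this is the purely forward-over-forward variant, while a reverse-over-forward combination, as described above, follows the same idea:

#include <Sacado.hpp>

// Nesting one auto-differentiable number inside another: the outer number
// differentiates the derivatives carried by the inner number, yielding
// second derivatives.
int main()
{
  using FadType    = Sacado::Fad::DFad<double>;
  using FadFadType = Sacado::Fad::DFad<FadType>;

  const unsigned int n_vars = 2;

  // Mark x1 and x2 as independent variables at both levels of nesting.
  FadFadType x1(n_vars, 0, FadType(n_vars, 0, 2.0));
  FadFadType x2(n_vars, 1, FadType(n_vars, 1, 3.0));

  const FadFadType f = x1 * x2 + sin(x1);

  const double value      = f.val().val(); // f(x1, x2)
  const double df_dx1     = f.dx(0).val(); // x2 + cos(x1)
  const double d2f_dx1dx1 = f.dx(0).dx(0); // -sin(x1)
  const double d2f_dx1dx2 = f.dx(0).dx(1); // 1
  (void)value;
  (void)df_dx1;
  (void)d2f_dx1dx1;
  (void)d2f_dx1dx2;

  return 0;
}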

Supported automatic differentiation libraries

We currently have validated implementations for the following number types and combinations:

  - taped ADOL-C (n-differentiable),
  - tapeless ADOL-C (once differentiable),
  - forward-mode Sacado with dynamic memory allocation (once differentiable),
  - nested forward-mode Sacado using dynamic memory allocation (twice differentiable),
  - reverse-mode Sacado (once differentiable), and
  - nested reverse- and forward-mode Sacado (twice differentiable).

Note that in the above, "dynamic memory allocation" refers to the fact that the number of independent variables need not be specified at compile time.

The ADOL-C user manual

@InCollection{Walther2009a,
  title     = {Getting Started with ADOL-C},
  author    = {Walther, Andrea and Griewank, Andreas},
  year      = {2009},
  booktitle = {Combinatorial Scientific Computing},
  pages     = {181--202},
  url       = {https://projects.coin-or.org/ADOL-C/browser/trunk/ADOL-C/doc/adolc-manual.pdf}
}

provides the principal insights into its taped and tapeless implementations, and into how ADOL-C can be incorporated into a user code. Some further useful resources for understanding the implementation of ADOL-C, and the possibilities for how it may be used within a numerical code, include:

@Article{Griewank1996a,
  author    = {Griewank, Andreas and Juedes, David and Utke, Jean},
  title     = {Algorithm 755: {ADOL-C}: a package for the automatic differentiation of algorithms written in {C/C++}},
  journal   = {ACM Transactions on Mathematical Software (TOMS)},
  year      = {1996},
  volume    = {22},
  number    = {2},
  pages     = {131--167},
  doi       = {10.1145/229473.229474},
  publisher = {ACM}
}

@InCollection{Bischof2008a,
  author    = {Bischof, Christian and Guertler, Niels and Kowarz, Andreas and Walther, Andrea},
  title     = {Parallel reverse mode automatic differentiation for OpenMP programs with ADOL-C},
  booktitle = {Advances in Automatic Differentiation},
  publisher = {Springer},
  year      = {2008},
  pages     = {163--173}
}

@InBook{Kulshreshtha2012a,
  chapter   = {Computing Derivatives in a Meshless Simulation Using Permutations in {ADOL}-C},
  pages     = {321--331},
  title     = {Recent Advances in Algorithmic Differentiation},
  publisher = {Springer Berlin Heidelberg},
  year      = {2012},
  author    = {Kulshreshtha, Kshitij and Marburger, Jan},
  editor    = {Forth, S. and Hovland, P. and Phipps, E. and Utke, J. and Walther, A.},
  series    = {Lecture Notes in Computational Science and Engineering},
  doi       = {10.1007/978-3-642-30023-3_29}
}

@InProceedings{Kulshreshtha2013a,
  author    = {Kulshreshtha, Kshitij and Koniaeva, Alina},
  title     = {Vectorizing the forward mode of ADOL-C on a GPU using CUDA},
  booktitle = {13th European AD Workshop},
  year      = {2013},
  month     = {jun}
}
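To give a flavour of the taped approach that the manual describes, the following minimal sketch (an outline of standard ADOL-C usage, not deal.II code; the traced function is arbitrary) records a function on a tape and then queries its gradient via one of ADOL-C's drivers:

#include <adolc/adolc.h>

// All operations on "adouble" variables between trace_on() and trace_off()
// are recorded on a tape, which can later be replayed by drivers such as
// gradient() without re-recording the function.
int main()
{
  const short tape_index = 0;
  const int   n_vars     = 2;

  double x_values[n_vars] = {2.0, 3.0};
  double f_value          = 0.0;

  trace_on(tape_index);
  {
    adouble x1, x2;
    x1 <<= x_values[0]; // register the independent variables
    x2 <<= x_values[1];

    adouble f = x1 * x2 + sin(x1);

    f >>= f_value; // register the dependent variable
  }
  trace_off();

  // Replay the tape to evaluate the gradient, possibly at new inputs.
  double df_dx[n_vars];
  gradient(tape_index, n_vars, x_values, df_dx); // {x2 + cos(x1), x1}

  return 0;
}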

Similarly, a selection of useful resources for understanding the implementation of Sacado number types (in particular, how expression templating is employed and exploited) includes:

@InCollection{Bartlett2006a,
  author       = {Bartlett, R. A. and Gay, D. M. and Phipps, E. T.},
  title        = {Automatic Differentiation of C++ Codes for Large-Scale Scientific Computing},
  booktitle    = {International Conference on Computational Science {\textendash} {ICCS} 2006},
  publisher    = {Springer Berlin Heidelberg},
  year         = {2006},
  editor       = {Alexandrov, V. N. and van Albada, G. D. and Sloot, P. M. A. and Dongarra, J.},
  pages        = {525--532},
  doi          = {10.1007/11758549_73},
  organization = {Springer}
}

@InBook{Gay2012a,
  chapter   = {Using expression graphs in optimization algorithms},
  pages     = {247--262},
  title     = {Mixed Integer Nonlinear Programming},
  publisher = {Springer New York},
  year      = {2012},
  author    = {Gay, D. M.},
  editor    = {Lee, J. and Leyffer, S.},
  isbn      = {978-1-4614-1927-3},
  doi       = {10.1007/978-1-4614-1927-3_8}
}

@InBook{Phipps2012a,
  chapter     = {Efficient Expression Templates for Operator Overloading-based Automatic Differentiation},
  pages       = {309--319},
  title       = {Recent Advances in Algorithmic Differentiation},
  publisher   = {Springer},
  year        = {2012},
  author      = {Phipps, Eric and Pawlowski, Roger},
  editor      = {Forth, S. and Hovland, P. and Phipps, E. and Utke, J. and Walther, A.},
  series      = {Lecture Notes in Computational Science and Engineering},
  volume      = {73},
  doi         = {10.1007/978-3-642-30023-3_28},
  eprint      = {1205.3506v1},
  eprintclass = {cs.MS},
  eprinttype  = {arXiv}
}

The implementation of both forward- and reverse-mode Sacado numbers is quite intricate. As of Trilinos 12.12, the implementation of math operations involves a lot of preprocessor directives and macro programming. Accordingly, the code may be hard to follow and there exists no meaningful companion documentation for these classes. A useful resource for understanding the principal implementation of these numbers is therefore the Sacado::Fad::SimpleFad class, which outlines a reference (although reportedly inefficient) implementation of a forward-mode auto-differentiable number that does not use expression templates. (Although not explicitly stated, it would appear that the Sacado::Fad::SimpleFad class is implemented in the spirit of dual numbers.)

How automatic differentiation is integrated into deal.II

Since the interfaces to the automatic differentiation libraries differ so vastly from one another, a uniform internal interface to each number type will be established in the near future. The goal will be to allow some driver classes (which provide the core functionality, and which will be introduced in the next section) to interact with the different auto-differentiation libraries through a consistent mechanism. Specifically, they need to be able to correctly initialize and finalize data that is to be interpreted as the dependent and independent variables of a formula.

The interface to each supported auto-differentiable number type is implemented through a collection of header files in the include/deal.II/differentiation/ad directory; these extend each library's mathematical operations and define the traits and product types needed for the number types to interoperate with the rest of the library.

By using type codes for each supported number type, we artificially limit the types of auto-differentiable numbers that can be used within the library. This design choice is due to the fact that it is not trivial to ensure that each number type is correctly initialized and that all combinations of nested (templated) types remain valid for all operations performed by the library. Furthermore, there are some lengthy functions within the library that are instantiated for the supported number types and have internal checks that are only satisfied when an auto-differentiable number of which the library has knowledge is used. This again ensures that the integrity of all computations is maintained. Finally, using a simple enumeration as a class template parameter ultimately makes it straightforward to switch the number type used in production code, with little to no further amendment required to user code.
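As a purely hypothetical sketch of this design choice (the enumeration and class names below are invented for illustration and are not deal.II's actual interface), a type code can be mapped to a concrete auto-differentiable number via template specialization:

#include <Sacado.hpp>

// An invented enumeration of supported number types; switching the number
// type used in production code then only requires changing one template
// argument.
enum class NumberTypeCode
{
  sacado_dfad,
  sacado_dfad_dfad
};

template <NumberTypeCode code>
struct ADNumberSelector;

template <>
struct ADNumberSelector<NumberTypeCode::sacado_dfad>
{
  using type = Sacado::Fad::DFad<double>; // once differentiable
};

template <>
struct ADNumberSelector<NumberTypeCode::sacado_dfad_dfad>
{
  using type =
    Sacado::Fad::DFad<Sacado::Fad::DFad<double>>; // twice differentiable
};

// A hypothetical driver class templated on the type code, in the spirit of
// the text above.
template <NumberTypeCode code>
class HypotheticalDriver
{
public:
  using ad_type = typename ADNumberSelector<code>::type;
  // ... initialization and extraction of derivatives would go here ...
};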

User interface to the automatic differentiation libraries

As of the current release, there is no formal, unified interface to the automatic differentiation libraries that we support. It is therefore necessary for users to manage the initialization and derivative computations themselves.

The most up-to-date examples of how this is done using ADOL-C can be found in its user manual and in the deal.II test-suite, while for Sacado, illustrative examples can be found in the Trilinos development repository and, again, in the deal.II test-suite.
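For illustration, the following minimal sketch of such user-managed initialization and derivative extraction (assuming the Sacado headers from Trilinos are available; the differentiated function is arbitrary) uses Sacado's reverse-mode number type:

#include <Sacado_trad.hpp>

// Once the dependent function has been evaluated, a single call to
// Gradcomp() performs the reverse sweep, after which all first derivatives
// can be read from the independent variables.
int main()
{
  using RadType = Sacado::Rad::ADvar<double>;

  RadType x1 = 2.0; // the independent variables
  RadType x2 = 3.0;

  const RadType f = x1 * x2 + sin(x1);

  RadType::Gradcomp(); // reverse sweep: accumulates all adjoints at once

  const double value  = f.val();  // f(x1, x2)
  const double df_dx1 = x1.adj(); // x2 + cos(x1)
  const double df_dx2 = x2.adj(); // x1
  (void)value;
  (void)df_dx1;
  (void)df_dx2;

  return 0;
}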