This tutorial depends on step-4.

Table of contents

Introduction
The commented program

Results
The plain program

Introduction

Note: The material presented here is also discussed in video lecture 19, video lecture 20, video lecture 21. (All video lectures are also available here.)

This program is devoted to two aspects: the use of mixed finite elements – in particular Raviart-Thomas elements – and using block matrices to define solvers, preconditioners, and nested versions of those that use the substructure of the system matrix. The equation we are going to solve is again the Laplace equation, though with a matrix-valued coefficient:

\begin{eqnarray*} -\nabla \cdot K({\mathbf x}) \nabla p &=& f \qquad {\textrm{in}\ } \Omega, \\ p &=& g \qquad {\textrm{on}\ }\partial\Omega. \end{eqnarray*}

\(K({\mathbf x})\) is assumed to be uniformly positive definite, i.e., there is \(\alpha>0\) such that the eigenvalues \(\lambda_i({\mathbf x})\) of \(K(x)\) satisfy \(\lambda_i({\mathbf x})\ge \alpha\). The use of the symbol \(p\) instead of the usual \(u\) for the solution variable will become clear in the next section.

After discussing the equation and the formulation we are going to use to solve it, this introduction will cover the use of block matrices and vectors, the definition of solvers and preconditioners, and finally the actual test case we are going to solve.

We are going to extend this tutorial program in step-21 to solve not only the mixed Laplace equation, but add another equation that describes the transport of a mixture of two fluids.

The equations covered here fall into the class of vector-valued problems. A toplevel overview of this topic can be found in the Handling vector valued problems module.

Formulation, weak form, and discrete problem

In the form above, the Laplace equation is generally considered a good model equation for fluid flow in porous media. In particular, if flow is so slow that all dynamic effects such as the acceleration terms in the Navier-Stokes equation become irrelevant, and if the flow pattern is stationary, then the Laplace equation models the pressure that drives the flow reasonably well. (Because the solution variable is a pressure, we here use the name \(p\) instead of the name \(u\) more commonly used for the solution of partial differential equations.)

Typical applications of this view of the Laplace equation are then modeling groundwater flow, or the flow of hydrocarbons in oil reservoirs. In these applications, \(K\) is then the permeability tensor, i.e. a measure for how much resistance the soil or rock matrix asserts on the fluid flow. In the applications just named, a desirable feature is that the numerical scheme is locally conservative, i.e. that whatever flows into a cell also flows out of it (or the difference is equal to the integral over the source terms over each cell, if the sources are nonzero). However, as it turns out, the usual discretizations of the Laplace equation do not satisfy this property. On the other hand, one can achieve this by choosing a different formulation.

To this end, one first introduces a second variable, called the flux, \({\mathbf u}=-K\nabla p\). By its definition, the flux is a vector in the negative direction of the pressure gradient, multiplied by the permeability tensor. If the permeability tensor is proportional to the unit matrix, this equation is easy to understand and intuitive: the higher the permeability, the higher the flux; and the flux is proportional to the gradient of the pressure, going from areas of high pressure to areas of low pressure.

With this second variable, one then finds an alternative version of the Laplace equation, called the mixed formulation:

\begin{eqnarray*} K^{-1} {\mathbf u} + \nabla p &=& 0 \qquad {\textrm{in}\ } \Omega, \\ -{\textrm{div}}\ {\mathbf u} &=& -f \qquad {\textrm{in}\ }\Omega, \\ p &=& g \qquad {\textrm{on}\ } \partial\Omega. \end{eqnarray*}

Here, we have multiplied the equation defining the velocity \({\mathbf u}\) by \(K^{-1}\) because this makes the set of equations symmetric: one of the equations has the gradient, the second the negative divergence, and these two are of course adjoints of each other, resulting in a symmetric bilinear form and a consequently symmetric system matrix under the common assumption that \(K\) is a symmetric tensor.

The weak formulation of this problem is found by multiplying the two equations with test functions and integrating some terms by parts:

\begin{eqnarray*} A(\{{\mathbf u},p\},\{{\mathbf v},q\}) = F(\{{\mathbf v},q\}), \end{eqnarray*}

where

\begin{eqnarray*} A(\{{\mathbf u},p\},\{{\mathbf v},q\}) &=& ({\mathbf v}, K^{-1}{\mathbf u})_\Omega - ({\textrm{div}}\ {\mathbf v}, p)_\Omega - (q,{\textrm{div}}\ {\mathbf u})_\Omega \\ F(\{{\mathbf v},q\}) &=& -(g,{\mathbf v}\cdot {\mathbf n})_{\partial\Omega} - (f,q)_\Omega. \end{eqnarray*}

Here, \({\mathbf n}\) is the outward normal vector at the boundary. Note how in this formulation, Dirichlet boundary values of the original problem are incorporated in the weak form.

To be well-posed, we have to look for solutions and test functions in the space \(H({\textrm{div}})=\{{\mathbf w}\in L^2(\Omega)^d:\ {\textrm{div}}\ {\mathbf w}\in L^2\}\) for \(\mathbf u\), \(\mathbf v\), and \(L^2\) for \(p,q\). It is a well-known fact stated in almost every book on finite element theory that if one chooses discrete finite element spaces for the approximation of \({\mathbf u},p\) inappropriately, then the resulting discrete saddle-point problem is instable and the discrete solution will not converge to the exact solution.

To overcome this, a number of different finite element pairs for \({\mathbf u},p\) have been developed that lead to a stable discrete problem. One such pair is to use the Raviart-Thomas spaces \(RT(k)\) for the velocity \({\mathbf u}\) and discontinuous elements of class \(DQ(k)\) for the pressure \(p\). For details about these spaces, we refer in particular to the book on mixed finite element methods by Brezzi and Fortin, but many other books on the theory of finite elements, for example the classic book by Brenner and Scott, also state the relevant results.

Assembling the linear system

The deal.II library (of course) implements Raviart-Thomas elements \(RT(k)\) of arbitrary order \(k\), as well as discontinuous elements \(DG(k)\). If we forget about their particular properties for a second, we then have to solve a discrete problem

\begin{eqnarray*} A(x_h,w_h) = F(w_h), \end{eqnarray*}

with the bilinear form and right hand side as stated above, and \(x_h=\{{\mathbf u}_h,p_h\}\), \(w_h=\{{\mathbf v}_h,q_h\}\). Both \(x_h\) and \(w_h\) are from the space \(X_h=RT(k)\times DQ(k)\), where \(RT(k)\) is itself a space of \(dim\)-dimensional functions to accommodate for the fact that the flow velocity is vector-valued. The necessary question then is: how do we do this in a program?

Vector-valued elements have already been discussed in previous tutorial programs, the first time and in detail in step-8. The main difference there was that the vector-valued space \(V_h\) is uniform in all its components: the \(dim\) components of the displacement vector are all equal and from the same function space. What we could therefore do was to build \(V_h\) as the outer product of the \(dim\) times the usual \(Q(1)\) finite element space, and by this make sure that all our shape functions have only a single non-zero vector component. Instead of dealing with vector-valued shape functions, all we did in step-8 was therefore to look at the (scalar) only non-zero component and use the fe.system_to_component_index(i).first call to figure out which component this actually is.

This doesn't work with Raviart-Thomas elements: following from their construction to satisfy certain regularity properties of the space \(H({\textrm{div}})\), the shape functions of \(RT(k)\) are usually nonzero in all their vector components at once. For this reason, were fe.system_to_component_index(i).first applied to determine the only nonzero component of shape function \(i\), an exception would be generated. What we really need to do is to get at all vector components of a shape function. In deal.II diction, we call such finite elements non-primitive, whereas finite elements that are either scalar or for which every vector-valued shape function is nonzero only in a single vector component are called primitive.

So what do we have to do for non-primitive elements? To figure this out, let us go back in the tutorial programs, almost to the very beginnings. There, we learned that we use the FEValues class to determine the values and gradients of shape functions at quadrature points. For example, we would call fe_values.shape_value(i,q_point) to obtain the value of the ith shape function on the quadrature point with number q_point. Later, in step-8 and other tutorial programs, we learned that this function call also works for vector-valued shape functions (of primitive finite elements), and that it returned the value of the only non-zero component of shape function i at quadrature point q_point.

For non-primitive shape functions, this is clearly not going to work: there is no single non-zero vector component of shape function i, and the call to fe_values.shape_value(i,q_point) would consequently not make much sense. However, deal.II offers a second function call, fe_values.shape_value_component(i,q_point,comp) that returns the value of the compth vector component of shape function i at quadrature point q_point, where comp is an index between zero and the number of vector components of the present finite element; for example, the element we will use to describe velocities and pressures is going to have \(dim+1\) components. It is worth noting that this function call can also be used for primitive shape functions: it will simply return zero for all components except one; for non-primitive shape functions, it will in general return a non-zero value for more than just one component.

We could now attempt to rewrite the bilinear form above in terms of vector components. For example, in 2d, the first term could be rewritten like this (note that \(u_0=x_0, u_1=x_1, p=x_2\)):

\begin{eqnarray*} ({\mathbf u}_h^i, K^{-1}{\mathbf u}_h^j) = &\left((x_h^i)_0, K^{-1}_{00} (x_h^j)_0\right) + \left((x_h^i)_0, K^{-1}_{01} (x_h^j)_1\right) + \\ &\left((x_h^i)_1, K^{-1}_{10} (x_h^j)_0\right) + \left((x_h^i)_1, K^{-1}_{11} (x_h^j)_1\right). \end{eqnarray*}

If we implemented this, we would get code like this:

for (unsigned int q=0; q<n_q_points; ++q)
  for (unsigned int i=0; i<dofs_per_cell; ++i)
    for (unsigned int j=0; j<dofs_per_cell; ++j)
      local_matrix(i,j) += (k_inverse_values[q][0][0] *
                            fe_values.shape_value_component(i,q,0) *
                            fe_values.shape_value_component(j,q,0)
                            +
                            k_inverse_values[q][0][1] *
                            fe_values.shape_value_component(i,q,0) *
                            fe_values.shape_value_component(j,q,1)
                            +
                            k_inverse_values[q][1][0] *
                            fe_values.shape_value_component(i,q,1) *
                            fe_values.shape_value_component(j,q,0)
                            +
                            k_inverse_values[q][1][1] *
                            fe_values.shape_value_component(i,q,1) *
                            fe_values.shape_value_component(j,q,1)
                           )
                            
                           fe_values.JxW(q);

This is, at best, tedious, error prone, and not dimension independent. There are obvious ways to make things dimension independent, but in the end, the code is simply not pretty. What would be much nicer is if we could simply extract the \({\mathbf u}\) and \(p\) components of a shape function \(x_h^i\). In the program we do that in the following way:

const FEValuesExtractors::Vector velocities (0);
const FEValuesExtractors::Scalar pressure (dim);
...
for (unsigned int q=0; q<n_q_points; ++q)
  for (unsigned int i=0; i<dofs_per_cell; ++i)
    for (unsigned int j=0; j<dofs_per_cell; ++j)
      local_matrix(i,j) += (fe_values[velocities].value (i, q) *
                            k_inverse_values[q] *
                            fe_values[velocities].value (j, q)
                            -
                            fe_values[velocities].divergence (i, q) *
                            fe_values[pressure].value (j, q)
                            -
                            fe_values[pressure].value (i, q) *
                            fe_values[velocities].divergence (j, q)) *
                            fe_values.JxW(q);

This is, in fact, not only the first term of the bilinear form, but the whole thing (sans boundary contributions).

What this piece of code does is, given an fe_values object, to extract the values of the first \(dim\) components of shape function i at quadrature points q, that is the velocity components of that shape function. Put differently, if we write shape functions \(x_h^i\) as the tuple \(\{{\mathbf u}_h^i,p_h^i\}\), then the function returns the velocity part of this tuple. Note that the velocity is of course a dim-dimensional tensor, and that the function returns a corresponding object. Similarly, where we subscript with the pressure extractor, we extract the scalar pressure component. The whole mechanism is described in more detail in the Handling vector valued problems module.

In practice, it turns out that we can do a bit better if we evaluate the shape functions, their gradients and divergences only once per outermost loop, and store the result, as this saves us a few otherwise repeated computations (it is possible to save even more repeated operations by calculating all relevant quantities in advance and then only inserting the results in the actual loop, see step-22 for a realization of that approach). The final result then looks like this, working in every space dimension:

for (const auto &cell : dof_handler.active_cell_iterators())
  {
    fe_values.reinit (cell);
    local_matrix = 0;
    local_rhs = 0;
    right_hand_side.value_list (fe_values.get_quadrature_points(),
                                rhs_values);
    k_inverse.value_list (fe_values.get_quadrature_points(),
                          k_inverse_values);
    for (unsigned int q=0; q<n_q_points; ++q)
      for (unsigned int i=0; i<dofs_per_cell; ++i)
        {
          const Tensor<1,dim> phi_i_u     = fe_values[velocities].value (i, q);
          const double        div_phi_i_u = fe_values[velocities].divergence (i, q);
          const double        phi_i_p     = fe_values[pressure].value (i, q);
          for (unsigned int j=0; j<dofs_per_cell; ++j)
            {
              const Tensor<1,dim> phi_j_u     = fe_values[velocities].value (j, q);
              const double        div_phi_j_u = fe_values[velocities].divergence (j, q);
              const double        phi_j_p     = fe_values[pressure].value (j, q);
              local_matrix(i,j) += (phi_i_u * k_inverse_values[q] * phi_j_u
                                    - div_phi_i_u * phi_j_p
                                    - phi_i_p * div_phi_j_u) *
                                   fe_values.JxW(q);
            }
          local_rhs(i) += -phi_i_p *
                          rhs_values[q] *
                          fe_values.JxW(q);
        }

This very closely resembles the form in which we have originally written down the bilinear form and right hand side.

There is one final term that we have to take care of: the right hand side contained the term \((g,{\mathbf v}\cdot {\mathbf n})_{\partial\Omega}\), constituting the weak enforcement of pressure boundary conditions. We have already seen in step-7 how to deal with face integrals: essentially exactly the same as with domain integrals, except that we have to use the FEFaceValues class instead of FEValues. To compute the boundary term we then simply have to loop over all boundary faces and integrate there. The mechanism works in the same way as above, i.e. the extractor classes also work on FEFaceValues objects:

for (unsigned int face_no=0;
     face_no<GeometryInfo<dim>::faces_per_cell;
     ++face_no)
  if (cell->at_boundary(face_no))
    {
      fe_face_values.reinit (cell, face_no);
      pressure_boundary_values
        .value_list (fe_face_values.get_quadrature_points(),
                     boundary_values);
      for (unsigned int q=0; q<n_face_q_points; ++q)
        for (unsigned int i=0; i<dofs_per_cell; ++i)
          local_rhs(i) += -(fe_face_values[velocities].value (i, q) *
                            fe_face_values.normal_vector(q) *
                            boundary_values[q] *
                            fe_face_values.JxW(q));
    }

You will find the exact same code as above in the sources for the present program. We will therefore not comment much on it below.

Linear solvers and preconditioners

After assembling the linear system we are faced with the task of solving it. The problem here is: the matrix has a zero block at the bottom right (there is no term in the bilinear form that couples the pressure \(p\) with the pressure test function \(q\)), and it is indefinite. At least it is symmetric. In other words: the Conjugate Gradient method is not going to work. We would have to resort to other iterative solvers instead, such as MinRes, SymmLQ, or GMRES, that can deal with indefinite systems. However, then the next problem immediately surfaces: due to the zero block, there are zeros on the diagonal and none of the usual preconditioners (Jacobi, SSOR) will work as they require division by diagonal elements.

For the matrix sizes we expect to run with this program, the by far simplest approach would be to just use a direct solver (in particular, the SparseDirectUMFPACK class that is bundled with deal.II). step-29 goes this route and shows that solving any linear system can be done in just 3 or 4 lines of code.

But then, this is a tutorial: we teach how to do things. Consequently, in the following, we will introduce some techniques that can be used in cases like these. Namely, we will consider the linear system as not consisting of one large matrix and vectors, but we will want to decompose matrices into blocks that correspond to the individual operators that appear in the system. We note that the resulting solver is not optimal – there are much better ways, for example those explained in the results section of step-22 or the one we use in step-43 for a problem rather similar to the current one – but that the goal is to introduce techniques rather than optimal solvers.

Solving using the Schur complement

In view of the difficulties using standard solvers and preconditioners mentioned above, let us take another look at the matrix. If we sort our degrees of freedom so that all velocity come before all pressure variables, then we can subdivide the linear system \(Ax=h\) into the following blocks:

\begin{eqnarray*} \left(\begin{array}{cc} M & B \\ B^T & 0 \end{array}\right) \left(\begin{array}{cc} U \\ P \end{array}\right) = \left(\begin{array}{cc} F \\ G \end{array}\right), \end{eqnarray*}

where \(U,P\) are the values of velocity and pressure degrees of freedom, respectively, \(M\) is the mass matrix on the velocity space, \(B^T\) corresponds to the negative divergence operator, and \(B\) is its transpose and corresponds to the gradient.

By block elimination, we can then re-order this system in the following way (multiply the first row of the system by \(B^TM^{-1}\) and then subtract the second row from it):

\begin{eqnarray*} B^TM^{-1}B P &=& B^TM^{-1} F - G, \\ MU &=& F - BP. \end{eqnarray*}

Here, the matrix \(S=B^TM^{-1}B\) (called the Schur complement of \(A\)) is obviously symmetric and, owing to the positive definiteness of \(M\) and the fact that \(B\) has full column rank, \(S\) is also positive definite.

Consequently, if we could compute \(S\), we could apply the Conjugate Gradient method to it. However, computing \(S\) is expensive, and \(S\) is in fact also a full matrix. On the other hand, the CG algorithm doesn't require us to actually have a representation of \(S\), it is sufficient to form matrix-vector products with it. We can do so in steps: to compute \(Sv=B^TM^{-1}Bv=B^T(M^{-1}(Bv))\), we

form \(w = B v\);
solve \(My = w\) for \(y=M^{-1}w\), using the CG method applied to the positive definite and symmetric mass matrix \(M\);
form \(z=B^Ty\) to obtain \(z=Sv\).

Note how we evaluate the expression \(B^TM^{-1}Bv\) right to left to avoid matrix-matrix products; this way, all we have to do is evaluate matrix-vector products.

Note: The key point in this consideration is to recognize that to implement an iterative solver such as CG or GMRES, we never actually need the actual elements of a matrix! All that is required is that we can form matrix-vector products. The same is true for preconditioners. In deal.II we encode this requirement by only requiring that matrices and preconditioners given to solver classes have a vmult() member function that does the matrix-vector product. How a class chooses to implement this function is not important to the solver. Consequently, classes can implement it by, for example, doing a sequence of products and linear solves as discussed above.

Using this strategy, we can then implement a class that provides the function vmult() that is all that the SolverCG class requires from an object representing a matrix. We can make our life a bit easier by also introducing an object that represents \(M^{-1}\) and that has its own vmult() function that, if called, solves the linear system with \(M\). Using this (which we will implement as the InverseMatrix class in the body of the program), the class that implements the Schur only needs to offer the vmult() function to perform a matrix-vector multiplication, using the algorithm above. Here are again the relevant parts of the code:

class SchurComplement : public Subscriptor
{
  public:
    SchurComplement (const BlockSparseMatrix<double>            &A,
                     const InverseMatrix<SparseMatrix<double> > &inverse_mass);
    void vmult (Vector<double>       &dst,
                const Vector<double> &src) const;
  private:
    const SmartPointer<const BlockSparseMatrix<double> > system_matrix;
    const SmartPointer<const InverseMatrix<SparseMatrix<double> > > inverse_mass;
    mutable Vector<double> tmp1, tmp2;
};
void SchurComplement::vmult (Vector<double>       &dst,
                             const Vector<double> &src) const
{
  system_matrix->block(0,1).vmult (tmp1, src); // multiply with the top right block: B
  inverse_mass->vmult (tmp2, tmp1);            // multiply with M^-1
  system_matrix->block(1,0).vmult (dst, tmp2); // multiply with the bottom left block: B^T
}

In this code, the constructor takes a reference to a block sparse matrix for the entire system, and a reference to the object representing the inverse of the mass matrix. It stores these using SmartPointer objects (see step-7), and additionally allocates two temporary vectors tmp1 and tmp2 for the vectors labeled \(w,y\) in the list above.

In the matrix-vector multiplication function, the product \(Sv\) is performed in exactly the order outlined above. Note how we access the blocks \(B\) and \(B^T\) by calling system_matrix->block(0,1) and system_matrix->block(1,0) respectively, thereby picking out individual blocks of the block system. Multiplication by \(M^{-1}\) happens using the object introduced above.

With all this, we can go ahead and write down the solver we are going to use. Essentially, all we need to do is form the right hand sides of the two equations defining \(P\) and \(U\), and then solve them with the Schur complement matrix and the mass matrix, respectively:

template <int dim>
void MixedLaplaceProblem<dim>::solve ()
{
  InverseMatrix<SparseMatrix<double> > inverse_mass (system_matrix.block(0,0));
  Vector<double> tmp (solution.block(0).size());
  {
    SchurComplement schur_complement (system_matrix, inverse_mass);
    Vector<double> schur_rhs (solution.block(1).size());
    inverse_mass.vmult (tmp, system_rhs.block(0));
    system_matrix.block(1,0).vmult (schur_rhs, tmp);
    schur_rhs -= system_rhs.block(1);
    SolverControl solver_control (solution.block(1).size(),
                                  1e-12*schur_rhs.l2_norm());
    SolverCG<> cg (solver_control);
    PreconditionIdentity preconditioner;
    cg.solve (schur_complement, solution.block(1), schur_rhs,
              preconditioner);
  }
  {
    system_matrix.block(0,1).vmult (tmp, solution.block(1));
    tmp *= -1;
    tmp += system_rhs.block(0);
    inverse_mass.vmult (solution.block(0), tmp);
  }
}

This code looks more impressive than it actually is. At the beginning, we declare an object representing \(M^{-1}\) and a temporary vector (of the size of the first block of the solution, i.e. with as many entries as there are velocity unknowns), and the two blocks surrounded by braces then solve the two equations for \(P\) and \(U\), in this order. Most of the code in each of the two blocks is actually devoted to constructing the proper right hand sides. For the first equation, this would be \(B^TM^{-1}F-G\), and \(-BP+F\) for the second one. The first hand side is then solved with the Schur complement matrix, and the second simply multiplied with \(M^{-1}\). The code as shown uses no preconditioner (i.e. the identity matrix as preconditioner) for the Schur complement.

A preconditioner for the Schur complement

One may ask whether it would help if we had a preconditioner for the Schur complement \(S=B^TM^{-1}B\). The general answer, as usual, is: of course. The problem is only, we don't know anything about this Schur complement matrix. We do not know its entries, all we know is its action. On the other hand, we have to realize that our solver is expensive since in each iteration we have to do one matrix-vector product with the Schur complement, which means that we have to do invert the mass matrix once in each iteration.

There are different approaches to preconditioning such a matrix. On the one extreme is to use something that is cheap to apply and therefore has no real impact on the work done in each iteration. The other extreme is a preconditioner that is itself very expensive, but in return really brings down the number of iterations required to solve with \(S\).

We will try something along the second approach, as much to improve the performance of the program as to demonstrate some techniques. To this end, let us recall that the ideal preconditioner is, of course, \(S^{-1}\), but that is unattainable. However, how about

\begin{eqnarray*} \tilde S^{-1} = [B^T ({\textrm{diag}\ }M)^{-1}B]^{-1} \end{eqnarray*}

as a preconditioner? That would mean that every time we have to do one preconditioning step, we actually have to solve with \(\tilde S\). At first, this looks almost as expensive as solving with \(S\) right away. However, note that in the inner iteration, we do not have to calculate \(M^{-1}\), but only the inverse of its diagonal, which is cheap.

The next step is to define a class that represents this approximate Schur complement. This should look very much like the Schur complement class itself, except that it doesn't need the object representing \(M^{-1}\) any more since we can compute the inverse of the diagonal of \(M\) on the fly:

class ApproximateSchurComplement : public Subscriptor
{
public:
  ApproximateSchurComplement (const BlockSparseMatrix<double> &A);
  void vmult (Vector<double>       &dst,
              const Vector<double> &src) const;
private:
  const SmartPointer<const BlockSparseMatrix<double> > system_matrix;
  mutable Vector<double> tmp1, tmp2;
};
void
ApproximateSchurComplement::vmult
  (Vector<double>       &dst,
   const Vector<double> &src) const
  {
    system_matrix->block(0,1).vmult (tmp1, src);
    system_matrix->block(0,0).precondition_Jacobi (tmp2, tmp1);
    system_matrix->block(1,0).vmult (dst, tmp2);
  }

Note how the vmult function differs in simply doing one Jacobi sweep (i.e. multiplying with the inverses of the diagonal) instead of multiplying with the full \(M^{-1}\). (This is how a single Jacobi preconditioner step with \(M\) is defined: it is the multiplication with the inverse of the diagonal of \(M\); in other words, the operation \(({\textrm{diag}\ }M)^{-1}x\) on a vector \(x\) is exactly what the function SparseMatrix::precondition_Jacobi above does.)

With all this, we nearly already have the preconditioner: it should be the inverse of the approximate Schur complement. We implement this with the InverseMatrix class:

ApproximateSchurComplement approximate_schur (system_matrix);
InverseMatrix<ApproximateSchurComplement> approximate_inverse
(approximate_schur);

That's all!

Taken together, the first block of our solve() function will then look like this:

SchurComplement schur_complement (system_matrix, inverse_mass);
Vector<double> schur_rhs (solution.block(1).size());
inverse_mass.vmult (tmp, system_rhs.block(0));
system_matrix.block(1,0).vmult (schur_rhs, tmp);
schur_rhs -= system_rhs.block(1);
SolverControl solver_control (solution.block(1).size(),
                              1e-12*schur_rhs.l2_norm());
SolverCG<> cg (solver_control);
ApproximateSchurComplement approximate_schur (system_matrix);
InverseMatrix<ApproximateSchurComplement> approximate_inverse
(approximate_schur);
cg.solve (schur_complement, solution.block(1), schur_rhs,
          approximate_inverse);

Note how we pass the so-defined preconditioner to the solver working on the Schur complement matrix.

Obviously, applying this inverse of the approximate Schur complement is a very expensive preconditioner, almost as expensive as inverting the Schur complement itself. We can expect it to significantly reduce the number of outer iterations required for the Schur complement. In fact it does: in a typical run on 5 times refined meshes using elements of order 0, the number of outer iterations drops from 164 to 12. On the other hand, we now have to apply a very expensive preconditioner 12 times. A better measure is therefore simply the run-time of the program: on my laptop, it drops from 28 to 23 seconds for this test case. That doesn't seem too impressive, but the savings become more pronounced on finer meshes and with elements of higher order. For example, a six times refined mesh and using elements of order 2 yields an improvement of 318 to 12 outer iterations, at a runtime of 338 seconds to 229 seconds. Not earth shattering, but significant.

A remark on similar functionality in deal.II

As a final remark about solvers and preconditioners, let us note that a significant amount of functionality introduced above is actually also present in the library itself. It probably even is more powerful and general, but we chose to introduce this material here anyway to demonstrate how to work with block matrices and to develop solvers and preconditioners, rather than using black box components from the library.

Definition of the test case

In this tutorial program, we will solve the Laplace equation in mixed formulation as stated above. Since we want to monitor convergence of the solution inside the program, we choose right hand side, boundary conditions, and the coefficient so that we recover a solution function known to us. In particular, we choose the pressure solution

\begin{eqnarray*} p = -\left(\frac \alpha 2 xy^2 + \beta x - \frac \alpha 6 x^3\right), \end{eqnarray*}

and for the coefficient we choose the unit matrix \(K_{ij}=\delta_{ij}\) for simplicity. Consequently, the exact velocity satisfies

\begin{eqnarray*} {\mathbf u} = \left(\begin{array}{cc} \frac \alpha 2 y^2 + \beta - \frac \alpha 2 x^2 \\ \alpha xy \end{array}\right). \end{eqnarray*}

This solution was chosen since it is exactly divergence free, making it a realistic test case for incompressible fluid flow. By consequence, the right hand side equals \(f=0\), and as boundary values we have to choose \(g=p|_{\partial\Omega}\).

For the computations in this program, we choose \(\alpha=0.3,\beta=1\). You can find the resulting solution in the results section below, after the commented program.

The commented program

Include files

Since this program is only an adaptation of step-4, there is not much new stuff in terms of header files. In deal.II, we usually list include files in the order base-lac-grid-dofs-fe-numerics, followed by C++ standard include files:

#include <deal.II/base/quadrature_lib.h>
#include <deal.II/base/logstream.h>
#include <deal.II/base/function.h>
#include <deal.II/lac/block_vector.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/block_sparse_matrix.h>
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/grid/tria.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria_accessor.h>
#include <deal.II/grid/tria_iterator.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_renumbering.h>
#include <deal.II/dofs/dof_accessor.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/fe_dgq.h>
#include <deal.II/fe/fe_system.h>
#include <deal.II/fe/fe_values.h>
#include <deal.II/numerics/vector_tools.h>
#include <deal.II/numerics/matrix_tools.h>
#include <deal.II/numerics/data_out.h>
#include <fstream>
#include <iostream>

This is the only significant new header, namely the one in which the Raviart-Thomas finite element is declared:

#include <deal.II/fe/fe_raviart_thomas.h>

Finally, as a bonus in this program, we will use a tensorial coefficient. Since it may have a spatial dependence, we consider it a tensor-valued function. The following include file provides the TensorFunction class that offers such functionality:

#include <deal.II/base/tensor_function.h>

The last step is as in all previous programs:

namespace Step20
{
  using namespace dealii;

The `MixedLaplaceProblem` class template

Again, since this is an adaptation of step-6, the main class is almost the same as the one in that tutorial program. In terms of member functions, the main differences are that the constructor takes the degree of the Raviart-Thomas element as an argument (and that there is a corresponding member variable to store this value) and the addition of the compute_error function in which, no surprise, we will compute the difference between the exact and the numerical solution to determine convergence of our computations:

template <int dim>
class MixedLaplaceProblem
{
public:
  MixedLaplaceProblem(const unsigned int degree);
  void run();
private:
  void make_grid_and_dofs();
  void assemble_system();
  void solve();
  void compute_errors() const;
  void output_results() const;
  const unsigned int degree;
  Triangulation<dim> triangulation;
  FESystem<dim>      fe;
  DoFHandler<dim>    dof_handler;

The second difference is that the sparsity pattern, the system matrix, and solution and right hand side vectors are now blocked. What this means and what one can do with such objects is explained in the introduction to this program as well as further down below when we explain the linear solvers and preconditioners for this problem:

  BlockSparsityPattern      sparsity_pattern;
  BlockSparseMatrix<double> system_matrix;
  BlockVector<double> solution;
  BlockVector<double> system_rhs;
};

Right hand side, boundary values, and exact solution

Our next task is to define the right hand side of our problem (i.e., the scalar right hand side for the pressure in the original Laplace equation), boundary values for the pressure, as well as a function that describes both the pressure and the velocity of the exact solution for later computations of the error. Note that these functions have one, one, and dim+1 components, respectively, and that we pass the number of components down to the Function<dim> base class. For the exact solution, we only declare the function that actually returns the entire solution vector (i.e. all components of it) at once. Here are the respective declarations:

template <int dim>
class RightHandSide : public Function<dim>
{
public:
  RightHandSide()
    : Function<dim>(1)
  {}
  virtual double value(const Point<dim> & p,
                       const unsigned int component = 0) const override;
};
template <int dim>
class PressureBoundaryValues : public Function<dim>
{
public:
  PressureBoundaryValues()
    : Function<dim>(1)
  {}
  virtual double value(const Point<dim> & p,
                       const unsigned int component = 0) const override;
};
template <int dim>
class ExactSolution : public Function<dim>
{
public:
  ExactSolution()
    : Function<dim>(dim + 1)
  {}
  virtual void vector_value(const Point<dim> &p,
                            Vector<double> &  value) const override;
};

And then we also have to define these respective functions, of course. Given our discussion in the introduction of how the solution should look like, the following computations should be straightforward:

template <int dim>
double RightHandSide<dim>::value(const Point<dim> & / *p* /,
                                 const unsigned int / *component* /) const
{
  return 0;
}
template <int dim>
double
PressureBoundaryValues<dim>::value(const Point<dim> &p,
                                   const unsigned int / *component* /) const
{
  const double alpha = 0.3;
  const double beta  = 1;
  return -(alpha * p[0] * p[1] * p[1] / 2 + beta * p[0] -
           alpha * p[0] * p[0] * p[0] / 6);
}
template <int dim>
void ExactSolution<dim>::vector_value(const Point<dim> &p,
                                      Vector<double> &  values) const
{
  Assert(values.size() == dim + 1,
         ExcDimensionMismatch(values.size(), dim + 1));
  const double alpha = 0.3;
  const double beta  = 1;
  values(0) = alpha * p[1] * p[1] / 2 + beta - alpha * p[0] * p[0] / 2;
  values(1) = alpha * p[0] * p[1];
  values(2) = -(alpha * p[0] * p[1] * p[1] / 2 + beta * p[0] -
                alpha * p[0] * p[0] * p[0] / 6);
}

The inverse permeability tensor

In addition to the other equation data, we also want to use a permeability tensor, or better – because this is all that appears in the weak form – the inverse of the permeability tensor, KInverse. For the purpose of verifying the exactness of the solution and determining convergence orders, this tensor is more in the way than helpful. We will therefore simply set it to the identity matrix.

However, a spatially varying permeability tensor is indispensable in real-life porous media flow simulations, and we would like to use the opportunity to demonstrate the technique to use tensor valued functions.

Possibly unsurprising, deal.II also has a base class not only for scalar and generally vector-valued functions (the Function base class) but also for functions that return tensors of fixed dimension and rank, the TensorFunction template. Here, the function under consideration returns a dim-by-dim matrix, i.e. a tensor of rank 2 and dimension dim. We then choose the template arguments of the base class appropriately.

The interface that the TensorFunction class provides is essentially equivalent to the Function class. In particular, there exists a value_list function that takes a list of points at which to evaluate the function, and returns the values of the function in the second argument, a list of tensors:

template <int dim>
class KInverse : public TensorFunction<2, dim>
{
public:
  KInverse()
    : TensorFunction<2, dim>()
  {}
  virtual void value_list(const std::vector<Point<dim>> &points,
                          std::vector<Tensor<2, dim>> &values) const override;
};

The implementation is less interesting. As in previous examples, we add a check to the beginning of the class to make sure that the sizes of input and output parameters are the same (see step-5 for a discussion of this technique). Then we loop over all evaluation points, and for each one first clear the output tensor and then set all its diagonal elements to one (i.e. fill the tensor with the identity matrix):

template <int dim>
void KInverse<dim>::value_list(const std::vector<Point<dim>> &points,
                               std::vector<Tensor<2, dim>> &  values) const
{

The value we are going to store for a given point does not depend on its coordinates and we use the points object only for checking that the values object has the correct size. In release mode, AssertDimension is defined empty and the compiler will complain that the points object is unused. The following line silences this warning.

  (void)points;
  AssertDimension(points.size(), values.size());
  for (auto &value : values)
    {
      value.clear();
      for (unsigned int d = 0; d < dim; ++d)
        value[d][d] = 1.;
    }
}

MixedLaplaceProblem class implementation

MixedLaplaceProblem::MixedLaplaceProblem

In the constructor of this class, we first store the value that was passed in concerning the degree of the finite elements we shall use (a degree of zero, for example, means to use RT(0) and DG(0)), and then construct the vector valued element belonging to the space \(X_h\) described in the introduction. The rest of the constructor is as in the early tutorial programs.

The only thing worth describing here is the constructor call of the fe variable. The FESystem class to which this variable belongs has a number of different constructors that all refer to binding simpler elements together into one larger element. In the present case, we want to couple a single RT(degree) element with a single DQ(degree) element. The constructor to FESystem that does this requires us to specify first the first base element (the FE_RaviartThomas object of given degree) and then the number of copies for this base element, and then similarly the kind and number of FE_DGQ elements. Note that the Raviart-Thomas element already has dim vector components, so that the coupled element will have dim+1 vector components, the first dim of which correspond to the velocity variable whereas the last one corresponds to the pressure.

It is also worth comparing the way we constructed this element from its base elements, with the way we have done so in step-8: there, we have built it as fe (FE_Q<dim>(1), dim), i.e. we have simply used dim copies of the FE_Q(1) element, one copy for the displacement in each coordinate direction.

template <int dim>
MixedLaplaceProblem<dim>::MixedLaplaceProblem(const unsigned int degree)
  : degree(degree)
  , fe(FE_RaviartThomas<dim>(degree), 1, FE_DGQ<dim>(degree), 1)
  , dof_handler(triangulation)
{}

MixedLaplaceProblem::make_grid_and_dofs

This next function starts out with well-known functions calls that create and refine a mesh, and then associate degrees of freedom with it:

template <int dim>
void MixedLaplaceProblem<dim>::make_grid_and_dofs()
{
  GridGenerator::hyper_cube(triangulation, -1, 1);
  triangulation.refine_global(3);
  dof_handler.distribute_dofs(fe);

However, then things become different. As mentioned in the introduction, we want to subdivide the matrix into blocks corresponding to the two different kinds of variables, velocity and pressure. To this end, we first have to make sure that the indices corresponding to velocities and pressures are not intermingled: First all velocity degrees of freedom, then all pressure DoFs. This way, the global matrix separates nicely into a \(2 \times 2\) system. To achieve this, we have to renumber degrees of freedom base on their vector component, an operation that conveniently is already implemented:

DoFRenumbering::component_wise(dof_handler);

The next thing is that we want to figure out the sizes of these blocks so that we can allocate an appropriate amount of space. To this end, we call the DoFTools::count_dofs_per_component() function that counts how many shape functions are non-zero for a particular vector component. We have dim+1 vector components, and DoFTools::count_dofs_per_component() will count how many shape functions belong to each of these components.

There is one problem here. As described in the documentation of that function, it wants to put the number of \(x\)-velocity shape functions into dofs_per_component[0], the number of \(y\)-velocity shape functions into dofs_per_component[1] (and similar in 3d), and the number of pressure shape functions into dofs_per_component[dim]. But, the Raviart-Thomas element is special in that it is non-primitive, i.e., for Raviart-Thomas elements all velocity shape functions are nonzero in all components. In other words, the function cannot distinguish between \(x\) and \(y\) velocity functions because there is no such distinction. It therefore puts the overall number of velocity into each of dofs_per_component[c], \(0\le c\le \text{dim}\). On the other hand, the number of pressure variables equals the number of shape functions that are nonzero in the dim-th component.

Using this knowledge, we can get the number of velocity shape functions from any of the first dim elements of dofs_per_component, and then use this below to initialize the vector and matrix block sizes, as well as create output.

Note: If you find this concept difficult to understand, you may want to consider using the function DoFTools::count_dofs_per_block() instead, as we do in the corresponding piece of code in step-22. You might also want to read up on the difference between blocks and components in the glossary.

std::vector<types::global_dof_index> dofs_per_component(dim + 1);
DoFTools::count_dofs_per_component(dof_handler, dofs_per_component);
const unsigned int n_u = dofs_per_component[0],
                   n_p = dofs_per_component[dim];
std::cout << "Number of active cells: " << triangulation.n_active_cells()
          << std::endl
          << "Total number of cells: " << triangulation.n_cells()
          << std::endl
          << "Number of degrees of freedom: " << dof_handler.n_dofs()
          << " (" << n_u << '+' << n_p << ')' << std::endl;

The next task is to allocate a sparsity pattern for the matrix that we will create. We use a compressed sparsity pattern like in the previous steps, but as system_matrix is a block matrix we use the class BlockDynamicSparsityPattern instead of just DynamicSparsityPattern. This block sparsity pattern has four blocks in a \(2 \times 2\) pattern. The blocks' sizes depend on n_u and n_p, which hold the number of velocity and pressure variables. In the second step we have to instruct the block system to update its knowledge about the sizes of the blocks it manages; this happens with the dsp.collect_sizes () call.

BlockDynamicSparsityPattern dsp(2, 2);
dsp.block(0, 0).reinit(n_u, n_u);
dsp.block(1, 0).reinit(n_p, n_u);
dsp.block(0, 1).reinit(n_u, n_p);
dsp.block(1, 1).reinit(n_p, n_p);
dsp.collect_sizes();
DoFTools::make_sparsity_pattern(dof_handler, dsp);

We use the compressed block sparsity pattern in the same way as the non-block version to create the sparsity pattern and then the system matrix:

sparsity_pattern.copy_from(dsp);

system_matrix.reinit(sparsity_pattern);

Then we have to resize the solution and right hand side vectors in exactly the same way as the block compressed sparsity pattern:

  solution.reinit(2);
  solution.block(0).reinit(n_u);
  solution.block(1).reinit(n_p);
  solution.collect_sizes();
  system_rhs.reinit(2);
  system_rhs.block(0).reinit(n_u);
  system_rhs.block(1).reinit(n_p);
  system_rhs.collect_sizes();
}

MixedLaplaceProblem::assemble_system

Similarly, the function that assembles the linear system has mostly been discussed already in the introduction to this example. At its top, what happens are all the usual steps, with the addition that we do not only allocate quadrature and FEValues objects for the cell terms, but also for face terms. After that, we define the usual abbreviations for variables, and the allocate space for the local matrix and right hand side contributions, and the array that holds the global numbers of the degrees of freedom local to the present cell.

template <int dim>
void MixedLaplaceProblem<dim>::assemble_system()
{
  QGauss<dim>     quadrature_formula(degree + 2);
  QGauss<dim - 1> face_quadrature_formula(degree + 2);
  FEValues<dim>     fe_values(fe,
                          quadrature_formula,
                          update_values | update_gradients |
                            update_quadrature_points | update_JxW_values);
  FEFaceValues<dim> fe_face_values(fe,
                                   face_quadrature_formula,
                                   update_values | update_normal_vectors |
                                     update_quadrature_points |
                                     update_JxW_values);
  const unsigned int dofs_per_cell   = fe.dofs_per_cell;
  const unsigned int n_q_points      = quadrature_formula.size();
  const unsigned int n_face_q_points = face_quadrature_formula.size();
  FullMatrix<double> local_matrix(dofs_per_cell, dofs_per_cell);
  Vector<double>     local_rhs(dofs_per_cell);
  std::vector<types::global_dof_index> local_dof_indices(dofs_per_cell);

The next step is to declare objects that represent the source term, pressure boundary value, and coefficient in the equation. In addition to these objects that represent continuous functions, we also need arrays to hold their values at the quadrature points of individual cells (or faces, for the boundary values). Note that in the case of the coefficient, the array has to be one of matrices.

const RightHandSide<dim>          right_hand_side;
const PressureBoundaryValues<dim> pressure_boundary_values;
const KInverse<dim>               k_inverse;
std::vector<double>         rhs_values(n_q_points);
std::vector<double>         boundary_values(n_face_q_points);
std::vector<Tensor<2, dim>> k_inverse_values(n_q_points);

Finally, we need a couple of extractors that we will use to get at the velocity and pressure components of vector-valued shape functions. Their function and use is described in detail in the Handling vector valued problems report. Essentially, we will use them as subscripts on the FEValues objects below: the FEValues object describes all vector components of shape functions, while after subscription, it will only refer to the velocities (a set of dim components starting at component zero) or the pressure (a scalar component located at position dim):

const FEValuesExtractors::Vector velocities(0);

const FEValuesExtractors::Scalar pressure(dim);

With all this in place, we can go on with the loop over all cells. The body of this loop has been discussed in the introduction, and will not be commented any further here:

for (const auto &cell : dof_handler.active_cell_iterators())
  {
    fe_values.reinit(cell);
    local_matrix = 0;
    local_rhs    = 0;
    right_hand_side.value_list(fe_values.get_quadrature_points(),
                               rhs_values);
    k_inverse.value_list(fe_values.get_quadrature_points(),
                         k_inverse_values);
    for (unsigned int q = 0; q < n_q_points; ++q)
      for (unsigned int i = 0; i < dofs_per_cell; ++i)
        {
          const Tensor<1, dim> phi_i_u = fe_values[velocities].value(i, q);
          const double div_phi_i_u = fe_values[velocities].divergence(i, q);
          const double phi_i_p     = fe_values[pressure].value(i, q);
          for (unsigned int j = 0; j < dofs_per_cell; ++j)
            {
              const Tensor<1, dim> phi_j_u =
                fe_values[velocities].value(j, q);
              const double div_phi_j_u =
                fe_values[velocities].divergence(j, q);
              const double phi_j_p = fe_values[pressure].value(j, q);
              local_matrix(i, j) +=
                (phi_i_u * k_inverse_values[q] * phi_j_u //
                 - phi_i_p * div_phi_j_u                 //
                 - div_phi_i_u * phi_j_p)                //
                * fe_values.JxW(q);
            }
          local_rhs(i) += -phi_i_p * rhs_values[q] * fe_values.JxW(q);
        }
    for (unsigned int face_n = 0;
         face_n < GeometryInfo<dim>::faces_per_cell;
         ++face_n)
      if (cell->at_boundary(face_n))
        {
          fe_face_values.reinit(cell, face_n);
          pressure_boundary_values.value_list(
            fe_face_values.get_quadrature_points(), boundary_values);
          for (unsigned int q = 0; q < n_face_q_points; ++q)
            for (unsigned int i = 0; i < dofs_per_cell; ++i)
              local_rhs(i) += -(fe_face_values[velocities].value(i, q) * //
                                fe_face_values.normal_vector(q) *        //
                                boundary_values[q] *                     //
                                fe_face_values.JxW(q));
        }

The final step in the loop over all cells is to transfer local contributions into the global matrix and right hand side vector. Note that we use exactly the same interface as in previous examples, although we now use block matrices and vectors instead of the regular ones. In other words, to the outside world, block objects have the same interface as matrices and vectors, but they additionally allow to access individual blocks.

      cell->get_dof_indices(local_dof_indices);
      for (unsigned int i = 0; i < dofs_per_cell; ++i)
        for (unsigned int j = 0; j < dofs_per_cell; ++j)
          system_matrix.add(local_dof_indices[i],
                            local_dof_indices[j],
                            local_matrix(i, j));
      for (unsigned int i = 0; i < dofs_per_cell; ++i)
        system_rhs(local_dof_indices[i]) += local_rhs(i);
    }
}

Linear solvers and preconditioners

The linear solvers and preconditioners we use in this example have been discussed in significant detail already in the introduction. We will therefore not discuss the rationale for these classes here any more, but rather only comment on implementational aspects.

The `InverseMatrix` class template

There are a few places in this program where we will need either the action of the inverse of the mass matrix or the action of the inverse of the approximate Schur complement. Rather than explicitly calling SolverCG::solve every time that we need to solve such a system, we will wrap the action of either inverse in a simple class. The only things we would like to note are that this class is derived from Subscriptor and, as mentioned above, it stores a pointer to the underlying matrix with a SmartPointer object. This class also appears in step-21 and a more advanced version of it appears in step-22.

template <class MatrixType>
class InverseMatrix : public Subscriptor
{
public:
  InverseMatrix(const MatrixType &m);
  void vmult(Vector<double> &dst, const Vector<double> &src) const;
private:
  const SmartPointer<const MatrixType> matrix;
};
template <class MatrixType>
InverseMatrix<MatrixType>::InverseMatrix(const MatrixType &m)
  : matrix(&m)
{}
template <class MatrixType>
void InverseMatrix<MatrixType>::vmult(Vector<double> &      dst,
                                      const Vector<double> &src) const
{

To make the control flow simpler, we recreate both the ReductionControl and SolverCG objects every time this is called. This is not the most efficient choice because SolverCG instances allocate memory whenever they are created; this is just a tutorial so such inefficiencies are acceptable for the sake of exposition.

  SolverControl solver_control(std::max<unsigned int>(src.size(), 200),
                               1e-8 * src.l2_norm());
  SolverCG<>    cg(solver_control);
  dst = 0;
  cg.solve(*matrix, dst, src, PreconditionIdentity());
}

The `SchurComplement` class

The next class is the Schur complement class. Its rationale has also been discussed in length in the introduction. Like InverseMatrix, this class is derived from Subscriptor and stores SmartPointer s pointing to the system matrix and InverseMatrix wrapper.

The vmult function requires two temporary vectors that we do not want to re-allocate and free every time we call this function. Since here, we have full control over the use of these vectors (unlike above, where a class called by the vmult function required these vectors, not the vmult function itself), we allocate them directly, rather than going through the VectorMemory mechanism. However, again, these member variables do not carry any state between successive calls to the member functions of this class (i.e., we never care what values they were set to the last time a member function was called), we mark these vectors as mutable.

The rest of the (short) implementation of this class is straightforward if you know the order of matrix-vector multiplications performed by the vmult function:

class SchurComplement : public Subscriptor
{
public:
  SchurComplement(const BlockSparseMatrix<double> &          A,
                  const InverseMatrix<SparseMatrix<double>> &Minv);
  void vmult(Vector<double> &dst, const Vector<double> &src) const;
private:
  const SmartPointer<const BlockSparseMatrix<double>>           system_matrix;
  const SmartPointer<const InverseMatrix<SparseMatrix<double>>> m_inverse;
  mutable Vector<double> tmp1, tmp2;
};
SchurComplement ::SchurComplement(
  const BlockSparseMatrix<double> &          A,
  const InverseMatrix<SparseMatrix<double>> &Minv)
  : system_matrix(&A)
  , m_inverse(&Minv)
  , tmp1(A.block(0, 0).m())
  , tmp2(A.block(0, 0).m())
{}
void SchurComplement::vmult(Vector<double> &      dst,
                            const Vector<double> &src) const
{
  system_matrix->block(0, 1).vmult(tmp1, src);
  m_inverse->vmult(tmp2, tmp1);
  system_matrix->block(1, 0).vmult(dst, tmp2);
}

The `ApproximateSchurComplement` class

The third component of our solver and preconditioner system is the class that approximates the Schur complement with the method described in the introduction. We will use this class to build a preconditioner for our system matrix.

class ApproximateSchurComplement : public Subscriptor
{
public:
  ApproximateSchurComplement(const BlockSparseMatrix<double> &A);
  void vmult(Vector<double> &dst, const Vector<double> &src) const;
private:
  const SmartPointer<const BlockSparseMatrix<double>> system_matrix;
  mutable Vector<double> tmp1, tmp2;
};
ApproximateSchurComplement::ApproximateSchurComplement(
  const BlockSparseMatrix<double> &A)
  : system_matrix(&A)
  , tmp1(A.block(0, 0).m())
  , tmp2(A.block(0, 0).m())
{}
void ApproximateSchurComplement::vmult(Vector<double> &      dst,
                                       const Vector<double> &src) const
{
  system_matrix->block(0, 1).vmult(tmp1, src);
  system_matrix->block(0, 0).precondition_Jacobi(tmp2, tmp1);
  system_matrix->block(1, 0).vmult(dst, tmp2);
}

MixedLaplace::solve

After all these preparations, we can finally write the function that actually solves the linear problem. We will go through the two parts it has that each solve one of the two equations, the first one for the pressure (component 1 of the solution), then the velocities (component 0 of the solution).

template <int dim>
void MixedLaplaceProblem<dim>::solve()
{
  InverseMatrix<SparseMatrix<double>> inverse_mass(system_matrix.block(0, 0));
  Vector<double>                      tmp(solution.block(0).size());

Now on to the first equation. The right hand side of it is \(B^TM^{-1}F-G\), which is what we compute in the first few lines:

{
  SchurComplement schur_complement(system_matrix, inverse_mass);
  Vector<double>  schur_rhs(solution.block(1).size());
  inverse_mass.vmult(tmp, system_rhs.block(0));
  system_matrix.block(1, 0).vmult(schur_rhs, tmp);
  schur_rhs -= system_rhs.block(1);

Now that we have the right hand side we can go ahead and solve for the pressure, using our approximation of the inverse as a preconditioner:

  SolverControl solver_control(solution.block(1).size(),
                               1e-12 * schur_rhs.l2_norm());
  SolverCG<>    cg(solver_control);
  ApproximateSchurComplement approximate_schur(system_matrix);
  InverseMatrix<ApproximateSchurComplement> approximate_inverse(
    approximate_schur);
  cg.solve(schur_complement,
           solution.block(1),
           schur_rhs,
           approximate_inverse);
  std::cout << solver_control.last_step()
            << " CG Schur complement iterations to obtain convergence."
            << std::endl;
}

After we have the pressure, we can compute the velocity. The equation reads \(MU=-BP+F\), and we solve it by first computing the right hand side, and then multiplying it with the object that represents the inverse of the mass matrix:

  {
    system_matrix.block(0, 1).vmult(tmp, solution.block(1));
    tmp *= -1;
    tmp += system_rhs.block(0);
    inverse_mass.vmult(solution.block(0), tmp);
  }
}

MixedLaplaceProblem class implementation (continued)

MixedLaplace::compute_errors

After we have dealt with the linear solver and preconditioners, we continue with the implementation of our main class. In particular, the next task is to compute the errors in our numerical solution, in both the pressures as well as velocities.

To compute errors in the solution, we have already introduced the VectorTools::integrate_difference function in step-7 and step-11. However, there we only dealt with scalar solutions, whereas here we have a vector-valued solution with components that even denote different quantities and may have different orders of convergence (this isn't the case here, by choice of the used finite elements, but is frequently the case in mixed finite element applications). What we therefore have to do is to `mask' the components that we are interested in. This is easily done: the VectorTools::integrate_difference function takes as one of its arguments a pointer to a weight function (the parameter defaults to the null pointer, meaning unit weights). What we have to do is to pass a function object that equals one in the components we are interested in, and zero in the other ones. For example, to compute the pressure error, we should pass a function that represents the constant vector with a unit value in component dim, whereas for the velocity the constant vector should be one in the first dim components, and zero in the location of the pressure.

In deal.II, the ComponentSelectFunction does exactly this: it wants to know how many vector components the function it is to represent should have (in our case this would be dim+1, for the joint velocity-pressure space) and which individual or range of components should be equal to one. We therefore define two such masks at the beginning of the function, following by an object representing the exact solution and a vector in which we will store the cellwise errors as computed by integrate_difference:

template <int dim>
void MixedLaplaceProblem<dim>::compute_errors() const
{
  const ComponentSelectFunction<dim> pressure_mask(dim, dim + 1);
  const ComponentSelectFunction<dim> velocity_mask(std::make_pair(0, dim),
                                                   dim + 1);
  ExactSolution<dim> exact_solution;
  Vector<double>     cellwise_errors(triangulation.n_active_cells());

As already discussed in step-7, we have to realize that it is impossible to integrate the errors exactly. All we can do is approximate this integral using quadrature. This actually presents a slight twist here: if we naively chose an object of type QGauss<dim>(degree+1) as one may be inclined to do (this is what we used for integrating the linear system), one realizes that the error is very small and does not follow the expected convergence curves at all. What is happening is that for the mixed finite elements used here, the Gauss points happen to be superconvergence points in which the pointwise error is much smaller (and converges with higher order) than anywhere else. These are therefore not particularly good points for integration. To avoid this problem, we simply use a trapezoidal rule and iterate it degree+2 times in each coordinate direction (again as explained in step-7):

QTrapez<1> q_trapez;

QIterated<dim> quadrature(q_trapez, degree + 2);

With this, we can then let the library compute the errors and output them to the screen:

  VectorTools::integrate_difference(dof_handler,
                                    solution,
                                    exact_solution,
                                    cellwise_errors,
                                    quadrature,
                                    VectorTools::L2_norm,
                                    &pressure_mask);
  const double p_l2_error =
    VectorTools::compute_global_error(triangulation,
                                      cellwise_errors,
                                      VectorTools::L2_norm);
  VectorTools::integrate_difference(dof_handler,
                                    solution,
                                    exact_solution,
                                    cellwise_errors,
                                    quadrature,
                                    VectorTools::L2_norm,
                                    &velocity_mask);
  const double u_l2_error =
    VectorTools::compute_global_error(triangulation,
                                      cellwise_errors,
                                      VectorTools::L2_norm);
  std::cout << "Errors: ||e_p||_L2 = " << p_l2_error
            << ",   ||e_u||_L2 = " << u_l2_error << std::endl;
}

MixedLaplace::output_results

The last interesting function is the one in which we generate graphical output. Note that all velocity components get the same solution name "u". Together with using DataComponentInterpretation::::component_is_part_of_vector this will cause DataOut<dim>::write_vtu() to generate a vector representation of the individual velocity components, see step-22 or the Generating graphical output section of the Handling vector valued problems module for more information. Finally, it seems inappropriate for higher order elements to only show a single bilinear quadrilateral per cell in the graphical output. We therefore generate patches of size (degree+1)x(degree+1) to capture the full information content of the solution. See the step-7 tutorial program for more information on this.

template <int dim>
void MixedLaplaceProblem<dim>::output_results() const
{
  std::vector<std::string> solution_names(dim, "u");
  solution_names.emplace_back("p");
  std::vector<DataComponentInterpretation::DataComponentInterpretation>
    interpretation(dim,
                   DataComponentInterpretation::component_is_part_of_vector);
  interpretation.push_back(DataComponentInterpretation::component_is_scalar);
  DataOut<dim> data_out;
  data_out.add_data_vector(dof_handler,
                           solution,
                           solution_names,
                           interpretation);
  data_out.build_patches(degree + 1);
  std::ofstream output("solution.vtu");
  data_out.write_vtu(output);
}

MixedLaplace::run

This is the final function of our main class. It's only job is to call the other functions in their natural order:

  template <int dim>
  void MixedLaplaceProblem<dim>::run()
  {
    make_grid_and_dofs();
    assemble_system();
    solve();
    compute_errors();
    output_results();
  }
} // namespace Step20

The `main` function

The main function we stole from step-6 instead of step-4. It is almost equal to the one in step-6 (apart from the changed class names, of course), the only exception is that we pass the degree of the finite element space to the constructor of the mixed Laplace problem (here, we use zero-th order elements).

int main()
{
  try
    {
      using namespace dealii;
      using namespace Step20;
      MixedLaplaceProblem<2> mixed_laplace_problem(0);
      mixed_laplace_problem.run();
    }
  catch (std::exception &exc)
    {
      std::cerr << std::endl
                << std::endl
                << "----------------------------------------------------"
                << std::endl;
      std::cerr << "Exception on processing: " << std::endl
                << exc.what() << std::endl
                << "Aborting!" << std::endl
                << "----------------------------------------------------"
                << std::endl;
      return 1;
    }
  catch (...)
    {
      std::cerr << std::endl
                << std::endl
                << "----------------------------------------------------"
                << std::endl;
      std::cerr << "Unknown exception!" << std::endl
                << "Aborting!" << std::endl
                << "----------------------------------------------------"
                << std::endl;
      return 1;
    }
  return 0;
}

Results

Output of the program and graphical visualization

If we run the program as is, we get this output for the \(8\times 8\) mesh we use (for a total of 64 cells with 64 pressure degrees of freedom since we use piecewise constants, and 144 velocities because the Raviart-Thomas element defines one degree per freedom per face and there are 72 faces parallel to the \(x\)-axis and the same number parallel to the \(y\)-axis):

$ make run
[ 66%] Built target step-20
Scanning dependencies of target run
[100%] Run step-20 with Release configuration
Number of active cells: 64
Total number of cells: 85
Number of degrees of freedom: 208 (144+64)
15 CG Schur complement iterations to obtain convergence.
Errors: ||e_p||_L2 = 0.178055,   ||e_u||_L2 = 0.0433435
[100%] Built target run

The fact that the number of iterations is so small, of course, is due to good (but expensive!) preconditioner we have developed. To get confidence in the solution, let us take a look at it. The following three images show (from left to right) the x-velocity, the y-velocity, and the pressure:

Let us start with the pressure: it is highest at the left and lowest at the right, so flow will be from left to right. In addition, though hardly visible in the graph, we have chosen the pressure field such that the flow left-right flow first channels towards the center and then outward again. Consequently, the x-velocity has to increase to get the flow through the narrow part, something that can easily be seen in the left image. The middle image represents inward flow in y-direction at the left end of the domain, and outward flow in y-direction at the right end of the domain.

As an additional remark, note how the x-velocity in the left image is only continuous in x-direction, whereas the y-velocity is continuous in y-direction. The flow fields are discontinuous in the other directions. This very obviously reflects the continuity properties of the Raviart-Thomas elements, which are, in fact, only in the space H(div) and not in the space \(H^1\). Finally, the pressure field is completely discontinuous, but that should not surprise given that we have chosen FE_DGQ(0) as the finite element for that solution component.

Convergence

The program offers two obvious places where playing and observing convergence is in order: the degree of the finite elements used (passed to the constructor of the MixedLaplaceProblem class from main()), and the refinement level (determined in MixedLaplaceProblem::make_grid_and_dofs). What one can do is to change these values and observe the errors computed later on in the course of the program run.

If one does this, one finds the following pattern for the \(L_2\) error in the pressure variable:

	Finite element order
Refinement level	0	1	2
0	1.45344	0.0831743	0.0235186
1	0.715099	0.0245341	0.00293983
2	0.356383	0.0063458	0.000367478
3	0.178055	0.00159944	4.59349e-05
4	0.0890105	0.000400669	5.74184e-06
5	0.0445032	0.000100218	7.17799e-07
6	0.0222513	2.50576e-05	9.0164e-08
	\(O(h)\)	\(O(h^2)\)	\(O(h^3)\)

The theoretically expected convergence orders are very nicely reflected by the experimentally observed ones indicated in the last row of the table.

One can make the same experiment with the \(L_2\) error in the velocity variables:

	Finite element order
Refinement level	0	1	2
0	0.367423	0.127657	5.10388e-14
1	0.175891	0.0319142	9.04414e-15
2	0.0869402	0.00797856	1.23723e-14
3	0.0433435	0.00199464	1.86345e-07
4	0.0216559	0.00049866	2.72566e-07
5	0.010826	0.000124664	3.57141e-07
6	0.00541274	3.1166e-05	4.46124e-07
	\(O(h)\)	\(O(h^2)\)	\(O(h^3)\)

The result concerning the convergence order is the same here.

Possibilities for extensions

More realistic permeability fields

Realistic flow computations for ground water or oil reservoir simulations will not use a constant permeability. Here's a first, rather simple way to change this situation: we use a permeability that decays very rapidly away from a central flowline until it hits a background value of 0.001. This is to mimic the behavior of fluids in sandstone: in most of the domain, the sandstone is homogeneous and, while permeable to fluids, not overly so; on the other stone, the stone has cracked, or faulted, along one line, and the fluids flow much easier along this large crack. Here is how we could implement something like this:

template <int dim>
void
KInverse<dim>::value_list (const std::vector<Point<dim> > &points,
                           std::vector<Tensor<2,dim> >    &values) const
{
  Assert (points.size() == values.size(),
          ExcDimensionMismatch (points.size(), values.size()));
  for (unsigned int p=0; p<points.size(); ++p)
    {
      values[p].clear ();
      const double distance_to_flowline
        = std::fabs(points[p][1]-0.2*std::sin(10*points[p][0]));
      const double permeability = std::max(std::exp(-(distance_to_flowline*
                                                      distance_to_flowline)
                                                    / (0.1 * 0.1)),
                                           0.001);
      for (unsigned int d=0; d<dim; ++d)
        values[p][d][d] = 1./permeability;
    }
}

Remember that the function returns the inverse of the permeability tensor.

With a significantly higher mesh resolution, we can visualize this, here with x- and y-velocity:

It is obvious how fluids flow essentially only along the middle line, and not anywhere else.

Another possibility would be to use a random permeability field. A simple way to achieve this would be to scatter a number of centers around the domain and then use a permeability field that is the sum of (negative) exponentials for each of these centers. Flow would then try to hop from one center of high permeability to the next one. This is an entirely unscientific attempt at describing a random medium, but one possibility to implement this behavior would look like this:

template <int dim>
class KInverse : public TensorFunction<2,dim>
{
  public:
    KInverse ();
    virtual void value_list (const std::vector<Point<dim> > &points,
                             std::vector<Tensor<2,dim> >    &values) const;
  private:
    std::vector<Point<dim> > centers;
};
template <int dim>
KInverse<dim>::KInverse ()
{
  const unsigned int N = 40;
  centers.resize (N);
  for (unsigned int i=0; i<N; ++i)
    for (unsigned int d=0; d<dim; ++d)
      centers[i][d] = 2.*rand()/RAND_MAX-1;
}
template <int dim>
void
KInverse<dim>::value_list (const std::vector<Point<dim> > &points,
                           std::vector<Tensor<2,dim> >    &values) const
{
  Assert (points.size() == values.size(),
          ExcDimensionMismatch (points.size(), values.size()));
  for (unsigned int p=0; p<points.size(); ++p)
    {
      values[p].clear ();
      double permeability = 0;
      for (unsigned int i=0; i<centers.size(); ++i)
        permeability += std::exp(-(points[p]-centers[i]).square()
                                 / (0.1 * 0.1));
      const double normalized_permeability
        = std::max(permeability, 0.005);
      for (unsigned int d=0; d<dim; ++d)
        values[p][d][d] = 1./normalized_permeability;
    }
}

A piecewise constant interpolation of the diagonal elements of the inverse of this tensor (i.e., of normalized_permeability) looks as follows:

With a permeability field like this, we would get x-velocities and pressures as follows:

We will use these permeability fields again in step-21 and step-43.

Better linear solvers

As mentioned in the introduction, the Schur complement solver used here is not the best one conceivable (nor is it intended to be a particularly good one). Better ones can be found in the literature and can be built using the same block matrix techniques that were introduced here. We pick up on this theme again in step-22, where we first build a Schur complement solver for the Stokes equation as we did here, and then in the Improved Solvers section discuss better ways based on solving the system as a whole but preconditioning based on individual blocks. We will also come back to this in step-43.

The plain program

/* ---------------------------------------------------------------------
 *
 * Copyright (C) 2005 - 2018 by the deal.II authors
 *
 * This file is part of the deal.II library.
 *
 * The deal.II library is free software; you can use it, redistribute
 * it, and/or modify it under the terms of the GNU Lesser General
 * Public License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 * The full text of the license can be found in the file LICENSE.md at
 * the top level directory of deal.II.
 *
 * ---------------------------------------------------------------------
 *
 * Author: Wolfgang Bangerth, Texas A&M University, 2005, 2006
 */
#include <deal.II/base/quadrature_lib.h>
#include <deal.II/base/logstream.h>
#include <deal.II/base/function.h>
#include <deal.II/lac/block_vector.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/block_sparse_matrix.h>
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/grid/tria.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria_accessor.h>
#include <deal.II/grid/tria_iterator.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_renumbering.h>
#include <deal.II/dofs/dof_accessor.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/fe_dgq.h>
#include <deal.II/fe/fe_system.h>
#include <deal.II/fe/fe_values.h>
#include <deal.II/numerics/vector_tools.h>
#include <deal.II/numerics/matrix_tools.h>
#include <deal.II/numerics/data_out.h>
#include <fstream>
#include <iostream>
#include <deal.II/fe/fe_raviart_thomas.h>
#include <deal.II/base/tensor_function.h>
namespace Step20
{
  using namespace dealii;
  template <int dim>
  class MixedLaplaceProblem
  {
  public:
    MixedLaplaceProblem(const unsigned int degree);
    void run();
  private:
    void make_grid_and_dofs();
    void assemble_system();
    void solve();
    void compute_errors() const;
    void output_results() const;
    const unsigned int degree;
    Triangulation<dim> triangulation;
    FESystem<dim>      fe;
    DoFHandler<dim>    dof_handler;
    BlockSparsityPattern      sparsity_pattern;
    BlockSparseMatrix<double> system_matrix;
    BlockVector<double> solution;
    BlockVector<double> system_rhs;
  };
  template <int dim>
  class RightHandSide : public Function<dim>
  {
  public:
    RightHandSide()
      : Function<dim>(1)
    {}
    virtual double value(const Point<dim> & p,
                         const unsigned int component = 0) const override;
  };
  template <int dim>
  class PressureBoundaryValues : public Function<dim>
  {
  public:
    PressureBoundaryValues()
      : Function<dim>(1)
    {}
    virtual double value(const Point<dim> & p,
                         const unsigned int component = 0) const override;
  };
  template <int dim>
  class ExactSolution : public Function<dim>
  {
  public:
    ExactSolution()
      : Function<dim>(dim + 1)
    {}
    virtual void vector_value(const Point<dim> &p,
                              Vector<double> &  value) const override;
  };
  template <int dim>
  double RightHandSide<dim>::value(const Point<dim> & /*p*/,
                                   const unsigned int /*component*/) const
  {
    return 0;
  }
  template <int dim>
  double
  PressureBoundaryValues<dim>::value(const Point<dim> &p,
                                     const unsigned int /*component*/) const
  {
    const double alpha = 0.3;
    const double beta  = 1;
    return -(alpha * p[0] * p[1] * p[1] / 2 + beta * p[0] -
             alpha * p[0] * p[0] * p[0] / 6);
  }
  template <int dim>
  void ExactSolution<dim>::vector_value(const Point<dim> &p,
                                        Vector<double> &  values) const
  {
    Assert(values.size() == dim + 1,
           ExcDimensionMismatch(values.size(), dim + 1));
    const double alpha = 0.3;
    const double beta  = 1;
    values(0) = alpha * p[1] * p[1] / 2 + beta - alpha * p[0] * p[0] / 2;
    values(1) = alpha * p[0] * p[1];
    values(2) = -(alpha * p[0] * p[1] * p[1] / 2 + beta * p[0] -
                  alpha * p[0] * p[0] * p[0] / 6);
  }
  template <int dim>
  class KInverse : public TensorFunction<2, dim>
  {
  public:
    KInverse()
      : TensorFunction<2, dim>()
    {}
    virtual void value_list(const std::vector<Point<dim>> &points,
                            std::vector<Tensor<2, dim>> &values) const override;
  };
  template <int dim>
  void KInverse<dim>::value_list(const std::vector<Point<dim>> &points,
                                 std::vector<Tensor<2, dim>> &  values) const
  {
    (void)points;
    AssertDimension(points.size(), values.size());
    for (auto &value : values)
      {
        value.clear();
        for (unsigned int d = 0; d < dim; ++d)
          value[d][d] = 1.;
      }
  }
  template <int dim>
  MixedLaplaceProblem<dim>::MixedLaplaceProblem(const unsigned int degree)
    : degree(degree)
    , fe(FE_RaviartThomas<dim>(degree), 1, FE_DGQ<dim>(degree), 1)
    , dof_handler(triangulation)
  {}
  template <int dim>
  void MixedLaplaceProblem<dim>::make_grid_and_dofs()
  {
    GridGenerator::hyper_cube(triangulation, -1, 1);
    triangulation.refine_global(3);
    dof_handler.distribute_dofs(fe);
    DoFRenumbering::component_wise(dof_handler);
    std::vector<types::global_dof_index> dofs_per_component(dim + 1);
    DoFTools::count_dofs_per_component(dof_handler, dofs_per_component);
    const unsigned int n_u = dofs_per_component[0],
                       n_p = dofs_per_component[dim];
    std::cout << "Number of active cells: " << triangulation.n_active_cells()
              << std::endl
              << "Total number of cells: " << triangulation.n_cells()
              << std::endl
              << "Number of degrees of freedom: " << dof_handler.n_dofs()
              << " (" << n_u << '+' << n_p << ')' << std::endl;
    BlockDynamicSparsityPattern dsp(2, 2);
    dsp.block(0, 0).reinit(n_u, n_u);
    dsp.block(1, 0).reinit(n_p, n_u);
    dsp.block(0, 1).reinit(n_u, n_p);
    dsp.block(1, 1).reinit(n_p, n_p);
    dsp.collect_sizes();
    DoFTools::make_sparsity_pattern(dof_handler, dsp);
    sparsity_pattern.copy_from(dsp);
    system_matrix.reinit(sparsity_pattern);
    solution.reinit(2);
    solution.block(0).reinit(n_u);
    solution.block(1).reinit(n_p);
    solution.collect_sizes();
    system_rhs.reinit(2);
    system_rhs.block(0).reinit(n_u);
    system_rhs.block(1).reinit(n_p);
    system_rhs.collect_sizes();
  }
  template <int dim>
  void MixedLaplaceProblem<dim>::assemble_system()
  {
    QGauss<dim>     quadrature_formula(degree + 2);
    QGauss<dim - 1> face_quadrature_formula(degree + 2);
    FEValues<dim>     fe_values(fe,
                            quadrature_formula,
                            update_values | update_gradients |
                              update_quadrature_points | update_JxW_values);
    FEFaceValues<dim> fe_face_values(fe,
                                     face_quadrature_formula,
                                     update_values | update_normal_vectors |
                                       update_quadrature_points |
                                       update_JxW_values);
    const unsigned int dofs_per_cell   = fe.dofs_per_cell;
    const unsigned int n_q_points      = quadrature_formula.size();
    const unsigned int n_face_q_points = face_quadrature_formula.size();
    FullMatrix<double> local_matrix(dofs_per_cell, dofs_per_cell);
    Vector<double>     local_rhs(dofs_per_cell);
    std::vector<types::global_dof_index> local_dof_indices(dofs_per_cell);
    const RightHandSide<dim>          right_hand_side;
    const PressureBoundaryValues<dim> pressure_boundary_values;
    const KInverse<dim>               k_inverse;
    std::vector<double>         rhs_values(n_q_points);
    std::vector<double>         boundary_values(n_face_q_points);
    std::vector<Tensor<2, dim>> k_inverse_values(n_q_points);
    const FEValuesExtractors::Vector velocities(0);
    const FEValuesExtractors::Scalar pressure(dim);
    for (const auto &cell : dof_handler.active_cell_iterators())
      {
        fe_values.reinit(cell);
        local_matrix = 0;
        local_rhs    = 0;
        right_hand_side.value_list(fe_values.get_quadrature_points(),
                                   rhs_values);
        k_inverse.value_list(fe_values.get_quadrature_points(),
                             k_inverse_values);
        for (unsigned int q = 0; q < n_q_points; ++q)
          for (unsigned int i = 0; i < dofs_per_cell; ++i)
            {
              const Tensor<1, dim> phi_i_u = fe_values[velocities].value(i, q);
              const double div_phi_i_u = fe_values[velocities].divergence(i, q);
              const double phi_i_p     = fe_values[pressure].value(i, q);
              for (unsigned int j = 0; j < dofs_per_cell; ++j)
                {
                  const Tensor<1, dim> phi_j_u =
                    fe_values[velocities].value(j, q);
                  const double div_phi_j_u =
                    fe_values[velocities].divergence(j, q);
                  const double phi_j_p = fe_values[pressure].value(j, q);
                  local_matrix(i, j) +=
                    (phi_i_u * k_inverse_values[q] * phi_j_u //
                     - phi_i_p * div_phi_j_u                 //
                     - div_phi_i_u * phi_j_p)                //
                    * fe_values.JxW(q);
                }
              local_rhs(i) += -phi_i_p * rhs_values[q] * fe_values.JxW(q);
            }
        for (unsigned int face_n = 0;
             face_n < GeometryInfo<dim>::faces_per_cell;
             ++face_n)
          if (cell->at_boundary(face_n))
            {
              fe_face_values.reinit(cell, face_n);
              pressure_boundary_values.value_list(
                fe_face_values.get_quadrature_points(), boundary_values);
              for (unsigned int q = 0; q < n_face_q_points; ++q)
                for (unsigned int i = 0; i < dofs_per_cell; ++i)
                  local_rhs(i) += -(fe_face_values[velocities].value(i, q) * //
                                    fe_face_values.normal_vector(q) *        //
                                    boundary_values[q] *                     //
                                    fe_face_values.JxW(q));
            }
        cell->get_dof_indices(local_dof_indices);
        for (unsigned int i = 0; i < dofs_per_cell; ++i)
          for (unsigned int j = 0; j < dofs_per_cell; ++j)
            system_matrix.add(local_dof_indices[i],
                              local_dof_indices[j],
                              local_matrix(i, j));
        for (unsigned int i = 0; i < dofs_per_cell; ++i)
          system_rhs(local_dof_indices[i]) += local_rhs(i);
      }
  }
  template <class MatrixType>
  class InverseMatrix : public Subscriptor
  {
  public:
    InverseMatrix(const MatrixType &m);
    void vmult(Vector<double> &dst, const Vector<double> &src) const;
  private:
    const SmartPointer<const MatrixType> matrix;
  };
  template <class MatrixType>
  InverseMatrix<MatrixType>::InverseMatrix(const MatrixType &m)
    : matrix(&m)
  {}
  template <class MatrixType>
  void InverseMatrix<MatrixType>::vmult(Vector<double> &      dst,
                                        const Vector<double> &src) const
  {
    SolverControl solver_control(std::max<unsigned int>(src.size(), 200),
                                 1e-8 * src.l2_norm());
    SolverCG<>    cg(solver_control);
    dst = 0;
    cg.solve(*matrix, dst, src, PreconditionIdentity());
  }
  class SchurComplement : public Subscriptor
  {
  public:
    SchurComplement(const BlockSparseMatrix<double> &          A,
                    const InverseMatrix<SparseMatrix<double>> &Minv);
    void vmult(Vector<double> &dst, const Vector<double> &src) const;
  private:
    const SmartPointer<const BlockSparseMatrix<double>>           system_matrix;
    const SmartPointer<const InverseMatrix<SparseMatrix<double>>> m_inverse;
    mutable Vector<double> tmp1, tmp2;
  };
  SchurComplement ::SchurComplement(
    const BlockSparseMatrix<double> &          A,
    const InverseMatrix<SparseMatrix<double>> &Minv)
    : system_matrix(&A)
    , m_inverse(&Minv)
    , tmp1(A.block(0, 0).m())
    , tmp2(A.block(0, 0).m())
  {}
  void SchurComplement::vmult(Vector<double> &      dst,
                              const Vector<double> &src) const
  {
    system_matrix->block(0, 1).vmult(tmp1, src);
    m_inverse->vmult(tmp2, tmp1);
    system_matrix->block(1, 0).vmult(dst, tmp2);
  }
  class ApproximateSchurComplement : public Subscriptor
  {
  public:
    ApproximateSchurComplement(const BlockSparseMatrix<double> &A);
    void vmult(Vector<double> &dst, const Vector<double> &src) const;
  private:
    const SmartPointer<const BlockSparseMatrix<double>> system_matrix;
    mutable Vector<double> tmp1, tmp2;
  };
  ApproximateSchurComplement::ApproximateSchurComplement(
    const BlockSparseMatrix<double> &A)
    : system_matrix(&A)
    , tmp1(A.block(0, 0).m())
    , tmp2(A.block(0, 0).m())
  {}
  void ApproximateSchurComplement::vmult(Vector<double> &      dst,
                                         const Vector<double> &src) const
  {
    system_matrix->block(0, 1).vmult(tmp1, src);
    system_matrix->block(0, 0).precondition_Jacobi(tmp2, tmp1);
    system_matrix->block(1, 0).vmult(dst, tmp2);
  }
  template <int dim>
  void MixedLaplaceProblem<dim>::solve()
  {
    InverseMatrix<SparseMatrix<double>> inverse_mass(system_matrix.block(0, 0));
    Vector<double>                      tmp(solution.block(0).size());
    {
      SchurComplement schur_complement(system_matrix, inverse_mass);
      Vector<double>  schur_rhs(solution.block(1).size());
      inverse_mass.vmult(tmp, system_rhs.block(0));
      system_matrix.block(1, 0).vmult(schur_rhs, tmp);
      schur_rhs -= system_rhs.block(1);
      SolverControl solver_control(solution.block(1).size(),
                                   1e-12 * schur_rhs.l2_norm());
      SolverCG<>    cg(solver_control);
      ApproximateSchurComplement approximate_schur(system_matrix);
      InverseMatrix<ApproximateSchurComplement> approximate_inverse(
        approximate_schur);
      cg.solve(schur_complement,
               solution.block(1),
               schur_rhs,
               approximate_inverse);
      std::cout << solver_control.last_step()
                << " CG Schur complement iterations to obtain convergence."
                << std::endl;
    }
    {
      system_matrix.block(0, 1).vmult(tmp, solution.block(1));
      tmp *= -1;
      tmp += system_rhs.block(0);
      inverse_mass.vmult(solution.block(0), tmp);
    }
  }
  template <int dim>
  void MixedLaplaceProblem<dim>::compute_errors() const
  {
    const ComponentSelectFunction<dim> pressure_mask(dim, dim + 1);
    const ComponentSelectFunction<dim> velocity_mask(std::make_pair(0, dim),
                                                     dim + 1);
    ExactSolution<dim> exact_solution;
    Vector<double>     cellwise_errors(triangulation.n_active_cells());
    QTrapez<1>     q_trapez;
    QIterated<dim> quadrature(q_trapez, degree + 2);
    VectorTools::integrate_difference(dof_handler,
                                      solution,
                                      exact_solution,
                                      cellwise_errors,
                                      quadrature,
                                      VectorTools::L2_norm,
                                      &pressure_mask);
    const double p_l2_error =
      VectorTools::compute_global_error(triangulation,
                                        cellwise_errors,
                                        VectorTools::L2_norm);
    VectorTools::integrate_difference(dof_handler,
                                      solution,
                                      exact_solution,
                                      cellwise_errors,
                                      quadrature,
                                      VectorTools::L2_norm,
                                      &velocity_mask);
    const double u_l2_error =
      VectorTools::compute_global_error(triangulation,
                                        cellwise_errors,
                                        VectorTools::L2_norm);
    std::cout << "Errors: ||e_p||_L2 = " << p_l2_error
              << ",   ||e_u||_L2 = " << u_l2_error << std::endl;
  }
  template <int dim>
  void MixedLaplaceProblem<dim>::output_results() const
  {
    std::vector<std::string> solution_names(dim, "u");
    solution_names.emplace_back("p");
    std::vector<DataComponentInterpretation::DataComponentInterpretation>
      interpretation(dim,
                     DataComponentInterpretation::component_is_part_of_vector);
    interpretation.push_back(DataComponentInterpretation::component_is_scalar);
    DataOut<dim> data_out;
    data_out.add_data_vector(dof_handler,
                             solution,
                             solution_names,
                             interpretation);
    data_out.build_patches(degree + 1);
    std::ofstream output("solution.vtu");
    data_out.write_vtu(output);
  }
  template <int dim>
  void MixedLaplaceProblem<dim>::run()
  {
    make_grid_and_dofs();
    assemble_system();
    solve();
    compute_errors();
    output_results();
  }
} // namespace Step20
int main()
{
  try
    {
      using namespace dealii;
      using namespace Step20;
      MixedLaplaceProblem<2> mixed_laplace_problem(0);
      mixed_laplace_problem.run();
    }
  catch (std::exception &exc)
    {
      std::cerr << std::endl
                << std::endl
                << "----------------------------------------------------"
                << std::endl;
      std::cerr << "Exception on processing: " << std::endl
                << exc.what() << std::endl
                << "Aborting!" << std::endl
                << "----------------------------------------------------"
                << std::endl;
      return 1;
    }
  catch (...)
    {
      std::cerr << std::endl
                << std::endl
                << "----------------------------------------------------"
                << std::endl;
      std::cerr << "Unknown exception!" << std::endl
                << "Aborting!" << std::endl
                << "----------------------------------------------------"
                << std::endl;
      return 1;
    }
  return 0;
}

Introduction

Formulation, weak form, and discrete problem

Assembling the linear system

Linear solvers and preconditioners

Solving using the Schur complement

A preconditioner for the Schur complement

A remark on similar functionality in deal.II

Definition of the test case

The commented program

Include files

The MixedLaplaceProblem class template

Right hand side, boundary values, and exact solution

The inverse permeability tensor

MixedLaplaceProblem class implementation

MixedLaplaceProblem::MixedLaplaceProblem

MixedLaplaceProblem::make_grid_and_dofs

MixedLaplaceProblem::assemble_system

Linear solvers and preconditioners

The InverseMatrix class template

The SchurComplement class

The ApproximateSchurComplement class

MixedLaplace::solve

MixedLaplaceProblem class implementation (continued)

MixedLaplace::compute_errors

MixedLaplace::output_results

MixedLaplace::run

The main function

Results

Output of the program and graphical visualization

Convergence

Possibilities for extensions

More realistic permeability fields

Better linear solvers

The plain program

The `MixedLaplaceProblem` class template

The `InverseMatrix` class template

The `SchurComplement` class

The `ApproximateSchurComplement` class

The `main` function