Solving a Saddle Point Problem

A finite-element saddle-point solver written with deal.II

This post walks through a research/coursework project I worked on during the spring 2025 semester: a two-component finite-element solver, written in C++ using the deal.II library, for a saddle-point system that arises when one computes the harmonic Riesz representation of a given functional in the dual of $H^1$. The full code documentation lives at saddle-point-project url; the code grew out of the MATH 676 project that I completed in spring 2025.

The post is organized as follows. First I describe the continuous problem and its saddle-point formulation; then I write down the weak form that the code actually discretizes. After that I give an overview of the implementation choices we made on top of deal.II’s tutorial step-6, and finally I discuss the convergence results from running the program on a manufactured solution.


1. Problem setup

Let $\Omega \subset \mathbb{R}^2$ be a bounded domain with boundary $\Gamma = \partial \Omega$. In our experiments $\Omega$ is the disk of radius $\pi$, so $\Gamma$ is a smooth closed curve with constant curvature $\kappa = 1/\pi$ and outward unit normal $n = \langle x, y \rangle / |\langle x, y \rangle|$.

The underlying motivation is the following. Given a functional $\ell \in (H^1(\Omega))^*$, the harmonic Riesz representative of $\ell$ is the (unique) harmonic function $u$ that realizes $\ell$ through the $H^1$-inner product. Writing the optimality conditions of the corresponding constrained minimization with a Lagrange multiplier $p$ leads to a coupled PDE system in $(p,u)$, which has the structure of a saddle-point problem: the multiplier $p$ enforces the harmonic constraint on $u$, and $u$ enforces the data condition encoded by $\ell$.

Strong form

Concretely, the strong form of the system we want to solve is

$$ \begin{aligned} u - \Delta_{\Gamma}\, u &= g & &\text{on } \Gamma, \\ \Delta u &= 0 & &\text{in } \Omega, \\ -\Delta p &= f_1 & &\text{in } \Omega, \end{aligned} $$

where $\Delta_\Gamma$ is the surface Laplacian (Laplace–Beltrami operator, see https://arxiv.org/abs/1906.02786) on the boundary curve $\Gamma$, and $f_1$, $g$ are given data (with $f_1 \in H^{-1}(\Omega)$ and $g$ defined on $\Gamma$). The two unknowns $p,u \in H^1(\Omega)$ are coupled through the boundary trace of $u$: the first equation lives on $\Gamma$, while the other two live in the bulk $\Omega$.

The second equation, $\Delta u = 0$, is the constraint (harmonicity of $u$); the third equation, $-\Delta p = f_1$, is the equation for the multiplier $p$; the first equation is the consistency condition that links the boundary trace of $u$ to the prescribed data $g$ through the surface operator $u - \Delta_\Gamma u$.

Weak / saddle-point form

Multiplying by test functions $v, q \in C^\infty(\overline{\Omega})$ and integrating by parts in the usual way, we obtain the weak form that the code actually discretizes:

$$ \begin{aligned} \int_{\Gamma} u\, v + \int_{\Gamma} \nabla_{\Gamma} u \cdot \nabla_{\Gamma} v + \int_{\Omega} A\, \nabla p \cdot \nabla v &= \int_{\Omega} f_1\, v + \int_{\Gamma} g\, v \quad \forall v, \\[4pt] \int_{\Omega} A\, \nabla u \cdot \nabla q &= 0 \quad \forall q, \end{aligned} $$

where $A$ is a coefficient matrix that defaults to the identity in our runs (the function coefficient(...) in step-6.h). The first equation contains both a bulk Laplace term in $p$ and a surface mass + surface stiffness contribution in $u$ over $\Gamma$, while the second equation is just a bulk Laplace problem in $u$ tested against $q$. The off-diagonal coupling between $p$ (the multiplier) and $u$ is what gives the system its saddle-point character.

This form is what makes the implementation slightly more interesting than a plain $-\Delta u = f$ tutorial code: in addition to assembling the standard cell terms, we also need to assemble face terms on $\Gamma$ involving the tangential gradient $\nabla_\Gamma$.

Manufactured solution

To verify the implementation we use the method of manufactured solutions on the disk of radius $\pi$. The components of the exact solution are chosen as

$$ p(x,y) = e^{x} \cos y, \qquad u(x,y) = -2 x y, $$

with $f_1$ and $g$ computed by plugging $(p,u)$ into the strong form. The bulk right-hand side $f_1 = -\Delta p$ is straightforward (in fact $p = e^x \cos y$ is harmonic, so $f_1 = 0$); the boundary right-hand side $g = u - \Delta_\Gamma u$ requires a little more work because of the surface Laplacian. Using the identity (see deal.II step-38)

$$ -\Delta_\Gamma u = -\Delta u + n^\top D^2 u\, n + (n\cdot \nabla u)\, \kappa, $$

together with $\kappa = 1/\pi$ and $n = \langle x, y \rangle / \pi$ on the circle of radius $\pi$, one obtains $-\Delta_\Gamma u = -8 x y / \pi^2$, and from there the explicit form of $g$. These analytical right-hand sides are passed to the program at runtime through a param.prm file (more on that below).


2. Implementation overview (deal.II)

The starting point of the implementation is the classic deal.II tutorial step-6, which solves the Laplace problem on a disk with adaptive mesh refinement and hanging-node constraints. From there we add the pieces we need to handle the two-component, vector-valued saddle-point system.

The directory layout is

saddle-point/
├── CMakeLists.txt
├── include/
│   ├── rhs_function.h     # FunctionParser-based RHS
│   └── step-6.h           # SaddlePointProblem<dim> class
├── source/
│   ├── exact-solution.cc  # manufactured solution (boundary data)
│   ├── solution.cc        # manufactured solution (full)
│   ├── step-6.cc          # SaddlePointProblem<dim> implementation
│   └── main.cc            # entry point + param.prm parsing
└── param/
    └── param.prm          # runtime parameters

Key design choices

A few aspects of the implementation are worth highlighting.

The linear solver

Once the system matrix and right-hand side are assembled, the linear system is solved with the Conjugate Gradient (CG) method from deal.II’s SolverCG. CG is appropriate here because, after applying the constraints, the resulting matrix is still symmetric positive definite for the discretization we use. The SolverControl object stops CG as soon as the residual norm drops below the prescribed tolerance, or gives up once the maximum number of iterations is reached.

For larger problems CG starts to become the bottleneck; in the discussion section below we’ll see that the iteration count roughly doubles with each refinement cycle. A natural extension would be to add a preconditioner (for instance SparseILU<double>) or to switch to GMRES.

How to build and run

The build follows the standard deal.II tutorial pattern. From the saddle-point/build directory:

make distclean && cmake .. && make -j8 && ./saddle-point ../param/param.prm

The executable writes one .vtk file per refinement cycle (solution-0.vtk, solution-1.vtk, …) and an exact.vtu containing the manufactured solution sampled on the finest mesh; both are easy to open in ParaView.


3. Results

Here is the actual output from a run with linear elements ($Q_1 \times Q_1$, i.e. Finite element degree = 1) on 6 refinement cycles of the disk of radius $\pi$:

$ ./saddle-point ./../param/param.prm
Cycle 0:
   Number of active cells:       20
   Number of degrees of freedom: 50
Iterations required for convergence: 10
Norm of residual at convergence: 1.70228e-05
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 3.54796
H1 error = 9.76966

cells dofs      L2            H1
   20   50 3.548e+00 - - 9.770e+00 - -
Cycle 1:
   Number of active cells:       80
   Number of degrees of freedom: 178
Iterations required for convergence: 21
Norm of residual at convergence: 4.71245e-05
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 2.03509
H1 error = 7.44833

cells dofs         L2                  H1
   20   50 3.548e+00    -    - 9.770e+00    -    -
   80  178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
Cycle 2:
   Number of active cells:       320
   Number of degrees of freedom: 674
Iterations required for convergence: 44
Norm of residual at convergence: 0.000106592
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.575904
H1 error = 3.57117

cells dofs         L2                  H1
   20   50 3.548e+00    -    - 9.770e+00    -    -
   80  178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
  320  674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
Cycle 3:
   Number of active cells:       1280
   Number of degrees of freedom: 2626
Iterations required for convergence: 90
Norm of residual at convergence: 0.000158353
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.147966
H1 error = 1.75396

cells dofs         L2                  H1
   20   50 3.548e+00    -    - 9.770e+00    -    -
   80  178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
  320  674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
 1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
Cycle 4:
   Number of active cells:       5120
   Number of degrees of freedom: 10370
Iterations required for convergence: 180
Norm of residual at convergence: 0.000203561
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.0372619
H1 error = 0.872958

cells dofs          L2                  H1
   20    50 3.548e+00    -    - 9.770e+00    -    -
   80   178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
  320   674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
 1280  2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
 5120 10370 3.726e-02 3.97 1.99 8.730e-01 2.01 1.01
Cycle 5:
   Number of active cells:       20480
   Number of degrees of freedom: 41218
Iterations required for convergence: 358
Norm of residual at convergence: 0.00031578
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.00936616
H1 error = 0.435983

cells dofs          L2                  H1
   20    50 3.548e+00    -    - 9.770e+00    -    -
   80   178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
  320   674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
 1280  2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
 5120 10370 3.726e-02 3.97 1.99 8.730e-01 2.01 1.01
20480 41218 9.366e-03 3.98 1.99 4.360e-01 2.00 1.00

For each refinement cycle the program prints the number of active cells, the number of degrees of freedom, the CG iteration count and final residual norm, and the $L^2$ and $H^1$ errors against the manufactured solution, followed by the convergence table accumulated so far.

At the end, the convergence table is reprinted in full. For each error column the three sub-columns are: the error itself, the ratio between consecutive errors, and the $\log_2$ of that ratio (i.e. the empirical convergence order).

Reading the convergence rates

With linear ($Q_1$) finite elements, classical FEM theory predicts

$$ \|u - u_h\|_{L^2(\Omega)} = \mathcal{O}(h^2), \qquad \|u - u_h\|_{H^1(\Omega)} = \mathcal{O}(h), $$

so on uniformly refined meshes (where $h$ halves at each cycle) we expect the error ratios to be roughly $4$ in $L^2$ and $2$ in $H^1$, giving observed convergence rates near $2$ and $1$ respectively. Looking at the bottom rows of the table:

Cycle   $L^2$ ratio   $\log_2$ rate   $H^1$ ratio   $\log_2$ rate
  1        1.74           0.80           1.31           0.39
  2        3.53           1.82           2.09           1.06
  3        3.89           1.96           2.04           1.03
  4        3.97           1.99           2.01           1.01
  5        3.98           1.99           2.00           1.00

The asymptotic rates settle in cleanly at $\approx 2.0$ for $L^2$ and $\approx 1.0$ for $H^1$, which is exactly what theory predicts for a well-posed problem with smooth solution and linear elements. The first couple of cycles are too coarse for the asymptotic behavior to be visible, but from cycle 2 onward the agreement is essentially perfect.

This is a great sanity check that the (complicated) boundary-integral plus tangential-gradient assembly is correct: a programming bug in the surface terms would typically show up as either reduced order or as an error that fails to decrease at all.

CG iteration count

One thing that is also visible in the output is the growth of the CG iteration count: $10, 21, 44, 90, 180, 358$. The doubling per cycle is consistent with running CG without a preconditioner on a system whose condition number scales like $h^{-2}$: the CG iteration count scales like the square root of the condition number, i.e. like $h^{-1}$, so halving $h$ roughly doubles the iteration count. This is fine at the meshes we report here, but it is something one would want to address when pushing to much finer meshes, by switching to a preconditioned CG (or to GMRES for non-symmetric variants).

What changes with quadratic elements

Without recompiling, setting Finite element degree = 2 in param.prm and re-running gives an analogous convergence study with $Q_2 \times Q_2$ elements. There the expected asymptotic rates are $\mathcal{O}(h^3)$ in $L^2$ and $\mathcal{O}(h^2)$ in $H^1$. In practice, the rates we observe match theory at small/moderate cycles, and only on the very finest meshes does the unpreconditioned CG solver start to struggle (its tolerance becomes the dominant source of error, not the discretization). That points at the linear solver as the next thing to improve.
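For concreteness, the change is a single entry in the parameter file. A sketch of the relevant line (deal.II ParameterHandler syntax; the surrounding entries of the actual param.prm are omitted here and may be organized differently):

```
# param/param.prm (fragment; other entries omitted)
set Finite element degree = 2
```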


4. Summary and possible extensions

The project was a good exercise in putting several deal.II features together inside a single, relatively small code base:

- a vector-valued, two-component finite-element system built on top of the scalar step-6 tutorial;
- face (boundary) assembly on $\Gamma$ involving the tangential gradient $\nabla_\Gamma$;
- adaptive mesh refinement with hanging-node constraints;
- FunctionParser-based right-hand sides and runtime parameters read from a param.prm file;
- convergence tables for the manufactured-solution study.

The numerical results confirm that the discretization is implemented correctly: the observed $L^2$ and $H^1$ convergence rates match the theoretical $\mathcal{O}(h^{k+1})$ / $\mathcal{O}(h^{k})$ behavior for $Q_k$ elements, on a smooth manufactured solution.

The next steps to try are:

- add a preconditioner to the CG solver (for instance SparseILU<double>), or switch to GMRES for non-symmetric variants;
- push the $Q_2 \times Q_2$ convergence study to finer meshes once the solver tolerance is no longer the dominant source of error;
- run with a non-identity coefficient matrix $A$, which the code already supports through coefficient(...) in step-6.h.

Thanks for reading!