Solving a Saddle Point Problem
May 13, 2026

A finite-element saddle-point solver written with deal.II
This post walks through a research/coursework project that I worked on
during the spring 2025 semester:
a two-component finite-element solver, written in C++ using
the deal.II library, for a saddle-point system that
arises when one tries to compute the harmonic Riesz representation of a given
functional in the dual of $H^1$. The full code documentation lives at
the saddle-point-project
url; the code grew out of the MATH 676 project that I completed in spring 2025.
The post is organized as follows. First I describe the continuous problem and
its saddle-point formulation; then I write down the weak form that the code
actually discretizes. After that I give an overview of the implementation
choices we made on top of deal.II’s tutorial step-6, and finally I discuss
the convergence results from running the program on a manufactured solution.
1. Problem setup
Let $\Omega \subset \mathbb{R}^2$ be a bounded domain with boundary $\Gamma = \partial \Omega$. In our experiments $\Omega$ is the disk of radius $\pi$, so $\Gamma$ is a smooth closed curve with constant curvature $\kappa = 1$ and outward unit normal $n = \langle x, y \rangle / |\langle x, y \rangle|$.
The underlying motivation is the following. Given a functional $\ell \in (H^1(\Omega))^*$, the harmonic Riesz representative of $\ell$ is the (unique) harmonic function $u$ that realizes $\ell$ through the $H^1$-inner product. Writing the optimality conditions of the corresponding constrained minimization with a Lagrange multiplier $p$ leads to a coupled PDE system in $(p,u)$, which has the structure of a saddle-point problem: the multiplier $p$ enforces the harmonic constraint on $u$, and $u$ enforces the data condition encoded by $\ell$.
Strong form
Concretely, the strong form of the system we want to solve is
$$ \begin{aligned} u - \Delta_{\Gamma}\, u &= g & &\text{on } \Gamma, \\ \Delta u &= 0 & &\text{in } \Omega, \\ -\Delta p &= f_1 & &\text{in } \Omega, \end{aligned} $$
where $\Delta_\Gamma$ is the surface Laplacian (Laplace–Beltrami operator, see https://arxiv.org/abs/1906.02786) on the boundary curve $\Gamma$, and $f_1$, $g$ are given data (with $f_1 \in H^{-1}(\Omega)$ and $g$ defined on $\Gamma$). The two unknowns $p,u \in H^1(\Omega)$ are coupled through the boundary trace of $u$: the first equation lives on $\Gamma$, while the other two live in the bulk $\Omega$.
The second equation, $\Delta u = 0$, is the constraint (harmonicity of $u$); the third equation, $-\Delta p = f_1$, is the equation for the multiplier $p$; the first equation is the consistency condition that links the boundary trace of $u$ to the prescribed data $g$ through the surface operator $u - \Delta_\Gamma u$.
Weak / saddle-point form
Multiplying by test functions $v, q \in C^\infty(\overline{\Omega})$ and integrating by parts in the usual way, we obtain the weak form that the code actually discretizes:
$$ \begin{aligned} \int_{\Gamma} u\, v + \int_{\Gamma} \nabla_{\Gamma} u \cdot \nabla_{\Gamma} v + \int_{\Omega} A\, \nabla p \cdot \nabla v &= \int_{\Omega} f_1\, v + \int_{\Gamma} g\, v \quad \forall v, \\[4pt] \int_{\Omega} A\, \nabla u \cdot \nabla q &= 0 \quad \forall q, \end{aligned} $$
where $A$ is a coefficient matrix that defaults to the identity in our runs
(the function `coefficient(...)` in `step-6.h`). The first equation contains
both a bulk Laplace term in $p$ and a surface mass + surface stiffness
contribution in $u$ over $\Gamma$, while the second equation is just a bulk
Laplace problem in $u$ tested against $q$. The off-diagonal coupling
between $p$ (the multiplier) and $u$ is what gives the system its
saddle-point character.
This form is what makes the implementation slightly more interesting than a plain $-\Delta u = f$ tutorial code: in addition to assembling the standard cell terms, we also need to assemble face terms on $\Gamma$ involving the tangential gradient $\nabla_\Gamma$.
Manufactured solution
To verify the implementation we use the method of manufactured solutions on the disk of radius $\pi$. The components of the exact solution are chosen as
$$ p(x,y) = e^{x} \cos y, \qquad u(x,y) = -2 x y, $$
with $f_1$ and $g$ computed by plugging $(p,u)$ into the strong form. The
bulk right-hand side $f_1 = -\Delta p$ is straightforward; the boundary
right-hand side $g = u - \Delta_\Gamma u$ requires a little more work
because of the surface Laplacian. Using the identity (see deal.II step-38)
$$ \Delta_\Gamma w = \Delta w - n^\top (D^2 w)\, n - \kappa\, (n \cdot \nabla w), $$
together with $\kappa = 1$ and $n = \langle x, y \rangle$ on the circle of
radius $\pi$, one obtains $-\Delta_\Gamma u = -8 x y$, and hence the
explicit form $g = u - \Delta_\Gamma u = -10 x y$. These analytical
right-hand sides are passed to the
program at runtime through a param.prm file (more on that below).
2. Implementation overview (deal.II)
The starting point of the implementation is the classic deal.II tutorial
step-6, which
solves the Laplace problem on a disk with adaptive mesh refinement and
hanging-node constraints. From there we add the pieces we need to handle
the two-component, vector-valued saddle-point system.
The directory layout is
saddle-point/
├── CMakeLists.txt
├── include/
│ ├── rhs_function.h # FunctionParser-based RHS
│ └── step-6.h # SaddlePointProblem<dim> class
├── source/
│ ├── exact-solution.cc # manufactured solution (boundary data)
│ ├── solution.cc # manufactured solution (full)
│ ├── step-6.cc # SaddlePointProblem<dim> implementation
│ └── main.cc # entry point + param.prm parsing
└── param/
└── param.prm # runtime parameters
Key design choices
A few aspects of the implementation are worth highlighting.
- **Vector-valued FE via `FESystem`.** Since we have two scalar components $p$ and $u$ that live in the same $H^1(\Omega)$ space and are coupled through the bilinear form, we build the finite element as `FESystem<dim>(FE_Q<dim>(degree), 2)`, i.e. two copies of a scalar $Q_k$ element. Inside the assembly loop, we use `FEValuesExtractors::Scalar v0(0); FEValuesExtractors::Scalar v1(1);` to project an `FEValues` object onto the $p$-component and the $u$-component respectively, exactly the same way deal.II's step-20 does for mixed Poisson.
- **Boundary face assembly with tangential gradients.** The $\int_\Gamma u\, v + \int_\Gamma \nabla_\Gamma u \cdot \nabla_\Gamma v$ contributions are assembled in a separate loop over faces marked as being on the boundary, mirroring the surface-PDE assembly in step-38. We allocate dedicated `FEFaceValues` objects with their own face quadrature formula, and we use the tangential gradient $\nabla_\Gamma w = \nabla w - (n\cdot\nabla w)\, n$ at each quadrature point.
- **Hanging-node + Dirichlet constraints in one object.** Following the modern deal.II style, we keep a single `AffineConstraints` object that stores both the hanging-node constraints (from local mesh refinement) and the Dirichlet constraints (the manufactured solution evaluated on $\Gamma$). Local matrix and vector contributions are pushed into the global system with `constraints.distribute_local_to_global(...)`, and the corresponding `distribute(...)` call is made on the solution vector after the linear solve.
- **Runtime parameters via `ParameterHandler`.** We do not have to recompile the program to change the polynomial degree, the number of refinement cycles, the CG tolerance, or the right-hand sides: all of these are read from `param.prm` at startup. The pattern is the same as in deal.II's step-28.
- **Symbolic right-hand sides via `FunctionParser`.** The right-hand sides $f_1$ and $g$ are read from `param.prm` as ordinary strings, e.g. `set custom rhs gamma = (-10)*x*y`, and parsed at runtime via deal.II's `FunctionParser` class (a wrapper around muparser). This makes it trivial to swap in a different manufactured solution without touching any C++ code.
- **Convergence tables via `ConvergenceTable`.** After every refinement cycle, we record the cell count, the DoF count, and the $L^2$ and $H^1$ errors, and let `ConvergenceTable` compute the per-refinement reduction ratios and the corresponding $\log_2$ rates. The same object is then dumped to a `.tex` file (`error-global.tex`) for the written report.
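For reference, here is a sketch of what such a parameter file could look like. Only the three parameter names quoted elsewhere in this post (`Finite element degree`, `Power Iteration Tolerance`, `custom rhs gamma`) are taken from the project; any other entries are purely illustrative placeholders:

```
# Hypothetical param.prm sketch -- not the project's actual file.
set Finite element degree     = 1
set Power Iteration Tolerance = 1e-6
set custom rhs gamma          = (-10)*x*y
# Further entries (names invented here for illustration), e.g.:
# set Number of cycles        = 6
```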
The linear solver
Once the system matrix and right-hand side are assembled, the linear
system is solved with the Conjugate Gradient (CG) method from deal.II’s
SolverCG. CG is appropriate here because, after applying the
constraints, the resulting matrix is still symmetric positive definite
for the discretization we use. The SolverControl object stops CG when
either of the following holds:
- the iteration count exceeds 9999, or
- $|r_k| \le \texttt{tol} \cdot |b|$, where `tol` is the user-specified `Power Iteration Tolerance` in `param.prm` (we use $10^{-6}$).
For larger problems CG starts to become the bottleneck; in the
discussion section below we'll see that the iteration count roughly
doubles with each refinement cycle. A natural extension would be to add
a preconditioner (for instance `SparseILU<double>`) or to switch to
GMRES.
How to build and run
The build follows the standard deal.II tutorial pattern. From the
saddle-point/build directory:
make distclean && cmake .. && make -j8 && ./saddle-point ../param/param.prm
The executable writes one .vtk file per refinement cycle
(solution-0.vtk, solution-1.vtk, …) and an exact.vtu containing
the manufactured solution sampled on the finest mesh; both are easy to
open in ParaView.
3. Results
Here is the actual output from a run with linear elements
($Q_1 \times Q_1$, i.e. Finite element degree = 1) on 6 refinement
cycles of the disk of radius $\pi$:
$ ./saddle-point ./../param/param.prm
Cycle 0:
Number of active cells: 20
Number of degrees of freedom: 50
Iterations required for convergence: 10
Norm of residual at convergence: 1.70228e-05
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 3.54796
H1 error = 9.76966
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
Cycle 1:
Number of active cells: 80
Number of degrees of freedom: 178
Iterations required for convergence: 21
Norm of residual at convergence: 4.71245e-05
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 2.03509
H1 error = 7.44833
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
Cycle 2:
Number of active cells: 320
Number of degrees of freedom: 674
Iterations required for convergence: 44
Norm of residual at convergence: 0.000106592
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.575904
H1 error = 3.57117
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
Cycle 3:
Number of active cells: 1280
Number of degrees of freedom: 2626
Iterations required for convergence: 90
Norm of residual at convergence: 0.000158353
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.147966
H1 error = 1.75396
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
Cycle 4:
Number of active cells: 5120
Number of degrees of freedom: 10370
Iterations required for convergence: 180
Norm of residual at convergence: 0.000203561
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.0372619
H1 error = 0.872958
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
5120 10370 3.726e-02 3.97 1.99 8.730e-01 2.01 1.01
Cycle 5:
Number of active cells: 20480
Number of degrees of freedom: 41218
Iterations required for convergence: 358
Norm of residual at convergence: 0.00031578
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.00936616
H1 error = 0.435983
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
5120 10370 3.726e-02 3.97 1.99 8.730e-01 2.01 1.01
20480 41218 9.366e-03 3.98 1.99 4.360e-01 2.00 1.00
For each refinement cycle the program prints:
- the number of active cells (quadruples each cycle as expected from uniform refinement in 2D),
- the number of degrees of freedom (about twice the cell count, because there are two components),
- the number of CG iterations to reach the residual tolerance,
- the absolute residual norm at convergence, and
- the $L^2$ and $H^1$ errors against the manufactured solution.
At the end, the convergence table is reprinted in full. For each error column the three sub-columns are: the error itself, the ratio between consecutive errors, and the $\log_2$ of that ratio (i.e. the empirical convergence order).
Reading the convergence rates
With linear ($Q_1$) finite elements, classical FEM theory predicts
$$ \|u - u_h\|_{L^2(\Omega)} = \mathcal{O}(h^2), \qquad \|u - u_h\|_{H^1(\Omega)} = \mathcal{O}(h), $$
so on uniformly refined meshes (where $h$ halves at each cycle) we expect the error ratios to be roughly $4$ in $L^2$ and $2$ in $H^1$, giving observed convergence rates near $2$ and $1$ respectively. Looking at the bottom rows of the table:
| Cycle | $L^2$ ratio | $\log_2$ rate | $H^1$ ratio | $\log_2$ rate |
|---|---|---|---|---|
| 1 | 1.74 | 0.80 | 1.31 | 0.39 |
| 2 | 3.53 | 1.82 | 2.09 | 1.06 |
| 3 | 3.89 | 1.96 | 2.04 | 1.03 |
| 4 | 3.97 | 1.99 | 2.01 | 1.01 |
| 5 | 3.98 | 1.99 | 2.00 | 1.00 |
The asymptotic rates settle in cleanly at $\approx 2.0$ for $L^2$ and $\approx 1.0$ for $H^1$, which is exactly what theory predicts for a well-posed problem with a smooth solution and linear elements. The first couple of cycles are too coarse for the asymptotic behavior to be visible, but from cycle 2 onward the agreement is essentially perfect.
This is a great sanity check that the (complicated) boundary-integral plus tangential-gradient assembly is correct: a programming bug in the surface terms would typically show up as either reduced order or as an error that fails to decrease at all.
CG iteration count
One thing that is also visible in the output is the growth of the CG iteration count: $10, 21, 44, 90, 180, 358$. The doubling per cycle is consistent with running CG without a preconditioner on a system whose condition number scales like $h^{-2}$: halving $h$ roughly doubles the iteration count. This is fine at the meshes we report here, but it is something one would want to address when pushing to much finer meshes, by switching to a preconditioned CG (or to GMRES for non-symmetric variants).
What changes with quadratic elements
Without recompiling, setting Finite element degree = 2 in param.prm
and re-running gives an analogous convergence study with $Q_2 \times
Q_2$ elements. There the expected asymptotic rates are $\mathcal{O}(h^3)$
in $L^2$ and $\mathcal{O}(h^2)$ in $H^1$. In practice, the rates we
observe match theory at small/moderate cycles, and only on the very
finest meshes does the unpreconditioned CG solver start to struggle
(its tolerance becomes the dominant source of error, not the
discretization). That points at the linear solver
as the next thing to improve.
4. Summary and possible extensions
The project was a good exercise in putting several deal.II features together inside a single, relatively small code base:
- vector-valued finite elements with `FESystem` and `FEValuesExtractors`,
- mixed bulk + surface assembly using `FEValues` and `FEFaceValues` with the tangential gradient,
- adaptive refinement with hanging-node constraints, combined with Dirichlet constraints, in a single `AffineConstraints` object,
- runtime configuration via `ParameterHandler` and `FunctionParser`, so that one binary can solve many different right-hand-side configurations, and
- automated reporting via `ConvergenceTable`, both to the terminal and to a LaTeX file.
The numerical results confirm that the discretization is implemented correctly: the observed $L^2$ and $H^1$ convergence rates match the theoretical $\mathcal{O}(h^{k+1})$ / $\mathcal{O}(h^{k})$ behavior for $Q_k$ elements, on a smooth manufactured solution.
The next steps to try are:
- replace the unpreconditioned CG with a preconditioned solver (`SparseILU`, AMG, or a block preconditioner that exploits the saddle-point structure);
- exercise the symbolic `FunctionParser` interface more aggressively by running the solver on non-manufactured data, and visualizing the resulting harmonic Riesz representative in ParaView;
- generalize the surface assembly to non-circular boundaries, where $\kappa$ and $n$ vary along $\Gamma$.
Thanks for reading!