Solving a Saddle Point Problem
May 13, 2026

A finite-element saddle-point solver written with deal.II
This post walks through a research/coursework project that I worked on
during the spring 2025 semester:
a two-component finite-element solver, written in C++ using
the deal.II library, for a saddle-point system that
arises when one tries to compute the harmonic Riesz representation of a given
functional in the dual of $H^1$. The full code documentation lives at
the saddle-point-project
url; the code grew out of the MATH 676 project that I completed in spring 2025.
The post is organized as follows. First I describe the continuous problem and
its saddle-point formulation; then I write down the weak form that the code
actually discretizes. After that I give an overview of the implementation
choices we made on top of deal.II’s tutorial step-6, and finally I discuss
the convergence results from running the program on a manufactured solution.
1. Problem setup
Let $\Omega \subset \mathbb{R}^2$ be a bounded domain with boundary $\Gamma = \partial \Omega$. In our experiments $\Omega$ is the disk of radius $\pi$, so $\Gamma$ is a smooth closed curve with constant curvature $\kappa = 1$ and outward unit normal $n = \langle x, y \rangle / |\langle x, y \rangle|$.
The underlying motivation is the following. Given a functional $\ell \in (H^1(\Omega))^*$, the harmonic Riesz representative of $\ell$ is the (unique) harmonic function $u$ that realizes $\ell$ through the $H^1$-inner product. Writing the optimality conditions of the corresponding constrained minimization with a Lagrange multiplier $p$ leads to a coupled PDE system in $(p,u)$, which has the structure of a saddle-point problem: the multiplier $p$ enforces the harmonic constraint on $u$, and $u$ enforces the data condition encoded by $\ell$.
Strong form
Concretely, the strong form of the system we want to solve is
$$ \begin{aligned} u - \Delta_{\Gamma}\, u &= g & &\text{on } \Gamma, \\ \Delta u &= 0 & &\text{in } \Omega, \\ -\Delta p &= f_1 & &\text{in } \Omega, \end{aligned} $$
where $\Delta_\Gamma$ is the surface Laplacian (Laplace–Beltrami operator, see https://arxiv.org/abs/1906.02786) on the boundary curve $\Gamma$, and $f_1$, $g$ are given data (with $f_1 \in H^{-1}(\Omega)$ and $g$ defined on $\Gamma$). The two unknowns $p,u \in H^1(\Omega)$ are coupled through the boundary trace of $u$: the first equation lives on $\Gamma$, while the other two live in the bulk $\Omega$.
The second equation, $\Delta u = 0$, is the constraint (harmonicity of $u$); the third equation, $-\Delta p = f_1$, is the equation for the multiplier $p$; the first equation is the consistency condition that links the boundary trace of $u$ to the prescribed data $g$ through the surface operator $u - \Delta_\Gamma u$.
Weak / saddle-point form
Multiplying by test functions $v, q \in C^\infty(\overline{\Omega})$ and integrating by parts in the usual way, we obtain the weak form that the code actually discretizes:
$$ \begin{aligned} \int_{\Gamma} u\, v + \int_{\Gamma} \nabla_{\Gamma} u \cdot \nabla_{\Gamma} v + \int_{\Omega} A\, \nabla p \cdot \nabla v &= \int_{\Omega} f_1\, v + \int_{\Gamma} g\, v \quad \forall v, \\[4pt] \int_{\Omega} A\, \nabla u \cdot \nabla q &= 0 \quad \forall q, \end{aligned} $$
where $A$ is a coefficient matrix that defaults to the identity in our runs
(the function `coefficient(...)` in `step-6.h`). The first equation contains
both a bulk Laplace term in $p$ and a surface mass + surface stiffness
contribution in $u$ over $\Gamma$, while the second equation is just a bulk
Laplace problem in $u$ tested against $q$. The off-diagonal coupling
between $p$ (the multiplier) and $u$ is what gives the system its
saddle-point character.
This form is what makes the implementation slightly more interesting than a plain $-\Delta u = f$ tutorial code: in addition to assembling the standard cell terms, we also need to assemble face terms on $\Gamma$ involving the tangential gradient $\nabla_\Gamma$.
Manufactured solution
To verify the implementation we use the method of manufactured solutions on the disk of radius $\pi$. The components of the exact solution are chosen as
$$ p(x,y) = e^{x} \cos y, \qquad u(x,y) = -2 x y, $$
with $f_1$ and $g$ computed by plugging $(p,u)$ into the strong form. The
bulk right-hand side $f_1 = -\Delta p$ is straightforward; the boundary
right-hand side $g = u - \Delta_\Gamma u$ requires a little more work
because of the surface Laplacian. Using the identity (see deal.II step-38)
$$ \Delta_\Gamma w = \Delta w - n^\top (D^2 w)\, n - \kappa\, (n \cdot \nabla w), $$
together with $\kappa = 1$ and $n = \langle x, y \rangle$ on the circle of
radius $\pi$, one obtains $-\Delta_\Gamma u = -8 x y$, and hence the
explicit form $g = u - \Delta_\Gamma u = -10 x y$. These analytical
right-hand sides are passed to the
program at runtime through a param.prm file (more on that below).
2. Implementation overview (deal.II)
The starting point of the implementation is the classic deal.II tutorial
step-6, which
solves the Laplace problem on a disk with adaptive mesh refinement and
hanging-node constraints. From there we add the pieces we need to handle
the two-component, vector-valued saddle-point system.
The directory layout is
saddle-point/
├── CMakeLists.txt
├── include/
│ ├── rhs_function.h # FunctionParser-based RHS
│ └── step-6.h # SaddlePointProblem<dim> class
├── source/
│ ├── exact-solution.cc # manufactured solution (boundary data)
│ ├── solution.cc # manufactured solution (full)
│ ├── step-6.cc # SaddlePointProblem<dim> implementation
│ └── main.cc # entry point + param.prm parsing
└── param/
└── param.prm # runtime parameters
Key design choices
A few aspects of the implementation are worth highlighting.
- **Vector-valued FE via `FESystem`.** Since we have two scalar components $p$ and $u$ that live in the same $H^1(\Omega)$ space and are coupled through the bilinear form, we build the finite element as `FESystem<dim>(FE_Q<dim>(degree), 2)`, i.e. two copies of a scalar $Q_k$ element. Inside the assembly loop, we use `FEValuesExtractors::Scalar v0(0); FEValuesExtractors::Scalar v1(1);` to project an `FEValues` object onto the $p$-component and the $u$-component respectively, exactly the same way deal.II's step-20 does for mixed Poisson.
- **Boundary face assembly with tangential gradients.** The $\int_\Gamma u\, v + \int_\Gamma \nabla_\Gamma u \cdot \nabla_\Gamma v$ contributions are assembled in a separate loop over faces marked as being on the boundary, mirroring the surface-PDE assembly in step-38. We allocate dedicated `FEFaceValues` objects with their own face quadrature formula, and we use the tangential gradient $\nabla_\Gamma w = \nabla w - (n\cdot\nabla w)\, n$ at each quadrature point.
- **Hanging-node + Dirichlet constraints in one object.** Following the modern deal.II style, we keep a single `AffineConstraints` object that stores both the hanging-node constraints (from local mesh refinement) and the Dirichlet constraints (the manufactured solution evaluated on $\Gamma$). Local matrix and vector contributions are pushed into the global system with `constraints.distribute_local_to_global(...)`, and the corresponding `distribute(...)` call is made on the solution vector after the linear solve.
- **Runtime parameters via `ParameterHandler`.** We do not have to recompile the program to change the polynomial degree, the number of refinement cycles, the CG tolerance, or the right-hand sides: all of these are read from `param.prm` at startup. The pattern is the same as in deal.II's step-28.
- **Symbolic right-hand sides via `FunctionParser`.** The right-hand sides $f_1$ and $g$ are read from `param.prm` as ordinary strings, e.g. `set custom rhs gamma = (-10)*x*y`, and parsed at runtime via deal.II's `FunctionParser` class (a wrapper around muparser). This makes it trivial to swap in a different manufactured solution without touching any C++ code.
- **Convergence tables via `ConvergenceTable`.** After every refinement cycle, we record the cell count, the DoF count, and the $L^2$ and $H^1$ errors, and let `ConvergenceTable` compute the per-refinement reduction ratios and the corresponding $\log_2$ rates. The same object is then dumped to a `.tex` file (`error-global.tex`) for the written report.
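For reference, here is a sketch of what such a parameter file could look like. Only the three parameter names quoted elsewhere in this post (`Finite element degree`, `Power Iteration Tolerance`, `custom rhs gamma`) are taken from the project; any other entries are purely illustrative placeholders:

```
# Hypothetical param.prm sketch -- not the project's actual file.
set Finite element degree     = 1
set Power Iteration Tolerance = 1e-6
set custom rhs gamma          = (-10)*x*y
# Further entries (names invented here for illustration), e.g.:
# set Number of cycles        = 6
```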
The linear solver
Once the system matrix and right-hand side are assembled, the linear
system is solved with the Conjugate Gradient (CG) method from deal.II’s
SolverCG. CG is appropriate here because, after applying the
constraints, the resulting matrix is still symmetric positive definite
for the discretization we use. The SolverControl object stops CG when
either of the following holds:
- the iteration count exceeds 9999, or
- $|r_k| \le \texttt{tol} \cdot |b|$, where `tol` is the user-specified `Power Iteration Tolerance` in `param.prm` (we use $10^{-6}$).
For larger problems CG starts to become the bottleneck; in the
discussion section below we'll see that the iteration count roughly
doubles with each refinement cycle. A natural extension would be to add
a preconditioner (for instance `SparseILU<double>`) or to switch to
GMRES.
How to build and run
The build follows the standard deal.II tutorial pattern. From the
saddle-point/build directory:
make distclean && cmake .. && make -j8 && ./saddle-point ../param/param.prm
The executable writes one .vtk file per refinement cycle
(solution-0.vtk, solution-1.vtk, …) and an exact.vtu containing
the manufactured solution sampled on the finest mesh; both are easy to
open in ParaView.
3. Results
Here is the actual output from a run with linear elements
($Q_1 \times Q_1$, i.e. Finite element degree = 1) on 6 refinement
cycles of the disk of radius $\pi$:
$ ./saddle-point ./../param/param.prm
Cycle 0:
Number of active cells: 20
Number of degrees of freedom: 50
Iterations required for convergence: 10
Norm of residual at convergence: 1.70228e-05
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 3.54796
H1 error = 9.76966
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
Cycle 1:
Number of active cells: 80
Number of degrees of freedom: 178
Iterations required for convergence: 21
Norm of residual at convergence: 4.71245e-05
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 2.03509
H1 error = 7.44833
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
Cycle 2:
Number of active cells: 320
Number of degrees of freedom: 674
Iterations required for convergence: 44
Norm of residual at convergence: 0.000106592
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.575904
H1 error = 3.57117
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
Cycle 3:
Number of active cells: 1280
Number of degrees of freedom: 2626
Iterations required for convergence: 90
Norm of residual at convergence: 0.000158353
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.147966
H1 error = 1.75396
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
Cycle 4:
Number of active cells: 5120
Number of degrees of freedom: 10370
Iterations required for convergence: 180
Norm of residual at convergence: 0.000203561
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.0372619
H1 error = 0.872958
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
5120 10370 3.726e-02 3.97 1.99 8.730e-01 2.01 1.01
Cycle 5:
Number of active cells: 20480
Number of degrees of freedom: 41218
Iterations required for convergence: 358
Norm of residual at convergence: 0.00031578
output_exact_results function is called with exact.vtu
manufactured components number: 2
L2 error = 0.00936616
H1 error = 0.435983
cells dofs L2 H1
20 50 3.548e+00 - - 9.770e+00 - -
80 178 2.035e+00 1.74 0.80 7.448e+00 1.31 0.39
320 674 5.759e-01 3.53 1.82 3.571e+00 2.09 1.06
1280 2626 1.480e-01 3.89 1.96 1.754e+00 2.04 1.03
5120 10370 3.726e-02 3.97 1.99 8.730e-01 2.01 1.01
20480 41218 9.366e-03 3.98 1.99 4.360e-01 2.00 1.00
For each refinement cycle the program prints:
- the number of active cells (quadruples each cycle as expected from uniform refinement in 2D),
- the number of degrees of freedom (about twice the cell count, because there are two components),
- the number of CG iterations to reach the residual tolerance,
- the absolute residual norm at convergence, and
- the $L^2$ and $H^1$ errors against the manufactured solution.
At the end, the convergence table is reprinted in full. For each error column the three sub-columns are: the error itself, the ratio between consecutive errors, and the $\log_2$ of that ratio (i.e. the empirical convergence order).
Reading the convergence rates
With linear ($Q_1$) finite elements, classical FEM theory predicts
$$ \|u - u_h\|_{L^2(\Omega)} = \mathcal{O}(h^2), \qquad \|u - u_h\|_{H^1(\Omega)} = \mathcal{O}(h), $$
so on uniformly refined meshes (where $h$ halves at each cycle) we expect the error ratios to be roughly $4$ in $L^2$ and $2$ in $H^1$, giving observed convergence rates near $2$ and $1$ respectively. Looking at the bottom rows of the table:
| Cycle | $L^2$ ratio | $\log_2$ rate | $H^1$ ratio | $\log_2$ rate |
|---|---|---|---|---|
| 1 | 1.74 | 0.80 | 1.31 | 0.39 |
| 2 | 3.53 | 1.82 | 2.09 | 1.06 |
| 3 | 3.89 | 1.96 | 2.04 | 1.03 |
| 4 | 3.97 | 1.99 | 2.01 | 1.01 |
| 5 | 3.98 | 1.99 | 2.00 | 1.00 |
The asymptotic rates settle in cleanly at $\approx 2.0$ for $L^2$ and $\approx 1.0$ for $H^1$, which is exactly what theory predicts for a well-posed problem with a smooth solution and linear elements. The first couple of cycles are too coarse for the asymptotic behavior to be visible, but from cycle 2 onward the agreement is essentially perfect.
This is a great sanity check that the (complicated) boundary-integral plus tangential-gradient assembly is correct: a programming bug in the surface terms would typically show up as either reduced order or as an error that fails to decrease at all.
CG iteration count
One thing that is also visible in the output is the growth of the CG iteration count: $10, 21, 44, 90, 180, 358$. The doubling per cycle is consistent with running CG without a preconditioner on a system whose condition number scales like $h^{-2}$: halving $h$ roughly doubles the iteration count. This is fine at the meshes we report here, but it is something one would want to address when pushing to much finer meshes, by switching to a preconditioned CG (or to GMRES for non-symmetric variants).
What changes with quadratic elements
Without recompiling, setting Finite element degree = 2 in param.prm
and re-running gives an analogous convergence study with $Q_2 \times
Q_2$ elements. There the expected asymptotic rates are $\mathcal{O}(h^3)$
in $L^2$ and $\mathcal{O}(h^2)$ in $H^1$. In practice, the rates we
observe match theory at small/moderate cycles, and only on the very
finest meshes does the unpreconditioned CG solver start to struggle
(its tolerance becomes the dominant source of error, not the
discretization). That points at the linear solver
as the next thing to improve.
4. Summary and possible extensions
The project was a good exercise in putting several deal.II features together inside a single, relatively small code base:
- vector-valued finite elements with `FESystem` and `FEValuesExtractors`,
- mixed bulk + surface assembly using `FEValues` and `FEFaceValues` with the tangential gradient,
- adaptive refinement with hanging-node constraints, combined with Dirichlet constraints, in a single `AffineConstraints` object,
- runtime configuration via `ParameterHandler` and `FunctionParser`, so that one binary can solve many different right-hand-side configurations, and
- automated reporting via `ConvergenceTable`, both to the terminal and to a LaTeX file.
The numerical results confirm that the discretization is implemented correctly: the observed $L^2$ and $H^1$ convergence rates match the theoretical $\mathcal{O}(h^{k+1})$ / $\mathcal{O}(h^{k})$ behavior for $Q_k$ elements, on a smooth manufactured solution.
The next steps to try are:
- replace the unpreconditioned CG with a preconditioned solver (`SparseILU`, AMG, or a block preconditioner that exploits the saddle-point structure);
- exercise the symbolic `FunctionParser` interface more aggressively by running the solver on non-manufactured data, and visualizing the resulting harmonic Riesz representative in ParaView;
- generalize the surface assembly to non-circular boundaries, where $\kappa$ and $n$ vary along $\Gamma$.
Thanks for reading!