
A subspace framework for L∞ model reduction

  • Emre Mengi
Published/Copyright: January 15, 2025

Abstract

We propose an approach for the L∞ model reduction of descriptor systems based on the minimization of the L∞ objective by means of smooth optimization techniques. Direct applications of smooth optimization techniques are not feasible even for systems of modest order, since the optimization techniques converge at best at a linear rate, requiring too many evaluations of the costly L∞-norm objective to be practical. We replace the original system with a system of smaller order that interpolates the original system at points on the imaginary axis, minimize the L∞ objective after this replacement, and refine the smaller system based on the minimization. We also describe how asymptotic stability constraints on the reduced system sought can be incorporated into our approach. The numerical experiments illustrate that the approach leads to locally optimal solutions to the L∞ model reduction problem, and that it is capable of dealing with systems of order up to a few tens of thousands.

MSC 2010: 65D05; 65F15; 65L80; 90C53; 93A15; 93C05

1 Introduction

Various applications give rise to linear time-invariant (LTI) descriptor systems; see, e.g., [1], [2], [3], [4], and references therein. A model order reduction problem for such a system typically concerns the approximation of the system with a system of smaller and prescribed order. Here we deal with one of them, namely the L∞ model reduction problem formally defined below.

We assume that the LTI descriptor system under consideration is available in a state-space representation

(1) E x′(t) = A x(t) + B u(t),   y(t) = C x(t) + D u(t)

for given matrices E, A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, C ∈ ℝ^{p×n}, D ∈ ℝ^{p×m}. The transfer function of the descriptor system in (1) (defined over s ∈ ℂ such that sE − A is invertible) is

(2) H(s) = C (sE − A)^{−1} B + D

with the L∞ norm

‖H‖_{L∞} ≔ sup_{ω∈ℝ} σ_max(H(iω)) = sup_{ω∈ℝ, ω⩾0} σ_max(H(iω)),

where σ_max(⋅) denotes the largest singular value of its matrix argument, and the last equality holds as A, B, C, D, E are real matrices. Note that we customarily set ‖H‖_{L∞} ≔ sup_{ω∈ℝ} σ_max(H(iω)) = ∞ if H has a pole on the imaginary axis, or the norm of its restriction to the imaginary axis is not bounded. Let ℂ₊ ≔ {z ∈ ℂ | Re z > 0}, and let L2 denote a space of functions f : ℝ → ℝ^k satisfying ‖f‖_{L2} ≔ (∫ ‖f(t)‖₂² dt)^{1/2} < ∞ for some k ∈ ℤ₊. To be specific, in our setting, L2 consists of either input functions u(t) or output functions y(t) (so k = m or k = p), and which one L2 refers to will be clear from the context. If the system in (1) is asymptotically stable with poles in the open left half of the complex plane, then the L∞ norm of H is the same as the H∞ norm of H defined as

‖H‖_{H∞} ≔ sup_{s∈ℂ₊} σ_max(H(s)),

which in turn is equal to the induced norm of the operator φ : L2 → L2 associated with (1) in the time domain mapping u to y as φu = y (for an explicit expression for φ, see [3, Theorem 2.29]), that is given by

‖φ‖_{L2} ≔ sup{ ‖φu‖_{L2} / ‖u‖_{L2} | u ∈ L2, u ≠ 0 }.

Hence, under the asymptotic stability assumption on the descriptor system in (1), we have ‖H‖_{L∞} = ‖H‖_{H∞} = ‖φ‖_{L2}.
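For a system of modest order, ‖H‖_{L∞} can be estimated simply by sampling σ_max(H(iω)) on a frequency grid. The MATLAB lines below are only an illustrative sketch of this for the transfer function in (2); the matrices and the grid are placeholder choices, and a proper computation would use a level-set or optimization-based method as discussed in Section 7.5.

% Crude estimate of the L-infinity norm of H(s) = C*(s*E - A)^{-1}*B + D
% by sampling sigma_max(H(i*omega)) on a frequency grid (illustrative only).
n = 50; m = 2; p = 2;
rng(0);
A = -eye(n) + 0.1*randn(n);  E = eye(n);         % placeholder system data
B = randn(n, m);  C = randn(p, n);  D = zeros(p, m);

omegas = logspace(-2, 3, 2000);                   % sample frequencies
peak = 0;
for w = omegas
    Hw = C * ((1i*w*E - A) \ B) + D;              % H(i*omega)
    peak = max(peak, max(svd(Hw)));               % track the largest sigma_max seen
end
fprintf('estimated L-infinity norm: %.6e\n', peak);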

The L∞ model reduction problem considered in this work for a given descriptor system of order n and for a prescribed positive integer r < n concerns finding a reduced descriptor system of order r that is closest to the given system of order n with respect to the L∞ norm. Formally, let S^red = (A^red, E^red, B^red, C^red, D^red) denote a system of order r with the state-space representation

(3) E^red x′(t) = A^red x(t) + B^red u(t),   y(t) = C^red x(t) + D^red u(t),

described by the matrices E^red, A^red ∈ ℝ^{r×r}, B^red ∈ ℝ^{r×m}, C^red ∈ ℝ^{p×r}, D^red ∈ ℝ^{p×m}, and with the transfer function

(4) H(s; S^red) = C^red (sE^red − A^red)^{−1} B^red + D^red.

Furthermore, let S = (A, E, B, C, D) be the given system of order n and with the transfer function H as in (2). Setting σ(ω; S^red) ≔ σ_max(H(iω) − H(iω; S^red)), the L∞ model reduction problem involves finding a descriptor system S^red of order r that minimizes the objective

(5) ‖H − H(⋅; S^red)‖_{L∞} = sup_{ω∈ℝ} σ(ω; S^red) = sup_{ω∈ℝ, ω⩾0} σ(ω; S^red)

over all descriptor systems S^red of order r. The objective in (5) is non-convex, and here we aim to determine a local minimizer of this objective numerically. The quality of the determined local minimizer also matters; however, this depends largely on the reduced system of order r with which our approach is initialized.

Two important remarks are in order regarding the minimization of the objective in (5). First, in addition to non-convexity, an additional difficulty is the nonsmooth nature of the problem. The objective in (5) as a function of S^red is typically not differentiable when σ(ω; S^red) has multiple global maximizers over ω ⩾ 0. Secondly, under asymptotic stability assumptions on the original system and the reduced system, the error ‖H − H(⋅; S^red)‖_{L∞} gives a uniform upper bound on how much the outputs of the original and reduced systems can differ. To be precise, suppose that the system S of order n and the reduced system S^red of order r are asymptotically stable. Furthermore, let us denote with φ and φ_r the operators in the time domain corresponding to the systems in (1) and (3), respectively. For every u ∈ L2, we have

‖y − y_r‖_{L2} ⩽ ‖H − H(⋅; S^red)‖_{L∞} ‖u‖_{L2},

where y, y_r are such that y = φu and y_r = φ_r u. This means that if a small error ‖H − H(⋅; S^red)‖_{L∞} can be ensured by the minimization of (5), then the output y_r = φ_r u of the minimizing reduced system approximates the original output y = φu well uniformly over every input u of prescribed norm.

1.1 Literature and contributions

For an asymptotically stable descriptor system with the transfer function H, the H∞ model reduction problem – that is, for a given small order r, finding an asymptotically stable system S^red of order r such that ‖H − H(⋅; S^red)‖_{H∞} is small – has been under consideration for a long time. One of the classical approaches for the H∞ model reduction problem is balanced truncation, which determines a state transformation so that the observability and controllability Gramians are the same diagonal matrix, and truncates the system matrices after applying this state transformation [5], [6], [7], [8]. The reduced system obtained by balanced truncation is typically not a local minimizer of ‖H − H(⋅; S^red)‖_{H∞} over S^red, though it is usually a good-quality approximation of H with respect to the H∞ norm [9]. The major difficulty with balanced truncation that limits its applicability to larger systems is that it requires the solution of two Lyapunov equations involving matrices of size equal to the order of the system. With iterative approaches for solving Lyapunov equations, such as ADI methods [8], [10], [11], balanced truncation becomes tractable for systems of higher order.

A classical alternative is finding a best approximation with respect to the Hankel norm (HNA) [12] rather than the H∞ norm. Approaches to compute a globally optimal solution to HNA in polynomial time have been proposed [12]. Generalized HNA approaches have been introduced for larger sparse descriptor systems [13]. However, a globally optimal solution to HNA is again usually not a local minimizer of ‖H − H(⋅; S^red)‖_{H∞}. Furthermore, finding a globally optimal solution to HNA is even costlier than balanced truncation. Even with the efficient use of computational linear algebra tools [14], solving HNA for systems of high order is out of reach.

The iterative rational Krylov algorithm (IRKA) [15] was introduced in order to find a reduced system of prescribed order that is locally optimal with respect to the H2 norm, defined as ‖H‖_{H2} = (1/(2π) ∫ trace(H(iω)^* H(iω)) dω)^{1/2} for a system with the transfer function H. Formally, IRKA is an iterative interpolatory approach that finds a local minimizer of ‖H − H(⋅; S^red)‖_{H2} over all systems S^red of order r. In Ref. [16], starting from the reduced-order system of order r generated by IRKA, an optimization-based approach is proposed to find a locally optimal solution of ‖H − H(⋅; S^red)‖_{H∞} for single-input-single-output (SISO) systems, but, denoting with e the vector of ones, only with respect to the particular rank-one modifications ΔA = ɛ ee^T, ΔB = −ɛ e, ΔC = −ɛ e^T, ΔD = ɛ of the system matrices A^red, B^red, C^red, D^red generated by IRKA over ε ∈ ℝ. In the results reported in Ref. [16], this optimization improves the accuracy of the reduced system returned by IRKA by a factor of 2–4 with respect to the H∞ norm. But again the eventual system is usually not a local minimizer of the objective ‖H − H(⋅; S^red)‖_{H∞} over systems S^red of order r.

Here, we propose an approach to compute a local minimizer of ‖H − H(⋅; S^red)‖_{L∞} over all systems S^red of order r. Nonsmooth optimization techniques, specifically bundle methods [17], [18] and the gradient sampling algorithm [19], [20], have been employed to locally optimize such nonsmooth objectives. Bundle methods are especially suitable for convex objectives, but, to our knowledge, they could not be applied as effectively to nonconvex objectives. The gradient sampling algorithm is applicable to nonsmooth and nonconvex objectives. Indeed, it is shown to converge to a Clarke stationary point under a local Lipschitz continuity assumption and other mild conditions on the objective [20, Section 3]. The difficulty with the gradient sampling algorithm is that it computes the gradient at sample points around the current iterate, which may be too costly for some objectives, e.g., for the L∞ error objective here [21]. Consequently, due to efficiency considerations, our approach to compute a local minimizer of ‖H − H(⋅; S^red)‖_{L∞} uses smooth optimization techniques, which have been employed for solving nonsmooth optimization problems [22], [23] in the last 15 years. For instance, quasi-Newton methods, most often, seem to be capable of locating locally optimal solutions [23], [24], even though some counterexamples are known [23], [24], [25]. When convergence to a nonsmooth locally optimal solution occurs with a smooth optimization technique, it usually occurs slowly, at best at a linear rate. As a result, as we shall see below, a direct application of such techniques to minimize ‖H − H(⋅; S^red)‖_{L∞} is prohibitively expensive even for systems of medium order, as it requires the computation of the objective, that is the L∞ norm, too many times. Instead, we replace H with an approximation H̃ of small order greater than r. Rather than ‖H − H(⋅; S^red)‖_{L∞}, we minimize ‖H̃ − H(⋅; S^red)‖_{L∞}, then update H̃ based on the minimizer, and repeat. The approximation H̃ is built using subspace projections, and an update involves the expansion of the projection subspaces. We show that the proposed subspace framework converges quadratically under simplicity assumptions. We also describe how asymptotic stability constraints can be imposed on the variable S^red when minimizing ‖H − H(⋅; S^red)‖_{L∞} in case the original system in (1) is asymptotically stable. As the system corresponding to H − H(⋅; S^red) is asymptotically stable when S and S^red are asymptotically stable, the incorporation of this constraint into our approach leads to a local minimization of ‖H − H(⋅; S^red)‖_{H∞} over all asymptotically stable systems S^red of order r, i.e., a locally optimal solution of the H∞ model reduction problem.

Smooth optimization techniques have recently been used for model order reduction of asymptotically stable structured systems [26]. It is also observed there that a direct minimization of the H∞-norm error objective via smooth optimization techniques is too costly. However, the approach proposed in Ref. [26] is not based on subspace projections. Instead, the H∞-norm error objective is replaced by a smooth objective that involves a prescribed number of largest singular values of the difference between the transfer functions of the original and reduced system at sample points on the imaginary axis. The minimization of this smoothed objective is illustrated to yield models more accurate than those produced by various structure-preserving model order reduction techniques on some benchmark examples. The reduced system obtained by the method in Ref. [26] is a local minimizer of the smoothed objective, which is related to but different from the H∞ objective. In addition to nonsmoothness, the nonconvexity of the H∞ error objective is a challenge. In Ref. [27], a convex relaxation of the H∞ error objective is proposed for H∞ model reduction.

On a related note, our recent work [28] concerns the minimization of the H∞ norm of a descriptor system of large order with respect to a modest number of parameters. At the center of that work is a subspace framework to cope with the large order of the system. It may seem plausible to look at the current work from that perspective. However, we have too many optimization parameters here. Attempting to apply the framework of [28] to attain quick convergence in the setting here is not feasible, as doing so yields projection subspaces that grow rapidly (see Algorithm 2 in Ref. [28] to attain superlinear convergence). In the framework here, it suffices to add a small number of new directions, independent of r, into the subspaces at every iteration. In particular, if m = p, then only 4m new directions are added into the subspaces at every iteration. Moreover, we observe quick convergence, so the subspaces remain small throughout. Also in the context of parametric descriptor systems, in Refs. [29], [30], an H∞ error objective is minimized on a discrete set of parameter values to choose interpolation points for interpolatory model order reduction techniques.

1.2 Outline

We first consider the direct minimization of ‖H − H(⋅; S^red)‖_{L∞} over systems S^red of order r by means of smooth optimization techniques in Section 2. In this section, we indicate the optimization variables, and spell out expressions for the first derivatives of the objective with respect to these variables. As we shall see, direct optimization is too costly even for systems of moderate order, since smooth optimization techniques converge very slowly and require the evaluation of the L∞ objective too many times. Consequently, in Section 3, we replace the transfer function H with an approximating transfer function H̃ of small order greater than r that Hermite interpolates H at several points on the imaginary axis. Then we minimize ‖H̃ − H(⋅; S^red)‖_{L∞} (by smooth optimization techniques), and refine H̃ so that Hermite interpolation with H at another point on the imaginary axis is attained based on the computed minimizer of ‖H̃ − H(⋅; S^red)‖_{L∞}. We introduce a refinement step on H̃ so that interpolation properties can also be attained between the full objective ‖H − H(⋅; S^red)‖_{L∞} and the reduced objective ‖H̃ − H(⋅; S^red)‖_{L∞}. Then the procedure is repeated with the refined H̃. In Section 4, we investigate the interpolation properties between the full objective and the reduced objective. Based on these interpolation properties, we argue in Section 5 that the algorithm converges at a quadratic rate under smoothness and nondegeneracy assumptions. If the original descriptor system is asymptotically stable, it may be natural to minimize ‖H̃ − H(⋅; S^red)‖_{L∞} subject to asymptotic stability constraints on the reduced system S^red. We discuss in Section 6 the incorporation of such asymptotic stability constraints on the reduced system into our approach. Section 7 is devoted to the details that need to be taken into account in a practical implementation of the proposed algorithm, such as the initialization of the smooth optimization routines and termination. A MATLAB implementation of the algorithm is publicly available. In Section 8, we report numerical results obtained with this implementation. The numerical results indicate quick convergence to a locally optimal solution, and the capability to deal with systems having orders in the range of tens of thousands.

2 Use of first-order and second-order derivative-based methods

First-order methods such as the gradient descent algorithm, and second-order methods such as quasi-Newton algorithms equipped with proper line-searches, have been successfully applied to nonsmooth optimization problems in recent years [22], [23], [31], [32]. In these circumstances, if a curvature condition is employed in the line-search, it should take into consideration that the directional derivatives need not converge to zero, unlike the situation for smooth optimization problems; e.g., if Wolfe conditions are imposed in the line-search, weak Wolfe conditions should be used rather than strong Wolfe conditions [23, Section 4]. Also, for termination, small gradient norms should not be required. Instead, for instance, a failure to take a reasonably long step in the objective along the descent search direction may indicate convergence to a locally optimal solution [31, Section 3].

The objective to be minimized in (5) for the L∞ model reduction problem can be expressed as

(6) F(S^red) ≔ sup_{ω⩾0} σ_max(H(iω) − H(iω; S^red)) = sup_{ω⩾0} σ_max(ℰ(iω; S^red)) = ‖ℰ(⋅; S^red)‖_{L∞},
    ℰ(s; S^red) ≔ [ C  −C^red ] [ sE − A , 0 ; 0 , sE^red − A^red ]^{−1} [ B ; B^red ] + (D − D^red),

where H(⋅; S^red) is as in (4). Assuming that the reduced system is at most index one and has semi-simple poles, by the Kronecker canonical form there exist invertible r × r real matrices W, V such that W E^red V is diagonal, and W A^red V is block diagonal with 2 × 2 and 1 × 1 blocks along the diagonal. (Here, a 2 × 2 block along the diagonal of W A^red V corresponds to a pair of complex conjugate eigenvalues, and a 1 × 1 block to a real or an infinite eigenvalue of the pencil L(s) = A^red − sE^red.) Consequently, the reduced system is equivalent to a system (with the same transfer function) for which A^red and E^red are in tridiagonal and diagonal form, respectively. Hence, under the index-one and semi-simple pole assumptions, we can perform the minimization over tridiagonal A^red and diagonal E^red. Recalling the dimensions of A^red, B^red, C^red, D^red, E^red, there are precisely 4r − 2 + rm + pr + pm optimization variables.
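To make the parametrization concrete, the following MATLAB sketch packs a tridiagonal A^red, diagonal E^red, and full B^red, C^red, D^red into a single vector of 4r − 2 + rm + pr + pm optimization variables, and recovers the matrices from it. The function-free layout and the ordering of the variables here are illustrative choices, not those of the reference implementation.

r = 4; m = 2; p = 3;
rng(0);
Ared = diag(randn(r,1)) + diag(randn(r-1,1),-1) + diag(randn(r-1,1),1);   % tridiagonal
Ered = diag(randn(r,1));                                                  % diagonal
Bred = randn(r,m);  Cred = randn(p,r);  Dred = randn(p,m);

% pack: main, sub-, superdiagonal of Ared, diagonal of Ered, then B, C, D
x = [diag(Ared); diag(Ared,-1); diag(Ared,1); diag(Ered); Bred(:); Cred(:); Dred(:)];
assert(numel(x) == 4*r - 2 + r*m + p*r + p*m);

% unpack
A2 = diag(x(1:r)) + diag(x(r+1:2*r-1),-1) + diag(x(2*r:3*r-2),1);
E2 = diag(x(3*r-1:4*r-2));
B2 = reshape(x(4*r-1:4*r-2+r*m), r, m);
C2 = reshape(x(4*r-1+r*m:4*r-2+r*m+p*r), p, r);
D2 = reshape(x(4*r-1+r*m+p*r:end), p, m);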

The gradient descent algorithm, as well as quasi-Newton algorithms, to minimize F require the gradients of F. To this end, suppose there is a unique ω_* ⩾ 0 satisfying

σ_max(H(iω_*) − H(iω_*; S^red)) = F(S^red) = sup_{ω⩾0} σ_max(H(iω) − H(iω; S^red)),

ensuring that F is differentiable at S^red. Additionally, let u, v denote a consistent pair of unit left and right singular vectors corresponding to σ_max(H(iω_*) − H(iω_*; S^red)), and let us introduce ũ ≔ u^* C^red (iω_* E^red − A^red)^{−1} and ṽ ≔ (iω_* E^red − A^red)^{−1} B^red v. Then, by employing the analytical formulas for the derivatives of singular value functions [33], [34, Section 3.3], the gradients of F are given by

(7) ∇_{A^red} F(S^red) = −diag(Re(ũ^T ⊙ ṽ)) − diag_{−1}(Re(ũ(2:r)^T ⊙ ṽ(1:r−1))) − diag_{+1}(Re(ũ(1:r−1)^T ⊙ ṽ(2:r))),
    ∇_{E^red} F(S^red) = −ω_* diag(Im(ũ^T ⊙ ṽ)),   ∇_{B^red} F(S^red) = −Re(ũ^T v^T),
    ∇_{C^red} F(S^red) = −Re(ū ṽ^T),   ∇_{D^red} F(S^red) = −Re(ū v^T),

where Re(⋅), Im(⋅) denote the real part and imaginary part, respectively, of their vector or matrix arguments. Also, above ⊙ denotes the Hadamard product, ū is the complex conjugate of u, and the notation diag(w) represents the square diagonal matrix whose diagonal entries are formed of the entries of the vector w. The notations diag_{−1}(w) and diag_{+1}(w) are similar to diag(w) but with the difference that the subdiagonal and superdiagonal entries of the matrix, respectively, are filled with the entries of w rather than the diagonal entries.
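As an illustration of the formulas in (7), the MATLAB sketch below evaluates the error ℰ(iω; S^red) at a candidate maximizer ω_*, extracts the singular vectors, and forms the gradients with respect to B^red and D^red. The data and the value of ω_* are placeholders; in the actual algorithm, ω_* is a global maximizer of σ(ω; S^red), obtained from an L∞-norm computation.

% placeholder data
n = 30; r = 4; m = 2; p = 2;
rng(1);
A = -eye(n) + 0.05*randn(n);  E = eye(n);  B = randn(n,m);  C = randn(p,n);  D = zeros(p,m);
Ared = -diag(1:r);  Ered = eye(r);  Bred = randn(r,m);  Cred = randn(p,r);  Dred = zeros(p,m);
wstar = 1.3;                                          % candidate maximizer of sigma(omega; S^red)

Err = C*((1i*wstar*E - A)\B) + D ...
      - (Cred*((1i*wstar*Ered - Ared)\Bred) + Dred);  % error E(i*wstar; S^red)
[U,S,V] = svd(Err);
u = U(:,1);  v = V(:,1);                              % unit singular vectors for sigma_max
utilde = (u' * Cred) / (1i*wstar*Ered - Ared);        % row vector u^* Cred (i*w*Ered - Ared)^{-1}
gradB = -real(utilde.' * v.');                        % gradient with respect to B^red, cf. (7)
gradD = -real(conj(u) * v.');                         % gradient with respect to D^red, cf. (7)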

It is essential that a quasi-Newton method such as BFGS generate approximate Hessians that are positive definite. This is traditionally ensured by the line-search. For instance, if BFGS is to be used to minimize F, then a line-search ensuring the satisfaction of the weak Wolfe conditions may be adopted so that the approximate Hessians remain positive definite. On the other hand, for the gradient descent algorithm to minimize F, it is sufficient to adopt a simpler line-search that guarantees only sufficient reduction in the objective, e.g., an Armijo backtracking line-search.

One difficulty with using methods such as gradient descent and BFGS to minimize F is that these algorithms converge rather slowly, at best at a linear rate. This may sound surprising, especially for BFGS, which typically converges superlinearly for smooth problems. The slower convergence of BFGS is an artifact of nonsmoothness. As a result of linear or worse convergence rates, the objective F typically needs to be evaluated many times until a prescribed accuracy is reached. This may be prohibitively expensive, as it is apparent from (6) that an evaluation of F involves the computation of the L∞ norm of ℰ(⋅; S^red), which may be costly assuming the original system in (1) is large-scale.

To illustrate the slow convergence issues described in the previous paragraph, and the computational difficulties that come with them, we apply the gradient descent algorithm to the iss example from the SLICOT collection. The system associated with this example has order n = 270, and m = p = 3. We attempt to solve the L∞ model reduction problem for r = 12 starting with the initial reduced-order model generated by balanced truncation. The errors (F) and the 2-norms of the gradients of the errors (‖∇F‖₂) at the iterates of the gradient descent algorithm are reported in Table 1. It takes 37 iterations until the errors in two consecutive iterations differ by no more than 10⁻⁶ in a relative sense. The initial L∞-norm error of approximately 4.5 × 10⁻³ (of the system obtained from balanced truncation) is reduced to approximately 2.4 × 10⁻³ after 37 iterations. The eventual reduced model obtained appears to be a local minimizer of F up to prescribed tolerances, as evidenced by the plots in Figure 1. Note, however, that according to the last column in Table 1 the gradients of F do not seem to be converging to zero, which indicates that the objective is not differentiable at the local minimizer being approached. Meanwhile, the objective F is evaluated 624 times, since the line-search at each iteration requires several objective function evaluations (to be precise, 8–28 evaluations per iteration) until the satisfaction of the sufficient decrease condition. This results in a total runtime of about 500 s, costly for a system of relatively small order. To conclude, direct applications of the gradient descent algorithm do not seem viable for systems of even modest order (e.g., a few thousand). On the same example, BFGS requires 239 evaluations of the objective F with a total runtime of about 240 s, so applying BFGS directly is also costly.

Table 1:

This concerns the iss example with r = 12. The objective F^(k) ≔ F(A^(k), B^(k), C^(k), D^(k), E^(k)), and the 2-norm of ∇F^(k) ≔ ∇F(A^(k), B^(k), C^(k), D^(k), E^(k)), for the iterate (A^(k), B^(k), C^(k), D^(k), E^(k)) generated by the gradient descent method at the kth iteration are listed.

k  F^(k)  ‖∇F^(k)‖₂
0 0.004470060020 1.000093488
1 0.004346739384 0.833556647
2 0.003609940202 1.000097230
3 0.003175718111 0.769359926
4 0.002975716755 1.000095596
5 0.002946113130 0.999918608
6 0.002697635041 0.844275929
7 0.002656707905 0.999952423
⋮ ⋮ ⋮
30 0.002415516341 0.803721909
31 0.002415479783 1.000008471
32 0.002415475189 0.803718441
33 0.002415456030 1.000008467
34 0.002415454613 0.803716708
35 0.002415444154 1.000008465
36 0.002415439844 0.803714645
37 0.002415438945 1.000008462
Figure 1:

The locally minimal reduced system generated by the gradient descent method for the iss example and r = 12 is varied, and the error F is plotted as a function of the variation. In each one of the plots (a)–(h), only the indicated entry of one of the optimal coefficients A r e d , B r e d , C r e d , D r e d , E r e d is varied by amounts in [−0.5, 0.5]. Zero variation corresponds to the optimal reduced system.

3 A subspace framework

The computational difficulty in minimizing the objective F in (6) is due to the large order of the original system S = (A, E, B, C, D). In this section, we propose to replace this system with a system of smaller order S_q = (A_q, E_q, B_q, C_q, D) with the state-space representation

(8) E_q x_q′(t) = A_q x_q(t) + B_q u(t),   y(t) = C_q x_q(t) + D u(t),

and solve the resulting L∞ model reduction problem, that is, minimize

(9) F_q(S^red) ≔ sup_{ω⩾0} σ_max(H_q(iω) − H(iω; S^red)) = sup_{ω⩾0} σ_max(ℰ_q(iω; S^red)),

where

H_q(s) ≔ C_q (sE_q − A_q)^{−1} B_q + D,   ℰ_q(s; S^red) ≔ [ C_q  −C^red ] [ sE_q − A_q , 0 ; 0 , sE^red − A^red ]^{−1} [ B_q ; B^red ] + (D − D^red).

The question that we need to address is how to form a small system S_q = (A_q, E_q, B_q, C_q, D) that is a good representative of the original system near a local minimizer of the original L∞ model reduction problem.

Recall how the pure Newton's method operates to minimize a function f : ℝ → ℝ. It approximates f with a quadratic model, and finds a local minimizer x̃ of the quadratic model. Then, assuming f is twice differentiable at x̃, it refines the quadratic model so that the refined quadratic model Q satisfies f(x̃) = Q(x̃), f′(x̃) = Q′(x̃) and f″(x̃) = Q″(x̃). In the context of L∞ model reduction, we view F_q as the model function for F, even though F_q is not quadratic. We minimize F_q locally rather than F, and refine the small system in (8) with the hope that the objective error function F_{q+1} of the refined system interpolates F and its first two derivatives at the computed minimizer of F_q.
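For concreteness, this model-refine loop can be spelled out on a scalar example with the pure Newton's method: at each step the quadratic model is built from the current derivatives, its minimizer becomes the next iterate, and the model is rebuilt there. The MATLAB lines below are only an illustration of this analogy on an arbitrary smooth function, not part of the proposed algorithm, which uses F_q in place of the quadratic model.

f   = @(x) exp(x) - 2*x;          % sample smooth objective
fp  = @(x) exp(x) - 2;            % first derivative
fpp = @(x) exp(x);                % second derivative
x = 2;                            % initial point
for k = 1:6
    x = x - fp(x) / fpp(x);       % minimizer of the quadratic model built at the current x
    fprintf('k = %d, x = %.12f, f''(x) = %.2e\n', k, x, fp(x));
end                               % converges to log(2), the minimizer of f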

To obtain the small system in (8), we employ projection-based model reduction. In particular, let 𝒱_q, 𝒲_q be two subspaces of ℝ^n of equal dimension. Denoting with V_q, W_q matrices whose columns form orthonormal bases for 𝒱_q, 𝒲_q, the original system is approximated by

W_q^T (E V_q x_q′(t) − A V_q x_q(t) − B u(t)) = 0,   y(t) = C V_q x_q(t) + D u(t),

giving rise to a system of the form (8) with

(10) E_q = W_q^T E V_q,   A_q = W_q^T A V_q,   B_q = W_q^T B,   C_q = C V_q.
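Given bases V_q, W_q with orthonormal columns, the projected matrices in (10) are straightforward to form. The sketch below is a minimal MATLAB illustration with random placeholder data standing in for the original system and the subspaces.

n = 100; k = 8; m = 2; p = 2;
rng(2);
A = randn(n); E = eye(n); B = randn(n,m); C = randn(p,n);
Vq = orth(randn(n,k));  Wq = orth(randn(n,k));   % orthonormal bases for the subspaces

Eq = Wq' * E * Vq;      % projected descriptor matrix
Aq = Wq' * A * Vq;
Bq = Wq' * B;
Cq = C  * Vq;           % D is inherited unchanged by the small system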

For the realization of the ideas in the previous paragraph, we need to be equipped with a tool that gives us the capability to interpolate H(s) and its derivatives at a prescribed point in the complex plane with those of the transfer function of the small system. This tool is introduced in the next result, which follows from [35, Theorem 1].

Theorem 1.

Let μ ∈ ℂ be such that A − μE is invertible. Suppose that, for j = 0, 1, …, κ, the columns of

Re({(A − μE)^{−1}E}^j (A − μE)^{−1}B)   and   Im({(A − μE)^{−1}E}^j (A − μE)^{−1}B)   belong to 𝒱_q, and the columns of
Re((C(A − μE)^{−1}{E(A − μE)^{−1}}^j)^*)   and   Im((C(A − μE)^{−1}{E(A − μE)^{−1}}^j)^*)   belong to 𝒲_q.

Then, with A_q, E_q, B_q, C_q defined as in (10), if A_q − μE_q is invertible, we have

  1. H(μ) = H_q(μ) and H(μ̄) = H_q(μ̄);

  2. H^{(j)}(μ) = H_q^{(j)}(μ) and H^{(j)}(μ̄) = H_q^{(j)}(μ̄) for j = 1, …, 2κ + 1.

Our proposed subspace framework at iteration q first finds a minimizer of F_q(S^red), say S_q^red = (A_q^red, B_q^red, C_q^red, D_q^red, E_q^red). This is followed by the computation of an ω_q ∈ ℝ, ω_q ⩾ 0, such that

F(S_q^red) = sup_{ω⩾0} σ_max(H(iω) − H(iω; S_q^red)) = σ_max(H(iω_q) − H(iω_q; S_q^red)).

Computing such an ω_q requires the large-scale L∞-norm computation in (6), but with S^red = (A^red, B^red, C^red, D^red, E^red) replaced by S_q^red = (A_q^red, B_q^red, C_q^red, D_q^red, E_q^red). Then the subspaces are expanded so that H and its first three derivatives are interpolated at iω_q by those of the transfer function of the small system. A formal description of the framework is given in Algorithm 1 below. As the subspaces 𝒱_q and 𝒲_q are required to be of equal dimension, the description assumes that the number of inputs and outputs are equal, i.e., m = p. Although omitted here for simplicity, it is straightforward to modify the directions Ṽ_{q+1}, W̃_{q+1} in lines 11–12 to be added to the subspaces 𝒱_q, 𝒲_q in order to deal with systems for which m ≠ p; see, e.g., [36, Lemma 3.1]. The final refinement step in line 15 aims at the satisfaction of the interpolation condition F(S_q^red) = F_{q+1}(S_q^red), as well as the interpolation conditions on the derivatives of F(S^red) and F_{q+1}(S^red) at S_q^red. This step is elaborated on in the next subsection.

Algorithm 1

(subspace framework for L∞ model reduction).

Input: System S = (A, E, B, C, D) as in (1), the order r ∈ ℤ₊ of the reduced system sought, and an initial estimate S_0^red = (A_0^red, E_0^red, B_0^red, C_0^red, D_0^red) of order r for a minimizer of F as in (6).
Output: Estimate S^red = (A^red, E^red, B^red, C^red, D^red) for a minimizer of F as in (6).
1: Choose the initial subspaces 𝒱_0, 𝒲_0 and orthonormal bases V_0, W_0 for them.
2: Form A_0, B_0, C_0, E_0 using (10), and let S_0 = (A_0, E_0, B_0, C_0, D).
  % main loop
3: for q = 0, 1, … do
4:  if q ⩾ 1 then
5:   S_q^red ← a minimizer of F_q(S^red) (for F_q defined as in (9)).
6:  end if
7:  ω_q ← a maximizer of σ(ω; S_q^red) = σ_max(H(iω) − H(iω; S_q^red)) over ω ⩾ 0.
8:  if q ⩾ 1 then
9:   Return if convergence has occurred, with S^red ← S_q^red.
10: end if
   % expand the subspaces to interpolate at iω_q
11: Ṽ_{q+1} ← [ Re((iω_q E − A)^{−1}B)   Re((iω_q E − A)^{−1}E(iω_q E − A)^{−1}B)   Im((iω_q E − A)^{−1}B)   Im((iω_q E − A)^{−1}E(iω_q E − A)^{−1}B) ].
12: W̃_{q+1} ← [ Re((iω_q E − A)^{−*}C^*)   Re((iω_q E − A)^{−*}E^T(iω_q E − A)^{−*}C^*)   Im((iω_q E − A)^{−*}C^*)   Im((iω_q E − A)^{−*}E^T(iω_q E − A)^{−*}C^*) ].
13: V_{q+1} ← orth([V_q  Ṽ_{q+1}]) and W_{q+1} ← orth([W_q  W̃_{q+1}]).
   % update the small system
14: Form A_{q+1}, B_{q+1}, C_{q+1}, E_{q+1} using (10), and let S_{q+1} = (A_{q+1}, E_{q+1}, B_{q+1}, C_{q+1}, D).
   % refine the small system
15: Refine V_{q+1}, W_{q+1} and S_{q+1} if necessary (using Algorithm 2 below in Section 3.1).
16: end for
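In a MATLAB implementation, the expansion directions in lines 11–12 amount to a handful of linear solves with iω_q E − A and its conjugate transpose. The lines below are a minimal illustration with dense placeholder data; a practical implementation would reuse a single (sparse) factorization of iω_q E − A for all right-hand sides.

n = 200; m = 2; p = 2;
rng(3);
A = -eye(n) + 0.05*randn(n);  E = eye(n);  B = randn(n,m);  C = randn(p,n);
wq = 0.7;                                   % current interpolation frequency omega_q

M  = 1i*wq*E - A;
X1 = M \ B;            X2 = M \ (E*X1);     % (i*w*E - A)^{-1} B and (i*w*E - A)^{-1} E (i*w*E - A)^{-1} B
Y1 = M' \ C';          Y2 = M' \ (E'*Y1);   % analogous directions for the left subspace
Vtilde = [real(X1) real(X2) imag(X1) imag(X2)];
Wtilde = [real(Y1) real(Y2) imag(Y1) imag(Y2)];

Vq = orth(Vtilde);  Wq = orth(Wtilde);      % here initializing; inside the loop, orthonormalize against the existing bases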

3.1 Refinement step

First we make a few observations regarding the relation between F(S_q^red) and F_{q+1}(S_q^red) at the qth subspace iteration in Algorithm 1 right before the refinement step.

At the qth iteration of Algorithm 1 right after line 14, by Theorem 1, we have

(11) H(iω_q) = H_{q+1}(iω_q),   H^{(j)}(iω_q) = H_{q+1}^{(j)}(iω_q)

for j = 1, 2, 3 under the assumptions that A − iω_q E and A_{q+1} − iω_q E_{q+1} are invertible. Consequently, H(iω_q) − H(iω_q; S_q^red) and H_{q+1}(iω_q) − H(iω_q; S_q^red) are equal, and share the same set of left and right singular vectors. It immediately follows that setting

(12) σ_{q+1}(ω; S^red) ≔ σ_max(H_{q+1}(iω) − H(iω; S^red)),

and recalling σ(ω; S^red) = σ_max(H(iω) − H(iω; S^red)), we have

(13) σ(ω_q; S_q^red) = σ_{q+1}(ω_q; S_q^red).

Indeed, as the singular values and vectors of H(iω_q) − H(iω_q; S_q^red) and H_{q+1}(iω_q) − H(iω_q; S_q^red) are the same, and the first two derivatives of H(iω) − H(iω; S_q^red) and H_{q+1}(iω) − H(iω; S_q^red) at ω = ω_q are equal due to (11), we also have

(14) (d^j σ/dω^j)(ω_q; S_q^red) = (d^j σ_{q+1}/dω^j)(ω_q; S_q^red)

for j = 1, 2. Now ω_q is a global maximizer of σ(ω; S_q^red) over ω, implying

(dσ/dω)(ω_q; S_q^red) = 0,   (d²σ/dω²)(ω_q; S_q^red) ⩽ 0.

Assuming that the last inequality on the second derivative above holds strictly, (14) implies that ω_q is also a local maximizer of σ_{q+1}(ω; S_q^red).

Regarding F(S_q^red) and F_{q+1}(S_q^red), the following relation always holds:

(15) F(S_q^red) = sup_{ω⩾0} σ(ω; S_q^red) = σ(ω_q; S_q^red) = σ_{q+1}(ω_q; S_q^red) ⩽ sup_{ω⩾0} σ_{q+1}(ω; S_q^red) = F_{q+1}(S_q^red),

where the third equality is due to the interpolatory property in (13). As argued in the previous paragraph, the point ω_q is not only a global maximizer of σ(ω; S_q^red), but also generically a local maximizer of σ_{q+1}(ω; S_q^red). If it happens that ω_q is also a global maximizer of σ_{q+1}(ω; S_q^red), beyond being a local maximizer, then the inequality in equation (15) above becomes an equality, and we have the interpolation property

(16) F(S_q^red) = F_{q+1}(S_q^red).

In the refinement step, if it happens that ω_q is merely a local maximizer of σ_{q+1}(ω; S_q^red), but not a global maximizer, then we find a global maximizer ω_q^(0) of σ_{q+1}(ω; S_q^red) over ω ⩾ 0 (equivalently, compute the L∞ norm of H_{q+1}(⋅) − H(⋅; S_q^red)). Observe that finding such a global maximizer has a small computational cost, as the orders of S_{q+1} and S_q^red are small. Then, by employing Theorem 1, we expand the subspaces 𝒱_{q+1}, 𝒲_{q+1} further so that the interpolatory properties are attained between H_{q+1}(iω) after this refinement and H(iω) at ω = ω_q^(0), which in turn implies interpolatory properties between σ_{q+1}(ω; S_q^red) and σ(ω; S_q^red) at ω = ω_q^(0). If ω_q after this refinement of S_{q+1} is still only a local maximizer of σ_{q+1}(ω; S_q^red), but not a global maximizer, then we repeat this refinement procedure on S_{q+1} until ω_q becomes a global maximizer of σ_{q+1}(ω; S_q^red) (in practice, up to prescribed tolerances). A formal description of the refinement step is given below in Algorithm 2. For simplicity, in line 3 of Algorithm 2 it is assumed that ω_q is the unique global maximizer of σ(ω; S_q^red). More generally, all of the global maximizers of σ(ω; S_q^red) can be returned in line 7 of Algorithm 1 (e.g., by employing the level-set methods [37], [38] to compute the L∞ norm), and whether ω_q^(j) is equal to any of these global maximizers can be checked in line 3 of Algorithm 2.

Assuming σ(ω; S_q^red) is Lipschitz continuous, σ_{q+1}(ω; S_q^red) is Lipschitz continuous with a uniform Lipschitz constant over the iterations of Algorithm 2, and the maximizer ω_q^(j) of σ_{q+1}(ω; S_q^red) over ω ⩾ 0 in line 2 of Algorithm 2 at every j is required to be in a prescribed bounded interval, the gap |ω_q^(j) − ω_q| can be made less than any prescribed amount after finitely many iterations of Algorithm 2. At this point, the interpolation condition (16) is also met up to a multiple of the prescribed amount.

Algorithm 2

(refinement step).

1: for j = 0, 1, … do
2:  ω_q^(j) ← a maximizer of σ_{q+1}(ω; S_q^red) = σ_max(H_{q+1}(iω) − H(iω; S_q^red)) over ω ⩾ 0.
3:  if ω_q^(j) = ω_q (up to prescribed tolerances) then
4:   Terminate with V_{q+1}, W_{q+1} and S_{q+1}.
5:  end if
   % expand the subspaces to interpolate at iω_q^(j)
6:  Ṽ_{q+1} ← [ Re((iω_q^(j) E − A)^{−1}B)   Re((iω_q^(j) E − A)^{−1}E(iω_q^(j) E − A)^{−1}B)   Im((iω_q^(j) E − A)^{−1}B)   Im((iω_q^(j) E − A)^{−1}E(iω_q^(j) E − A)^{−1}B) ].
7:  W̃_{q+1} ← [ Re((iω_q^(j) E − A)^{−*}C^*)   Re((iω_q^(j) E − A)^{−*}E^T(iω_q^(j) E − A)^{−*}C^*)   Im((iω_q^(j) E − A)^{−*}C^*)   Im((iω_q^(j) E − A)^{−*}E^T(iω_q^(j) E − A)^{−*}C^*) ].
8:  V_{q+1} ← orth([V_{q+1}  Ṽ_{q+1}]) and W_{q+1} ← orth([W_{q+1}  W̃_{q+1}]).
    % update the small system
9:  Form A_{q+1}, B_{q+1}, C_{q+1}, E_{q+1} using (10), and let S_{q+1} = (A_{q+1}, E_{q+1}, B_{q+1}, C_{q+1}, D).
10: end for

4 Interpolation properties of the subspace framework

Suppose that ω_q is a global maximizer of σ_{q+1}(ω; S_q^red) by the termination of the refinement step, in which case the interpolation condition (16) holds due to (15). It can be shown that, assuming F and F_{q+1} are twice differentiable at S_q^red, indeed all of the first two derivatives of F and F_{q+1} are equal at S_q^red as well. To this end, let x_1, x_2 be any two entries of the matrix variables A^red, B^red, C^red, D^red, E^red of F and F_{q+1}. Recalling

ℰ(iω; S^red) = H(iω) − H(iω; S^red),   ℰ_{q+1}(iω; S^red) = H_{q+1}(iω) − H(iω; S^red),

and by employing (11), it is apparent that

(17) ℰ(iω_q; S_q^red) = ℰ_{q+1}(iω_q; S_q^red),

(18) (∂ℰ/∂y)(iω_q; S_q^red) = (∂ℰ_{q+1}/∂y)(iω_q; S_q^red),

(19) (∂²ℰ/∂y∂z)(iω_q; S_q^red) = (∂²ℰ_{q+1}/∂y∂z)(iω_q; S_q^red),

for all y, z ∈ {ω, x 1, x 2}. By exploiting

F(S^red) = sup_{ω⩾0} σ_max(ℰ(iω; S^red)),   F_{q+1}(S^red) = sup_{ω⩾0} σ_max(ℰ_{q+1}(iω; S^red)),

and using implicit differentiation

(20) (∂F/∂y)(S_q^red) = (∂(σ_max ∘ ℰ)/∂y)(iω_q; S_q^red) = (∂(σ_max ∘ ℰ_{q+1})/∂y)(iω_q; S_q^red) = (∂F_{q+1}/∂y)(S_q^red)

for y ∈ {x_1, x_2}, where the second equality is due to (18), as well as (17) implying the fact that ℰ(iω_q; S_q^red) and ℰ_{q+1}(iω_q; S_q^red) have the same left and right singular vectors. We remark that, for the first and third equalities in (20), we use the fact that ω_q is a global maximizer of σ(ω; S_q^red) = σ_max(ℰ(iω; S_q^red)) and of σ_{q+1}(ω; S_q^red) = σ_max(ℰ_{q+1}(iω; S_q^red)), respectively.

Moreover, for any y, z ∈ {ω, x 1, x 2}, we have

(∂²(σ_max ∘ ℰ)/∂y∂z)(iω_q; S_q^red) = (∂²(σ_max ∘ ℰ_{q+1})/∂y∂z)(iω_q; S_q^red)

due to (18) and (19), combined with the fact that ℰ(iω_q; S_q^red) and ℰ_{q+1}(iω_q; S_q^red) have the same singular values and vectors due to (17). Consequently,

(21) (∂²F/∂y∂z)(S_q^red)
     = (∂²(σ_max ∘ ℰ)/∂y∂z)(iω_q; S_q^red) − [(∂²(σ_max ∘ ℰ)/∂y∂ω)(iω_q; S_q^red) ⋅ (∂²(σ_max ∘ ℰ)/∂ω∂z)(iω_q; S_q^red)] / (∂²(σ_max ∘ ℰ)/∂ω²)(iω_q; S_q^red)
     = (∂²(σ_max ∘ ℰ_{q+1})/∂y∂z)(iω_q; S_q^red) − [(∂²(σ_max ∘ ℰ_{q+1})/∂y∂ω)(iω_q; S_q^red) ⋅ (∂²(σ_max ∘ ℰ_{q+1})/∂ω∂z)(iω_q; S_q^red)] / (∂²(σ_max ∘ ℰ_{q+1})/∂ω²)(iω_q; S_q^red)
     = (∂²F_{q+1}/∂y∂z)(S_q^red)

for any y, z ∈ {x 1, x 2}.

5 A quadratic convergence result regarding Algorithm 1

In this section, we establish quadratic convergence of the iterates of Algorithm 1 under additional smoothness assumptions. Here and in the next section, 𝒟_{r,m,p} denotes the set consisting of every descriptor system S^red of order r and index at most one with semi-simple poles, m inputs, and p outputs. Throughout this section, we make use of the vectorization V(S^red) of the system S^red = (A^red, E^red, B^red, C^red, D^red) defined as

(22) V(S^red) ≔ [ vec(A^red)^T  vec(E^red)^T  vec(B^red)^T  vec(C^red)^T  vec(D^red)^T ]^T,

where vec(M) denotes the vector obtained by stacking up the columns of the matrix M. The gradients ∇F(S^red) and ∇F_{q+1}(S^red) are vectors formed of the first partial derivatives of F(S^red) and F_{q+1}(S^red) based on the ordering of the variables, i.e., the entries of A^red, E^red, B^red, C^red, D^red, in the vectorization in (22). Similarly, ∇²F(S^red) and ∇²F_{q+1}(S^red) denote the Hessians of F(S^red) and F_{q+1}(S^red) based on the ordering of the variables according to (22).

We assume that there are two consecutive iterates S_q^red and S_{q+1}^red of Algorithm 1 that are sufficiently close to a local minimizer S_*^red of F(S^red). Moreover, we tacitly assume throughout that the interpolation properties in (20) and (21) hold at S_q^red. We also keep the assumption stated below, which guarantees that F(S^red) is real analytic at S_*^red.

Assumption 2.

The maximum of σ(ω; S_*^red) over all ω ⩾ 0 is attained at a unique ω_*. Furthermore, σ(ω_*; S_*^red) = σ_max(ℰ(iω_*; S_*^red)) is a simple singular value of ℰ(iω_*; S_*^red).

An assumption regarding the smoothness of F_{q+1}(S^red) that we rely on is given next. Recalling that ‖v‖₂ for a vector v denotes the 2-norm of v, we make use of the distance ‖S̃^red − Ŝ^red‖ ≔ ‖V(S̃^red) − V(Ŝ^red)‖₂ for systems S̃^red, Ŝ^red ∈ 𝒟_{r,m,p}, and the ball B(Ŝ^red, δ) ≔ {S̃^red ∈ 𝒟_{r,m,p} | ‖S̃^red − Ŝ^red‖ < δ} for a system Ŝ^red ∈ 𝒟_{r,m,p} and a positive real number δ. We remark that part (i) of Assumption 3 guarantees that F_{q+1}(S^red) is real analytic in the ball B(S_q^red, δ_q), and so three times differentiable in this ball, which we depend on in part (ii) of Assumption 3.

Assumption 3.

  1. For every S^red ∈ B(S_q^red, δ_q) with δ_q ≔ ‖S_{q+1}^red − S_q^red‖, the following conditions hold:

    1. the maximum of σ_{q+1}(ω; S^red) defined in (12) over all ω ⩾ 0 is attained at a unique point, say at ω̃;

    2. the singular value σ_{q+1}(ω̃; S^red) = σ_max(ℰ_{q+1}(iω̃; S^red)) of ℰ_{q+1}(iω̃; S^red) is simple.

  2. Moreover, all of the third derivatives of F_{q+1}(S^red) can be bounded by quantities independent of q at all S^red ∈ B(S_q^red, δ_q).

We state and prove the main result, which relates ‖S_q^red − S_*^red‖ and ‖S_{q+1}^red − S_*^red‖, below.

Theorem 4.

Suppose that two consecutive iterates S_q^red and S_{q+1}^red of Algorithm 1 are sufficiently close to a local minimizer S_*^red of F(S^red). Moreover, suppose Assumptions 2 and 3 hold, and that ∇²F(S_*^red) is invertible. Then there is a constant C independent of q such that

‖S_{q+1}^red − S_*^red‖ ⩽ C ‖S_q^red − S_*^red‖².

Proof.

By continuity, σ(ω; S^red) = σ_max(ℰ(iω; S^red)) remains simple at all ω and S^red ∈ 𝒟_{r,m,p} such that ω is sufficiently close to ω_* and S^red is sufficiently close to S_*^red, where ω_* is as in Assumption 2. Thus, by the analytic implicit function theorem, there is δ̃ > 0 such that F(S^red) is real analytic at all S^red ∈ B(S_*^red, δ̃) (see, e.g., [39, Lemma 16] for the details in the analogous context of the distance to instability). By the assumption that ∇²F(S_*^red) is invertible, and the continuity of the second partial derivatives of F(S^red) in B(S_*^red, δ̃), the Hessian ∇²F(S^red) remains invertible in a ball B(S_*^red, δ) for some δ < δ̃. Furthermore, without loss of generality, we assume S_q^red, S_{q+1}^red are close enough to S_*^red so that S_q^red, S_{q+1}^red ∈ B(S_*^red, δ), and the ball B(S_q^red, δ_q) in Assumption 3 is contained in B(S_*^red, δ). We let β ≔ min_{S^red ∈ B(S_*^red, δ)} σ_min(∇²F(S^red)) > 0, and note that ∇²F(S^red) is Lipschitz continuous in B(S_*^red, δ), say with the Lipschitz constant γ.

By an application of Taylor’s theorem with integral remainder, we have

(23) 0 = ∇F(S_*^red) = ∇F(S_q^red) + ∫₀¹ ∇²F(S_q^red + t(S_*^red − S_q^red)) dt ⋅ (V(S_*^red) − V(S_q^red)) = ∇F(S_q^red) + ∇²F(S_q^red)(V(S_*^red) − V(S_q^red)) + O(‖S_*^red − S_q^red‖²),

where, for the third equality, we have used the Lipschitz continuity of ∇²F(S^red) in B(S_*^red, δ). Additionally, by Taylor's theorem with second-order Lagrange remainder,

(24) 0 = ∇F_{q+1}(S_{q+1}^red) = ∇F_{q+1}(S_q^red) + ∇²F_{q+1}(S_q^red)(V(S_{q+1}^red) − V(S_q^red)) + O(‖S_{q+1}^red − S_q^red‖²) = ∇F(S_q^red) + ∇²F(S_q^red)(V(S_{q+1}^red) − V(S_q^red)) + O(‖S_{q+1}^red − S_q^red‖²),

where the third equality is due to ∇F_{q+1}(S_q^red) = ∇F(S_q^red) and ∇²F_{q+1}(S_q^red) = ∇²F(S_q^red), which are consequences of (20) and (21).

By employing (24) in (23), we deduce

∇²F(S_q^red)(V(S_*^red) − V(S_{q+1}^red)) = O(‖S_{q+1}^red − S_q^red‖²) + O(‖S_*^red − S_q^red‖²).

Finally, noting ‖∇²F(S_q^red)(V(S_*^red) − V(S_{q+1}^red))‖₂ ⩾ β ‖S_*^red − S_{q+1}^red‖, from the last equality we obtain

‖S_{q+1}^red − S_*^red‖ = O(‖S_q^red − S_*^red‖²)

as desired. □

6 Dealing with asymptotic stability constraints

In many applications, in addition to the objective that the sought reduced-order system S^red = (A^red, E^red, B^red, C^red, D^red) of order r is close to the original system with respect to the L∞ norm, the system S^red may also be required to be asymptotically stable; see, e.g., [40], [41], [42]. As we search through reduced-order systems of index at most one, it follows from the Weierstrass canonical form [43] of (A^red, E^red) that the asymptotic stability requirement is equivalent to having all of the poles of the system in the open left half of the complex plane, i.e., to the condition α(A^red, E^red) < 0, where α(A^red, E^red) is the spectral abscissa of the pencil L(s) = A^red − sE^red defined by

α(A^red, E^red) ≔ max{ Re(z) | z ∈ ℂ  s.t.  det(A^red − zE^red) = 0 }.

In this setting, with F(S^red) defined as in (6), rather than the unconstrained minimization of F(S^red) over all descriptor systems S^red ∈ 𝒟_{r,m,p}, it may be desirable to solve the constrained minimization problem

(25) min{ F(S^red) : S^red ∈ 𝒟_{r,m,p}  s.t.  α(A^red, E^red) ⩽ β }

for a prescribed negative real number β, where 𝒟_{r,m,p} denotes the set of all descriptor systems of order r and index at most one with semi-simple poles, m inputs, and p outputs.

Extension of the proposed subspace framework, that is Algorithm 1, to deal with the constrained minimization problem in (25) rather than the unconstrained minimization of F(S^red) is straightforward. The only difference in Algorithm 1 is in line 5, where S_q^red is no longer a minimizer of F_q(S^red), but instead a minimizer of the constrained problem

(26) min{ F_q(S^red) : S^red ∈ 𝒟_{r,m,p}  s.t.  α(A^red, E^red) ⩽ β }

for the reduced function F_q(S^red) as in (9). The problem in (26) involves only the small systems S_q as well as S^red, and is solvable by means of Newton-method based approaches (e.g., using [31, Procedure 2]). Such a Newton-method based approach makes use of the gradient of the constraint function C(S^red) ≔ α(A^red, E^red) − β, in addition to the gradient of the objective F_q(S^red). Let λ be the rightmost eigenvalue of the pencil L(s) = A^red − sE^red, with u and v denoting a pair of corresponding left and right eigenvectors normalized so that u^* E^red v = 1, and assume λ is a simple eigenvalue and the unique rightmost eigenvalue of L(s), which ensures the differentiability of C(S^red). Then, by differentiating the equation A^red v = λ E^red v with respect to the entries of A^red and E^red and multiplying with u^* from the left, the partial derivatives of C(S^red) are given by

(∂C/∂A^red_{ij})(S^red) = Re(ū_i v_j),   (∂C/∂E^red_{jj})(S^red) = −Re(λ ū_j v_j),

where Re(z) denotes the real part of a complex scalar z ∈ ℂ, while A^red_{ij} is the subdiagonal, superdiagonal or diagonal entry of the matrix variable A^red at position (i, j), and E^red_{jj} is the diagonal entry of E^red at position (j, j).
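A minimal MATLAB sketch of the constraint function and of its partial derivatives with respect to the diagonals of A^red and E^red, following the eigenvalue-derivative formulas above, is given below. The normalization u^* E^red v = 1 and the simplicity and uniqueness of the rightmost eigenvalue are assumed; the matrices are placeholders.

r = 5; beta = -0.05;
rng(4);
Ared = -diag(1:r) + diag(0.3*randn(r-1,1),1) + diag(0.3*randn(r-1,1),-1);
Ered = diag(1 + rand(r,1));

[V, Lam, W] = eig(Ared, Ered);               % right/left eigenvectors of the pencil Ared - s*Ered
lams = diag(Lam);
[alpha, idx] = max(real(lams));              % spectral abscissa alpha(Ared, Ered)
Cval = alpha - beta;                         % constraint value C(S^red) = alpha - beta
lam = lams(idx);  v = V(:,idx);  u = W(:,idx);
s = u' * Ered * v;
u = u / conj(s);                             % normalize so that u^* Ered v = 1
gradAdiag =  real(conj(u) .* v);             % partial derivatives w.r.t. diagonal entries of A^red
gradEdiag = -real(lam * (conj(u) .* v));     % partial derivatives w.r.t. diagonal entries of E^red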

We remark that, assuming ω_q is again a global maximizer of σ_{q+1}(ω; S_q^red) after the refinement step, the interpolation properties

F(S_q^red) = F_{q+1}(S_q^red),   ∇F(S_q^red) = ∇F_{q+1}(S_q^red),   ∇²F(S_q^red) = ∇²F_{q+1}(S_q^red)

still hold. Moreover, if a logarithmic barrier approach is adopted for the solution of the constrained problems, then, in essence, the constrained problems are turned into unconstrained problems by lifting the constraints into the objective via the logarithmic barrier functions

L^μ(S^red) = F(S^red) − μ log(β − α(A^red, E^red)),   L_q^μ(S^red) = F_q(S^red) − μ log(β − α(A^red, E^red))

associated with problems (25) and (26), respectively, where log(⋅) denotes the natural logarithm of its argument, and μ is a positive real number representing the barrier parameter. In this case, the interpolation properties extend to the logarithmic barrier functions as well. In particular, we have

L^μ(S_q^red) = L_{q+1}^μ(S_q^red),   ∇L^μ(S_q^red) = ∇L_{q+1}^μ(S_q^red),   ∇²L^μ(S_q^red) = ∇²L_{q+1}^μ(S_q^red)

for every positive real number μ.

There are alternatives to our approach above for enforcing asymptotic stability on the reduced system. One possibility is to define the H∞ error objective as ∞ for reduced systems that are not asymptotically stable, and deal with the unconstrained minimization of the resulting objective. It is then essential to start with a reduced system that is asymptotically stable, and the sufficient decrease condition in the line-search keeps the iterates away from reduced systems that are not asymptotically stable. Such an approach is adopted in Ref. [21] in the design of a controller with a minimal H∞ objective for the closed-loop system. Another alternative is to ensure asymptotic stability by requiring that the reduced system have a certain structure, e.g., port-Hamiltonian structure, following the practice in Ref. [26]. Once again this results in the unconstrained minimization of the H∞ error objective but over structured systems depending on additional optimization parameters. We opt for the constrained optimization formulation in (25) due to the availability of software such as GRANSO [31] to deal with constraints.

7 Practical issues

Here we spell out a few practical issues regarding Algorithm 1, such as how we form the initial systems S_0, S_0^red, when we terminate, the details of the bases for the projection subspaces, the solution of the reduced L∞-norm minimization problems, and the L∞-norm computations.

7.1 Initialization

The initial subspaces 𝒱_0, 𝒲_0 (in line 1 of Algorithm 1) are chosen so that H_0, the transfer function of S_0, interpolates H at the imaginary parts of a prescribed number of dominant poles of H. Formally, suppose H does not have any pole on the imaginary axis, and, for a prescribed ℓ, let s_1, …, s_ℓ ∈ ℂ be the ℓ most dominant poles of H with nonnegative imaginary parts (as the poles of H appear in complex conjugate pairs such that any two complex conjugate poles have the same dominance metric). We set

𝒱_0 = span ⋃_{k=1}^{ℓ} ⋃_{j=0}^{1} { columns of Re({(A − i Im(s_k)E)^{−1}E}^j (A − i Im(s_k)E)^{−1}B) and of Im({(A − i Im(s_k)E)^{−1}E}^j (A − i Im(s_k)E)^{−1}B) },
𝒲_0 = span ⋃_{k=1}^{ℓ} ⋃_{j=0}^{1} { columns of Re((C(A − i Im(s_k)E)^{−1}{E(A − i Im(s_k)E)^{−1}}^j)^*) and of Im((C(A − i Im(s_k)E)^{−1}{E(A − i Im(s_k)E)^{−1}}^j)^*) },

where Im(s_k) denotes the imaginary part of s_k ∈ ℂ. Theorem 1 ensures that

H(i Im(s_k)) = H_0(i Im(s_k)),   H^{(j)}(i Im(s_k)) = H_0^{(j)}(i Im(s_k)),
H(−i Im(s_k)) = H_0(−i Im(s_k)),   H^{(j)}(−i Im(s_k)) = H_0^{(j)}(−i Im(s_k))

for j = 1, 2, 3 and k = 1, …, ℓ.

Additionally, at every subspace iteration with q > 0, an initial point is needed for the solution of the minimization problem in line 5 of Algorithm 1, regardless of how it is solved, e.g., via gradient descent or BFGS. This initialization carries significance, as it affects which local minimizer of F_q the optimization converges to. At a subspace iteration with q > 1, the minimization is initialized with the optimal reduced system from the previous iteration, that is with S_{q−1}^red. For the iteration q = 1, an initial S_0^red of order r must be supplied to Algorithm 1. We set S_0^red as either

  1. the model of order r obtained from an application of balanced truncation, or

  2. the model of order r whose transfer function interpolates H at the imaginary parts of a prescribed number of dominant poles of H.

For the latter choice, we remark that the number of dominant poles used to form S_0 is strictly greater than the number of dominant poles used to form this initial model S_0^red. For either choice, we make sure dim 𝒱_0 = dim 𝒲_0 > r by using sufficiently many dominant poles of H when forming S_0 = (A_0, E_0, B_0, C_0, D).

An issue that requires attention is that S_0^red = (A_0^red, E_0^red, B_0^red, C_0^red, D_0^red) must be such that A_0^red is tridiagonal and E_0^red is diagonal, whereas balanced truncation or the interpolatory approach yields a system (Â, Ê, B̂, Ĉ, D̂) of order r such that Â and Ê do not necessarily have these structures. Let us suppose that the system (Â, Ê, B̂, Ĉ, D̂) has semi-simple poles, and Ê is invertible. Then we can first compute the eigenvalues of the r × r pencil L̂(s) = Â − sÊ, and form a block-diagonal real matrix T_2 with 2 × 2 and 1 × 1 blocks along its diagonal that has the same eigenvalues as L̂. The 2 × 2 and 1 × 1 blocks of T_2 on its diagonal correspond to conjugate pairs of complex eigenvalues and to real eigenvalues of L̂, respectively. Here we remark that T_2 is an r × r matrix. Hence, we can compute its eigenvalue decomposition

(27) T_2 = U Λ U^{−1}

for a diagonal matrix Λ and an invertible U with ease. We also have the eigenvalue decomposition

Ê^{−1}Â = V Λ V^{−1}

at hand. Note that the middle factors in the eigenvalue decompositions above are the same, as T_2 has the same eigenvalues as the pencil L̂, which in turn has the same eigenvalues as Ê^{−1}Â. But then

Ĥ(s) ≔ Ĉ(sÊ − Â)^{−1}B̂ + D̂ = Ĉ(sI − Ê^{−1}Â)^{−1}Ê^{−1}B̂ + D̂ = Ĉ(sI − VΛV^{−1})^{−1}Ê^{−1}B̂ + D̂ = (ĈV)(sI − Λ)^{−1}(V^{−1}Ê^{−1}B̂) + D̂ = (ĈV)(sI − U^{−1}T_2 U)^{−1}(V^{−1}Ê^{−1}B̂) + D̂ = (ĈVU^{−1})(sI − T_2)^{−1}(UV^{−1}Ê^{−1}B̂) + D̂.

Hence, we can use

A_0^red ≔ T_2,   E_0^red ≔ I,   B_0^red ≔ U V^{−1} Ê^{−1} B̂,   C_0^red ≔ Ĉ V U^{−1},   D_0^red ≔ D̂

as the matrices of the initial system S 0 r e d .
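The construction above can be carried out in a few lines of MATLAB. The sketch below uses the built-in cdf2rdf to pass from the complex diagonal form to a real block-diagonal T_2; this plays the role of the transformation with U described above (and in Remark 2 below), and it assumes that eig returns complex conjugate eigenvalues in adjacent positions, as cdf2rdf requires. The input matrices are placeholders for the system returned by balanced truncation or the interpolatory approach.

r = 6; m = 2; p = 2;
rng(5);
Ahat = randn(r); Ehat = eye(r) + 0.1*randn(r);           % placeholder (Ahat, Ehat, Bhat, Chat, Dhat)
Bhat = randn(r,m); Chat = randn(p,r); Dhat = zeros(p,m);

[V, Lam] = eig(Ehat \ Ahat);                             % Ehat^{-1} Ahat = V * Lam * V^{-1}
[Vr, T2] = cdf2rdf(V, Lam);                              % real block-diagonal T2, real Vr (= V * U^{-1})

A0red = T2;                                              % tridiagonal (2x2 and 1x1 diagonal blocks)
E0red = eye(r);
B0red = Vr \ (Ehat \ Bhat);                              % corresponds to U * V^{-1} * Ehat^{-1} * Bhat
C0red = Chat * Vr;                                       % corresponds to Chat * V * U^{-1}
D0red = Dhat;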

Remark 1.

The invertibility assumption above on Ê amounts to initializing the optimization in line 5 of Algorithm 1 at the first subspace iteration (for q = 1) with a descriptor system S_0^red with all finite poles, a property stronger than our general assumption that the reduced system sought is at most index one. However, this initial invertibility assumption is not restrictive. In particular, the optimization is still over at most index-one systems, and is permitted to yield an index-one system S_q^red = (A_q^red, E_q^red, B_q^red, C_q^red, D_q^red) for q ⩾ 1 with singular E_q^red.

Remark 2.

When the pencil L̂(s) = Â − sÊ has complex eigenvalues, the matrices V and U from the eigenvalue decompositions above are complex. Yet, T_2 and U satisfying (27) can be chosen so that the product VU^{−1} is real, ensuring in turn that B_0^red and C_0^red are real matrices.

To see that VU^{−1} is real for particular choices of T_2, U, suppose a ± ib is a pair of complex conjugate eigenvalues of L̂, and suppose they appear as the ℓth, (ℓ+1)th entries along the diagonal of Λ. Moreover, suppose u ± iv with u, v ∈ ℝ^r are eigenvectors of L̂ corresponding to a ± ib appearing in the ℓth, (ℓ+1)th columns of V. Then we have

(28) [ a  b ; −b  a ] = Z [ a+ib  0 ; 0  a−ib ] Z^{−1},   where  Z ≔ [ 1/√2  1/√2 ; i/√2  −i/√2 ]  and  Z^{−1} = Z^* = [ 1/√2  −i/√2 ; 1/√2  i/√2 ],

so the real matrix on the left has the eigenvalues a ± ib, and we use it as the 2 × 2 diagonal block of T_2 with row and column indices ℓ, ℓ+1. Using the decomposition in (28), in particular Z^{−1}, we can set up U so that (1/√2)(e_ℓ + e_{ℓ+1}) and (i/√2)(e_{ℓ+1} − e_ℓ) are the ℓth, (ℓ+1)th columns of U^{−1}, where e_ℓ is the ℓth column of the r × r identity matrix. Recalling that u ± iv are the eigenvectors in the ℓth, (ℓ+1)th columns of V, the ℓth, (ℓ+1)th columns of VU^{−1} are real and equal to √2 u, √2 v, respectively.

7.2 Termination

The termination in line 9 of Algorithm 1 is decided based on the values of ‖H(⋅) − H(⋅; S_q^red)‖_{L∞} at two consecutive subspace iterations. The error ‖H(⋅) − H(⋅; S_q^red)‖_{L∞} is readily available at the qth subspace iteration after line 7, as

‖H(⋅) − H(⋅; S_q^red)‖_{L∞} = σ_max(H(iω_q) − H(iω_q; S_q^red)).

To be precise, we terminate at the qth subspace iteration in line 9 if q ⩾ 1 and

(29) | ‖H(⋅) − H(⋅; S_q^red)‖_{L∞} − ‖H(⋅) − H(⋅; S_{q−1}^red)‖_{L∞} | ⩽ tol ⋅ ‖H(⋅) − H(⋅; S_q^red)‖_{L∞}

for a prescribed tolerance tol.

The termination condition for the optimization routine solving the minimization problem in line 5 of Algorithm 1 also requires some care. Recall that the objective F_q here is nonsmooth and, as a result, the norms of the gradients of F_q at the iterates generated by the routine do not have to converge to zero. Instead, the routine is terminated if the line-search fails (to return a point that causes sufficient decrease), or the decrease in F_q at two consecutive iterates is less than ε ⋅ tol in a relative sense, where we use the same tolerance tol as in (29) and ε is a real number in (0, 0.5).

As for the termination condition of the refinement step (i.e., the condition in line 3 of Algorithm 2) employed in practice, we rely on

$$\bigl| \omega_q^{(j)} - \omega_q \bigr| \;\le\; \mathsf{tol} \cdot \bigl| \omega_q \bigr|,$$

where we again use the same tolerance tol as in (29).
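As a rough illustration, the two stopping tests above can be coded as follows; err_q, err_qm1, omega_q, and omega_qj are placeholder names (not taken from the implementation) for the errors and interpolation points at the current and previous iterations.

```matlab
% Illustrative sketch of the stopping tests in Section 7.2 (placeholder variables).
tol = 1e-8;                                                  % tolerance, as for the small examples
% outer termination, condition (29)
stop_outer  = (q >= 1) && abs(err_q - err_qm1) <= tol * err_q;
% termination of the refinement loop (line 3 of Algorithm 2)
stop_refine = abs(omega_qj - omega_q) <= tol * abs(omega_q);
```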

7.3 Orthonormalization of the bases for the subspaces

Keeping the bases for the subspaces V_q, W_q (i.e., the columns of V_q, W_q) orthonormal improves the robustness of the algorithm against rounding errors. For instance, the system matrices A_q, B_q, C_q, E_q can then be formed more accurately in the presence of rounding errors.

This orthonormality property of the bases is attained in line 13 of Algorithm 1, as well as in line 8 of Algorithm 2. In line 13 of Algorithm 1, V_q and W_q are already orthonormal bases for V_q and W_q. The expansion directions Ṽ_{q+1}, W̃_{q+1} to be included in the subspaces to obtain the expanded subspaces V_{q+1}, W_{q+1} have to be orthonormalized against the existing orthonormal bases V_q, W_q. This is achieved in practice by executing

(30) $$\widetilde{V}_{q+1} \;\leftarrow\; \widetilde{V}_{q+1} - V_q \bigl( V_q^{T} \widetilde{V}_{q+1} \bigr), \qquad \widetilde{W}_{q+1} \;\leftarrow\; \widetilde{W}_{q+1} - W_q \bigl( W_q^{T} \widetilde{W}_{q+1} \bigr).$$

Near convergence, the interpolation points iω_q of Algorithm 1 change only slightly from one iteration to the next. This results in new expansion directions Ṽ_{q+1}, W̃_{q+1} that are nearly contained in the existing subspaces V_q, W_q. In this case, the orthonormalization in (30) of Ṽ_{q+1}, W̃_{q+1} against the existing V_q, W_q suffers from cancellation-type rounding errors. Applying the orthonormalization in (30) several times improves the accuracy and results in directions Ṽ_{q+1}, W̃_{q+1} that are better orthonormalized against V_q, W_q. In practice, we apply (30) a few times, e.g., 3–4 times, then orthonormalize the resulting Ṽ_{q+1}, W̃_{q+1} via the Gram–Schmidt procedure, and take V_{q+1} = [V_q  Ṽ_{q+1}], W_{q+1} = [W_q  W̃_{q+1}] as the matrices whose columns form orthonormal bases for V_{q+1}, W_{q+1}; a short sketch of this re-orthonormalization is given below. In line 8 of Algorithm 2, the columns of V_{q+1} and W_{q+1} are orthonormalized similarly. We ultimately use V_{q+1}, W_{q+1} when forming the system S_{q+1} in line 14 of Algorithm 1, and in line 9 of Algorithm 2.
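The following MATLAB fragment sketches the repeated re-orthonormalization for the V-side; Vq and Vtilde are placeholder names for the current orthonormal basis and the new expansion directions, and the W-side is treated identically.

```matlab
% Illustrative sketch of the repeated re-orthonormalization in (30).
for k = 1:3                                % apply (30) a few times
    Vtilde = Vtilde - Vq * (Vq' * Vtilde);
end
Vtilde = orth(Vtilde);                     % orthonormalize the remaining directions
Vq1 = [Vq, Vtilde];                        % orthonormal basis for the expanded subspace
```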

7.4 Solution of the reduced L -norm minimization problem

We use BFGS to minimize the reduced L∞ objective F_q(S^red) in line 5 of Algorithm 1. To be precise, we have explored two alternatives here: a small variation of a MATLAB implementation of a line-search BFGS due to Michael L. Overton making use of weak Wolfe conditions, and GRANSO [31]. The former is meant only for unconstrained problems, i.e., when we do not impose the asymptotic stability constraints described in Section 6, whereas these constraints are also incorporated into the optimization when we use GRANSO.

7.5 Computation of the L norm

Line 7 of Algorithm 1 requires the computation of the L∞ norm of a system whose order is the sum of the order of the original system S and r. If the original system does not have large order, we use the built-in norm command in MATLAB for these L∞-norm computations. Otherwise, if the original system has large order, we use the subspace framework introduced in Ref. [44] for the large-scale L∞-norm computations. Additionally, the minimization of the reduced L∞ objective in line 5 via BFGS requires small-scale L∞-norm computations, which we carry out using the getPeakGain command in MATLAB. The use of the routines in the ROSTAPACK package [45], [46], especially the routine normTfMaxPeak, instead of the built-in MATLAB routines norm, getPeakGain may possibly improve the efficiency.
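For the small-scale case, such an error-norm evaluation amounts to the following; this is only a sketch and assumes the Control System Toolbox is available, with (A, B, C, D, E) and (Ared, Bred, Cred, Dred, Ered) holding the matrices of the two descriptor systems.

```matlab
% Illustrative small-scale evaluation of || H - H(.; Sred) ||_{L_inf}.
sys    = dss(A, B, C, D, E);                     % original descriptor system
sysred = dss(Ared, Bred, Cred, Dred, Ered);      % reduced descriptor system
errsys = sys - sysred;                           % error system of order n + r
linf   = getPeakGain(errsys);                    % or: linf = norm(errsys, Inf);
```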

8 Numerical results

In this section, we report the results of numerical experiments performed with a MATLAB implementation of Algorithm 1, taking also the practical issues indicated in the previous section into account. Sections 8.1 and 8.2 concern experiments on systems of rather small order, Section 8.3 concerns experiments on a system of medium order, while the results of experiments on several large-order systems are reported in Section 8.4. All of the experiments are conducted in MATLAB 2020b on an iMac with the Mac OS 12.1 operating system, an Intel® Core™ i5-9600K CPU, and 32 GB RAM.

The numerical experiments are performed using the variation of the MATLAB implementation of BFGS due to Michael L. Overton for the solution of the reduced L∞-norm minimization problems. Hence, the asymptotic stability constraints are not imposed. The original systems in all of the experiments in Sections 8.1–8.3 concerning small- to medium-order systems are asymptotically stable, and the computed optimal reduced systems in these examples also turn out to be asymptotically stable. The tolerance tol for termination (discussed in Section 7.2) is set equal to 10^{-8} in Sections 8.1–8.3, and 10^{-6} in Section 8.4. As noted in Section 7.1, a prescribed number of the most dominant poles with nonnegative imaginary parts are used to form S_0, and in Section 8.4 to form S_0^red as well. Throughout this section, the shorthand NIPs stands for "nonnegative imaginary parts".

For comparison or initialization purposes, some of the numerical experiments involve the application of balanced truncation, for which we use the MATLAB toolbox MORLAB-5.0 [47], in particular the routine ml_ct_dss_bt or ml_ct_ss_bt depending on whether the system at hand is a descriptor system or a standard linear time-invariant system. Moreover, the Hankel singular values computed for smaller systems for comparison purposes are retrieved by calling the built-in routine hankelsv in MATLAB. As the first three subsections concern the model reduction of relatively small systems, the built-in routine norm is employed in line 7 of Algorithm 1 for L∞-norm computations, while the subspace framework in Ref. [44] is employed for this purpose in Section 8.4, which concerns the model reduction of descriptor systems of large order.

8.1 ISS example

We start with the iss example of order n = 270 that is also considered when optimizing the objective F directly in Section 2. As before, we seek the nearest reduced descriptor system of order r = 12 with respect to the L∞ norm. An application of Algorithm 1 with the initial estimate S_0^red produced by balanced truncation terminates when q = 6. The error F(S_⋆^red) = 0.0022516 of the estimate S_⋆^red returned is nearly half of the error F(S_0^red) = 0.0044701 of the initial estimate S_0^red. The optimal reduced system S_⋆^red is indeed slightly better than the estimate returned by the direct optimization, at which the objective F takes the value 0.0024154. Yet the total runtime is about 66 s, much shorter than 500 s, roughly the time required by the direct optimization. The local optimality of S_⋆^red = (A^red, B^red, C^red, D^red, E^red) is supported by Figure 2, which indicates an increase in the objective F if any one of the entries of A^red, B^red, C^red, D^red, E^red is modified. Moreover, the Hankel singular value σ_{r+1} for this example, a lower bound for the minimal error possible for any system of order r, is 0.0022353, smaller than F(S_⋆^red) = 0.0022516 only by a slim margin, so S_⋆^red must be nearly optimal globally as well.
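The kind of single-entry perturbation test underlying Figure 2 can be sketched as follows; sys, Ared, Bred, Cred, Dred, Ered are placeholder names for the full iss model and the computed reduced matrices, and the perturbation range is arbitrary.

```matlab
% Illustrative local-optimality check: perturb one entry of Ared and evaluate
% the L_inf error objective over a small range of variations.
delta = linspace(-1e-2, 1e-2, 41);
Fvals = zeros(size(delta));
for k = 1:numel(delta)
    Ap = Ared;  Ap(1,1) = Ap(1,1) + delta(k);            % perturb a single entry
    Fvals(k) = getPeakGain(sys - dss(Ap, Bred, Cred, Dred, Ered));
end
plot(delta, Fvals);   % a minimum at zero variation supports local optimality
```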

Figure 2: The figure is similar to Figure 1 and concerns the iss example with r = 12. Only now the minimization of F is performed using the subspace framework outlined in Algorithm 1. Specifically, each plot depicts F as a function of the variation of one of the entries of one of A^red, B^red, C^red, D^red, E^red. Zero variation corresponds to the optimal reduced system by Algorithm 1.

The largest singular values of the errors, σ_max(H(iω) − H(iω; S_0^red)) and σ_max(H(iω) − H(iω; S_⋆^red)), of the initial estimate S_0^red and the optimal estimate S_⋆^red are plotted as functions of ω in Figure 3. The singular value error function σ_max(H(iω) − H(iω; S_⋆^red)) for the optimal S_⋆^red is extremely flat; indeed, σ_max(H(iω) − H(iω; S_⋆^red)) ∈ [2.02 × 10^{-3}, 2.26 × 10^{-3}] at all ω. Furthermore, the error σ_max(H(iω) − H(iω; S_⋆^red)) is maximized at four distinct points marked by the circles in the right-hand plot. This indicates that the objective F is not differentiable at the computed optimizer S_⋆^red.

Figure 3: The plots of σ_max(H(iω) − H(iω; S_0^red)) (left) and σ_max(H(iω) − H(iω; S_⋆^red)) (right) as functions of ω for the iss example with r = 12, where S_0^red is the initial estimate, and S_⋆^red is the optimal estimate computed by Algorithm 1. In each plot, the circles mark the points where the largest singular value function attains its largest value.

Information about the progress of Algorithm 1 is given in Table 2. We start with the reduced system S_0 of order 36 that interpolates the original system S of order 270 at three points on the imaginary axis, namely the imaginary parts of the three most dominant poles of S with NIPs. At every iteration, if no refinement step is performed, the order of the reduced system S_q increases by 4m = 12. Additionally, each refinement step results in an increase of 4m = 12 in the order of S_q. We observe in the second column that the error F(S_q^red) at the minimizer S_q^red of the reduced objective F_q decays rapidly with respect to q. The total number of objective function evaluations is 492 (i.e., the sum of the function evaluations in the fifth column); however, the L∞ objective to be minimized involves the reduced system S_q rather than the full system S. For instance, the numbers of L∞-norm computations performed are 105, 155, 123 at iterations q = 1, 2, 3, yet these L∞-norm computations involve the reduced system S_q of order 72, 84, 120 for q = 1, 2, 3, respectively. Observe that the number of BFGS iterations eventually decreases at the later iterations, as the computed optimal S_q^red used as the initial estimate when minimizing F_{q+1} becomes stationary, i.e., as the computed minimizer S_q^red of F_q is also close to a minimizer of F_{q+1}. Refinement steps are needed only at the initial iteration when q = 0 and when q = 2. No refinement step turns out to be necessary at the later iterations. This is a generic pattern that we observe in the vast majority of examples we have experimented with.

Table 2:

The iterates and information about the progress of Algorithm 1 on the iss example with r = 12 . The columns of red order, # BFGS iter, # fun evals, and # refine list the order of the system S q , number of BFGS iterations, number of objective function evaluations performed by BFGS, and number of refinement steps performed at the qth iteration.

q  F(S_q^red)  Red order  # BFGS iter  # fun evals  # refine
0 0.004470060020 36 2
1 0.003517059977 72 38 105 0
2 0.002259657400 84 45 155 2
3 0.002252138011 120 35 123 0
4 0.002251613679 132 11 48 0
5 0.002251609387 144 2 29 0
6 0.002251607779 156 2 32

8.2 CD player model

Our next example is the CD player model, which is available in the SLICOT library. The model is a linear time-invariant system of order n = 120 with m = 2 inputs and p = 2 outputs. The details of the model can be found in Ref. [48] and the references therein. Our primary purpose here is to compare, on this example, Algorithm 1 with the approach in Ref. [16] for H∞ model reduction based on rank-one modifications of the system matrices. As the approach in Ref. [16] is for SISO systems, the results there are reported for this example using only the second input and the first output. We follow the same practice here when applying our approach. The initial estimate S_0^red of a minimizer for Algorithm 1 is constructed using balanced truncation. Moreover, the initial reduced system S_0 is of order 12, and is constructed so that it interpolates the full system S at the imaginary parts of its three most dominant poles with NIPs.

The reduced systems S_⋆^red of order r = 2, 4, 6, 8, 10 are computed using Algorithm 1. Table 3 lists the relative errors ‖H − H(⋅; S_⋆^red)‖_{L∞}/‖H‖_{L∞} for the reduced systems S_⋆^red computed by various approaches. In particular, the columns IHA, MBT, HNA contain the results reported in Ref. [16], Table 4], obtained by the approach introduced in Ref. [16] initialized with the model returned by, respectively, IRKA, balanced truncation, and the best Hankel norm approximation. Moreover, the columns BT and Lower Bnd correspond to the relative error of the reduced model by balanced truncation, and the theoretical lower bound σ_{r+1}/‖H‖_{L∞} on the relative error of any reduced system of order r, where σ_{r+1} is the (r+1)st largest Hankel singular value of the system. As can be seen in Table 3, our approach produces reduced systems with smaller errors than those produced by the other approaches in all cases. The reduced systems produced by Algorithm 1 do not seem far from global optimality either, as their errors are only slightly greater than the theoretical lower bounds in terms of the Hankel singular values in the last column.
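For the small examples, the lower bound in the last column can be obtained directly from the Hankel singular values; a minimal sketch, assuming sys holds a stable standard state-space model of the (SISO) system and r is the prescribed order:

```matlab
% Illustrative computation of the relative-error lower bound sigma_{r+1}/||H||.
hsv = hankelsv(sys);                   % Hankel singular values, in decreasing order
lb  = hsv(r+1) / getPeakGain(sys);     % theoretical lower bound on the relative error
```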

Table 3:

This table concerns the “CD player model”. Relative errors H H ; S r e d L / H L are listed for the optimal estimate S r e d computed by various methods for finding reduced systems of order r = 2,4,6,8,10 , as well as the lower bound σ r + 1 / H L .

r Alg. 1 IHA MBT HNA BT Lower Bnd
2 3.12 × 10−1 3.66 × 10−1 3.68 × 10−1 3.35 × 10−1 3.69 × 10−1 1.95 × 10−1
4 1.82 × 10−2 2.14 × 10−2 2.25 × 10−2 2.00 × 10−2 2.25 × 10−2 1.13 × 10−2
6 9.44 × 10−3 1.04 × 10−2 1.19 × 10−2 1.23 × 10−2 1.23 × 10−2 6.79 × 10−3
8 4.18 × 10−3 4.85 × 10−3 6.40 × 10−3 5.99 × 10−3 6.41 × 10−3 3.20 × 10−3
10 7.45 × 10−4 8.99 × 10−4 1.24 × 10−3 1.08 × 10−3 1.32 × 10−3 5.86 × 10−4

We give some details of Algorithm 1 applied to find a reduced system of order r = 8 in Figures 4 and 5, as well as in Table 4. In particular, Figure 4 supports that the reduced system S_⋆^red by Algorithm 1 is locally optimal, i.e., small variations in the entries of the system matrices cause an increase in the L∞ error objective. Figure 5 displays the error σ_max(H(iω) − H(iω; S_0^red)) of the initial model and the error σ_max(H(iω) − H(iω; S_⋆^red)) of the model by Algorithm 1 as functions of ω. Once again the error function for the optimal model S_⋆^red is flatter compared to that for the initial model S_0^red, even if the effect is not as pronounced as for the iss example. The error function σ_max(H(iω) − H(iω; S_⋆^red)) for the optimal model attains its maximum at five different ω values, which implies that the objective F is not smooth at S_⋆^red. As displayed in Table 4, the convergence again occurs quite rapidly; indeed, four iterations are sufficient to reach the prescribed accuracy and terminate. At each iteration, the order of the reduced system increases by 4m = 4. Additionally, the refinement step performed in the initial iteration also causes an increase of 4m = 4 in the order of the reduced system. Larger numbers of BFGS iterations are needed at the iterations with q = 1, 2, when the objective involves reduced systems of order 20, 24, respectively. The total runtime is about 15 s, and the relative error at termination is F(S_⋆^red)/‖H‖_{L∞} = ‖H − H(⋅; S_⋆^red)‖_{L∞}/‖H‖_{L∞} = (2.87 × 10^{-1})/(6.87 × 10^{1}) = 4.18 × 10^{-3}.

Figure 4: The figure is analogous to Figure 1, but concerns the "CD player model" with r = 8. Each plot depicts F as a function of the variation of one of the entries of one of A^red, B^red, C^red, D^red, E^red. Zero variation corresponds to the optimal reduced system by Algorithm 1.

Figure 5: The plots illustrate the errors of the initial and optimal models by Algorithm 1 for the "CD player model" with r = 8, and are analogous to those in Figure 3.

Table 4:

The iterates and information about the progress of Algorithm 1 on the "CD player model" for finding a reduced system of order r = 8. The columns represent quantities as in Table 2.

q  F(S_q^red)  Red order  # BFGS iter  # fun evals  # refine
0 0.439972058849 12 1
1 0.291281639337 20 565 1,479 0
2 0.287107598817 24 134 387 0
3 0.287107598817 28 1 32

8.3 FOM model

We next report numerical results on the FOM example available in the SLICOT library. The FOM example is a linear time-invariant system of order n = 1,006 with m = p = 1. The details are given in Ref. [49], Example 3]. Here, we are mainly interested in investigating the quality of the estimates for optimal reduced systems produced by Algorithm 1. To this end, we compare the errors of the reduced systems by Algorithm 1 with those of balanced truncation, as well as with the theoretical lower bounds for the errors in terms of the Hankel singular values, for varying choices of the prescribed order r of the reduced system sought. As in Sections 8.1 and 8.2, we set the initial estimate S_0^red of a minimizer to the system produced by balanced truncation, and the initial reduced system S_0 is always of order 12 and interpolates the full system S at the imaginary parts of its three most dominant poles with NIPs.

In Figure 6, the L∞ errors ‖H − H(⋅; S_⋆^red)‖_{L∞} of the optimal reduced systems S_⋆^red by Algorithm 1 and by balanced truncation are plotted as functions of the prescribed order r of the reduced system sought. Also included in the figure is the plot of the Hankel singular value σ_{r+1}, a theoretical lower bound for the L∞ error ‖H − H(⋅; S^red)‖_{L∞} of any system S^red of order r. Especially when r ∈ [2, 6], the errors of the reduced systems by Algorithm 1 are quite close to the theoretical lower bound. Indeed, the errors of the reduced systems by Algorithm 1 usually differ from the theoretical lower bound by a factor of at most two. Moreover, in most cases the errors of the reduced systems by Algorithm 1 are significantly smaller than the errors of the reduced systems by balanced truncation.

Figure 6: Errors of the reduced systems of order r ∈ [2, 11] produced by Algorithm 1 and balanced truncation (BT), as well as the (r+1)st largest Hankel singular value σ_{r+1} for the FOM example.

8.4 Systems with large order

Finally, we report results on systems of large order arising from the modeling of power plants, due to Rommes and his colleagues. All of these large-scale examples are available on the website of Rommes.[1]

Unlike in the previous three subsections, we form the initial estimate S_0^red of the minimizer using the dominant poles of the system. For each system, we first compute the ten most dominant poles of the system with NIPs using the approach in Ref. [36], in particular its implementation publicly available at https://zenodo.org/record/5103430. Then S_0^red of order r is constructed so as to interpolate the full system S at the imaginary parts of its r/(4m) most dominant poles with NIPs. Similarly, the initial reduced system S_0 is constructed so that it interpolates S at the imaginary parts of its ℓ most dominant poles with NIPs, where ℓ = 7 if the system is single-input single-output (with m = 1), and ℓ = 3 if the system is multiple-input multiple-output (with m > 1). The order of the resulting reduced system S_0 is 4mℓ. In all of the examples, the prescribed order r satisfies r < 4mℓ, that is, the order of S_0 is greater than the prescribed order r.
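The general flavor of such an interpolatory construction can be illustrated by the generic two-sided projection below, which interpolates the transfer function at the selected imaginary-axis points; this is a sketch with placeholder names (dom_poles, ell) and is not the exact construction of the paper, which adds further directions so that each interpolation point contributes 4m columns; m = p is assumed so that the reduced pencil is square.

```matlab
% Generic sketch of two-sided interpolatory projection at imaginary-axis points.
wpts = imag(dom_poles(1:ell));                 % imaginary parts of dominant poles
V = []; W = [];
for k = 1:numel(wpts)
    s  = 1i * wpts(k);
    Xv = (s*E - A) \ B;                        % right interpolation directions
    Xw = (s*E - A)' \ C';                      % left interpolation directions
    V  = [V, real(Xv), imag(Xv)];
    W  = [W, real(Xw), imag(Xw)];
end
[V, ~] = qr(V, 0);  [W, ~] = qr(W, 0);         % orthonormal bases
E0 = W'*E*V;  A0 = W'*A*V;  B0 = W'*B;  C0 = C*V;  D0 = D;
```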

Even Algorithm 1 requires the computation of the L∞ norm of systems of order n + r a few times (usually not more than 5–6 times in our experiments) in line 7, where n is the large order of the system. The classical level-set approaches [37], [38] for L∞-norm computation and their implementations in MATLAB are usually no longer applicable or, when they are applicable, they take an excessive amount of time. Instead, we employ the interpolatory subspace framework in Ref. [44] for these large-scale L∞-norm computations, that is, for maximizing σ_max(H(iω) − H(iω; S_q^red)) over ω at the qth iteration. As the approach in Ref. [44] is only locally convergent, whether the initial interpolation points are sufficiently close to global maximizers of σ_max(H(iω) − H(iω; S_q^red)) plays a large role in converging to a global maximizer. We choose the initial interpolation points as the union of the imaginary parts of the ten most dominant poles with NIPs and 15 equally spaced points on the interval [0.1, 2M], with M denoting the largest of the imaginary parts of the ten most dominant poles with NIPs.
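The selection of these initial interpolation points can be illustrated as follows; dom_poles is a placeholder for the vector of the ten most dominant poles with NIPs.

```matlab
% Illustrative selection of initial interpolation points for the large-scale
% L_inf-norm computations.
wp = imag(dom_poles);                           % imaginary parts of dominant poles
M  = max(wp);                                   % largest imaginary part
w0 = unique([wp(:); linspace(0.1, 2*M, 15)']);  % union with 15 equally spaced points
```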

The absolute and relative errors of the computed reduced systems of order r, along with the total runtimes, are reported in Table 5. For the systems S20PI_n, S40PI_n, M40PI_n of order n = 1,182 or n = 2,182, we have also computed reduced systems of order r by balanced truncation. In these examples, the errors of the reduced systems by Algorithm 1 are smaller than those of the reduced systems by balanced truncation. The implementation of balanced truncation in MORLAB-5.0 that we rely on is based on dense linear algebra routines (unlike our approach, which benefits from sparsity), so we do not report the runtimes for balanced truncation. As evident from Table 5, Algorithm 1 is able to deal with systems of order in the tens of thousands within a couple of minutes in the worst case. Most of the runtime of Algorithm 1 is usually taken by BFGS for solving the reduced L∞-norm minimization problems in line 5 involving small systems. In the end, rather than performing many large-scale L∞-norm computations, we end up performing many small-scale L∞-norm computations and only a few large-scale ones. This results in an approach that is not only computationally feasible but also more reliable, as the small-scale L∞-norm computations can be carried out accurately, efficiently, and reliably using the level-set methods [37], [38] without worrying about local convergence.

Table 5:

The absolute errors H H ; S r e d L (error) and relative errors H H ; S r e d L / H L (rel error) for systems of large order, where S r e d is the optimal reduced system by either Algorithm 1 or balanced truncation (BT). Total runtimes for Algorithm 1 in seconds are also listed in the last column.

Example n, m = p r Approach Error Rel error Time
S20PI_n 1,182, 1 12 Alg. 1 7.67 × 10−1 2.23 × 10−1 19.8
S20PI_n 1,182, 1 16 Alg. 1 7.66 × 10−1 2.22 × 10−1 36.2
S20PI_n 1,182, 1 12 BT 1.76 × 100 5.11 × 10−1
S20PI_n 1,182, 1 16 BT 1.32 × 100 3.84 × 10−1
S40PI_n 2,182, 1 12 Alg. 1 9.30 × 10−1 2.78 × 10−1 48.1
S40PI_n 2,182, 1 16 Alg. 1 6.71 × 10−1 2.00 × 10−1 38.1
S40PI_n 2,182, 1 32 BT 1.75 × 100 5.23 × 10−1
M40PI_n 2,182, 3 12 Alg. 1 1.99 × 100 5.22 × 10−1 52.2
M40PI_n 2,182, 3 24 Alg. 1 1.70 × 100 4.45 × 10−1 117.1
M40PI_n 2,182, 3 36 BT 3.07 × 100 8.03 × 10−1
ww_vref_6405 13,251, 1 12 Alg. 1 5.80 × 10−4 2.04 × 10−1 9.2
ww_vref_6405 13,251, 1 16 Alg. 1 4.19 × 10−4 1.48 × 10−1 15.1
xingo_afonso 13,250, 1 12 Alg. 1 3.55 × 10−2 8.74 × 10−3 14.4
xingo_afonso 13,250, 1 16 Alg. 1 3.56 × 10−2 8.77 × 10−3 14.0
xingo_afonso 13,250, 1 20 Alg. 1 1.13 × 10−2 2.79 × 10−3 26.2
bips07_1998 15,066, 4 16 Alg. 1 1.24 × 101 6.30 × 10−2 127.6
bips07_1998 15,066, 4 32 Alg. 1 9.67 × 100 4.91 × 10−2 219.8
bips07_3078 21,228, 4 16 Alg. 1 1.27 × 101 6.06 × 10−2 200.8
bips07_3078 21,228, 4 32 Alg. 1 1.00 × 101 4.78 × 10−2 274.1

9 Software

A MATLAB implementation of Algorithm 1 is publicly available at https://zenodo.org/record/8344591. The numerical results reported in the previous section are obtained with this implementation. Scripts are included to reproduce the results for the CD player model in Section 8.2, and the xingo_afonso, bips07_1998 examples in Section 8.4. The results for other benchmark examples can be obtained similarly.

10 Conclusions

We have proposed an approach to find a locally optimal solution of the L∞-norm model reduction problem. Our approach is based on the use of smooth optimization techniques such as the gradient descent method and BFGS. A direct application of such smooth optimization techniques to the L∞-norm model reduction problem does not seem suitable even for systems of modest order, as smooth optimization techniques converge very slowly and require evaluating the costly L∞-norm objective too many times. Hence, our approach replaces the original system of modest or large order with a system of small order, and solves the resulting reduced L∞-norm minimization problem by means of smooth optimization techniques. It then refines the reduced system and increases its order slightly based on the minimizer of this reduced minimization problem. This refinement is performed with an eye to interpolation between the full and reduced L∞ objectives. Under smoothness assumptions, admittedly strong in this context, we have given formal arguments for the quick convergence of the approach. We have also described how asymptotic stability constraints on the small system of prescribed order sought can be incorporated into the approach. The numerical experiments on a variety of real benchmark examples indicate that our approach indeed retrieves a locally optimal solution of the L∞-norm model reduction problem in practice. Moreover, on some small benchmark examples, we have obtained reduced systems that are not far from being globally optimal according to the theoretical lower bounds in terms of Hankel singular values. Experiments on large benchmark examples illustrate that the approach is usually suitable for systems of order up to a few tens of thousands.

The quality of the converged locally optimal solution depends on the initial guess for the optimal reduced system. To generate the initial guess, we have employed two different strategies based on balanced truncation and dominant poles. The first of these strategies may not be applicable unless the original system is asymptotically stable. On the other hand, there is no asymptotic stability requirement for the second strategy. However, a strategy generating a good initial guess is certainly worth further research. The proposed approach typically requires a few large-scale L∞-norm computations. Performing these L∞-norm computations accurately, especially without stagnating at a local maximizer that is not a global one, is crucial for the reliability of the proposed approach. We have employed the interpolatory subspace framework in Ref. [44], with the initial interpolation points chosen based on the dominant poles, for these large-scale L∞-norm computations. This approach usually seems to work well in practice for large-scale L∞-norm computations. Still, we hope to explore further a good initial interpolation point selection strategy for [44] so that it converges globally, leading to the correct L∞ norm with very high probability. Other efficient and accurate candidates for large-scale L∞-norm computation are also worth studying. In Ref. [16], the original system is replaced by a smaller-order system obtained from the Loewner framework [50] to reduce the burden of large-scale L∞-norm computations. We have not attempted here to incorporate the Loewner framework into our approach. As future work, our approach can possibly benefit from the Loewner framework; for instance, the initial reduced system replacing the full system can perhaps be obtained using the Loewner framework. Our quick convergence result for the proposed approach is under strong smoothness assumptions. Investigating the order of convergence of the approach in the likely nonsmooth setting (i.e., when the L∞ objective at the converged minimizer is nonsmooth) is a possible direction for future research. Last but not least, the convergence of smooth optimization techniques such as BFGS on a variety of nonsmooth optimization problems has been observed empirically, yet these empirical observations have not been supported by a general convergence theory so far. Analyzing and rigorously understanding the convergence of BFGS and other smooth optimization techniques when the objective is nonsmooth at the optimizers sought are important open problems.


Corresponding author: Emre Mengi, Department of Mathematics, Koç University, Rumeli Feneri Yolu 34450, Sarıyer, Istanbul, Türkiye, E-mail: 

Acknowledgment

The author is grateful to two anonymous referees for their invaluable feedback on the initial version of this manuscript.

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The author states no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: The raw data and codes associated with the current study are publicly available at https://zenodo.org/record/8344591 and https://sites.google.com/site/rommes/software.

References

[1] K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations, 2nd ed., Philadelphia, PA, SIAM, 1996. https://doi.org/10.1137/1.9781611971224.

[2] Y. Feng and M. Yagoubi, Robust Control of Linear Descriptor Systems, Singapore, Springer, 2017. https://doi.org/10.1007/978-981-10-3677-4.

[3] P. Kunkel and V. Mehrmann, Differential-Algebraic Equations: Analysis and Numerical Solution, Zurich, Switzerland, EMS Press, 2006. https://doi.org/10.4171/017.

[4] R. Riaza, Differential Algebraic Systems. Analytical Aspects and Circuit Applications, Notes on Mathematics and its Applications, Singapore, World Scientific, 2008. https://doi.org/10.1142/9789812791818.

[5] A. C. Antoulas, Approximation of Large-Scale Dynamical Systems, Volume 6 of Adv. Des. Control, Philadelphia, PA, SIAM Publications, 2005. https://doi.org/10.1137/1.9780898718713.

[6] V. Mehrmann and T. Stykel, "Balanced truncation model reduction for large-scale systems in descriptor form," in Dimension Reduction of Large-Scale Systems, P. Benner, D. C. Sorensen, and V. Mehrmann, Eds., Berlin, Springer, vol. 45, 2005, pp. 83–115. https://doi.org/10.1007/3-540-27909-1_3.

[7] B. Moore, "Principal component analysis in linear systems: controllability, observability, and model reduction," IEEE Trans. Autom. Control, vol. 26, no. 1, pp. 17–32, 1981. https://doi.org/10.1109/tac.1981.1102568.

[8] T. Stykel, "Gramian-based model reduction for descriptor systems," Math. Control Signals Syst., vol. 16, no. 4, pp. 297–319, 2004. https://doi.org/10.1007/s00498-004-0141-4.

[9] S. Gugercin and A. C. Antoulas, "A survey of model reduction by balanced truncation and some new results," Int. J. Control, vol. 77, no. 8, pp. 748–766, 2004. https://doi.org/10.1080/00207170410001713448.

[10] S. Gugercin, D. C. Sorensen, and A. C. Antoulas, "A modified low-rank Smith method for large-scale Lyapunov equations," Numer. Algorithm., vol. 32, no. 1, pp. 27–55, 2003. https://doi.org/10.1023/A:1022205420182.

[11] T. Penzl, "A cyclic low-rank Smith method for large sparse Lyapunov equations," SIAM J. Sci. Comput., vol. 21, no. 4, pp. 1401–1418, 1999. https://doi.org/10.1137/s1064827598347666.

[12] K. Glover, "All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds," Int. J. Control, vol. 39, no. 6, pp. 1115–1193, 1984. https://doi.org/10.1080/00207178408933239.

[13] P. Benner and S. W. R. Werner, "Hankel-norm approximation of large-scale descriptor systems," Adv. Comput. Math., vol. 46, Art. no. 40, 2020. https://doi.org/10.1007/s10444-020-09750-w.

[14] P. Benner, E. S. Quintana-Orti, and G. Quintana-Orti, "Computing optimal Hankel norm approximations of large-scale systems," in Proceedings of the 43rd IEEE Conference on Decision and Control, vol. 3, 2004, pp. 3078–3083. https://doi.org/10.1109/CDC.2004.1428939.

[15] S. Gugercin, A. C. Antoulas, and C. Beattie, "H2 model reduction for large-scale linear dynamical systems," SIAM J. Matrix Anal. Appl., vol. 30, no. 2, pp. 609–638, 2008. https://doi.org/10.1137/060666123.

[16] G. Flagg, C. A. Beattie, and S. Gugercin, "Interpolatory H∞ model reduction," Syst. Control Lett., vol. 62, no. 7, pp. 567–574, 2013. https://doi.org/10.1016/j.sysconle.2013.03.006.

[17] K. C. Kiwiel, Methods of Descent for Nondifferentiable Optimization, Berlin, New York, Springer-Verlag, 1985.

[18] M. M. Mäkelä and P. Neittaanmäki, Nonsmooth Optimization: Analysis and Algorithms with Applications to Optimal Control, Singapore, World Scientific, 1992. https://doi.org/10.1142/1493.

[19] J. V. Burke, A. S. Lewis, and M. L. Overton, "Two numerical methods for optimizing matrix stability," Linear Algebra Appl., vols. 351–352, pp. 117–145, 2002. https://doi.org/10.1016/s0024-3795(02)00260-4.

[20] J. V. Burke, A. S. Lewis, and M. L. Overton, "A robust gradient sampling algorithm for nonsmooth, nonconvex optimization," SIAM J. Opt., vol. 15, no. 3, pp. 751–779, 2005. https://doi.org/10.1137/030601296.

[21] S. W. R. Werner, M. L. Overton, and B. Peherstorfer, "Multifidelity robust controller design with gradient sampling," SIAM J. Sci. Comput., vol. 45, no. 2, pp. A933–A957, 2023. https://doi.org/10.1137/22m1500137.

[22] A. Asl and M. L. Overton, "Analysis of the gradient method with an Armijo–Wolfe line search on a class of non-smooth convex functions," Optim. Method Softw., vol. 35, no. 2, pp. 223–242, 2020. https://doi.org/10.1080/10556788.2019.1673388.

[23] A. S. Lewis and M. L. Overton, "Nonsmooth optimization via quasi-Newton methods," Math. Program., vol. 141, nos. 1–2, pp. 135–163, 2013. https://doi.org/10.1007/s10107-012-0514-2.

[24] C. Lemaréchal, Numerical Experiments in Nonsmooth Optimization, Laxenburg, Austria, International Institute for Applied Systems Analysis (IIASA), 1982, pp. 61–84.

[25] J. V. Burke, F. E. Curtis, A. S. Lewis, M. L. Overton, and L. E. A. Simões, "Gradient sampling methods for nonsmooth optimization," in Numerical Nonsmooth Optimization: State of the Art Algorithms, A. M. Bagirov, M. Gaudioso, N. Karmitsa, M. M. Mäkelä, and S. Taheri, Eds., Cham, Switzerland, Springer International Publishing, 2020, pp. 201–225. https://doi.org/10.1007/978-3-030-34910-3_6.

[26] P. Schwerdtner and M. Voigt, "SOBMOR: structured optimization-based model order reduction," SIAM J. Sci. Comput., vol. 45, no. 2, pp. A502–A529, 2023. https://doi.org/10.1137/20m1380235.

[27] A. Megretski, "H-infinity model reduction with guaranteed suboptimality bound," in 2006 American Control Conference, 2006, p. 6. https://doi.org/10.1109/ACC.2006.1655397.

[28] A. Aliyev, P. Benner, E. Mengi, and M. Voigt, "A subspace framework for H-infinity norm minimization," SIAM J. Matrix Anal. Appl., vol. 41, no. 2, pp. 928–956, 2020. https://doi.org/10.1137/19m125892x.

[29] S. Chellappa, L. Feng, V. de la Rubia, and P. Benner, "Adaptive interpolatory MOR by learning the error estimator in the parameter domain," in Model Reduction of Complex Dynamical Systems, P. Benner, T. Breiten, H. Faßbender, M. Hinze, T. Stykel, and R. Zimmermann, Eds., Cham, Switzerland, Birkhäuser, 2021, pp. 97–117. https://doi.org/10.1007/978-3-030-72983-7_5.

[30] L. Feng and P. Benner, "A new error estimator for reduced-order modeling of linear parametric systems," IEEE Trans. Microw. Theory, vol. 67, no. 12, pp. 4848–4859, 2019. https://doi.org/10.1109/tmtt.2019.2948858.

[31] F. E. Curtis, T. Mitchell, and M. L. Overton, "A BFGS-SQP method for nonsmooth, nonconvex, constrained optimization and its evaluation using relative minimization profiles," Optim. Method Softw., vol. 32, no. 1, pp. 148–181, 2017. https://doi.org/10.1080/10556788.2016.1208749.

[32] F. E. Curtis and M. L. Overton, "A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization," SIAM J. Opt., vol. 22, no. 2, pp. 474–500, 2012. https://doi.org/10.1137/090780201.

[33] P. Lancaster, "On eigenvalues of matrices dependent on a parameter," Numer. Math., vol. 6, no. 1, pp. 377–387, 1964. https://doi.org/10.1007/bf01386087.

[34] E. Mengi, E. A. Yildirim, and M. Kiliç, "Numerical optimization of eigenvalues of Hermitian matrix functions," SIAM J. Matrix Anal. Appl., vol. 35, no. 2, pp. 699–724, 2014. https://doi.org/10.1137/130933472.

[35] C. Beattie and S. Gugercin, "Interpolatory projection methods for structure-preserving model reduction," Syst. Control Lett., vol. 58, no. 3, pp. 225–232, 2009. https://doi.org/10.1016/j.sysconle.2008.10.016.

[36] E. Mengi, "Large-scale estimation of dominant poles of a transfer function by an interpolatory framework," SIAM J. Sci. Comput., vol. 44, no. 4, pp. A2412–A2438, 2022. https://doi.org/10.1137/21m1434349.

[37] S. Boyd and V. Balakrishnan, "A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm," Syst. Control Lett., vol. 15, no. 1, pp. 1–7, 1990. https://doi.org/10.1016/0167-6911(90)90037-u.

[38] N. A. Bruinsma and M. Steinbuch, "A fast algorithm to compute the H∞-norm of a transfer function matrix," Syst. Control Lett., vol. 14, no. 4, pp. 287–293, 1990. https://doi.org/10.1016/0167-6911(90)90049-z.

[39] E. Mengi, "Large-scale and global maximization of the distance to instability," SIAM J. Matrix Anal. Appl., vol. 39, no. 4, pp. 1776–1809, 2018. https://doi.org/10.1137/18m1177019.

[40] K. Carlberg, R. Tuminaro, and P. Boggs, "Preserving Lagrangian structure in nonlinear model reduction with application to structural dynamics," SIAM J. Sci. Comput., vol. 37, no. 2, pp. B153–B184, 2015. https://doi.org/10.1137/140959602.

[41] I. Kalashnikova, B. van Bloemen Waanders, S. Arunajatesan, and M. Barone, "Stabilization of projection-based reduced order models for linear time-invariant systems via optimization-based eigenvalue reassignment," Comput. Methods Appl. Mech. Eng., vol. 272, pp. 251–270, 2014. https://doi.org/10.1016/j.cma.2014.01.011.

[42] G. Serre, P. Lafon, X. Gloerfelt, and C. Bailly, "Reliable reduced-order models for time-dependent linearized Euler equations," J. Comput. Phys., vol. 231, no. 15, pp. 5176–5194, 2012. https://doi.org/10.1016/j.jcp.2012.04.019.

[43] F. R. Gantmacher, The Theory of Matrices, vol. II, New York, NY, USA, Chelsea Publishing Company, 1959.

[44] A. Aliyev, P. Benner, E. Mengi, P. Schwerdtner, and M. Voigt, "Large-scale computation of L∞-norms by a greedy subspace method," SIAM J. Matrix Anal. Appl., vol. 38, no. 4, pp. 1496–1516, 2017. https://doi.org/10.1137/16m1086200.

[45] P. Benner and T. Mitchell, "Faster and more accurate computation of the H∞ norm via optimization," SIAM J. Sci. Comput., vol. 40, no. 5, pp. A3609–A3635, 2018. https://doi.org/10.1137/17m1137966.

[46] T. Mitchell, "ROSTAPACK: RObust STAbility PACKage v3.0," 2024. http://www.timmitchell.com/software/ROSTAPACK/.

[47] P. Benner and S. W. R. Werner, "MORLAB – Model Order Reduction LABoratory (version 5.0)," 2019. Available at: http://www.mpi-magdeburg.mpg.de/projects/morlab.

[48] Y. Chahlaoui and P. Van Dooren, "Benchmark examples for model reduction of linear time-invariant dynamical systems," in Dimension Reduction of Large-Scale Systems, P. Benner, D. C. Sorensen, and V. Mehrmann, Eds., Berlin, Springer, vol. 45, 2005, pp. 379–392. https://doi.org/10.1007/3-540-27909-1_24.

[49] T. Penzl, "Algorithms for model reduction of large dynamical systems," Linear Algebra Appl., vol. 415, nos. 2–3, pp. 322–343, 2006. https://doi.org/10.1016/j.laa.2006.01.007.

[50] A. J. Mayo and A. C. Antoulas, "A framework for the solution of the generalized realization problem," Linear Algebra Appl., vol. 425, nos. 2–3, pp. 634–662, 2007. https://doi.org/10.1016/j.laa.2007.03.008.

Received: 2023-09-15
Accepted: 2024-10-04
Published Online: 2025-01-15
Published in Print: 2025-09-25

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
