Article, Open Access

The robust isolated calmness of spectral norm regularized convex matrix optimization problems

  • Ziran Yin, Xiaoyu Chen and Jihong Zhang
Published/Copyright: August 11, 2025

Abstract

This article aims to provide a series of characterizations of the robust isolated calmness of the Karush-Kuhn-Tucker (KKT) mapping for spectral norm regularized convex optimization problems. By establishing the variational properties of the spectral norm function, we directly prove that the KKT mapping is isolated calm if and only if the strict Robinson constraint qualification (SRCQ) and the second-order sufficient condition (SOSC) hold. Furthermore, we obtain the crucial result that the SRCQ for the primal/dual problem and the SOSC for the dual/primal problem are equivalent. These results yield further equivalent conditions for the robust isolated calmness of the KKT mapping, thereby enriching the stability theory of spectral norm regularized optimization problems and enhancing the usability of isolated calmness in algorithmic applications.

MSC 2010: 90C25; 90C31; 65K10

1 Introduction

Consider the spectral norm regularized matrix optimization problem

\[(1.1)\qquad \min_{X\in\mathbb{R}^{m\times n}}\; h(QX)+\langle C,X\rangle+\|X\|_2 \quad \text{s.t.}\quad \mathcal{B}X-b\in P,\]

where $h:\mathbb{R}^d\to\mathbb{R}$ is a twice continuously differentiable and essentially strictly convex function, $Q:\mathbb{R}^{m\times n}\to\mathbb{R}^d$ and $\mathcal{B}:\mathbb{R}^{m\times n}\to\mathbb{R}^l$ are two linear operators, $C\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^l$ are given data, and $P\subseteq\mathbb{R}^l$ is a given convex polyhedral cone. The function $\|X\|_2$ is the spectral norm of $X$, namely, its largest singular value. Without loss of generality, we suppose that $m\le n$ in what follows. For convenience, in the following text, we always use $\theta$ to represent the spectral norm function, i.e., $\theta(\cdot)=\|\cdot\|_2$. Problem (1.1) has a wide range of applications in various fields, such as matrix approximation problems [1], matrix Chebyshev polynomial problems [2], and $H_\infty$ synthesis problems [3-5]. Recently, the spectral norm regularized problem has also been applied to deep learning and neural network problems [6-8].

Stability analysis theory is crucial in studying the convergence of algorithms. As one of the important concepts in stability theory, isolated calmness (Definition 1) may guarantee the linear convergence rate of some algorithms, such as the alternating direction method of multipliers [9] and the proximal augmented Lagrangian method [10]. There are many publications studying the isolated calmness of the Karush-Kuhn-Tucker (KKT) mapping (ICKKTM) for optimization problems. For nonlinear semidefinite programming, Zhang and Zhang [11] show that the ICKKTM can be derived from the second-order sufficient condition (SOSC) and the strict Robinson constraint qualification (SRCQ). Zhang et al. [12] demonstrate that both the SOSC and the SRCQ are sufficient and necessary for the ICKKTM for the nonlinear second-order cone programming problem. It is important to note that Ding et al. [13] study the robust isolated calmness of a large class of cone programming problems; they prove that the KKT mapping is robustly isolated calm if and only if both the SOSC and the SRCQ hold.

When the optimization problem has a special linear structure, one can obtain more characterizations of stability properties by establishing the equivalence between the constraint qualification of the primal (dual) problem and the second-order optimality condition of the dual (primal) problem. For instance, for the standard semidefinite programming problem, Chan and Sun [14] show that the constraint nondegeneracy for the primal (dual) problem is equivalent to the strong SOSC for the dual (primal) problem, and thereby obtain a series of equivalent characterizations of the strong regularity of the KKT point. Han et al. [15] discover the equivalence between the dual (primal) SRCQ and the primal (dual) SOSC for the convex composite quadratic semidefinite programming problem, thereby deriving a series of equivalent conditions for the ICKKTM. For the case where $\theta$ in problem (1.1) is the nuclear norm function, Cui and Sun [16] show that the primal (dual) SRCQ is equivalent to the dual (primal) SOSC, and therefore derive more equivalent conditions for the robust ICKKTM.

Inspired by the work in [16], and given the widespread application of spectral norm regularized convex matrix optimization problems, a natural question is whether the results in [16] can be extended to spectral norm regularized convex optimization problems. We provide a positive answer in this article. Exploiting the special structure of the critical cone of the spectral norm function, we provide several equivalent conditions for the robust ICKKTM; that is, the conclusions in [16] remain valid for problem (1.1). Compared with [16], the difference in this article is that we use the established variational properties of the proximal mapping of the spectral norm to give a direct proof of the ICKKTM for problem (1.1), whereas [16] provides an indirect proof of the isolated calmness based on [13, Theorem 24] concerning optimization problems with a $C^2$-cone reducible constraint set. In addition, compared with the SOSC, the SRCQ for the primal or dual problem is easier to verify, which enhances the practicality of isolated calmness in algorithmic research.

The organization of the subsequent content is as follows. In Section 2, we provide notation and preliminaries on variational analysis that will be used in the following text. The variational properties of the spectral norm function are studied in Section 3, including the relationship between the critical cones of the spectral norm function and its conjugate function, and the explicit expression of the directional derivative of the proximal mapping of the spectral norm function. The results in Section 3 play a vital role in obtaining the main conclusions of this article. In Section 4, we prove that the primal (dual) SRCQ holds if and only if the dual (primal) SOSC holds and thus establish more equivalent conditions for the robust ICKKTM for problem (1.1). We conclude this article in Section 5.

Some common symbols and notations for matrices are as follows:

  • For any positive integer $t$, we denote by $[t]$ the index set $\{1,\ldots,t\}$. For any $Z\in\mathbb{R}^{m\times n}$, the $(i,j)$th entry of $Z$ is denoted by $Z_{ij}$, where $i\in[m]$, $j\in[n]$. Let $\mu\subseteq[m]$ and $\nu\subseteq[n]$ be two index sets. We write $Z_\nu$ for the submatrix of $Z$ that retains only the columns in $\nu$, and $Z_{\mu\nu}$ for the submatrix of $Z$ that retains only the rows in $\mu$ and the columns in $\nu$.

  • For any $d\in\mathbb{R}^m$, $\mathrm{Diag}(d)$ represents the $m\times m$ diagonal matrix whose $i$th diagonal element is $d_i$, $i\in[m]$.

  • We use “trace” to represent the sum of the diagonal elements of a given square matrix. For any two matrices $P$ and $Q$ in $\mathbb{R}^{m\times n}$, the inner product of $P$ and $Q$ is written as $\langle P,Q\rangle \triangleq \mathrm{trace}(P^TQ)$. The Hadamard product of matrices $P$ and $Q$ is represented by the symbol “$\circ$”, i.e., the $(i,j)$th entry of $P\circ Q\in\mathbb{R}^{m\times n}$ is $P_{ij}Q_{ij}$.

  • Let $\mathbb{S}^w$ be the linear space of all $w\times w$ real symmetric matrices, and let $\mathbb{S}^w_+$ and $\mathbb{S}^w_-$ be the cones of all $w\times w$ positive and negative semidefinite matrices, respectively.

2 Notation and preliminaries

Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite dimensional real Euclidean spaces, and let $\mathbb{B}_{\mathcal{Y}}$ be the unit ball in $\mathcal{Y}$. Let $D\subseteq\mathcal{X}$ be a nonempty closed convex set. For any $z\in D$, the tangent cone [17, Definition 6.1] of $D$ at $z$ is defined by $\mathcal{T}_D(z)\triangleq\{d\in\mathcal{X}\mid \exists\, z^k\to z \text{ with } z^k\in D \text{ and } \tau_k\downarrow 0 \text{ s.t. } (z^k-z)/\tau_k\to d\}$. We use $\mathcal{N}_D(x)\triangleq\{d\in\mathcal{X} : \langle d, z-x\rangle\le 0,\ \forall z\in D\}$ to denote the normal cone of $D$ at $x\in D$. Denote by $\delta_D(x)$ the indicator function of $D$, i.e., $\delta_D(x)=0$ if $x\in D$, and $\delta_D(x)=\infty$ if $x\notin D$. Define the support function of the set $D$ as $\sigma(y,D)\triangleq\sup_{x\in D}\langle x,y\rangle$, $y\in\mathcal{X}$. For a given $x\in\mathcal{X}$, define $\Pi_D(x)\triangleq\arg\min\{\|d-x\|\mid d\in D\}$ as the projection mapping onto $D$. Suppose the function $f:\mathcal{X}\to(-\infty,+\infty]$ is proper closed convex; denote by $\mathrm{dom}\,f\triangleq\{x\mid f(x)<\infty\}$ the effective domain of $f$, by $f^*$ the conjugate of $f$, and by $\partial f$ the subdifferential of $f$. For more details, one can refer to standard convex analysis [17]. For any convex cone $K\subseteq\mathcal{Y}$, denote by $K^\circ\triangleq\{z\in\mathcal{Y}\mid \langle z,z'\rangle\le 0,\ \forall z'\in K\}$ the polar of $K$.

The definition of isolated calmness below, which is taken from [13, Definition 2], is the most important concept in this article.

Definition 1

The set-valued mapping $G:\mathcal{X}\rightrightarrows\mathcal{Y}$ is said to be isolated calm at $\bar u$ for $\bar v$ if $\bar v\in G(\bar u)$ and there exist a positive constant $\kappa$ and neighborhoods $U$ of $\bar u$ and $V$ of $\bar v$ such that

\[(2.1)\qquad G(u)\cap V \subseteq \{\bar v\} + \kappa\|u-\bar u\|\,\mathbb{B}_{\mathcal{Y}}, \qquad \forall\, u\in U.\]

Moreover, we say that $G$ is robustly isolated calm at $\bar u$ for $\bar v$ if (2.1) holds and, for each $u\in U$, $G(u)\cap V\neq\emptyset$.

For any closed convex function $g:\mathcal{X}\to(-\infty,+\infty]$, we know from [18, Proposition 2.58] that $g$ is directionally epidifferentiable. We use $g'(x;\cdot)$ to denote the directional epiderivative of $g$. Further, if $g'(x;d)$ is finite for $x\in\mathrm{dom}\,g$ and $d\in\mathcal{X}$, we define the lower second-order directional epiderivative of $g$ for any $w\in\mathcal{X}$ by

\[ g''_-(x;d,w) \triangleq \liminf_{\substack{\tau\downarrow 0\\ w'\to w}} \frac{g\big(x+\tau d+\tfrac{1}{2}\tau^2 w'\big)-g(x)-\tau g'(x;d)}{\tfrac{1}{2}\tau^2}. \]

3 Variational analysis of the spectral norm function

In this section, we provide an explicit expression for the directional derivative of the proximal mapping of the spectral norm function. We then discuss the relationship between the directional derivative of the proximal mapping, the so-called sigma term, and the critical cones.

Given an arbitrary matrix $Q\in\mathbb{R}^{m\times n}$, let $\sigma_1(Q)\ge\sigma_2(Q)\ge\cdots\ge\sigma_m(Q)$ be the singular values of $Q$. Denote $\sigma(Q)\triangleq(\sigma_1(Q),\sigma_2(Q),\ldots,\sigma_m(Q))^T$. For any integer $p>0$, let $\mathcal{O}^p$ be the set of all $p\times p$ orthogonal matrices. We assume that $Q\in\mathbb{R}^{m\times n}$ admits the singular value decomposition (SVD) as follows:

\[(3.1)\qquad Q = U\,[\mathrm{Diag}(\sigma(Q))\;\; 0]\,V^T = U\,[\mathrm{Diag}(\sigma(Q))\;\; 0]\,[V_1\;\; V_2]^T = U\,\mathrm{Diag}(\sigma(Q))\,V_1^T,\]

where $U\in\mathcal{O}^m$ and $V=[V_1\;\;V_2]\in\mathcal{O}^n$ with $V_1\in\mathbb{R}^{n\times m}$ and $V_2\in\mathbb{R}^{n\times(n-m)}$. Define the following three index sets:

\[(3.2)\qquad a\triangleq\{1\le i\le m \mid \sigma_i(Q)>0\},\qquad b\triangleq\{1\le i\le m \mid \sigma_i(Q)=0\},\qquad c\triangleq\{m+1,\ldots,n\}.\]

For any two matrices $P,W\in\mathbb{S}^n$, the well known Fan inequality [19] takes the form

\[(3.3)\qquad \langle P, W\rangle \le \lambda(P)^T\lambda(W),\]

where $\lambda(P)$ represents the eigenvalue vector of $P$, whose elements are arranged in nonincreasing order.
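As a quick numerical sanity check of (3.3), the following sketch (a minimal NumPy illustration of our own; the matrix size and seed are arbitrary) compares $\langle P,W\rangle$ with $\lambda(P)^T\lambda(W)$ for random symmetric matrices:

```python
import numpy as np

# Illustrative check of Fan's inequality <P, W> <= lambda(P)^T lambda(W)
# for real symmetric matrices P and W.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
P = (A + A.T) / 2          # symmetric part of A
W = (B + B.T) / 2          # symmetric part of B

lam_P = np.sort(np.linalg.eigvalsh(P))[::-1]  # eigenvalues, nonincreasing
lam_W = np.sort(np.linalg.eigvalsh(W))[::-1]

inner = np.trace(P.T @ W)  # <P, W> = trace(P^T W)
bound = lam_P @ lam_W
assert inner <= bound + 1e-10
```

Equality holds when $P$ and $W$ admit a simultaneous ordered eigenvalue decomposition, which is the case exploited in the proof of Proposition 2 below.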

For any $P,W\in\mathbb{R}^{m\times n}$, by [20, Theorem 31.5], we know that $W\in\partial\theta(P)$ (or, equivalently, $P\in\partial\theta^*(W)$) holds if and only if

\[ \mathrm{Prox}_\theta(P+W) = P, \qquad \mathrm{Prox}_{\theta^*}(P+W) = W, \]

where $\mathrm{Prox}_\theta:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ denotes the proximal mapping of $\theta$ [21, Definition 6.1], namely,

\[ \mathrm{Prox}_\theta(Z) \triangleq \arg\min_{Z'\in\mathbb{R}^{m\times n}}\Big\{\theta(Z') + \frac{1}{2}\|Z'-Z\|^2\Big\}, \qquad Z\in\mathbb{R}^{m\times n}, \]

and $\mathrm{Prox}_{\theta^*}:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ denotes the proximal mapping of $\theta^*$. Denote $Q\triangleq P+W$ and let $Q$ have the SVD as in (3.1). Then by [21, Theorem 7.29 and Example 7.31], we have that

\[ P = U\,[\mathrm{Diag}(\sigma(P))\;\; 0]\,V^T, \qquad W = U\,[\mathrm{Diag}(\sigma(W))\;\; 0]\,V^T \]

and

\[ \sigma(P) = \kappa(\sigma(Q)) \qquad\text{and}\qquad \sigma(W) = \sigma(Q) - \kappa(\sigma(Q)), \]

where $\kappa:\mathbb{R}^m\to\mathbb{R}^m$ is the proximal mapping of the $l_\infty$ norm (i.e., of the maximum absolute value of the entries of a vector in $\mathbb{R}^m$).

Define $\phi:\mathbb{R}\to\mathbb{R}$ as the following scalar function:

\[ \phi(x) \triangleq \min\{x, \lambda^*\}, \qquad x\in\mathbb{R}, \]

where $\lambda^*>0$ satisfies $\sum_{i=1}^m[\sigma_i(Q)-\lambda^*]_+ = 1$ (for any $x\in\mathbb{R}$, $[x]_+$ denotes the nonnegative part of $x$). Then, we can conclude that $\kappa(\sigma) = (\phi(\sigma_1),\phi(\sigma_2),\ldots,\phi(\sigma_m))^T$ and

\[ P = \mathrm{Prox}_\theta(Q) = U\,[\mathrm{Diag}(\phi(\sigma_1),\phi(\sigma_2),\ldots,\phi(\sigma_m))\;\; 0]\,V^T. \]

Obviously, $\mathrm{Prox}_\theta(\cdot)$ can be regarded as the Löwner operator associated with $\phi$.
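To make the construction above concrete, the following sketch computes $\mathrm{Prox}_\theta(Q)$ by clipping the singular values at the level $\lambda^*$ (a minimal NumPy implementation of our own, assuming the nontrivial case $\sum_i\sigma_i(Q)>1$ so that $\lambda^*>0$ exists; the function names are illustrative, not from the paper):

```python
import numpy as np

def water_level(sig, radius=1.0):
    """Return lam > 0 with sum(max(sig - lam, 0)) == radius.
    Assumes sig >= 0 and sig.sum() > radius (the nontrivial case)."""
    s = np.sort(sig)[::-1]
    csum = np.cumsum(s) - radius
    ks = np.arange(1, len(s) + 1)
    k = ks[s - csum / ks > 0][-1]   # largest k with s_k > (sum_{i<=k} s_i - radius)/k
    return csum[k - 1] / k

def prox_spectral_norm(Q):
    """Prox of the spectral norm: clip singular values at lam_star."""
    U, sig, Vt = np.linalg.svd(Q, full_matrices=False)
    lam = water_level(sig)
    return U @ np.diag(np.minimum(sig, lam)) @ Vt, lam

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 6))   # m = 4 <= n = 6
P, lam = prox_spectral_norm(Q)
W = Q - P                         # P = Prox_theta(Q), W = Prox_{theta*}(Q)
# lam_star equals the largest singular value of P, and ||W||_* = 1.
assert np.isclose(np.linalg.norm(P, 2), lam)
assert np.isclose(np.linalg.svd(W, compute_uv=False).sum(), 1.0)
```

The residual $W=Q-P$ is the projection of $Q$ onto the unit nuclear norm ball, consistent with the Moreau decomposition $\mathrm{Prox}_\theta(Q)+\mathrm{Prox}_{\theta^*}(Q)=Q$ stated above.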

Let $\nu_1(Q),\nu_2(Q),\ldots,\nu_r(Q)$ be the distinct nonzero singular values of $Q$. Then there exists an integer $1\le\tilde r\le r$ such that

\[ \nu_1(Q) > \nu_2(Q) > \cdots > \nu_{\tilde r}(Q) \ge \lambda^* > \nu_{\tilde r+1}(Q) > \cdots > \nu_r(Q) > 0. \]

To proceed, we further define several index sets. We denote $a_{r+1}\triangleq b$ for convenience. Divide the set $a$ into subsets as follows:

\[ a_k \triangleq \{i\in a \mid \sigma_i(Q) = \nu_k(Q)\}, \qquad k\in[r], \]

and define two index sets associated with $\lambda^*$:

\[(3.4)\qquad \alpha \triangleq \bigcup_{k=1}^{\tilde r} a_k, \qquad \beta \triangleq \bigcup_{k=\tilde r+1}^{r+1} a_k. \]

Specifically, $\sigma_i(P) = \min\{\sigma_i(Q),\lambda^*\}$, $i=1,\ldots,m$, namely,

\[ \sigma_i(P) = \begin{cases} \lambda^*, & \text{if } i\in\alpha,\\ \sigma_i(Q), & \text{if } i\in\beta. \end{cases} \]

In fact, $\lambda^*$ coincides with the largest singular value of $P$, namely, $\lambda^* = \|P\|_2$. According to Watson [22], the subdifferential of $\theta$ can be characterized as follows:

\[(3.5)\qquad \partial\theta(P) = \big\{U_\alpha H V_\alpha^T \mid H\in\mathbb{S}_+^{|\alpha|},\ \mathrm{trace}(H)=1\big\}. \]

Therefore, we know that $\sigma_i(W)=0$ when $i\in\beta$, and the set $\alpha$ can be divided into three subsets as follows:

\[(3.6)\qquad \alpha_1 \triangleq \{i\in\alpha \mid \sigma_i(W)=1\},\qquad \alpha_2 \triangleq \{i\in\alpha \mid 0<\sigma_i(W)<1\},\qquad \alpha_3 \triangleq \{i\in\alpha \mid \sigma_i(W)=0\}. \]

Hence, there exist integers $0\le r_0\le 1$ and $r_0\le r_1\le\tilde r$ such that

\[ \alpha_1 = \bigcup_{k=1}^{r_0} a_k, \qquad \alpha_2 = \bigcup_{k=r_0+1}^{r_1} a_k, \qquad \alpha_3 = \bigcup_{k=r_1+1}^{\tilde r} a_k. \]

Since the spectral norm function is convex and globally Lipschitz continuous with modulus 1, $\theta$ is directionally differentiable at any point in $\mathbb{R}^{m\times n}$. From (3.5), for any $D\in\mathbb{R}^{m\times n}$, the directional derivative of $\theta$ at $P$ in the direction $D$ can be expressed as follows:

\[(3.7)\qquad \theta'(P;D) = \sup_{S\in\partial\theta(P)}\langle D, S\rangle = \lambda_1\Big(\tfrac{1}{2}\big(U_\alpha^TDV_\alpha + V_\alpha^TD^TU_\alpha\big)\Big). \]
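For intuition, when the largest singular value of $P$ is simple ($|\alpha|=1$), formula (3.7) reduces to $\theta'(P;D)=u_1^TDv_1$. The finite-difference sketch below (our own illustration, not from the paper) checks this numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.standard_normal((4, 5))
D = rng.standard_normal((4, 5))

U, s, Vt = np.linalg.svd(P, full_matrices=False)
assert s[0] - s[1] > 1e-6        # simple top singular value, so |alpha| = 1

deriv = U[:, 0] @ D @ Vt[0, :]   # theta'(P; D) = u_1^T D v_1
t = 1e-6
fd = (np.linalg.norm(P + t * D, 2) - np.linalg.norm(P, 2)) / t
assert abs(fd - deriv) < 1e-4    # one-sided difference quotient agrees
```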

For $W\in\partial\theta(P)$, or, equivalently, $P\in\partial\theta^*(W)$, the critical cone of $\theta$ at $P$ for $W$ and the critical cone of $\theta^*$ at $W$ for $P$ are defined, respectively, as

\[(3.8)\qquad \mathcal{C}_\theta(P,W) \triangleq \{D\in\mathbb{R}^{m\times n} \mid \theta'(P;D) = \langle D, W\rangle\} \]

and

\[(3.9)\qquad \mathcal{C}_{\theta^*}(W,P) \triangleq \{D\in\mathbb{R}^{m\times n} \mid (\theta^*)'(W;D) = \langle D, P\rangle\}. \]

Next we give the expression for the directional derivative of $\mathrm{Prox}_\theta(\cdot)$. By [23], we know that $\mathrm{Prox}_\theta(\cdot)$ is directionally differentiable and its directional derivative can be obtained via the directional derivative of $\phi$. Clearly, the directional derivative of $\phi$ is

\[ \phi'(p;h) = \begin{cases} 0, & \text{if } p>\lambda^*,\\ \min\{h,0\}, & \text{if } p=\lambda^*,\\ h, & \text{if } p<\lambda^*. \end{cases} \]
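The piecewise formula for $\phi'(p;h)$ can be checked directly against one-sided difference quotients (a small illustrative sketch; the level $\lambda^*=2$ and the test points are arbitrary choices of ours):

```python
def phi(x, lam):
    # phi(x) = min{x, lam*}
    return min(x, lam)

def phi_dir(p, h, lam):
    # directional derivative phi'(p; h) of the clipping function
    if p > lam:
        return 0.0
    if p == lam:
        return min(h, 0.0)
    return h

lam = 2.0
for p, h in [(3.0, 1.5), (2.0, 1.0), (2.0, -1.0), (1.0, -0.7)]:
    t = 1e-8
    fd = (phi(p + t * h, lam) - phi(p, lam)) / t   # one-sided quotient
    assert abs(fd - phi_dir(p, h, lam)) < 1e-6
```

The kink at $p=\lambda^*$ is exactly what produces the projection $\Pi_{\mathbb{S}_-^{|\alpha_3|}}$ in the block formula (3.13) below.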

For any $Q\in\mathbb{R}^{m\times n}$, define $\Theta^2_{\alpha\alpha}:\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times|\alpha|}$, $\Theta^1_{\alpha\beta}:\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times|\beta|}$, $\Theta^2_{\alpha\beta}:\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times|\beta|}$, and $\Theta_{\alpha c}:\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times(n-m)}$ as follows:

\[ \begin{aligned} (\Theta^2_{\alpha\alpha}(Q))_{ij} &\triangleq \frac{2\lambda^*}{\sigma_i(Q)+\sigma_j(Q)}, && i\in[|\alpha|],\ j\in[|\alpha|],\\ (\Theta^1_{\alpha\beta}(Q))_{ij} &\triangleq \frac{\lambda^*-\sigma_{j+|\alpha|}(Q)}{\sigma_i(Q)-\sigma_{j+|\alpha|}(Q)}, && i\in[|\alpha|],\ j\in[|\beta|],\\ (\Theta^2_{\alpha\beta}(Q))_{ij} &\triangleq \frac{\lambda^*+\sigma_{j+|\alpha|}(Q)}{\sigma_i(Q)+\sigma_{j+|\alpha|}(Q)}, && i\in[|\alpha|],\ j\in[|\beta|],\\ (\Theta_{\alpha c}(Q))_{ij} &\triangleq \frac{\lambda^*}{\sigma_i(Q)}, && i\in[|\alpha|],\ j\in[n-m]. \end{aligned} \]

Define two linear matrix operators $\mathcal{S}:\mathbb{R}^{p\times p}\to\mathbb{S}^p$ and $\mathcal{T}:\mathbb{R}^{p\times p}\to\mathbb{R}^{p\times p}$ by

\[(3.10)\qquad \mathcal{S}(H) \triangleq \frac{1}{2}(H+H^T) \qquad\text{and}\qquad \mathcal{T}(H) \triangleq \frac{1}{2}(H-H^T), \qquad H\in\mathbb{R}^{p\times p}. \]

For all $Q\in\mathbb{R}^{m\times n}$ and $D\in\mathbb{R}^{m\times n}$, let $D=[D_1\;\;D_2]$ with $D_1\in\mathbb{R}^{m\times m}$ and $D_2\in\mathbb{R}^{m\times(n-m)}$. We define four matrix mappings $\Xi_1:\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times|\alpha|}$, $\Xi_2:\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times|\beta|}$, $\Xi_3:\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}\to\mathbb{R}^{|\beta|\times|\alpha|}$, and $\Xi_4:\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}\to\mathbb{R}^{|\alpha|\times(n-m)}$ by

\[(3.11)\qquad \begin{aligned} \Xi_1(Q,D) &\triangleq \Theta^2_{\alpha\alpha}(Q)\circ(\mathcal{T}(D_1))_{\alpha\alpha},\\ \Xi_2(Q,D) &\triangleq \Theta^1_{\alpha\beta}(Q)\circ(\mathcal{S}(D_1))_{\alpha\beta} + \Theta^2_{\alpha\beta}(Q)\circ(\mathcal{T}(D_1))_{\alpha\beta},\\ \Xi_3(Q,D) &\triangleq (\Theta^1_{\alpha\beta}(Q))^T\circ(\mathcal{S}(D_1))_{\beta\alpha} + (\Theta^2_{\alpha\beta}(Q))^T\circ(\mathcal{T}(D_1))_{\beta\alpha},\\ \Xi_4(Q,D) &\triangleq \Theta_{\alpha c}(Q)\circ D_{\alpha c}. \end{aligned} \]

Therefore, according to [23, Theorem 3], the directional derivative $\mathrm{Prox}_\theta'(Q;D)$ can be written as follows:

\[(3.12)\qquad \mathrm{Prox}_\theta'(Q;D) = U\begin{bmatrix} \Xi_1(Q,\tilde D)+\Xi_{\alpha\alpha}(\tilde D) & \Xi_2(Q,\tilde D) & \Xi_4(Q,\tilde D)\\ \Xi_3(Q,\tilde D) & \tilde D_{\beta\beta} & \tilde D_{\beta c} \end{bmatrix}V^T, \]

where $\tilde D \triangleq U^TDV$ and

\[(3.13)\qquad \Xi_{\alpha\alpha}(\tilde D) \triangleq \begin{bmatrix} 0_{|\alpha_1|\times|\alpha_1|} & 0 & 0\\ 0 & 0_{|\alpha_2|\times|\alpha_2|} & 0\\ 0 & 0 & \Pi_{\mathbb{S}_-^{|\alpha_3|}}\big(\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\big) \end{bmatrix}. \]

Theorem 3.1 in [24] shows that the singular value function is second-order directionally differentiable on $\mathbb{R}^{m\times n}$. This means that the spectral norm function is second-order directionally differentiable and $\theta''(X;\cdot,\cdot) = \theta''_-(X;\cdot,\cdot)$. To analyze the SOSC of problem (1.1), which helps to characterize the ICKKTM, we need to compute the “sigma term” of problem (1.1), namely, the conjugate function of the second-order directional derivative of $\theta$. For any $k=1,\ldots,r$, define $\Omega_{a_k}:\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}\to\mathbb{R}^{|a_k|\times|a_k|}$ by

\[ \Omega_{a_k}(P,D) \triangleq (\mathcal{S}(\tilde D_1))_{a_k}^T\big(\Sigma(P)-\nu_k(P)I_m\big)^\dagger(\mathcal{S}(\tilde D_1))_{a_k} - (2\nu_k(P))^{-1}\tilde D_{a_kc}\tilde D_{a_kc}^T - (\mathcal{T}(\tilde D_1))_{a_k}^T\big(\Sigma(P)+\nu_k(P)I_m\big)^{-1}(\mathcal{T}(\tilde D_1))_{a_k}, \]

where $Z^\dagger$ denotes the Moore-Penrose pseudo-inverse of $Z$, $\Sigma(P)\triangleq\mathrm{Diag}(\sigma(P))$, $I_m$ is the $m\times m$ identity matrix, and $\tilde D=[\tilde D_1\;\;\tilde D_2]=[U^TDV_1\;\;U^TDV_2]$. By [25], we know that for any $D\in\mathbb{R}^{m\times n}$, the conjugate of $\theta''(P;D,\cdot)$ is

\[(3.14)\qquad \psi^*_{(P,D)}(W) \triangleq \big(\theta''(P;D,\cdot)\big)^*(W) = \big\langle(\Sigma(W))_{\alpha\alpha},\, 2\Omega_\alpha(P,D)\big\rangle = \sum_{k=1}^{r_1}\sigma_k(W)\,\mathrm{trace}\big(2\Omega_{a_k}(P,D)\big), \]

where $\Omega_\alpha(P,D)$ is the block diagonal matrix with diagonal blocks $\Omega_{a_k}(P,D)$, $k\in[\tilde r]$, and $\sigma_k(W)$ denotes the common value of $\sigma_i(W)$, $i\in a_k$. Clearly, $\nu_k(P)=\lambda^*$ when $k=1,\ldots,\tilde r$. Continuing to calculate formula (3.14), we can obtain the explicit expression of $\psi^*_{(P,D)}(W)$ as follows:

\[ \psi^*_{(P,D)}(W) = \sum_{1\le l\le r_1}\ \sum_{\tilde r+1\le t\le r+1} \frac{2\sigma_l(W)}{\nu_t(P)-\lambda^*}\big\|(\mathcal{S}(\tilde D_1))_{a_la_t}\big\|^2 - \sum_{1\le l\le r_1} \frac{\sigma_l(W)}{\lambda^*}\big\|\tilde D_{a_lc}\big\|^2 - \sum_{1\le l\le r_1}\ \sum_{1\le t\le r+1} \frac{2\sigma_l(W)}{\nu_t(P)+\lambda^*}\big\|(\mathcal{T}(\tilde D_1))_{a_la_t}\big\|^2. \]

We are now prepared to give the main conclusions of this section, namely, some properties of the critical cones and of the conjugate function $\psi^*$ of the second-order directional derivative.

Propositions 10 and 12 in [25] characterized the critical cones of θ and θ * defined in (3.8) and (3.9), respectively. We summarize the results as follows.

Proposition 1

Let $W\in\partial\theta(P)$ and let $Q=P+W$ admit the SVD as in (3.1). Let the index sets $\alpha$, $\beta$, $\alpha_1$, $\alpha_2$, $\alpha_3$, and $c$ be defined as in (3.2), (3.4), and (3.6). Given any $D\in\mathbb{R}^{m\times n}$, denote $\tilde D=U^TDV$. Then the following results hold:

  1. $D\in\mathcal{C}_\theta(P,W)$ if and only if there exists some $\tau\in\mathbb{R}$ such that

    \[ \lambda_{|\alpha_1|}\big(\mathcal{S}(\tilde D_{\alpha_1\alpha_1})\big) \ge \tau \ge \lambda_1\big(\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\big) \]

    and

    \[ \mathcal{S}(\tilde D_{\alpha\alpha}) = \begin{bmatrix} \mathcal{S}(\tilde D_{\alpha_1\alpha_1}) & 0 & 0\\ 0 & \tau I_{|\alpha_2|} & 0\\ 0 & 0 & \mathcal{S}(\tilde D_{\alpha_3\alpha_3}) \end{bmatrix}, \]

    where $\lambda(Z)$ denotes the real eigenvalue vector of a symmetric matrix $Z$, arranged in nonincreasing order.

  2. $D\in\mathcal{C}_{\theta^*}(W,P)$ if and only if

    \[ \mathrm{trace}(\tilde D_{\alpha\alpha}) = 0, \qquad \mathcal{S}(\tilde D_{\alpha_1\alpha_1})\in\mathbb{S}_-^{|\alpha_1|}, \qquad \tilde D_{\alpha_3\alpha_3}\in\mathbb{S}_+^{|\alpha_3|} \]

    and

    \[ \tilde D = \begin{bmatrix} \tilde D_{\alpha_1\alpha_1} & \tilde D_{\alpha_1\alpha_2} & \tilde D_{\alpha_1\alpha_3} & \tilde D_{\alpha_1\beta} & \tilde D_{\alpha_1 c}\\ \tilde D_{\alpha_2\alpha_1} & \tilde D_{\alpha_2\alpha_2} & \tilde D_{\alpha_2\alpha_3} & \tilde D_{\alpha_2\beta} & \tilde D_{\alpha_2 c}\\ \tilde D_{\alpha_3\alpha_1} & \tilde D_{\alpha_3\alpha_2} & \tilde D_{\alpha_3\alpha_3} & 0 & 0\\ \tilde D_{\beta\alpha_1} & \tilde D_{\beta\alpha_2} & 0 & 0 & 0 \end{bmatrix}. \]

Remark 1

For the spectral norm function $\theta$, the $\tau$ in part (i) of Proposition 1 is actually the largest eigenvalue of $\mathcal{S}(\tilde D_{\alpha\alpha})$, i.e., $\tau=\lambda_1(\mathcal{S}(\tilde D_{\alpha\alpha}))$. This means that $\tau$ depends on $D$. It is easy to observe that the index sets $\alpha_1$ and $\alpha_2$ cannot be nonempty simultaneously. When $\alpha_1\neq\emptyset$, there must be $|\alpha_1|=1$ and $\alpha_2=\emptyset$; then $D\in\mathcal{C}_\theta(P,W)$ if and only if

\[ \mathcal{S}(\tilde D_{\alpha\alpha}) = \begin{bmatrix} \lambda_1(\mathcal{S}(\tilde D_{\alpha\alpha})) & 0\\ 0 & \mathcal{S}(\tilde D_{\alpha_3\alpha_3}) \end{bmatrix}. \]

When $\alpha_1=\emptyset$ and $\alpha_2\neq\emptyset$, $D\in\mathcal{C}_\theta(P,W)$ if and only if

\[ \mathcal{S}(\tilde D_{\alpha\alpha}) = \begin{bmatrix} \lambda_1(\mathcal{S}(\tilde D_{\alpha\alpha}))\,I_{|\alpha_2|} & 0\\ 0 & \mathcal{S}(\tilde D_{\alpha_3\alpha_3}) \end{bmatrix}. \]

In brief, we can conclude that $D\in\mathcal{C}_\theta(P,W)$ if and only if

\[(3.15)\qquad \mathcal{S}(\tilde D_{\alpha\alpha}) = \begin{bmatrix} \lambda_1(\mathcal{S}(\tilde D_{\alpha\alpha}))\,I_{|\alpha_1|+|\alpha_2|} & 0\\ 0 & \mathcal{S}(\tilde D_{\alpha_3\alpha_3}) \end{bmatrix}. \]

Proposition 2

Let $W\in\partial\theta(P)$ and let $Q=P+W$ admit the SVD as in (3.1). Let the index sets $\alpha$, $\beta$, $\alpha_1$, $\alpha_2$, $\alpha_3$, and $c$ be defined as in (3.2), (3.4), and (3.6). Given any $H\in\mathbb{R}^{m\times n}$, denote $\tilde H=U^THV$. Then the following results hold:

  1. If $H\in\mathcal{C}_{\theta^*}(W,P)$ and $\varphi^*_{(W,H)}(P)=0$, then $H\in\big(\mathcal{C}_\theta(P,W)\big)^\circ$.

  2. $H\in\mathcal{C}_\theta(P,W)$ and $\psi^*_{(P,H)}(W)=0$ if and only if $H\in\big(\mathcal{C}_{\theta^*}(W,P)\big)^\circ$.

Proof

We first prove part (i). By [25, Proposition 16], for any $H\in\mathbb{R}^{m\times n}$, $\psi^*_{(P,H)}(W)=0$ if and only if $\varphi^*_{(W,H)}(P)=0$, which is also equivalent to

\[ \begin{bmatrix} \tilde H_{\alpha_1\alpha_1} & \tilde H_{\alpha_1\alpha_2}\\ \tilde H_{\alpha_2\alpha_1} & \tilde H_{\alpha_2\alpha_2} \end{bmatrix}\in\mathbb{S}^{|\alpha_1|+|\alpha_2|}, \qquad \tilde H_{\alpha_1\alpha_3} = (\tilde H_{\alpha_3\alpha_1})^T, \qquad \tilde H_{\alpha_2\alpha_3} = (\tilde H_{\alpha_3\alpha_2})^T, \]

\[ \tilde H_{\alpha_1\beta} = (\tilde H_{\beta\alpha_1})^T = 0, \qquad \tilde H_{\alpha_2\beta} = (\tilde H_{\beta\alpha_2})^T = 0, \qquad \tilde H_{\alpha_1c} = 0, \qquad \tilde H_{\alpha_2c} = 0. \]

Then, it follows from part (ii) of Proposition 1 that $H\in\mathcal{C}_{\theta^*}(W,P)$ and $\varphi^*_{(W,H)}(P)=0$ are together equivalent to

\[(3.16)\qquad \tilde H_{\alpha\alpha}\in\mathbb{S}^{|\alpha|}, \quad \mathrm{trace}(\tilde H_{\alpha\alpha})=0, \quad \tilde H_{\alpha_1\alpha_1}\in\mathbb{S}_-^{|\alpha_1|}, \quad \tilde H_{\alpha_3\alpha_3}\in\mathbb{S}_+^{|\alpha_3|}, \quad\text{and}\quad \tilde H = \begin{bmatrix} \tilde H_{\alpha\alpha} & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}. \]

By part (i) of Proposition 1, for any $D\in\mathcal{C}_\theta(P,W)$, there exists $\tau\in\mathbb{R}$ such that

\[ \lambda_{|\alpha_1|}\big(\mathcal{S}(\tilde D_{\alpha_1\alpha_1})\big) \ge \tau \ge \lambda_1\big(\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\big) \]

and

\[ \mathcal{S}(\tilde D_{\alpha\alpha}) = \begin{bmatrix} \mathcal{S}(\tilde D_{\alpha_1\alpha_1}) & 0 & 0\\ 0 & \tau I_{|\alpha_2|} & 0\\ 0 & 0 & \mathcal{S}(\tilde D_{\alpha_3\alpha_3}) \end{bmatrix}. \]

Hence, for $D\in\mathcal{C}_\theta(P,W)$ and $H$ satisfying (3.16), Fan's inequality (3.3) yields

\[ \begin{aligned} \langle H, D\rangle &= \langle\tilde H,\tilde D\rangle = \langle\tilde H_{\alpha\alpha},\tilde D_{\alpha\alpha}\rangle = \langle\mathcal{S}(\tilde H_{\alpha\alpha}),\tilde D_{\alpha\alpha}\rangle = \langle\tilde H_{\alpha\alpha},\mathcal{S}(\tilde D_{\alpha\alpha})\rangle\\ &= \langle\tilde H_{\alpha_1\alpha_1},\mathcal{S}(\tilde D_{\alpha_1\alpha_1})\rangle + \tau\,\mathrm{trace}(\tilde H_{\alpha_2\alpha_2}) + \langle\tilde H_{\alpha_3\alpha_3},\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\rangle\\ &\le \lambda(\tilde H_{\alpha_1\alpha_1})^T\lambda\big(\mathcal{S}(\tilde D_{\alpha_1\alpha_1})\big) + \tau\,\mathrm{trace}(\tilde H_{\alpha_2\alpha_2}) + \lambda(\tilde H_{\alpha_3\alpha_3})^T\lambda\big(\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\big)\\ &\le \tau\,\mathrm{trace}(\tilde H_{\alpha_1\alpha_1}) + \tau\,\mathrm{trace}(\tilde H_{\alpha_2\alpha_2}) + \tau\,\mathrm{trace}(\tilde H_{\alpha_3\alpha_3}) = \tau\,\mathrm{trace}(\tilde H_{\alpha\alpha}) = 0, \end{aligned} \]

which means that $H\in(\mathcal{C}_\theta(P,W))^\circ$. Conversely, when the $\tau$ associated with an element of $\mathcal{C}_\theta(P,W)$ equals $0$, we cannot deduce $\mathrm{trace}(\tilde H_{\alpha\alpha})=0$; i.e., the reverse implication is not true. This completes the proof of part (i).

To prove (ii), it is easy to see that $\psi^*_{(P,H)}(W)=0$ and $H\in\mathcal{C}_\theta(P,W)$ hold if and only if there exists $\tau\in\mathbb{R}$ such that

\[ \lambda_{|\alpha_1|}\big(\mathcal{S}(\tilde H_{\alpha_1\alpha_1})\big) \ge \tau \ge \lambda_1\big(\mathcal{S}(\tilde H_{\alpha_3\alpha_3})\big) \]

and

\[ \tilde H = \begin{bmatrix} \mathcal{S}(\tilde H_{\alpha_1\alpha_1}) & 0 & 0 & 0 & 0\\ 0 & \tau I_{|\alpha_2|} & 0 & 0 & 0\\ 0 & 0 & \tilde H_{\alpha_3\alpha_3} & \tilde H_{\alpha_3\beta} & \tilde H_{\alpha_3c}\\ 0 & 0 & \tilde H_{\beta\alpha_3} & \tilde H_{\beta\beta} & \tilde H_{\beta c} \end{bmatrix}, \]

which can be proved to be equivalent to $H\in(\mathcal{C}_{\theta^*}(W,P))^\circ$ by using the same method as in part (i).□

The following propositions provide some properties of directional derivatives of the proximal mapping of θ .

Proposition 3

Let $W\in\partial\theta(P)$ and let $Q=P+W$ admit the SVD as in (3.1). Let the index sets $\alpha$, $\beta$, $\alpha_1$, $\alpha_2$, $\alpha_3$, and $c$ be defined as in (3.2), (3.4), and (3.6). Given any $D\in\mathbb{R}^{m\times n}$, denote $\tilde D=U^TDV$; the function $\psi^*$ is defined in (3.14). For any $D\in\mathbb{R}^{m\times n}$, if $H=\mathrm{Prox}_\theta'(Q;H+D)$ holds, then $H\in\mathcal{C}_\theta(P,W)$ and $\langle H,D\rangle = -\psi^*_{(P,H)}(W)$.

Proof

Suppose $H=\mathrm{Prox}_\theta'(Q;H+D)$ holds. Together with the expression for $\mathrm{Prox}_\theta'(Q;\cdot)$ in (3.12), we have that

\[(3.17)\qquad \tilde H = U^T\,\mathrm{Prox}_\theta'(Q;H+D)\,V = \begin{bmatrix} \Xi_1(Q,\tilde H+\tilde D)+\Xi_{\alpha\alpha}(\tilde H+\tilde D) & \Xi_2(Q,\tilde H+\tilde D) & \Xi_4(Q,\tilde H+\tilde D)\\ \Xi_3(Q,\tilde H+\tilde D) & \tilde H_{\beta\beta}+\tilde D_{\beta\beta} & \tilde H_{\beta c}+\tilde D_{\beta c} \end{bmatrix}, \]

where $\Xi_{\alpha\alpha}$ is defined in (3.13). We first verify $H\in\mathcal{C}_\theta(P,W)$. The equality

\[ \tilde H_{\alpha\alpha} = \Xi_1(Q,\tilde H+\tilde D)+\Xi_{\alpha\alpha}(\tilde H+\tilde D) = \Theta^2_{\alpha\alpha}(Q)\circ\big((\mathcal{T}(\tilde H_1))_{\alpha\alpha}+(\mathcal{T}(\tilde D_1))_{\alpha\alpha}\big) + \begin{bmatrix} 0_{|\alpha_1|\times|\alpha_1|} & 0 & 0\\ 0 & 0_{|\alpha_2|\times|\alpha_2|} & 0\\ 0 & 0 & \Pi_{\mathbb{S}_-^{|\alpha_3|}}\big(\mathcal{S}(\tilde H_{\alpha_3\alpha_3})+\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\big) \end{bmatrix} \]

implies that

\[ \mathcal{S}(\tilde H_{\alpha\alpha}) = \begin{bmatrix} 0_{|\alpha_1|\times|\alpha_1|} & 0 & 0\\ 0 & 0_{|\alpha_2|\times|\alpha_2|} & 0\\ 0 & 0 & \Pi_{\mathbb{S}_-^{|\alpha_3|}}\big(\mathcal{S}(\tilde H_{\alpha_3\alpha_3})+\mathcal{S}(\tilde D_{\alpha_3\alpha_3})\big) \end{bmatrix}. \]

From part (i) of Proposition 1, we obtain that $H\in\mathcal{C}_\theta(P,W)$.

For convenience, we denote $\hat\alpha\triangleq\alpha_1\cup\alpha_2$. By directly calculating (3.17), we deduce that $H=\mathrm{Prox}_\theta'(Q;H+D)$ if and only if

\[(3.18)\qquad \begin{cases} \mathcal{S}(\tilde H_{\hat\alpha\hat\alpha}) = 0, \quad (\mathcal{S}(\tilde H_1))_{\hat\alpha\alpha_3} = 0, \quad (\mathcal{S}(\tilde H_1))_{\alpha_3\hat\alpha} = 0,\\[4pt] \mathbb{S}_-^{|\alpha_3|}\ni\mathcal{S}(\tilde H_{\alpha_3\alpha_3}) \perp \mathcal{S}(\tilde D_{\alpha_3\alpha_3}) = \tilde D_{\alpha_3\alpha_3}\in\mathbb{S}_+^{|\alpha_3|},\\[4pt] (\mathcal{T}(\tilde D_{\hat\alpha\hat\alpha}))_{ij} = \Big(\dfrac{\sigma_i(Q)+\sigma_j(Q)}{2\lambda^*}-1\Big)(\tilde H_{\hat\alpha\hat\alpha})_{ij}, & i,j\in[|\hat\alpha|],\\[8pt] \big((\mathcal{T}(\tilde D_1))_{\hat\alpha\alpha_3}\big)_{ij} = \dfrac{\sigma_i(Q)-\lambda^*}{2\lambda^*}(\tilde H_{\hat\alpha\alpha_3})_{ij}, & i\in[|\hat\alpha|],\ j\in[|\alpha_3|],\\[8pt] \big((\mathcal{T}(\tilde D_1))_{\alpha_3\hat\alpha}\big)_{ji} = \dfrac{\sigma_i(Q)-\lambda^*}{2\lambda^*}(\tilde H_{\alpha_3\hat\alpha})_{ji}, & i\in[|\hat\alpha|],\ j\in[|\alpha_3|],\\[8pt] (\tilde D_{\hat\alpha\beta})_{ij} = \dfrac{\lambda^*(\sigma_i(Q)-\lambda^*)}{(\lambda^*)^2-\sigma_{j+|\alpha|}(Q)^2}(\tilde H_{\hat\alpha\beta})_{ij} + \dfrac{\sigma_{j+|\alpha|}(Q)(\sigma_i(Q)-\lambda^*)}{(\lambda^*)^2-\sigma_{j+|\alpha|}(Q)^2}(\tilde H_{\beta\hat\alpha})_{ji}, & i\in[|\hat\alpha|],\ j\in[|\beta|],\\[8pt] (\tilde D_{\beta\hat\alpha})_{ji} = \dfrac{\lambda^*(\sigma_i(Q)-\lambda^*)}{(\lambda^*)^2-\sigma_{j+|\alpha|}(Q)^2}(\tilde H_{\beta\hat\alpha})_{ji} + \dfrac{\sigma_{j+|\alpha|}(Q)(\sigma_i(Q)-\lambda^*)}{(\lambda^*)^2-\sigma_{j+|\alpha|}(Q)^2}(\tilde H_{\hat\alpha\beta})_{ij}, & i\in[|\hat\alpha|],\ j\in[|\beta|],\\[8pt] (\tilde D_{\hat\alpha c})_{ij} = \Big(\dfrac{\sigma_i(Q)}{\lambda^*}-1\Big)(\tilde H_{\hat\alpha c})_{ij}, & i\in[|\hat\alpha|],\ j\in[n-m],\\[6pt] \tilde D_{\beta\beta} = 0, \quad \tilde D_{\beta c} = 0. \end{cases} \]

Note that $\nu_k(Q)-\lambda^*=\sigma_k(W)$ for $1\le k\le r_1$, $\nu_t(P)=\lambda^*$ for $1\le t\le\tilde r$, and $\nu_t(P)=\nu_t(Q)$ for $\tilde r+1\le t\le r+1$. Together with (3.18) and the fact that, for any $D\in\mathbb{R}^{p\times p}$,

\[ \langle D, D^T\rangle = \|\mathcal{S}(D)\|^2 - \|\mathcal{T}(D)\|^2, \qquad \langle D, D\rangle = \|D\|^2 = \|\mathcal{S}(D)\|^2 + \|\mathcal{T}(D)\|^2, \]
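These two identities just say that the symmetric and skew-symmetric parts of a square matrix are orthogonal in the Frobenius inner product; a quick NumPy check (illustrative size and seed of our own):

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((6, 6))
S = (D + D.T) / 2                # symmetric part  S(D)
T = (D - D.T) / 2                # skew part       T(D)

fro2 = lambda A: float(np.sum(A * A))      # squared Frobenius norm
inner_D_DT = float(np.trace(D.T @ D.T))    # <D, D^T> = trace(D^T D^T)

assert np.isclose(inner_D_DT, fro2(S) - fro2(T))
assert np.isclose(fro2(D), fro2(S) + fro2(T))
```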

we obtain that

\[ \begin{aligned} \langle H,D\rangle &= \langle\tilde H,\tilde D\rangle = \langle\tilde H_{\alpha\alpha},\tilde D_{\alpha\alpha}\rangle + \langle\tilde H_{\alpha\beta},\tilde D_{\alpha\beta}\rangle + \langle\tilde H_{\beta\alpha},\tilde D_{\beta\alpha}\rangle + \langle\tilde H_{\alpha c},\tilde D_{\alpha c}\rangle\\ &= \langle\tilde H_{\alpha\alpha},\mathcal{T}(\tilde D_{\alpha\alpha})\rangle + \langle\tilde H_{\alpha\beta},\tilde D_{\alpha\beta}\rangle + \langle\tilde H_{\beta\alpha},\tilde D_{\beta\alpha}\rangle + \langle\tilde H_{\alpha c},\tilde D_{\alpha c}\rangle\\ &= \sum_{1\le l\le r_1}\sum_{1\le t\le\tilde r}\frac{\nu_l(Q)-\lambda^*}{\lambda^*}\big\|(\mathcal{T}(\tilde H_1))_{a_la_t}\big\|^2 + \sum_{1\le l\le r_1}\frac{\nu_l(Q)-\lambda^*}{\lambda^*}\big\|\tilde H_{a_lc}\big\|^2\\ &\quad + \sum_{1\le l\le r_1}\sum_{\tilde r+1\le t\le r+1}\bigg(\frac{2(\nu_l(Q)-\lambda^*)}{\lambda^*-\nu_t(Q)}\big\|(\mathcal{S}(\tilde H_1))_{a_la_t}\big\|^2 + \frac{2(\nu_l(Q)-\lambda^*)}{\nu_t(Q)+\lambda^*}\big\|(\mathcal{T}(\tilde H_1))_{a_la_t}\big\|^2\bigg)\\ &= \sum_{1\le l\le r_1}\sum_{\tilde r+1\le t\le r+1}\frac{2\sigma_l(W)}{\lambda^*-\nu_t(P)}\big\|(\mathcal{S}(\tilde H_1))_{a_la_t}\big\|^2 + \sum_{1\le l\le r_1}\frac{\sigma_l(W)}{\lambda^*}\big\|\tilde H_{a_lc}\big\|^2 + \sum_{1\le l\le r_1}\sum_{1\le t\le r+1}\frac{2\sigma_l(W)}{\nu_t(P)+\lambda^*}\big\|(\mathcal{T}(\tilde H_1))_{a_la_t}\big\|^2, \end{aligned} \]

so we have

\[ \langle H,D\rangle = \langle\tilde H,\tilde D\rangle = -\psi^*_{(P,H)}(W). \]

This completes the proof.□

Proposition 4

Let $W\in\partial\theta(P)$ and let $Q=P+W$ admit the SVD as in (3.1). Let the index sets $\alpha$, $\beta$, $\alpha_1$, $\alpha_2$, $\alpha_3$, and $c$ be defined as in (3.2), (3.4), and (3.6). Then, for any $D\in\mathbb{R}^{m\times n}$,

\[ \mathrm{Prox}_\theta'(Q;D) = 0 \iff D\in\big(\mathcal{C}^0_\theta(P,W)\big)^\circ, \]

where $\mathcal{C}^0_\theta(P,W)$ represents the critical cone $\mathcal{C}_\theta(P,W)$ in the case of $\tau=0$, i.e.,

\[ \mathcal{C}^0_\theta(P,W) = \bigg\{H\in\mathbb{R}^{m\times n}\ \bigg|\ \mathcal{S}(\tilde H_{\alpha\alpha}) = \begin{bmatrix} 0_{|\alpha_1|+|\alpha_2|} & 0\\ 0 & \mathcal{S}(\tilde H_{\alpha_3\alpha_3}) \end{bmatrix},\ \mathcal{S}(\tilde H_{\alpha_3\alpha_3})\in\mathbb{S}_-^{|\alpha_3|}\bigg\}. \]

Proof

It is easy to verify that either of $\mathrm{Prox}_\theta'(Q;D)=0$ and $D\in(\mathcal{C}^0_\theta(P,W))^\circ$ is equivalent to

\[ \tilde D = \begin{bmatrix} \tilde D_{\alpha\alpha} & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}, \qquad \tilde D_{\alpha\alpha}\in\mathbb{S}^{|\alpha|}, \qquad \tilde D_{\alpha_3\alpha_3}\in\mathbb{S}_+^{|\alpha_3|}. \]

4 The robust ICKKTM

In this section, owing to the special linear structure of problem (1.1), we first show that the primal (dual) SRCQ is equivalent to the dual (primal) SOSC. In fact, when the spectral norm in problem (1.1) is replaced by the nuclear norm, the same results were already given in [16]. Therefore, we only state the results briefly here; the main tools we use are Propositions 1 and 2. The Lagrangian dual of problem (1.1) is as follows:

\[(4.1)\qquad \max_{(y,w,S)}\ -\langle b,y\rangle - \delta_{P^\circ}(y) - h^*(w) \qquad \text{s.t.}\qquad \mathcal{B}^*y + Q^*w + S + C = 0,\quad \|S\|_*\le 1, \]

where $(y,w,S)\in\mathbb{R}^l\times\mathbb{R}^d\times\mathbb{R}^{m\times n}$, and $\mathcal{B}^*$ and $Q^*$ represent the adjoints of $\mathcal{B}$ and $Q$, respectively. The first-order optimality conditions, i.e., the KKT conditions, for the primal problem (1.1) and the dual problem (4.1) are given by

\[(4.2)\qquad Q^*\nabla h(QX) + C + S + \mathcal{B}^*y = 0, \qquad P^\circ\ni y \perp \mathcal{B}X - b\in P, \qquad S\in\partial\theta(X), \qquad (X,y,S)\in\mathbb{R}^{m\times n}\times\mathbb{R}^l\times\mathbb{R}^{m\times n} \]

and

\[(4.3)\qquad \mathcal{B}^*y + Q^*w + S + C = 0, \qquad P^\circ\ni y \perp \mathcal{B}X - b\in P, \qquad QX\in\partial h^*(w), \qquad X\in\partial\theta^*(S), \qquad (y,w,S,X)\in\mathbb{R}^l\times\mathbb{R}^d\times\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}, \]

respectively. We denote by $\mathcal{M}_P(X)$ the set of Lagrangian multipliers with respect to $X$ for problem (1.1), namely, $\mathcal{M}_P(X)\triangleq\{(y,S)\in\mathbb{R}^l\times\mathbb{R}^{m\times n}\mid (X,y,S)\ \text{satisfies (4.2)}\}$. We write $\mathcal{M}_D(y,w,S)$ for the set of Lagrangian multipliers with respect to $(y,w,S)$ for problem (4.1), i.e., $\mathcal{M}_D(y,w,S)\triangleq\{X\in\mathbb{R}^{m\times n}\mid (y,w,S,X)\ \text{satisfies (4.3)}\}$.

Let $(y,z)\in\mathbb{R}^l\times\mathbb{R}^l$ satisfy $P^\circ\ni y\perp z\in P$. Define the critical cone of $P$ at $z$ for $y$ as $\mathcal{C}_P(z,y)\triangleq\mathcal{T}_P(z)\cap y^\perp$ and the critical cone of $P^\circ$ at $y$ for $z$ as $\mathcal{C}_{P^\circ}(y,z)\triangleq\mathcal{T}_{P^\circ}(y)\cap z^\perp$, respectively. Obviously,

\[(4.4)\qquad \big(\mathcal{C}_P(z,y)\big)^\circ = \mathcal{C}_{P^\circ}(y,z). \]

Now we are ready to demonstrate the crucial result of this article, namely, the relationship between the primal SRCQ and the dual SOSC. Actually, this result was given in [16, Theorem 5.2], but the proof details were omitted there. For the sake of completeness, we supplement the proof below.

Theorem 1

Let $(\bar y,\bar w,\bar S)\in\mathbb{R}^l\times\mathbb{R}^d\times\mathbb{R}^{m\times n}$ be an optimal solution of problem (4.1) with $\mathcal{M}_D(\bar y,\bar w,\bar S)\neq\emptyset$, and let $\bar X\in\mathcal{M}_D(\bar y,\bar w,\bar S)$. Then the following results are equivalent:

  1. The dual SOSC holds at $(\bar y,\bar w,\bar S)$ with respect to $\bar X$, i.e.,

    \[(4.5)\qquad \langle H_w, (\nabla h^*)'(\bar w; H_w)\rangle - \varphi^*_{(\bar S,H_S)}(\bar X) > 0, \qquad \forall\,(H_y,H_w,H_S)\in\mathcal{C}(\bar y,\bar w,\bar S)\setminus\{0\}, \]

    where $\mathcal{C}(\bar y,\bar w,\bar S)$ is the critical cone defined as follows:

    \[ \mathcal{C}(\bar y,\bar w,\bar S) \triangleq \Big\{(D_y,D_w,D_S)\in\mathbb{R}^l\times\mathbb{R}^d\times\mathbb{R}^{m\times n}\ \Big|\ \mathcal{B}^*D_y + Q^*D_w + D_S = 0,\ D_y\in\mathcal{C}_{P^\circ}(\bar y,\mathcal{B}\bar X-b),\ D_S\in\mathcal{C}_{\theta^*}(\bar S,\bar X)\Big\}. \]

  2. The primal SRCQ holds at $\bar X$ with respect to $(\bar y,\bar S)$, i.e.,

    \[(4.6)\qquad \begin{pmatrix}\mathcal{B}\\ \mathcal{I}\end{pmatrix}\mathbb{R}^{m\times n} + \begin{pmatrix}\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y)\\ \mathcal{C}_\theta(\bar X,\bar S)\end{pmatrix} = \begin{pmatrix}\mathbb{R}^l\\ \mathbb{R}^{m\times n}\end{pmatrix}, \]

    where $\mathcal{I}$ denotes the identity mapping on $\mathbb{R}^{m\times n}$.

Proof

“(i) $\Rightarrow$ (ii)”. Denote

\[ \Psi \triangleq \begin{pmatrix}\mathcal{B}\\ \mathcal{I}\end{pmatrix}\mathbb{R}^{m\times n} + \begin{pmatrix}\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y)\\ \mathcal{C}_\theta(\bar X,\bar S)\end{pmatrix}. \]

Suppose on the contrary that the SRCQ (4.6) does not hold at $\bar X$ for $(\bar y,\bar S)$, i.e., $\Psi\neq\mathbb{R}^l\times\mathbb{R}^{m\times n}$. Then $\mathrm{cl}\,\Psi\neq\mathbb{R}^l\times\mathbb{R}^{m\times n}$. Therefore, there exists $\bar H\in\mathbb{R}^l\times\mathbb{R}^{m\times n}$ s.t. $\bar H\notin\mathrm{cl}\,\Psi$. Since $\mathrm{cl}\,\Psi$ is a closed convex cone, $\bar H-\Pi_{\mathrm{cl}\,\Psi}(\bar H) = \Pi_{(\mathrm{cl}\,\Psi)^\circ}(\bar H)\neq 0$. Denote $H\triangleq\Pi_{(\mathrm{cl}\,\Psi)^\circ}(\bar H) = (H_1,H_2)\in\mathbb{R}^l\times\mathbb{R}^{m\times n}$. Obviously, $\langle H,D\rangle\le 0$ for any $D\in\mathrm{cl}\,\Psi$, which implies that

\[ \mathcal{B}^*H_1 + H_2 = 0, \qquad H_1\in\mathcal{C}_{P^\circ}(\bar y,\mathcal{B}\bar X-b), \qquad H_2\in\big(\mathcal{C}_\theta(\bar X,\bar S)\big)^\circ. \]

Since $(\mathcal{C}_\theta(\bar X,\bar S))^\circ\subseteq\mathcal{C}_{\theta^*}(\bar S,\bar X)$ and, by the characterization in the proof of Proposition 2, $\varphi^*_{(\bar S,H_2)}(\bar X)=0$ for every $H_2\in(\mathcal{C}_\theta(\bar X,\bar S))^\circ$, we have that $(H_1,0,H_2)\in\mathcal{C}(\bar y,\bar w,\bar S)\setminus\{0\}$ and $\langle 0,(\nabla h^*)'(\bar w;0)\rangle - \varphi^*_{(\bar S,H_2)}(\bar X) = 0$, which contradicts the SOSC (4.5) at $(\bar y,\bar w,\bar S)$ with respect to $\bar X$.

“(ii) $\Rightarrow$ (i)”. Suppose on the contrary that the SOSC (4.5) does not hold at $(\bar y,\bar w,\bar S)$ for $\bar X$. Then there exists $\bar H = (\bar H_y,\bar H_w,\bar H_S)\in\mathcal{C}(\bar y,\bar w,\bar S)\setminus\{0\}$ such that

\[ \langle\bar H_w, (\nabla h^*)'(\bar w;\bar H_w)\rangle - \varphi^*_{(\bar S,\bar H_S)}(\bar X) = 0. \]

From [25, Proposition 16], we know that $\varphi^*_{(\bar S,H)}(\bar X)\le 0$ for any $H\in\mathbb{R}^{m\times n}$. Moreover, because $h$ is essentially strictly convex, by [20, Theorem 26.3], $h^*$ is essentially smooth. Then $\nabla h^*$ is locally Lipschitz continuous and directionally differentiable on $\mathrm{int}(\mathrm{dom}\,h^*)$. Since $(\bar y,\bar w,\bar S)$ is the optimal solution, it is not hard to verify by the mean value theorem that $\langle H_w,(\nabla h^*)'(\bar w;H_w)\rangle > 0$ for any $H_w\neq 0$ such that $(H_y,H_w,H_S)\in\mathcal{C}(\bar y,\bar w,\bar S)\setminus\{0\}$. Therefore,

\[ \bar H_w = 0, \qquad \varphi^*_{(\bar S,\bar H_S)}(\bar X) = 0. \]

From $\bar H\in\mathcal{C}(\bar y,\bar w,\bar S)\setminus\{0\}$, we know that

\[ \mathcal{B}^*\bar H_y + \bar H_S = 0, \qquad \bar H_y\in\mathcal{C}_{P^\circ}(\bar y,\mathcal{B}\bar X-b), \qquad \bar H_S\in\mathcal{C}_{\theta^*}(\bar S,\bar X). \]

Together with (4.4) and Proposition 2, we have

\[ \bar H_y\in\big(\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y)\big)^\circ, \qquad \bar H_S\in\big(\mathcal{C}_\theta(\bar X,\bar S)\big)^\circ. \]

Because the SRCQ (4.6) holds at $\bar X$ for $(\bar y,\bar S)$, there exist $\tilde W\in\mathbb{R}^{m\times n}$ and $\tilde H=(\tilde H_1,\tilde H_2)\in\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y)\times\mathcal{C}_\theta(\bar X,\bar S)$ such that

\[ \begin{pmatrix}\mathcal{B}\tilde W\\ \tilde W\end{pmatrix} + \begin{pmatrix}\tilde H_1\\ \tilde H_2\end{pmatrix} = \begin{pmatrix}\bar H_y\\ \bar H_S\end{pmatrix}, \]

i.e.,

\[ \mathcal{B}\tilde W + \tilde H_1 = \bar H_y, \qquad \tilde W + \tilde H_2 = \bar H_S. \]

Thus,

\[ \langle\bar H_y,\bar H_y\rangle + \langle\bar H_S,\bar H_S\rangle = \langle\bar H_y,\mathcal{B}\tilde W+\tilde H_1\rangle + \langle\bar H_S,\tilde W+\tilde H_2\rangle = \langle\mathcal{B}^*\bar H_y+\bar H_S,\tilde W\rangle + \langle\bar H_y,\tilde H_1\rangle + \langle\bar H_S,\tilde H_2\rangle = 0 + \langle\bar H_y,\tilde H_1\rangle + \langle\bar H_S,\tilde H_2\rangle \le 0, \]

which means that $\bar H=0$. This contradicts the hypothesis that $\bar H\neq 0$.□

Switching the “primal” and “dual” in the aforementioned theorem, we can also obtain that the primal SOSC is equivalent to the dual SRCQ from Proposition 2. One may refer to [16, Theorem 5.1] for more details.

Theorem 2

Let $\bar X\in\mathbb{R}^{m\times n}$ be an optimal solution of problem (1.1) with $\mathcal{M}_P(\bar X)\neq\emptyset$, and let $(\bar y,\bar S)\in\mathcal{M}_P(\bar X)$. Then the following results are equivalent:

  1. The dual SRCQ holds at $\bar y$ for $\bar X$, i.e.,

    \[(4.7)\qquad Q^*\mathbb{R}^d + \mathcal{B}^*\mathcal{C}_{P^\circ}(\bar y,\mathcal{B}\bar X-b) + \mathcal{C}_{\theta^*}(\bar S,\bar X) = \mathbb{R}^{m\times n}. \]

  2. The primal SOSC holds at $\bar X$ for $(\bar y,\bar S)$, i.e.,

    \[(4.8)\qquad \langle QD, \nabla^2h(Q\bar X)QD\rangle - \psi^*_{(\bar X,D)}(\bar S) > 0, \qquad \forall\,D\in\mathcal{C}(\bar X)\setminus\{0\}, \]

    where $\mathcal{C}(\bar X)\triangleq\{D\in\mathbb{R}^{m\times n}\mid \mathcal{B}D\in\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y),\ D\in\mathcal{C}_\theta(\bar X,\bar S)\}$.

In the following, we first give the definition of the KKT mapping for problem (1.1). Then, we demonstrate that both the SOSC and the SRCQ are sufficient and necessary for the ICKKTM. Finally, based on the aforementioned result, we give a series of characterizations of the robust ICKKTM for problem (1.1).

The KKT conditions (4.2) for problem (1.1) can be equivalently expressed as the nonsmooth equation

\[ F(X,y,S) = 0, \]

where $F:\mathbb{R}^{m\times n}\times\mathbb{R}^l\times\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}\times\mathbb{R}^l\times\mathbb{R}^{m\times n}$ is defined by

\[ F(X,y,S) \triangleq \begin{pmatrix} C + \mathcal{B}^*y + Q^*\nabla h(QX) + S\\ \mathcal{B}X - b - \Pi_P(\mathcal{B}X - b + y)\\ X - \mathrm{Prox}_\theta(X+S) \end{pmatrix}, \qquad (X,y,S)\in\mathbb{R}^{m\times n}\times\mathbb{R}^l\times\mathbb{R}^{m\times n}. \]

Denote the KKT mapping for problem (1.1) by

\[ S_{\mathrm{KKT}}(\delta) \triangleq \big\{(X,y,S)\in\mathbb{R}^{m\times n}\times\mathbb{R}^l\times\mathbb{R}^{m\times n}\mid F(X,y,S) = \delta\big\}. \]

Thus, we know from [13] that the isolated calmness of $S_{\mathrm{KKT}}$ at the origin for $(\bar X,\bar y,\bar S)$ is equivalent to the implication $F'((\bar X,\bar y,\bar S);(\Delta Z,\Delta y,\Delta S)) = 0 \Rightarrow (\Delta Z,\Delta y,\Delta S) = 0$.

Define the strong SRCQ (SSRCQ) at $\bar X$ for $(\bar y,\bar S)$ as follows:

\[(4.9)\qquad \begin{pmatrix}\mathcal{B}\\ \mathcal{I}\end{pmatrix}\mathbb{R}^{m\times n} + \begin{pmatrix}\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y)\\ \mathcal{C}^0_\theta(\bar X,\bar S)\end{pmatrix} = \begin{pmatrix}\mathbb{R}^l\\ \mathbb{R}^{m\times n}\end{pmatrix}. \]

Similarly, we say that the weak SOSC (WSOSC) holds at $\bar X$ for $(\bar y,\bar S)$ if

\[(4.10)\qquad \langle QD,\nabla^2h(Q\bar X)QD\rangle - \psi^*_{(\bar X,D)}(\bar S) > 0, \qquad \forall\,D\in\mathcal{C}^0(\bar X)\setminus\{0\}, \]

where $\mathcal{C}^0(\bar X)\triangleq\{D\in\mathbb{R}^{m\times n}\mid \mathcal{B}D\in\mathcal{C}_P(\mathcal{B}\bar X-b,\bar y),\ D\in\mathcal{C}^0_\theta(\bar X,\bar S)\}$.

Theorem 3

Let $\bar X\in\mathbb{R}^{m\times n}$ be an optimal solution of problem (1.1) with $(\bar y,\bar S)\in\mathcal{M}_P(\bar X)$. Then the following results are equivalent:

  1. The SSRCQ (4.9) holds at X ¯ for ( y ¯ , S ¯ ) and the WSOSC (4.10) holds at X ¯ for ( y ¯ , S ¯ ) .

  2. The KKT mapping S KKT is isolated calm at the origin for ( X ¯ , y ¯ , S ¯ ) .

Proof

“(i) $\Rightarrow$ (ii)”. Let $(\Delta Z,\Delta y,\Delta S)\in\mathbb{R}^{m\times n}\times\mathbb{R}^l\times\mathbb{R}^{m\times n}$ satisfy

\[(4.11)\qquad F'((\bar X,\bar y,\bar S);(\Delta Z,\Delta y,\Delta S)) = \begin{pmatrix} Q^*\nabla^2h(Q\bar X)Q\Delta Z + \mathcal{B}^*\Delta y + \Delta S\\ \mathcal{B}\Delta Z - \Pi_P'(\mathcal{B}\bar X - b + \bar y;\, \mathcal{B}\Delta Z + \Delta y)\\ \Delta Z - \mathrm{Prox}_\theta'(\bar X + \bar S;\, \Delta Z + \Delta S) \end{pmatrix} = 0. \]

From the second equation of (4.11) and [9, Lemma 4.2], we know that

(4.12) $\mathcal{B}\Delta Z \in \mathcal{C}_{P}(\mathcal{B}\bar{X} - b, \bar{y}), \quad \langle \Delta y, \mathcal{B}\Delta Z \rangle = 0.$

Proposition 3 and the third equation of (4.11) imply that

(4.13) $\Delta Z \in \mathcal{C}^{0}_{\theta}(\bar{X}, \bar{S}), \quad \langle \Delta Z, \Delta S \rangle = -\psi_{(\bar{X}, \Delta Z)}^{*}(\bar{S}).$

Thus, we can conclude from (4.12) and (4.13) that

$\Delta Z \in \mathcal{C}^{0}(\bar{X}).$

Taking the inner product of $\Delta Z$ with the first equation of (4.11), together with (4.12) and (4.13), we obtain

$\begin{aligned} 0 &= \langle \Delta Z, \mathcal{Q}^{*}\nabla^{2}h(\mathcal{Q}\bar{X})\mathcal{Q}\Delta Z + \mathcal{B}^{*}\Delta y + \Delta S \rangle \\ &= \langle \mathcal{Q}\Delta Z, \nabla^{2}h(\mathcal{Q}\bar{X})\mathcal{Q}\Delta Z \rangle + \langle \mathcal{B}\Delta Z, \Delta y \rangle + \langle \Delta Z, \Delta S \rangle \\ &= \langle \mathcal{Q}\Delta Z, \nabla^{2}h(\mathcal{Q}\bar{X})\mathcal{Q}\Delta Z \rangle - \psi_{(\bar{X}, \Delta Z)}^{*}(\bar{S}). \end{aligned}$

Then, from the WSOSC (4.10), we know that

$\Delta Z = 0.$

Therefore, (4.11) reduces to

(4.14) $\begin{pmatrix} \mathcal{B}^{*}\Delta y + \Delta S \\ \Pi'_{P}(\mathcal{B}\bar{X} - b + \bar{y}; \Delta y) \\ \mathrm{Prox}'_{\theta}(\bar{X} + \bar{S}; \Delta S) \end{pmatrix} = 0.$

From Lemma 4.3 in [9] and Proposition 4, we obtain that (4.14) is equivalent to the following formula:

(4.15) $\begin{pmatrix} \Delta y \\ \Delta S \end{pmatrix} \in \left[ \begin{pmatrix} \mathcal{B} \\ \mathcal{I} \end{pmatrix} \mathbb{R}^{m\times n} + \begin{pmatrix} \mathcal{C}_{P}(\mathcal{B}\bar{X} - b, \bar{y}) \\ \mathcal{C}^{0}_{\theta}(\bar{X}, \bar{S}) \end{pmatrix} \right]^{\circ}.$

According to the SSRCQ (4.9),

$(\Delta y, \Delta S) = 0.$

This, together with $\Delta Z = 0$, implies that the KKT mapping is isolated calm at the origin for $(\bar{X}, \bar{y}, \bar{S})$.

We prove "(ii) $\Rightarrow$ (i)" by contradiction. Assume that the SSRCQ (4.9) does not hold at $\bar{X}$ for $(\bar{y}, \bar{S})$. Then there exists $(\Delta y, \Delta S) \neq 0$ such that (4.15) holds, which also means that (4.14) holds. Hence, $F'((\bar{X}, \bar{y}, \bar{S}); (0, \Delta y, \Delta S)) = 0$. This contradicts the ICKKTM at the origin for $(\bar{X}, \bar{y}, \bar{S})$.

Because $\bar{X}$ is an optimal solution of problem (1.1) and the SSRCQ (4.9) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$, the Lagrange multiplier set $\mathcal{M}_{P}(\bar{X})$ is a singleton and the second-order necessary condition holds at $\bar{X}$, i.e.,

$\langle \mathcal{Q}D, \nabla^{2}h(\mathcal{Q}\bar{X})\mathcal{Q}D \rangle - \psi_{(\bar{X}, D)}^{*}(\bar{S}) \geq 0, \quad \forall\, D \in \mathcal{C}^{0}(\bar{X}).$

Suppose that the WSOSC (4.10) does not hold at $\bar{X}$ for $(\bar{y}, \bar{S})$. Then there exists $\Delta Z \in \mathcal{C}^{0}(\bar{X}) \setminus \{0\}$ such that

$\langle \mathcal{Q}\Delta Z, \nabla^{2}h(\mathcal{Q}\bar{X})\mathcal{Q}\Delta Z \rangle - \psi_{(\bar{X}, \Delta Z)}^{*}(\bar{S}) = 0.$

By [25, Proposition 16], $\psi_{(\bar{X}, H)}^{*}(\bar{S}) \leq 0$ for all $H \in \mathbb{R}^{m\times n}$. Besides, since $h$ is essentially strictly convex, $\langle \mathcal{Q}H, \nabla^{2}h(\mathcal{Q}\bar{X})\mathcal{Q}H \rangle > 0$ for any $H \in \mathbb{R}^{m\times n}$ satisfying $\mathcal{Q}H \neq 0$. Consequently, both terms above must vanish:

$\mathcal{Q}\Delta Z = 0, \quad \psi_{(\bar{X}, \Delta Z)}^{*}(\bar{S}) = 0.$

From $\Delta Z \in \mathcal{C}^{0}(\bar{X}) \setminus \{0\}$, we know that

(4.16) $\mathcal{B}\Delta Z \in \mathcal{C}_{P}(\mathcal{B}\bar{X} - b, \bar{y}), \quad \Delta Z \in \mathcal{C}^{0}_{\theta}(\bar{X}, \bar{S}).$

Thus, from [9, Lemma 4.2], the first formula in (4.16) is equivalent to

(4.17) $\mathcal{B}\Delta Z = \Pi'_{P}(\mathcal{B}\bar{X} - b + \bar{y}; \mathcal{B}\Delta Z).$

We can conclude from the second formula in (4.16) and (3.12) that

(4.18) $\Delta Z = \mathrm{Prox}'_{\theta}(\bar{X} + \bar{S}; \Delta Z).$

Therefore, formulas (4.17), (4.18), and $\mathcal{Q}\Delta Z = 0$ imply that

$F'((\bar{X}, \bar{y}, \bar{S}); (\Delta Z, 0, 0)) = 0.$

Notice that $\Delta Z \neq 0$, which contradicts the isolated calmness of $S_{\mathrm{KKT}}$ at the origin for $(\bar{X}, \bar{y}, \bar{S})$. That is, the WSOSC (4.10) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$. This completes the proof.□

It is said that the Robinson constraint qualification (RCQ) holds at $\bar{X} \in \mathbb{R}^{m\times n}$ for problem (1.1) if

(4.19) $\mathcal{B}\,\mathbb{R}^{m\times n} + \mathcal{T}_{P}(\mathcal{B}\bar{X} - b) = \mathbb{R}^{l}.$
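For small instances, the RCQ (4.19) can be verified numerically. The sketch below is illustrative, not from the paper: it assumes $P = \mathbb{R}^{l}_{+}$, for which $\mathcal{T}_{P}(z) = \{ d \mid d_{i} \geq 0 \text{ whenever } z_{i} = 0 \}$, and represents $\mathcal{B}$ as a matrix acting on $\mathrm{vec}(X)$. It uses the fact that a convex cone equals $\mathbb{R}^{l}$ if and only if it contains $\pm e_{i}$ for every $i$, checking each membership by an LP feasibility problem (the function name `rcq_holds` is ours).

```python
import numpy as np
from scipy.optimize import linprog

def rcq_holds(B, z, tol=1e-9):
    """Check B R^{mn} + T_P(z) = R^l for P = R^l_+ at a point z >= 0.
    The left-hand side is a convex cone, so it is all of R^l iff it
    contains +e_i and -e_i for every coordinate direction e_i."""
    l, mn = B.shape
    active = set(np.where(np.abs(z) <= tol)[0])   # indices with z_i = 0
    for i in range(l):
        for sgn in (+1.0, -1.0):
            target = sgn * np.eye(l)[i]
            # variables: u in R^mn (free) and d in R^l, with d_i >= 0
            # on the active set; feasibility of B u + d = target
            A_eq = np.hstack([B, np.eye(l)])
            bounds = [(None, None)] * mn + [
                (0.0, None) if j in active else (None, None) for j in range(l)
            ]
            res = linprog(np.zeros(mn + l), A_eq=A_eq, b_eq=target,
                          bounds=bounds)
            if not res.success:
                return False
    return True
```

When $\mathcal{B}$ is surjective the RCQ holds trivially; when $\mathcal{B} = 0$ and all constraints are active, the left-hand side reduces to $\mathbb{R}^{l}_{+} \neq \mathbb{R}^{l}$ and the check fails, as expected.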

Finally, we can establish a series of equivalent conditions of the robust ICKKTM for problem (1.1).

Theorem 4

Let $\bar{X} \in \mathbb{R}^{m\times n}$ and $(\bar{y}, \bar{w}, \bar{S}) \in \mathbb{R}^{l} \times \mathbb{R}^{d} \times \mathbb{R}^{m\times n}$ be optimal solutions of problems (1.1) and (4.1), respectively. Assume that $(\bar{y}, \bar{S}) \in \mathcal{M}_{P}(\bar{X})$ and $\bar{X} \in \mathcal{M}_{D}(\bar{y}, \bar{w}, \bar{S})$, and that the RCQ (4.19) holds at $\bar{X}$. Then the following statements are equivalent:

  (a) The KKT mapping $S_{\mathrm{KKT}}$ is (robustly) isolated calm at the origin for $(\bar{X}, \bar{y}, \bar{S})$.

  (b) The primal WSOSC (4.10) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$ and the primal SSRCQ (4.9) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$.

  (c) The primal SOSC (4.8) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$ and the primal SRCQ (4.6) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$.

  (d) The primal SOSC (4.8) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$ and the dual SOSC (4.5) holds at $(\bar{y}, \bar{w}, \bar{S})$ for $\bar{X}$.

  (e) The dual SRCQ (4.7) holds at $(\bar{y}, \bar{w}, \bar{S})$ for $\bar{X}$ and the dual SOSC (4.5) holds at $(\bar{y}, \bar{w}, \bar{S})$ for $\bar{X}$.

  (f) The dual SRCQ (4.7) holds at $(\bar{y}, \bar{w}, \bar{S})$ for $\bar{X}$ and the primal SRCQ (4.6) holds at $\bar{X}$ for $(\bar{y}, \bar{S})$.

Proof

Since the spectral norm function $\|\cdot\|_{2}$ is $C^{2}$-cone reducible, the "robustly" in (a) of Theorem 4, which means that $S_{\mathrm{KKT}}$ is also lower semicontinuous at the origin for $(\bar{X}, \bar{y}, \bar{S})$, can be obtained automatically from [13, Theorem 17]. Besides, since the spectral norm function is Lipschitz continuous at $\bar{X}$, by combining Theorems 1, 2, 3, [13, Theorem 17], and [16, Proposition 3.3], we obtain that (a) $\Leftrightarrow$ (b) $\Leftrightarrow$ (c) $\Leftrightarrow$ (d) $\Leftrightarrow$ (e) $\Leftrightarrow$ (f).□

Remark 2

Proposition 3.3 in [16] is obtained indirectly, by transforming problem (1.1) into a matrix conic optimization problem induced by the nuclear norm and then applying [13, Theorem 24]. In the aforementioned theorem, we give a direct proof. The difference is that we need to use the variational properties of the directional derivative of the proximal mapping of the spectral norm, whereas [13] uses those of the projection mapping.

5 Conclusion

This article is devoted to establishing a series of equivalent conditions for the robust ICKKTM of spectral norm regularized convex matrix optimization problems. We extend the results for nuclear norm regularized convex optimization problems in [16] to the spectral norm setting. Much work remains to be done. For instance: how to apply the variational properties of the proximal mapping of the spectral norm to directly prove the equivalence between (a) and (c) in Theorem 4; whether the conclusions of this article or of [16] can be extended to Ky Fan $k$-norm regularized problems; and whether the variational properties of the spectral norm obtained here can be applied to derive stability results for generalized variational inequality problems involving the spectral norm. These are topics for future study.

Acknowledgments

The authors are very thankful to the referees for their valuable comments, which have improved the presentation of this manuscript.

  1. Funding information: This work was supported by the National Natural Science Foundation of China (Grant numbers 12101101); the China Postdoctoral Science Foundation (Grant number 2022M720631); and the Fundamental Research Funds for the Central Universities (Grant number 3132025206).

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and consented to its submission to the journal, reviewed all the results, and approved the final version of the manuscript. ZY prepared the manuscript with contributions from all co-authors. All authors contributed to the review and editing of this article.

  3. Conflict of interest: The authors state no conflict of interests.

References

[1] G. A. Watson, On matrix approximation problems with Ky Fan k norms, Numer. Algorithms 5 (1993), 263–272, DOI: https://doi.org/10.1007/BF02210386.

[2] K. C. Toh and L. N. Trefethen, The Chebyshev polynomials of a matrix, SIAM J. Matrix Anal. Appl. 20 (1998), 400–419, DOI: https://doi.org/10.1137/S0895479896303739.

[3] P. Apkarian and D. Noll, Nonsmooth H∞ synthesis, IEEE Trans. Autom. Control 51 (2006), 71–86, DOI: https://doi.org/10.1109/TAC.2005.860290.

[4] V. Bompart, D. Noll, and P. Apkarian, Second-order nonsmooth optimization for H∞ synthesis, Numer. Math. 107 (2007), 433–454, DOI: https://doi.org/10.1007/s00211-007-0095-9.

[5] J. Yuan, D. Yang, D. Xun, K. Teo, C. Wu, A. Li, et al., Sparse optimal control of cyber-physical systems via PQA approach, Pac. J. Optim. 21 (2025), no. 3, 553–569, DOI: https://doi.org/10.61208/pjo-2025-027.

[6] A. Johansson, N. Engsner, C. Strannegård, and P. Mostad, Improved spectral norm regularization for neural networks, in: Modeling Decisions for Artificial Intelligence: 20th International Conference, MDAI 2023, Umeå, Sweden, June 19–22, 2023, Proceedings, Springer-Verlag, Berlin, Heidelberg, pp. 181–201, DOI: https://doi.org/10.1007/978-3-031-33498-6_13.

[7] Y. Yoshida and T. Miyato, Spectral norm regularization for improving the generalizability of deep learning, arXiv:1705.10941 (2017).

[8] R. Gao, W. Yang, and X. Sun, Defying forgetting in continual relation extraction via batch spectral norm regularization, 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8, DOI: https://doi.org/10.1109/IJCNN60899.2024.10651110.

[9] D. R. Han, D. F. Sun, and L. W. Zhang, Linear rate convergence of the alternating direction method of multipliers for convex composite programming, Math. Oper. Res. 43 (2018), 622–637, DOI: https://doi.org/10.1287/moor.2017.0875.

[10] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Math. Oper. Res. 1 (1976), 97–116, DOI: https://doi.org/10.1287/moor.1.2.97.

[11] Y. L. Zhang and L. W. Zhang, On the upper Lipschitz property of the KKT mapping for nonlinear semidefinite optimization, Oper. Res. Lett. 44 (2016), 474–478, DOI: https://doi.org/10.1016/j.orl.2016.04.012.

[12] Y. Zhang, L. W. Zhang, J. Wu, and K. Wang, Characterizations of local upper Lipschitz property of perturbed solutions to nonlinear second-order cone programs, Optimization 66 (2017), 1079–1103, DOI: https://doi.org/10.1080/02331934.2017.1325887.

[13] C. Ding, D. F. Sun, and L. W. Zhang, Characterization of the robust isolated calmness for a class of conic programming problems, SIAM J. Optim. 27 (2017), 67–90, DOI: https://doi.org/10.1137/16M1058753.

[14] Z. X. Chan and D. F. Sun, Constraint nondegeneracy, strong regularity, and nonsingularity in semidefinite programming, SIAM J. Optim. 19 (2008), 370–396, DOI: https://doi.org/10.1137/070681235.

[15] D. R. Han, D. F. Sun, and L. W. Zhang, Linear rate convergence of the alternating direction method of multipliers for convex composite quadratic and semi-definite programming, arXiv:1508.02134 (2015).

[16] Y. Cui and D. F. Sun, A complete characterization of the robust isolated calmness of nuclear norm regularized convex optimization problems, J. Comput. Math. 36 (2018), 441–458, DOI: https://doi.org/10.4208/jcm.1709-m2017-0034.

[17] R. T. Rockafellar and R. J. B. Wets, Variational Analysis, Springer, Berlin, Heidelberg, 1998, DOI: https://doi.org/10.1007/978-3-642-02431-3.

[18] J. F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems, Springer, New York, 2000.

[19] K. Fan, On a theorem of Weyl concerning eigenvalues of linear transformations, Proc. Natl. Acad. Sci. USA 35 (1949), 652–655, DOI: https://doi.org/10.1073/pnas.35.11.652.

[20] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.

[21] A. Beck, First-Order Methods in Optimization, Society for Industrial and Applied Mathematics, Philadelphia, 2017, DOI: https://doi.org/10.1137/1.9781611974997.

[22] G. A. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra Appl. 170 (1992), 33–45, DOI: https://doi.org/10.1016/0024-3795(92)90407-2.

[23] C. Ding, D. F. Sun, J. Sun, and K.-C. Toh, Spectral operators of matrices, Math. Program. 168 (2018), 509–531, DOI: https://doi.org/10.1007/s10107-017-1162-3.

[24] L. W. Zhang, N. Zhang, and X. T. Xiao, On the second-order directional derivatives of singular values of matrices and symmetric matrix-valued functions, Set-Valued Var. Anal. 21 (2013), 557–586, DOI: https://doi.org/10.1007/s11228-013-0237-4.

[25] C. Ding, Variational analysis of the Ky Fan k-norm, Set-Valued Var. Anal. 25 (2017), 265–296, DOI: https://doi.org/10.1007/s11228-016-0378-3.

Received: 2024-11-15
Revised: 2025-07-02
Accepted: 2025-07-06
Published Online: 2025-08-11

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
