
An ADMM-based heuristic algorithm for optimization problems over nonconvex second-order cone

  • Baha Alzalg and Lilia Benakkouche
Published/Copyright: September 24, 2025

Abstract

The nonconvex second-order cone (nonconvex SOC) is a nonconvex extension of the convex second-order cone: it consists of all vectors that can be partitioned into two sub-vectors such that the Euclidean norm of the first sub-vector is at least as large as the Euclidean norm of the second sub-vector. This cone can be used to reformulate nonconvex quadratic programs in conic format and arises in real-world applications. To obtain approximate solutions of optimization problems over the nonconvex SOC, in this article we apply a heuristic algorithm based on the alternating direction method of multipliers, which is the core result of our study. More specifically, the approach is built in two steps: a convex optimization problem comes first, followed by a nonconvex conic optimization problem. The problem in the second phase can lead to an inexact solution. Our strategy makes use of an approximate projection onto the nonconvex cone. The question of convergence remains open.

MSC 2010: 90C26; 30F45; 52A27; 41A50; 90C59

1 Introduction

Nonconvex optimization [1–5] is the study of optimization problems in which convexity fails in at least one of the constraints or in the objective function. Branching, which divides the feasible region into smaller parts and solves subproblems over these parts, and convex relaxations are common solution techniques employed to handle nonconvex programs. Heuristics [5], such as randomized techniques, can be utilized to find good, workable solutions, but they do not offer lower bounds on the objective value and therefore cannot certify optimality. By heuristic, we mean that the method is not guaranteed to find an optimal solution, or, in fact, even a feasible solution when one exists. A heuristic is significantly quicker but does not guarantee an optimal solution.

The alternating direction method of multipliers (ADMM) is a well-known and classic approach in the optimization community [6–9], and it dates back to the 1970s. Due to its simplicity of use and successful empirical application to a variety of situations, it has gained popularity in a short time. The ADMM has received a great deal of attention and has been broadly considered to minimize the augmented Lagrangian function for optimization problems. This technique divides the variables into several blocks based on their functions, and then, by fixing the other blocks at each iteration, the augmented Lagrangian function is minimized with regard to each block. The approach was initially addressed as an iterative method for handling convex minimization problems using parallelization [10] and was also developed for distributed processing [11]. This algorithm shares many similarities with other well-known algorithms from the literature, including Bregman’s iterative algorithm [12] and Dykstra’s alternating projection approach [13]. ADMM can be seen as an effort to combine the advantages of dual decomposition and augmented Lagrangian strategies for constrained optimization. For a review, historical information, and references on ADMM, we refer to Boyd et al. [11].

The success of the ADMM in solving convex programming problems led to its extension to nonconvex programming problems [14–16]. At first glance, applying ADMM to nonconvex programming problems would appear to be an outright rejection of the convexity assumptions that underpin ADMM’s derivation. But in fact, ADMM often proves to be a potent heuristic method, even for NP-hard nonconvex problems [17]. The idea of employing ADMM as a heuristic to resolve nonconvex problems was discussed in [11, Chapter 9]. Examples of the application of ADMM to nonconvex problems include phase retrieval [18], matrix completion and separation [19,20], optimal power flow [14], tensor factorization [21], conformal mapping construction, directional field correction, and color image restoration [22]. The objective function in these applications may be nonconvex, nonsmooth, or both.

In this article, we are interested in applying the ADMM to solve an optimization problem over the intersection of the nonconvex second-order cone (nonconvex SOC) and the nonnegative orthant cone $\mathbb{R}_+^{m+n}$. The $(m+n)$-dimensional nonconvex SOC is defined as [23]

(1) $\mathcal{R}_+^{(m,n)} \coloneqq \left\{ x = (\hat{x}; \bar{x}) \in \mathbb{R}^m \times \mathbb{R}^n : \|\hat{x}\| \ge \|\bar{x}\| \right\}.$

Clearly, the nonconvex SOC generalizes the convex SOC

$\mathcal{E}_+^{n+1} \coloneqq \left\{ x = (x_1; \bar{x}) \in \mathbb{R} \times \mathbb{R}^n : x_1 \ge \|\bar{x}\| \right\} = \mathcal{R}_+^{(1,n)}.$

We write $\mathcal{R}_{++}^{(m,n)}$ to mean the cone $\mathcal{R}_+^{(m,n)} \cap \mathbb{R}_+^{m+n}$.

The nonconvex cone $\mathcal{R}_+^{(m,n)}$ can be used to reformulate nonconvex quadratic programming and nonconvex quadratically constrained quadratic programming in conic format. This cone can also arise in real-world applications, such as facility location problems and Voronoi diagram problems, or in any application in which one distance is required to be greater than or equal to another distance, or to multiple distances [23]. Alzalg and Benakkouche [23,24] studied the algebraic structure of the nonconvex SOCs and their associated functions and inequalities. More specifically, they extended numerous algebraic properties that already exist in the framework of the convex SOC $\mathcal{E}_+^{n+1}$ to the framework of the nonconvex SOC $\mathcal{R}_+^{(m,n)}$.
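To make these definitions concrete, the following small Python sketch (using NumPy; the helper names are ours and not taken from the cited papers) checks membership of a vector $x = (\hat{x}; \bar{x})$ in the nonconvex SOC $\mathcal{R}_+^{(m,n)}$ and in the intersection cone $\mathcal{R}_{++}^{(m,n)}$ directly from definition (1).

```python
import numpy as np

def in_nonconvex_soc(x, m, tol=0.0):
    """Check whether x = (x_hat; x_bar) lies in R_+^(m,n), i.e. ||x_hat|| >= ||x_bar||."""
    x_hat, x_bar = x[:m], x[m:]
    return np.linalg.norm(x_hat) >= np.linalg.norm(x_bar) - tol

def in_intersection_cone(x, m, tol=0.0):
    """Check whether x lies in R_++^(m,n), the intersection of R_+^(m,n) with the orthant."""
    return in_nonconvex_soc(x, m, tol) and bool(np.all(x >= -tol))

# Example with m = 2, n = 2: ||x_hat|| = 3 >= ||x_bar|| = sqrt(5), and all entries are >= 0.
x = np.array([3.0, 0.0, 1.0, 2.0])
print(in_nonconvex_soc(x, 2), in_intersection_cone(x, 2))   # True True
```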

Projection is a fundamental concept in geometry, but it also has many applications outside this specialty; it is indeed an essential notion in numerous disciplines. The topic of projection is of great interest because it has a variety of implementations in both pure and applied mathematics, including optimization (see, for example, previous studies [25–27]). Projecting onto the nonconvex SOC is an underpinning idea and a critical instrument for solving optimization problems over this cone using an ADMM. A nonconvex SOC is a nonconvex set, and it is this that complicates the problem. Although the set is closed, so that the existence of a projection is assured conceptually and theoretically, computing a projection onto this set remains unsolved in general (except for some special problems [28]), and this computation is the most complicated issue. The projection onto the nonconvex SOC is therefore challenging, and we suggest an approximate formula for projecting onto this set. The points obtained in this way are not necessarily true projections; they are, however, elements of the set and possess several useful properties. For some nonconvex sets, the projection problem is, by contrast, rather simple. The set of matrices of some fixed rank is a prime example of a nonconvex set that is simple to project onto and amounts to explicit or conventional computations; given a matrix’s singular value decomposition, it is easy to project onto this set.

This article is structured as follows: Section 2 introduces some notations and recaps the algebraic structure related to the nonconvex SOC discussed in the study of Alzalg and Benakkouche [23] and used in this article. The ADMM heavily relies on projections, and Section 3 contains general projection properties that are applied throughout the study. An inexact projection onto the nonconvex SOC and some results about it are also provided. In Section 4, the ADMM-based heuristic algorithm for our problem setting is described. The final conclusions are drawn and discussed in Section 5.

2 Review of the algebraic structure of the nonconvex SOC

In this section, after introducing some notations, we review a few basic results related to the algebraic structure of the ambient space of the nonconvex SOC.

2.1 Notations

This section is devoted to introducing the notations and terminology that we use throughout this article. We bring to the attention of the readers that our notations closely follow those of the study of Alzalg and Benakkouche [23]. In this article, we will look at a Euclidean vector space (in plain English, we mean a finite-dimensional real vector space equipped with a standard inner product symbolized by $\langle \cdot , \cdot \rangle$ and the induced norm expressed by $\|\cdot\|$).

Lowercase symbols, such as $x$, are always used to represent scalars, lowercase boldface characters, such as $\mathbf{x}$, are always employed to denote vectors, and uppercase characters, such as $X$, are always used to indicate matrices. We designate a zero vector and a vector of all ones with appropriate dimensions by $\mathbf{0}$ and $\mathbf{1}$, respectively, and zero and identity matrices with the relevant sizes by $O$ and $I$. When joining matrices and vectors in a row, we use a comma “,”, and when joining them in a column, we use a semicolon “;”. Therefore, for the column vectors $x$ and $y$, we have $(x; y) = (x^T, y^T)^T$.

The Euclidean inner product in R n is defined by the mapping

$\mathbb{R}^n \times \mathbb{R}^n \ni (x, y) \longmapsto \langle x, y \rangle \coloneqq x^T y \in \mathbb{R},$

and the related norm is defined by

$\mathbb{R}^n \ni x \longmapsto \|x\| \coloneqq \sqrt{x^T x} \in \mathbb{R}.$

Let $m$ and $n$ be positive integers. For each vector $x \in \mathbb{R}^{m+n}$, we denote by $\hat{x}$ the sub-vector formed by the first entry through the $m$th entry, and by $\bar{x}$ the sub-vector of $x$ formed by the $(m+1)$st entry through the $(m+n)$th entry; therefore, $x = (\hat{x}; \bar{x}) \in \mathbb{R}^m \times \mathbb{R}^n$. By $\mathcal{R}^{(m,n)}$, we mean the $(m+n)$-dimensional real vector space $\mathbb{R}^m \times \mathbb{R}^n$ equipped with a standard inner product, i.e., $\mathcal{R}^{(m,n)} \coloneqq \{ x = (\hat{x}; \bar{x}) : \hat{x} \in \mathbb{R}^m, \ \bar{x} \in \mathbb{R}^n \}$. Therefore, the nonconvex SOC defined in (1) is given by $\mathcal{R}_+^{(m,n)} = \{ x \in \mathcal{R}^{(m,n)} : \|\hat{x}\| \ge \|\bar{x}\| \}$. The set $\operatorname{int} \mathcal{R}_+^{(m,n)} \coloneqq \{ x \in \mathcal{R}^{(m,n)} : \|\hat{x}\| > \|\bar{x}\| \}$ represents the interior of the nonconvex SOC.

We denote by $\mathbb{R}_+^n \coloneqq \{ x \in \mathbb{R}^n : x_i \ge 0, \ i = 1, \ldots, n \}$ the nonnegative orthant. We mean by $x \ge 0$ that $x \in \mathbb{R}_+^n$. By $x \ge y$, we mean $x_i \ge y_i$, $i = 1, \ldots, n$. For a real number $\alpha \in \mathbb{R}$, we write $\alpha^+ \coloneqq \max\{\alpha, 0\}$. For a vector $x \in \mathbb{R}^n$, we denote $x^+ \coloneqq (x_1^+; \ldots; x_n^+)$ and $|x| \coloneqq (|x_1|; \ldots; |x_n|)$, i.e., $|x|$ is the vector $x$ with every $i$th coordinate replaced by $|x_i|$. The dual of a cone $\mathcal{K}$ is denoted by $\mathcal{K}^*$. Whenever convenient, we use $P_{\mathcal{C}}(x)$, or simply $p_x$, to symbolize that $p_x$ is a projection of $x$ onto $\mathcal{C}$. We highlight that $p_x \in P_{\mathcal{C}}(x)$ only states that $p_x$ is a projection of $x$ onto $\mathcal{C}$ and does not require any affirmation concerning uniqueness.

Let $x \in \mathcal{R}^{(m,n)}$. If $\hat{x} = 0$, the element $\frac{\hat{x}}{\|\hat{x}\|}$ is considered to be any vector in $\mathbb{R}^m$ of Euclidean norm one. Similarly, if $\bar{x} = 0$, the element $\frac{\bar{x}}{\|\bar{x}\|}$ is regarded as any vector in $\mathbb{R}^n$ of Euclidean norm one. For $x \in \mathbb{R}$, we define $|x| \coloneqq x \operatorname{sgn}(x)$, where $\operatorname{sgn}(x) \coloneqq 1$ if $x > 0$, $\operatorname{sgn}(x) \coloneqq -1$ if $x < 0$, and $\operatorname{sgn}(x) \coloneqq 0$ if $x = 0$.

2.2 Algebraic structure of the nonconvex SOC

In this subsection, we briefly recall some key results that are useful for the rest of our work. We will not seek to be exhaustive. Developments on the subject can be found in the study of Alzalg and Benakkouche [23].

In R ( m , n ) , each vector x is associated with a crane-shaped matrix, Crn ( x ) , which is defined as

$\operatorname{Crn}(x) \coloneqq \begin{bmatrix} \|\hat{x}\| I_m & \dfrac{\hat{x}}{\|\hat{x}\|}\,\bar{x}^T \\[6pt] \bar{x}\,\dfrac{\hat{x}^T}{\|\hat{x}\|} & \|\hat{x}\| I_n \end{bmatrix}.$

Note that the symmetric matrix $\operatorname{Crn}(x)$ is positive definite (and hence invertible) if and only if $x \in \operatorname{int} \mathcal{R}_+^{(m,n)}$. It is not hard to verify that $\frac{1}{2}\left(\operatorname{Crn}(x)y + \operatorname{Crn}(y)x\right) = x \circ y$ for all $x, y \in \mathcal{R}^{(m,n)}$, where

(2) $x \circ y \coloneqq \dfrac{1}{2} \left( \left( \|\hat{x}\|\,\|\hat{y}\| + \bar{x}^T \bar{y} \right) \left( \dfrac{\hat{x}}{\|\hat{x}\|} + \dfrac{\hat{y}}{\|\hat{y}\|} \right) ; \ \left( \|\hat{x}\| + \dfrac{\hat{x}^T \hat{y}}{\|\hat{y}\|} \right) \bar{y} + \left( \|\hat{y}\| + \dfrac{\hat{y}^T \hat{x}}{\|\hat{x}\|} \right) \bar{x} \right).$

The product $\circ : \mathcal{R}^{(m,n)} \times \mathcal{R}^{(m,n)} \to \mathcal{R}^{(m,n)}$ defined in (2) is not bilinear, and consequently, the structure $(\mathcal{R}^{(m,n)}, \circ)$ does not form an algebra. The structure $(\mathcal{R}^{(m,n)}, \circ)$ is a power-associative and commutative space, i.e., $x^p \circ x^q = x^{p+q}$ for any positive integers $p$ and $q$, and $x \circ y = y \circ x$ for any $x, y \in \mathcal{R}^{(m,n)}$.

The fact that we can specify the eigen-decomposition of any element $x \in \mathcal{R}^{(m,n)}$ makes the aforementioned structure $(\mathcal{R}^{(m,n)}, \circ)$ so appealing. Associated with the nonconvex SOC $\mathcal{R}_+^{(m,n)}$, each $x \in \mathcal{R}^{(m,n)}$ can be factorized as

(3) $x = \lambda_1(x) c_1(x) + \lambda_2(x) c_2(x),$

where, for $i = 1, 2$, the scalars $\lambda_i(x) \coloneqq \|\hat{x}\| + (-1)^{i+1} \|\bar{x}\|$ are called the eigenvalues of $x$, the vectors $c_i(x) \coloneqq \frac{1}{2}\left( \frac{\hat{x}}{\|\hat{x}\|} ; (-1)^{i+1} \frac{\bar{x}}{\|\bar{x}\|} \right)$ are called the eigenvectors of $x$, and (3) is known as the spectral decomposition or spectral factorization of $x$. The determinant and the trace of $x$ are defined in terms of the eigenvalues as

$\det(x) \coloneqq \lambda_1(x)\lambda_2(x) = \|\hat{x}\|^2 - \|\bar{x}\|^2, \qquad \operatorname{trace}(x) \coloneqq \lambda_1(x) + \lambda_2(x) = 2\|\hat{x}\|.$
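As an illustration of the spectral factorization (3), the following Python sketch (our own helper, not code from [23]) computes the eigenvalues, the eigenvectors, and the determinant and trace of a vector $x \in \mathcal{R}^{(m,n)}$; when $\hat{x} = 0$ or $\bar{x} = 0$, an arbitrary unit vector is used, as permitted by the convention of Section 2.1.

```python
import numpy as np

def spectral(x, m):
    """Spectral decomposition of x = (x_hat; x_bar) with respect to R_+^(m,n)."""
    x_hat, x_bar = x[:m], x[m:]
    nh, nb = np.linalg.norm(x_hat), np.linalg.norm(x_bar)
    # Unit directions; any unit vector is allowed when the corresponding block vanishes.
    u = x_hat / nh if nh > 0 else np.eye(m)[0]
    v = x_bar / nb if nb > 0 else np.eye(x.size - m)[0]
    lam = np.array([nh + nb, nh - nb])            # eigenvalues lambda_1(x), lambda_2(x)
    c1 = 0.5 * np.concatenate([u,  v])            # eigenvector c_1(x)
    c2 = 0.5 * np.concatenate([u, -v])            # eigenvector c_2(x)
    return lam, c1, c2

x = np.array([3.0, 0.0, 1.0, 2.0])
lam, c1, c2 = spectral(x, 2)
print(lam)                                        # [3 + sqrt(5), 3 - sqrt(5)]
print(lam[0] * lam[1], 2 * np.linalg.norm(x[:2])) # det(x) and trace(x)
np.testing.assert_allclose(lam[0] * c1 + lam[1] * c2, x)  # x = lambda_1 c_1 + lambda_2 c_2
```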

We state the following proposition [23] about the eigenvalues $\{\lambda_1(x), \lambda_2(x)\}$ and eigenvectors $\{c_1(x), c_2(x)\}$.

Proposition 2.1

For any $x \in \mathcal{R}^{(m,n)}$, the eigenvalues $\lambda_1(x)$ and $\lambda_2(x)$ and the eigenvectors $c_1(x)$ and $c_2(x)$ have the following properties:

  1. $\lambda_1(x)$ and $\lambda_2(x)$ are nonnegative (respectively, positive) if and only if $x \in \mathcal{R}_+^{(m,n)}$ (respectively, $x \in \operatorname{int} \mathcal{R}_+^{(m,n)}$).

  2. $\lambda_1(x)$ and $\lambda_2(x)$ are eigenvalues of $\operatorname{Crn}(x)$. Moreover, if $\lambda_1(x) \ne \lambda_2(x)$, then each one has multiplicity one; the corresponding eigenvectors are $c_1(x)$ and $c_2(x)$. Furthermore, the remaining $m+n-2$ eigenvalues of $\operatorname{Crn}(x)$ are $\|\hat{x}\|$ when $x \ne 0$.

  3. $c_1(x)$ and $c_2(x)$ have length $\frac{1}{\sqrt{2}}$ and are orthogonal with respect to the multiplication “$\circ$”, which means

    $\|c_1(x)\| = \|c_2(x)\| = \frac{1}{\sqrt{2}}, \quad \text{and} \quad c_1(x) \circ c_2(x) = 0.$

  4. $c_1(x)$ and $c_2(x)$ are idempotent under the product “$\circ$”, i.e.,

    $c_i^2(x) = c_i(x) \circ c_i(x) = c_i(x)$; more generally, $c_i^p(x) = c_i(x)$, for $i = 1, 2$ and any positive integer $p$.

In $\mathcal{R}^{(m,n)}$, we consider the nonconvex SOC $\mathcal{R}_+^{(m,n)}$, and we let $f$ be any function from $\mathbb{R}$ to $\mathbb{R}$. By applying $f$ to the spectral values of the eigen-decomposition (3) of $x$ with respect to $\mathcal{R}_+^{(m,n)}$, one can define an associated vector-valued function $f^{\mathcal{R}_+^{(m,n)}}$ on $\mathcal{R}^{(m,n)}$. For example, for any $x \in \mathcal{R}^{(m,n)}$, we have [23]

(4) $x^2 = (\lambda_1(x) c_1(x) + \lambda_2(x) c_2(x))^2 = \lambda_1^2(x) c_1(x) + \lambda_2^2(x) c_2(x),$

and, more generally,

(5) $x^p = (\lambda_1(x) c_1(x) + \lambda_2(x) c_2(x))^p = \lambda_1^p(x) c_1(x) + \lambda_2^p(x) c_2(x), \quad \text{for any nonnegative integer } p.$

We have the following definition [24].

Definition 2.1

For any function $f : \mathbb{R} \to \mathbb{R}$, we define a corresponding vector-valued function $f^{\mathcal{R}_+^{(m,n)}}$ on $\mathcal{R}^{(m,n)}$ associated with $\mathcal{R}_+^{(m,n)}$ ($n, m \ge 1$) by

(6) $f^{\mathcal{R}_+^{(m,n)}}(x) \coloneqq f(\lambda_1(x)) c_1(x) + f(\lambda_2(x)) c_2(x), \quad x \in \mathcal{R}^{(m,n)},$

where $\lambda_i(x)$ and $c_i(x)$ ($i = 1, 2$) are the eigenvalues and eigenvectors of $x$, respectively.
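A minimal sketch of Definition 2.1 in Python (again with hypothetical helper names of ours): given a scalar function $f$, the associated vector-valued map applies $f$ to the two eigenvalues and recombines them with the eigenvectors; taking $f(t) = t^+$ or $f(t) = |t|$ reproduces the mappings used later in Theorem 3.5.

```python
import numpy as np

def spectral(x, m):
    """Eigenvalues and eigenvectors of x = (x_hat; x_bar) as in (3)."""
    x_hat, x_bar = x[:m], x[m:]
    nh, nb = np.linalg.norm(x_hat), np.linalg.norm(x_bar)
    u = x_hat / nh if nh > 0 else np.eye(m)[0]
    v = x_bar / nb if nb > 0 else np.eye(x.size - m)[0]
    return np.array([nh + nb, nh - nb]), 0.5 * np.concatenate([u, v]), 0.5 * np.concatenate([u, -v])

def spectral_map(f, x, m):
    """Vector-valued function f^{R_+^(m,n)}(x) = f(lambda_1) c_1 + f(lambda_2) c_2 of (6)."""
    lam, c1, c2 = spectral(x, m)
    return f(lam[0]) * c1 + f(lam[1]) * c2

x = np.array([1.0, 0.0, 0.0, 2.0])                  # lambda_1 = 3, lambda_2 = -1
print(spectral_map(lambda t: max(t, 0.0), x, 2))    # applies f(t) = t^+ to the spectrum
print(spectral_map(abs, x, 2))                      # the vector |x| used in Theorem 3.5
```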

The result in the following proposition is based on the study of Alzalg and Benakkouche [24].

Proposition 2.2

Suppose $f : \mathbb{R} \to \mathbb{R}$ admits a power series expansion $f(\alpha) = \sum_{p=0}^{\infty} a_p \alpha^p$ for some real coefficients $a_0, a_1, \ldots$. Then, the function $f^{\mathcal{R}_+^{(m,n)}} : \mathcal{R}^{(m,n)} \to \mathcal{R}^{(m,n)}$ given by (6) has the power series expansion

$f^{\mathcal{R}_+^{(m,n)}}(x) = \sum_{p=0}^{\infty} a_p \left( \lambda_1^p(x) c_1(x) + \lambda_2^p(x) c_2(x) \right) = \sum_{p=0}^{\infty} a_p x^p, \quad x \in \mathcal{R}^{(m,n)},$

where $\lambda_i(x)$ and $c_i(x)$ ($i = 1, 2$) are the eigenvalues and eigenvectors of $x$.

3 Inexact projection onto the nonconvex SOC

Best approximation in inner product spaces is covered in many excellent introductory books on approximation theory; illustrations include Achieser [29], Cheney [30], Davis [31], and Rivlin [32]. To the best of our knowledge, the expression “metric projection” goes back at least to the work of Aronszajn and Smith [33]. We commence with some preparatory pieces of knowledge and results from the theory of projection before discussing our own work. We outline the fundamental mathematical conceptions that are instrumental in establishing the theory of projection. The reader is directed to the study of Deutsch [34] for further in-depth discussions of these concepts. By exploiting the special structure of the nonconvex SOC, we provide a closed approximate form for the projection onto it, acquired from the spectral decomposition. More generally, we find that the projection mapping has the form (6). We have the following definition [34].

Definition 3.1

Let $V$ be an inner product space, $x \in V$, and $\mathcal{C}$ be a non-empty subset of $V$. An element $p_x \in \mathcal{C}$ is called a best approximation, or nearest point, to $x$ from $\mathcal{C}$ if

$\|x - p_x\| = d(x, \mathcal{C}),$

where $d(x, \mathcal{C}) \coloneqq \inf_{y \in \mathcal{C}} \|x - y\|$ is the distance from $x$ to $\mathcal{C}$ (also called the error in approximating $x$ by $\mathcal{C}$).

The set of all best approximations to $x$ from $\mathcal{C}$ is denoted by $P_{\mathcal{C}}(x)$, i.e.,

(7) $P_{\mathcal{C}}(x) = \left\{ y \in \mathcal{C} : \|x - y\| = d(x, \mathcal{C}) \right\}.$

This defines a mapping P C from V into the subsets of C called the metric projection onto C . Other names for metric projection include projector, nearest point mapping, proximity map, and best approximation operator. In other words, the projection operator projecting onto a closed set in a finite-dimensional real vector space maps a point outside the set onto the closest point in the set.

Recall that if $\mathcal{C}$ is empty, we have $d(x, \mathcal{C}) = +\infty$ for all $x \in \mathbb{R}^n$, and that the projection of $x \in \mathbb{R}$ onto the nonnegative real numbers equals $\max\{x, 0\}$.

If each $x \in V$ has at least (respectively, exactly) one best approximation in $\mathcal{C}$, then $\mathcal{C}$ is called a proximinal (respectively, Chebyshev) set. Thus, $\mathcal{C}$ is proximinal (respectively, Chebyshev) if and only if $P_{\mathcal{C}}(x) \ne \emptyset$ (respectively, $P_{\mathcal{C}}(x)$ is a singleton) for each $x \in V$. By the projection operator onto a Chebyshev subset $\mathcal{C}$ of $V$, we mean the function $P_{\mathcal{C}}$ from $V$ onto $\mathcal{C}$ that maps every point $x$ into its unique projection. By the projection map onto a proximinal subset $\mathcal{C}$ of $V$, we mean a vector-valued map. We state without proofs the following theorems concerning the proximinal and Chebyshev sets, which are based on the study of Deutsch [34].

Theorem 3.1

(Uniqueness of best approximations) Let $\mathcal{C}$ be a convex subset of $V$. Then, each $x \in V$ has at most one best approximation in $\mathcal{C}$. In particular, every convex proximinal set is Chebyshev.

Theorem 3.2

(Finite-dimensional subspaces are Chebyshev) Let V be an inner product space. Then:

  1. Every closed subset of a finite-dimensional subspace of V is proximinal.

  2. Every closed convex subset of a finite-dimensional subspace of V is Chebyshev.

  3. Every finite-dimensional subspace of V is Chebyshev.

Theorem 3.3

(Proximinal sets are closed) Every proximinal subset of an inner product space V is closed.

The following lemma is needed in our discussion. For a proof, we refer to the study of Guo [35].

Lemma 3.1

Let $\mathcal{C}$ be a closed non-empty subset of $\mathbb{R}^n$. Then, $P_{\mathcal{C}}(x)$ is non-empty for every $x \in \mathbb{R}^n$.

Every point of a Euclidean space has a single (unique) nearest point in a given non-empty closed convex set, and the converse also holds. For nonconvex sets, however, the projection mapping is no longer single-valued and may be challenging to calculate. We have the following theorem [36].

Theorem 3.4

(Convexity of Chebyshev sets) A subset of a Euclidean space is a Chebyshev set if and only if it is non-empty, closed, and convex.

For algorithmic reasons, closedness is crucial. Frequently, one wants the limit of a sequence to acquire beneficial qualities from the sequence’s terms. Undoubtedly, closedness is a necessary requirement for projections to exist. Closedness is also a sufficient condition to ensure the existence of projections in finite-dimensional spaces. However, in infinite-dimensional spaces, this is not true in general (for a counterexample, see item (b) in [37, Example 4.3.2]).

Examining (7) brings to light the fact that a projection operator is obtained by solving a constrained minimization problem. The Karush-Kuhn-Tucker (KKT) conditions, or the calculus of variations, are used to find an analytic expression for the projection operator in the majority of closed convex instances. On the other hand, there are cases where the projection operator is practically evident and may be discovered by applying the fundamental definition in a straightforward manner, without the need for laborious optimization techniques. In the study of Sezan [38], some examples are provided, and it is argued that finding an exact and precise projection of a point onto a closed set is typically a challenging problem in itself. We make the following assumption.

Assumption 3.1

$\mathcal{R}_+^{(m,n)} \cap \mathbb{R}_+^{m+n} \neq \emptyset$.

As mentioned before, we write $\mathcal{R}_{++}^{(m,n)}$ to mean the cone $\mathcal{R}_+^{(m,n)} \cap \mathbb{R}_+^{m+n}$. The projection of a point $x$ in $\mathcal{R}^{(m,n)}$ onto the closed, non-empty cone $\mathcal{R}_{++}^{(m,n)}$, denoted $P_{\mathcal{R}_{++}^{(m,n)}}(x)$, is obtained as the solution of the following minimization problem in the variable $y$, where $x$ is considered fixed:

(8) $\min \ \tfrac{1}{2}\|x - y\|^2 \quad \text{s.t.} \quad y \in \mathcal{R}_{++}^{(m,n)},$

i.e., we are concentrating on the points of $\mathcal{R}_{++}^{(m,n)}$ that are closest to $x$ in the sense of the Euclidean distance.

The problem of projecting onto the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$ is a specific conic optimization problem with regard to this cone. The geometrical projection problem has an obvious analytical formulation as a least-squares problem, namely, minimizing the squared norm subject to the conic constraints.

Let us define the function $h_x : \mathcal{R}^{(m,n)} \to \mathbb{R}$ by

(9) $h_x(y) \coloneqq \tfrac{1}{2}\|x - y\|^2,$

for each $y \in \mathcal{R}^{(m,n)}$. For $s \in \mathcal{R}_{++}^{(m,n)}$, we consider the compact sublevel set $\mathcal{S} \coloneqq \{ y \in \mathcal{R}^{(m,n)} : h_x(y) \le h_x(s) \}$. Then, (8) is plainly equivalent to

$\min \{ h_x(y) : y \in \mathcal{R}_{++}^{(m,n)} \cap \mathcal{S} \}.$

This problem has a solution because $h_x$ is continuous and $\mathcal{R}_{++}^{(m,n)} \cap \mathcal{S}$ is compact. We conclude that there is a point (or points) of $\mathcal{R}_{++}^{(m,n)}$ closest to $x$.

Note that the lack of convexity of $\mathcal{R}_{++}^{(m,n)}$ does not cause any problems with regard to the existence of a solution. What must be clarified is that, in this situation, convexity relates to the uniqueness of the solution and not to its existence. Figure 1 shows that the projection onto a nonconvex set may not be unique.

Figure 1: Projection of points $x$ and $y$ onto a nonconvex closed set $\mathcal{C}$.

As a result, we have defined a vector-valued map $P_{\mathcal{R}_{++}^{(m,n)}}(x)$ onto the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$, which associates a solution (or solutions) of Problem (8) with each $x$ in $\mathcal{R}^{(m,n)}$. We now have all the ingredients needed to state the following lemma.

Lemma 3.2

The projection $P_{\mathcal{R}_{++}^{(m,n)}}(x)$ exists for every $x \in \mathcal{R}^{(m,n)}$. Thus, the cone $\mathcal{R}_{++}^{(m,n)}$ is proximinal.

The proof of Lemma 3.2 follows directly from Theorem 3.2 and Lemma 3.1.

A thorough general treatment is not feasible because there is no single strategy that addresses the quadratic minimization problem (8) efficiently. To date, there is no closed-form expression for the metric projection onto a general closed, nonconvex cone. Reasonably, the projection onto a given set should be treated according to the facts of each situation (i.e., on a case-by-case basis).

Convexification [39] is a common strategy to manage nonconvexity by transforming the nonconvex problem into a convex one. The problem of projecting a point $x$ onto the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$ can be relaxed to projecting onto a nonnegative orthant. Relaxation of the problem can be performed by eliminating the nonconvex conic constraint $y \in \mathcal{R}_+^{(m,n)}$ in (8). This yields the following nonnegative orthant projection problem:

(10) $\min \ \tfrac{1}{2}\|x - y\|^2 \quad \text{s.t.} \quad y \ge 0.$

We solve the convex optimization problem (10) derived from (8). The KKT conditions (the optimality conditions in the variable $y$) of (10) are given by

$y \ge 0, \quad x = y - z, \quad z \ge 0, \quad z^T y = 0.$

To project the point $x$ onto the nonnegative orthant $\mathbb{R}_+^{m+n}$, we need to decompose it into the difference of two orthogonal members $y$ and $z$, one of which is nonnegative with respect to $\mathbb{R}_+^{m+n}$, and the other of which is nonnegative with respect to $(\mathbb{R}_+^{m+n})^* = \mathbb{R}_+^{m+n}$. For clarity, to project an element $x$ onto $\mathbb{R}_+^{m+n}$, we need just to replace each negative component of $x$ with zero and keep the positive part as it is. By substituting $0$ for each negative component, the Euclidean projection of a vector $x$ onto the nonnegative orthant can be determined and is given by

$P_{\mathbb{R}_+^{m+n}}(x) = x^+.$

In our strategy, we propose to take the solution of Problem (10) as an approximate solution for Problem (8). As a result, we conclude that

(11) $P_{\mathcal{R}_{++}^{(m,n)}}(x) = P_{\mathbb{R}_+^{m+n}}(x) = x^+.$

It is well known that the projection onto the nonnegative orthant is a simple and easy problem. We present the following remark [40]:

Remark 3.1

Projection onto a box or hyper-rectangle $\operatorname{Box}[l, u] = \{ x \in \mathbb{R}^n : l \le x \le u \}$ takes a simple form:

$\left( P_{\operatorname{Box}[l,u]}(v) \right)_k = \begin{cases} l_k, & v_k \le l_k, \\ v_k, & l_k \le v_k \le u_k, \\ u_k, & v_k \ge u_k, \end{cases}$

i.e., we threshold the values at the boundary of the box. Here, $l \in [-\infty, +\infty)^n$ and $u \in (-\infty, +\infty]^n$ (so the box need not be bounded) and $k = 1, \ldots, n$.
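In NumPy, these projections are one-liners; the snippet below is a direct transcription of Remark 3.1 (our own code, not taken from [40]).

```python
import numpy as np

def project_box(v, l, u):
    """Componentwise projection onto Box[l, u]: threshold at the box boundary."""
    return np.clip(v, l, u)

def project_nonneg(v):
    """Projection onto the nonnegative orthant: replace negative components by zero."""
    return np.maximum(v, 0.0)

v = np.array([-1.5, 0.3, 2.7])
print(project_box(v, -1.0, 1.0))   # [-1.   0.3  1. ]
print(project_nonneg(v))           # [0.   0.3  2.7]
```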

Note that the projection onto $\mathbb{R}_+^n$ is a particular case of projection onto a box, with $\operatorname{Box}[0, +\infty \mathbf{1}] = \mathbb{R}_+^n$. We now seek a map that will, in some ways, be a suitable replacement for a projection map for $\mathcal{R}_{++}^{(m,n)}$. Indeed, as an alternative to a true projection map onto $\mathcal{R}_{++}^{(m,n)}$, we suggest the following mapping. In the following theorem, we give an approximate formula for the projection onto the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$.

Theorem 3.5

The projection mapping can be expressed by utilizing the eigenvalues $\{\lambda_1(x), \lambda_2(x)\}$ and eigenvectors $\{c_1(x), c_2(x)\}$ of $x$ in the following closed approximate form:

(12) $P_{\mathcal{R}_{++}^{(m,n)}}(x) = \lambda_1^+(x) c_1(x) + \lambda_2^+(x) c_2(x) = \tfrac{1}{2}(x + |x|).$

Proof

We know that for all $\alpha \in \mathbb{R}$, $\alpha^+ = \frac{\alpha + |\alpha|}{2}$. Now, by the spectral decomposition of $x$ given in (3) and the formula for $|x|$ described in Definition 2.1, we have

$\dfrac{x + |x|}{2} = \dfrac{\lambda_1(x) c_1(x) + \lambda_2(x) c_2(x)}{2} + \dfrac{|\lambda_1(x)| c_1(x) + |\lambda_2(x)| c_2(x)}{2} = \dfrac{\lambda_1(x) + |\lambda_1(x)|}{2} c_1(x) + \dfrac{\lambda_2(x) + |\lambda_2(x)|}{2} c_2(x) = \max\{0, \lambda_1(x)\} c_1(x) + \max\{0, \lambda_2(x)\} c_2(x) = \lambda_1^+(x) c_1(x) + \lambda_2^+(x) c_2(x).$

Thus, the last equality in (12) holds. We now show the first equality. First, we consider the case when $x \in \mathcal{R}_{++}^{(m,n)}$. In this case, $P_{\mathcal{R}_{++}^{(m,n)}}(x) = x$. Note that $\lambda_1(x), \lambda_2(x) \ge 0$; hence, $\lambda_1^+(x) = \lambda_1(x)$ and $\lambda_2^+(x) = \lambda_2(x)$, and therefore, $\lambda_1^+(x) c_1(x) + \lambda_2^+(x) c_2(x) = x = P_{\mathcal{R}_{++}^{(m,n)}}(x)$. Now, we examine the case when $x \notin \mathcal{R}_{++}^{(m,n)}$. Note that $\lambda_1(x) \ge 0$ and $\lambda_2(x) \le 0$. In this case, $\lambda_1^+(x) = \lambda_1(x)$ and $\lambda_2^+(x) = 0$. The desired result can then be obtained by considering the projection problem (8) and using (11). The proof is complete.□

Using (12), and keeping in mind that $\lambda_1(x)$ is always nonnegative, we have

$P_{\mathcal{R}_{++}^{(m,n)}}(x) = \begin{cases} x, & \text{if } \lambda_1(x) \ge 0, \ \lambda_2(x) \ge 0, \\[4pt] \dfrac{1}{2}\left( \hat{x}\left(1 + \dfrac{\|\bar{x}\|}{\|\hat{x}\|}\right) ; \ \bar{x}\left(1 + \dfrac{\|\hat{x}\|}{\|\bar{x}\|}\right) \right), & \text{if } \lambda_1(x) > 0, \ \lambda_2(x) < 0, \\[4pt] 0, & \text{if } \lambda_1(x) = 0, \ \lambda_2(x) \le 0. \end{cases}$
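Theorem 3.5 and the case formula above translate directly into code. The sketch below (with helper names of our own choosing) implements the closed approximate form $\lambda_1^+(x) c_1(x) + \lambda_2^+(x) c_2(x)$; as discussed in the text, it returns an approximation rather than an exact projection.

```python
import numpy as np

def approx_project_cone(x, m):
    """Approximate projection of x = (x_hat; x_bar) onto R_++^(m,n), following (12)."""
    x_hat, x_bar = x[:m], x[m:]
    nh, nb = np.linalg.norm(x_hat), np.linalg.norm(x_bar)
    u = x_hat / nh if nh > 0 else np.eye(m)[0]
    v = x_bar / nb if nb > 0 else np.eye(x.size - m)[0]
    lam1, lam2 = nh + nb, nh - nb                 # eigenvalues of x
    c1 = 0.5 * np.concatenate([u,  v])
    c2 = 0.5 * np.concatenate([u, -v])
    return max(lam1, 0.0) * c1 + max(lam2, 0.0) * c2

x = np.array([1.0, 0.0, 0.0, 3.0])               # lambda_1 = 4 > 0, lambda_2 = -2 < 0
p = approx_project_cone(x, 2)
print(p)                                          # middle case of the formula above
print(np.linalg.norm(p[:2]) >= np.linalg.norm(p[2:]))   # output satisfies the SOC inequality
```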

In relation to the projection onto the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$, we note that this formula is not exact but only approximate, because there is no method to solve Problem (8) exactly. We remark that the nearest points we establish in this manner are possibly not unique.

A pleasant feature of the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$ is that it permits an explicit approximate representation of the projection mapping, which has the form (6). Note that $P_{\mathcal{R}_{++}^{(m,n)}}$ is positively homogeneous, i.e., for any $t \ge 0$, $P_{\mathcal{R}_{++}^{(m,n)}}(tx) = t P_{\mathcal{R}_{++}^{(m,n)}}(x)$.

Numerous authors (see previous studies [41,42], for instance) have looked into the differentiability characteristics of the metric projection onto convex sets. Furthermore, Shapiro [43] made some approaches to the nonconvex situation. In light of our study of the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$, we calculate the derivative of the projection onto it. In Lemma 3.3, we provide the Jacobian of the projection mapping at a point where it is differentiable. The proof is simple and, therefore, omitted.

Lemma 3.3

The Jacobian of the projection mapping $P_{\mathcal{R}_{++}^{(m,n)}}$ at a point $x = (\hat{x}; \bar{x}) \in \mathcal{R}^{(m,n)}$ with $\hat{x} \ne 0$ and $\bar{x} \ne 0$ is given by

$J_{P_{\mathcal{R}_{++}^{(m,n)}}}(x) = \begin{cases} I_{m+n}, & \text{if } \lambda_1(x) \ge 0, \ \lambda_2(x) \ge 0, \\[6pt] \dfrac{1}{2}\begin{bmatrix} I_m + \dfrac{\|\bar{x}\|}{\|\hat{x}\|}\left( I_m - \dfrac{\hat{x}\hat{x}^T}{\|\hat{x}\|^2} \right) & \dfrac{\hat{x}\bar{x}^T}{\|\hat{x}\|\,\|\bar{x}\|} \\[10pt] \dfrac{\bar{x}\hat{x}^T}{\|\bar{x}\|\,\|\hat{x}\|} & I_n + \dfrac{\|\hat{x}\|}{\|\bar{x}\|}\left( I_n - \dfrac{\bar{x}\bar{x}^T}{\|\bar{x}\|^2} \right) \end{bmatrix}, & \text{if } \lambda_1(x) > 0, \ \lambda_2(x) < 0, \\[6pt] O, & \text{if } \lambda_1(x) = 0, \ \lambda_2(x) \le 0. \end{cases}$

The following theorem [44] describes a well-known characterization of the projection of a point onto a closed convex set $K$. Based on it, in turn, we make some modifications to obtain results for the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$ in the form of Theorem 3.7 and Corollary 3.1.

Theorem 3.6

Let $K$ be a closed convex set. A point $p_x \in K$ is the projection $P_K(x)$ if and only if $\langle x - p_x, \, p - p_x \rangle \le 0$ for all $p \in K$.

This leads us to the following results.

Theorem 3.7

Let $p_x \in \mathcal{R}_{++}^{(m,n)}$ be the projection $P_{\mathcal{R}_{++}^{(m,n)}}(x)$. For $p \in \mathcal{R}_{++}^{(m,n)}$, if there exists $\alpha \in \, ]0, 1[$ such that $\alpha p + (1 - \alpha) p_x \in \mathcal{R}_{++}^{(m,n)}$, then $\langle x - p_x, \, p - p_x \rangle \le 0$.

Proof

Assume that $p_x \in \mathcal{R}_{++}^{(m,n)}$ is the projection $P_{\mathcal{R}_{++}^{(m,n)}}(x)$. Then, $p_x$ is a solution of (8). Let $p \in \mathcal{R}_{++}^{(m,n)}$ be such that there exists $\alpha \in \, ]0, 1[$ satisfying $\alpha p + (1 - \alpha) p_x \in \mathcal{R}_{++}^{(m,n)}$. Using $h_x$ as defined in (9), we have

$h_x(p_x) \le h_x(\alpha p + (1 - \alpha) p_x) = \tfrac{1}{2}\|\alpha p + (1 - \alpha) p_x - x\|^2 = \tfrac{1}{2}\|p_x - x + \alpha(p - p_x)\|^2 = \tfrac{1}{2}\|p_x - x\|^2 + \tfrac{\alpha^2}{2}\|p - p_x\|^2 + \alpha \langle p_x - x, \, p - p_x \rangle.$

Since $h_x(p_x) = \tfrac{1}{2}\|p_x - x\|^2$, we obtain

$\tfrac{\alpha^2}{2}\|p - p_x\|^2 + \alpha \langle p_x - x, \, p - p_x \rangle \ge 0.$

Now, dividing both sides by $\alpha$, we also obtain

$\tfrac{\alpha}{2}\|p - p_x\|^2 + \langle p_x - x, \, p - p_x \rangle \ge 0.$

By letting $\alpha \to 0^+$, we obtain

$\langle p_x - x, \, p - p_x \rangle \ge 0, \quad \text{i.e.,} \quad \langle x - p_x, \, p - p_x \rangle \le 0.$

The proof is complete.□

Corollary 3.1

If $\langle x - p_x, \, p - p_x \rangle \le 0$ for all $p \in \mathcal{R}_{++}^{(m,n)}$, then the point $p_x \in \mathcal{R}_{++}^{(m,n)}$ is the projection $P_{\mathcal{R}_{++}^{(m,n)}}(x)$.

Proof

Assume that $p_x \in \mathcal{R}_{++}^{(m,n)}$ satisfies $\langle x - p_x, \, p - p_x \rangle \le 0$ for all $p \in \mathcal{R}_{++}^{(m,n)}$. Then, we have two cases: If $p_x = x$, then $p_x$ certainly solves (8). If $p_x \ne x$, let $p$ be an arbitrary point in $\mathcal{R}_{++}^{(m,n)}$; then

$0 \ge \langle x - p_x, \, p - p_x \rangle = \langle x - p_x, \, p - x + x - p_x \rangle = \langle x - p_x, \, p - x \rangle + \langle x - p_x, \, x - p_x \rangle = \langle x - p_x, \, p - x \rangle + \|x - p_x\|^2 \ge -\|x - p_x\|\,\|p - x\| + \|x - p_x\|^2 \quad (\text{Cauchy-Schwarz inequality}),$

and hence

$0 \ge -\|p - x\| + \|x - p_x\| \quad (\text{dividing by } \|x - p_x\| > 0).$

As a result, $\|x - p_x\| \le \|p - x\|$, and so $p_x$ solves (8). The proof is complete. □

4 ADMM

In this section, we are interested in solving the optimization problem that seeks to minimize a convex objective function over the intersection of the nonconvex SOC $\mathcal{R}_+^{(m,n)}$ and the nonnegative orthant cone $\mathbb{R}_+^{m+n}$ using a heuristic method based on the ADMM. Note that the algorithm heavily depends on the projection mapping presented in Section 3. We start by reviewing the classical ADMM before considering it for our setting.

We review the classical ADMM, which is used to solve convex, separable, two-block problems as in the study of Boyd et al. [11]. The ADMM is dedicated to handling convex optimization problems of the form:

(13) $\min \ f(x) + g(z) \quad \text{s.t.} \quad Ax + Bz = c,$

where $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$ are decision variables, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, $c \in \mathbb{R}^p$, and $f, g$ are convex functions. Note that the objective function is separable with respect to the splitting of the decision variable into the two parts $x$ and $z$. The augmented Lagrangian of Problem (13) is given as

$L_\rho(x, z, u) \coloneqq f(x) + g(z) + u^T(Ax + Bz - c) + \tfrac{\rho}{2}\|Ax + Bz - c\|^2,$

where $u$ is the Lagrange multiplier or the dual variable for the constraint $Ax + Bz = c$, and $\rho > 0$ is called the penalty parameter, or more precisely, it represents the primal penalty parameter. Note that $L_0$ is the standard Lagrangian for Problem (13). The ADMM iterations are (in the unscaled form)

$\begin{aligned} x^{(k+1)} &\coloneqq \operatorname*{argmin}_{x} \ L_\rho(x, z^{(k)}, u^{(k)}), \\ z^{(k+1)} &\coloneqq \operatorname*{argmin}_{z} \ L_\rho(x^{(k+1)}, z, u^{(k)}), \\ u^{(k+1)} &\coloneqq u^{(k)} + \rho\left(Ax^{(k+1)} + Bz^{(k+1)} - c\right). \end{aligned}$

The superscript is the iteration counter. Decomposition is only possible when f or g is separable, and this is accomplished by splitting the minimization over x and z into two phases.

In the scaled form of the ADMM, we have

(14) $\begin{aligned} x^{(k+1)} &\coloneqq \operatorname*{argmin}_{x} \left( f(x) + \tfrac{\rho}{2}\|Ax + Bz^{(k)} - c + y^{(k)}\|^2 \right), \\ z^{(k+1)} &\coloneqq \operatorname*{argmin}_{z} \left( g(z) + \tfrac{\rho}{2}\|Ax^{(k+1)} + Bz - c + y^{(k)}\|^2 \right), \\ y^{(k+1)} &\coloneqq y^{(k)} + Ax^{(k+1)} + Bz^{(k+1)} - c, \end{aligned}$

where $y = (1/\rho)u$ is the scaled dual variable.
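The scaled iteration (14) has a simple generic implementation once the two subproblem solvers are supplied as callables. The sketch below is a schematic of our own (the oracle names and the toy consensus example are illustrative assumptions, not part of [11]); it shows the structure of the loop and, as a usage example, the special case $A = I$, $B = -I$, $c = 0$ used later in this article.

```python
import numpy as np

def scaled_admm(argmin_x, argmin_z, A, B, c, z0, y0, rho, n_iter=100):
    """Generic scaled ADMM (14).  argmin_x(z, y) and argmin_z(x, y) are caller-supplied
    oracles that solve the two subproblems for the given penalty rho."""
    z, y = z0.copy(), y0.copy()
    for _ in range(n_iter):
        x = argmin_x(z, y)            # x-update: min_x f(x) + (rho/2)||Ax + Bz - c + y||^2
        z = argmin_z(x, y)            # z-update: min_z g(z) + (rho/2)||Ax + Bz - c + y||^2
        y = y + A @ x + B @ z - c     # scaled dual update
    return x, z, y

# Toy consensus example: min 0.5||x - a||^2 + 0.5||z - b||^2  s.t.  x = z.
a, b, rho = np.array([1.0, 3.0]), np.array([3.0, 5.0]), 1.0
fx = lambda z, y: (a + rho * (z - y)) / (1 + rho)   # closed-form x-subproblem
fz = lambda x, y: (b + rho * (x + y)) / (1 + rho)   # closed-form z-subproblem
I = np.eye(2)
x, z, y = scaled_admm(fx, fz, I, -I, np.zeros(2), np.zeros(2), np.zeros(2), rho)
print(x, z)    # both approach (a + b) / 2 = [2, 4]
```

The two oracles here happen to be closed-form proximal maps; in general, they stand in for whatever solver the user attaches to each block.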

The appealing theory and widespread application of the ADMM for convex programming problems make it alluring to experiment with counterpart heuristics for nonconvex problems. Now, we introduce and define the nonconvex SOC programming problem and derive a heuristic algorithm for solving it via ADMM. Before doing this, we review the concepts of the proximal operator [40,45,46] and the proximal algorithm [40].

Definition 4.1

Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a given extended-real-valued function. The proximal operator $\operatorname{prox}_f : \mathbb{R}^n \to \mathbb{R}^n$ of $f$ is defined as

$\operatorname{prox}_f(v) \coloneqq \operatorname*{argmin}_{x} \left( f(x) + \tfrac{1}{2}\|x - v\|^2 \right).$

The proximal operator of $f$ with parameter $\gamma$ ($\gamma > 0$) is defined as

(15) $\operatorname{prox}_{f,\gamma}(v) \coloneqq \operatorname*{argmin}_{x} \left( f(x) + \tfrac{1}{2\gamma}\|x - v\|^2 \right).$

The term “proximal operator,” in its current form, and its characteristics were first used in Moreau’s key work in the 1960s [47,48], which is why it is also known as “Moreau’s proximal mapping.” The proximal operator $\operatorname{prox}_f$ maps a vector $x \in \mathbb{R}^n$ into a subset of $\mathbb{R}^n$, which could be a set with multiple vectors, a singleton, or empty.

The proximal minimization algorithm, also called proximal iteration or the proximal point algorithm, uses

$x^{(k+1)} \coloneqq \operatorname{prox}_{f,\gamma}(x^{(k)}),$

where $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a closed proper convex function, $k$ is the iteration counter, and $x^{(k)}$ denotes the $k$th iterate of the algorithm. Some authors have studied proximal algorithms in the nonconvex case [49,50]. Theorem 4.1, referred to as the “first prox theorem,” asserts that if $f$ is proper, closed, and convex, then $\operatorname{prox}_f$ is always a singleton, i.e., the prox exists and is unique. We have the following theorems [46].

Theorem 4.1

(First prox theorem) If $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper closed and convex function, then $\operatorname{prox}_f(x)$ is a singleton for any $x \in \mathbb{R}^n$.

The indicator function $\delta_{\mathcal{C}}$ of a set $\mathcal{C}$ is defined as

$\delta_{\mathcal{C}}(z) = \begin{cases} 0, & z \in \mathcal{C}, \\ +\infty, & z \notin \mathcal{C}. \end{cases}$

Note that if $\mathcal{C}$ is non-empty, closed, and convex, then the indicator function $\delta_{\mathcal{C}}$ is closed, convex, and proper.

Theorem 4.2

If $\mathcal{C} \subseteq \mathbb{R}^n$ is non-empty, then $\operatorname{prox}_{\delta_{\mathcal{C}}}(x) = P_{\mathcal{C}}(x)$ for any $x \in \mathbb{R}^n$.

As a result, according to the first prox theorem, the projection mapping, which coincides with the proximal mapping, exists and is unique. We also have the following remark [40].

Remark 4.1

Many problems of substantial current interest in areas, such as machine learning, high-dimensional statistics, statistical signal processing, compressed sensing, and others, are often more natural to solve using proximal algorithms rather than converting them to symmetric cone programs and using interior-point methods. Proximal operators and proximal algorithms thus comprise an important set of tools that we believe should be familiar to everyone working in such fields.

4.1 Optimization problem over the nonconvex SOC

We consider a problem that minimizes a convex function $f : \mathbb{R}^{m+n} \to \mathbb{R} \cup \{+\infty\}$ over the intersection of the nonconvex SOC $\mathcal{R}_+^{(m,n)}$ and the nonnegative orthant cone $\mathbb{R}_+^{m+n}$. Our problem looks like this:

(16) $\min \ f(x) \quad \text{s.t.} \quad x \in \mathcal{R}_{++}^{(m,n)},$

where $\mathcal{R}_{++}^{(m,n)} = \mathcal{R}_+^{(m,n)} \cap \mathbb{R}_+^{m+n}$, as defined earlier. It is clear that $(\mathcal{R}_{++}^{(m,n)})^* \ne \{0\}$. To prove this claim, suppose, on the contrary, that $(\mathcal{R}_{++}^{(m,n)})^* = \{0\}$. Then, there is no nonzero vector $x$ that belongs to $(\mathcal{R}_{++}^{(m,n)})^*$. Let $y \in \mathcal{R}_{++}^{(m,n)}$ be arbitrary, and take, for example, $x = \mathbf{1} = (\hat{\mathbf{1}}; \bar{\mathbf{1}})$. Now

$\langle x, y \rangle = \hat{x}^T \hat{y} + \bar{x}^T \bar{y} = \hat{\mathbf{1}}^T \hat{y} + \bar{\mathbf{1}}^T \bar{y} \ge 0,$

hence $x \in (\mathcal{R}_{++}^{(m,n)})^*$, which is a contradiction.

We assume that Assumption 3.1 still holds for the remaining part of this article. To solve Problem (16) using ADMM, we first need to reformulate it into an appropriate form (which is called the consensus form) by introducing the artificial or auxiliary variable z . Problem (16) becomes

(17) $\min \ f(x) + g(z) \quad \text{s.t.} \quad x - z = 0,$

where $g$ is the indicator function of $\mathcal{R}_{++}^{(m,n)}$.

The constraint, or condition x = z , is the so-called copy constraint that joins the variables x and z . It enables us to shift the nonconvex conic constraint that was previously placed on x to z . Copy constraints are frequently employed in decomposition techniques to divide the original problem into parts. Here, the variable has been divided into two variables, x and z , and the requirement that they concur by consensus has been added.

ADMM solves Problem (17) by utilizing its augmented Lagrangian function

$L_\rho(x, z, u) = f(x) + g(z) + u^T(x - z) + \tfrac{\rho}{2}\|x - z\|^2,$

where $u \in \mathbb{R}^{m+n}$ is a dual variable connected to the consensus constraint and $\rho > 0$ is the penalty parameter. This is the standard Lagrangian augmented with an extra quadratic penalty on the equality constraint function.

The fundamental idea underlying the ADMM technique is to successively accomplish the minimization of L ρ ( x , z , u ) with respect to x with z and u fixed, then with respect to z with x and u fixed, and then an adjustment (update) to the multiplier u . With this strategy, separability is maintained while removing the challenge of joint minimization in the key variables, x and z . This approach can be seen as a variant of the sequential decomposition method.

Considering that at iteration $k$ we have computed $z^{(k)}$ and $u^{(k)}$, the $(k+1)$st iteration of the ADMM is

(18) $\begin{aligned} x^{(k+1)} &\coloneqq \operatorname*{argmin}_{x} \ L_\rho(x, z^{(k)}, u^{(k)}), \\ z^{(k+1)} &\coloneqq \operatorname*{argmin}_{z} \ L_\rho(x^{(k+1)}, z, u^{(k)}), \\ u^{(k+1)} &\coloneqq u^{(k)} + \rho\left(x^{(k+1)} - z^{(k+1)}\right). \end{aligned}$

The step-size of the dual variable update is equal to the augmented Lagrangian parameter $\rho$. Since the dual update occurs after the $z$-update but before the next $x$-update, the roles of $x$ and $z$ are almost, but not entirely, symmetric. The ADMM in (18) is called the unscaled form for Problem (17).

In the ADMM, the algorithm state is made up of $z^{(k)}$ and $u^{(k)}$. In other words, we take $(z^{(k+1)}; u^{(k+1)})$ to be a function of $(z^{(k)}; u^{(k)})$. The variable $x^{(k)}$ is a computed intermediate result from the preceding state $(z^{(k-1)}; u^{(k-1)})$, not a component of the current state.

By merging the quadratic and linear terms in the augmented Lagrangian and scaling the dual variable, the ADMM can be expressed in a slightly different manner that is frequently more practical. To write the ADMM in scaled form, we define the residual as $r = x - z$. In this case, we have

$u^T r + \tfrac{\rho}{2}\|r\|^2 = \tfrac{\rho}{2}\left\| r + \tfrac{1}{\rho}u \right\|^2 - \tfrac{1}{2\rho}\|u\|^2 = \tfrac{\rho}{2}\left( \|r\|^2 + \tfrac{2}{\rho} r^T u + \tfrac{1}{\rho^2}\|u\|^2 \right) - \tfrac{1}{2\rho}\|u\|^2 = \tfrac{\rho}{2}\|r\|^2 + r^T u + \tfrac{1}{2\rho}\|u\|^2 - \tfrac{1}{2\rho}\|u\|^2 = \tfrac{\rho}{2}\|r\|^2 + \rho r^T y + \tfrac{\rho}{2}\|y\|^2 - \tfrac{\rho}{2}\|y\|^2 = \tfrac{\rho}{2}\|r + y\|^2 - \tfrac{\rho}{2}\|y\|^2,$

where $y = (1/\rho)u$ is the scaled dual variable.

Using the scaled dual variable, the augmented Lagrangian function of Problem (17) is

$L_\rho(x, z, y) = f(x) + g(z) + \tfrac{\rho}{2}\|x - z + y\|^2 - \tfrac{\rho}{2}\|y\|^2.$

After dropping the constant terms and using the scaled dual variable, the ADMM has the following form:

(19) $x^{(k+1)} \coloneqq \operatorname*{argmin}_{x} \left( f(x) + \tfrac{\rho}{2}\|x - z^{(k)} + y^{(k)}\|^2 \right),$

(20) $z^{(k+1)} \coloneqq \operatorname*{argmin}_{z} \left( g(z) + \tfrac{\rho}{2}\|x^{(k+1)} - z + y^{(k)}\|^2 \right),$

$y^{(k+1)} \coloneqq y^{(k)} + x^{(k+1)} - z^{(k+1)}.$

Since (19) and (20) merely call for the minimization of a quadratic perturbation of f and g , respectively, the ADMM effectively separates the functions f and g . Note that the expression “alternating direction” refers to how x and z are updated. They are updated in an alternating or sequential manner.

4.2 ADMM heuristic algorithm

Our ADMM heuristic is made up of an $x$-minimization step, a $z$-minimization step, and a dual variable update. While the $z$-minimization step is not a convex problem, the $x$-minimization step is a convex one. When we separated the nonconvex conic constraint by defining an artificial variable and employed the ADMM to solve this reformulated form of the nonconvex SOC problem (16), we found that there is an approximate closed-form solution that manipulates the nonconvexity via the projection onto the cone. In other words, in the first phase, we need to solve a convex problem without the nonconvex conic constraint, and in the second step, the nonconvex requirement is attempted to be satisfied using the projection.

4.2.1 x -minimization step

The first phase of the algorithm is described in (19) and is exactly the proximal operator defined in (15), which means that this step requires solving a convex optimization problem or, more precisely, evaluating $\operatorname{prox}_{f, 1/\rho}$, where

$x^{(k+1)} \coloneqq \operatorname*{argmin}_{x} \left( f(x) + \tfrac{\rho}{2}\|x - z^{(k)} + y^{(k)}\|^2 \right) = \operatorname*{argmin}_{x} \left( f(x) + \tfrac{\rho}{2}\|x - (z^{(k)} - y^{(k)})\|^2 \right) = \operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)}).$

If, in addition to being convex, $f$ is closed and proper, then by the first prox theorem (Theorem 4.1), the function on the right-hand side of (19) has a unique minimizer. We now go through the proximal operator evaluation for some functions (see previous studies [11,40,46] for more details). The simplest method is to utilize generic optimization techniques, taking advantage of the general structure of the problem because, as we already mentioned, evaluating the proximal operator (19) requires solving a convex optimization problem. There are frequently more efficient specialized methods, or even analytical solutions, for this convex optimization problem. However, Parikh and Boyd [40] emphasized that proximal methods can still be quite useful even in situations where a closed-form solution for the proximal operator is not obtainable. We should be aware that there are numerous methods for determining the proximal operator of a specific function, according to the type of problem (constrained or unconstrained) and the nature of the function (smooth or nonsmooth). As an instance of generic techniques, if we encounter a constrained problem, we can, for example, utilize an interior-point technique or a gradient projection method if $f$ is smooth, and a projected subgradient method if $f$ is nonsmooth. On the other hand, if we face an unconstrained optimization problem (as in our situation), we can utilize a subgradient approach to resolve the problem if $f$ is a general nonsmooth function. We can employ a gradient method, Newton method, quasi-Newton method, or other techniques if $f$ is smooth. These and many additional techniques are covered, for instance, in Nocedal and Wright [51].

Suppose $f$ is a convex quadratic function defined by $f(x) = \tfrac{1}{2}x^T A x + b^T x + c$, where $A \in \mathbb{S}_+^{m+n}$ (the set of $(m+n) \times (m+n)$ symmetric positive semidefinite matrices), $b \in \mathbb{R}^{m+n}$, and $c \in \mathbb{R}$. Noting that $A + \rho I$ is always invertible, the analytical expression of the proximal operator $\operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)})$ for $f$ at $z^{(k)} - y^{(k)}$ is given as

$\operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)}) \coloneqq \operatorname*{argmin}_{x} \left( \tfrac{1}{2}x^T A x + b^T x + c + \tfrac{\rho}{2}\|x - (z^{(k)} - y^{(k)})\|^2 \right) = (A + \rho I)^{-1}\left( \rho(z^{(k)} - y^{(k)}) - b \right).$

This result has a number of significant special instances. For example, if $f(x) = b^T x + c$, i.e., if $f$ is affine, then $\operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)}) = (z^{(k)} - y^{(k)}) - \tfrac{1}{\rho}b$. If $f$ is a constant function, so $f(x) = c$, then $\operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)}) = z^{(k)} - y^{(k)}$. If $f(x) = \tfrac{1}{2}\|x\|^2$, then

$\operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)}) \coloneqq \operatorname*{argmin}_{x} \left( \tfrac{1}{2}\|x\|^2 + \tfrac{\rho}{2}\|x - (z^{(k)} - y^{(k)})\|^2 \right) = \tfrac{\rho}{1 + \rho}(z^{(k)} - y^{(k)}),$

and this mapping is called a shrinkage operator.
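A hedged Python sketch of the quadratic case (our own helper name; for large problems one would factor $A + \rho I$ once and reuse the factorization rather than calling a dense solver at every iteration):

```python
import numpy as np

def prox_quadratic(v, A, b, rho):
    """prox_{f,1/rho}(v) for f(x) = 0.5 x^T A x + b^T x + c, i.e. (A + rho I)^{-1}(rho v - b).
    The constant c does not affect the minimizer."""
    n = v.size
    return np.linalg.solve(A + rho * np.eye(n), rho * v - b)

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
v = np.array([0.5, 0.5])
print(prox_quadratic(v, A, b, rho=2.0))   # minimizer of 0.5 x^T A x + b^T x + (rho/2)||x - v||^2
```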

4.2.2 z -minimization step

Now, let us specifically restate the second phase, which is given by

$z^{(k+1)} \coloneqq \operatorname*{argmin}_{z} \left( g(z) + \tfrac{\rho}{2}\|x^{(k+1)} - z + y^{(k)}\|^2 \right) = \operatorname*{argmin}_{z} \left( g(z) + \tfrac{\rho}{2}\|z - (x^{(k+1)} + y^{(k)})\|^2 \right) = \operatorname{prox}_{g, 1/\rho}(x^{(k+1)} + y^{(k)}).$

Recall that $g$ is the indicator function of $\mathcal{R}_{++}^{(m,n)}$; thus, from Theorem 4.2, the proximal operator of $g$ reduces to the Euclidean projection onto $\mathcal{R}_{++}^{(m,n)}$, and hence, the $z$-minimization corresponds to the minimization problem

(21) $\min \ \tfrac{1}{2}\|z - (x^{(k+1)} + y^{(k)})\|^2 \quad \text{s.t.} \quad z \in \mathcal{R}_{++}^{(m,n)}.$

Problem (21) is exactly the definition of the projection of $x^{(k+1)} + y^{(k)}$ onto $\mathcal{R}_{++}^{(m,n)}$. It is well known that the projection of a point onto a nonconvex set is often NP-hard, but our study proposes an approximate solution to this particular problem, discussed in Section 3, thanks to the particular structure of the cone. The resulting points are members of the nonconvex cone $\mathcal{R}_{++}^{(m,n)}$ and share a variety of other desirable qualities, even though they are not necessarily projected points. Note that we can find more than one such point, and in this situation, we pick one of them at random.

As a result, the $x$-update requires minimizing $f$ plus a convex quadratic function, or more precisely, the evaluation of the proximal operator $\operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)})$, while the $z$-update involves the Euclidean projection onto the nonconvex SOC, $P_{\mathcal{R}_{++}^{(m,n)}}(x^{(k+1)} + y^{(k)})$.

The benefit of this strategy is that the objective terms are treated entirely separately. In fact, the functions are accessible through their proximal operators. When the proximal operators of f and g can be evaluated efficiently, the approach is most advantageous.

Using the scaled dual variable, the heuristic ADMM reduces to the proximal version as follows:

(22) $x^{(k+1)} \coloneqq \operatorname{prox}_{f, 1/\rho}(z^{(k)} - y^{(k)}),$

(23) $z^{(k+1)} \coloneqq P_{\mathcal{R}_{++}^{(m,n)}}(x^{(k+1)} + y^{(k)}),$

(24) $y^{(k+1)} \coloneqq y^{(k)} + x^{(k+1)} - z^{(k+1)},$

which is the final ADMM form for Problem (17). This last structure of the ADMM is known as the scaled form because it is represented in terms of a scaled version of the dual variable.

The scaled and unscaled forms are obviously identical, but the formulas in the scaled form of the heuristic ADMM (22)-(24) are shorter than in the unscaled form (18); therefore, we will utilize the scaled form in the sequel.

The ADMM-based heuristic for solving Problem (16) is stated in Algorithm 1. Algorithm 1 is also visualized in Figure 2.

Figure 2: Flowchart of Algorithm 1.

Algorithm 1: An ADMM-based heuristic algorithm for Problem (16)
begin
  INPUT: the number of iterations $n_{\mathrm{iter}}$; arbitrary feasible starting points $z^{(0)}$, $y^{(0)}$.
  PROCEDURE:
  Step 1 (Initialization): Set $k = 0$, $\rho = \rho_0 > 0$, $n_{\mathrm{iter}} = n_{\max}$, $y^{(0)} = 0$, and draw $z^{(0)} \sim \mathcal{N}(0, \sigma^2 I)$.
  Step 2 ($x$-minimization): Evaluate $x^{(k+1)}$ according to (22).
  Step 3 (Approximate $z$-minimization): Determine $z^{(k+1)}$ using (23) together with (11) or (12).
  Step 4 (Dual variable update): Update $y^{(k+1)}$ via (24).
  Step 5: If $k = n_{\max}$, then stop;
  Step 6: else set $k = k + 1$ and go to Step 2.
end

Note that the accuracy of the approximation is significantly impacted by the choice of $z^{(0)}$. The variable $y^{(0)}$ is always set to $0$, but $z^{(0)}$ is selected randomly from a normal distribution $\mathcal{N}(0, \sigma^2 I)$, where $\sigma > 0$ is an algorithm parameter.
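To tie the pieces together, the following self-contained Python sketch runs Algorithm 1 for a convex quadratic objective $f(x) = \frac{1}{2}x^T A x + b^T x$, combining the proximal formula of Section 4.2.1 with the approximate projection (12). All function names, the test data, and the parameter values are illustrative choices of ours, and the output is only a heuristic approximation, in line with the discussion above.

```python
import numpy as np

def approx_project_cone(x, m):
    """Approximate projection onto R_++^(m,n) via (12): lambda_1^+ c_1 + lambda_2^+ c_2."""
    x_hat, x_bar = x[:m], x[m:]
    nh, nb = np.linalg.norm(x_hat), np.linalg.norm(x_bar)
    u = x_hat / nh if nh > 0 else np.eye(m)[0]
    v = x_bar / nb if nb > 0 else np.eye(x.size - m)[0]
    c1, c2 = 0.5 * np.concatenate([u, v]), 0.5 * np.concatenate([u, -v])
    return max(nh + nb, 0.0) * c1 + max(nh - nb, 0.0) * c2

def admm_heuristic(A, b, m, rho=1.0, sigma=1.0, n_max=200, seed=0):
    """Algorithm 1 for min 0.5 x^T A x + b^T x over R_++^(m,n) (heuristic, no guarantee)."""
    rng = np.random.default_rng(seed)
    n_tot = b.size
    z = rng.normal(0.0, sigma, n_tot)            # z^(0) ~ N(0, sigma^2 I)
    y = np.zeros(n_tot)                          # y^(0) = 0
    M = A + rho * np.eye(n_tot)                  # matrix used in every x-update
    for _ in range(n_max):
        x = np.linalg.solve(M, rho * (z - y) - b)   # (22): prox of the quadratic f
        z = approx_project_cone(x + y, m)           # (23): approximate projection
        y = y + x - z                               # (24): dual update
    return z                                        # z satisfies the SOC inequality by construction

A = np.diag([1.0, 1.0, 2.0, 2.0])
b = np.array([-1.0, -1.0, -3.0, 0.0])
print(admm_heuristic(A, b, m=2))
```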

4.3 On the convergence of the proposed algorithm

While the convergence of the standard ADMM (i.e., the 2-block ADMM) for convex objective functions has long been supported, its convergence for nonconvex objective functions has only recently been proven [52–54].

The ADMM can be directly extended to the case of a multi-block convex minimization problem where the objective function is the sum of more than two separable convex functions. This is both highly desirable and practically advantageous. Chen et al. [55] demonstrated that the direct extension of the ADMM is not always convergent. They provided a necessary requirement to guarantee the convergence of the direct extension of the ADMM and gave an illustration of its divergence. A theoretical result on the extension of the ADMM for solving linearly constrained separable convex programming, whose objective function is separable into $n$ (with $n \ge 3$) individual functions without coupled variables, was presented by Han and Yuan [56]. They demonstrated that the convergence of this extension is legitimate under the additional assumption that the objective function is strongly convex. In the study of Cai et al. [57], the authors examined the convergence of the three-block instance of such a model. They demonstrated that if the objective function is strongly convex, the penalty parameter is appropriately restricted, and a few presumptions about the operators in the constraints hold, the convergence of the three-block direct extension of ADMM may be guaranteed. Hong and Luo [58] have analyzed the convergence and the rate of convergence of the classical ADMM in the absence of strong convexity when the number of variable blocks is more than two.

Wang et al. [59] examined the convergence of the ADMM for minimizing a nonconvex and potentially nonsmooth objective function under linked linear equality constraints. Taking advantage of the result in the study of Wang et al. [59], Mei et al. [60] applied an ADMM to solve a nonconvex variational optimization problem with a convergence guarantee.

It has been discovered that the multi-block ADMM, a natural extension of ADMM, is a very effective strategy for resolving various nonconvex optimization problems. Thus, it is anticipated that the multi-block ADMM convergence theory will be established under nonconvex frameworks. A Bregman modification of 3-block ADMM and a proof of its convergence for a large family of nonconvex functions were reported by Wang et al. [61]. They further expanded the convergence findings to the $n$-block case (with $n \ge 3$), demonstrating the viability of multi-block ADMM applications in nonconvex settings. Yashtini [62] studied the convergence and convergence rates of a multi-block proximal ADMM for solving linearly constrained separable nonconvex nonsmooth optimization problems within the framework of the Kurdyka-Łojasiewicz property.

Bartz et al. [63] studied an adaptive version of ADMM that incorporated generalized notions of convexity and varying penalty parameters adapted to the convexity constants of the functions. The authors proved convergence under natural assumptions and illustrated their approach through numerical experiments on a signal denoising problem.

Yang et al. [64] introduced a distributed algorithm tailored for nonconvex and nonsmooth problems characterized by both separate and composite objective components, local bounded convex constraints, and coupled linear constraints. The authors proposed a proximal ADMM variant that updated dual variables in a discounted manner, establishing convergence to approximate stationary points. The efficacy of this method was demonstrated through applications such as multi-zone heating, ventilation, and air-conditioning control in smart buildings.

Barber and Sidky [65] provided convergence guarantees for ADMM under a restricted strong convexity assumption, without requiring smoothness or differentiability. The authors validate their theoretical results with simulated examples, including computed tomography image reconstruction problems, where both objective functions are nondifferentiable.

Zeng et al. [66] proposed a unified framework of inexact stochastic ADMM for solving nonconvex problems subject to linear constraints, where the objective comprised an average of finite-sum smooth functions and a nonsmooth but possibly nonconvex function. The framework covered several existing algorithms and guided the design of a novel accelerated hybrid stochastic ADMM algorithm. Xue et al. [67] presented a refined iteration of the distributed Bregman ADMM, designed to tackle nonconvex consensus issues, particularly those with multiple blocks. They established robust convergence under specific conditions and demonstrated notable efficiency enhancements in the refined algorithm.

Now, we discuss some fundamental yet extremely general convergence results of ADMM for solving Problem (13). Boyd et al. [11] demonstrated that the convex ADMM converges under the following two conditions.

Assumption 4.1

The functions $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ are closed, proper, and convex.

Assumption 4.2

The unaugmented Lagrangian $L_0$ has a saddle point.

But Chen et al. [68] indicated that this is incorrect. They actually provided a counterexample of a convex problem that satisfies both Assumptions 4.1 and 4.2 to demonstrate that the claim made by Boyd et al. [11] regarding the convergence of the alternating direction method of multipliers for solving linearly constrained convex optimization problems is false. They found that the subproblems in (14) are unsolvable for the proposed example. This suggests that Boyd et al.’s [11] convergence analysis is flawed. Chen et al. [68] claimed that the following extra assumption is required.

Assumption 4.3

The subproblems of the ADMM are solvable and have non-empty bounded solution sets.

Assumption 4.2 indicates that there exists $(x^*, z^*, y^*)$, not necessarily unique, such that

$L_0(x^*, z^*, y) \le L_0(x^*, z^*, y^*) \le L_0(x, z, y^*)$

holds true for all $x$, $z$, $y$.

As a result, the convergence of the ADMM is guaranteed by the presence of the three assumptions. Nevertheless, it is unclear and still an open question whether the ADMM heuristic will converge for optimization problems over nonconvex sets (see the following remark [11]).

Remark 4.2

We explore the use of the ADMM for nonconvex problems, focusing on cases in which the individual steps in the ADMM, i.e., the x - and z -updates, can be carried out exactly. Even in these cases, the ADMM need not converge, and when it does converge, it need not converge to an optimal point; it must be considered just another local optimization method. The hope is that it will possibly have better convergence properties than other local optimization methods, where “better convergence” can mean faster convergence or convergence to a point with better objective value. For nonconvex problems, the ADMM can converge to different (and in particular, nonoptimal) points, depending on the initial variables and the parameter ρ .

5 Conclusion

In this article, we have exploited the special algebraic structure of the nonconvex SOC to present a closed-approximate form for the projection onto this cone and to express it via spectral decomposition. In order to solve the nonconvex SOC programming problem, which is the main concern of this article, we have employed a heuristic technique based on the ADMM. The approach is constructed in two stages: more specifically, a convex optimization problem comes first, followed by a nonconvex optimization problem.

The topic of convergence is still open. When the ADMM is used to solve convex programming problems, a sequence of convex subproblems, along with a dual update, is produced, and it is guaranteed that the ADMM algorithm will converge to a global optimal solution. In contrast, there is no affirmation of convergence for a heuristic ADMM applied to optimization problems over nonconvex sets, since one of the steps demands a difficult-to-solve nonconvex problem. Although the convergence of the ADMM-based heuristic algorithm for nonconvex SOC programming is not theoretically assured, our upcoming research will keep trying to examine and enhance the convergence properties of the proposed method.

Acknowledgments

We thank the anonymous expert referees for their careful reading of this article and their suggestions. These comments helped to significantly improve the presentation of this article.

  1. Funding information: The authors state no funding involved.

  2. Author contributions: LB: conceptualization, methodology, algorithm design, formal analysis, writing – original draft. BA: supervision, project administration, writing – review, and editing.

  3. Conflict of interest: The authors state no conflict of interest.

  4. Data availability statement: Data sharing does not apply to this article as no datasets were generated or analyzed during this study.

References

[1] R. Pytlak, Conjugate gradient algorithms in nonconvex optimization. Springer Science & Business Media, Berlin, Heidelberg, 2008, vol. 89.

[2] S. K. Mishra, Topics in nonconvex optimization. Springer, Berlin, Heidelberg, 2011. doi: 10.1007/978-1-4419-9640-4.

[3] R. G. Strongin and Y. D. Sergeyev, Global optimization with non-convex constraints: Sequential and parallel algorithms. Springer Science & Business Media, Berlin, Heidelberg, 2013, vol. 45.

[4] A. J. Zaslavski, Nonconvex optimal control and variational problems. Springer, Berlin, Heidelberg, 2013. doi: 10.1007/978-1-4614-7378-7.

[5] E. S. Mistakidis and G. E. Stavroulakis, Nonconvex optimization in mechanics: algorithms, heuristics and engineering applications by the FEM. Springer Science & Business Media, Berlin, Heidelberg, 2013, vol. 21.

[6] M. Fukushima, “Application of the alternating direction method of multipliers to separable convex programming problems,” Comput. Optim. Appl., vol. 1, pp. 93–111, 1992. doi: 10.1007/BF00247655.

[7] J. Eckstein, “Parallel alternating direction multiplier decomposition of convex programs,” J. Optim. Theory Appl., vol. 80, no. 1, pp. 39–62, 1994. doi: 10.1007/BF02196592.

[8] M. K. Ng, P. Weiss, and X. Yuan, “Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods,” SIAM J. Sci. Comput., vol. 32, no. 5, pp. 2710–2736, 2010. doi: 10.1137/090774823.

[9] J. Yang and Y. Zhang, “Alternating direction algorithms for ℓ1-problems in compressive sensing,” SIAM J. Sci. Comput., vol. 33, no. 1, pp. 250–278, 2011. doi: 10.1137/090777761.

[10] D. Bertsekas and J. Tsitsiklis, Parallel and distributed computation: numerical methods. Athena Scientific, Belmont, Massachusetts, USA, 2015.

[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends® Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.

[12] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, “Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing,” SIAM J. Imag. Sci., vol. 1, no. 1, pp. 143–168, 2008. doi: 10.1137/070703983.

[13] H. H. Bauschke and J. M. Borwein, “Dykstra’s alternating projection algorithm for two sets,” J. Approx. Theory, vol. 79, no. 3, pp. 418–443, 1994. doi: 10.1006/jath.1994.1136.

[14] S. You and Q. Peng, “A non-convex alternating direction method of multipliers heuristic for optimal power flow,” in 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), IEEE, 2014, pp. 788–793. doi: 10.1109/SmartGridComm.2014.7007744.

[15] S. Diamond, R. Takapoui, and S. Boyd, “A general system for heuristic minimization of convex functions over non-convex sets,” Optim. Methods Softw., vol. 33, no. 1, pp. 165–193, 2018. doi: 10.1080/10556788.2017.1304548.

[16] C. Moreira Costa, D. Kreber, and M. Schmidt, “An alternating method for cardinality-constrained optimization: A computational study for the best subset selection and sparse portfolio problems,” INFORMS J. Comput., vol. 34, no. 6, pp. 2968–2988, 2022. doi: 10.1287/ijoc.2022.1211.

[17] R. Chartrand and B. Wohlberg, “A nonconvex ADMM algorithm for group sparsity with sparse groups,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 6009–6013. doi: 10.1109/ICASSP.2013.6638818.

[18] Z. Wen, C. Yang, X. Liu, and S. Marchesini, “Alternating direction methods for classical and ptychographic phase retrieval,” Inverse Probl., vol. 28, no. 11, p. 115010, 2012. doi: 10.1088/0266-5611/28/11/115010.

[19] Y. Xu, W. Yin, Z. Wen, and Y. Zhang, “An alternating direction algorithm for matrix completion with nonnegative factors,” Front. Math. China, vol. 7, pp. 365–384, 2012. doi: 10.1007/s11464-012-0194-5.

[20] Y. Shen, Z. Wen, and Y. Zhang, “Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization,” Optim. Methods Softw., vol. 29, no. 2, pp. 239–263, 2014. doi: 10.1080/10556788.2012.700713.

[21] A. P. Liavas and N. D. Sidiropoulos, “Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5450–5463, 2015. doi: 10.1109/TSP.2015.2454476.

[22] R. Lai and S. Osher, “A splitting method for orthogonality constrained problems,” J. Sci. Comput., vol. 58, pp. 431–449, 2014. doi: 10.1007/s10915-013-9740-x.

[23] B. Alzalg and L. Benakkouche, “The nonconvex second-order cone: Algebraic structure toward optimization,” J. Optim. Theory Appl., vol. 201, pp. 631–667, 2024. doi: 10.1007/s10957-024-02406-5.

[24] B. Alzalg and L. Benakkouche, “Functions and inequalities associated with the nonconvex second-order cone,” Optim. Lett., 2025. doi: 10.1007/s11590-025-02243-z.

[25] H. H. Bauschke and J. M. Borwein, “On projection algorithms for solving convex feasibility problems,” SIAM Rev., vol. 38, no. 3, pp. 367–426, 1996. doi: 10.1137/S0036144593251710.

[26] M. Ujvári, “On the projection onto a finitely generated cone,” Acta Cybern., vol. 22, no. 3, pp. 657–672, 2016. doi: 10.14232/actacyb.22.3.2016.7.

[27] X. Hu, “An exact algorithm for projection onto a polyhedral cone,” Aust. N. Z. J. Stat., vol. 40, no. 2, pp. 165–170, 1998. doi: 10.1111/1467-842X.00018.

[28] R. Orsi, “Numerical methods for solving inverse eigenvalue problems for nonnegative matrices,” SIAM J. Matrix Anal. Appl., vol. 28, no. 1, pp. 190–212, 2006. doi: 10.1137/050634529.

[29] N. I. Achieser, Theory of approximation. Ungar, New York, 1956.

[30] E. W. Cheney, Introduction to approximation theory. McGraw-Hill Book Company, New York, USA, 1966.

[31] P. J. Davis, Interpolation and approximation. Blaisdell, New York, 1963.

[32] T. J. Rivlin, An introduction to the approximation of functions. Blaisdell, Waltham, MA, 1969. doi: 10.2307/2004443.

[33] N. Aronszajn and K. T. Smith, “Invariant subspaces of completely continuous operators,” Ann. Math., vol. 60, no. 2, pp. 345–350, 1954. doi: 10.2307/1969637.

[34] F. Deutsch, Best approximation in inner product spaces. Springer, Berlin, Heidelberg, 2001, vol. 7. doi: 10.1007/978-1-4684-9298-9.

[35] Y. Guo, CQ algorithms: theory, computations and nonconvex extensions, Ph.D. dissertation, University of British Columbia, 2014.

[36] J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization: Theory and examples, ser. CMS Books in Mathematics. Springer, New York, 2005. [Online]. Available: https://books.google.jo/books?id=TXWzqEkAa7IC. doi: 10.1007/978-0-387-31256-9.

[37] C. Tisseron, Notions de topologie. Introduction aux espaces fonctionnels. Hermann, Paris, 1985.

[38] I. M. Sezan, “An overview of convex projections theory and its application to image recovery problems,” Ultramicroscopy, vol. 40, no. 1, pp. 55–67, 1992. doi: 10.1016/0304-3991(92)90234-B.

[39] M. Tawarmalani and N. V. Sahinidis, Convexification and global optimization in continuous and mixed-integer nonlinear programming: theory, algorithms, software, and applications. Springer Science & Business Media, Berlin, Heidelberg, 2013, vol. 65.

[40] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends® Optim., vol. 1, no. 3, pp. 127–239, 2014. doi: 10.1561/2400000003.

[41] R. B. Holmes, “Smoothness of certain metric projections on Hilbert space,” Trans. Am. Math. Soc., vol. 184, pp. 87–100, 1973. doi: 10.1090/S0002-9947-1973-0326252-2.

[42] S. Fitzpatrick and R. R. Phelps, “Differentiability of the metric projection in Hilbert space,” Trans. Am. Math. Soc., vol. 270, no. 2, pp. 483–501, 1982. doi: 10.1090/S0002-9947-1982-0645326-5.

[43] A. Shapiro, “Existence and differentiability of metric projections in Hilbert spaces,” SIAM J. Optim., vol. 4, no. 1, pp. 130–141, 1994. doi: 10.1137/0804006.

[44] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms I: Fundamentals. Springer Science & Business Media, Berlin, Heidelberg, 2013, vol. 305.

[45] R. T. Rockafellar and R. J.-B. Wets, Variational analysis. Springer Science & Business Media, Berlin, Heidelberg, 2009, vol. 317.

[46] A. Beck, First-order methods in optimization. SIAM, Philadelphia, Pennsylvania, USA, 2017. doi: 10.1137/1.9781611974997.

[47] J.-J. Moreau, “Fonctions convexes duales et points proximaux dans un espace Hilbertien,” C. R. Hebd. Séances Acad. Sci., vol. 255, pp. 2897–2899, 1962.

[48] J.-J. Moreau, “Proximité et dualité dans un espace Hilbertien,” Bull. Soc. Math. Fr., vol. 93, pp. 273–299, 1965. doi: 10.24033/bsmf.1625.

[49] M. Fukushima and H. Mine, “A generalized proximal point algorithm for certain non-convex minimization problems,” Int. J. Syst. Sci., vol. 12, no. 8, pp. 989–1000, 1981. doi: 10.1080/00207728108963798.

[50] A. Kaplan and R. Tichatschke, “Proximal point methods and nonconvex optimization,” J. Global Optim., vol. 13, pp. 389–406, 1998. doi: 10.1023/A:1008321423879.

[51] J. Nocedal and S. J. Wright, Numerical optimization. Springer, 1999. doi: 10.1007/b98874.

[52] G. Li and T. K. Pong, “Global convergence of splitting methods for nonconvex composite optimization,” SIAM J. Optim., vol. 25, no. 4, pp. 2434–2460, 2015. doi: 10.1137/140998135.

[53] M. Hong, Z.-Q. Luo, and M. Razaviyayn, “Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems,” SIAM J. Optim., vol. 26, no. 1, pp. 337–364, 2016. doi: 10.1137/140990309.

[54] R. I. Boţ and D.-K. Nguyen, “The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates,” Math. Oper. Res., vol. 45, no. 2, pp. 682–712, 2020. doi: 10.1287/moor.2019.1008.

[55] C. Chen, B. He, Y. Ye, and X. Yuan, “The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent,” Math. Program., vol. 155, no. 1–2, pp. 57–79, 2016. doi: 10.1007/s10107-014-0826-5.

[56] D. Han and X. Yuan, “A note on the alternating direction method of multipliers,” J. Optim. Theory Appl., vol. 155, pp. 227–238, 2012. doi: 10.1007/s10957-012-0003-z.

[57] X. Cai, D. Han, and X. Yuan, “On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function,” Comput. Optim. Appl., vol. 66, pp. 39–73, 2017. doi: 10.1007/s10589-016-9860-y.

[58] M. Hong and Z.-Q. Luo, “On the linear convergence of the alternating direction method of multipliers,” Math. Program., vol. 162, no. 1–2, pp. 165–199, 2017. doi: 10.1007/s10107-016-1034-2.

[59] Y. Wang, W. Yin, and J. Zeng, “Global convergence of ADMM in nonconvex nonsmooth optimization,” J. Sci. Comput., vol. 78, pp. 29–63, 2019. doi: 10.1007/s10915-018-0757-z.

[60] J.-J. Mei, Y. Dong, T.-Z. Huang, and W. Yin, “Cauchy noise removal by nonconvex ADMM with convergence guarantees,” J. Sci. Comput., vol. 74, pp. 743–766, 2018. doi: 10.1007/s10915-017-0460-5.

[61] F. Wang, W. Cao, and Z. Xu, “Convergence of multi-block Bregman ADMM for nonconvex composite problems,” Sci. China Inform. Sci., vol. 61, pp. 1–12, 2018. doi: 10.1007/s11432-017-9367-6.

[62] M. Yashtini, “Multi-block nonconvex nonsmooth proximal ADMM: Convergence and rates under Kurdyka-Łojasiewicz property,” arXiv e-prints, arXiv:2009.04014, Sep. 2020. doi: 10.1007/s10957-021-01919-7.

[63] S. Bartz, R. Campoy, and H. M. Phan, “An adaptive alternating direction method of multipliers,” J. Optim. Theory Appl., vol. 195, no. 3, pp. 1019–1055, 2022. doi: 10.1007/s10957-022-02098-9.

[64] Y. Yang, Q.-S. Jia, Z. Xu, X. Guan, and C. J. Spanos, “Proximal ADMM for nonconvex and nonsmooth optimization,” Automatica, vol. 146, p. 110551, 2022. doi: 10.1016/j.automatica.2022.110551.

[65] R. F. Barber and E. Y. Sidky, “Convergence for nonconvex ADMM, with applications to CT imaging,” J. Mach. Learn. Res., vol. 25, no. 38, pp. 1–46, 2024.

[66] Y. Zeng, J. Bai, S. Wang, and Z. Wang, “A unified inexact stochastic ADMM for composite nonconvex and nonsmooth optimization,” 2024, arXiv:2403.02015. doi: 10.1016/j.automatica.2024.111554.

[67] Z. Xue, Q. Ma, and Y. Dang, “A modified approach to distributed Bregman ADMM for a class of nonconvex consensus problems,” J. Math., vol. 2025, pp. 1–12, 2025. doi: 10.1155/jom/9558795.

[68] L. Chen, D. Sun, and K.-C. Toh, “A note on the convergence of ADMM for linearly constrained convex optimization problems,” Comput. Optim. Appl., vol. 66, pp. 327–343, 2017. doi: 10.1007/s10589-016-9864-7.

Received: 2024-08-06
Revised: 2025-03-14
Accepted: 2025-07-04
Published Online: 2025-09-24

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
