
Refining uniform approximation algorithm for low-rank Chebyshev embeddings

Stanislav Morozov, Dmitry Zheltkov, and Alexander Osinsky
Published/Copyright: October 31, 2024

Abstract

Nowadays, low-rank approximations are a critical component of many numerical procedures. Traditionally, the problem of low-rank matrix approximation is solved in unitarily invariant norms, such as the Frobenius or spectral norm, because efficient methods for constructing such approximations exist. However, recent results reveal the potential of low-rank approximations in the Chebyshev norm, which arises naturally in many applications. In this paper, we investigate the problem of uniform approximation of vectors, which is the main component of low-rank matrix approximation in the Chebyshev norm. The principal novelty of this paper is an accelerated algorithm for solving the uniform approximation problem. We also analyze the iterative procedure of the proposed algorithm and demonstrate that it has a geometric convergence rate. Finally, we provide an extensive numerical evaluation, which demonstrates the effectiveness of the proposed procedures.

MSC 2010: 65F30

Funding statement: This work was supported by the Moscow Center of Fundamental and Applied Mathematics at INM RAS (Agreement with the Ministry of Education and Science of the Russian Federation No. 075-15-2022-286).


A Proof of the lemma

In this section we provide the proof of Lemma 4.1. We will need several technical lemmas.

Lemma A.1

For any pairwise distinct i, j, and k, if

$$\operatorname{sign}\bigl(\hat D_{ji} D_j \hat D_{ki} D_k\bigr) = -1$$

then

$$\operatorname{sign}\bigl(\hat D_{ij} D_i \hat D_{kj} D_k\bigr) = 1.$$

Proof

Let us assume the opposite, that $\operatorname{sign}(\hat D_{ij} D_i \hat D_{kj} D_k) = -1$. Since $\hat D_{ij} = (-1)^{i-j+1} \hat D_{ji}$, we have

$$\operatorname{sign}\bigl(\hat D_{ji} D_j \hat D_{ki} D_k\bigr) = -1, \qquad \operatorname{sign}\bigl(\hat D_{ji} D_i \hat D_{kj} D_k\bigr) = (-1)^{i-j}.$$

Let $\operatorname{sign}(D_k \hat D_{ji}) = \delta$, where $\delta = \pm 1$. Then

$$\operatorname{sign}(D_j \hat D_{ki}) = -\delta, \qquad \operatorname{sign}(D_i \hat D_{kj}) = (-1)^{i-j}\delta, \qquad \operatorname{sign}(D_k \hat D_{ji}) = \delta.$$

Since $V$ is a Chebyshev matrix, the vector $v_i$ can be uniquely represented as

$$v_i = \alpha v_j + \gamma v_k + z_1 \tag{A.1}$$

where $z_1$ belongs to the linear hull of those vectors among $v_1, \dots, v_{r+1}$ that are distinct from $v_i$, $v_j$, and $v_k$. We also represent

$$v_j = \beta v_i + z_2 \tag{A.2}$$

where $z_2$ belongs to the linear hull of those vectors among $v_1, \dots, v_{r+1}, v_{r+2}$ that are distinct from $v_i$, $v_j$, and $v_k$.

From (A.1) and the properties of the determinant we obtain

$$D_j = \alpha (-1)^{i-j+1} D_i.$$

Similarly, from (A.2) and the properties of the determinant we deduce that

$$\hat D_{ki} = -\beta \hat D_{kj}.$$

Hence,

$$D_j \hat D_{ki} = \alpha (-1)^{i-j+1} D_i \,\bigl(-\beta \hat D_{kj}\bigr) = \alpha\beta (-1)^{i-j} D_i \hat D_{kj}.$$

Taking the signs of both sides we get

$$-\delta = \operatorname{sign}(\alpha\beta)\,(-1)^{i-j}(-1)^{i-j}\,\delta$$

whence

$$\operatorname{sign}(\alpha\beta) = -1.$$

Note that the coefficients in (A.1) can be expressed by Cramer's formulas, and since $V$ is a Chebyshev matrix, we have $\alpha, \gamma \neq 0$. Then we can express the vector $v_k$ from (A.1) as

$$v_k = \frac{1}{\gamma} v_i - \frac{\alpha}{\gamma} v_j - \frac{1}{\gamma} z_1.$$

Then

$$D_j = -\frac{\alpha}{\gamma}(-1)^{k-j+1} D_k = \frac{\alpha}{\gamma}(-1)^{k-j} D_k. \tag{A.3}$$

From (A.1) and (A.2) we have

$$v_j = \beta(\alpha v_j + \gamma v_k + z_1) + z_2$$

hence

$$(1 - \alpha\beta)v_j = \beta\gamma v_k + \beta z_1 + z_2. \tag{A.4}$$

Note that $\beta z_1 + z_2$ is a linear combination of vectors among $v_1, \dots, v_{r+2}$ distinct from $v_i$, $v_j$, and $v_k$. Since $\beta\gamma \neq 0$ by the arguments above, the right-hand side of (A.4) is a non-trivial linear combination of $r$ linearly independent vectors, whence $\beta\gamma v_k + \beta z_1 + z_2 \neq 0$. Therefore, $1 - \alpha\beta \neq 0$. Then

$$v_j = \frac{\beta\gamma}{1 - \alpha\beta} v_k + \frac{\beta z_1 + z_2}{1 - \alpha\beta}$$

and we can derive that

$$\hat D_{ki} = \frac{\beta\gamma}{1 - \alpha\beta}(-1)^{k-j+1}\hat D_{ji} \tag{A.5}$$

so we have from (A.3) and (A.5) that

$$D_j \hat D_{ki} = \frac{\alpha}{\gamma}(-1)^{k-j} D_k\, \frac{\beta\gamma}{1-\alpha\beta}(-1)^{k-j+1}\hat D_{ji} = -\frac{\alpha\beta}{1-\alpha\beta} D_k \hat D_{ji}.$$

Taking the signs of both sides we get

$$-\delta = \operatorname{sign}(\alpha\beta)\,\operatorname{sign}(1-\alpha\beta)\,(-1)\,\delta$$

therefore,

$$\operatorname{sign}(\alpha\beta) = \operatorname{sign}(1-\alpha\beta) = -1$$

which is a contradiction: $\alpha\beta < 0$ implies $1 - \alpha\beta > 0$.□

Lemma A.2

Let all components of the vector $w \in \mathbb{R}^{r+2}$ be non-zero and such that the signs in the sequence

$$w_1 D_1, w_2 D_2, \dots, w_{r+1} D_{r+1}$$

alternate. Let

$$\operatorname{sign}\bigl(\hat w_{ji}\hat D_{ji}\,\hat w_{ki}\hat D_{ki}\bigr) = (-1)^{j-k+1}.$$

Then

  1. if $i = j$, then

     $$\operatorname{sign}\bigl(\hat w_{jk}\hat D_{jk}\,\hat w_{kk}\hat D_{kk}\bigr) = (-1)^{j-k};$$

  2. if $i = k$, then

     $$\operatorname{sign}\bigl(\hat w_{kj}\hat D_{kj}\,\hat w_{jj}\hat D_{jj}\bigr) = (-1)^{j-k};$$

  3. if $i \neq j$ and $i \neq k$, then

     $$\operatorname{sign}\bigl(\hat w_{ij}\hat D_{ij}\,\hat w_{kj}\hat D_{kj}\bigr) = (-1)^{i-k}$$

     and

     $$\operatorname{sign}\bigl(\hat w_{ik}\hat D_{ik}\,\hat w_{jk}\hat D_{jk}\bigr) = (-1)^{i-j}.$$
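As an illustration of the indexing, the first case for the sample triple $(i, j, k) = (1, 1, 3)$ reads as follows, using the values $\hat w_{11} = w_{r+2}$, $\hat D_{11} = D_1$, $\hat w_{31} = w_3$, $\hat w_{13} = w_1$, $\hat w_{33} = w_{r+2}$, and $\hat D_{33} = D_3$ that the proof below relies on:

$$\operatorname{sign}\bigl(w_{r+2} D_1\, w_3 \hat D_{31}\bigr) = (-1)^{1-3+1} = -1 \quad\Longrightarrow\quad \operatorname{sign}\bigl(w_1 \hat D_{13}\, w_{r+2} D_3\bigr) = (-1)^{1-3} = 1.$$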

Proof

Let $j = k$. Then the condition of the lemma cannot be fulfilled, since

$$\operatorname{sign}\bigl(\hat w_{ji}\hat D_{ji}\,\hat w_{ji}\hat D_{ji}\bigr) = 1 \neq (-1)^{j-k+1} = -1.$$

Let $i = j \neq k$. Then the condition of the lemma can be written as

$$\operatorname{sign}\bigl(\hat w_{ii}\hat D_{ii}\,\hat w_{ki}\hat D_{ki}\bigr) = (-1)^{i-k+1}$$

which, by the definitions of $\hat D_{ii}$ and $\hat w_{ii}$, can be rewritten as

$$\operatorname{sign}\bigl(w_{r+2} D_i\, w_k \hat D_{ki}\bigr) = (-1)^{i-k+1}.$$

Since the signs in the sequence

$$w_1 D_1, w_2 D_2, \dots, w_{r+1} D_{r+1}$$

alternate,

$$\operatorname{sign}(w_i D_i\, w_k D_k) = (-1)^{i-k}.$$

Let us multiply the last two equations and get

$$\operatorname{sign}\bigl(w_{r+2} w_i D_k \hat D_{ki}\bigr) = \operatorname{sign}(w_i D_i\, w_k D_k)\,\operatorname{sign}\bigl(w_{r+2} D_i\, w_k \hat D_{ki}\bigr) = (-1)^{i-k}(-1)^{i-k+1} = -1.$$

Since $\hat D_{ik} = (-1)^{i-k+1}\hat D_{ki}$, we have

$$\operatorname{sign}\bigl(w_{r+2} D_k\, w_i \hat D_{ik}\bigr) = (-1)(-1)^{i-k+1} = (-1)^{i-k}.$$

It remains to note that $\hat w_{kk} = w_{r+2}$, $\hat w_{ik} = w_i$, and $\hat D_{kk} = D_k$, whence

$$\operatorname{sign}\bigl(\hat w_{kk}\hat D_{kk}\,\hat w_{ik}\hat D_{ik}\bigr) = (-1)^{i-k}.$$

Thus, the first part of the lemma is proved. The second part can be obtained from the first one by rearranging the factors.

Let us prove the third part. Let i, j, and k be pairwise distinct. Then the condition of the lemma can be written as

$$\operatorname{sign}\bigl(w_j \hat D_{ji}\, w_k \hat D_{ki}\bigr) = (-1)^{j-k+1}. \tag{A.6}$$

Let us prove that in this case

$$\operatorname{sign}\bigl(w_i \hat D_{ij}\, w_k \hat D_{kj}\bigr) = (-1)^{i-k}.$$

On the contrary, let

$$\operatorname{sign}\bigl(w_i \hat D_{ij}\, w_k \hat D_{kj}\bigr) = (-1)^{i-k+1}. \tag{A.7}$$

Multiplying (A.6) and

$$\operatorname{sign}(w_j D_j\, w_k D_k) = (-1)^{j-k}$$

we get

$$\operatorname{sign}\bigl(\hat D_{ji} D_j \hat D_{ki} D_k\bigr) = \operatorname{sign}\bigl(w_j\hat D_{ji}\,w_k\hat D_{ki}\bigr)\,\operatorname{sign}(w_j D_j\, w_k D_k) = (-1)^{j-k+1}(-1)^{j-k} = -1. \tag{A.8}$$

Similarly, multiplying (A.7) and

$$\operatorname{sign}(w_i D_i\, w_k D_k) = (-1)^{i-k}$$

we get

$$\operatorname{sign}\bigl(\hat D_{ij} D_i \hat D_{kj} D_k\bigr) = \operatorname{sign}\bigl(w_i\hat D_{ij}\,w_k\hat D_{kj}\bigr)\,\operatorname{sign}(w_i D_i\, w_k D_k) = (-1)^{i-k+1}(-1)^{i-k} = -1. \tag{A.9}$$

It remains to note that (A.8) and (A.9) contradict Lemma A.1, whence

$$\operatorname{sign}\bigl(w_i \hat D_{ij}\, w_k \hat D_{kj}\bigr) = (-1)^{i-k}. \tag{A.10}$$

If we multiply (A.6) and (A.10), we get

$$\operatorname{sign}\bigl(w_j\hat D_{ji}\,w_k\hat D_{ki}\bigr)\,\operatorname{sign}\bigl(w_i\hat D_{ij}\,w_k\hat D_{kj}\bigr) = \operatorname{sign}\bigl(w_j\hat D_{ki}\,w_i\hat D_{kj}\,\hat D_{ij}\hat D_{ji}\bigr) = (-1)^{i-j+1}\operatorname{sign}\bigl(w_j\hat D_{kj}\,w_i\hat D_{ki}\bigr)$$

whence

$$\operatorname{sign}\bigl(w_j\hat D_{kj}\,w_i\hat D_{ki}\bigr) = (-1)^{i-j+1}(-1)^{j-k+1}(-1)^{i-k} = 1.$$

Due to $\hat D_{ki} = (-1)^{k-i+1}\hat D_{ik}$ and $\hat D_{kj} = (-1)^{j-k+1}\hat D_{jk}$, we get

$$\operatorname{sign}\bigl(w_j\hat D_{jk}\,w_i\hat D_{ik}\bigr) = (-1)^{k-i+1}(-1)^{j-k+1} = (-1)^{j-i}. \tag{A.11}$$

It remains to note that (A.10) and (A.11), up to notation, correspond to the statement of the lemma.□

Lemma A.3

Let a matrix $S \in \mathbb{R}^{r \times r}$ be such that $s_{ij} \in \{-1, 1\}$ and the following property holds: if for a triple (i, j, k), where $1 \leqslant i, j, k \leqslant r$, we have $s_{ij}s_{ik} = (-1)^{j-k+1}$, then

  1. if $i = j$, then

     $$s_{ki}s_{kk} = (-1)^{i-k};$$

  2. if $i = k$, then

     $$s_{ji}s_{jj} = (-1)^{i-j};$$

  3. if i, j, and k are pairwise distinct, then

     $$s_{ji}s_{jk} = (-1)^{i-k}$$

     and

     $$s_{ki}s_{kj} = (-1)^{i-j}.$$

Then the matrix S has a row with alternating signs.

Proof

We prove the lemma by induction on r. Let r = 2. If $s_{11}s_{12} = -1$, then the first row has alternating signs. Otherwise, $s_{11}s_{12} = 1$ and the condition of the lemma is fulfilled for i = 1, j = 1, k = 2. Then

$$s_{21}s_{22} = -1$$

and the second row has alternating signs.

Let us assume the statement is true for r − 1 and prove it for a matrix of size r × r. If the condition of the lemma is fulfilled for a matrix, it is also fulfilled for its leading principal submatrix of order r − 1. By the induction hypothesis, this submatrix contains a row t with alternating signs. If $s_{t,r-1}s_{t,r} = -1$, then the signs alternate in the row t of the whole matrix. Let $s_{t,r-1}s_{t,r} = 1$. Then, due to the alternation of signs in the row t, we have

$$s_{t,i}s_{t,r} = (-1)^{i-r+1}, \qquad i = 1, \dots, r-1.$$

Then for i = t we have

$$s_{r,t}s_{r,r} = (-1)^{t-r}$$

and for $i \neq t$ we have

$$s_{r,t}s_{r,i} = (-1)^{t-i}, \qquad i \neq t$$

so

$$s_{r,t}s_{r,i} = (-1)^{t-i}, \qquad i = 1, \dots, r$$

which corresponds to the alternation of signs in the last row.□
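Since Lemma A.3 is a purely finite statement about $\pm 1$ matrices, it can also be checked exhaustively for small orders. Below is a minimal brute-force sketch in Python; the indices are 0-based, which leaves all parities in the exponents unchanged, and the helper names are illustrative rather than taken from the paper.

```python
import itertools

def has_property(S, r):
    """Check the sign property of Lemma A.3 for a +/-1 matrix S (0-based)."""
    for i, j, k in itertools.product(range(r), repeat=3):
        # Hypothesis of the property: s_ij * s_ik = (-1)^(j-k+1).
        if S[i][j] * S[i][k] != (-1) ** ((j - k + 1) % 2):
            continue
        if i == j:
            if S[k][i] * S[k][k] != (-1) ** ((i - k) % 2):
                return False
        elif i == k:
            if S[j][i] * S[j][j] != (-1) ** ((i - j) % 2):
                return False
        elif j != k:  # i, j, k pairwise distinct
            if S[j][i] * S[j][k] != (-1) ** ((i - k) % 2):
                return False
            if S[k][i] * S[k][j] != (-1) ** ((i - j) % 2):
                return False
    return True

def has_alternating_row(S, r):
    """Check whether some row of S has alternating signs."""
    return any(all(S[t][i] * S[t][i + 1] == -1 for i in range(r - 1))
               for t in range(r))

r = 3
for bits in itertools.product((-1, 1), repeat=r * r):
    S = [list(bits[m * r:(m + 1) * r]) for m in range(r)]
    if has_property(S, r):
        assert has_alternating_row(S, r)  # Lemma A.3 for this S
print("Lemma A.3 verified for all sign matrices of order", r)
```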

Proof of Lemma 4.1

Let us define the matrix $S \in \mathbb{R}^{(r+1)\times(r+1)}$ by $s_{ki} = \operatorname{sign}\bigl(\hat w_{ik}\hat D_{ik}\bigr)$. It is easy to see that the conditions of Lemma A.2 are met, from which the requirements of Lemma A.3 follow, which in turn implies the statement of Lemma 4.1.□
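As a usage illustration, once such a sign matrix $S$ is formed, the row promised by Lemma A.3 can be located by a direct scan. The sketch below uses a hand-made toy $\pm 1$ matrix rather than signs of actual minors $\hat D_{ik}$:

```python
import numpy as np

def alternating_row(S):
    """Return the index of a row of the +/-1 matrix S whose signs alternate.

    Lemma A.3 guarantees that such a row exists whenever S satisfies
    the sign property established via Lemma A.2.
    """
    r = S.shape[0]
    for t in range(r):
        if all(S[t, i] * S[t, i + 1] == -1 for i in range(r - 1)):
            return t
    raise ValueError("no row with alternating signs")

# Toy +/-1 matrix: row 1 alternates as (1, -1, 1).
S = np.array([[ 1,  1, -1],
              [ 1, -1,  1],
              [-1, -1, -1]])
print(alternating_row(S))  # prints 1
```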

Received: 2024-08-21
Accepted: 2024-08-29
Published Online: 2024-10-31
Published in Print: 2024-11-26

© 2024 Walter de Gruyter GmbH, Berlin/Boston
