Polynomial sequences in discrete nilpotent groups of step 2

Alexandru D. Ionescu; Ákos Magyar; Mariusz Mirek; Tomasz Z. Szarek

doi:10.1515/ans-2023-0085


Published/Copyright: August 7, 2023

Published by De Gruyter Brill

From the journal Advanced Nonlinear Studies Volume 23 Issue 1

Abstract

We discuss some of our work on averages along polynomial sequences in nilpotent groups of step 2. Our main results include boundedness of associated maximal functions and singular integral operators, an almost everywhere pointwise convergence theorem for ergodic averages along polynomial sequences, and a nilpotent Waring theorem. Our proofs are based on analytical tools, such as a nilpotent Weyl inequality, and on complex almost-orthogonality arguments that are designed to replace Fourier transform tools, which are not available in the noncommutative nilpotent setting. In particular, we present what we call a nilpotent circle method that allows us to adapt some of the ideas of the classical circle method to the setting of nilpotent groups.

Keywords: discrete nilpotent groups; pointwise ergodic theorems; nilpotent circle method; Weyl inequality

MSC 2010: 37A30; 37A45; 42B25

1 Introduction

The goal of this article is twofold. We first review some recent results on averages of functions along polynomial sequences in discrete nilpotent Lie groups of step 2, and the main ideas in the proofs. Then we use one of the main ingredients, a nilpotent Weyl inequality, to prove a new theorem on a nilpotent version of the Waring problem.

The natural general setting for our analysis consists of a discrete nilpotent group $G$ of step $d$, which by definition is assumed to be a discrete, co-compact subgroup of a connected and simply connected nilpotent Lie group $G^\#$ of step $d$, and a polynomial sequence $A:\mathbb{Z}\to G$, which is a map satisfying $A(0)=1$ and $D^{k_0}A\equiv 1$ for some $k_0\ge 1$. Here, $D^k$ is the $k$-fold differencing operator defined recursively by

$$D^0A(n) := A(n), \qquad D^{k+1}A(n) := D^kA(n)^{-1}\,D^kA(n+1), \qquad n\in\mathbb{Z}.$$
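As a concrete check of this definition, the following Python sketch (our illustration, not code from [35] or [37]) implements the group law (1.6) of the universal group $G_0(d)$ from Section 1.2, stores elements as dictionaries keyed by the index set $Y_d$, and applies the differencing operator $D$ to the moment curve $A_0$ from (1.11). The helper names `mul`, `inv`, `A0` and `iterated_difference` are hypothetical. For $d=2$ one can check by hand that $D^3A_0\equiv(0,0,-2)$ and $D^4A_0\equiv 1$, so $k_0=4$ works in that case; the script confirms this on a range of $n$.

```python
def index_set(d):
    """Y_d = {(l1, l2): 0 <= l2 < l1 <= d}, listed in a fixed order."""
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6): coordinates add, plus x_{l1 0} * y_{l2 0} in the central layer."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    """Inverse: g^{-1} = (-g^(1), -g^(2) + R_0(g^(1), g^(1)))."""
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    """Moment curve (1.11): t^{l1} in the first layer, 0 in the central layer."""
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def difference(seq, d):
    """One application of D: (D seq)(n) = seq(n)^{-1} * seq(n + 1)."""
    return lambda n: mul(inv(seq(n), d), seq(n + 1), d)


def iterated_difference(seq, k, d):
    """D^k seq (each evaluation calls seq 2^k times, fine for small k)."""
    for _ in range(k):
        seq = difference(seq, d)
    return seq


def is_identity(g):
    return all(v == 0 for v in g.values())


if __name__ == "__main__":
    d = 2
    curve = lambda n: A0(n, d)
    for k in range(5):
        print(k, iterated_difference(curve, k, d)(3))
```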

We consider a class of operators defined by taking averages along polynomial sequences in discrete nilpotent groups. As in the continuous case, one can consider discrete maximal operators, which have applications to pointwise ergodic theorems, and discrete Calderón-Zygmund operators.

1.1 The main theorem

Our main theorem in this article concerns $\ell^p$ boundedness of maximal averages along polynomial sequences in discrete nilpotent groups of step 2, $L^p$ pointwise ergodic theorems, and $\ell^2$ boundedness of singular integrals. More precisely:

Theorem 1.1

(Main result) Assume that $G$ is a discrete nilpotent group of step 2 and $A:\mathbb{Z}\to G$ is a polynomial sequence. Then:

(i) ($\ell^p$ boundedness of maximal averages) Assume $f:G\to\mathbb{C}$ is a function, and let

$$\mathcal{M}f(g) := \sup_{N\ge 0}\frac{1}{2N+1}\sum_{|n|\le N}|f(A^{-1}(n)\cdot g)|, \qquad g\in G.$$

Then, for any $p\in(1,\infty]$,

$$\|\mathcal{M}f\|_{\ell^p(G)} \lesssim_p \|f\|_{\ell^p(G)}.$$

(ii) ($L^p$ pointwise ergodic theorems) Assume $G$ acts by measure-preserving transformations on a $\sigma$-finite measure space $X$, $f\in L^p(X)$, $p\in(1,\infty)$, and let

$$A_Nf(x) := \frac{1}{2N+1}\sum_{|n|\le N} f(A^{-1}(n)\cdot x), \qquad x\in X. \tag{1.1}$$

Then the sequence $A_Nf$ converges pointwise almost everywhere and in the $L^p$ norm as $N\to\infty$.

(iii) ($\ell^2$ boundedness of singular averages) Assume $K:\mathbb{R}\to\mathbb{R}$ is a Calderón-Zygmund kernel, i.e., a $C^1$ function satisfying

$$\sup_{t\in\mathbb{R}}\big[(1+|t|)|K(t)| + (1+|t|)^2|K'(t)|\big]\le 1, \qquad \sup_{N\ge 0}\Big|\int_{-N}^{N}K(t)\,dt\Big|\le 1. \tag{1.2}$$

Assume that $f:G\to\mathbb{C}$ is a (compactly supported) function, and let

$$Hf(g) := \sum_{n\in\mathbb{Z}}K(n)\,f(A^{-1}(n)\cdot g), \qquad g\in G.$$

Then

$$\|Hf\|_{\ell^2(G)} \lesssim \|f\|_{\ell^2(G)}.$$

The theorem follows by combining the main results in [35] for parts (i) and (ii) and in [37] for part (iii). We now discuss some connections between this theorem and other related results in the literature.

1.1.1 Continuous Radon transforms

The discrete maximal averages and the discrete singular averages defined in Theorem 1.1 can be thought of as discrete analogs of the continuous Radon transforms, which are averages along suitable curves or surfaces. The theory of continuous Radon transforms has been extensively studied, motivated mainly by problems at the interface of Fourier analysis and geometry of surfaces in Euclidean spaces or nilpotent groups, and is very well understood. This includes $L^q$ estimates for the full range of exponents $q>1$ and multidimensional averages; see, e.g., [17,18,51].

1.1.2 The Furstenberg-Bergelson-Leibman conjecture

Discrete averages, both of the maximal and singular type, have been considered, motivated mainly by open problems in ergodic theory. A fundamental problem in ergodic theory is to establish convergence in norm and pointwise almost everywhere for the polynomial ergodic averages (1.1) as $N\to\infty$ for functions $f\in L^p(X)$, $1\le p\le\infty$. The problem goes back at least to the early 1930s, with von Neumann’s mean ergodic theorem [56] and Birkhoff’s pointwise ergodic theorem [10], and it led to profound developments such as Bourgain’s polynomial pointwise ergodic theorem [11–13] and, in particular, Furstenberg’s ergodic proof [25] of Szemerédi’s theorem [53]. Furstenberg’s proof was also the starting point of ergodic Ramsey theory, which resulted in many natural generalizations of Szemerédi’s theorem, including a polynomial Szemerédi theorem of Bergelson and Leibman [7].

This motivates the following far-reaching conjecture, known as the Furstenberg-Bergelson-Leibman conjecture [8, Section 5.5, p. 468].

Conjecture 1.2

Assume that $d,k\ge 1$ are integers, $(X,\mathcal{B}(X),\mu)$ is a probability space, and $T_1,\dots,T_d:X\to X$ is a given family of invertible measure-preserving transformations on $(X,\mathcal{B}(X),\mu)$ that generates a nilpotent group of step $k$. Assume that $m\ge 1$ is an integer and $P_{1,1},\dots,P_{i,j},\dots,P_{d,m}:\mathbb{Z}\to\mathbb{Z}$ are polynomial maps with integer coefficients such that $P_{i,j}(0)=0$. Then for any $f_1,\dots,f_m\in L^\infty(X)$, the nonconventional multilinear polynomial averages

$$A_{N;X,T_1,\dots,T_d}^{P_{1,1},\dots,P_{d,m}}(f_1,\dots,f_m)(x) = \frac{1}{2N+1}\sum_{n\in[-N,N]\cap\mathbb{Z}}\ \prod_{j=1}^{m} f_j\big(T_1^{P_{1,j}(n)}\cdots T_d^{P_{d,j}(n)}x\big) \tag{1.3}$$

converge for $\mu$-almost every $x\in X$ as $N\to\infty$.

Conjecture 1.2 is a major open problem in ergodic theory that was promoted in person by Furstenberg (see [1, p. 6662]) before being published in [8]. Our main result, Theorem 1.1(ii), proves this conjecture in the linear case $m=1$, provided that the family of transformations $T_1,\dots,T_d:X\to X$ generates a nilpotent group of step $k=2$.

1.1.3 Earlier pointwise ergodic theorems

The basic linear case $m=d=k=1$ with $P_{1,1}(n)=n$ follows from Birkhoff’s original ergodic theorem [10]. On the other hand, the commutative case $m=d=k=1$ with an arbitrary polynomial $P=P_{1,1}$ with integer coefficients was a famous open problem of Bellow [3] and Furstenberg [26], solved by Bourgain in his breakthrough papers [11–13].

Some particular examples of averages (1.3) with $m=1$ and polynomial mappings of degree at most two in the step-two nilpotent setting were studied in [36,45].

The multilinear theory ($m\ge 2$), in contrast to the linear theory, is largely open even in the commutative case $k=1$. Only a few results in the bilinear ($m=2$) and commutative ($d=k=1$) setting are known. Bourgain [14] proved pointwise convergence when $P_{1,1}(n)=an$ and $P_{1,2}(n)=bn$, $a,b\in\mathbb{Z}$. More recently, Krause et al. [40] established pointwise convergence for the polynomial Furstenberg-Weiss averages [27,28] corresponding to $P_{1,1}(n)=n$ and $P_{1,2}(n)=P(n)$, $\deg P\ge 2$.

1.1.4 Norm convergence

Except for these few cases, there are no other results concerning pointwise convergence for the averages (1.3). The situation is completely different, however, for the question of norm convergence, which is much better understood.

A breakthrough article of Walsh [57] (see also [1]) gives a complete picture of $L^2(X)$ norm convergence of the averages (1.3) for any $T_1,\dots,T_d\in G$, where $G$ is a nilpotent group of transformations of a probability space. Prior to this, there was an extensive body of research toward establishing $L^2(X)$ norm convergence, including groundbreaking works of Host and Kra [31], Ziegler [59], Bergelson [4], and Leibman [42]. See also [2,19,24,32,54] and the survey articles [5,6,23] for more details and references, including a comprehensive historical background.

1.1.5 Additional remarks

Bergelson and Leibman [8] showed that convergence may fail if the transformations $T_1,\dots,T_d$ generate a solvable group, so the nilpotent setting is probably the appropriate setting for Conjecture 1.2. The restriction $p>1$ is necessary in the case of nonlinear polynomials, as shown in [15,41].

If $(X,\mathcal{B}(X),\mu)$ is a probability space and the family of measure-preserving transformations $(T_1,\dots,T_{d_1})$ is totally ergodic, then Theorem 1.1(ii) implies that

$$\lim_{N\to\infty}A_{N;X}^{P_1,\dots,P_{d_1}}(f)(x) = \int_X f(y)\,d\mu(y) \tag{1.4}$$

for $\mu$-almost every $x\in X$. We recall that a family of measure-preserving transformations $(T_1,\dots,T_{d_1})$ is called ergodic on $X$ if, for $B\in\mathcal{B}(X)$, the condition $T_j^{-1}(B)=B$ for all $j\in\{1,\dots,d_1\}$ implies $\mu(B)=0$ or $\mu(B)=1$, and it is called totally ergodic if the family $(T_1^n,\dots,T_{d_1}^n)$ is ergodic for all $n\in\mathbb{Z}_+$.

1.2 The universal step-two group $G_0$

The proof of Theorem 1.1 will follow from our second main result, Theorem 1.3, for averages on universal nilpotent groups of step two. We start with some definitions. For integers $d\ge 1$, we define

$$Y_d := \{(l_1,l_2)\in\mathbb{Z}\times\mathbb{Z} : 0\le l_2<l_1\le d\}$$

and the “universal” step-two nilpotent Lie groups $G_0^\#=G_0^\#(d)$

$$G_0^\# := \{(x_{l_1l_2})_{(l_1,l_2)\in Y_d} : x_{l_1l_2}\in\mathbb{R}\}, \tag{1.5}$$

with the group multiplication law

$$[x\cdot y]_{l_1l_2} := \begin{cases} x_{l_10}+y_{l_10} & \text{if } l_1\in\{1,\dots,d\} \text{ and } l_2=0,\\ x_{l_1l_2}+y_{l_1l_2}+x_{l_10}\,y_{l_20} & \text{if } l_1\in\{1,\dots,d\} \text{ and } l_2\in\{1,\dots,l_1-1\}. \end{cases} \tag{1.6}$$

Alternatively, we can also define the group $G_0^\#$ as the set of elements

$$g=(g^{(1)},g^{(2)}), \qquad g^{(1)}=(g_{l_10})_{l_1\in\{1,\dots,d\}}\in\mathbb{R}^d, \qquad g^{(2)}=(g_{l_1l_2})_{(l_1,l_2)\in Y_d'}\in\mathbb{R}^{d'}, \tag{1.7}$$

where $d' := d(d-1)/2$ and $Y_d' := \{(l_1,l_2)\in Y_d : l_2\ge 1\}$. Letting $R_0:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}^{d'}$ denote the bilinear form

$$[R_0(x,y)]_{l_1l_2} := x_{l_10}\,y_{l_20}, \tag{1.8}$$

we notice that the product rule in the group $G_0^\#$ is given by

$$[g\cdot h]^{(1)} := g^{(1)}+h^{(1)}, \qquad [g\cdot h]^{(2)} := g^{(2)}+h^{(2)}+R_0(g^{(1)},h^{(1)}) \tag{1.9}$$

if $g=(g^{(1)},g^{(2)})$ and $h=(h^{(1)},h^{(2)})$. For any $g=(g^{(1)},g^{(2)})\in G_0^\#$, its inverse is given by

$$g^{-1} = \big(-g^{(1)},\,-g^{(2)}+R_0(g^{(1)},g^{(1)})\big).$$

The second variable of $g=(g^{(1)},g^{(2)})\in G_0^\#$ is called the central variable. Based on the product structure (1.9) of the group $G_0^\#$, it is not difficult to see that $g\cdot h=h\cdot g$ for any $g=(g^{(1)},g^{(2)})\in G_0^\#$ and $h=(0,h^{(2)})\in G_0^\#$.
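The product rule (1.9), the inverse formula and the centrality of elements of the form $(0,h^{(2)})$ are easy to sanity-check numerically. The Python sketch below is our own illustration (elements of the lattice $G_0(d)$ stored as dictionaries keyed by $Y_d$; all helper names are hypothetical): it tests associativity, both one-sided inverse identities, and commutation with central elements on random integer points, and it exhibits that $A_0(1)$ and $A_0(2)$ do not commute when $d=2$.

```python
import random


def index_set(d):
    """Y_d = {(l1, l2): 0 <= l2 < l1 <= d}, listed in a fixed order."""
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)/(1.9)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    """Inverse: g^{-1} = (-g^(1), -g^(2) + R_0(g^(1), g^(1)))."""
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    """Moment curve (1.11)."""
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def random_element(d, rng, bound=5):
    """A random point of G_0(d) with small integer coordinates."""
    return {k: rng.randint(-bound, bound) for k in index_set(d)}


def random_central(d, rng, bound=5):
    """A random element h = (0, h^(2)) with vanishing first layer."""
    return {(l1, l2): (rng.randint(-bound, bound) if l2 else 0) for (l1, l2) in index_set(d)}


def check_group_laws(d, trials=200, seed=0):
    rng = random.Random(seed)
    zero = {k: 0 for k in index_set(d)}
    for _ in range(trials):
        x, y, z = (random_element(d, rng) for _ in range(3))
        c = random_central(d, rng)
        assert mul(mul(x, y, d), z, d) == mul(x, mul(y, z, d), d)  # associativity
        assert mul(x, inv(x, d), d) == zero == mul(inv(x, d), x, d)  # inverse formula
        assert mul(x, c, d) == mul(c, x, d)  # central elements commute with everything
    return True


if __name__ == "__main__":
    print(check_group_laws(3))
    print(mul(A0(1, 2), A0(2, 2), 2), mul(A0(2, 2), A0(1, 2), 2))
```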

Let $G_0=G_0(d)$ denote the discrete subgroup

$$G_0 := G_0^\#\cap\mathbb{Z}^{|Y_d|}. \tag{1.10}$$

Let $A_0:\mathbb{R}\to G_0^\#$ denote the canonical polynomial map (or the moment curve on $G_0^\#$)

$$[A_0(x)]_{l_1l_2} := \begin{cases} x^{l_1} & \text{if } l_2=0,\\ 0 & \text{if } l_2\ne 0, \end{cases} \tag{1.11}$$

and notice that $A_0(\mathbb{Z})\subseteq G_0$. For $x=(x_{l_1l_2})_{(l_1,l_2)\in Y_d}\in G_0^\#$ and $\Lambda\in(0,\infty)$, we define

$$\Lambda\circ x := (\Lambda^{l_1+l_2}x_{l_1l_2})_{(l_1,l_2)\in Y_d}\in G_0^\#. \tag{1.12}$$

Notice that the dilations $\Lambda\circ$ are group homomorphisms of $G_0^\#$ that are compatible with the map $A_0$, i.e., $\Lambda\circ A_0(x)=A_0(\Lambda x)$.
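Both properties of the dilations can be verified with exact rational arithmetic. The Python sketch below is an illustration under our own conventions (the helper `dilate` and the other names are hypothetical); it checks $\Lambda\circ(x\cdot y)=(\Lambda\circ x)\cdot(\Lambda\circ y)$ and $\Lambda\circ A_0(t)=A_0(\Lambda t)$ on random rational points, and shows that a non-integer $\Lambda$ moves points off the lattice $G_0$.

```python
import random
from fractions import Fraction


def index_set(d):
    """Y_d = {(l1, l2): 0 <= l2 < l1 <= d}, listed in a fixed order."""
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    """Moment curve (1.11)."""
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def dilate(lam, x, d):
    """Dilation (1.12): multiply the (l1, l2) coordinate by lam^(l1 + l2)."""
    return {(l1, l2): lam ** (l1 + l2) * x[(l1, l2)] for (l1, l2) in index_set(d)}


def check_dilations(d, lam, trials=100, seed=1):
    rng = random.Random(seed)

    def rand_q():
        return Fraction(rng.randint(-9, 9), rng.randint(1, 9))

    for _ in range(trials):
        x = {k: rand_q() for k in index_set(d)}
        y = {k: rand_q() for k in index_set(d)}
        t = rand_q()
        # homomorphism: lam o (x . y) = (lam o x) . (lam o y)
        assert dilate(lam, mul(x, y, d), d) == mul(dilate(lam, x, d), dilate(lam, y, d), d)
        # compatibility with the moment curve: lam o A0(t) = A0(lam * t)
        assert dilate(lam, A0(t, d), d) == A0(lam * t, d)
    return True
```

The check goes through because the weight $\Lambda^{l_1+l_2}$ of the central coordinate equals the product of the weights $\Lambda^{l_1}$ and $\Lambda^{l_2}$ of the two first-layer coordinates entering $x_{l_10}\,y_{l_20}$.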

Let $\chi:\mathbb{R}\to[0,1]$ be a smooth function supported in the interval $[-2,2]$. Given any real number $N\ge 1$ and a function $f:G_0\to\mathbb{C}$, we can define a smoothed average along the moment curve $A_0$ by the formula

$$M_N^\chi(f)(x) := \sum_{n\in\mathbb{Z}} N^{-1}\chi(N^{-1}n)\,f(A_0(n)^{-1}\cdot x), \qquad x\in G_0. \tag{1.13}$$

The main advantage of working on the group $G_0$ with the polynomial map $A_0$ is the presence of the compatible dilations $\Lambda\circ$ defined in (1.12), which lead to a natural family of associated balls. This can be efficiently exploited by noting that $M_N^\chi$ is a convolution operator on $G_0$.

The convolution of functions on the group $G_0$ is defined by the formula

$$(f*g)(x) := \sum_{y\in G_0} f(y^{-1}\cdot x)\,g(y) = \sum_{z\in G_0} f(z)\,g(x\cdot z^{-1}). \tag{1.14}$$

Then it is not difficult to see that $M_N^\chi(f)(x) = f*G_N^\chi(x)$, where

$$G_N^\chi(x) := \sum_{n\in\mathbb{Z}} N^{-1}\chi(N^{-1}n)\,\mathbf{1}_{\{A_0(n)\}}(x), \qquad x\in G_0. \tag{1.15}$$
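The identity $M_N^\chi(f)=f*G_N^\chi$ can be checked on finitely supported functions by computing the left side from (1.13) and the right side from the second expression in (1.14), which sums over the support of $f$ instead. In the Python sketch below, the cutoff `chi` is a hypothetical stand-in (a tent function on $[-2,2]$, continuous but not smooth), since only its support matters for this identity; exact rationals avoid rounding, and all helper names are ours.

```python
from fractions import Fraction


def index_set(d):
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def key(g, d):
    """Hashable coordinate tuple of g, ordered as in index_set(d)."""
    return tuple(g[k] for k in index_set(d))


def unkey(t, d):
    return dict(zip(index_set(d), t))


def chi(t):
    """Hypothetical stand-in for chi: tent function supported on [-2, 2] (not smooth)."""
    return max(Fraction(0), 1 - abs(t) / 2)


def average_direct(f, N, x, d):
    """M_N^chi(f)(x) straight from (1.13); f maps coordinate tuples to values."""
    total = Fraction(0)
    for n in range(-2 * N, 2 * N + 1):  # chi(n / N) = 0 when |n| > 2N
        w = Fraction(1, N) * chi(Fraction(n, N))
        total += w * f.get(key(mul(inv(A0(n, d), d), x, d), d), 0)
    return total


def kernel(N, d):
    """G_N^chi from (1.15): mass N^{-1} chi(n / N) at the point A_0(n)."""
    return {key(A0(n, d), d): Fraction(1, N) * chi(Fraction(n, N))
            for n in range(-2 * N, 2 * N + 1)}


def convolve_at(f, G, x, d):
    """(f * G)(x) via the second form of (1.14): sum over z of f(z) G(x . z^{-1})."""
    total = Fraction(0)
    for z, fz in f.items():
        total += fz * G.get(key(mul(x, inv(unkey(z, d), d), d), d), 0)
    return total
```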

We are now ready to state our second main result.

Theorem 1.3

(Boundedness on $G_0$) Let $G_0=G_0(d)$, $d\ge 1$, be the discrete nilpotent group defined in (1.10) and $A_0$ the polynomial sequence defined in (1.11). Then

(i) (Maximal estimates) If $1<p\le\infty$ and $f\in\ell^p(G_0)$, then

$$\Big\|\sup_{N\ge 1}|M_N^\chi(f)|\Big\|_{\ell^p(G_0)} \lesssim_p \|f\|_{\ell^p(G_0)}, \tag{1.16}$$

where $M_N^\chi$ is defined as in (1.13).

(ii) (Long variational estimates) If $1<p<\infty$, $\rho>\max\{p,\frac{p}{p-1}\}$, and $\tau\in(1,2]$, then

$$\big\|V^\rho(M_N^\chi(f): N\in\mathbb{D}_\tau)\big\|_{\ell^p(G_0)} \lesssim_{p,\rho,\tau} \|f\|_{\ell^p(G_0)}, \tag{1.17}$$

where $\mathbb{D}_\tau := \{\tau^n : n\in\mathbb{N}\}$. See (1.18) for the definition of the $\rho$-variation seminorms $V^\rho$.

(iii) (Singular integrals) If $K:\mathbb{R}\to\mathbb{R}$ is a Calderón-Zygmund kernel as in (1.2), $f:G_0\to\mathbb{C}$ is a (compactly supported) function, and

$$H_0f(g) := \sum_{n\in\mathbb{Z}}K(n)\,f(A_0^{-1}(n)\cdot g), \qquad g\in G_0,$$

then

$$\|H_0f\|_{\ell^2(G_0)} \lesssim \|f\|_{\ell^2(G_0)}.$$

1.3 Remarks and overview of the proof

We discuss now some of the main ideas in the proofs of Theorems 1.1 and 1.3.

1.3.1 The Calderón transference principle

One can show that Theorem 1.1 is a consequence of Theorem 1.3 by performing lifting arguments and adapting the Calderón transference principle [16]. Indeed, if $G^\#$ is a connected and simply connected nilpotent Lie group of step 2, with Lie algebra $\mathfrak{g}$, then one can choose the so-called exponential coordinates of the second kind associated to a Malcev basis of the Lie algebra $\mathfrak{g}$ (see [20], Sec. 1.2) in such a way that $G^\#\simeq\mathbb{R}^{b_1}\times\mathbb{R}^{b_2}$ with the product

$$(x,y)\cdot(x',y') = \big(x+x',\;y+y'+R(x,x')\big),$$

where $b_1,b_2\in\mathbb{Z}_+$ depend on the Lie algebra $\mathfrak{g}$ and $R:\mathbb{R}^{b_1}\times\mathbb{R}^{b_1}\to\mathbb{R}^{b_2}$ is a bilinear form.

Moreover, if $G\le G^\#$ is a discrete co-compact subgroup, then one can choose the Malcev basis such that the discrete subgroup $G$ is identified with the integer lattice $\mathbb{Z}^b=\mathbb{Z}^{b_1}\times\mathbb{Z}^{b_2}$ (see [20], Thm. 5.1.6 and Prop. 5.3.2). Recall that $A:\mathbb{Z}\to G$ is a polynomial sequence satisfying $A(0)=1$. The main point is that one can choose $d$ sufficiently large and a group morphism $T:G_0\to G^\#$ such that

$$A(n) = T(A_0(n)) \quad\text{for any } n\in\mathbb{Z}.$$

Then one can use this group morphism to transfer bounds on operators on the universal group $G_0$ to bounds on operators on the group $G$. Theorem 1.1 is thus a consequence of Theorem 1.3, and our main goal therefore is to prove Theorem 1.3.

1.3.2 The variation spaces $V^\rho$

For any family $(a_t:t\in I)$ of elements of $\mathbb{C}$ indexed by a totally ordered set $I$, and any exponent $1\le\rho<\infty$, the $\rho$-variation seminorm is defined by

$$V^\rho(a_t:t\in I) := \sup_{J\in\mathbb{Z}_+}\ \sup_{\substack{t_0<\dots<t_J\\ t_j\in I}} \Big(\sum_{j=0}^{J-1}|a_{t_{j+1}}-a_{t_j}|^\rho\Big)^{1/\rho}, \tag{1.18}$$

where the supremum is taken over all finite increasing sequences in $I$. It is easy to see that $\rho\mapsto V^\rho$ is nonincreasing, and for every $t_0\in I$, one has

$$\sup_{t\in I}|a_t| \le |a_{t_0}| + V^\rho(a_t:t\in I) \le \sup_{t\in I}|a_t| + V^\rho(a_t:t\in I). \tag{1.19}$$

In particular, the maximal estimate (1.16) follows from the variational estimate (1.17). The main point of proving stronger variational estimates such as (1.17), with general parameters $\tau\in(1,2]$, is that they give an elegant path to deriving pointwise ergodic theorems (which would not follow directly just from maximal estimates such as (1.16)). At the same time, the analysis of variational inequalities has many similarities with the analysis of maximal inequalities and is not substantially more difficult. This is due in large part to the Rademacher-Menshov inequality (see [47, Lemma 2.5]): for any $2\le\rho<\infty$, any $j_0,m\in\mathbb{N}$ such that $j_0<2^m$, and any sequence of complex numbers $(a_k:k\in\mathbb{N})$, we have

$$V^\rho(a_j: j_0\le j\le 2^m) \le 2\sum_{i=0}^{m}\Big(\sum_{j\in[j_02^{-i},\,2^{m-i}-1]\cap\mathbb{Z}}|a_{(j+1)2^i}-a_{j2^i}|^2\Big)^{1/2}. \tag{1.20}$$
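For a finite sequence, the supremum in (1.18) can be computed exactly by dynamic programming over the last point of an increasing chain, which makes (1.19), the monotonicity in $\rho$, and the Rademacher-Menshov bound (1.20) easy to test on random data. The Python sketch below is our own illustration (the function names are hypothetical and it is not taken from [47]).

```python
import random


def variation(a, rho):
    """V^rho of the finite sequence a[0], ..., a[n-1], as in (1.18).

    best[i] is the largest value of sum |a[t_{k+1}] - a[t_k]|^rho over increasing
    chains ending at index i; the supremum in (1.18) is max(best) ** (1 / rho).
    """
    best = [0.0] * len(a)
    for i in range(len(a)):
        for j in range(i):
            best[i] = max(best[i], best[j] + abs(a[i] - a[j]) ** rho)
    return max(best) ** (1.0 / rho)


def rademacher_menshov_rhs(a, j0, m):
    """Right-hand side of (1.20) for the segment a[j0], ..., a[2**m]."""
    total = 0.0
    for i in range(m + 1):
        step = 2 ** i
        lo = -(-j0 // step)  # ceil(j0 / 2^i)
        hi = 2 ** (m - i) - 1
        square_sum = sum(abs(a[(j + 1) * step] - a[j * step]) ** 2 for j in range(lo, hi + 1))
        total += square_sum ** 0.5
    return 2 * total


def check_inequalities(m=4, trials=50, seed=42):
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-1, 1) for _ in range(2 ** m + 1)]
        for j0 in (0, 3, 7, 2 ** m - 1):
            for rho in (2.0, 2.5, 4.0):
                if variation(a[j0:], rho) > rademacher_menshov_rhs(a, j0, m) + 1e-12:
                    return False  # (1.20) violated
        v2 = variation(a, 2.0)
        if v2 < variation(a, 3.0) - 1e-12:
            return False  # rho -> V^rho should be nonincreasing
        if any(max(abs(t) for t in a) > abs(a[t0]) + v2 + 1e-12 for t0 in range(len(a))):
            return False  # first inequality in (1.19) violated
    return True
```

For example, `variation([0, 1, 0, 1], 2)` returns $\sqrt{3}$, attained by the chain that visits every point.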

1.3.3 $\ell^p$ theory

The problem of passing from $\ell^2$ estimates to $\ell^p$ estimates in the context of discrete polynomial averages has been investigated extensively in recent years (see, for example, [46] and the references therein).

The full $\ell^p(G_0)$ bounds in Theorem 1.3 rely on first proving $\ell^2(G_0)$ bounds. In fact, we first establish (1.17) for $p=2$ and $\rho>2$. Then we use the positivity of the operators $M_N^\chi$ (i.e., $M_N^\chi(f)\ge 0$ if $f\ge 0$) to prove the maximal operator bounds (1.16) for all $p\in(1,\infty]$. Finally, we use vector-valued interpolation between the bounds (1.17) with $p=2$ and $\rho>2$ and (1.16) with $p\in(1,\infty]$ to complete the proof of Theorem 1.3.

1.3.4 Some technical remarks

Parts (i) and (ii) of Theorem 1.3 extend the results of [46,48] to the noncommutative, nilpotent setting. Their conclusions remain true for rough averages, i.e., when $\chi=\mathbf{1}_{[-1,1]}$ in (1.13), but it is more convenient to work with smooth averages.

The restriction $p>1$ in Theorem 1.3(i) and (ii) is sharp due to [15,41]. However, the range $\rho>\max\{p,\frac{p}{p-1}\}$ is sharp only when $p=2$, due to Lépingle’s inequality [43]. One could hope to improve this to the full range $\rho>2$, but we do not address this here, since the limited range $\rho>\max\{p,\frac{p}{p-1}\}$ is already sufficient for us to establish Theorem 1.1.

The restriction $p=2$ in the singular integral bounds in part (iii) is probably not necessary. In the commutative case, one can prove boundedness in the full range $p\in(1,\infty)$ [38], but the proof depends on exploiting certain Fourier multipliers, and we do not know at this time whether a similar definitive result holds in the nilpotent case.

1.4 The main difficulty and a nilpotent circle method

Bourgain’s seminal articles [11–13] generated a large amount of research and progress in the field. Many other discrete operators have since been analyzed, motivated by problems in analysis and ergodic theory. See, for example, [15,36,38–41,44–48,49,50,52] for some results of this type and more references. A common feature of all of these results, which plays a crucial role in the proofs, is that one can use Fourier analysis techniques, in particular, the powerful framework of the classical circle method, to perform the analysis.

Our situation in Theorem 1.3 is different. The main conceptual issue is that there is no good Fourier transform on nilpotent groups that is compatible with the structure of the underlying convolution operators and at the level of analytical precision of the classical circle method. At a more technical level, there is no good resolution of the delta function compatible with the group multiplication on the group $G_0$. This prevents us from using a naive implementation of the circle method. The classical delta function resolution

$$\mathbf{1}_{\{0\}}(x^{-1}\cdot y) = \int_{\mathbb{T}^d\times\mathbb{T}^{d'}} e\big((y^{(1)}-x^{(1)}).\theta^{(1)}\big)\,e\big((y^{(2)}-x^{(2)}).\theta^{(2)}\big)\,d\theta^{(1)}\,d\theta^{(2)}$$

does not detect the group multiplication correctly. Here, $(y^{(1)}-x^{(1)}).\theta^{(1)}$ and $(y^{(2)}-x^{(2)}).\theta^{(2)}$ denote the usual scalar products of vectors in $\mathbb{R}^d$ and $\mathbb{R}^{d'}$, respectively.
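To see concretely why this resolution is not adapted to the group structure, one can compute $x^{-1}\cdot y$ from the inverse formula and the product rule (1.9); the following short derivation is ours:

$$x^{-1}\cdot y = \big(-x^{(1)},\,-x^{(2)}+R_0(x^{(1)},x^{(1)})\big)\cdot\big(y^{(1)},y^{(2)}\big) = \big(y^{(1)}-x^{(1)},\; y^{(2)}-x^{(2)}-R_0(x^{(1)},\,y^{(1)}-x^{(1)})\big).$$

Thus the phase $e((y^{(2)}-x^{(2)}).\theta^{(2)})$ equals $e([x^{-1}\cdot y]^{(2)}.\theta^{(2)})$ times the factor $e(R_0(x^{(1)},y^{(1)}-x^{(1)}).\theta^{(2)})$, which depends on $x^{(1)}$ and not only on $x^{-1}\cdot y$. The resolution itself remains valid (both sides equal $1$ if $x=y$ and $0$ otherwise), but the characters in the integrand are not compatible with the group law.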

These issues lead to very significant difficulties in the proof and require substantial new ideas. Our main new construction in [35] is what we call a nilpotent circle method, an iterative procedure, starting from the center of the group and moving down along its central series. At every stage, we identify “minor arcs,” and bound their contributions using Weyl’s inequalities (the classical Weyl inequality as well as a nilpotent Weyl inequality which was proved in [37]). The final stage involves “major arcs” analysis, which relies on a combination of continuous harmonic analysis on the groups $G_0^\#$ and arithmetic harmonic analysis over finite integer rings modulo $Q\in\mathbb{Z}_+$. We outline this procedure in Section 3.

At the implementation level, classical Fourier techniques are replaced with almost-orthogonality methods based on exploiting high-order $T^*T$ arguments for operators defined on the discrete group $G_0$. Investigating high powers of $T^*T$ (i.e., $(T^*T)^r$ for a large $r\in\mathbb{Z}_+$) is consistent with a general heuristic behind the proofs of Waring-type problems, which says that the more variables occur in Waring-type equations, the easier it is to find solutions, and we are able to make this heuristic rigorous in our problem. By taking $r$ very large, we can decide how many variables we have at our disposal, making our operators “smoother and smoother.”

1.5 General discrete nilpotent groups

The primary goal is, of course, to remove the restriction that the discrete nilpotent groups $G$ in Theorem 1.1 are of step 2 and thus establish the full Conjecture 1.2 in the linear case $m=1$ for arbitrary invertible measure-preserving transformations $T_1,\dots,T_d$ that generate a nilpotent group of any step $k\ge 2$. The iterative argument we outline in Section 3 below could, in principle, be extended to higher-step groups, at least as long as the group and the polynomial sequence have a suitable “universal”-type structure, as one could try to go down along the central series of the group and prove minor arcs and transition estimates at every stage.

However, this is only possible if one can prove suitable analogs of the nilpotent Weyl inequalities in Proposition 2.1 on general nilpotent groups of step $k\ge 3$. The point is to have a small (not necessarily optimal, but nontrivial) gain for bounds on oscillatory sums over many variables, corresponding to the kernels of high-power $(T^*T)^r$ operators, whenever frequencies are restricted to the minor arcs. In our case, the formulas are explicit (see the identities (2.10)), and we can use ideas of Birch [9] and Davenport [21] for Diophantine forms in many variables to control the induced oscillatory sums, but the analysis seems to be more complicated for higher-step nilpotent groups.

1.6 Waring-type problems

The classical Waring problem, solved by Hilbert [30] in 1909, concerns the possibility of writing any positive integer as a sum of a bounded number of $p$-th powers: for any integer $p\ge 1$, there is $r=r(p)$ such that any integer $y\in\mathbb{Z}_+$ can be written in the form

$$y=\sum_{i=1}^{r} m_i^p, \quad\text{for some nonnegative integers } m_1,\dots,m_r. \tag{1.21}$$

There is a vast literature on this problem and its many possible extensions. In particular, the symmetric system of equations

$$\sum_{j=1}^{r}(m_j^s-n_j^s)=0 \qquad (1\le s\le d), \tag{1.22}$$

first studied by Vinogradov [55] in relation to the Waring problem, has been the focus of intense recent research; see [58] for some recent results.

We are interested here in understanding the analogous question on our discrete nilpotent group $G_0$, for our given polynomial sequence $A_0$: can one represent elements $g\in G_0$ in the form

$$g = A_0(n_1)^{-1}\cdot A_0(m_1)\cdot\ldots\cdot A_0(n_r)^{-1}\cdot A_0(m_r), \tag{1.23}$$

for some integers $n_1,m_1,\dots,n_r,m_r$, provided that $r$ is large enough? We are, in fact, interested in proving a quantitative statement on the number of such representations, for integers $n_1,m_1,\dots,n_r,m_r\in[N] := [-N,N]\cap\mathbb{Z}$.

We notice that many group elements $g$ cannot be written in the form (1.23), due to local obstructions; for instance, if $g$ can be represented in the form (1.23), then necessarily $g_{10}\equiv g_{20}\equiv\dots\equiv g_{d0} \pmod 2$, $g_{10}\equiv g_{30}\equiv\dots \pmod 3$, etc. We remark also that there is a significant difference between the classical Waring problem (1.21) and its nilpotent analog (1.23), namely the positivity of the $p$-th powers in (1.21), which imposes size restrictions on the variables $m_i$ in terms of the prescribed output value $y$.
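These congruences come from the first layer of (1.23), whose coordinates are $\sum_j(m_j^{l}-n_j^{l})$: one has $m^l\equiv m \pmod 2$ for $l\ge 1$, and $m^{l+2}\equiv m^l \pmod 3$ by Fermat's little theorem. The Python sketch below (our illustration; helper names hypothetical) multiplies out random words of the form (1.23) in $G_0(5)$ and confirms both families of congruences.

```python
import random


def index_set(d):
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def word(n, m, d):
    """A_0(n_1)^{-1} A_0(m_1) ... A_0(n_r)^{-1} A_0(m_r), the right-hand side of (1.23)."""
    g = {k: 0 for k in index_set(d)}
    for nj, mj in zip(n, m):
        g = mul(g, mul(inv(A0(nj, d), d), A0(mj, d), d), d)
    return g


def satisfies_local_conditions(g, d):
    """g_{10} = ... = g_{d0} (mod 2) and g_{l0} = g_{l+2,0} (mod 3)."""
    first = [g[(l, 0)] for l in range(1, d + 1)]
    same_parity = len({v % 2 for v in first}) == 1
    mod3 = all((first[i] - first[i + 2]) % 3 == 0 for i in range(d - 2))
    return same_parity and mod3


def check_random_words(d=5, trials=300, seed=7):
    rng = random.Random(seed)
    for _ in range(trials):
        r = rng.randint(1, 4)
        n = [rng.randint(-20, 20) for _ in range(r)]
        m = [rng.randint(-20, 20) for _ in range(r)]
        if not satisfies_local_conditions(word(n, m, d), d):
            return False
    return True
```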

For integers $r,N\ge 1$ and $g\in G_0$, let

$$S_{r,N}(g) := \big|\{(m,n)\in[N]^{2r} : A_0(n_1)^{-1}\cdot A_0(m_1)\cdot\ldots\cdot A_0(n_r)^{-1}\cdot A_0(m_r)=g\}\big|. \tag{1.24}$$
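For very small parameters, $S_{r,N}$ can be tabulated by brute force over $[N]^{2r}$. The Python sketch below does this (it illustrates the definition only and says nothing about the asymptotics in Theorem 1.4; helper names are hypothetical). Two exact facts serve as checks: the counts sum to $(2N+1)^{2r}$, and for $r=1$ the identity is represented only by $n_1=m_1$. For $r=2$, the Cauchy-Schwarz inequality shows that the identity is the most frequently represented element, since $S_{2,N}(h)=\sum_g S_{1,N}(g)S_{1,N}(g^{-1}h)$ and $S_{1,N}(g^{-1})=S_{1,N}(g)$.

```python
from collections import Counter
from itertools import product


def index_set(d):
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def key(g, d):
    return tuple(g[k] for k in index_set(d))


def representation_counts(r, N, d):
    """Counter mapping g (as a coordinate tuple) to S_{r,N}(g) from (1.24)."""
    counts = Counter()
    box = range(-N, N + 1)
    for t in product(box, repeat=2 * r):
        n, m = t[:r], t[r:]
        g = {k: 0 for k in index_set(d)}
        for nj, mj in zip(n, m):
            g = mul(g, mul(inv(A0(nj, d), d), A0(mj, d), d), d)
        counts[key(g, d)] += 1
    return counts
```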

A qualitative variant of the Waring problem on nilpotent groups was recently investigated in [33,34]; see also the references given there. Our main result in this direction is the following quantitative version on the nilpotent group $G_0$:

Theorem 1.4

(i) There is an integer $r_0(d)\ge 1$ such that if $r\ge r_0(d)$ and $g\in G_0$, then

$$S_{r,N}(g) = N^{2r}\Big(\prod_{(l_1,l_2)\in Y_d}N^{-|l_1|-|l_2|}\Big)\Big[\mathfrak{S}(g)\int_{\mathbb{R}^{d+d'}}\Phi(\zeta)\,e\big(-(N^{-1}\circ g).\zeta\big)\,d\zeta + O_r(N^{-1/2})\Big], \tag{1.25}$$

uniformly in $N\in\mathbb{N}$. Here, the singular series $\mathfrak{S}$ is defined by

$$\mathfrak{S}(g) := \sum_{a/q\in\mathcal{R}_\infty^{d+d'}\cap[0,1)^{d+d'}}\overline{G(a/q)}\,e(-g.a/q) \tag{1.26}$$

and the singular integral $\Phi$ is defined by

$$\Phi(\xi) = \int_{[-1,1]^{2r}} e\big(D(z,w).\xi\big)\,dz\,dw, \qquad \xi\in\mathbb{R}^{d+d'}. \tag{1.27}$$

In particular, no element $g\in G_0$ can be represented in the form (1.23) more than a constant multiple of the expected number of times, i.e.,

$$S_{r,N}(g) \lesssim_r N^{2r}\Big(\prod_{(l_1,l_2)\in Y_d}N^{-|l_1|-|l_2|}\Big) \quad\text{for any } g\in G_0. \tag{1.28}$$

(ii) For $r_0(d)$ as above, if $r\ge r_0(d)$ and $r$ is even, then there is a sufficiently large integer $Q=Q(r)$ such that

$$S_{r,N}(g) = N^{2r}\Big(\prod_{(l_1,l_2)\in Y_d}N^{-|l_1|-|l_2|}\Big)\big[c_r(g)+O_{r,g}(N^{-1/2})\big], \tag{1.29}$$

for any $g\in H_Q$ (see definition (3.32)), where $c_r(g)\approx_r 1$ uniformly in $g$.

We will provide a complete proof of this theorem in Section 4.

1.7 Organization

The rest of this article is organized as follows: in Section 2, we present several nilpotent Weyl estimates proved in [37], which play a key role in the analysis of minor arcs. In Section 3, we outline our main new method, the nilpotent circle method, developed in [35] to prove maximal and variational estimates on nilpotent groups. In Section 4, we prove Theorem 1.4, the main new result in this article.

2 A nilpotent Weyl inequality on the group $G_0$

In this section, we derive explicit formulas used in high-order $T^*T$ arguments and discuss a key ingredient in our analysis, namely, Weyl inequalities on the group $G_0$.

2.1 High-order $T^*T$ arguments and product kernels

Many of our $\ell^2(G_0)$ estimates will be based on high-order $T^*T$ arguments. Assume that

$$S_1,T_1,\dots,S_r,T_r:\ell^2(G_0)\to\ell^2(G_0)$$

are convolution operators defined by some $\ell^1(G_0)$ kernels $L_1,K_1,\dots,L_r,K_r:G_0\to\mathbb{C}$, i.e., $S_jf=f*L_j$ and $T_jf=f*K_j$ for $j\in\{1,\dots,r\}$. Then the adjoint operators $S_1^*,\dots,S_r^*$ are also convolution operators, defined by the kernels $L_1^*,\dots,L_r^*$ given by

$$L_j^*(g) := \overline{L_j(g^{-1})}.$$
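The adjoint formula can be tested numerically through the identity $\langle f*L,h\rangle=\langle f,h*L^*\rangle$ on finitely supported functions. The Python sketch below (our illustration; helper names hypothetical, functions stored as dictionaries from coordinate tuples to complex values) implements the convolution (1.14) directly on supports and compares both sides for random data.

```python
import random
from collections import defaultdict


def index_set(d):
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def key(g, d):
    return tuple(g[k] for k in index_set(d))


def unkey(t, d):
    return dict(zip(index_set(d), t))


def convolve(u, v, d):
    """(u * v)(x) = sum_y u(y^{-1} x) v(y) from (1.14), for finitely supported u, v."""
    out = defaultdict(complex)
    for y, vy in v.items():
        for z, uz in u.items():
            # the term u(y^{-1} x) v(y) is nonzero only at x = y . z
            out[key(mul(unkey(y, d), unkey(z, d), d), d)] += uz * vy
    return dict(out)


def adjoint_kernel(L, d):
    """L^*(g) = conj(L(g^{-1})): the value conj(L(h)) is placed at h^{-1}."""
    return {key(inv(unkey(h, d), d), d): val.conjugate() for h, val in L.items()}


def inner(u, v):
    """<u, v> = sum_x u(x) conj(v(x)) on l^2(G_0)."""
    return sum((ux * v.get(x, 0j).conjugate() for x, ux in u.items()), 0j)


def check_adjoint(d=2, trials=20, seed=3):
    rng = random.Random(seed)
    size = len(index_set(d))

    def random_function(n_points):
        return {tuple(rng.randint(-3, 3) for _ in range(size)):
                complex(rng.randint(-4, 4), rng.randint(-4, 4)) for _ in range(n_points)}

    for _ in range(trials):
        f, h, L = random_function(6), random_function(6), random_function(4)
        lhs = inner(convolve(f, L, d), h)
        rhs = inner(f, convolve(h, adjoint_kernel(L, d), d))
        if abs(lhs - rhs) > 1e-9:
            return False
    return True
```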

Moreover, by using (1.14), for any $f\in\ell^2(G_0)$ and $x\in G_0$, we have

$$(S_1^*T_1\dots S_r^*T_rf)(x) = \sum_{h_1,g_1,\dots,h_r,g_r\in G_0}\ \prod_{j=1}^{r}L_j^*(h_j)K_j(g_j)\,f(g_r^{-1}\cdot h_r^{-1}\cdot\ldots\cdot g_1^{-1}\cdot h_1^{-1}\cdot x). \tag{2.1}$$

In other words, $(S_1^*T_1\dots S_r^*T_rf)(x)=(f*A_r)(x)$, where the kernel $A_r$ is given by

$$A_r(y) := \sum_{h_1,g_1,\dots,h_r,g_r\in G_0}\ \prod_{j=1}^{r}\overline{L_j(h_j)}K_j(g_j)\,\mathbf{1}_{\{0\}}(g_r^{-1}\cdot h_r\cdot\ldots\cdot g_1^{-1}\cdot h_1\cdot y). \tag{2.2}$$

To use these formulas, we decompose $h_j=(h_j^{(1)},h_j^{(2)})$, $g_j=(g_j^{(1)},g_j^{(2)})$ as in (1.7). Then

$$[h_1^{-1}\cdot g_1\cdot\ldots\cdot h_r^{-1}\cdot g_r]^{(1)} = \sum_{1\le j\le r}(-h_j^{(1)}+g_j^{(1)}), \tag{2.3}$$

$$[h_1^{-1}\cdot g_1\cdot\ldots\cdot h_r^{-1}\cdot g_r]^{(2)} = \sum_{1\le j\le r}\Big\{-(h_j^{(2)}-g_j^{(2)})+R_0(h_j^{(1)},h_j^{(1)}-g_j^{(1)})\Big\} + \sum_{1\le l<j\le r}R_0(-h_l^{(1)}+g_l^{(1)},\,-h_j^{(1)}+g_j^{(1)}), \tag{2.4}$$

as a consequence of applying (1.9) inductively.
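Formulas (2.3) and (2.4) can be checked against direct multiplication. In the Python sketch below (our illustration; names hypothetical), `product_formula` evaluates the right-hand sides of (2.3) and (2.4) coordinate by coordinate, using $[R_0(u,v)]_{l_1l_2}=u_{l_1}v_{l_2}$ from (1.8), and `direct_product` multiplies $h_1^{-1}\cdot g_1\cdot\ldots\cdot h_r^{-1}\cdot g_r$ with the group law; both are exact integer computations.

```python
import random


def index_set(d):
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def product_formula(hs, gs, d):
    """Right-hand sides of (2.3) and (2.4)."""
    r = len(hs)
    # w[j][l] is the l-th coordinate of -h_j^(1) + g_j^(1)
    w = [{l: gs[j][(l, 0)] - hs[j][(l, 0)] for l in range(1, d + 1)} for j in range(r)]
    out = {}
    for (l1, l2) in index_set(d):
        if l2 == 0:
            out[(l1, 0)] = sum(w[j][l1] for j in range(r))
        else:
            diag = sum(-(hs[j][(l1, l2)] - gs[j][(l1, l2)])
                       + hs[j][(l1, 0)] * (hs[j][(l2, 0)] - gs[j][(l2, 0)]) for j in range(r))
            cross = sum(w[i][l1] * w[j][l2] for j in range(r) for i in range(j))
            out[(l1, l2)] = diag + cross
    return out


def direct_product(hs, gs, d):
    """h_1^{-1} g_1 ... h_r^{-1} g_r computed with the group law."""
    g = {k: 0 for k in index_set(d)}
    for h, gj in zip(hs, gs):
        g = mul(g, mul(inv(h, d), gj, d), d)
    return g


def check(d, r, trials=200, seed=5):
    rng = random.Random(seed)
    for _ in range(trials):
        hs = [{k: rng.randint(-6, 6) for k in index_set(d)} for _ in range(r)]
        gs = [{k: rng.randint(-6, 6) for k in index_set(d)} for _ in range(r)]
        if product_formula(hs, gs, d) != direct_product(hs, gs, d):
            return False
    return True
```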

In many of our applications, the operators $S_1,T_1,\dots,S_r,T_r$ are equal and, more importantly, are defined by a kernel $K$ that has product structure, i.e.,

$$S_1f=T_1f=\dots=S_rf=T_rf=f*K, \qquad K(g)=K(g^{(1)},g^{(2)})=K^{(1)}(g^{(1)})\,K^{(2)}(g^{(2)}). \tag{2.5}$$

In this case, we can derive an additional formula for the kernel $A_r$. We use the identity

$$\mathbf{1}_{\{0\}}(x^{-1}\cdot y) = \int_{\mathbb{T}^d\times\mathbb{T}^{d'}} e\big((y^{(1)}-x^{(1)}).\theta^{(1)}\big)\,e\big((y^{(2)}-x^{(2)}).\theta^{(2)}\big)\,d\theta^{(1)}\,d\theta^{(2)},$$

where $e(z):=e^{2\pi iz}$. The formula (2.2) shows that

$$A_r(y) = \int_{\mathbb{T}^d\times\mathbb{T}^{d'}} e(y^{(1)}.\theta^{(1)})\,e(y^{(2)}.\theta^{(2)})\,\Sigma_r(\theta^{(1)},\theta^{(2)})\,d\theta^{(1)}\,d\theta^{(2)}, \tag{2.6}$$

where

$$\Sigma_r(\theta^{(1)},\theta^{(2)}) := \sum_{h_j,g_j\in G_0}\ \prod_{j=1}^{r}\overline{K(h_j)}K(g_j)\prod_{i=1}^{2}e\big(-[h_1^{-1}\cdot g_1\cdot\ldots\cdot h_r^{-1}\cdot g_r]^{(i)}.\theta^{(i)}\big).$$

Recalling the product formula (2.5), we can write

$$\Sigma_r(\theta^{(1)},\theta^{(2)}) = \Pi_r(\theta^{(1)},\theta^{(2)})\,\Omega_r(\theta^{(2)}), \tag{2.7}$$

for any $(\theta^{(1)},\theta^{(2)})\in\mathbb{T}^d\times\mathbb{T}^{d'}$, where

$$\begin{aligned}\Pi_r(\theta^{(1)},\theta^{(2)}) := {}&\sum_{h_j^{(1)},g_j^{(1)}\in\mathbb{Z}^d}\ \prod_{j=1}^{r}\overline{K^{(1)}(h_j^{(1)})}\,K^{(1)}(g_j^{(1)})\,e\Big(\theta^{(1)}.\sum_{1\le j\le r}(h_j^{(1)}-g_j^{(1)})\Big)\\ &\times e\Big(-\theta^{(2)}.\Big[\sum_{1\le j\le r}R_0(h_j^{(1)},h_j^{(1)}-g_j^{(1)}) + \sum_{1\le l<j\le r}R_0(-h_l^{(1)}+g_l^{(1)},\,-h_j^{(1)}+g_j^{(1)})\Big]\Big)\end{aligned} \tag{2.8}$$

and

$$\Omega_r(\theta^{(2)}) := \sum_{h_j^{(2)},g_j^{(2)}\in\mathbb{Z}^{d'}}\ \prod_{j=1}^{r}\overline{K^{(2)}(h_j^{(2)})}\,K^{(2)}(g_j^{(2)})\,e\Big(\theta^{(2)}.\sum_{1\le j\le r}(h_j^{(2)}-g_j^{(2)})\Big) = \Big|\sum_{g^{(2)}\in\mathbb{Z}^{d'}}K^{(2)}(g^{(2)})\,e(-\theta^{(2)}.g^{(2)})\Big|^{2r}. \tag{2.9}$$
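The second equality in (2.9) holds because the sum factors into $r$ copies of $\overline{F(\theta^{(2)})}$ and $r$ copies of $F(\theta^{(2)})$, where $F(\theta^{(2)})=\sum_{g^{(2)}}K^{(2)}(g^{(2)})\,e(-\theta^{(2)}.g^{(2)})$. The Python sketch below (our illustration, for $d'=1$, i.e. $d=2$, with a random kernel supported on $\{-2,\dots,2\}$; names hypothetical) compares the brute-force sum with the closed form.

```python
import cmath
import random
from itertools import product


def e(z):
    """e(z) = exp(2 pi i z)."""
    return cmath.exp(2j * cmath.pi * z)


def omega_brute(K, theta, r):
    """First expression in (2.9) with d' = 1: sum over h_j, g_j in supp K."""
    total = 0j
    support = list(K)
    for hs in product(support, repeat=r):
        for gs in product(support, repeat=r):
            weight = 1 + 0j
            for h, g in zip(hs, gs):
                weight *= K[h].conjugate() * K[g]
            total += weight * e(theta * sum(h - g for h, g in zip(hs, gs)))
    return total


def omega_closed_form(K, theta, r):
    """Second expression in (2.9): |sum_g K(g) e(-theta g)|^(2r)."""
    F = sum(val * e(-theta * g) for g, val in K.items())
    return abs(F) ** (2 * r)


def check_identity(trials=10, seed=11):
    rng = random.Random(seed)
    for _ in range(trials):
        K = {g: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for g in range(-2, 3)}
        theta = rng.random()
        for r in (1, 2):
            lhs = omega_brute(K, theta, r)
            rhs = omega_closed_form(K, theta, r)
            if abs(lhs - rhs) > 1e-9 * max(1.0, rhs):
                return False
    return True
```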

2.2 Weyl estimates

After applying high-order $T^*T$ arguments, we often need to estimate exponential sums and oscillatory integrals involving polynomial phases. With the notation in Section 1.2, for $r\ge 1$, let $D,\tilde D:\mathbb{R}^r\times\mathbb{R}^r\to G_0^\#$ be defined by

$$\begin{aligned} D((n_1,\dots,n_r),(m_1,\dots,m_r)) &:= A_0(n_1)^{-1}\cdot A_0(m_1)\cdot\ldots\cdot A_0(n_r)^{-1}\cdot A_0(m_r),\\ \tilde D((n_1,\dots,n_r),(m_1,\dots,m_r)) &:= A_0(n_1)\cdot A_0(m_1)^{-1}\cdot\ldots\cdot A_0(n_r)\cdot A_0(m_r)^{-1}. \end{aligned} \tag{2.10}$$

By definition, we have

$$[A_0(n)]_{l_1l_2} = \begin{cases} n^{l_1} & \text{if } l_2=0,\\ 0 & \text{if } l_2\ge 1,\end{cases} \qquad [A_0(n)^{-1}]_{l_1l_2} = \begin{cases} -n^{l_1} & \text{if } l_2=0,\\ n^{l_1+l_2} & \text{if } l_2\ge 1.\end{cases}$$

Thus, by using (2.3) and (2.4), for $x=(x_1,\dots,x_r)\in\mathbb{R}^r$ and $y=(y_1,\dots,y_r)\in\mathbb{R}^r$, one has

$$[D(x,y)]_{l_1l_2} = \begin{cases} \sum_{j=1}^{r}(y_j^{l_1}-x_j^{l_1}) & \text{if } l_2=0,\\ \sum_{1\le j_1<j_2\le r}(y_{j_1}^{l_1}-x_{j_1}^{l_1})(y_{j_2}^{l_2}-x_{j_2}^{l_2}) + \sum_{j=1}^{r}(x_j^{l_1+l_2}-x_j^{l_1}y_j^{l_2}) & \text{if } l_2\ge 1, \end{cases} \tag{2.11}$$

and

$$[\tilde D(x,y)]_{l_1l_2} = \begin{cases} \sum_{j=1}^{r}(x_j^{l_1}-y_j^{l_1}) & \text{if } l_2=0,\\ \sum_{1\le j_1<j_2\le r}(x_{j_1}^{l_1}-y_{j_1}^{l_1})(x_{j_2}^{l_2}-y_{j_2}^{l_2}) + \sum_{j=1}^{r}(y_j^{l_1+l_2}-x_j^{l_1}y_j^{l_2}) & \text{if } l_2\ge 1. \end{cases} \tag{2.12}$$
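The closed forms (2.11) and (2.12) can be compared with the products (2.10) evaluated by the group law. In the Python sketch below (our illustration; names hypothetical), note that the cross terms of (2.11) and (2.12) coincide, because both factors change sign, so only the first layer and the diagonal terms differ between $D$ and $\tilde D$.

```python
import random


def index_set(d):
    return [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]


def mul(x, y, d):
    """Group law (1.6)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def inv(x, d):
    return {(l1, l2): -x[(l1, l2)] + (x[(l1, 0)] * x[(l2, 0)] if l2 else 0)
            for (l1, l2) in index_set(d)}


def A0(t, d):
    return {(l1, l2): (t ** l1 if l2 == 0 else 0) for (l1, l2) in index_set(d)}


def D_direct(x, y, d, tilde=False):
    """D(x, y) or D~(x, y) from (2.10), via the group law."""
    g = {k: 0 for k in index_set(d)}
    for xj, yj in zip(x, y):
        if tilde:
            step = mul(A0(xj, d), inv(A0(yj, d), d), d)
        else:
            step = mul(inv(A0(xj, d), d), A0(yj, d), d)
        g = mul(g, step, d)
    return g


def D_formula(x, y, d, tilde=False):
    """Closed forms (2.11) (tilde=False) and (2.12) (tilde=True)."""
    r = len(x)
    out = {}
    for (l1, l2) in index_set(d):
        if l2 == 0:
            s = sum(yj ** l1 - xj ** l1 for xj, yj in zip(x, y))
            out[(l1, 0)] = -s if tilde else s
        else:
            # identical in (2.11) and (2.12): both factors flip sign
            cross = sum((y[j1] ** l1 - x[j1] ** l1) * (y[j2] ** l2 - x[j2] ** l2)
                        for j2 in range(r) for j1 in range(j2))
            if tilde:
                diag = sum(yj ** (l1 + l2) - xj ** l1 * yj ** l2 for xj, yj in zip(x, y))
            else:
                diag = sum(xj ** (l1 + l2) - xj ** l1 * yj ** l2 for xj, yj in zip(x, y))
            out[(l1, l2)] = cross + diag
    return out


def check(d=4, trials=200, seed=13):
    rng = random.Random(seed)
    for _ in range(trials):
        r = rng.randint(1, 4)
        x = [rng.randint(-9, 9) for _ in range(r)]
        y = [rng.randint(-9, 9) for _ in range(r)]
        for tilde in (False, True):
            if D_formula(x, y, d, tilde) != D_direct(x, y, d, tilde):
                return False
    return True
```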

For $P\in\mathbb{Z}_+$, assume $\phi_P^{(j)},\psi_P^{(j)}:\mathbb{R}\to\mathbb{R}$, $j\in\{1,\dots,r\}$, are $C^1(\mathbb{R})$ functions with the properties

$$\sup_{1\le j\le r}\big[|\phi_P^{(j)}|+|\psi_P^{(j)}|\big]\le\mathbf{1}_{[-P,P]}, \qquad \sup_{1\le j\le r}\int_{\mathbb{R}}\big(|[\phi_P^{(j)}]'(x)|+|[\psi_P^{(j)}]'(x)|\big)\,dx\le 1. \tag{2.13}$$

For $\theta=(\theta_{l_1l_2})_{(l_1,l_2)\in Y_d}\in\mathbb{R}^{|Y_d|}$, $r\in\mathbb{Z}_+$, and $P\in\mathbb{Z}_+$, let

$$S_{P,r}(\theta) = \sum_{n,m\in\mathbb{Z}^r} e(-D(n,m).\theta)\prod_{j=1}^{r}\phi_P^{(j)}(n_j)\,\psi_P^{(j)}(m_j)$$

and

$$\tilde S_{P,r}(\theta) = \sum_{n,m\in\mathbb{Z}^r} e(-\tilde D(n,m).\theta)\prod_{j=1}^{r}\phi_P^{(j)}(n_j)\,\psi_P^{(j)}(m_j),$$

where $D$ and $\tilde D$ are defined as in (2.11)–(2.12).

The following key estimates are proved in [37, Proposition 5.1 and Lemma 3.1]:

Proposition 2.1

(i) (Nilpotent Weyl estimate) For any $\varepsilon>0$, there is $r=r(\varepsilon,d)\in\mathbb{Z}_+$ sufficiently large such that for all $P\in\mathbb{Z}_+$ we have

$$|S_{P,r}(\theta)|+|\tilde S_{P,r}(\theta)| \lesssim_\varepsilon P^{2r}P^{-1/\varepsilon}, \tag{2.14}$$

provided that there are $(l_1,l_2)\in Y_d$ and an irreducible fraction $a/q\in\mathbb{Q}$, $q\in\mathbb{Z}_+$, such that

$$|\theta_{l_1l_2}-a/q|\le 1/q^2 \quad\text{and}\quad q\in[P^\varepsilon,\,P^{l_1+l_2-\varepsilon}]. \tag{2.15}$$

(ii) (Nilpotent Gauss sums) For any irreducible fraction $a/q\in\mathbb{Q}^{|Y_d|}$, $a=(a_{l_1l_2})_{(l_1,l_2)\in Y_d}\in\mathbb{Z}^{|Y_d|}$, $q\in\mathbb{Z}_+$, we define the arithmetic coefficients

$$G(a/q) := q^{-2r}\sum_{v,w\in\mathbb{Z}_q^r} e(-D(v,w).(a/q)), \qquad \tilde G(a/q) := q^{-2r}\sum_{v,w\in\mathbb{Z}_q^r} e(-\tilde D(v,w).(a/q)). \tag{2.16}$$

Then for any $\varepsilon>0$, there is $r=r(\varepsilon,d)\in\mathbb{Z}_+$ sufficiently large such that

$$|G(a/q)|+|\tilde G(a/q)| \lesssim_\varepsilon q^{-1/\varepsilon}. \tag{2.17}$$

We also need a related integral estimate, see Lemma 5.4 in [37]:

Proposition 2.2

Given $\varepsilon>0$, there is $r=r(\varepsilon,d)$ sufficiently large, as in Proposition 2.1, such that

$$\Big|\int_{\mathbb{R}^r\times\mathbb{R}^r}\prod_{j=1}^r\phi_j(x_j)\psi_j(y_j)\,e\big(-D(x,y).\beta\big)\,dx\,dy\Big|\lesssim\langle\beta\rangle^{-1/\varepsilon},\qquad \Big|\int_{\mathbb{R}^r\times\mathbb{R}^r}\prod_{j=1}^r\phi_j(x_j)\psi_j(y_j)\,e\big(-\tilde D(x,y).\beta\big)\,dx\,dy\Big|\lesssim\langle\beta\rangle^{-1/\varepsilon},\tag{2.18}$$

for any $\beta\in\mathbb{R}^{|Y_d|}$ (here and later on, we use the Japanese bracket notation $\langle\beta\rangle:=(1+|\beta|^2)^{1/2}$) and for any $C^1(\mathbb{R})$ functions $\phi_1,\psi_1,\dots,\phi_r,\psi_r:\mathbb{R}\to\mathbb{C}$ satisfying, for any $j\in\{1,\dots,r\}$, the bounds

$$|\phi_j(x)|+|\psi_j(x)|\le\mathbf{1}_{[-1,1]}(x),\qquad\int_{\mathbb{R}}\big[|\partial_x\phi_j(x)|+|\partial_x\psi_j(x)|\big]\,dx\le1.$$

These statements should be compared with classical Weyl-type estimates, which are proved, for example, in [52, Proposition 1]:

Proposition 2.3

(i) Assume that $P\ge1$ is an integer and $\phi_P:\mathbb{R}\to\mathbb{R}$ is a $C^1(\mathbb{R})$ function satisfying

$$|\phi_P|\le\mathbf{1}_{[-P,P]},\qquad\int_{\mathbb{R}}|\phi_P'(x)|\,dx\le1.\tag{2.19}$$

Assume that $\varepsilon>0$ and $\theta=(\theta_1,\dots,\theta_d)\in\mathbb{R}^d$ has the property that there are $l\in\{1,\dots,d\}$ and an irreducible fraction $a/q\in\mathbb{Q}$ with $q\in\mathbb{Z}_+$ such that

$$|\theta_l-a/q|\le1/q^2\quad\text{and}\quad q\in[P^\varepsilon,P^{l-\varepsilon}].\tag{2.20}$$

Then there is a constant $\bar C=\bar C_d\ge1$ such that

$$\Big|\sum_{n\in\mathbb{Z}}\phi_P(n)\,e\big(-(\theta_1n+\ldots+\theta_dn^d)\big)\Big|\lesssim_\varepsilon P^{1-\varepsilon/\bar C}.\tag{2.21}$$

(ii) For any irreducible fraction $\theta=a/q\in(\mathbb{Z}/q)^d$, $a=(a_1,\dots,a_d)\in\mathbb{Z}^d$, $q\in\mathbb{Z}_+$, we have

$$\Big|q^{-1}\sum_{n\in\mathbb{Z}_q}e\big(-(\theta_1n+\ldots+\theta_dn^d)\big)\Big|\lesssim q^{-1/\bar C}.\tag{2.22}$$
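The gain in (2.22) can be observed numerically. The sketch below (Python; the function name is ours) evaluates the normalized complete sum directly. For $d=2$, a prime modulus $q=p>2$, and $a=(0,a_2)$ with $p\nmid a_2$, the classical evaluation of quadratic Gauss sums gives modulus exactly $p^{-1/2}$, so in this special case the gain is $q^{-1/2}$; this illustrates, but does not prove, (2.22).

```python
import cmath


def complete_sum(a, q):
    """q^{-1} * sum_{n in Z_q} e(-(a_1 n + ... + a_d n^d) / q), with e(x) = exp(2*pi*i*x).

    `a` = (a_1, ..., a_d) are the integer numerators of theta = a/q, as in (2.22).
    """
    total = 0j
    for n in range(q):
        # Reduce the phase modulo q first so floating-point error does not grow with n^d.
        phase = sum(al * pow(n, l, q) for l, al in enumerate(a, start=1)) % q
        total += cmath.exp(-2j * cmath.pi * phase / q)
    return total / q


# Quadratic phase with prime modulus: |q^{-1} sum| equals q^{-1/2}.
for p in (3, 5, 7, 11):
    print(p, abs(complete_sum((0, 1), p)), p ** -0.5)
```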

Notice the formal similarity between Propositions 2.1 and 2.3. Both give a small but nontrivial gain of a power of $P$ as soon as one of the coefficients of the relevant polynomials is far from rational numbers with small denominators. These estimates can therefore be used efficiently to bound minor arc contributions.

We note, however, that the proof of the nilpotent Weyl estimates in Proposition 2.1 is much more involved than the proof of Proposition 2.3. It relies on some classical ideas of Birch [9] and Davenport [21,22] on treating polynomials in many variables, but one has to identify and exploit suitable nondegeneracy properties of the explicit (but complicated) polynomials $D$ and $\tilde D$ in (2.11)–(2.12) to make the proof work. All the details of the proof are provided in [37, Section 5].

3 A nilpotent circle method

To illustrate our main method, we focus on a particular case of Theorem 1.3, namely, on proving boundedness of the maximal function $M_N^\chi$ on $\ell^2(G_0)$. For simplicity of notation, for $k\in\mathbb{N}$ and $x\in G_0$, let

$$\mathcal{M}_kf(x):=M_{2^k}^\chi f(x)=\sum_{n\in\mathbb{Z}}2^{-k}\chi(2^{-k}n)\,f\big(A_0(n)^{-1}\cdot x\big)=(f*K_k)(x),\qquad K_k(x):=G_{2^k}^\chi(x)=\sum_{n\in\mathbb{Z}}2^{-k}\chi(2^{-k}n)\,\mathbf{1}_{\{A_0(n)\}}(x),\tag{3.1}$$

see (1.13) and (1.15) for the definitions of $M_N^\chi$ and $G_N^\chi$, respectively. With this new notation, our main goal is to prove the following:

Theorem 3.1

For any $f\in\ell^2(G_0)$, we have

$$\Big\|\sup_{k\ge0}|\mathcal{M}_kf|\Big\|_{\ell^2(G_0)}\lesssim\|f\|_{\ell^2(G_0)}.\tag{3.2}$$

In the rest of this section, we outline the proof of this theorem. Our main new construction is an iterative procedure, starting from the center of the group and moving down along its central series, which allows us to use some of the ideas of the classical circle method recursively at every stage. In our case of nilpotent groups of step two, the procedure consists of two basic stages and one additional step corresponding to “major arcs.”

Notice that the kernels $K_k$ have a product structure

$$K_k(g)=L_k(g^{(1)})\,\mathbf{1}_{\{0\}}(g^{(2)}),\qquad L_k(g^{(1)}):=\sum_{n\in\mathbb{Z}}2^{-k}\chi(2^{-k}n)\,\mathbf{1}_{\{0\}}\big(g^{(1)}-A_0^{(1)}(n)\big),\tag{3.3}$$

where $A_0^{(1)}(n):=(n,\dots,n^d)\in\mathbb{Z}^d$ and $g=(g^{(1)},g^{(2)})\in G_0$ as in (1.7).

3.1 First stage reduction

We first decompose the singular kernel $\mathbf{1}_{\{0\}}(g^{(2)})$ in the central variable $g^{(2)}$ into smoother kernels. For any $s\in\mathbb{N}$ and $m\in\mathbb{Z}_+$, we define the set of rational fractions

$$\mathcal{R}_s^m:=\big\{a/q:\ a=(a_1,\dots,a_m)\in\mathbb{Z}^m,\ q\in[2^s,2^{s+1})\cap\mathbb{Z},\ \gcd(a_1,\dots,a_m,q)=1\big\}.\tag{3.4}$$
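For small parameters the sets in (3.4) can be listed explicitly. The Python sketch below (the function name `R` is ours) returns representatives in $[0,1)^m$, which is a choice of representatives modulo 1 rather than part of the definition. Because of the condition $\gcd(a_1,\dots,a_m,q)=1$, the integer $q$ is the exact common denominator of $a/q$, so the sets $\mathcal{R}_s^m$ for different $s$ are disjoint; for $m=1$, the number of representatives is the sum of Euler's totient $\varphi(q)$ over $q\in[2^s,2^{s+1})$.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd


def R(s, m):
    """Representatives in [0, 1)^m of R_s^m from (3.4)."""
    out = set()
    for q in range(2 ** s, 2 ** (s + 1)):
        for a in product(range(q), repeat=m):
            if reduce(gcd, a, q) == 1:  # gcd(a_1, ..., a_m, q) = 1
                out.add(tuple(Fraction(ai, q) for ai in a))
    return out


print(sorted(R(1, 1)))  # 1/2, 1/3, 2/3
```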

We also define $\mathcal{R}_{\le a}^m:=\bigcup_{0\le s\le a}\mathcal{R}_s^m$. For $x^{(1)}=(x^{(1)}_{l_10})_{l_1\in\{1,\dots,d\}}\in\mathbb{R}^d$, $x^{(2)}=(x^{(2)}_{l_1l_2})_{(l_1,l_2)\in Y_d'}\in\mathbb{R}^{d'}$, and $\Lambda\in(0,\infty)$, we define the partial dilations

$$\Lambda\circ x^{(1)}=(\Lambda^{l_1}x^{(1)}_{l_10})_{l_1\in\{1,\dots,d\}}\in\mathbb{R}^d,\qquad\Lambda\circ x^{(2)}=(\Lambda^{l_1+l_2}x^{(2)}_{l_1l_2})_{(l_1,l_2)\in Y_d'}\in\mathbb{R}^{d'},\tag{3.5}$$

which are induced by the group dilations defined in (1.12).

We fix a smooth even function $\eta_0:\mathbb{R}\to[0,1]$ such that $\mathbf{1}_{[-1,1]}\le\eta_0\le\mathbf{1}_{[-2,2]}$. For $t\in\mathbb{R}$ and integers $j\ge1$, we define

$$\eta_j(t):=\eta_0(2^{-j}t)-\eta_0(2^{-j+1}t),\qquad 1=\sum_{j=0}^\infty\eta_j.\tag{3.6}$$

For any $A\in[0,\infty)$, we define

$$\eta_{\le A}:=\sum_{j\in[0,A]\cap\mathbb{Z}}\eta_j.\tag{3.7}$$
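A concrete choice of $\eta_0$ makes the telescoping in (3.6)–(3.7) visible: $\eta_{\le A}(t)=\eta_0(2^{-\lfloor A\rfloor}t)$, which equals $1$ for $|t|\le2^{\lfloor A\rfloor}$ and vanishes for $|t|\ge2^{\lfloor A\rfloor+1}$. The Python sketch below builds $\eta_0$ from the standard $e^{-1/x}$ construction; this specific bump is our assumption, and any smooth even $\eta_0$ with $\mathbf{1}_{[-1,1]}\le\eta_0\le\mathbf{1}_{[-2,2]}$ that is nonincreasing in $|t|$ (which also makes every $\eta_j$ nonnegative) would serve equally well.

```python
import math


def _f(x):
    # Standard C^infinity function vanishing to infinite order at 0.
    return math.exp(-1.0 / x) if x > 0 else 0.0


def smooth_step(t):
    """Smooth and nondecreasing; equals 0 for t <= 0 and 1 for t >= 1."""
    return _f(t) / (_f(t) + _f(1.0 - t))


def eta0(t):
    """A smooth even bump with 1_{[-1,1]} <= eta0 <= 1_{[-2,2]}, nonincreasing in |t|."""
    return smooth_step(2.0 - abs(t))


def eta(j, t):
    """eta_j from (3.6); eta_0 itself for j = 0."""
    if j == 0:
        return eta0(t)
    return eta0(2.0 ** (-j) * t) - eta0(2.0 ** (-j + 1) * t)


def eta_le(A, t):
    """eta_{<=A} from (3.7): sum of eta_j over the integers 0 <= j <= A."""
    return sum(eta(j, t) for j in range(math.floor(A) + 1))


# Telescoping: eta_{<=A}(t) = eta0(2^{-floor(A)} t).
print(eta_le(3.5, 6.0), eta0(6.0 / 8))
```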

By a slight abuse of notation, we also let $\eta_j$ and $\eta_{\le A}$ denote the smooth radial functions on $\mathbb{R}^m$, $m\ge1$, defined by $\eta_j(x)=\eta_j(|x|)$ and $\eta_{\le A}(x)=\eta_{\le A}(|x|)$. We also fix two small constants $\delta=\delta(d)\ll\delta'=\delta'(d)$ such that $\delta'\in(0,(10d)^{-10}]$ and $\delta\in(0,(\delta')^4]$, and a large constant $D=D(d)\gg\delta^{-8}$, which depend on arithmetic properties of the polynomial sequence $A_0$ (more precisely, on the structural constants in Propositions 2.1–2.2), such that

$$1\ll1/\delta'\ll1/\delta\ll r=r(\delta,\delta',d)\ll D.\tag{3.8}$$

For $k\ge D^2$, we fix two cutoff functions $\phi_k^{(1)}:\mathbb{R}^d\to[0,1]$ and $\phi_k^{(2)}:\mathbb{R}^{d'}\to[0,1]$ such that

$$\phi_k^{(1)}(g^{(1)}):=\eta_{\le\delta k}(2^{-k}\circ g^{(1)}),\qquad\phi_k^{(2)}(g^{(2)}):=\eta_{\le\delta k}(2^{-k}\circ g^{(2)}).\tag{3.9}$$

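For $k\in\mathbb{N}$ with $k\ge D^2$ and for any 1-periodic sets of rationals $\mathcal{A}\subseteq\mathbb{Q}^d$, $\mathcal{B}\subseteq\mathbb{Q}^{d'}$, we define the periodic Fourier multipliers

$$\Psi_{k,\mathcal{A}}(\xi^{(1)}):=\sum_{a/q\in\mathcal{A}}\eta_{\le\delta'k}\big(2^k\circ(\xi^{(1)}-a/q)\big),\quad\xi^{(1)}\in\mathbb{T}^d,\qquad\Xi_{k,\mathcal{B}}(\xi^{(2)}):=\sum_{b/q\in\mathcal{B}}\eta_{\le\delta k}\big(2^k\circ(\xi^{(2)}-b/q)\big),\quad\xi^{(2)}\in\mathbb{T}^{d'}.\tag{3.10}$$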

For $k\ge D^2$ and $s\in[0,\delta k]\cap\mathbb{Z}$, we define the periodic Fourier multipliers $\Xi_{k,s}:\mathbb{R}^{d'}\to[0,1]$,

$$\Xi_{k,s}(\xi^{(2)}):=\Xi_{k,\mathcal{R}_s^{d'}}(\xi^{(2)})=\sum_{a/q\in\mathcal{R}_s^{d'}}\eta_{\le\delta k}\big(2^k\circ(\xi^{(2)}-a/q)\big).\tag{3.11}$$

For $k\ge D^2$, we write

$$\mathbf{1}_{\{0\}}(g^{(2)})=\int_{\mathbb{T}^{d'}}e(g^{(2)}.\xi^{(2)})\,d\xi^{(2)}=\sum_{s\in[0,\delta k]\cap\mathbb{Z}}\int_{\mathbb{T}^{d'}}e(g^{(2)}.\xi^{(2)})\,\Xi_{k,s}(\xi^{(2)})\,d\xi^{(2)}+\int_{\mathbb{T}^{d'}}e(g^{(2)}.\xi^{(2)})\,\Xi_k^c(\xi^{(2)})\,d\xi^{(2)},\tag{3.12}$$

where

$$\Xi_k^c:=1-\sum_{s\in[0,\delta k]\cap\mathbb{Z}}\Xi_{k,s}.\tag{3.13}$$

Then we decompose $K_k=K_k^c+\sum_{s\in[0,\delta k]\cap\mathbb{Z}}K_{k,s}$, where, with the notation in (3.3), we have

$$K_{k,s}(g):=L_k(g^{(1)})\,N_{k,s}(g^{(2)}),\qquad K_k^c(g):=L_k(g^{(1)})\,N_k^c(g^{(2)}),\tag{3.14}$$

and

$$N_{k,s}(g^{(2)}):=\phi_k^{(2)}(g^{(2)})\int_{\mathbb{T}^{d'}}e(g^{(2)}.\xi^{(2)})\,\Xi_{k,s}(\xi^{(2)})\,d\xi^{(2)},\qquad N_k^c(g^{(2)}):=\phi_k^{(2)}(g^{(2)})\int_{\mathbb{T}^{d'}}e(g^{(2)}.\xi^{(2)})\,\Xi_k^c(\xi^{(2)})\,d\xi^{(2)}.\tag{3.15}$$

We first show that we can bound the contributions of the minor arcs in the central variables.

Lemma 3.2

For any integer $k\ge D^2$ and $f\in\ell^2(G_0)$, we have

$$\|f*K_k^c\|_{\ell^2(G_0)}\lesssim2^{-k/D^2}\|f\|_{\ell^2(G_0)}.\tag{3.16}$$

Then we prove our first transition estimate, i.e., we show that we can bound the contributions of the kernels $K_{k,s}$ corresponding to scales $k$ that are not too large. More precisely, for any $s\ge0$, we define

$$\kappa_s:=2^{2D(s+1)^2}.\tag{3.17}$$

Lemma 3.3

For any integer $s\ge0$ and $f\in\ell^2(G_0)$, we have

$$\Big\|\sup_{\max(D^2,s/\delta)\le k<2\kappa_s}|f*K_{k,s}|\Big\|_{\ell^2(G_0)}\lesssim2^{-s/D^2}\|f\|_{\ell^2(G_0)}.\tag{3.18}$$

In the commutative setting, minor arc estimates such as (3.16) follow from Weyl estimates and the Plancherel theorem. As we do not have a useful Fourier transform on the group $G_0$, our main tool to prove the bounds (3.16) is a high-order $T^*T$ argument. More precisely, we analyze the kernel of the convolution operator $\{(\mathcal{K}_k^c)^*\mathcal{K}_k^c\}^r$, where $\mathcal{K}_k^cf:=f*K_k^c$ and $r$ is sufficiently large, and show that its $\ell^1(G_0)$ norm is $\lesssim2^{-k}$. Since $\|\mathcal{K}_k^c\|_{\ell^2\to\ell^2}^{2r}=\|\{(\mathcal{K}_k^c)^*\mathcal{K}_k^c\}^r\|_{\ell^2\to\ell^2}$ is bounded by this $\ell^1(G_0)$ norm and $2r\le D^2$ by (3.8), this gives (3.16). The main ingredient in this proof is the noncommutative Weyl estimate in Proposition 2.1(i).

To prove the transition estimates (3.18), we use the Rademacher–Menshov inequality and Khintchine's inequality (leading to logarithmic losses) to reduce to proving the bounds

$$\Big\|\sum_{k\in[J,2J]}\varkappa_k\,(f*H_{k,s})\Big\|_{\ell^2(G_0)}\lesssim2^{-4s/D^2}\|f\|_{\ell^2(G_0)}\tag{3.19}$$

for any $J\ge\max(D^2,s/\delta)$ and any coefficients $\varkappa_k\in[-1,1]$, where $H_{k,s}:=K_{k+1,s}-K_{k,s}$. For this, we use a high-order version of the Cotlar–Stein lemma, which again relies on a precise analysis of the kernel of the convolution operator $\{(\mathcal{H}_{k,s})^*\mathcal{H}_{k,s}\}^r$, where $\mathcal{H}_{k,s}f:=f*H_{k,s}$ and $r$ is sufficiently large. The key exponential gain of $2^{-4s/D^2}$ in (3.19) is due to the noncommutative Gauss sum estimate; see Proposition 2.1(ii).
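The Rademacher–Menshov step rests on a simple combinatorial fact: for $0\le m\le2^n$, the initial segment $[0,m)$ is a disjoint union of at most $n$ dyadic intervals $[j2^i,(j+1)2^i)$ (one for each binary digit of $m$ equal to $1$; a single interval when $m=2^n$). Consequently, a supremum over $m$ is controlled by sums over dyadic blocks at each of the $n+1$ levels, which is the source of the logarithmic losses mentioned above. A minimal Python sketch of the decomposition (names are ours):

```python
def dyadic_blocks(m, n):
    """Split [0, m), 0 <= m <= 2^n, into disjoint dyadic intervals [j*2^i, (j+1)*2^i),
    one for each binary digit of m equal to 1 (largest blocks first)."""
    if not 0 <= m <= 2 ** n:
        raise ValueError("need 0 <= m <= 2^n")
    blocks, start = [], 0
    for i in reversed(range(n + 1)):
        if (m >> i) & 1:
            # start is a multiple of 2^(i+1), hence of 2^i, so the block is dyadic
            blocks.append((start, start + 2 ** i))
            start += 2 ** i
    return blocks


print(dyadic_blocks(13, 4))  # [(0, 8), (8, 12), (12, 13)] since 13 = 8 + 4 + 1
```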

3.2 Second stage reduction

In view of Lemmas 3.2–3.3, it remains to prove that

$$\Big\|\sup_{k\ge\kappa_s}|f*K_{k,s}|\Big\|_{\ell^2(G_0)}\lesssim2^{-s/D^2}\|f\|_{\ell^2(G_0)}\tag{3.20}$$

for any fixed integer $s\ge0$. The kernels $K_{k,s}$ are now reasonably well adapted to a natural family of nonisotropic balls in the central variables, at least when $2^s\approx1$, and we need to start decomposing in the noncentral variables.

We examine the kernels $L_k(g^{(1)})$ defined in (3.3) and rewrite them in the form

$$L_k(g^{(1)})=\sum_{n\in\mathbb{Z}}2^{-k}\chi(2^{-k}n)\,\mathbf{1}_{\{0\}}\big(-A_0^{(1)}(n)+g^{(1)}\big)=\phi_k^{(1)}(g^{(1)})\int_{\mathbb{T}^d}e(g^{(1)}.\xi^{(1)})\,S_k(\xi^{(1)})\,d\xi^{(1)},\tag{3.21}$$

where $g^{(1)}.\xi^{(1)}$ denotes the usual scalar product of vectors in $\mathbb{R}^d$, and

$$S_k(\xi^{(1)}):=\sum_{n\in\mathbb{Z}}2^{-k}\chi(2^{-k}n)\,e\big(-A_0^{(1)}(n).\xi^{(1)}\big).\tag{3.22}$$

For any integers $Q\in\mathbb{Z}_+$ and $m\in\mathbb{Z}_+$, we define the set of fractions

$$\tilde{\mathcal{R}}_Q^m:=\{a/Q:\ a=(a_1,\dots,a_m)\in\mathbb{Z}^m\}.\tag{3.23}$$

For any integer $s\ge0$, we fix a large denominator

$$Q_s:=\big(\lfloor2^{D(s+1)}\rfloor\big)!=1\cdot2\cdot\ldots\cdot\lfloor2^{D(s+1)}\rfloor,\tag{3.24}$$

and, using (3.10), we define the periodic multipliers

$$\begin{aligned}\Psi_{k,s}^{\rm low}(\xi^{(1)})&:=\Psi_{k,\tilde{\mathcal{R}}_{Q_s}^d}(\xi^{(1)})=\sum_{a/q\in\tilde{\mathcal{R}}_{Q_s}^d}\eta_{\le\delta'k}\big(2^k\circ(\xi^{(1)}-a/q)\big),\\ \Psi_{k,s,t}(\xi^{(1)})&:=\Psi_{k,\mathcal{R}_t^d\setminus\tilde{\mathcal{R}}_{Q_s}^d}(\xi^{(1)})=\sum_{a/q\in\mathcal{R}_t^d\setminus\tilde{\mathcal{R}}_{Q_s}^d}\eta_{\le\delta'k}\big(2^k\circ(\xi^{(1)}-a/q)\big),\\ \Psi_k^c(\xi^{(1)})&:=1-\Psi_{k,s}^{\rm low}(\xi^{(1)})-\sum_{t\in[0,\delta'k]\cap\mathbb{Z}}\Psi_{k,s,t}(\xi^{(1)})=1-\sum_{a/q\in\mathcal{R}_{\le\delta'k}^d}\eta_{\le\delta'k}\big(2^k\circ(\xi^{(1)}-a/q)\big).\end{aligned}\tag{3.25}$$

Since $k\ge\kappa_s=2^{2D(s+1)^2}$, we see that $Q_s\le2^{\delta^2k}$. Therefore, the supports of the cutoff functions $\eta_{\le\delta'k}(2^k\circ(\xi^{(1)}-a/q))$ are pairwise disjoint, and the multipliers $\Psi_{k,s}^{\rm low}$, $\Psi_{k,s,t}$, $\Psi_k^c$ take values in the interval $[0,1]$. Notice also that $\Psi_{k,s,t}\equiv0$ unless $t\ge D(s+1)$ (if $t<D(s+1)$, every denominator $q<2^{t+1}\le2^{D(s+1)}$ divides $Q_s$, so $\mathcal{R}_t^d\subseteq\tilde{\mathcal{R}}_{Q_s}^d$), and that the cutoffs used in these definitions depend on $\delta'k$, not on $\delta k$ as in the case of the central variables.
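The two claims in the preceding paragraph can be checked for small parameters. Distinct fractions in $[0,1]$ with denominators at most $Q\ge2$ are at distance at least $1/(Q(Q-1))$ (consecutive Farey fractions $a/q<a'/q'$ satisfy $a'q-aq'=1$), while $\eta_{\le\delta'k}(2^k\circ\cdot)$ is supported where the first coordinate has size at most $2^{\lfloor\delta'k\rfloor+1-k}$. Moreover, $\mathcal{R}_t^d\setminus\tilde{\mathcal{R}}_{Q_s}^d$ is empty exactly when every $q\in[2^t,2^{t+1})$ divides $Q_s$, which holds for $t<D(s+1)$ (by Bertrand's postulate it fails for $t\ge D(s+1)$). The Python sketch below is a one-dimensional illustration with hypothetical function names and small values of $k$, $\delta'$, $D$, far from the regime $k\ge\kappa_s$ of the text.

```python
from fractions import Fraction
from math import factorial, floor


def min_gap(qmax):
    """Smallest distance between distinct fractions a/q in [0, 1] with 1 <= q <= qmax."""
    pts = sorted({Fraction(a, q) for q in range(1, qmax + 1) for a in range(q + 1)})
    return min(y - x for x, y in zip(pts, pts[1:]))


def arcs_disjoint(k, delta_p):
    """One-dimensional check that the arcs |xi - a/q| <= 2^{floor(delta_p*k)+1-k}
    (support of eta_{<=delta_p k}(2^k (xi - a/q)) in the first coordinate)
    are pairwise disjoint for all denominators q <= 2^{floor(delta_p*k)+1}."""
    A = floor(delta_p * k)
    radius = Fraction(2) ** (A + 1 - k)
    return min_gap(2 ** (A + 1)) > 2 * radius


def Q_s(s, D):
    """Q_s = (floor(2^{D(s+1)}))! from (3.24), for an integer D."""
    return factorial(2 ** (D * (s + 1)))


def block_vanishes(t, s, D):
    """True iff every q in [2^t, 2^{t+1}) divides Q_s, i.e. R_t^d lies inside the
    fractions with denominator Q_s, so that Psi_{k,s,t} vanishes identically."""
    Q = Q_s(s, D)
    return all(Q % q == 0 for q in range(2 ** t, 2 ** (t + 1)))


print(arcs_disjoint(20, 0.2), arcs_disjoint(4, 0.5))
print([block_vanishes(t, 0, 2) for t in range(4)])
```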

We examine formula (3.21) and define the kernels $L_{k,s}^{\rm low},L_{k,s,t},L_k^c:\mathbb{Z}^d\to\mathbb{C}$ by

$$L_*(g^{(1)})=\phi_k^{(1)}(g^{(1)})\int_{\mathbb{T}^d}e(g^{(1)}.\xi^{(1)})\,S_k(\xi^{(1)})\,\Psi_*(\xi^{(1)})\,d\xi^{(1)},\tag{3.26}$$

where $(L_*,\Psi_*)\in\{(L_{k,s}^{\rm low},\Psi_{k,s}^{\rm low}),(L_{k,s,t},\Psi_{k,s,t}),(L_k^c,\Psi_k^c)\}$. For any $k\ge\kappa_s$, we obtain $K_{k,s}=G_{k,s}^{\rm low}+\sum_{t\le\delta'k}G_{k,s,t}+G_{k,s}^c$, where the kernels $G_{k,s}^{\rm low},G_{k,s,t},G_{k,s}^c:\mathbb{Z}^{|Y_d|}\to\mathbb{C}$ are defined by

$$G_{k,s}^{\rm low}(g):=L_{k,s}^{\rm low}(g^{(1)})\,N_{k,s}(g^{(2)}),\qquad G_{k,s,t}(g):=L_{k,s,t}(g^{(1)})\,N_{k,s}(g^{(2)}),\qquad G_{k,s}^c(g):=L_k^c(g^{(1)})\,N_{k,s}(g^{(2)}).\tag{3.27}$$

Our next step is to show that the contributions of the minor arcs corresponding to the kernels G k , s c can be suitably bounded:

Lemma 3.4

For any integers $s\ge0$ and $k\ge\kappa_s$, and for any $f\in\ell^2(G_0)$, we have

$$\|f*G_{k,s}^c\|_{\ell^2(G_0)}\lesssim2^{-k/D^2}\|f\|_{\ell^2(G_0)}.\tag{3.28}$$

Then we prove our second transition estimate, bounding the contributions of the operators defined by the kernels $G_{k,s,t}$ for intermediate values of $k$.

Lemma 3.5

For any integers $s\ge0$ and $t\ge D(s+1)$, and any $f\in\ell^2(G_0)$, we have

$$\Big\|\sup_{\max(\kappa_s,t/\delta')\le k<2\kappa_t}|f*G_{k,s,t}|\Big\|_{\ell^2(G_0)}\lesssim2^{-t/D^2}\|f\|_{\ell^2(G_0)},\tag{3.29}$$

where $\kappa_t=2^{2D(t+1)^2}$ as in (3.17).

The proofs of these estimates are similar to the proofs of the corresponding first-stage estimates (3.16) and (3.18), using high-order $T^*T$ arguments. However, instead of the nilpotent oscillatory sum estimates in Proposition 2.1, we use here the classical estimates from Proposition 2.3. We emphasize that the underlying nilpotent structure is still very important and that these estimates are only possible after performing the two reductions of the first stage, namely, the restriction to major arcs corresponding to denominators $\approx2^s$ and the restriction to parameters $k\ge\kappa_s$. Finally, we remark that the circle method could not have been applied simultaneously to both central and noncentral variables, as we would not have been able to efficiently control the phase functions arising in the corresponding exponential sums and oscillatory integrals, especially on major arcs.

3.3 Final stage: major arcs contributions

After these reductions, it remains to bound the contributions of the “major arcs” in both the central and the noncentral variables. More precisely, we prove the following bounds:

Lemma 3.6

(i) For any integer $s\ge0$ and $f\in\ell^2(G_0)$, we have

$$\Big\|\sup_{k\ge\kappa_s}|f*G_{k,s}^{\rm low}|\Big\|_{\ell^2(G_0)}\lesssim2^{-s/D^2}\|f\|_{\ell^2(G_0)}.\tag{3.30}$$

(ii) For any integers $s\ge0$, $t\ge D(s+1)$, and $f\in\ell^2(G_0)$, we have

$$\Big\|\sup_{k\ge\kappa_t}|f*G_{k,s,t}|\Big\|_{\ell^2(G_0)}\lesssim2^{-t/D^2}\|f\|_{\ell^2(G_0)}.\tag{3.31}$$

The main idea here is different: we write the kernels $G_{k,s}^{\rm low}$ and $G_{k,s,t}$ as tensor products of two components, up to acceptable errors. One of these components is essentially a maximal average operator on a continuous group, which can be analyzed using the classical method of Christ [17]. The other component is an arithmetic, operator-valued analog of the classical Gauss sums, which leads to the key factors $2^{-s/D^2}$ and $2^{-t/D^2}$ in (3.30) and (3.31).

More precisely, for any integer $Q\ge1$, we define the subgroup

$$H_Q:=\big\{(Qh_{l_1l_2})_{(l_1,l_2)\in Y_d}\in G_0:\ h_{l_1l_2}\in\mathbb{Z}\big\}.\tag{3.32}$$

Clearly, $H_Q\subseteq G_0$ is a normal subgroup. Let $J_Q$ denote the set of coset representatives

$$J_Q:=\big\{b=(b_{l_1l_2})_{(l_1,l_2)\in Y_d}\in G_0:\ b_{l_1l_2}\in\mathbb{Z}\cap[0,Q-1]\big\},\tag{3.33}$$

identified with the quotient group $G_0/H_Q$ and endowed with the natural induced group structure. Notice that

$$\text{the map }(b,h)\mapsto b\cdot h\text{ defines a bijection from }J_Q\times H_Q\text{ to }G_0.\tag{3.34}$$
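The bijection (3.34) is constructive: reduce the first-layer coordinates of $g$ modulo $Q$, then correct the second-layer coordinates for the cross term. The Python sketch below does this for $d=2$ and checks the round trip. It assumes the step-2 law in which first-layer coordinates add and $(x\cdot y)_{l_1l_2}=x_{l_1l_2}+y_{l_1l_2}+x_{l_10}\,y_{l_20}$ for $l_2\ge1$, which is our reading of the group structure on $G_0$ and is not restated in this section. Here `h` stores the entries $Qh_{l_1l_2}$ of an element of $H_Q$; the function names are hypothetical.

```python
from itertools import product

d = 2  # illustrative degree (assumption)
Y = [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]  # assumed index set Y_d


def mul(x, y):
    """Assumed step-2 law: first layer adds; second layer adds plus x_{l1,0} * y_{l2,0}."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in Y}


def decompose(g, Q):
    """Return (b, h) with g = b . h, b in J_Q (entries in [0, Q-1]), h in H_Q (entries in QZ)."""
    b, h = {}, {}
    for (l1, l2) in Y:  # first layer: coordinates simply add
        if l2 == 0:
            b[(l1, 0)] = g[(l1, 0)] % Q
            h[(l1, 0)] = g[(l1, 0)] - b[(l1, 0)]
    for (l1, l2) in Y:  # second layer: g_{l1 l2} = b_{l1 l2} + h_{l1 l2} + b_{l1 0} h_{l2 0}
        if l2 > 0:
            rest = g[(l1, l2)] - b[(l1, 0)] * h[(l2, 0)]
            b[(l1, l2)] = rest % Q
            h[(l1, l2)] = rest - b[(l1, l2)]
    return b, h


g = dict(zip(Y, (7, -5, 11)))
b, h = decompose(g, 3)
print(b, h, mul(b, h) == g)
```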

Assume that $Q\ge1$ and $2^k\ge Q$. For any $a\in\mathbb{Z}^d$ and $\xi\in\mathbb{R}^d$, let

$$J_k(\xi):=2^{-k}\int_{\mathbb{R}}\chi(2^{-k}x)\,e\big[-A_0^{(1)}(x).\xi\big]\,dx=\int_{\mathbb{R}}\chi(y)\,e\big[-A_0^{(1)}(y).(2^k\circ\xi)\big]\,dy,\qquad S(a/Q):=Q^{-1}\sum_{n\in\mathbb{Z}_Q}e\big[-A_0^{(1)}(n).a/Q\big].\tag{3.35}$$

The point is that the kernels $G_{k,s}^{\rm low}$ and $G_{k,s,t}$ can be decomposed as tensor products. Indeed, to decompose $G_{k,s,t}$ (the harder case), we set $Q:=Q_t=(\lfloor2^{D(t+1)}\rfloor)!$ as in (3.24). Then we show that if $k\ge\kappa_t$ (so $2^k\gg Q_t^4$), $h\in H_{Q_t}$, and $b_1,b_2\in G_0$ satisfy $|b_1|+|b_2|\le Q^4$, then

$$G_{k,s,t}(b_1\cdot h\cdot b_2)\approx W_{k,Q_t}(h)\,V_{\mathcal{R}_t^d\setminus\tilde{\mathcal{R}}_{Q_s}^d,\ \mathcal{R}_s^{d'},\ Q_t}(b_1\cdot b_2),\tag{3.36}$$

up to acceptable summable errors. Here,

$$W_{k,Q}(h):=Q^{d+d'}\phi_k(h)\int_{\mathbb{R}^d\times\mathbb{R}^{d'}}\eta_{\le\delta'k}(2^k\circ\xi)\,\eta_{\le\delta k}(2^k\circ\theta)\,e\big(h.(\xi,\theta)\big)\,J_k(\xi)\,d\xi\,d\theta,$$

$$V_{\mathcal{A},\mathcal{B},Q}(b):=Q^{-d-d'}\sum_{\sigma^{(1)}\in\mathcal{A}\cap[0,1)^d}S(\sigma^{(1)})\,e\big[b^{(1)}.\sigma^{(1)}\big]\sum_{\sigma^{(2)}\in\mathcal{B}\cap[0,1)^{d'}}e\big[b^{(2)}.\sigma^{(2)}\big],$$

where $\phi_k(h):=\phi_k^{(1)}(h^{(1)})\,\phi_k^{(2)}(h^{(2)})$ for $h=(h^{(1)},h^{(2)})\in H_Q$, with $\phi_k^{(1)},\phi_k^{(2)}$ as in (3.9), $b=(b^{(1)},b^{(2)})\in G_0$, and the functions $J_k$ and $S$ are defined in (3.35).

Finally, we show that the kernels $V_{s,t}:=V_{\mathcal{R}_t^d\setminus\tilde{\mathcal{R}}_{Q_s}^d,\ \mathcal{R}_s^{d'},\ Q_t}$ (which can be interpreted as operator-valued Gauss sums) define bounded operators on $\ell^2(J_{Q_t})$,

$$\|f*_{J_{Q_t}}V_{s,t}\|_{\ell^2(J_{Q_t})}\lesssim2^{-t/D}\|f\|_{\ell^2(J_{Q_t})}.$$

Moreover, the kernels $W_{k,Q_t}$ are close to classical maximal operators, and one can show that

$$\Big\|\sup_{k\ge\kappa_t}|f*_{H_{Q_t}}W_{k,Q_t}|\Big\|_{\ell^2(H_{Q_t})}\lesssim\|f\|_{\ell^2(H_{Q_t})}.$$

The desired bounds (3.31) follow using the approximation formula (3.36).

4 A nilpotent Waring theorem on the group $G_0$: proof of Theorem 1.4

We now prove Theorem 1.4. Recall that $D(n,m)=A_0(n_1)^{-1}\cdot A_0(m_1)\cdot\ldots\cdot A_0(n_r)^{-1}\cdot A_0(m_r)$. Using the orthogonality relation $\int_{\mathbb{T}^{d+d'}}e(x.\xi)\,d\xi=\mathbf{1}_{\{0\}}(x)$ for $x\in\mathbb{Z}^{d+d'}$ to represent the delta function, we can write

$$S_{r,N}(g)=\sum_{m,n\in[N]^r}\int_{\mathbb{T}^{d+d'}}e\big(D(n,m).\xi\big)\,e(-g.\xi)\,d\xi.\tag{4.1}$$

Step 1. We start by decomposing the integration in $\xi$ into major and minor arcs. For any integer $m\ge1$ and any positive number $M>0$, we define the set of rational fractions

$$\mathcal{R}_{\le M}^m:=\big\{a/q:\ a=(a_1,\dots,a_m)\in\mathbb{Z}^m,\ q\in[1,M]\cap\mathbb{Z},\ \gcd(a_1,\dots,a_m,q)=1\big\}.\tag{4.2}$$

Notice that this definition of $\mathcal{R}_{\le M}^m$ differs slightly from the one given after (3.4). We fix a small constant $\delta=\delta(d)\ll1$ and a smooth radial function $\eta_0:\mathbb{R}^{|Y_d|}\to[0,1]$ such that $\mathbf{1}_{\{|x|\le1\}}\le\eta_0(x)\le\mathbf{1}_{\{|x|\le2\}}$, $x\in\mathbb{R}^{|Y_d|}$. For $A>0$, let $\eta_{\le A}(x):=\eta_0(A^{-1}x)$, $x\in\mathbb{R}^{|Y_d|}$; here, too, the definition of $\eta_{\le A}$ differs slightly from (3.7). Then we introduce the projections

$$\Xi_N(\xi):=\sum_{a/q\in\mathcal{R}_{\le N^\delta}^{d+d'}}\eta_{\le N^\delta}\big(N\circ(\xi-a/q)\big),\qquad\xi\in\mathbb{T}^{d+d'},\ N\in\mathbb{N},\tag{4.3}$$

and decompose the integration in (4.1) into major and minor arcs, i.e., we define

$$S_{r,N,{\rm maj}}(g):=\sum_{m,n\in[N]^r}\int_{\mathbb{T}^{d+d'}}e\big(D(n,m).\xi\big)\,e(-g.\xi)\,\Xi_N(\xi)\,d\xi,\tag{4.4}$$

$$S_{r,N,{\rm min}}(g):=\sum_{m,n\in[N]^r}\int_{\mathbb{T}^{d+d'}}e\big(D(n,m).\xi\big)\,e(-g.\xi)\,\big(1-\Xi_N(\xi)\big)\,d\xi.\tag{4.5}$$

Notice that $S_{r,N}(g)=S_{r,N,{\rm min}}(g)+S_{r,N,{\rm maj}}(g)$. Moreover,

$$|S_{r,N,{\rm min}}(g)|\lesssim_rN^{2r-1}\Big(\prod_{(l_1,l_2)\in Y_d}N^{-|l_1|-|l_2|}\Big),\qquad N\in\mathbb{N},\ g\in G_0,\tag{4.6}$$

provided that $r$ is sufficiently large, as a consequence of Proposition 2.1(i) and Dirichlet's approximation principle; in fact, we use Proposition 2.1(i) with $\phi_N^{(j)}=\psi_N^{(j)}=\mathbf{1}_{[N]}$, $1\le j\le r$, which is still valid (even though these functions are not $C^1$), as can be seen by a careful reading of the proof of this result in [37]. Therefore, the minor arc contribution $S_{r,N,{\rm min}}(g)$ can be absorbed into the error term in (1.25).

Step 2. Next, we deal with the major arc contributions. Notice that

$$S_{r,N,{\rm maj}}(g)=\sum_{a/q\in\mathcal{R}_{\le N^\delta}^{d+d'}\cap[0,1)^{d+d'}}e(-g.a/q)\int_{\mathbb{R}^{d+d'}}\eta_{\le N^\delta}(N\circ\xi)\,I_{r,N,a/q}(\xi)\,e(-g.\xi)\,d\xi,\tag{4.7}$$

where

$$I_{r,N,a/q}(\xi)=\sum_{m,n\in[N]^r}e\big(D(n,m).(a/q)\big)\,e\big(D(n,m).\xi\big).\tag{4.8}$$

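Observe that for $a/q\in\mathcal{R}_{\le N^\delta}^{d+d'}\cap[0,1)^{d+d'}$ and $|N\circ\xi|\lesssim N^\delta$, writing $n=qn'+v$, $m=qm'+w$ with $v,w\in\mathbb{Z}_q^r$ and using that $D(qn'+v,qm'+w)\equiv D(v,w)\ ({\rm mod}\ q)$ (the coordinates of $D$ are polynomials with integer coefficients), we have

$$I_{r,N,a/q}(\xi)=\sum_{n',m'\in[N/q]^r}\ \sum_{v,w\in\mathbb{Z}_q^r}e\big(D(v,w).(a/q)\big)\,e\big(D(qn',qm').\xi\big)+O\big(qN^{2r-1+\delta}\big)=N^{2r}\,\overline{G(a/q)}\,\Phi(N\circ\xi)+O\big(qN^{2r-1+\delta}\big),$$

where $G(a/q)$ is defined in (2.16) and $\Phi$ is defined in (1.27).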

Therefore, if $\delta\le(10d)^{-4}$, then we have

$$S_{r,N,{\rm maj}}(g)=N^{2r}\Big(\prod_{(l_1,l_2)\in Y_d}N^{-|l_1|-|l_2|}\Big)\times\sum_{a/q\in\mathcal{R}_{\le N^\delta}^{d+d'}\cap[0,1)^{d+d'}}\overline{G(a/q)}\,e(-g.a/q)\int_{\mathbb{R}^{d+d'}}\eta_{\le N^\delta}(\xi)\,\Phi(\xi)\,e\big(-g.(N^{-1}\circ\xi)\big)\,d\xi+O_r(N^{-1/2}).\tag{4.9}$$

It follows from Proposition 2.1(ii) and Proposition 2.2 that

$$|G(a/q)|\lesssim_rq^{-1/\delta^2},\qquad(a,q)=1,\tag{4.10}$$

and

$$|\Phi(\zeta)|\lesssim_r\langle\zeta\rangle^{-1/\delta^2},\qquad\zeta\in\mathbb{R}^{d+d'},\tag{4.11}$$

provided that $r$ is sufficiently large. Therefore, recalling the definition (1.26),

$$|\mathfrak{S}(g)|\lesssim_r1,\qquad\Big|\mathfrak{S}(g)-\sum_{a/q\in\mathcal{R}_{\le N^\delta}^{d+d'}\cap[0,1)^{d+d'}}\overline{G(a/q)}\,e(-g.a/q)\Big|\lesssim_r\sum_{q\ge N^\delta}q^{d+d'-1/\delta^2}\lesssim_rN^{-1/(2\delta)}.\tag{4.12}$$

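Moreover, we have

$$\Big|\int_{\mathbb{R}^{d+d'}}\eta_{\le N^\delta}(\xi)\,\Phi(\xi)\,e\big(-g.(N^{-1}\circ\xi)\big)\,d\xi\Big|\lesssim_r1,\qquad\Big|\int_{\mathbb{R}^{d+d'}}\eta_{\le N^\delta}(\xi)\,\Phi(\xi)\,e\big(-g.(N^{-1}\circ\xi)\big)\,d\xi-\int_{\mathbb{R}^{d+d'}}\Phi(\xi)\,e\big(-g.(N^{-1}\circ\xi)\big)\,d\xi\Big|\lesssim_rN^{-1/(2\delta)}.\tag{4.13}$$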

It follows from (4.9), (4.12), and (4.13) that

$$S_{r,N,{\rm maj}}(g)=N^{2r}\Big(\prod_{(l_1,l_2)\in Y_d}N^{-|l_1|-|l_2|}\Big)\times\mathfrak{S}(g)\int_{\mathbb{R}^{d+d'}}\Phi(\xi)\,e\big(-g.(N^{-1}\circ\xi)\big)\,d\xi+O_r(N^{-1/2}).\tag{4.14}$$

The desired conclusion (1.25) follows by combining (4.14) with (4.6). This completes the proof of part (i) of the theorem.

Step 3. We now analyze the singular series $\mathfrak{S}$ defined in (1.26). Observe that

$$\mathfrak{S}(h)=\sum_{q\ge1}A(q,h),\qquad A(q,h):=\sum_{a\in\mathbb{Z}_q^{d+d'},\,(a,q)=1}\overline{G(a/q)}\,e(-h.a/q),\tag{4.15}$$

for any $h\in G_0$. Notice that the sequence $A(q,h)$ is multiplicative in the sense that $A(q_1q_2,h)=A(q_1,h)A(q_2,h)$ provided that $(q_1,q_2)=1$ and $h\in G_0$. Therefore, letting $\mathcal{P}$ denote the set of primes,

$$\mathfrak{S}(h)=\prod_{p\in\mathcal{P}}B(p,h),\qquad B(p,h):=1+\sum_{n\ge1}A(p^n,h).\tag{4.16}$$

For $h\in G_0$ and $q\ge1$, let

$$M(q,h):=\big|\{(m,n)\in\mathbb{Z}_q^{2r}:\ D(n,m)=h\ ({\rm mod}\ q)\}\big|.\tag{4.17}$$

We prove that for any $h\in G_0$, $p\in\mathcal{P}$, and integer $n\ge1$, we have

$$1+\sum_{v=1}^nA(p^v,h)=\frac{M(p^n,h)}{p^{n(2r-d-d')}}.\tag{4.18}$$

Indeed, for any integer $q\ge1$, we have

$$\begin{aligned}M(q,h)&=q^{-d-d'}\sum_{t\in\mathbb{Z}_q^{d+d'}}\ \sum_{m,n\in\mathbb{Z}_q^r}e\big((D(n,m)-h).(t/q)\big)\\&=q^{-d-d'}\sum_{q_1\mid q}\ \sum_{\substack{w\in\mathbb{Z}_{q/q_1}^{d+d'}\\(w,q/q_1)=1}}\ \sum_{m,n\in\mathbb{Z}_q^r}e\big((D(n,m)-h).(wq_1/q)\big)\\&=q^{-d-d'}\sum_{q_2\mid q}\ \sum_{\substack{w\in\mathbb{Z}_{q_2}^{d+d'}\\(w,q_2)=1}}\ \sum_{m,n\in\mathbb{Z}_q^r}e\big((D(n,m)-h).(w/q_2)\big)\\&=q^{-d-d'}\sum_{q_2\mid q}\ \sum_{\substack{w\in\mathbb{Z}_{q_2}^{d+d'}\\(w,q_2)=1}}q^{2r}\,\overline{G(w/q_2)}\,e\big(-h.(w/q_2)\big)=q^{2r-d-d'}\sum_{q_2\mid q}A(q_2,h).\end{aligned}$$
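For tiny parameters, both the identity just derived and the multiplicativity of $A(q,h)$ used in (4.16) can be confirmed by brute force. The Python sketch below is only an illustration: it takes $d=2$ and $Y_d=\{(l_1,l_2):0\le l_2<l_1\le d\}$ (so $d+d'=|Y_d|=3$), evaluates $D$ through the closed form (2.11), and computes $G$, $A$, and $M$ directly from (2.16), (4.15), and (4.17). The derivation above is valid for every $r\ge1$, so small $r$ suffices for the check; all function names are ours.

```python
import cmath
from functools import lru_cache, reduce
from itertools import product
from math import gcd

d = 2  # illustrative degree (assumption); then d + d' = |Y_d| = 3
Y = [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]  # assumed index set Y_d


def D(n, m):
    """Closed form (2.11) of D(n, m) at integer points, as a tuple indexed by Y."""
    r = len(n)
    out = []
    for (l1, l2) in Y:
        if l2 == 0:
            out.append(sum(m[j] ** l1 - n[j] ** l1 for j in range(r)))
        else:
            cross = sum((m[i] ** l1 - n[i] ** l1) * (m[j] ** l2 - n[j] ** l2)
                        for i in range(r) for j in range(i + 1, r))
            out.append(cross + sum(n[j] ** (l1 + l2) - n[j] ** l1 * m[j] ** l2
                                   for j in range(r)))
    return tuple(out)


def e(x):
    return cmath.exp(2j * cmath.pi * x)


def dot_mod(u, v, q):
    return sum(a * b for a, b in zip(u, v)) % q


@lru_cache(maxsize=None)
def G(a, q, r):
    """Nilpotent Gauss sum G(a/q) from (2.16)."""
    pts = list(product(range(q), repeat=r))
    return sum(e(-dot_mod(D(v, w), a, q) / q) for v in pts for w in pts) / q ** (2 * r)


def A(q, h, r):
    """A(q, h) from (4.15); the sum runs over a in Z_q^{|Y_d|} with gcd(a, q) = 1."""
    return sum(G(a, q, r).conjugate() * e(-dot_mod(h, a, q) / q)
               for a in product(range(q), repeat=len(Y)) if reduce(gcd, a, q) == 1)


def M(q, h, r):
    """M(q, h) from (4.17), counted by brute force."""
    pts = list(product(range(q), repeat=r))
    return sum(all((x - y) % q == 0 for x, y in zip(D(n, m), h)) for n in pts for m in pts)


def M_via_A(q, h, r):
    """Right-hand side of the identity: q^{2r-d-d'} * sum over q2 | q of A(q2, h)."""
    divisors = [q2 for q2 in range(1, q + 1) if q % q2 == 0]
    return q ** (2 * r - len(Y)) * sum(A(q2, h, r) for q2 in divisors)


print(M(4, (1, 0, 2), 2), M_via_A(4, (1, 0, 2), 2))
```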

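The identity (4.18) follows by applying this with $q=p^n$, $p\in\mathcal{P}$ (the divisors of $p^n$ are $p^v$, $0\le v\le n$, and $A(1,h)=1$). In particular, $\mathfrak{S}(h)$ and $B(p,h)$ are real nonnegative numbers,

$$\mathfrak{S}(h),\,B(p,h)\in[0,\infty)\qquad\text{for any }h\in G_0,\ p\in\mathcal{P}.\tag{4.19}$$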

Moreover, by using formulas (4.16) and (4.18), together with the decay bound (4.10), we obtain

$$B(p,h)=\frac{M(p^n,h)}{p^{n(2r-d-d')}}+O_r(2^{-n/\delta})$$

for any $n\ge1$, $h\in G_0$, and $p\in\mathcal{P}$. We would now like to show that $\mathfrak{S}(h)\gtrsim_r1$ for all elements $h\in H_Q$, in order to be able to exploit the expansion (1.25); $Q$ will be defined below. We notice first that for any sufficiently large integer $r$, there is $p_0(r)\in\mathcal{P}$ such that

$$\frac12\le\prod_{p\in\mathcal{P},\,p\ge p_0(r)}B(p,h)\le\frac32,\tag{4.20}$$

for any $h\in G_0$, due to the rapid decay of the coefficients $G(a/q)$ in (4.10).

By Lemma 4.1 (and the comment after its statement), if $r$ is even, then there is a point $a_0=(z_0,w_0)$ with integer coordinates such that $D(a_0)=0$ and some $(d+d')\times(d+d')$ minor $J_D(a_0)$ of the Jacobian matrix of $D$ at $a_0$ is nonzero. By re-indexing the variables, we may assume that this minor is $J_D(a_0)=\det\big(\frac{\partial D_i}{\partial x_j}(a_0)\big)_{i,j=K+1}^N$, where $N=2r$ and $K=2r-d-d'$. In other words, we may assume that the minor corresponding to the last $d+d'$ columns of the Jacobian matrix of $D$ is nonsingular. For a given prime $p\le p_0(r)$, let $\gamma_p\in\mathbb{N}$ be such that $J_D(a_0)=p^{\gamma_p}u$ with $u\in\mathbb{Z}$ and $p\nmid u$. Define $Q=Q(r):=\prod_{p\in\mathcal{P},\,p\le p_0(r)}p^{2\gamma_p+1}$.

For $h\in H_Q$, we have $D(a_0)=0\equiv h\ ({\rm mod}\ p^{2\gamma_p+1})$, but $J_D(a_0)\not\equiv0\ ({\rm mod}\ p^{\gamma_p+1})$; thus, we are in a position to apply Hensel's lemma (Theorem 4.2) to $f=D-h$ with $N=2r$ and $K=2r-d-d'$. Then, by Corollary 4.3, we have $M(p^n,h)\ge p^{(n-2\gamma_p-1)K}$ for all $n>2\gamma_p$, and hence, by (4.18), $B(p,h)\ge p^{-(2\gamma_p+1)K-1}$ for every prime $p\le p_0(r)$. Combined with (4.20), this proves that

$$\mathfrak{S}(h)\gtrsim_r1\qquad\text{uniformly for }h\in H_Q.\tag{4.21}$$

Step 4. Finally, we analyze the contribution of the singular integral. Since

$$\int_{\mathbb{R}^{d+d'}}\Phi(\zeta)\,e\big(-(N^{-1}\circ g).\zeta\big)\,d\zeta=\int_{\mathbb{R}^{d+d'}}\Phi(\zeta)\,d\zeta+O_{r,g}(N^{-1}),$$

due to (4.11), to prove the approximate identity (1.29), it suffices to prove that

$$\int_{\mathbb{R}^{d+d'}}\Phi(\zeta)\,d\zeta\gtrsim_r1.\tag{4.22}$$

We fix a smooth function $\chi:\mathbb{R}^{d+d'}\to[0,1]$ satisfying $\chi(x)=1$ if $|x|\le1/2$, $\chi(x)=0$ if $|x|\ge2$, and $\int_{\mathbb{R}^{d+d'}}\chi(x)\,dx=1$. For $\varepsilon\le\varepsilon(r)$ sufficiently small, we write

$$\int_{\mathbb{R}^{d+d'}}\Phi(\zeta)\,\hat\chi(\varepsilon\zeta)\,d\zeta=\int_{[-1,1]^{2r}}\varepsilon^{-(d+d')}\chi\big(D(z,w)/\varepsilon\big)\,dz\,dw,\tag{4.23}$$

using the definition (1.27). In particular, by letting $\varepsilon\to0$, we see that $\int_{\mathbb{R}^{d+d'}}\Phi(\zeta)\,d\zeta$ is a real nonnegative number. Moreover, the lower bound (4.22) follows from (4.23) provided that we can show that there is a point $(z_0,w_0)\in[-1,1]^{2r}$ such that

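$$D(z_0,w_0)=0\qquad\text{and}\qquad{\rm rank}\big[\nabla_{z,w}D(z_0,w_0)\big]=d+d'.\tag{4.24}$$

For even $r$, this follows easily from Lemma 4.1 (after rescaling, using the homogeneity $D(\lambda z,\lambda w)=\lambda\circ D(z,w)$, to place the point in $(-1,1)^{2r}$).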

Lemma 4.1

Let $r\ge r_0(d)$. Then there exists $(n,m)\in\mathbb{Z}^{2r}$ such that

$${\rm rank}\big[\nabla_{x,y}D(n,m)\big]=d+d'.\tag{4.25}$$

Indeed, writing $D_r(x,y)=D(x,y):\mathbb{R}^{2r}\to G_0^\#$ and using $D_r(n,m)^{-1}=A_0(m_r)^{-1}\cdot A_0(n_r)\cdot\ldots\cdot A_0(m_1)^{-1}\cdot A_0(n_1)$, we have $D_r(x,y)\cdot D_r(n,m)^{-1}=D_{2r}((x,m'),(y,n'))$ with $n'=(n_r,\dots,n_1)$ and $m'=(m_r,\dots,m_1)$. Assuming (4.25), and since right multiplication by the fixed element $D_r(n,m)^{-1}$ is a diffeomorphism of $G_0^\#$, the map $((x,m'),(y,n'))\mapsto D_{2r}((x,m'),(y,n'))$ vanishes and has maximal rank at $z_0=(n,m')$, $w_0=(m,n')$, and (4.24) follows.
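The identity $D_r(x,y)\cdot D_r(n,m)^{-1}=D_{2r}((x,m'),(y,n'))$ uses only the group axioms, so any concrete model of the group can be used to test it. The Python sketch below does so for random integer inputs, assuming (as an illustration only) $d=3$ and the step-2 law $(x\cdot y)_{l_1l_2}=x_{l_1l_2}+y_{l_1l_2}+x_{l_10}\,y_{l_20}$ for $l_2\ge1$, with first-layer coordinates adding; the function names are ours.

```python
import random

d = 3  # illustrative degree (assumption)
Y = [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]  # assumed index set Y_d


def mul(x, y):
    """Assumed step-2 law (the identity checked below holds in any group)."""
    return {(l1, l2): x[(l1, l2)] + y[(l1, l2)] + (x[(l1, 0)] * y[(l2, 0)] if l2 else 0)
            for (l1, l2) in Y}


def inv(x):
    """Inverse for the assumed law."""
    return {(l1, l2): -x[(l1, 0)] if l2 == 0 else -x[(l1, l2)] + x[(l1, 0)] * x[(l2, 0)]
            for (l1, l2) in Y}


def A0(n):
    return {(l1, l2): n ** l1 if l2 == 0 else 0 for (l1, l2) in Y}


def D(n, m):
    """D_r(n, m) = A0(n_1)^{-1} A0(m_1) ... A0(n_r)^{-1} A0(m_r), for any length r."""
    g = {key: 0 for key in Y}
    for nj, mj in zip(n, m):
        g = mul(mul(g, inv(A0(nj))), A0(mj))
    return g


rng = random.Random(0)
x, y, n, m = ([rng.randint(-4, 4) for _ in range(3)] for _ in range(4))
# D_r(x, y) . D_r(n, m)^{-1} = D_{2r}((x, m'), (y, n')) with reversed n', m'.
print(mul(D(x, y), inv(D(n, m))) == D(x + m[::-1], y + n[::-1]))
```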

The proof of Lemma 4.1 is based on counting the points $(n,m)\in[N]^{2r}$ at which the rank of $\nabla_{x,y}D$ drops. This was also crucial in obtaining the nilpotent Weyl estimate (2.14).

Proof of Lemma 4.1

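Let $N$ be sufficiently large with respect to $r$ and $d$. It is enough to show that there is a constant $C_d>0$ ($C_d=2d(d+d')^2$ works here) such that

$$\big|\{n\in[N]^r:\ {\rm rank}[\nabla_xD(n,m)]<d+d'\}\big|\lesssim_{d,r}N^{(r+1)/2+C_d}\tag{4.26}$$

holds uniformly for $m\in[N]^r$; indeed, summing over $m$, the number of points $(n,m)\in[N]^{2r}$ at which ${\rm rank}[\nabla_xD]$ (and hence possibly ${\rm rank}[\nabla_{x,y}D]$) drops is then $\lesssim_{d,r}N^{(3r+1)/2+C_d}$, which is smaller than $N^{2r}$ once $r>2C_d+1$ and $N$ is large. Fix $m\in[N]^r$. If ${\rm rank}[\nabla_xD(n,m)]<d+d'$, then by Cramer's rule there exist $b_{l_1l_2}\in\mathbb{Z}$, $|b_{l_1l_2}|\lesssim N^{(2d-1)(d+d')}$, with $b_{l_1l_2}\ne0$ for at least one pair $0\le l_2<l_1\le d$, such that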

$$\sum_{0\le l_2<l_1\le d}b_{l_1l_2}\,[\partial_jD(n,m)]_{l_1l_2}=0\qquad\text{for all }1\le j\le r.\tag{4.27}$$

From (2.11), where $\partial_j$ denotes differentiation in $x_j$, we have $[\partial_jD(n,m)]_{l_10}=-l_1n_j^{l_1-1}$, and, for $l_2\ge1$,

$$[\partial_jD(n,m)]_{l_1l_2}=l_1n_j^{l_1-1}\sum_{k>j}\big(n_k^{l_2}-m_k^{l_2}\big)+l_2n_j^{l_2-1}\sum_{k<j}\big(n_k^{l_1}-m_k^{l_1}\big)-l_1n_j^{l_1-1}m_j^{l_2}+(l_1+l_2)\,n_j^{l_1+l_2-1}.\tag{4.28}$$

We want to keep only the terms with $k\le j$, and to achieve this we introduce the parameters

$$T_l=T_l(n,m)=\sum_{k=1}^r\big(n_k^l-m_k^l\big),\qquad1\le l<d.$$

Note that $T_l\in[-2rN^{d-1},2rN^{d-1}]$. For fixed $T=(T_l)_{1\le l<d}$, we write

$$\sum_{k>j}\big(n_k^{l_2}-m_k^{l_2}\big)=T_{l_2}(n,m)-\sum_{k\le j}\big(n_k^{l_2}-m_k^{l_2}\big).$$

Substituting into (4.28), we obtain, up to lower-degree terms in the variables $n=(n_1,\dots,n_r)$,

$$[\partial_jD(n,m)]_{l_1l_2}=-l_1\sum_{k\le j}n_j^{l_1-1}n_k^{l_2}+l_2\sum_{k<j}n_j^{l_2-1}n_k^{l_1}+(l_1+l_2)\,n_j^{l_1+l_2-1},\tag{4.29}$$

for $1\le l_2<d$. Thus, the system (4.27) takes the form

$$\sum_{0\le l_2<l_1\le d}b_{l_1l_2}\,P_{l_1l_2}^{j,T,m}(n_1,\dots,n_j)=0,\qquad1\le j\le r.\tag{4.30}$$

Notice that for fixed $n_1,\dots,n_{2j-2}$ with $j\le r/2$, the left-hand side of (4.30) with $j$ replaced by $2j$ contains the monomials $-b_{l_10}\,l_1n_{2j}^{l_1-1}$ and $b_{l_1l_2}\,l_2n_{2j}^{l_2-1}n_{2j-1}^{l_1}$, and hence is a nonvanishing polynomial in the variables $n_{2j-1},n_{2j}$. By [37, Lemma 5.3], this implies that the number of solutions to (4.30) in the variables $n_{2j-1},n_{2j}$ is at most $2(d+d')(N+1)$.

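Multiplying these bounds over $j\le r/2$ (with at most $N$ additional choices of $n_r$ when $r$ is odd) shows that, for fixed $b$ and $T$, the number of solutions $n\in[N]^r$ of (4.30) is $\lesssim_{d,r}N^{(r+1)/2}$. As the number of choices for the parameters $b=(b_{l_1l_2})_{0\le l_2<l_1\le d}$ and $T=(T_l)_{1\le l<d}$ is $\lesssim_{r,d}N^{C_d}$ (with, say, $C_d=2d(d+d')^2$), (4.26) follows, and hence so does (4.25).□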
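The rank condition (4.25) can be checked by hand in small cases. The Python sketch below differentiates the closed form (2.11) exactly, using forward-mode dual numbers, and computes ranks over $\mathbb{Q}$ with exact fractions. It assumes $d=2$ and $Y_d=\{(l_1,l_2):0\le l_2<l_1\le d\}$, and all names are ours. For $r=1$ the Jacobian has only $2<d+d'=3$ columns, whereas already for $r=2$ the point $(n,m)=((1,2),(0,0))$ gives full rank $3$. This illustrates the statement of Lemma 4.1, not the counting argument in its proof.

```python
from fractions import Fraction

d = 2  # illustrative degree (assumption); d + d' = |Y_d| = 3
Y = [(l1, l2) for l1 in range(1, d + 1) for l2 in range(l1)]  # assumed index set Y_d


class Dual:
    """Forward-mode dual number: a value together with its exact gradient."""

    def __init__(self, val, grad):
        self.val, self.grad = val, list(grad)

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other, [0] * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, [a + b for a, b in zip(self.grad, o.grad)])

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, [-a for a in self.grad])

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)  # product rule
        return Dual(self.val * o.val,
                    [self.val * b + o.val * a for a, b in zip(self.grad, o.grad)])

    __rmul__ = __mul__

    def __pow__(self, k):
        out = Dual(1, [0] * len(self.grad))
        for _ in range(k):
            out = out * self
        return out


def D(x, y):
    """Closed form (2.11); accepts integers or Dual numbers."""
    r = len(x)
    out = []
    for (l1, l2) in Y:
        if l2 == 0:
            out.append(sum(y[j] ** l1 - x[j] ** l1 for j in range(r)))
        else:
            cross = sum((y[i] ** l1 - x[i] ** l1) * (y[j] ** l2 - x[j] ** l2)
                        for i in range(r) for j in range(i + 1, r))
            out.append(cross + sum(x[j] ** (l1 + l2) - x[j] ** l1 * y[j] ** l2
                                   for j in range(r)))
    return out


def jacobian(n, m):
    """Rows: components of D; columns: derivatives in x_1..x_r, y_1..y_r at (n, m)."""
    r = len(n)

    def unit(i):
        return [int(i == c) for c in range(2 * r)]

    x = [Dual(n[j], unit(j)) for j in range(r)]
    y = [Dual(m[j], unit(r + j)) for j in range(r)]
    return [comp.grad for comp in D(x, y)]


def rank(rows):
    """Rank over Q via Gaussian elimination with exact fractions."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for c in range(len(mat[0]) if mat else 0):
        piv = next((i for i in range(rk, len(mat)) if mat[i][c] != 0), None)
        if piv is None:
            continue
        mat[rk], mat[piv] = mat[piv], mat[rk]
        for i in range(len(mat)):
            if i != rk and mat[i][c] != 0:
                f = mat[i][c] / mat[rk][c]
                mat[i] = [a - f * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


print(rank(jacobian([1], [0])), rank(jacobian([1, 2], [0, 0])))
```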

We remark that (4.25), together with the argument proving (4.24), also implies that the map $D_{2r}:\mathbb{R}^{4r}\to G_0^\#$ is surjective. Indeed, the image of the map $D_r$ must contain an open ball $B(g,\delta)$; thus, the image of $D_{2r}$ must contain an open ball $B(0,\delta')\subseteq B(g,\delta)\cdot B(g,\delta)^{-1}$ centered at the origin, and then, by homogeneity, the whole space $G_0^\#$.

Lemma 4.1, together with Hensel's lemma, is also crucial to show the nonvanishing of the singular series $\mathfrak{S}(h)$ for $h\in H_Q$. Recall that, given a prime $p$, the ring of $p$-adic integers $\hat{\mathbb{Z}}_p$ is defined as the completion of $\mathbb{Z}$ with respect to the $p$-adic metric $|m|_p=e^{-k}$ if $m=p^ku$ with $u\in\mathbb{Z}$, $p\nmid u$. Then $\hat{\mathbb{Z}}_p$ is a complete discrete valuation ring with the unique maximal ideal $I_p:=p\hat{\mathbb{Z}}_p$, and we write $x=0\ ({\rm mod}\ p^k)$ if $x\in p^k\hat{\mathbb{Z}}_p$. We have the ultrametric inequality $|x-y|_p\le\max\{|x-z|_p,|y-z|_p\}$, and hence a sequence $(x_j)_{j\in\mathbb{N}}$ is Cauchy if $|x_{j+1}-x_j|_p\to0$ as $j\to\infty$.

It follows that any formal power series $g(x)$ with coefficients in $\hat{\mathbb{Z}}_p$ converges at $x$ whenever $x=0\ ({\rm mod}\ p)$. For a vector $x=(x_1,\dots,x_N)\in\hat{\mathbb{Z}}_p^N$, we say that $x=0\ ({\rm mod}\ p^k)$ if $x_j=0\ ({\rm mod}\ p^k)$ for all $1\le j\le N$. Then any power series $g(x)=g(x_1,\dots,x_N)$ in $N$ variables also converges whenever $x=0\ ({\rm mod}\ p)$. Moreover, one has the inverse and implicit function theorems for power series maps $g(x)=(g_1(x),\dots,g_N(x)):\hat{\mathbb{Z}}_p^N\to\hat{\mathbb{Z}}_p^N$ without constant terms. Namely, if the Jacobian of the system at the origin satisfies $J_g(0)\notin I_p$, i.e., is a unit, then $g$ has an inverse power series map $h(x)=(h_1(x),\dots,h_N(x))$, in the sense that $h(g(x))=g(h(x))=x$; see [29, Proposition 5.19]. One also has a corresponding version of the implicit function theorem: for a map $g(x)=(g_{K+1}(x),\dots,g_N(x)):\hat{\mathbb{Z}}_p^N\to\hat{\mathbb{Z}}_p^{N-K}$ such that $\det\big(\frac{\partial g_i}{\partial x_j}(0)\big)_{i,j=K+1}^N\notin I_p$, the inverse image $V_g:=g^{-1}(0)$ can be parameterized as $V_g=\{(t_1,\dots,t_K,h_{K+1}(t),\dots,h_N(t))\}$ with $t=(t_1,\dots,t_K)$, which can be seen from the inverse function theorem by extending the map with $g_i(x)=x_i$ for $i=1,\dots,K$. The following extension of the implicit function theorem is often used to show the nonvanishing of the singular series associated with Diophantine systems.

Theorem 4.2

(Hensel's lemma) Let $f=(f_{K+1},\dots,f_N):\hat{\mathbb{Z}}_p^N\to\hat{\mathbb{Z}}_p^{N-K}$ be a family of polynomials. Assume that there exist $a\in\hat{\mathbb{Z}}_p^N$ and an integer $\gamma>0$ such that

$$f(a)=0\ ({\rm mod}\ p^{2\gamma+1}),\tag{4.31}$$

and moreover

$$J_f(a)=p^\gamma u,\qquad u\ne0\ ({\rm mod}\ p),\tag{4.32}$$

where $J_f(a)$ is the Jacobian

$$J_f(a)=\det\Big(\frac{\partial f_i}{\partial x_j}(a)\Big)_{i,j=K+1}^N.\tag{4.33}$$

Then there exist power series $h=(h_{K+1},\dots,h_N)$ such that for all $t=(t_1,\dots,t_K)=0\ ({\rm mod}\ p)$, one has

$$f\big(a+(p^{2\gamma}t,\,p^\gamma h(t))\big)=0.\tag{4.34}$$

This means that for each $j=K+1,\dots,N$, one has

$$f_j\big(a_1+p^{2\gamma}t_1,\dots,a_K+p^{2\gamma}t_K,\ a_{K+1}+p^\gamma h_{K+1}(t),\dots,a_N+p^\gamma h_N(t)\big)=0.$$

This is proved in [29, Lemma 5.21 and Note 5.22]. In fact, it is shown there that all $b\in\hat{\mathbb{Z}}_p^N$ such that $b=a\ ({\rm mod}\ p^{\gamma+1})$ and $f(b)=0$ can be parameterized this way. We will use it to obtain the following lower bound, assuming that conditions (4.31)–(4.32) hold.

Corollary 4.3

Let $n>2\gamma$. Then

$$\big|\{b\in\mathbb{Z}_{p^n}^N:\ f(b)=0\ ({\rm mod}\ p^n)\}\big|\ge p^{(n-2\gamma-1)K}.\tag{4.35}$$

Proof

Notice that if $t_1\ne t_2\ ({\rm mod}\ p^{n-2\gamma})$, then $c_1\ne c_2\ ({\rm mod}\ p^n)$, where $c_i=a+(p^{2\gamma}t_i,\,p^\gamma h(t_i))$ for $i=1,2$. There are $p^{(n-2\gamma-1)K}$ residue classes modulo $p^{n-2\gamma}$ of vectors $t\in\mathbb{Z}^K$ with $t=0\ ({\rm mod}\ p)$; thus, by (4.34), we have at least this many solutions to $f(c)=0$ in $\hat{\mathbb{Z}}_p^N$, which fall into different residue classes modulo $p^n$. For each such $c$, let $b\in\mathbb{Z}^N$ be such that $b=c\ ({\rm mod}\ p^n)$. Then clearly $f(b)=0\ ({\rm mod}\ p^n)$, and all such $b$'s are distinct modulo $p^n$.□
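Corollary 4.3 can be tested on a toy system. The Python sketch below uses the hypothetical example $p=2$, $N=2$, $K=1$, with the single equation $f_2(x_1,x_2)=x_2^2-17$ and $a=(0,1)$: then $f(a)=-16\equiv0\ ({\rm mod}\ 2^3)$ and $J_f(a)=2=2^1\cdot1$, so (4.31)–(4.32) hold with $\gamma=1$. For $n\ge3$ the exact count is $4\cdot2^n$ (four square roots of $17$ modulo $2^n$, with $b_1$ free), well above the guaranteed $2^{n-3}$. The helper name is ours.

```python
from itertools import product


def count_solutions(f, p, n, N):
    """Number of b in (Z / p^n Z)^N with f_i(b) = 0 (mod p^n) for every equation f_i."""
    mod = p ** n
    return sum(all(fi(b) % mod == 0 for fi in f) for b in product(range(mod), repeat=N))


# Hypothetical example: p = 2, N = 2, K = 1, single equation f_2(x_1, x_2) = x_2^2 - 17.
# With a = (0, 1): f(a) = -16 = 0 (mod 2^3) and J_f(a) = 2 * 1 = 2^1 * 1, so gamma = 1.
f = [lambda b: b[1] ** 2 - 17]
gamma, K = 1, 1
for n in range(3, 7):
    print(n, count_solutions(f, 2, n, 2), 2 ** ((n - 2 * gamma - 1) * K))
```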

Dedicated to David Jerison, on the occasion of his 70th birthday.

Funding information: The first, second, and third authors were supported in part by NSF grants DMS-2007008, DMS-1600840, and DMS-2154712, respectively. The third author was also partially supported by the Department of Mathematics at Rutgers University and by the National Science Centre in Poland, grant Opus 2018/31/B/ST1/00204. The fourth author was partially supported by the National Science Centre of Poland, grant Opus 2017/27/B/ST1/01623, the Juan de la Cierva Incorporación 2019, grant number IJC2019-039661-I, the Agencia Estatal de Investigación, grant PID2020-113156GB-I00/AEI/10.13039/501100011033, the Basque Government through the BERC 2022-2025 program, and by the Spanish Ministry of Sciences, Innovation and Universities: BCAM Severo Ochoa accreditation CEX2021-001142-S.
Conflict of interest: The authors state no conflict of interest.

References

[1] T. Austin, A proof of Walsh’s convergence theorem using couplings, Int. Math. Res. Not. IMRN 15 (2015), 6661–6674. doi:10.1093/imrn/rnu145

[2] T. Austin, On the norm convergence of non-conventional ergodic averages, Ergodic Theory Dynam. Systems 30 (2010), 321–338. doi:10.1017/S014338570900011X

[3] A. Bellow, Two problems submitted by A. Bellow, in: D. Kölzow and D. Maharam-Stone (Eds.), Measure Theory Oberwolfach 1981, Proceedings of the Conference held at Oberwolfach, June 21–27, 1981, Lecture Notes in Mathematics, vol. 945, Springer-Verlag, Berlin, Heidelberg, 1982, pp. 429–431.

[4] V. Bergelson, Weakly mixing PET, Ergodic Theory Dynam. Systems 7 (1987), no. 3, 337–349. doi:10.1017/S0143385700004090

[5] V. Bergelson, Ergodic Ramsey theory: an update, in: M. Pollicott and K. Schmidt (Eds.), Ergodic Theory of $\mathbb{Z}^d$-Actions, London Mathematical Society Lecture Note Series, vol. 228, 1996, pp. 1–61.

[6] V. Bergelson, Combinatorial and diophantine applications of ergodic theory (with appendices by A. Leibman and by A. Quas and M. Wierdl), in: B. Hasselblatt and A. Katok (Eds.), Handbook of Dynamical Systems, Vol. 1B, Elsevier, 2006, pp. 745–841. doi:10.1016/S1874-575X(06)80037-8

[7] V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden and Szemerédi’s theorems, J. Amer. Math. Soc. 9 (1996), 725–753. doi:10.1090/S0894-0347-96-00194-4

[8] V. Bergelson and A. Leibman, A nilpotent Roth theorem, Invent. Math. 147 (2002), 429–470. doi:10.1007/s002220100179

[9] B. J. Birch, Forms in many variables, Proc. R. Soc. Lond. A 265 (1962), 245–263. doi:10.1098/rspa.1962.0007

[10] G. Birkhoff, Proof of the ergodic theorem, Proc. Natl. Acad. Sci. USA 17 (1931), no. 12, 656–660. doi:10.1073/pnas.17.12.656

[11] J. Bourgain, On the maximal ergodic theorem for certain subsets of the integers, Israel J. Math. 61 (1988), 39–72. doi:10.1007/BF02776301

[12] J. Bourgain, On the pointwise ergodic theorem on $L^p$ for arithmetic sets, Israel J. Math. 61 (1988), 73–84. doi:10.1007/BF02776302

[13] J. Bourgain, Pointwise ergodic theorems for arithmetic sets, with an appendix by the author, H. Furstenberg, Y. Katznelson and D. S. Ornstein, Inst. Hautes Études Sci. Publ. Math. 69 (1989), 5–45. doi:10.1007/BF02698838

[14] J. Bourgain, Double recurrence and almost sure convergence, J. Reine Angew. Math. 404 (1990), 140–161. doi:10.1515/crll.1990.404.140

[15] Z. Buczolich and R. D. Mauldin, Divergent square averages, Ann. Math. 171 (2010), no. 3, 1479–1530. doi:10.4007/annals.2010.171.1479

[16] A. Calderón, Ergodic theory and translation invariant operators, Proc. Natl. Acad. Sci. USA 59 (1968), 349–353. doi:10.1073/pnas.59.2.349

[17] M. Christ, Hilbert transforms along curves: I. Nilpotent groups, Ann. Math. 122 (1985), no. 3, 575–596. doi:10.2307/1971330

[18] M. Christ, A. Nagel, E. M. Stein, and S. Wainger, Singular and maximal Radon transforms: analysis and geometry, Ann. Math. 150 (1999), no. 2, 489–577. doi:10.2307/121088

[19] Q. Chu, N. Frantzikinakis, and B. Host, Ergodic averages of commuting transformations with distinct degree polynomial iterates, Proc. London Math. Soc. 102 (2011), no. 5, 801–842. doi:10.1112/plms/pdq037

[20] L. J. Corwin and F. P. Greenleaf, Representations of nilpotent Lie groups and their applications. Part I. Basic theory and examples, Cambridge Studies in Advanced Mathematics, vol. 18, Cambridge University Press, Cambridge, 1990.

[21] H. Davenport, Cubic Forms in Thirty-two Variables, Phil. Trans. R. Soc. Lond. A 251 (1959), 193–232. doi:10.1098/rsta.1959.0002

[22] H. Davenport, Analytic Methods for Diophantine Equations and Diophantine Inequalities, Cambridge University Press, Cambridge, 1959.

[23] N. Frantzikinakis, Some open problems on multiple ergodic averages, Bull. Hellenic Math. Soc. 60 (2016), 41–90.

[24] N. Frantzikinakis and B. Kra, Polynomial averages converge to the product of integrals, Israel J. Math. 148 (2005), 267–276. doi:10.1007/BF02775439

[25] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. Anal. Math. 31 (1977), 204–256. doi:10.1007/BF02813304

[26] H. Furstenberg, Problems Session, Conference on Ergodic Theory and Applications, University of New Hampshire, Durham, NH, June 1982.

[27] H. Furstenberg, Nonconventional ergodic averages, The legacy of John von Neumann (Hempstead, NY, 1988), Proceedings of Symposia in Pure Mathematics, vol. 50, Amer. Math. Soc., Providence, RI, 1990, pp. 43–56. doi:10.1090/pspum/050/1067751

[28] H. Furstenberg and B. Weiss, A mean ergodic theorem for $\frac{1}{N}\sum_{n=1}^{N} f(T^n x)\, g(T^{n^2} x)$, Convergence in Ergodic Theory and Probability (Columbus, OH, 1993), Ohio State Univ. Math. Res. Inst. Publ., vol. 5, De Gruyter, Berlin, 1996, pp. 193–227. doi:10.1515/9783110889383.193

[29] M. J. Greenberg, Lectures on Forms in Many Variables, W. A. Benjamin, Inc., New York, 1969.

[30] D. Hilbert, Beweis für die Darstellbarkeit der ganzen Zahlen durch eine feste Anzahl n-ter Potenzen (Waringsches Problem), Math. Ann. 67 (1909), 281–300. doi:10.1007/BF01450405

[31] B. Host and B. Kra, Non-conventional ergodic averages and nilmanifolds, Ann. Math. 161 (2005), 397–488. doi:10.4007/annals.2005.161.397

[32] B. Host and B. Kra, Convergence of polynomial ergodic averages, Israel J. Math. 149 (2005), 1–19. doi:10.1007/BF02772534

[33] Y.-Q. Hu, Polynomial maps and polynomial sequences in groups, Available at arXiv:2105.08000.

[34] Y.-Q. Hu, Waring’s problem for locally nilpotent groups: the case of discrete Heisenberg groups, Available at arXiv:2011.06683.

[35] A. Ionescu, A. Magyar, M. Mirek, and T. Z. Szarek, Polynomial averages and pointwise ergodic theorems on nilpotent groups, Invent. Math. 231 (2023), 1023–1140. doi:10.1007/s00222-022-01159-0

[36] A. Ionescu, A. Magyar, E. M. Stein, and S. Wainger, Discrete Radon transforms and applications to ergodic theory, Acta Math. 198 (2007), 231–298. doi:10.1007/s11511-007-0016-x

[37] A. Ionescu, A. Magyar, and S. Wainger, Averages along polynomial sequences in discrete nilpotent Lie groups: Singular Radon transforms, in: Advances in Analysis: the Legacy of Elias M. Stein, Princeton Mathematical Series, vol. 50, Princeton University Press, Princeton, NJ, 2014, pp. 146–188. doi:10.1515/9781400848935-008

[38] A. D. Ionescu and S. Wainger, $L^p$ boundedness of discrete singular Radon transforms, J. Amer. Math. Soc. 19 (2005), no. 2, 357–383. doi:10.1090/S0894-0347-05-00508-4

[39] B. Krause, Discrete analogues in harmonic analysis: maximally monomially modulated singular integrals related to Carleson’s theorem, Available at arXiv:1803.09431.

[40] B. Krause, M. Mirek, and T. Tao, Pointwise ergodic theorems for non-conventional bilinear polynomial averages, Ann. Math. 195 (2022), no. 3, 997–1109. doi:10.4007/annals.2022.195.3.4

[41] P. LaVictoire, Universally $L^1$-bad arithmetic sequences, J. Anal. Math. 113 (2011), no. 1, 241–263. doi:10.1007/s11854-011-0006-y

[42] A. Leibman, Convergence of multiple ergodic averages along polynomials of several variables, Israel J. Math. 146 (2005), 303–315. doi:10.1007/BF02773538

[43] D. Lépingle, La variation d’ordre p des semi-martingales, Z. Wahrscheinlichkeitstheorie Verw. Gebiete 36 (1976), 295–316. doi:10.1007/BF00532696

[44] A. Magyar, E. M. Stein, and S. Wainger, Discrete analogues in harmonic analysis: spherical averages, Ann. Math. 155 (2002), 189–208. doi:10.2307/3062154

[45] A. Magyar, E. M. Stein, and S. Wainger, Maximal operators associated to discrete subgroups of nilpotent Lie groups, J. Anal. Math. 101 (2007), 257–312. doi:10.1007/s11854-007-0010-4

[46] M. Mirek, E. M. Stein, and B. Trojan, $\ell^p(\mathbb{Z}^d)$-estimates for discrete operators of Radon type: Variational estimates, Invent. Math. 209 (2017), no. 3, 665–748. doi:10.1007/s00222-017-0718-4

[47] M. Mirek, E. M. Stein, and P. Zorin-Kranich, A bootstrapping approach to jump inequalities and their applications, Anal. PDE 13 (2020), no. 2, 527–558. doi:10.2140/apde.2020.13.527

[48] M. Mirek, E. M. Stein, and P. Zorin-Kranich, Jump inequalities for translation-invariant operators of Radon type on $\mathbb{Z}^d$, Adv. Math. 365 (2020), Article ID 107065, 57 pp. doi:10.1016/j.aim.2020.107065

[49] L. Pierce, Discrete fractional Radon transforms and quadratic forms, Duke Math. J. 161 (2012), 69–106. doi:10.1215/00127094-1507288

[50] L. Pierce and P.-L. Yung, A polynomial Carleson operator along the paraboloid, Rev. Mat. Iberoam. 35 (2019), 339–422. doi:10.4171/rmi/1057

[51] F. Ricci and E. M. Stein, Harmonic analysis on nilpotent groups and singular integrals I. Oscillatory integrals, J. Funct. Anal. 73 (1987), 179–194. doi:10.1016/0022-1236(87)90064-4

[52] E. M. Stein and S. Wainger, Discrete analogues in harmonic analysis, I: $\ell^2$ estimates for singular Radon transforms, Amer. J. Math. 121 (1999), 1291–1336. doi:10.1353/ajm.1999.0046

[53] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 199–245. doi:10.4064/aa-27-1-199-245

[54] T. Tao, Norm convergence of multiple ergodic averages for commuting transformations, Ergodic Theory Dynam. Systems 28 (2008), 657–688. doi:10.1017/S0143385708000011

[55] I. M. Vinogradov, The method of trigonometrical sums in the theory of numbers (Russian), Trav. Inst. Math. Stekloff 23 (1947).

[56] J. von Neumann, Proof of the quasi-ergodic hypothesis, Proc. Natl. Acad. Sci. USA 18 (1932), 70–82. doi:10.1073/pnas.18.1.70

[57] M. Walsh, Norm convergence of nilpotent ergodic averages, Ann. Math. 175 (2012), no. 3, 1667–1688. doi:10.4007/annals.2012.175.3.15

[58] T. D. Wooley, Nested efficient congruencing and relatives of Vinogradov’s mean value theorem, Proc. London Math. Soc. 118 (2019), no. 4, 942–1016. doi:10.1112/plms.12204

[59] T. Ziegler, Universal characteristic factors and Furstenberg averages, J. Amer. Math. Soc. 20 (2007), 53–97. doi:10.1090/S0894-0347-06-00532-7

Received: 2022-12-05

Revised: 2023-07-01

Accepted: 2023-07-02

Published Online: 2023-08-07

This work is licensed under the Creative Commons Attribution 4.0 International License.
