Analog nanophotonic computing going practical: silicon photonic deep learning engines for tiled optical matrix multiplication with dynamic precision
George Giamougiannis, Apostolos Tsakyridis
Abstract
Analog photonic computing is a promising candidate for accelerating the linear operations of deep neural networks (DNNs), since it provides ultrahigh-bandwidth, low-footprint and low-power-consumption computing capabilities. However, the confined photonic hardware size, along with the limited bit precision of high-speed electro-optical components, imposes stringent requirements towards surpassing the performance levels of current digital processors. Herein, we propose and experimentally demonstrate a speed-optimized dynamic-precision neural network (NN) inference via tiled matrix multiplication (TMM) on a low-radix silicon photonic processor. We introduce a theoretical model that relates the noise figure of a photonic neuron to the bit precision requirements per neural layer. The inference evaluation of an NN trained for the classification of the IRIS dataset is then experimentally performed over a silicon coherent photonic neuron that can support optical TMM at up to 50 GHz, while simultaneously allowing for dynamic-precision calculations. Targeting a high-accuracy and speed-optimized classification performance, we experimentally applied the model-extracted mixed-precision NN inference scheme by adjusting the operational compute rates per neural layer. This dynamic-precision NN inference revealed a 55% decrease in the execution time of the linear operations compared to a fixed-precision scheme, without degrading its accuracy.
1 Introduction
The exponential increase of compute-demanding applications, along with their need for time-of-flight latencies and near-zero energy consumption, has rekindled the analog computing paradigm [1–4] as a way to overcome the digital energy wall. Analog computing pares down the data movement requirements by exploiting the memory cells both as storage and computation elements. Additionally, analog computing engines comprise highly promising approaches for AI processing, since they operate at much lower power and higher speed than their digital counterparts [5]. The latter becomes even more pronounced when the computing hardware exploits the prodigious primitives of light, i.e., ultrahigh bandwidth, low footprint and high energy efficiency, with optical neural networks (ONNs) being at the forefront of research and industrial activities within the last decade [5–7] and promising to accelerate matrix multiplication operations, which typically form the most time- and energy-consuming tasks within inference applications of deep neural networks (DNNs) [8].
Yet, ONNs, and analog computing engines in general, come with the price of (i) a confined size of hardware-implementable computational models [9–18] and (ii) limited bit precision [5, 19], [20], [21]. More specifically, as the complexity of neural network (NN) models proliferates, so does their size and, in turn, their total number of required multiply-accumulate (MAC) operations. However, the spatial distribution of the NN parameter-encoding devices cannot expand relentlessly, imposing a hardware limitation on the number of encodable parameters the ONN can host. To this end, ONNs need to follow the lead traced by today’s TPU and GPU computational models [22, 23], where a limited amount of hardware resources can execute DNNs with significantly higher dimensions. In particular, based on the hardware characteristics, i.e., size and parameter-updating speed, and the application requirements, i.e., sample-/batch-wise inference, these computational models split the matrices into smaller tiles and unroll the complete matrix multiplication operations in the time domain. Tiled matrix multiplication (TMM), performed by means of time division multiplexing (TDM), entails the high-speed update of the matrix-element-encoding devices, calling for ONNs with high-bandwidth constituent building blocks. Towards this direction, ONNs have to strike a balance between operational speed and scaling, with the majority of integrated photonic solutions focusing mainly on the latter. On top of the above, the digital-to-ONN computing transition includes the employment of digital-to-analog (DAC) and analog-to-digital (ADC) converters along with the parameter-encoding, amplification and processing devices, i.e., modulators, photodiodes (PDs), amplifiers etc., that inevitably degrade the analog accuracy during inference, since each constituent introduces a relevant noise source that impacts the electro-optic link’s bit resolution properties.
The limited bit precision effect can be mitigated during the training process or, alternatively, via post-training inference techniques. The former can be accomplished either by incorporating the hardware impairments, e.g., noise figures, bit quantization limitations etc., into the training model [24–27], or via the employment of rigid rules in the training phase, e.g., low-precision training, binarization of the NN parameters etc. [28, 29]. Yet, even though these techniques lead to accuracy improvements, they impose additional complexity and energy trade-offs, since the NN needs to be retrained in order to be tailored to the employed hardware constraints. On the other hand, in pre-trained networks, analog optical processors can step in effectively when operations can be executed at low bit precision [5, 30]. However, the bit resolution requirements of NNs are typically more rigorous. To this end, post-training techniques, e.g., inference averaging, dynamic precision inference etc. [19, 31], [32], [33], need to be employed in order to compensate for the “noisy” analog computations.
In this paper, we demonstrate a speed-optimized dynamic-precision NN inference via TMM on a silicon-integrated neuromorphic processor. The 2-input SiPho neuron supports high-rate updating of the NN parameter (input, weight) encoding, allowing for the effective application of TDM. Towards the speed- and accuracy-optimized inference of a hardware-aware-trained NN for the classification of the IRIS dataset, we distinguished and modeled the noise figures of the ONN link and the bit precision requirements of each neural layer. After the model-aware correlation of the required bit precision per layer with the ONN axon bandwidth, we experimentally performed the dynamic-precision NN inference, revealing a 55% decrease in the execution time of the linear operations compared to a fixed-precision scheme, without significantly (<1%) degrading its accuracy. Additionally, we validated and quantified the impact of the dynamic-precision post-training inference on the NN accuracy, via the operation of the photonic hardware at different compute rates in the two neural layers. Specifically, we performed the inference of the constituent neural layers via TMM, recording the accuracy of the NN when the linear operations of its two layers were performed at 2, 16, and 50 Gbaud. As a consequence of the high bit precision tolerance of the 1st neural layer, the software accuracy of 96.6% was obtained during its experimental inference at compute rates up to 50 Gbaud. On the other hand, an accuracy degradation was observed in the noise-sensitive output layer, with accuracy values of 93.1%, 86.4%, and 68.6% being calculated when it was executed at 2, 16, and 50 Gbaud, respectively, validating the significance of dynamic precision within NN inference.
2 Photonic-aware techniques towards high-speed and high-accuracy neural network inference
The exploitation of light primitives for the high-speed and high-accuracy execution of the space- and time-demanding matrix multiplication operations is typically accompanied by multiple requirements on the development of the ONN hardware. In particular, these requirements are predominantly defined by the limitations imposed by the analog nature of data movement and processing within an ONN, and by the finite number of parameters that a practical silicon photonic chip can host, along with their update rate. In this regime, the speed- and accuracy-optimization of the NN inference has to proceed along with hardware-aware methodologies. In this section, we delve into the inference of NNs whose dimensions exceed the ONN dimensions and present the time-division unrolling of their execution via the employment of the TMM technique. Additionally, we study and model the noise sources of an ONN link, correlating the operational-rate-dependent total noise figure with the hardware’s analog precision. Thereafter, we identify the bit precision requirements among the neural layers and propose a dynamic rate regulation method towards the speed-optimization of the NN inference.
2.1 Optical tiled matrix multiplication
The processing speed and accuracy of the NN matrix multiplication linear operations are heavily dependent on the structure, the size and the principles of the employed ONN architecture. Figure 1(a) illustrates a coherent photonic crossbar architecture as proposed in [34]. An n-element-long NN input vector can be encoded via the modulating devices included in the light blue rectangle, which follows a 1 × n splitting stage. A crossbar mesh, highlighted within the red rectangle, performs the encoding of the n × m weight matrix (W) elements, via modulators for the amplitude and phase shifters (PSs) for the sign imprinting. Hence, the linear operations between the input vector X and the weight matrix W produce an m-element-long vector Y, shown in the grey rectangle [34, 35]. The architecture of the photonic crossbar of Figure 1(a) offers: (i) direct element mapping, which leads to easy programmability and optimal representation fidelity between the experimental and the targeted values, as opposed to complex unitary-based architectures [13, 14, 16, 36] where the fidelity is degraded due to their differential path/node losses, and (ii) high insertion loss savings, since each light beam travels only through two modulating and one phase-shifting active device, allowing this way for high dimension-scaling and the employment of technologies that can provide high-speed element imprinting, combined with low energy consumption and/or low footprint attributes. However, irrespective of the deployed technologies, the dimension scaling of the ONN architecture is bounded by the power budget to practical numbers (n, m) that cannot follow typical NN dimensions (N > n, M > m) [37, 38]. As such, wavelength and time division multiplexing techniques have to be enforced, either for maximizing the amount of parallel operations or for time-unfolding the operations, respectively.
The former has been widely used in the domain of integrated neuromorphic photonics [9, 15, 17, 39] but still has limitations in the number of employable wavelengths and, as such, in the amount of parallelization it can provide. Therefore, TDM comprises the imperative solution for executing the linear operations of an NN via ONN hardware with limited dimensions. Figure 1(b)–(e) illustrates the TMM steps required by an ONN of dimensions (n, m) in order to calculate the linear operations of an NN of size (N, M). More specifically, the n × m elements of the weight matrix W and the n elements of the vector X, highlighted with red and light blue color in Figure 1(b), respectively, are imprinted in the ONN’s respective devices at time slot #1. Figure 1(c) and (d) describe the following TMM stages, up to the final step illustrated in Figure 1(e). Evidently, in order for the ONN to perform the TMM operations, the update rate of the modulating devices for the X and W elements should be synchronized and take place at the highest possible speed, in order to provide low-latency calculations and minimize execution time. Therefore, this calls for the development of an ONN architecture that can simultaneously support high-bandwidth active constituents and high scaling credentials [34, 40], since high ONN dimensions minimize the number of tiles of the targeted matrices.
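As a software analogue of the procedure of Figure 1(b)–(e), the time-division unrolling can be sketched with a tile loop, where the (n, m) tile size plays the role of the photonic crossbar radix and each loop iteration corresponds to one time slot; this is a minimal NumPy sketch of the principle, not the authors' control code:

```python
import numpy as np

def tiled_matvec(x, W, n, m):
    """Tiled vector-matrix product y = x @ W on an (n, m) 'hardware' tile.

    Each (i, j) iteration emulates one TDM time slot: an n-element slice
    of the input vector x and an n x m block of the weight matrix W are
    'imprinted' on the crossbar, and the partial products are accumulated
    into the corresponding slice of the output vector y.
    """
    N, M = W.shape
    y = np.zeros(M)
    for i in range(0, N, n):        # slices of x / row blocks of W
        for j in range(0, M, m):    # column blocks of W
            y[j:j + m] += x[i:i + n] @ W[i:i + n, j:j + m]
    return y
```

The number of time slots is ceil(N/n)·ceil(M/m), which is why large ONN dimensions (n, m) minimize the number of tiles of the targeted matrices.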

Linear operator architecture and TMM: (a) n × m crossbar architecture for vector (1 × n) matrix (n × m) multiplication. (b)–(e) Tiled vector (1 × N) matrix (N × M) multiplication via a 1 × n vector and n × m matrix encoding hardware.
2.2 Noise-aware neural network inference speed-optimization
Our recent demonstration of the loss-optimized photonic crossbar architecture (Figure 1(a)) [34, 41], which is capable of retaining high fidelity values even for high insertion node losses, has highlighted the feasibility of deploying high-bandwidth photonic components with up to 50 GMAC/s/axon rates in high-radix photonic neuromorphic layouts. In this context, we proceed with the development of an analytical framework capable of correlating the available opto-electrical bandwidth of the underlying photonic components with the achieved bit-resolution-equivalent performance of the ONN, towards: (i) identifying the major physical mechanisms that define the relationship between the achievable rate and the ONN bit precision, (ii) revealing the latency–accuracy trade-offs of high-speed ONNs and (iii) concluding with a generic model of mixed-precision NN inference, following the paradigm of electronic NN accelerators [29–32].
We begin our analysis by evaluating the fundamental relationships between the available bandwidth and the achieved bit resolution of an ONN link. Figure 2(a) illustrates a detailed breakdown of the dominant noise sources of a multi-axon neuron link that impact the algebraic product of X and W: the n_RIN that corresponds to the aggregated noise contributions of the laser source, the n_MM that is related to the matrix-multiply electro-photonic link, the shot noise n_shot that corresponds to the random fluctuation of the PD’s current owing to the discrete charge of electrons traversing the PIN potential barrier, the n_dark that corresponds to the noise term associated with the finite dark current of a photodetector, the n_ADC that is correlated with the quantization noise imposed by the limited resolution of the employed ADC components and, finally, the n_T that is defined as the dominant thermal noise source of the electro-optic layout. Based on the central limit theorem [42], we consider that the dot product calculated via the ONN matrix-multiply electro-photonic link follows a normal distribution, introducing a noise term with a standard deviation σ_MM. Additionally, assuming that the shot noise values float above nW levels, the quantization noise is uniformly distributed and the thermal noise is dominated by the input-referred noise of the trans-impedance amplifier (TIA), the aforementioned contributions can be modeled as zero-mean additive Gaussian noise sources and their standard deviations, referenced to a photocurrent I_avg and a noise bandwidth B, can be calculated through:

$$\sigma_{\mathrm{RIN}} = I_{\mathrm{avg}}\sqrt{\mathrm{RIN}\cdot B}, \quad \sigma_{\mathrm{shot}} = \sqrt{2\,q\,I_{\mathrm{avg}}\,B}, \quad \sigma_{\mathrm{ADC}} = \frac{\Delta}{2^{E_{\mathrm{bits}}}\sqrt{12}}, \quad \sigma_{T} = i_{\mathrm{ref}}\sqrt{B} \tag{1}$$

with $I_{\mathrm{avg}} = n_{\mathrm{PD}}\,P_{\mathrm{Avg}}$,
where RIN corresponds to the relative intensity noise density, q to the electron charge, n_PD to the conversion efficiency (responsivity) of the PD, Δ to the quantization interval equal to P_max − P_min, E_bits to the effective number of bits (ENOB) of the employed converter and i_ref to the TIA input-referred noise current density. Additionally, we correlate the P_max − P_min values with the modulators’ extinction ratio (ER) through the equation $P_{\mathrm{Avg}} = (P_{\mathrm{max}} - P_{\mathrm{min}})\cdot(ER + 1)/\left[2\cdot(ER - 1)\right]$. Finally, considering the square-law detection at the PD and assuming the dark noise’s contribution to be negligible compared to the photodetector’s shot noise, we approximate the standard deviation of the total noise of an ONN link through:

$$\sigma_{\mathrm{tot}} = \sqrt{\sigma_{\mathrm{RIN}}^{2} + \sigma_{\mathrm{MM}}^{2} + \sigma_{\mathrm{shot}}^{2} + \sigma_{\mathrm{ADC}}^{2} + \sigma_{T}^{2}} \tag{2}$$
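As a numerical illustration of the noise terms above, the per-source standard deviations can be evaluated as follows; every default value (photocurrent, RIN density, ENOB, quantization range, TIA noise density, σ_MM) is an illustrative assumption, not a value measured in this work:

```python
import math

Q_E = 1.602176634e-19  # electron charge [C]

def onn_noise_sigmas(i_avg, bandwidth, rin_db=-145.0, enob=5.0,
                     delta=2e-3, i_ref=20e-12, sigma_mm=0.0):
    """Gaussian noise standard deviations of an ONN link, referenced to a
    photocurrent i_avg [A] and a noise bandwidth `bandwidth` [Hz].

    rin_db:   relative intensity noise density [dB/Hz]   (assumed)
    enob:     effective number of bits of the converter  (assumed)
    delta:    quantization interval, i.e. full-scale photocurrent swing [A]
    i_ref:    TIA input-referred noise current density [A/sqrt(Hz)]
    sigma_mm: matrix-multiply link noise (normally distributed dot product)
    """
    s_rin = i_avg * math.sqrt(10 ** (rin_db / 10) * bandwidth)
    s_shot = math.sqrt(2 * Q_E * i_avg * bandwidth)
    s_adc = delta / (2 ** enob * math.sqrt(12))  # uniform quantization noise
    s_tia = i_ref * math.sqrt(bandwidth)         # TIA input-referred noise
    s_tot = math.sqrt(s_rin**2 + s_shot**2 + s_adc**2
                      + s_tia**2 + sigma_mm**2)
    return {"rin": s_rin, "shot": s_shot, "adc": s_adc,
            "tia": s_tia, "total": s_tot}
```

Note how every bandwidth-dependent term grows as √B; this is the mechanism that trades compute rate against bit resolution in the following analysis.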

PNN noise analysis: (a) Electro-optic ONN link noise sources breakdown. (b) Noise equivalent quantization bits versus the bandwidth per ONN’s axon for different values of ONN power budget and matrix multiply noise standard deviation values.
Following Eq. (2), we calculate the noise equivalent quantization bits (NEQB) of the overall ONN link as:

$$\mathrm{NEQB} = \log_{2}\!\left(\frac{\Delta}{\sigma_{\mathrm{tot}}\sqrt{12}}\right)$$
adopting the convention that Gaussian noise sources can be correlated to limited bit precision of NNs [19]. In order to quantify the compute rate’s impact on the ONN bit resolution capabilities based on the developed framework, we considered typical values for state-of-the-art high-bandwidth electro-photonic components that include:
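The NEQB-versus-bandwidth trend of Figure 2(b) can be reproduced qualitatively with a short sweep; the parameter set below (photocurrent, RIN density, TIA noise, quantization range) is a stand-in for the component values, so only the shape of the curve, not its absolute level, should be compared:

```python
import math

def sigma_total(bandwidth, i_avg=1e-3, rin_db=-145.0, i_ref=20e-12):
    """Total link noise (RIN + shot + TIA thermal). All variances are
    linear in the noise bandwidth, so sigma grows as sqrt(bandwidth)."""
    q = 1.602176634e-19
    s2 = (i_avg**2 * 10 ** (rin_db / 10) * bandwidth  # RIN
          + 2 * q * i_avg * bandwidth                 # shot
          + i_ref**2 * bandwidth)                     # TIA thermal
    return math.sqrt(s2)

def neqb(sigma_tot, delta=2e-3):
    """Noise equivalent quantization bits: the ENOB of a uniform
    quantizer whose quantization noise delta/(2**b * sqrt(12)) matches
    the link's total Gaussian noise sigma_tot."""
    return math.log2(delta / (sigma_tot * math.sqrt(12)))

# NEQB at the three axon bandwidths used in the experiment (in GHz)
curve = {b_ghz: neqb(sigma_total(b_ghz * 1e9)) for b_ghz in (2, 16, 50)}
```

The sweep reproduces the qualitative behavior of Figure 2(b): the achievable NEQB decreases monotonically as the axon bandwidth grows.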
Towards effectively exploiting the NEQB–ONN bandwidth relation for the latency-optimization of the linear operations of the NN inference, we correlate the individual neural layers’ requirements in bit precision with the overall NN accuracy. More specifically, we examine how accurately the NN performs the inference when each neural layer’s linear operations are individually performed with predefined quantization bits within the [1, 8] range. Thereafter, having identified the network’s “demands”, we extract the minimum bit precision values that can be tolerated by the NN without significantly degrading the final accuracy, and select the compute rate of the linear operations of each layer that, based on our previous analysis, can provide this NEQB. This dynamic-rate NN inference leads to significant execution time savings, which can eventually turn into respective energy gains. Towards showcasing the proposed method, we evaluated the bit precision requirements of the individual layers of three pre-constructed popular convolutional NNs, the Lenet5 [43], the Alexnet8 [44] and the Resnet9 [45], which comprise 5, 8, and 9 layers, respectively. The dark grey bars of Figure 3(a)–(c) illustrate the minimum bit precision requirements of each neural layer of the examined NNs, under the condition that the maximum NN accuracy degradation does not exceed 1% with respect to the maximum achievable value defined by the training process. When the NN accuracy degrades by more than 1%, we consider the minimum bits required to equal 8. It can be observed that each layer performs differently under the bit precision relaxation, with the first and last being the least tolerant layers in all three networks, since information loss in one layer cannot be later recovered in the subsequent ones, according to the Data Processing Inequality [27, 46].
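The per-layer bit search behind the grey bars of Figure 3(a)–(c) can be expressed as a simple loop; `evaluate` is a hypothetical user-supplied callback (not part of the paper's tooling) that runs inference with only one layer quantized and returns the resulting accuracy:

```python
def min_bits_per_layer(evaluate, n_layers, baseline_acc,
                       tol=0.01, max_bits=8):
    """For each layer, find the fewest quantization bits whose accuracy
    penalty relative to baseline_acc stays within tol (1% in the text);
    layers that never meet the condition are assigned max_bits."""
    required = []
    for layer in range(n_layers):
        for bits in range(1, max_bits + 1):
            if baseline_acc - evaluate(layer, bits) <= tol:
                required.append(bits)
                break
        else:
            required.append(max_bits)  # >1% degradation even at 8 bits
    return required
```

Each entry then maps, through the NEQB–bandwidth model, to the fastest compute rate the corresponding layer can tolerate.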
In order to quantify the achievable savings in execution time, we also extract the number of MAC operations that need to be performed per neural layer, shown in the red bars of Figure 3(a)–(c). Consequently, we calculate and compare the execution time of the NN inference linear operations when the compute rate is fixed to the rate that does not lead to NN accuracy degradation by more than 1%, against the dynamic-precision-aware NN inference. Figure 3(d)–(f) illustrates the computing times of the neural layers of each of the examined NNs, when the inference follows the fixed- (upper stacked bars) and the dynamic- (lower stacked bars) precision inference methods. The NEQB selection in the latter case was realized based on the metrics considered for the calculation of the black dashed line of Figure 2(b), which might approximate the characteristics of high-scale ONN architectures. Indicatively, the bit precision requirements analysis for the Alexnet8 NN revealed that the execution of the 7.96 MMAC operations of the 4th layer requires at least 3.1 bits of precision, which, based on the developed model, correspond to a minimum compute rate of ∼23.8 GMAC/s, resulting in a total execution time of ∼0.33 ms. Following the proposed mixed-compute-rate NN inference, the analysis revealed a 61%, 76% and 85% decrease in the aggregate execution times for the Lenet5, the Alexnet8 and the Resnet9 NNs, respectively. These latency-reduction rates may lead to significant energy savings or, eventually, compensate for the latency introduced via TMM techniques, where the ONN’s achievable MAC operations per time slot are inferior to the ones required by the NN. Finally, it is worth noting that, as the number of neural layers increases, the importance of the dynamic-rate NN inference will presumably become more pronounced.
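The execution-time bookkeeping above is a direct MACs-over-rate calculation; the helpers below reproduce the Alexnet8 4th-layer example from the text (7.96 MMAC at ∼23.8 GMAC/s), while the aggregate savings figures depend on the full per-layer MAC counts and rates available only in Figure 3:

```python
def exec_time_ms(macs, rate_gmacs):
    """Execution time in ms for `macs` MAC operations at rate_gmacs GMAC/s."""
    return macs / (rate_gmacs * 1e9) * 1e3

def dynamic_savings(layer_macs, fixed_rate, dyn_rates):
    """Fractional execution-time reduction of per-layer (dynamic) compute
    rates over a single fixed rate applied to every layer."""
    t_fixed = sum(m / fixed_rate for m in layer_macs)
    t_dyn = sum(m / r for m, r in zip(layer_macs, dyn_rates))
    return 1.0 - t_dyn / t_fixed

# Alexnet8, 4th layer: 7.96 MMAC at ~23.8 GMAC/s -> ~0.33 ms (as in the text)
t4 = exec_time_ms(7.96e6, 23.8)
```

With hypothetical per-layer MAC counts and rates, `dynamic_savings` yields the kind of aggregate reduction (61–85% for the three networks) reported above.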

Bit resolution required for <1% NN accuracy degradation (grey bars) and number of MMACs (red bars) per layer of (a) the Lenet5, (b) the Alexnet8 and (c) the Resnet9 NNs. Linear operations execution time in ms for fixed and dynamic bit precision inference at (d) the Lenet5, (e) the Alexnet8, and (f) the Resnet9 NNs.
3 Dynamic-precision NN inference: experimental setup
In order to experimentally evaluate the TMM and the dynamic precision in an NN inference, we established the experimental setup shown in Figure 4(a). A light beam at 1560 nm was injected, via a grating coupler with an IL of 3 dB, into the SiPho chip depicted in Figure 4(b), where an electro-absorption modulator (EAM)-based 2:1 single-column crossbar processor was designed and fabricated. The optical signal was then split into the two identical branches of an MZI, via a 3 dB Y-junction multimode interference (MMI) coupler, where, in each branch, two cascaded EAMs, with an IL of 4.4 dB each, were utilized for transferring the NN input values to the optical domain, while the thermo-optic (TO) PSs were used to statically bias the MZI at the desired operating point. The digital NN inputs were converted to the analog domain via DACs, using four channels of an arbitrary waveform generator (AWG – Keysight M8194a) and, after amplification, fed to the EAMs in order to be transferred to the optical domain. Specifically, the EAMs annotated as Xa, Xb in Figure 4(a) were utilized for modulating the input data of the deployed NN, while the NN weight imprinting was achieved by the EAMs Wa, Wb. Finally, a 3 dB Y-junction MMI was employed for the coherent addition of the two sequences, and the weighted sum was then injected into a PD before being captured by a real-time oscilloscope (RTO – Keysight DSOZ632a). A digital signal processing stack was utilized both at the transmission and the reception site, including quantization, filtering, resampling, and time recovery. The overall IL of the photonic processor was calculated at ∼15 dB, including the 6 dB losses of the grating-coupler-based I/O interfaces.

Experimental testbed: (a) Experimental setup established for the inference of the NN for the IRIS dataset classification. (b) SiPho processor employed for the NN inference.
In order to benchmark our SiPho processor in a real DNN application and evaluate the impact of the proposed TMM and dynamic-precision inference schemes, we designed and trained an NN for the classification of the IRIS dataset, whose topology is illustrated in Figure 5(a), comprising a 4:10:3 fully-connected network. Although our architecture has already validated its credentials to support both positive and negative values of the NN input and weighting elements, using the deployed PSs to provide the sign information [35], in this work we enforced all NN parameters to be positive. This mainly stems from the use of TO PSs for the sign imprinting, which cannot follow the high data-rate speed (GHz regime) of the input-data and weight imprinting EAMs, as would be required during the TMM operation. This can certainly be overcome either by replacing the TO PSs with available electro-optic PS technology that can support high-speed operation or, in our case, by adopting non-negative NN training models. However, using only positive NN parameters in DL models poses significant challenges in the training process that have to be addressed in order to yield high classification accuracies. This constraint typically generates outputs that the NN struggles to discriminate when baseline training is employed. In order to counteract this effect, we deployed a label smoothing training process [47], making the output layer more robust to noise and hence producing more distinctive output classes. In particular, training with label smoothing encourages the activations of the output layer to be close to the template of the correct class and equally distant from the templates of the incorrect classes, targeting to minimize the cross entropy, which is defined as

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{c=1}^{K} y_{c}\,\log p_{c}, \qquad y_{c} = (1-\varepsilon)\,\delta_{c,t} + \frac{\varepsilon}{K}$$

where p_c is the softmax output for class c, t is the correct class, K = 3 is the number of output classes and ε is the smoothing factor.
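Concretely, the label-smoothing objective can be written down in a few lines; the smoothing factor ε = 0.1 and the exact smoothed-target form (a (1 − ε) one-hot mixed with a uniform ε/K term, per the standard formulation of [47]) are assumptions, since the text does not state the hyper-parameters used:

```python
import numpy as np

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross entropy against a label-smoothed target distribution:
    y_c = (1 - eps) * delta(c, target) + eps / K, where eps = 0.1 is an
    assumed smoothing factor, not a value taken from the paper."""
    K = logits.shape[-1]
    z = logits - logits.max()            # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    y = np.full(K, eps / K)              # uniform eps/K on every class
    y[target] += 1.0 - eps               # concentrate mass on correct class
    return float(-np.sum(y * np.log(p)))
```

With K = 3 IRIS classes, the smoothed target for class t is (1 − ε + ε/3, ε/3, ε/3): the correct-class activation stays dominant while the incorrect classes are pushed to be equally (and non-zero) probable, matching the "equally distant templates" behavior described above.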

NN training: (a) 4:10:3 fully-connected NN for the classification of the IRIS dataset, (b) normalized minimum distance from the target label t_c.
4 Dynamic-precision NN inference: experimental results
In this section we provide our experimental findings on TMM and the adaptable line rate, as obtained during the photonic NN inference of the IRIS classification dataset employing the SiPho processor shown in Figure 4(b). Specifically, the SiPho chip was employed to execute the linear operations of the NN, while the Sigmoid activation function was applied in the software domain. However, in an all-optical implementation the sigmoid activation function could potentially be deployed experimentally using semiconductor optical amplifiers [49]. Within the scope of benchmarking the dynamic-precision NN inference, we investigated the NEQB requirements for each neural layer. Specifically, we quantized the NN input and weight parameters of the examined layer in the range [1, 8], with the dashed and solid black lines highlighting the precision requirements of the first and the second neural layer, respectively. Targeting a maximum classification accuracy degradation of up to 1%, i.e., 95.6%, we observed that the 1st layer, being more noise-tolerant, requires low-precision calculations of at least 1.6 NEQB to meet the aforementioned condition, as illustrated by the left green dashed line in Figure 6(a). On the other hand, the noise-sensitive output layer requires a NEQB of at least 4.6 to achieve the same performance. Parametrizing the theoretical model developed in Section 2.2 with the electrical and optical equipment employed in the experiment, we investigated the impact on NEQB as the ONN axon bandwidth gradually increases, as shown with the black curve of Figure 6(b), towards determining the respective compute rate of the constituent neural layers. More specifically, the model parameters included: a RIN = −145 dB/Hz from the external laser source utilized (CoBrite-DX Laser type G), a PD with a responsivity of n_PD = 0.12 A/W and a bandwidth of B = 50 GHz, an ER = 10 dB and
Experimental – model performance comparison: (a) NN inference accuracy when the first (dashed line) and the second (solid line) layers are quantized with a NEQB in [1, 8]. Scatter points correspond to the modeled (rectangles) and experimental (stars) values at ONN axon bandwidths equal to 2 (red), 16 (orange), and 50 (yellow) GHz. (b) NEQB versus ONN axon bandwidth derived via the software model (solid line) and the experiment (star scatter points).

Experimental results: (a)–(d) Confusion matrices derived from the NN inference (a) via software, and (b)–(d) when the linear part of the first layer is calculated via the SiPho chip at 50 Gbaud/axon and the second layer at (b) 2, (c) 16, and (d) 50 Gbaud/axon.
5 Conclusions
We demonstrated an analog silicon photonic engine and its capability to perform TMM and dynamic-precision inference among neural layers towards latency-optimized DL accelerators. Initially, we proposed an optical TMM method using TDM, with the ultimate target being the execution of high-dimension NNs via low-radix ONN hardware. Subsequently, a detailed theoretical model was developed, associating the noise sources and the bandwidth of an end-to-end ONN link with the noise equivalent bits. In the scope of speed- and accuracy-optimizing the inference of NN linear operations, we trained an NN for the classification of the IRIS dataset and experimentally applied a dynamic-precision inference via an integrated SiPho ONN using TMM. After breaking down each neural layer’s bit precision impact on the overall classification accuracy, we extracted the NEQB requirements for the NN accuracy not to be degraded by more than 1% and correlated them with the ONN’s bandwidth via the developed model. Thereafter, following a dynamic-rate inference, we experimentally computed the 1st neural layer at 50 GHz without imposing any degradation on the software-acquired classification accuracy of 96.6%. With the output layer being more sensitive to noise, the experiment revealed a compute-rate-dependent accuracy, calculated equal to 68.6%, 86.4% and 93.1% when its linear operations were computed at 50, 16 and 2 Gbaud/axon, respectively, closely matching the model-expected values. Finally, we derive that the execution time benefits from the employment of the dynamic-precision NN inference for the classification of the IRIS dataset approximate 55% compared to a fixed-precision scheme, without introducing any degradation to its accuracy.
Funding source: Hellenic Foundation for Research and Innovation
Award Identifier / Grant number: 4233
Funding source: European Commission
Award Identifier / Grant number: 101017194
Award Identifier / Grant number: 871391
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was supported by the EC through the H2020 Projects SiPHO-G (101017194) and PLASMONIAC (871391) and by the GSRT through project DeepLight (4233).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
References
[1] TECHNOLOGY - Mythic [Online]. Available at: https://www.mythic-ai.com/technology/ [accessed: Dec. 19, 2022].Search in Google Scholar
[2] Lightmatter [Online]. Available at: https://lightmatter.co/ [accessed: Dec. 19, 2022].Search in Google Scholar
[3] Lightellingence [Online]. Available at: https://www.lightelligence.ai/ [accessed: Dec. 19, 2022].Search in Google Scholar
[4] R. Stabile, G. Dabos, C. Vagionas, B. Shi, N. Calabretta, and N. Pleros, “Neuromorphic photonics: 2D or not 2D,” J. Appl. Phys., vol. 129, no. 20, p. 200901, 2021. https://doi.org/10.1063/5.0047946.Search in Google Scholar
[5] M. A. Nahmias, T. F. de Lima, A. N. Tait, H. Peng, B. J. Shastri, and P. R. Prucnal, “Photonic multiply-accumulate operations for neural networks,” IEEE J. Sel. Top. Quantum Electron., vol. 26, no. 1, pp. 1–18, 2020. https://doi.org/10.1109/JSTQE.2019.2941485.Search in Google Scholar
[6] A. R. Totović, G. Dabos, N. Passalis, A. Tefas, and N. Pleros, “Femtojoule per MAC neuromorphic photonics: an energy and technology roadmap,” IEEE J. Sel. Top. Quantum Electron., vol. 26, pp. 1–15, 2020. https://doi.org/10.1109/jstqe.2020.2975579.Search in Google Scholar
[7] G. Dabos, D. Bellas, R. Stabile, et al.., “Neuromorphic photonic technologies and architectures: scaling opportunities and performance frontiers,” Opt. Mater. Express, vol. 12, pp. 2343–2367, 2022. https://doi.org/10.1364/ome.452138.Search in Google Scholar
[8] Y. Chen, Y. Xie, L. Song, F. Chen, and T. Tang, “A survey of accelerator architectures for deep neural networks,” Engineering, vol. 6, no. 3, pp. 264–274, 2020. https://doi.org/10.1016/j.eng.2020.01.007.Search in Google Scholar
[9] J. Feldmann, N. Youngblood, M. Karpov, et al.., “Parallel convolutional processing using an integrated photonic tensor core,” Nature, vol. 589, pp. 52–58, 2021. https://doi.org/10.1038/s41586-020-03070-1.Search in Google Scholar PubMed
[10] G. Giamougiannis, A. Tsakyridis, G. Mourgias-Alexandris, et al., “Silicon-integrated coherent neurons with 32GMAC/sec/axon compute line-rates using EAM-based input and weighting cells,” in 2021 European Conference on Optical Communication (ECOC), 2021. https://doi.org/10.1109/ECOC52684.2021.9605987.
[11] G. Mourgias-Alexandris, M. Moralis-Pegios, S. Simos, et al., “A silicon photonic coherent neuron with 10GMAC/sec processing line-rate,” in 2021 Optical Fiber Communications Conference and Exhibition (OFC), 2021, paper Tu5H.1. https://doi.org/10.1364/OFC.2021.Tu5H.1.
[12] I. Kandel and M. Castelli, “Transfer learning with convolutional neural networks for diabetic retinopathy image classification. A review,” Appl. Sci., vol. 10, no. 6, p. 2021, 2020. https://doi.org/10.3390/app10062021.
[13] F. Shokraneh, S. Geoffroy-Gagnon, M. S. Nezami, and O. Liboiron-Ladouceur, “A single layer neural network implemented by a 4×4 MZI-based optical processor,” IEEE Photon. J., vol. 11, no. 6, pp. 1–12, 2019. https://doi.org/10.1109/jphot.2019.2952562.
[14] Y. Shen, N. C. Harris, S. Skirlo, et al., “Deep learning with coherent nanophotonic circuits,” Nat. Photonics, vol. 11, no. 7, pp. 441–446, 2017. https://doi.org/10.1038/nphoton.2017.93.
[15] B. Shi, N. Calabretta, and R. Stabile, “Deep neural network through an InP SOA-based photonic integrated cross-connect,” IEEE J. Sel. Top. Quantum Electron., vol. 26, no. 1, p. 7701111, 2020. https://doi.org/10.1109/JSTQE.2019.2945548.
[16] H. Zhang, M. Gu, X. D. Jiang, et al., “An optical neural chip for implementing complex-valued neural network,” Nat. Commun., vol. 12, p. 457, 2021. https://doi.org/10.1038/s41467-020-20719-7.
[17] S. Ohno, K. Toprasertpong, S. Takagi, and M. Takenaka, “Si microring resonator crossbar array for on-chip inference and training of optical neural network,” arXiv:2106.04351, 2021. https://doi.org/10.35848/1347-4065/ab6d82.
[18] F. Ashtiani, A. J. Geers, and F. Aflatouni, “An on-chip photonic deep neural network for image classification,” Nature, vol. 606, pp. 501–506, 2022. https://doi.org/10.1038/s41586-022-04714-0.
[19] S. Garg, J. Lou, A. Jain, and M. Nahmias, “Dynamic precision analog computing for neural networks,” 2021, [Online]. Available at: https://arxiv.org/abs/2102.06365.
[20] E. Paolini, L. De Marinis, M. Cococcioni, et al., “Photonic-aware neural networks,” Neural Comput. Appl., vol. 34, pp. 15589–15601, 2022. https://doi.org/10.1007/s00521-022-07243-z.
[21] T. F. de Lima, A. N. Tait, H. Saeidi, et al., “Noise analysis of photonic modulator neurons,” IEEE J. Sel. Top. Quantum Electron., vol. 26, no. 1, pp. 1–9, 2020. https://doi.org/10.1109/JSTQE.2019.2931252.
[22] G. E. Moon, H. Kwon, G. Jeong, P. Chatarasi, S. Rajamanickam, and T. Krishna, “Evaluating spatial accelerator architectures with tiled matrix-matrix multiplication,” IEEE Trans. Parallel Distr. Syst., vol. 33, no. 4, pp. 1002–1014, 2022. https://doi.org/10.1109/tpds.2021.3104240.
[23] NVIDIA application note [Online]. Available at: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html [accessed: Dec. 19, 2022].
[24] M. Moralis-Pegios, G. Mourgias-Alexandris, A. Tsakyridis, et al., “Neuromorphic silicon photonics and hardware-aware deep learning for high-speed inference,” J. Lightwave Technol., vol. 40, pp. 3243–3254, 2022. https://doi.org/10.1109/JLT.2022.3171831.
[25] G. Mourgias-Alexandris, M. Moralis-Pegios, A. Tsakyridis, et al., “Channel response-aware photonic neural network accelerators for high-speed inference through bandwidth-limited optics,” Opt. Express, vol. 30, pp. 10664–10671, 2022. https://doi.org/10.1364/oe.452803.
[26] A. Oikonomou, M. Kirtas, N. Passalis, et al., “A robust, quantization-aware training method for photonic neural networks,” in Engineering Applications of Neural Networks (EANN 2022), Communications in Computer and Information Science, 2022, pp. 427–438. https://doi.org/10.1007/978-3-031-08223-8_35.
[27] M. Kirtas, N. Passalis, G. Mourgias-Alexandris, G. Dabos, N. Pleros, and A. Tefas, “Robust architecture-agnostic and noise resilient training of photonic deep learning models,” IEEE Trans. Emerg. Top. Comput. Intell., pp. 1–10, 2022. https://doi.org/10.1109/TETCI.2022.3182765.
[28] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: training neural networks with low precision weights and activations,” J. Mach. Learn. Res., vol. 18, pp. 1–30, 2018.
[29] X. Sun, J. Choi, C.-Y. Chen, et al., “Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks,” in Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019.
[30] B. Murmann, “Mixed-signal computing for deep neural network inference,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 29, no. 1, pp. 3–13, 2021. https://doi.org/10.1109/TVLSI.2020.3020286.
[31] Z. Dong, Z. Yao, A. Gholami, M. Mahoney, and K. Keutzer, “HAWQ: Hessian AWare Quantization of neural networks with mixed-precision,” 2019, [Online]. Available at: https://arxiv.org/abs/1905.03696. https://doi.org/10.1109/ICCV.2019.00038.
[32] A. Gholami, S. Kim, Z. Dong, Z. Yao, M. W. Mahoney, and K. Keutzer, “A survey of quantization methods for efficient neural network inference,” 2021, [Online]. Available at: https://arxiv.org/abs/2103.13630. https://doi.org/10.1201/9781003162810-13.
[33] K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, “HAQ: hardware-aware automated quantization with mixed precision,” 2019, [Online]. Available at: https://arxiv.org/abs/1811.08886. https://doi.org/10.1109/CVPR.2019.00881.
[34] G. Giamougiannis, A. Tsakyridis, Y. Ma, et al., “A coherent photonic crossbar for scalable universal linear optics,” J. Lightwave Technol., 2023. https://doi.org/10.1109/JLT.2023.3234689.
[35] G. Mourgias-Alexandris, A. Totovic, A. Tsakyridis, et al., “Neuromorphic photonics with coherent linear neurons using dual-IQ modulation cells,” J. Lightwave Technol., vol. 38, no. 4, pp. 811–819, 2020. https://doi.org/10.1109/JLT.2019.2949133.
[36] A. Tsakyridis, G. Giamougiannis, A. Totovic, and N. Pleros, “Fidelity restorable universal linear optics,” Adv. Photon. Res., vol. 3, no. 10, p. 2200001, 2022. https://doi.org/10.1002/adpr.202200001.
[37] G. Giamougiannis, A. Tsakyridis, M. Moralis-Pegios, et al., “Universal linear optics revisited: new perspectives for neuromorphic computing with silicon photonics,” IEEE J. Sel. Top. Quantum Electron., 2023. https://doi.org/10.1109/JSTQE.2022.3228318.
[38] A. Tsakyridis, G. Giamougiannis, G. Mourgias-Alexandris, et al., “Silicon photonic neuromorphic computing with 16 GHz input data and weight update line rates,” in Conference on Lasers and Electro-Optics (CLEO), Technical Digest Series, Optica Publishing Group, 2022. https://doi.org/10.1364/CLEO_SI.2022.STh5G.1.
[39] A. N. Tait, T. F. de Lima, M. A. Nahmias, et al., “Silicon photonic modulator neuron,” Phys. Rev. Appl., vol. 11, no. 6, p. 064043, 2019. https://doi.org/10.1103/physrevapplied.11.064043.
[40] A. Totovic, G. Giamougiannis, A. Tsakyridis, D. Lazovsky, and N. Pleros, “Programmable photonic neural networks combining WDM with coherent linear optics,” Sci. Rep., vol. 12, p. 5605, 2022. https://doi.org/10.1038/s41598-022-09370-y.
[41] A. Totovic, C. Pappas, M. Kirtas, et al., “WDM equipped universal linear optics for programmable neuromorphic photonic processors,” Neuromorphic Comput. Eng., vol. 2, p. 024010, 2022. https://doi.org/10.1088/2634-4386/ac724d.
[42] D. C. Montgomery and G. C. Runger, Applied Statistics and Probability for Engineers, 7th ed., Hoboken, NJ, Wiley, 2018, pp. 150–156.
[43] Y. LeCun, B. Boser, J. S. Denker, et al., “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989. https://doi.org/10.1162/neco.1989.1.4.541.
[44] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, vol. 25, 2012. Available at: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
[45] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv, 2015. https://doi.org/10.48550/ARXIV.1512.03385.
[46] N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in Proc. IEEE Information Theory Workshop, 2015, pp. 1–5. https://doi.org/10.1109/ITW.2015.7133169.
[47] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” arXiv, 2015. https://doi.org/10.48550/ARXIV.1512.00567.
[48] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv, 2017. https://doi.org/10.48550/ARXIV.1711.05101.
[49] G. Mourgias-Alexandris, A. Tsakyridis, N. Passalis, A. Tefas, K. Vyrsokinos, and N. Pleros, “An all-optical neuron with sigmoid activation function,” Opt. Express, vol. 27, no. 7, pp. 9620–9630, 2019. https://doi.org/10.1364/oe.27.009620.
© 2022 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Frontmatter
- Editorial
- Neural network learning with photonics and for photonic circuit design
- Reviews
- From 3D to 2D and back again
- Photonic multiplexing techniques for neuromorphic computing
- Perspectives
- A large scale photonic matrix processor enabled by charge accumulation
- Perspective on 3D vertically-integrated photonic neural networks based on VCSEL arrays
- Photonic online learning: a perspective
- Research Articles
- All-optical ultrafast ReLU function for energy-efficient nanophotonic deep learning
- Artificial optoelectronic spiking neuron based on a resonant tunnelling diode coupled to a vertical cavity surface emitting laser
- Parallel and deep reservoir computing using semiconductor lasers with optical feedback
- Neural computing with coherent laser networks
- Optical multi-task learning using multi-wavelength diffractive deep neural networks
- Diffractive interconnects: all-optical permutation operation using diffractive networks
- Photonic reservoir computing for nonlinear equalization of 64-QAM signals with a Kramers–Kronig receiver
- Deriving task specific performance from the information processing capacity of a reservoir computer
- Transfer learning for photonic delay-based reservoir computing to compensate parameter drift
- Analog nanophotonic computing going practical: silicon photonic deep learning engines for tiled optical matrix multiplication with dynamic precision
- A self-similar sine–cosine fractal architecture for multiport interferometers
- Power monitoring in a feedforward photonic network using two output detectors
- Fabrication-conscious neural network based inverse design of single-material variable-index multilayer films
- Multi-task topology optimization of photonic devices in low-dimensional Fourier domain via deep learning