A deep reinforcement learning framework to modify LQR for an active vibration control applied to 2D building models

Emad Zuhair Gheni; Hussein M. H. Al-Khafaji; Hassan M. Alwan

doi:10.1515/eng-2022-0496

Article Open Access

Published/Copyright: January 30, 2024

From the journal Open Engineering Volume 14 Issue 1

Abstract

Deep reinforcement learning (DRL) has emerged as a promising approach for optimizing control policies in various fields. In this article, we explore the use of DRL for controlling vibrations in building structures. Specifically, we focus on the problem of reducing vibrations induced by external sources such as wind or earthquakes. We propose a DRL-based control framework that learns to adjust the control signal of a classical adaptive linear quadratic regulator (LQR)-based model to mitigate the vibration of building structures in real time. The framework combines the proximal policy optimization method and a deep neural network that is trained using a simulation environment. The network takes sensor readings from the building as input and outputs signals that act as a corrector to the signals from the LQR model. We demonstrate the approach’s effectiveness by simulating a three-story building structure. The results show that our DRL-based control approach outperforms the classical LQR model in reducing building vibrations. Moreover, we show that the approach is robust in learning the system’s dynamics. Overall, the work highlights the potential of DRL for improving the performance of building structures in the face of external disturbances. The framework can be easily integrated into existing building control systems and extended to other control problems in structural engineering.

Keywords: vibration control; deep reinforcement learning; proximal policy optimization; LQR model; deep neural network

1 Introduction

Buildings are complex systems constantly subjected to various external forces, such as wind, earthquakes, and human activities. These forces can cause vibrations in the building structure, which can affect the comfort of occupants and potentially cause damage to the building itself. Therefore, controlling building structure vibrations is essential to ensure safety, comfort, and durability. Control of building structure vibrations has interested researchers and engineers for many years. There are three main control methods, namely, passive, semi-active, and active control methods [1]. Passive control methods involve passive devices such as tuned mass dampers, viscoelastic materials, and base isolation systems to dissipate energy and reduce vibration amplitudes. These devices are generally low-cost and easy to install, making them suitable for retrofitting existing structures [2]. For example, fiber-reinforced elastomeric bearings (FREBs) are a type of base isolation system in which high-strength fibers, such as carbon or aramid, are embedded into the elastomeric material. The inclusion of fibers enhances the mechanical properties of the bearing, such as strength, stiffness, and energy dissipation capabilities. FREBs provide improved seismic performance and durability compared to traditional elastomeric bearings [3,4]. Another type of isolation system for the seismic protection of structures is the unbonded fiber-reinforced elastomeric isolator (UFREI). UFREIs are a cost-effective alternative to conventional laminated rubber bearings. Unlike traditional isolators, UFREIs eliminate the need for anchorage bolts by utilizing fiber reinforcement and exploiting friction at the rubber–concrete interface, as suggested in the study by De Domenico et al. 
[5], which investigates the behavior of UFREIs under triaxial loading conditions through experiments, identifies effects related to lateral coupling, and presents a phenomenological model for simulating the biaxial behavior of UFREIs. Active control methods use feedback control systems that measure the structure’s vibration and apply a counteracting force to reduce vibration amplitudes. These systems employ actuators, such as piezoelectric materials, hydraulic actuators, or electromagnetic shakers, to apply the control forces. Here, active control methods can provide high levels of vibration reduction [6,7,8]. Semi-active control methods are a combination of passive and active control methods where the damping force or stiffness of the system is adjusted in real time. These systems use smart materials, such as magnetorheological fluids, to modulate the damping characteristics of the system based on measured vibration levels. Semi-active control methods balance the effectiveness of active systems and the simplicity of passive systems [9,10,11].

It can be observed that passive and semi-active systems, while affordable and dependable, have limited performance as they cannot adjust to varying types of excitations. On the other hand, structural response reduction through active control systems, such as active mass dampers, active tendon systems, and active brace systems, can achieve high-performance results by calculating the necessary actuator control force based on real-time observations [6,8,12]. Various active control algorithms and strategies, including linear quadratic regulator (LQR) [13], pole assignment, and sliding mode control [14], have been applied in the past 60 years to obtain precise control forces with sensor measurements. Nevertheless, most optimal control algorithms require complete knowledge of the system dynamics [15], and their control performance is primarily reliant on the precision of model parameters [16,17].

In recent years, machine learning has emerged as a powerful tool for solving complex problems in various fields, including structural engineering. Machine learning algorithms can be used to analyze large amounts of data collected from sensors installed on the building and identify the best control strategies for reducing vibrations. This approach, known as data-driven control, can potentially improve the efficiency and effectiveness of traditional control methods. The data collected can include information about the building’s structural properties, the external forces acting on it, and the vibrations themselves. Once the data are collected, machine learning algorithms can be used to develop predictive models that can be used to optimize the control system and reduce vibrations. These models can consider various factors, such as the building’s structural properties, the external forces acting on the building, and the effectiveness of different control strategies. The models can also be updated in real-time as new data are collected, allowing the control system to adapt to changing conditions. Recently, deep reinforcement learning (DRL)-based approaches have shown promising results regarding structural vibration control. Herein, deep learning is a subset of machine learning in which neural networks (NNs) with multiple layers are utilized in the model [18]. Chase et al. [19] proposed a deep Q-network (DQN)-based model to improve structural responses using feedback control. Their results showed a remarkable reduction in the structure response when subjected to earthquake excitation. Kim and Kim [20] applied a deep DQN to perform a semi-active control of a sample building structure vibration. They reported an effective reduction in the seismic response of the building structure. Recently, Zhang and Zhu [21] developed a deep deterministic policy gradient-based model to control the vibration of a single-degree-of-freedom (single-DOF) and a multi-DOF shear-building model. 
Their model showed similar results to the classical LQR, and better results were shown in a partially observed system. Although the aforementioned studies have shown commendable results, further advancements in control models are necessary to achieve even better control performance. While classical methods like the LQR have proven effective in controlling linear systems with known dynamics, they encounter limitations when it comes to handling nonlinearities, uncertainties, and the complex dynamics often encountered in real-world structures. In contrast, DRL offers a promising approach by leveraging deep NNs and reinforcement learning (RL) algorithms to learn control policies directly from data, without requiring explicit knowledge of the system dynamics. This distinctive characteristic enables DRL to potentially handle nonlinearities and uncertainties more effectively compared to classical methods such as the LQR. The key discrepancy between DRL and classical methods lies in DRL’s ability to adapt and learn control policies that are specifically tailored to the unique dynamics of a building and the external disturbances it faces. By training a DRL agent on a wide range of scenarios and data, it can capture the complex interactions and dynamics that may exist within the building. Consequently, the agent can learn to respond and adapt to various situations, optimizing its control strategies for improved performance [22]. Furthermore, DRL offers the advantage of being able to handle uncertainties and model inaccuracies in a more robust manner [23]. Traditional control methods often rely on precise mathematical models, which can be challenging to obtain or may contain inherent uncertainties. DRL, on the other hand, can learn from experience and adapt its control policies accordingly, allowing it to cope with uncertainties and model errors more effectively. 
Overall, the utilization of DRL in control systems for buildings holds great potential for addressing the limitations of classical methods. By leveraging the power of deep NNs and RL, DRL can offer more adaptive and robust control solutions tailored to the specific dynamics and uncertainties present in real-world structures.

In this context, this article presents a DRL model based on the policy gradient optimization (proximal policy optimization [PPO]) method developed to enhance the performance of a classical LQR control algorithm. The combination of traditional and DRL approaches is used in this study with the aim of not only a practical improvement in the seismic response but also a remarkable reduction in the computational cost of the DRL. The remainder of this article is organized as follows: Section 2 explains the methodology for controlling building vibrations using the proposed DRL model; Section 3 discusses the results of testing the proposed model; and Section 4 presents the conclusions of this study and future perspectives.

2 Methodology

2.1 Concept of DRL

RL differs from supervised and unsupervised learning as it relies on the Markov decision process, an iterative cycle in which an agent interacts with its environment. This process consists of four components: state s, action a, policy π(a|s), and reward r. As shown in Figure 1, in DRL, the agent is represented by a deep NN. At instant t, the agent takes an action that alters the state of the environment and receives a reward that provides feedback on the effectiveness of the chosen action. At each iteration, the agent selects an action based on the policy and learns from the collected states, actions, and rewards over multiple iterations to determine an optimal policy π*(a|s) that maximizes the long-term reward.

Figure 1

Representation of the DRL concept.

2.2 Mathematical model for building structure and seismic excitations

The analytical model used in this study as a simulation environment is a three-story, single-bay model building [24] based on the structure proposed in the study by Dyke et al. [6]. The test structure is a scale model of the prototype building discussed by Chung et al. [25]. One horizontal DOF is used for each story in the model. The height of the structure is 158 cm. The total floor mass of the model is 227 kg, distributed evenly between the three floors, and the structural frame has a mass of 77 kg. The natural frequencies of the model are roughly five times higher than those of the prototype due to the time scale factor of 0.2. The first three modes of the model’s structural system occur at frequencies of 5.81, 17.68, and 28.53 Hz, with associated damping ratios of 0.33%, 0.23%, and 0.30%, respectively. The ratios of the model quantities to those of the prototype structure are 1:60 for force, 1:206 for mass, 1:5 for time, 4:29 for displacement, and 7:2 for acceleration. An active mass damper (AMD) is a device that reduces and regulates vibrations in tall buildings or other structures. Tall buildings are subjected to numerous dynamic loads, including wind, seismic activity, and human-induced vibrations. These loads can disturb the building, which could cause structural problems as well as discomfort for the occupants. AMDs are used to reduce these vibrations and improve the building’s performance. The moving mass of the AMD is 1.7% of the total mass of the structure.

The state–space representation can be described as follows:

(1) $\dot{x} = A x + B u + E \ddot{x}_g$,

(2) $y = C_y x + D_y u + F_y \ddot{x}_g + v$,

(3) $z = C_z x + D_z u + F_z \ddot{x}_g$,

where $x$ is the state vector, $\ddot{x}_g$ is the scalar ground acceleration, $u$ is the scalar control input, $y$ is the vector of responses that can be directly measured, $z$ is the vector of responses that can be regulated, and $v$ is the vector of measurement noises. $A$, $B$, $E$, $C_y$, $D_y$, $C_z$, $D_z$, $F_y$, and $F_z$ are matrices of appropriate dimensions. The coefficient matrices in equations (1)–(3) were determined from data collected at the Structural Dynamics and Control/Earthquake Engineering Laboratory (SDC/EEL) using the identification methods presented in [6,26]. The obtained model accounts for the input–output behavior of the structural system, considering the effects of actuator/sensor dynamics and control–structure interaction.
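The time integration of equation (1) can be sketched as follows. This is only an illustration under stated assumptions: the identified matrices are not listed in the text, so a hypothetical single-DOF oscillator stands in for the structure, using the first-mode frequency (5.81 Hz) and damping ratio (0.33%) quoted above. The fourth-order Runge–Kutta scheme with inputs held constant over each step, and all function names, are choices made for this sketch rather than the authors' implementation.

```python
import math

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def state_derivative(A, B, E, x, u, ag):
    """Right-hand side of equation (1) for scalar u and scalar ground acceleration ag."""
    Ax = mat_vec(A, x)
    return [Ax[i] + B[i] * u + E[i] * ag for i in range(len(x))]

def rk4_step(A, B, E, x, u, ag, dt):
    """Advance the state by one step, holding u and ag constant over the step."""
    def shifted(base, k, h):
        return [b + h * ki for b, ki in zip(base, k)]
    k1 = state_derivative(A, B, E, x, u, ag)
    k2 = state_derivative(A, B, E, shifted(x, k1, dt / 2), u, ag)
    k3 = state_derivative(A, B, E, shifted(x, k2, dt / 2), u, ag)
    k4 = state_derivative(A, B, E, shifted(x, k3, dt), u, ag)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

# Hypothetical single-DOF stand-in (NOT the identified three-story model):
# state x = [displacement, velocity], first-mode values taken from the text.
OMEGA = 2 * math.pi * 5.81      # rad/s
ZETA = 0.0033                   # 0.33% damping ratio
A = [[0.0, 1.0], [-OMEGA ** 2, -2 * ZETA * OMEGA]]
B = [0.0, 1.0]                  # assumed: control enters as an acceleration
E = [0.0, -1.0]                 # ground acceleration acts on the relative motion

def simulate_free_vibration(x0, n_steps, dt=0.005):
    """Integrate with u = 0 and no ground motion, returning the final state."""
    x = list(x0)
    for _ in range(n_steps):
        x = rk4_step(A, B, E, x, u=0.0, ag=0.0, dt=dt)
    return x

def envelope(x):
    """Approximate vibration amplitude from displacement and velocity."""
    return math.hypot(x[0], x[1] / OMEGA)
```

With the 0.005 s time step used in the study, one second of free vibration corresponds to 200 calls of `rk4_step`. For the actual three-story model, the same functions would be called with the identified `A`, `B`, and `E`.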

A filtered Gaussian process was used as the excitation to simulate the ground acceleration. The Kanai–Tajimi shaping filter is a popular choice for creating artificial ground motion as it effectively captures the characteristics of seismic ground motions, as outlined in [27,28]. Thus, the Kanai–Tajimi shaping filter is used in this study, and its spectrum can be represented as follows:

(4) $F(s) = \dfrac{2 \zeta_g \omega_g s + \omega_g^2}{s^2 + 2 \zeta_g \omega_g s + \omega_g^2}$.

In this study, the soil frequency is set to $\omega_g$ = 37.3 rad/s, which characterizes the rate at which the soil oscillates during seismic events, and the soil damping ratio is set to $\zeta_g$ = 0.3, which measures the energy dissipation in the soil system. The time step is set to 0.005 s.
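A minimal sketch of how such an excitation could be generated is given below. It realizes the filter of equation (4) as a second-order system driven by Gaussian white noise, with the filter output formed from the internal velocity and displacement. The semi-implicit Euler discretization, the noise standard deviation `sigma`, the seed, and the function names are assumptions made for illustration. The text does not state the noise intensity or the integration scheme that was used.

```python
import random

OMEGA_G = 37.3   # soil frequency in rad/s (from the text)
ZETA_G = 0.3     # soil damping ratio (from the text)
DT = 0.005       # time step in s (from the text)

def kanai_tajimi_filter(w, omega_g=OMEGA_G, zeta_g=ZETA_G, dt=DT):
    """Pass the input sequence w through the shaping filter of equation (4).

    Internal model: q'' + 2*zeta*omega*q' + omega^2*q = w,
    output:         2*zeta*omega*q' + omega^2*q.
    """
    q, v = 0.0, 0.0
    out = []
    for w_k in w:
        # Semi-implicit Euler: update velocity first, then displacement.
        v += dt * (w_k - 2 * zeta_g * omega_g * v - omega_g ** 2 * q)
        q += dt * v
        out.append(2 * zeta_g * omega_g * v + omega_g ** 2 * q)
    return out

def artificial_ground_acceleration(n_steps, sigma=1.0, seed=0):
    """Filtered Gaussian white noise; sigma and seed are illustrative values."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, sigma) for _ in range(n_steps)]
    return kanai_tajimi_filter(white)
```

Because $F(0) = 1$, a constant input passes through the filter unchanged once the transient has decayed, which gives a quick sanity check of the discretization.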

2.3 DRL framework: A comprehensive overview and implementation guide

This study presents a DRL model that is built on PPO [29,30], which belongs to the family of policy gradient methods. PPO is heavily used in DRL-based control problems for several reasons. One is its simplicity in terms of mathematical complexity and faster execution compared to Trust Region Policy Optimization methods [31]. In addition, this method requires minimal meta-parameter tuning, making it highly adaptable to continuous control problems. It is also more suitable for such problems than DQN [32]. An introduction to the PPO method is briefly presented in Appendix A. In this study, we seek a model that applies an optimal strategy to change the control signal of AMD, represented by voltage values, as shown in Figure 2.
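For orientation, the clipped surrogate objective commonly associated with PPO can be sketched as follows. The clipping parameter `epsilon = 0.2` is a widely used default and is an assumption here, since the text does not report the value used. The function names are illustrative, and the advantage estimates are taken as given inputs.

```python
def clip(value, low, high):
    """Limit value to the interval [low, high]."""
    return max(low, min(high, value))

def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios:     new-policy / old-policy probability ratios for the sampled actions
    advantages: advantage estimates for the same samples
    epsilon:    clipping parameter (assumed value, not taken from the text)
    """
    terms = []
    for r, adv in zip(ratios, advantages):
        unclipped = r * adv
        clipped = clip(r, 1.0 - epsilon, 1.0 + epsilon) * adv
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)
```

The `min` removes the incentive to move the probability ratio outside the interval [1 − ε, 1 + ε], which limits the size of each policy update.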

Figure 2

Schematic of the DRL framework for the active vibration control.

Let $a_t$ be the instantaneous value that is added to the control signal $u_t$ from a classical LQR model at iteration step $t$. A DRL agent follows a policy to decide how to act in a given environment; the policy represents the strategy that the agent uses to choose actions based on its observations of the environment, and its structure varies with the particular algorithm and the problem being solved. Here, the agent works under a policy $\pi(a_t|s_t)$, where $a_t \in A$ and $A$ is the range of actions $[-1, 1]\,u_{\mathrm{rms}}$ ($u_{\mathrm{rms}}$ being the root mean square of the control signal from the LQR model). The agent obtains the next state $s_{t+1}$ and the reward $r_{t+1}$ from the environment by taking the action $a_t$. Here, the states are represented by the accelerations and displacements of the three floors.
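The correction step described above can be sketched as follows. The sketch assumes that the network emits a normalized output in [−1, 1] that is scaled by the RMS of the LQR signal before being added to it. The function name, argument names, and the numbers in the usage example are hypothetical.

```python
def corrected_control(u_lqr, raw_action, u_rms):
    """Add the bounded DRL correction to the LQR control signal.

    u_lqr:      control signal from the LQR model at the current step
    raw_action: normalized network output, clipped to [-1, 1] (assumed convention)
    u_rms:      root mean square of the LQR control signal
    """
    bounded = max(-1.0, min(1.0, raw_action))
    a_t = bounded * u_rms          # correction lies in [-u_rms, u_rms]
    return u_lqr + a_t

# Usage with illustrative values (volts): the correction is capped at u_rms.
u_total = corrected_control(u_lqr=2.0, raw_action=0.5, u_rms=0.4)
u_capped = corrected_control(u_lqr=2.0, raw_action=3.0, u_rms=0.4)
```

Bounding the correction by the RMS of the LQR signal keeps the learned term of the same order as the baseline signal, so the LQR controller continues to supply the bulk of the control action.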

The drifts of the three floors are utilized in the reward function, which enables the DRL model to follow an optimal strategy. Hence, the objective of the DRL model is to learn the control policy that maximizes the expected long-term rewards:

(5) $\pi^* = \arg\max_{\pi} \, \mathbb{E}_{\pi} \left[ \sum_{t=1}^{N} \gamma^{(t-1)} r_t \right]$,

where $\gamma^{(t-1)}$ is the $(t-1)$-th power of the discount factor $\gamma$, which determines the weights of the rewards obtained in future iteration steps relative to the immediate reward. In this study, the value of $\gamma$ is set to 0.95. Equation (5) expresses the objective of maximizing the expected cumulative discounted reward over a sequence of time steps $t = 1$ to $N$, where arg max denotes that we are searching for the argument (in this case, the policy) that attains the maximum. In RL, the objective is to maximize the long-term cumulative rewards an agent can obtain while interacting with an environment. However, in many scenarios, immediate rewards might be more valuable than future rewards. The discount factor allows the agent to balance the importance of immediate and future rewards.

The primary reward function is defined as follows:

(6) $r_t = -\left[ (D_t^2)_{1\mathrm{st}} + (D_t^2)_{2\mathrm{nd}} + (D_t^2)_{3\mathrm{rd}} \right]$.

In this study, we use 50 iteration steps for each episode in the training process of the DRL model.
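The reward of equation (6) and the discounted sum inside equation (5) can be written compactly as below. The discount factor and the episode length are taken from the text; the function names and the drift values in the usage example are illustrative assumptions.

```python
GAMMA = 0.95             # discount factor (from the text)
STEPS_PER_EPISODE = 50   # iteration steps per training episode (from the text)

def drift_reward(drifts):
    """Equation (6): negative sum of the squared drifts of the three floors."""
    return -sum(d * d for d in drifts)

def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma^(t-1) * r_t for t = 1..N, as inside equation (5)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Usage with hypothetical drift values (in metres) held constant over an episode.
step_reward = drift_reward([0.001, 0.002, 0.002])
episode_return = discounted_return([step_reward] * STEPS_PER_EPISODE)
```

With $\gamma$ = 0.95 and 50 steps, the last reward of an episode is weighted by $0.95^{49} \approx 0.08$, so the return is dominated by the early part of each episode.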

3 Results and discussion

Primarily, the learning process of the proposed model is investigated. As shown in Figure 3, the reward values increase gradually and reach a constant level after only 600 episodes, indicating a commendable learning performance of the model.

Figure 3

Reward function progress during the training process.

The instantaneous effect of the control signal is examined using the drifts and accelerations of the three floors. As shown in Figure 4, the drift values tend to decrease with increasing floor level. Here, the results from the DRL model show lower values than those of the classical LQR control method. This indicates that the DRL model could effectively reduce the displacement of each floor. Furthermore, the model shows a remarkable capability in reducing each floor's acceleration (the plotted acceleration is relative to the ground acceleration caused by the seismic excitation and is normalized by the gravitational acceleration g), as shown in Figure 5. Here, the model maintains its performance throughout the period of the artificial ground excitation and for all the floors. The model performance is measured statistically by means of the root mean square (RMS) of the accelerations and drifts of the three floors. As shown in Figure 6(a), the RMS of the acceleration of each floor is reduced by about 20% compared with the case when a classical LQR model is used alone. Similarly, the model performance in terms of the RMS of the drifts of the floors is presented in Figure 6(b). Here, it can be observed that the model has learned to correct the signal from the classical LQR model to obtain the optimum reduction in the displacement and acceleration of each floor.
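The RMS-based comparison can be sketched as follows. The two short response histories are made-up numbers chosen only to illustrate the calculation; they are not data from this study, and the function names are hypothetical.

```python
import math

def rms(signal):
    """Root mean square of a sampled response history."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def percent_reduction(baseline, controlled):
    """Percentage reduction of the RMS response relative to the baseline."""
    base = rms(baseline)
    return 100.0 * (base - rms(controlled)) / base

# Made-up acceleration histories (in g) for one floor, for illustration only.
lqr_only = [0.10, -0.10, 0.10, -0.10]
lqr_with_drl = [0.08, -0.08, 0.08, -0.08]
reduction = percent_reduction(lqr_only, lqr_with_drl)
```

In this made-up example, the corrected signal lowers the RMS acceleration by 20% relative to the LQR-only case.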

Figure 4

Instantaneous drift plots of the first floor (top), second floor (middle), and third floor (bottom).

Figure 5

Instantaneous acceleration plots of the first floor (top), second floor (middle), and third floor (bottom).

Figure 6

(a) RMS of the acceleration of each floor; (b) RMS of the drift of each floor.

The scatterplot of the control signal and the drift of the third floor is presented in Figure 7. Here, it can be observed that the trained DRL model preserves the typical correlation between the control signal and the drift, indicating that the model could learn the physical behavior of the model structure without prior knowledge. This behavior was also recently observed by Zhang and Zhu [21].

Figure 7

Scatterplot of the control signal and the third-floor instantaneous drift.

Figure 8 presents an overview of AMD behavior under the DRL model control. It can be observed from the figure that the behavior of the AMD in terms of acceleration is relatively different from the change of position during the period of excitation, as the values of the acceleration show a more random nature. This can be connected to the results from Figure 7 as the DRL model tries to obtain the optimum performance that can tackle the dynamics of the model structure with quasi-random changing of the AMD position.

Figure 8

Instantaneous AMD position (top) and instantaneous AMD acceleration (bottom).

Finally, we identify the main assumptions of the current study and discuss their potential impact on the model performance. First, a classical LQR-based model is used as the baseline for comparison with the proposed DRL-based control framework. The assumption here is that the LQR model represents the dynamics of the building structure with a linear approximation. However, in reality, building structures can exhibit complex and nonlinear behavior that may not be fully captured by the LQR model. As the results suggest, deviations between the assumed model and the actual behavior of the building could be compensated for by the DRL-based approach. Second, the deep NN used in the control framework is trained using a simulation environment. The assumption here is that the simulation environment approximates the real-world behavior of the building and external disturbances. However, there can be discrepancies between the simulation and real-world dynamics. Assumptions regarding the accuracy of the simulation, modeling techniques, and parameter values can introduce uncertainties that might affect the performance of the DRL-based control approach when applied to real-world scenarios. Third, the DRL-based control approach is demonstrated using a simulation of a three-story building structure. The generalization ability of the NN to handle diverse building structures is an important factor that can influence the final results.

These presumptions suggest that the accuracy of the assumed models, simulation environments, external disturbances, and generalization capabilities of the NN can have an impact on the final results of the proposed DRL-based control framework. Deviations from these assumptions could affect the effectiveness, robustness, and practical applicability of the approach in real-world scenarios.

4 Conclusions

This study proposes a newly developed DRL framework for actively controlling building structure vibration under seismic excitation. The PPO method was applied along with an artificial neural network (ANN) to obtain optimal actions that can enhance the signal of a classical LQR model. The training results showed that the model could learn within relatively few episodes. The instantaneous drifts and accelerations of the structure model floors revealed a remarkable reduction that outperformed the LQR model. Furthermore, the statistical results showed a 20% improvement in the reduction of the drifts compared with the LQR model. The correlation between the control signal and the drift of the third floor, as well as the changes in AMD position and acceleration, revealed that the DRL model learned to provide an optimum control signal without any prior knowledge of the dynamics of the model structure. The results from this study suggest that DRL-based models have a remarkable potential to be practically applied to active vibration control problems, including multi-story building structures. It is necessary to mention that an essential part of the proposed DRL-based control framework for reducing vibrations in building structures is the requirement for a simulation environment for training the deep NN. While simulations provide a controlled and scalable environment for learning, there might be discrepancies between the simulation and the real-world dynamics of the building. These discrepancies can arise from uncertainties in the model parameters, unmodeled dynamics, or inaccuracies in the simulation itself. Therefore, the effectiveness of the DRL-based approach in real-world scenarios needs to be carefully evaluated and validated via experiments before it is utilized in real-life applications. In this study, we provided a demonstration of the effectiveness of coupling DRL and classical models to control building structure vibrations. 
This paves the way to develop more sophisticated models that can be applied to real-life problems of controlling building structure vibrations. In future studies, it is recommended to enhance the structural model by incorporating accurate hysteresis models [33]. These models are essential for capturing and simulating the nonlinear behaviors that are commonly observed in real building structures. By incorporating accurate hysteresis models, the structural model will be able to better represent the complex response of real structures under varying loading conditions, including phenomena such as material nonlinearity, energy dissipation, and hysteresis effects. Another option is the more versatile, accurate, and computationally efficient hysteresis model of [34], which provides a unified approach for modeling rate-independent hysteretic behavior in mechanical systems and materials. That work classified various types of complex hysteresis loops observed in experiments and introduced a novel exponential model that offers computational efficiency and ease of implementation. Finally, future work can build on a reformulation of the analytical model of hysteresis to simulate a wide range of complex rate-independent mechanical hysteresis phenomena. The reformulation eliminates the need for evaluating internal variables and expresses the closed-form expressions in rate form [35].

Funding information: The authors declare that the manuscript was prepared through their own effort and that no funding was received from any organization.
Conflict of interest: The authors state that there is no conflict of interest.
Data availability statement: Most datasets generated and analyzed in this study are in this submitted manuscript. The other datasets are available on reasonable request from the corresponding author with the attached information.

Appendix A. PPO algorithm

PPO is known for its stability and ability to handle continuous action spaces, which makes it suitable for tasks that involve ongoing control actions. PPO belongs to the policy gradient algorithms, where the model aims to directly obtain the optimal policy $\pi^*(a_t|s_t)$, which maximizes the long-term reward function, $R(t) = \sum_{t=1} \gamma^{t-1} r_t$, where $\gamma$ is the discount factor and ranges between 0 and 1. Unlike in other methods, such as Q-learning, where the ANN represents the policy only indirectly, the policy is obtained directly from the ANN in the policy gradient methods. The goal of training in the policy gradient methods is to obtain the maximum reward such that

(A.1) $R_{\max} = \max_{\Theta} \, \mathbb{E}\left[ \sum_{t=0}^{H} R(s_t) \,\middle|\, \pi_\Theta \right]$,

where $\pi_\Theta$ is the policy function, $\Theta$ represents the weights of the ANN, and $s_t$ represents the state of the system.

If we denote τ as an (s, a, r)-based sequence,

(A.2) $\tau = (s_0, a_0, r_0), (s_1, a_1, r_1), \ldots, (s_H, a_H, r_H)$,

then we can define a value function $V(\Theta)$ (which is the quantity that should be maximized) as follows:

(A.3) $V(\Theta) = \mathbb{E}\left[ \sum_{t=0}^{H} R(s_t, u_t) \,\middle|\, \pi_\Theta \right] = \sum_{\tau} P(\tau, \Theta) R(\tau)$.

With mathematical manipulations, one can obtain

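(A.4) $\begin{aligned} \nabla_\Theta V(\Theta) &= \sum_{\tau} \nabla_\Theta P(\tau, \Theta) R(\tau) \\ &= \sum_{\tau} \frac{P(\tau, \Theta)}{P(\tau, \Theta)} \nabla_\Theta P(\tau, \Theta) R(\tau) \\ &= \sum_{\tau} P(\tau, \Theta) \frac{\nabla_\Theta P(\tau, \Theta)}{P(\tau, \Theta)} R(\tau) \\ &= \sum_{\tau} P(\tau, \Theta) \nabla_\Theta \log\left(P(\tau, \Theta)\right) R(\tau). \end{aligned}$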
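The identity in equation (A.4) can be checked numerically on a toy problem. In the sketch below, three hypothetical "trajectories" have fixed returns, and their probabilities are given by a softmax over three parameters. Both the softmax parameterization and the numerical values are assumptions made purely for illustration. The last line of (A.4) is evaluated exactly and by sampling, and is compared with a finite-difference gradient of the value function in (A.3).

```python
import math
import random

def softmax(theta):
    """Trajectory probabilities P(tau, theta) for the toy problem."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def value(theta, returns):
    """V(theta) = sum over tau of P(tau, theta) * R(tau), as in (A.3)."""
    return sum(p * r for p, r in zip(softmax(theta), returns))

def dlog_prob(probs, tau, k):
    """d log P(tau) / d theta_k for the softmax parameterization."""
    return (1.0 if k == tau else 0.0) - probs[k]

def score_function_gradient(theta, returns):
    """Last line of (A.4), summed exactly over all trajectories."""
    probs = softmax(theta)
    return [sum(probs[tau] * dlog_prob(probs, tau, k) * returns[tau]
                for tau in range(len(theta)))
            for k in range(len(theta))]

def finite_difference_gradient(theta, returns, h=1e-6):
    """Central-difference gradient of V(theta), used as the reference."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += h
        down[k] -= h
        grad.append((value(up, returns) - value(down, returns)) / (2 * h))
    return grad

def sampled_gradient(theta, returns, n_samples=20000, seed=0):
    """Monte Carlo estimate of (A.4) from trajectories sampled under the policy."""
    rng = random.Random(seed)
    probs = softmax(theta)
    samples = rng.choices(range(len(theta)), weights=probs, k=n_samples)
    grad = [0.0] * len(theta)
    for tau in samples:
        for k in range(len(theta)):
            grad[k] += dlog_prob(probs, tau, k) * returns[tau] / n_samples
    return grad

THETA = [0.0, 0.5, 1.0]     # illustrative parameters
RETURNS = [1.0, 0.0, 3.0]   # illustrative trajectory returns
```

The sampled estimate only requires trajectories drawn under the current policy and the gradient of their log-probability, which is what makes the last line of (A.4) usable in training.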

Equation (A.4) represents a new expected value, which can be sampled under π Θ and used as the input to the gradient descent. Here, one can estimate the policy-dependent log-prob gradient as follows:

References

[1] Spencer Jr BF, Nagarajaiah S. State of the art of structural control. J Struct Eng. 2003;129(7):845–56.10.1061/(ASCE)0733-9445(2003)129:7(845)Search in Google Scholar

[2] Cheng FY, Jiang H, Lou K. Smart structures: innovative systems for seismic response control. CRC Press; 2008. p. 1–652.10.1201/9781420008173Search in Google Scholar

[3] Ghorbi E, Toopchi-Nezhad H. Annular fiber-reinforced elastomeric bearings for seismic isolation of lightweight structures. Soil Dyn Earthq Eng. 2023;166:107764.10.1016/j.soildyn.2023.107764Search in Google Scholar

[4] Wang H, Mu H, Guo X, Zhang Y, Ji H, Luo C, et al. Experimental and numerical simulation study on mechanical properties of fiber-reinforced plastic seismic isolator. Eng Struct. 2023;275:115108.10.1016/j.engstruct.2022.115108Search in Google Scholar

[5] De Domenico D, Losanno D, Vaiana N. Experimental tests and numerical modeling of full-scale unbonded fiber reinforced elastomeric isolators (UFREIs) under bidirectional excitation. Eng Struct. 2023;274:115118.10.1016/j.engstruct.2022.115118Search in Google Scholar

[6] Dyke SJ, Spencer Jr BF, Quast P, Kaspari Jr DC, Sain MK. Implementation of an active mass driver using acceleration feedback control. Comput Civ Infrastruct Eng. 1996;11(5):305–23.10.1111/j.1467-8667.1996.tb00445.xSearch in Google Scholar

[7] Wang L, Nagarajaiah S, Shi W, Zhou Y. Seismic performance improvement of base-isolated structures using a semi-active tuned mass damper. Eng Struct. 2022;271:114963.10.1016/j.engstruct.2022.114963Search in Google Scholar

[8] Bossens F, Preumont A. Active tendon control of cable‐stayed bridges: a large‐scale demonstration. Earthq Eng Struct Dyn. 2001;30(7):961–79.10.1002/eqe.40Search in Google Scholar

[9] Bagherkhani A, Baghlani A. Reliability assessment of MR fluid dampers in passive and semi-active seismic control of structures. Probab Eng Mech. 2021;63:103114.10.1016/j.probengmech.2020.103114Search in Google Scholar

[10] Karami K, Manie S, Ghafouri K, Nagarajaiah S. Nonlinear structural control using integrated DDA/ISMP and semi-active tuned mass damper. Eng Struct. 2019;181:589–604.10.1016/j.engstruct.2018.12.059Search in Google Scholar

[11] Soto MG, Adeli H. Semi-active vibration control of smart isolated highway bridge structures using replicator dynamics. Eng Struct. 2019;186:536–52.10.1016/j.engstruct.2019.02.031Search in Google Scholar

[12] Reinhorn AM, Soong TT, Lin RC, Riley MA, Wang YP, Aizawa S, et al. Active bracing system: a full scale implementation of active control. Natl Cent Earthq Eng Res. 1992;14:1–122.Search in Google Scholar

[13] Yang J-N. Application of optimal control theory to civil engineering structures. J Eng Mech Div. 1975;101(6):819–38.10.1061/JMCEA3.0002075Search in Google Scholar

[14] Song G, Gu H. Active vibration suppression of a smart flexible beam using a sliding mode based controller. J Vib Control. 2007;13(8):1095–107.10.1177/1077546307078752Search in Google Scholar

[15] Casciati F, Rodellar J, Yildirim U. Active and semi-active control of structures–theory and applications: a review of recent advances. J Intell Mater Syst Struct. 2012;23(11):1181–95. doi:10.1177/1045389X12445029

[16] Ying Z, Ni Y. Optimal control for vibration peak reduction via minimizing large responses. Struct Control Heal Monit. 2015;22(5):826–46. doi:10.1002/stc.1722

[17] Liu Q, Zhang W, Bhatt MW, Kumar A. Seismic nonlinear vibration control algorithm for high-rise buildings. Nonlinear Eng. 2022;10(1):574–82. doi:10.1515/nleng-2021-0048

[18] Goodfellow I, Bengio Y, Courville A. Regularization for deep learning. Deep learning. USA: MIT Press; 2016. p. 216–261.

[19] Chase G, Rahmani HR, Wiering M, Könke C. A framework for brain learning-based control of smart structures. Adv Eng Inform. 2019;42(2):100986. doi:10.1016/j.aei.2019.100986

[20] Kim H-S, Kim U. Development of a control algorithm for a semi-active mid-story isolation system using reinforcement learning. Appl Sci. 2023;13(4):2053. doi:10.3390/app13042053

[21] Zhang Y-A, Zhu S. Novel model-free optimal active vibration control strategy based on deep reinforcement learning. Struct Control Heal Monit. 2023;2023:14. doi:10.1155/2023/6770137

[22] Chen Y, Zhu J, Liu Y, Zhang L, Zhou J. Distributed hierarchical deep reinforcement learning for large-scale grid emergency control. IEEE Trans Power Syst. 2023:1–13. doi:10.1109/TPWRS.2023.3298486

[23] Zhang Y, Shi X, Zhang H, Cao Y, Terzija V. Review on deep learning applications in frequency analysis and control of modern power system. Int J Electr Power Energy Syst. 2022;136:107744. doi:10.1016/j.ijepes.2021.107744

[24] Lu J, Skelton RR. Covariance control using closed‐loop modelling for structures. Earthq Eng Struct Dyn. 1998;27(11):1367–83. doi:10.1002/(SICI)1096-9845(1998110)27:11<1367::AID-EQE789>3.0.CO;2-Q

[25] Chung LL, Lin RC, Soong TT, Reinhorn AM. Experimental study of active control for MDOF seismic structures. J Eng Mech. 1989;115(8):1609–27. doi:10.1061/(ASCE)0733-9399(1989)115:8(1609)

[26] Dyke SJ, Spencer Jr BF, Quast P, Sain MK, Kaspari Jr DC, Soong TT. Experimental verification of acceleration feedback control strategies for an active tendon system. Nat Cent Earthq Engrg Res, Tech Rep NCEER-94. 1994;24:1–106.

[27] Alotta G, Di Paola M, Pirrotta A. Fractional Tajimi–Kanai model for simulating earthquake ground motion. Bull Earthq Eng. 2014;12(6):2495–506. doi:10.1007/s10518-014-9615-z

[28] Ramallo JC, Johnson EA, Spencer Jr BF. “Smart” base isolation systems. J Eng Mech. 2002;128(10):1088–99. doi:10.1061/(ASCE)0733-9399(2002)128:10(1088)

[29] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347; 2017.

[30] Li R, Zhang C, Xie W, Gong Y, Ding F, Dai H, et al. Deep reinforcement learning empowers automated inverse design and optimization of photonic crystals for nanoscale laser cavities. Nanophotonics. 2023;12(2):319–34. doi:10.1515/nanoph-2022-0692

[31] Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: International Conference on Machine Learning. PMLR; 2015. p. 1889–97.

[32] Rahimi F, Aghayari R, Samali B. Application of tuned mass dampers for structural vibration control: a state-of-the-art review. Civ Eng J. 2020;6(8):1622–51. doi:10.28991/cej-2020-03091571

[33] Du X, Zhang Y, Li J, Liao C, Zhang H, Xie L, et al. Unsteady and hysteretic behavior of a magnetorheological fluid damper: modeling, modification, and experimental verification. J Intell Mater Syst Struct. 2023;34(5):551–68. doi:10.1177/1045389X221111555

[34] Vaiana N, Rosati L. Classification and unified phenomenological modeling of complex uniaxial rate-independent hysteretic responses. Mech Syst Signal Process. 2023;182:109539. doi:10.1016/j.ymssp.2022.109539

[35] Vaiana N, Rosati L. Analytical and differential reformulations of the Vaiana–Rosati model for complex rate-independent mechanical hysteresis phenomena. Mech Syst Signal Process. 2023;199:110448. doi:10.1016/j.ymssp.2023.110448

Received: 2023-05-30

Revised: 2023-07-16

Accepted: 2023-07-18

Published Online: 2024-01-30

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/eng-2022-0496

Keywords for this article

vibration control; deep reinforcement learning; proximal policy optimization; LQR model; deep neural network
