Abstract
With rapid advances in deep learning, distributed systems, and high-concurrency data processing, modern data centers demand more scalable and efficient solutions. Despite significant improvements in these areas, existing systems continue to suffer from problems with latency, congestion control, and resource allocation, especially under heavy workloads and network failures. This research introduces two new frameworks for optimizing RDMA-based data flow in remote settings: ConVer and DynoFlow. ConVer employs user-space path management and multipath transfer to achieve low latency and high throughput, whereas DynoFlow provides a software-defined, modular approach that efficiently handles network failures and varying traffic loads. According to the experimental results, ConVer significantly improves throughput and achieves a 99th-percentile latency of 1.3 ms, while DynoFlow performs better at coping with network failures and optimizing route utilization. Extensive experimental evaluation shows that both frameworks significantly enhance throughput and resilience in distributed systems, addressing significant problems in modern data centers.
Funding source: China Southern Power Grid Corporation 2024 Science and Technology Project Application Guide (First Batch), Research and design of network protocol processing engine based on localized remote memory direct access
Award Identifier / Grant number: GXKJXM20240047
Research ethics: This article does not contain any studies with human participants performed by any of the authors.
Informed consent: Not applicable.
Author contributions: Boling Chen, Xiaoying Mo, and Dengbin Liao were responsible for designing the framework, analyzing the performance, validating the results, and writing the article. Junbing Pan and Anni Huang were responsible for collecting the information required for the framework, providing software, critically reviewing the work, and administering the process.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors declare no conflicts of interest.
Research funding: This work was supported by the China Southern Power Grid Corporation 2024 Science and Technology Project Application Guide (First Batch), Research and design of network protocol processing engine based on localized remote memory direct access (GXKJXM20240047).
Data availability: No datasets were generated or analyzed during the current study.
© 2025 Walter de Gruyter GmbH, Berlin/Boston