Abstract
Reliable nighttime surveillance has become increasingly important in the push toward smarter, safer cities. This study presents a monitoring system that uses multiple cameras, positioned in different directions, as perceptual sensors to improve situational awareness and public safety under low-light conditions. At the core of the system is a YOLOv5 model fine-tuned for real-time object detection in challenging nighttime urban environments. Because the network comprises more than 50 distributed security cameras, fast and efficient data transmission is essential; a high-speed optical communication backbone therefore streams video with low latency to a centralized processing unit, allowing the system to respond quickly and process visual data effectively even under heavy load. In our experiments, a multi-scale variant of YOLOv5 (YOLOv5MS) achieved a mean average precision (mAP) of 88.7%, demonstrating its capability under real-world conditions. By combining modern computer vision with high-speed optical transmission, the system represents a substantial step toward intelligent, responsive surveillance for modern cities.
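To make the processing pipeline concrete, the sketch below shows the kind of single-stream YOLOv5 inference the centralized unit would run for each incoming camera feed. It is illustrative only: the public yolov5s checkpoint stands in for the paper's fine-tuned YOLOv5MS weights, and the RTSP URL and confidence threshold are placeholder assumptions, not details from the study.

    # Minimal sketch: per-frame object detection on one camera stream.
    # 'yolov5s' is a stand-in for the fine-tuned YOLOv5MS weights used
    # in the paper; the stream URL below is a placeholder.
    import cv2
    import torch

    # Load a pretrained YOLOv5 model from the Ultralytics hub.
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
    model.conf = 0.4  # confidence threshold (illustrative value)

    cap = cv2.VideoCapture('rtsp://camera-01.example/stream')  # placeholder
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # YOLOv5 expects RGB input; OpenCV delivers BGR.
        results = model(frame[:, :, ::-1])
        # results.xyxy[0] is an (N, 6) tensor: x1, y1, x2, y2, conf, class.
        for *box, conf, cls in results.xyxy[0].tolist():
            print(f'{model.names[int(cls)]}: {conf:.2f} at {box}')
    cap.release()

In the deployed system, one such loop would run per camera, with frames arriving at the centralized unit over the optical backbone rather than directly from an RTSP source.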
Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: Not applicable.
Conflict of interest: The authors state no conflict of interest.
Research funding: Not applicable.
Data availability: Not applicable.