Abstract
Line segment detection provides essential technical support for a variety of visual tasks, helping computer systems comprehend image content and carry out higher-level analysis and applications. Most current line segment detection methods predict the center and the displacement maps of the two endpoints of a line segment separately, without exploiting the coupling between them. Addressing this, this article proposes a dynamic label assignment strategy and an improved deformable convolution for center prediction that use displacement priors, which enhances the model's ability to perceive line segments and effectively improves detection performance. The predicted displacements serve as prior information to guide both the center label assignment and the sampling of the deformable convolution in the center branch, which significantly improves center prediction. In addition, HRNet and UNet3+ are introduced to enhance the feature expression capability of the backbone network. Finally, experiments on the Wireframe and YorkUrban datasets show that the proposed method outperforms the baseline and achieves competitive detection performance among existing line segment detectors.
1 Introduction
Line segments are important visual features in images that can provide basic information for more advanced visual tasks, such as object detection, image segmentation, pose estimation, and 3D reconstruction. In object detection [1–3], line segment detection can be used as a preprocessing step to extract the structural information of the image, enhance features such as edges and contours of the object, and reduce noise interference. It can also serve as a post-processing step to optimize the position and shape of the detection results, improve detection accuracy, and assist in identification and object localization. In image segmentation [4,5], line segment detection can refine rough rectangular boxes into line segments with semantic and perceptual meanings. This process further helps determine which object the pixels belong to, enhancing accuracy and more precisely delineating different regions. In 3D reconstruction [6–8], line segment detection can be used to recognize structural information, such as edges of buildings and road contours in an image, and thus estimate the shape, size, and relative position of an object, aiding in the reconstruction of a 3D scene. In addition, as one of the most basic geometric features in image structure, line segments have important applications in fields such as pose estimation [9–11], template matching [12], and vanishing point estimation [13].
According to the way line segments are generated, current deep line segment detection methods can be categorized into several groups: endpoint matching, attraction field, center point, and transformer-based [14]. Deep wireframe parser (DWP) [15] and end-to-end wireframe parsing (L-CNN) [16] are two highly representative methods that utilize endpoint matching. The fundamental idea is to detect the endpoints in the image and subsequently match these endpoints to form line segments. Holistically-attracted wireframe parser (HAWP) [17] represents each pixel in the image as an attraction-field vector of its nearest line segment and predicts line segments from this field. A predicted line segment is retained only if its two endpoints satisfy a Euclidean distance threshold with the corresponding endpoints in the endpoint heatmap. The transformer-based method LinE segment TRansformers (LETR) [18] uses a multi-scale encoder–decoder structure to iteratively refine a fixed number of pre-defined line entities and produces an equal number of final line entities; a feedforward network then directly predicts the endpoint coordinates and confidence of the line segment for each entity.
Line segment detection methods using center points are similar to CenterNet [19]. Tri-points based line segment detector (TP-LSD) [20] is the first to use tri-points to predict line segments, representing each segment by a center and two displacements. This avoids the extensive computation caused by endpoint matching and the intricate attraction-field representation, significantly improving the speed and efficiency of line segment detection. F-Clip [21] and efficient line segment detector and descriptor (ELSD) [22] generate line segments by predicting their angles and lengths. They outperform other line segment detection methods, including LETR, and achieve better efficiency. Existing deep learning-based methods tend to use large network models to improve detection performance, which restricts their applicability in real-time environments. In pursuit of lightweight, high-efficiency detection, mobile line segment detection (M-LSD) [23] eliminates the multi-module prediction process of traditional methods by minimizing the backbone network. This reduction significantly decreases computational cost, speeds up network inference, and enables real-time detection on mobile devices with strong results.
The key to improving a model's performance in line segment detection is to enhance its ability to perceive line segments. To better extract line segment features, TP-LSD uses a pixel-wise line map to provide auxiliary information for the center branch, M-LSD uses dilated convolution [24] to increase the receptive field of the network, and ELSD uses deformable convolution [25] to adapt to the shape of line segments. Rehman et al. [26] implement a modified resilient backpropagation algorithm to improve the convergence and efficiency of convolutional neural network training, and Rehman et al. [27] provide a comprehensive survey of the relationship between ConvNets with different pre-training methodologies and their optimization effects. These works suggest that, beyond the network structure and loss function, often-overlooked factors such as preprocessing and backpropagation can also have a large impact on model performance. In this article, we exploit the coupling between the center and the displacements of a line segment by using displacements as prior information: the predicted displacements are used to dynamically assign labels to centers and to guide the sampling of the deformable convolution in the center branch.
The main contributions of this article are as follows:
The sampling of deformable convolution for the center branch is guided by the predicted displacements, which enhances the model’s perception of line segment features.
The labels of the centers during training are dynamically adjusted based on the prediction results of the displacements. This adjustment allows the model to concentrate more on learning the challenging-to-score center samples.
2 Related works
2.1 Deep line segment detection
M-LSD [23] uses tri-points to predict line segments, producing superior results in model parameters, network complexity, efficiency, and precision. Therefore, this article builds on M-LSD to dive deeper into line segment detection. The backbone of M-LSD is an encoder–decoder structure: the MobileNetV2 [30] network serves as the encoder, and the decoder is designed as a top-down architecture.
As shown in Figure 1, M-LSD outputs 16 feature maps, including center maps, displacement maps, angle maps, length maps, a line map, and an endpoint map. Each feature map corresponds to a label map of the same resolution. To address the model's limited performance on long line segments, caused by an insufficient network receptive field, M-LSD uses segments-of-line (SoL) augmentation: a long line segment is split into several shorter segments, each of which is treated as a separate entity for prediction. The SoL maps play no direct role in the final line segment prediction; instead, they provide auxiliary information that aids the prediction of line segments.

Overall architecture of M-LSD-tiny.
As shown in Figure 2, the center map predicts the centers of line segments, while the displacement maps predict the displacements from each center to the two endpoints. Given a center $c = (x_c, y_c)$ and the predicted displacement vectors $d_1$ and $d_2$, the two endpoints are recovered as

$$e_1 = c + d_1, \qquad e_2 = c + d_2,$$

where $e_1$ and $e_2$ denote the two endpoints of the line segment.

Tri-points representation of line segments.
The center-based representation avoids the significant computation required by direct endpoint matching and greatly enhances the detection efficiency of line segments. Compared with representing line segments by length and angle, the tri-points representation also makes the model less sensitive to angular error, particularly for long line segments. In this article, the tri-points representation is utilized to predict line segments.
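The tri-points decoding described above can be sketched in a few lines. The map names, the (dx1, dy1, dx2, dy2) channel layout, and the score threshold below are illustrative assumptions rather than the exact M-LSD interface:

```python
# Hedged sketch: decoding line segments from a center heatmap and two
# displacement maps, following the tri-points idea (center + two
# endpoint displacements). Thresholds and tensor names are illustrative.
import torch

def decode_segments(center_map, disp_maps, score_thresh=0.5):
    """center_map: (H, W) scores; disp_maps: (4, H, W) holding
    (dx1, dy1, dx2, dy2) from each center to the two endpoints."""
    ys, xs = torch.nonzero(center_map > score_thresh, as_tuple=True)
    d = disp_maps[:, ys, xs]                     # (4, K)
    x1, y1 = xs + d[0], ys + d[1]                # first endpoints
    x2, y2 = xs + d[2], ys + d[3]                # second endpoints
    scores = center_map[ys, xs]
    return torch.stack([x1, y1, x2, y2], dim=1), scores

# Toy example: one confident center at (y=2, x=3) with displacements
# (+2, +1) and (-2, -1) gives a segment from (5, 3) to (1, 1).
cmap = torch.zeros(8, 8)
cmap[2, 3] = 0.9
dmap = torch.zeros(4, 8, 8)
dmap[:, 2, 3] = torch.tensor([2.0, 1.0, -2.0, -1.0])
segs, scores = decode_segments(cmap, dmap)
print(segs)  # tensor([[5., 3., 1., 1.]])
```

In a full detector, this decoding is typically preceded by a local-maximum filter on the center map and followed by non-maximum suppression over the decoded segments.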
2.2 Deformable convolution
In visual tasks, learning essentially involves adjusting model parameters to adapt to the size, shape, and pose of objects. Vanilla convolutional kernels sample the feature map at fixed positions and apply the same weights everywhere, so they handle irregularly shaped objects poorly and struggle to capture an object's shape and position accurately. Since different positions in the feature map may correspond to objects of various sizes and shapes, precision-demanding visual tasks require convolutions that adaptively adjust their sampling positions and receptive field. In 2017, Dai et al. [25] proposed deformable convolution to enhance the ability to express deformation when processing objects: spatial offsets are introduced so that the convolution kernel can shift its sampling positions on the input feature maps.
In 2D convolution, sampling occurs at fixed positions on a regular grid $\mathcal{R}$ around each output position $p_0$ (for a $3 \times 3$ kernel, $\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (1,1)\}$):

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n),$$

where $x$ is the input feature map, $w$ denotes the kernel weights, and $p_n$ enumerates the grid positions. As shown in Figure 3, deformable convolution introduces a set of offsets $\{\Delta p_n\}$ so that the sampling positions become adaptive:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n).$$
Deformable convolution adjusts the sampling position of the convolution based on the object’s shape. This adjustment ensures the validity and accuracy of the features extracted by the convolution kernel, allowing the network to model the object more precisely and enhancing its robustness to shape variations.

Illustration of deformable convolution sampling.
2.3 Datasets
In this article, the model is trained and tested using the Wireframe dataset [15] and the YorkUrban dataset [6]. The training set consists of 5,000 images from the Wireframe dataset, while the test set includes 462 images from the Wireframe dataset and 102 images from YorkUrban. During training, we perform the following dataset augmentations: (1) keep the image unchanged; (2) flip horizontally, vertically, or in both directions simultaneously; (3) rotate 90 degrees clockwise or counterclockwise; (4) randomly crop the image and then resize it to the network input resolution.
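The augmentation pipeline above might be sketched as follows; the crop ratio and the use of PyTorch tensors are assumptions for illustration, and in practice the same geometric transforms must also be applied to the line segment annotations:

```python
# Hedged sketch of the augmentations described above: flips, 90-degree
# rotations, and random crop followed by resize. The crop ratio and
# target resolution are illustrative assumptions.
import random
import torch
import torch.nn.functional as F

def augment(img):  # img: (C, H, W) float tensor
    if random.random() < 0.5:
        img = torch.flip(img, dims=[2])          # horizontal flip
    if random.random() < 0.5:
        img = torch.flip(img, dims=[1])          # vertical flip
    if random.random() < 0.5:
        k = random.choice([1, 3])                # 90 deg CW or CCW
        img = torch.rot90(img, k, dims=[1, 2])
    # random crop, then resize back to the network input resolution
    c, h, w = img.shape
    ch, cw = int(h * 0.8), int(w * 0.8)
    top, left = random.randint(0, h - ch), random.randint(0, w - cw)
    crop = img[:, top:top + ch, left:left + cw]
    img = F.interpolate(crop.unsqueeze(0), size=(h, w), mode="bilinear",
                        align_corners=False).squeeze(0)
    return img

out = augment(torch.rand(3, 512, 512))
print(out.shape)  # torch.Size([3, 512, 512])
```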
3 Line segment detection using displacement prior (D-LSD)
In this section, we present the details of line segment detection using displacement prior (D-LSD). First, HRNet [28] and UNet3+ [29] are combined to form the backbone network. In the prediction head, the predicted displacements serve as prior information to guide the sampling of the deformable convolution in the center branch. Additionally, the predicted displacements are used to dynamically adjust the label assignment on the center map.
3.1 Overall network architecture
As shown in Figure 4, our proposed network is one-stage and consists of a backbone and a prediction head. The backbone takes a 512 × 512 input image and produces a shared feature map, from which the prediction head generates the final feature maps.

Overall architecture of D-LSD.
In the final feature maps, displacement maps predict the displacements of line segments, the center map predicts the centers of line segments, the junction map and line map predict junctions, and the pixel-wise map predicts the line segments’ details. During training, displacements are regressed using smooth L1 loss, while centers, junctions, and pixels are trained using focal loss.
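A minimal sketch of these losses follows, assuming a CornerNet/CenterNet-style penalty-reduced focal loss for the heatmap-like maps; the exact hyperparameters and map names are not given in the text and are illustrative here:

```python
# Hedged sketch of the training losses named above: smooth L1 for the
# displacement maps and a heatmap focal loss (CornerNet-style, an
# assumption) for the center/junction/pixel maps.
import torch
import torch.nn.functional as F

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """pred, gt: (N, H, W); gt uses 1 for positives, <1 elsewhere."""
    pred = pred.clamp(eps, 1 - eps)
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = -((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = -((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * neg
    num_pos = pos.sum().clamp(min=1.0)
    return (pos_loss.sum() + neg_loss.sum()) / num_pos

pred_center = torch.rand(1, 32, 32)
gt_center = torch.zeros(1, 32, 32)
gt_center[0, 10, 10] = 1.0

center_loss = heatmap_focal_loss(pred_center, gt_center)
disp_loss = F.smooth_l1_loss(torch.randn(4, 32, 32), torch.randn(4, 32, 32))
print(center_loss.item() >= 0, disp_loss.item() >= 0)
```

With the dynamic label assignment of Section 3.4, positions marked as ignore samples would additionally be masked out of the focal loss.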
3.2 Backbone network
HRNet [28] is a deep learning architecture originally developed for human pose estimation. It has also been employed as a backbone network for line segment detection, with promising results. The decoder of UNet3+ [29] fully integrates high-level and low-level semantic features in full-size feature maps through full-scale skip connections, allowing information to be transmitted more comprehensively within the network and effectively enhancing model performance. As shown in Figure 5, our backbone network is an encoder–decoder structure. To enhance the network's ability to extract line segment features, we utilize HRNet as the encoder of the backbone network. To fully integrate features at different levels, capture detailed information at various scales, and enhance the network's capability to represent fine-grained structures and edges in the image, the decoder of UNet3+ is selected as the decoder of the backbone network.

Overall architecture of the backbone network.
The decoder takes the multi-scale features produced by the HRNet encoder and, through the full-scale skip connections of UNet3+, fuses them into the shared feature map consumed by the prediction head.
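The full-scale skip connection idea can be illustrated with a small module that resizes encoder features from every level to one decoder resolution and fuses them by convolution; channel sizes and the fusion kernel here are illustrative, not the paper's exact configuration:

```python
# Hedged sketch of a UNet3+-style full-scale skip connection: encoder
# features from every level are resized to the decoder level's
# resolution, concatenated, and fused by one convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleFusion(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels), out_channels, 3, padding=1)

    def forward(self, feats, target_hw):
        resized = [F.interpolate(f, size=target_hw, mode="bilinear",
                                 align_corners=False) for f in feats]
        return self.fuse(torch.cat(resized, dim=1))

# Encoder features at 1/4, 1/8, 1/16 scale of a 256x256 input:
feats = [torch.randn(1, 32, 64, 64),
         torch.randn(1, 64, 32, 32),
         torch.randn(1, 128, 16, 16)]
decoder_feat = FullScaleFusion([32, 64, 128], 64)(feats, (64, 64))
print(decoder_feat.shape)  # torch.Size([1, 64, 64, 64])
```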
3.3 Deformable convolution using displacement prior
In this section, the predicted displacements are used as prior information to guide the sampling position of the deformable convolution in the center branch. As shown in Figure 6, the offsets of the deformable convolution for the center branch do not need to be obtained by convolution; instead, they are entirely derived from the prediction results of the displacement maps.

Center branch using displacement prior.
Let the shared feature map be the input to the center branch. At each position, the predicted displacement maps give the vectors from that position to the two endpoints of its line segment. The offsets of the deformable convolution are then derived directly from these displacement vectors rather than learned by an extra convolution: each of the kernel's sampling positions is placed along the predicted center-to-endpoint vectors, so the value at each output position is computed from features sampled along the predicted line segment.
The offsets of the deformable convolution using the displacements are illustrated in Figure 7, where all the sampling points are distributed between the center and the endpoints.

Deformable convolution using prior displacements.
3.4 Dynamic label assignment for the center of line segment
In M-LSD, the centers of line segments are predicted on a feature map at the output resolution. Under the static label assignment, the center pixel of each ground-truth line segment is labeled as a positive sample, and all remaining pixels are labeled as negative samples, even though pixels adjacent to a true center may still yield accurate line segments through their own displacements.

Center label of line segment: (a) the static center label assignment strategy used by M-LSD, where "1" denotes a positive sample and "0" a negative sample; (b) our dynamic center label assignment strategy, where "X" denotes an ignore sample and "?" a candidate negative sample whose final label depends on its displacement prediction. (a) Center label in M-LSD and (b) dynamic center label.
The displacement maps predict the vectors from the centers to the endpoints, and the final line segment is obtained from the center map and the displacement maps: for a predicted center $c$ with displacement vectors $d_1$ and $d_2$, the predicted endpoints are $\hat{e}_1 = c + d_1$ and $\hat{e}_2 = c + d_2$.
The line segment is predicted jointly by the center and the displacements. During training, the proposed center label assignment strategy dynamically decides whether a candidate negative center sample is kept as a negative sample or turned into an ignore sample, according to its displacement prediction: if the line segment implied by the candidate's displacements is accurate, the candidate should be ignored rather than penalized. The accuracy of the implied line segment is measured by the distance between its predicted endpoints and the corresponding ground-truth endpoints. If this distance falls below a threshold, the candidate is marked as an ignore sample; otherwise, it remains a negative sample.
| Algorithm 1. Dynamic label assignment for the centers of line segments |
|---|
| Require: set of ground-truth line segments; predicted displacement maps |
| Ensure: label map for the centers |
| For each candidate negative center, compute the line segment implied by its predicted displacements; if its endpoint distance to the matched ground-truth segment is below the threshold, relabel the center as an ignore sample; otherwise, keep it as a negative sample. |
| In the above process, a label of 1 indicates a positive sample, 0 indicates a negative sample, and a distinct ignore value marks samples excluded from the loss. |
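A hedged Python sketch of this assignment follows; the distance threshold, the ignore value, and the map names are illustrative assumptions:

```python
# Hedged sketch of dynamic center-label assignment: a candidate negative
# center whose predicted displacements already yield an accurate line
# segment is ignored instead of penalized. Threshold is illustrative.
import torch

IGNORE = -1  # excluded from the focal loss

def dynamic_center_labels(static_labels, pred_disp, gt_segments, tau=2.0):
    """static_labels: (H, W) with 1 positives / 0 negatives;
    pred_disp: (4, H, W); gt_segments: (M, 4) as (x1, y1, x2, y2)."""
    labels = static_labels.clone()
    ys, xs = torch.nonzero(static_labels == 0, as_tuple=True)
    for y, x in zip(ys.tolist(), xs.tolist()):
        d = pred_disp[:, y, x]
        e1 = torch.stack([x + d[0], y + d[1]])
        e2 = torch.stack([x + d[2], y + d[3]])
        for gx1, gy1, gx2, gy2 in gt_segments.tolist():
            g1 = torch.tensor([gx1, gy1])
            g2 = torch.tensor([gx2, gy2])
            # endpoint distance of the better of the two orderings
            err = min(max((e1 - g1).norm(), (e2 - g2).norm()),
                      max((e1 - g2).norm(), (e2 - g1).norm()))
            if err < tau:           # implied segment is accurate: ignore
                labels[y, x] = IGNORE
                break
    return labels

static = torch.zeros(8, 8)
static[2, 3] = 1.0                                    # true center
disp = torch.zeros(4, 8, 8)
disp[:, 2, 4] = torch.tensor([1.0, 1.0, -3.0, -1.0])  # accurate neighbor
gt = torch.tensor([[5.0, 3.0, 1.0, 1.0]])
labels = dynamic_center_labels(static, disp, gt)
print(labels[2, 4].item())  # -1.0: accurate neighbor becomes ignore
```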
Furthermore, the pixels surrounding a center no longer share that center's displacements; instead, each pixel's displacements to the endpoints of the line segment are computed separately. This enhancement significantly improves the accuracy of the displacements.
The dynamic label assignment strategy enables the network to concentrate more on learning to classify the center and the background without allocating excessive attention to the regions surrounding the centers, which have less impact on the results. This approach allows the network to demonstrate improved performance.
4 Experiments
In this section, we will validate the effectiveness of the proposed methods through experiments, all of which are conducted on the Wireframe dataset and the YorkUrban dataset.
4.1 Ablation study
In order to verify the effectiveness of the dynamic label assignment strategy and the backbone network proposed in Section 3, the following ablation experiments are conducted using M-LSD-tiny as a benchmark. All experiments are carried out on NVIDIA RTX 3080 Ti GPUs using the PyTorch framework. The model parameter settings and optimization are kept consistent with those of M-LSD-tiny.
As shown in Table 1, we use M-LSD-tiny as the baseline for the ablation experiments. Replacing the baseline's label assignment strategy with our dynamic label assignment strategy improves every metric on both datasets, and replacing the backbone with the HRNet + UNet3+ backbone yields a larger gain; combining both gives the best results in Table 1.
Ablation study of dynamic label and backbone
| Dynamic | Backbone | Wireframe $F^H$ | Wireframe sAP$^5$ | Wireframe sAP$^{10}$ | YorkUrban $F^H$ | YorkUrban sAP$^5$ | YorkUrban sAP$^{10}$ |
|---|---|---|---|---|---|---|---|
|  |  | 77.2 | 52.3 | 58.0 | 62.4 | 22.1 | 25.0 |
| ✓ |  | 77.9 | 52.8 | 58.7 | 64.1 | 22.8 | 25.5 |
|  | ✓ | 79.7 | 61.7 | 66.6 | 65.0 | 25.4 | 27.8 |
| ✓ | ✓ | 80.0 | 61.9 | 66.9 | 65.1 | 25.8 | 28.2 |
We use M-LSD-tiny as the baseline; Dis-Deform denotes replacing Block C with the prediction head described in Section 3.3, and Deform denotes replacing the displacement-prior deformable convolution in that prediction head with a normal deformable convolution. As shown in Table 2, the proposed prediction head improves every metric on both datasets over both the baseline and the plain deformable convolution variant.
Ablation study of deformable convolution using displacement prior
| Model | Wireframe $F^H$ | Wireframe sAP$^5$ | Wireframe sAP$^{10}$ | YorkUrban $F^H$ | YorkUrban sAP$^5$ | YorkUrban sAP$^{10}$ |
|---|---|---|---|---|---|---|
| Baseline | 77.2 | 52.3 | 58.0 | 62.4 | 22.1 | 25.0 |
| Deform | 78.6 | 57.4 | 62.5 | 63.8 | 23.9 | 26.5 |
| Dis-Deform | 78.8 | 59.4 | 64.3 | 64.7 | 25.1 | 27.8 |
Then, simultaneously using the dynamic label assignment strategy, the prediction head, and the backbone network proposed in this article, the model's performance improves further, as reflected in the first row of Table 3.
In addition, two non-maximum suppression methods proposed by F-Clip [21], SoftNMS and StructNMS, are applied in the post-processing of our algorithm; the results are shown in Table 3. With both enabled, the proposed model reaches 81.2, 65.5, and 69.5 for $F^H$, sAP$^5$, and sAP$^{10}$ on the Wireframe dataset, and 65.3, 29.5, and 32.2 on YorkUrban.
Ablation study of SoftNMS and StructNMS
| SoftNMS | StructNMS | Wireframe $F^H$ | Wireframe sAP$^5$ | Wireframe sAP$^{10}$ | YorkUrban $F^H$ | YorkUrban sAP$^5$ | YorkUrban sAP$^{10}$ |
|---|---|---|---|---|---|---|---|
|  |  | 81.0 | 64.4 | 68.7 | 65.2 | 27.8 | 30.4 |
| ✓ |  | 81.1 | 65.1 | 69.1 | 65.3 | 29.2 | 31.9 |
|  | ✓ | 81.2 | 64.7 | 68.9 | 65.3 | 28.6 | 31.4 |
| ✓ | ✓ | 81.2 | 65.5 | 69.5 | 65.3 | 29.5 | 32.2 |
4.2 Comparison with other methods
The results of comparing our proposed line segment detection algorithm with other top-performing methods, such as HAWP, L-CNN, and F-Clip, on the Wireframe and YorkUrban datasets are shown in Table 4. On the Wireframe dataset, our method achieves the best sAP$^5$, sAP$^{10}$, and sAP$^{15}$ among all listed methods, and it likewise attains the best sAP scores on the YorkUrban dataset.
Quantitative comparisons with existing LSD methods
| Category | Method | Input size | Wireframe $F^H$ | sAP$^5$ | sAP$^{10}$ | sAP$^{15}$ | YorkUrban $F^H$ | sAP$^5$ | sAP$^{10}$ | sAP$^{15}$ |
|---|---|---|---|---|---|---|---|---|---|---|
|  | LSD | 320 | 64.1 | 6.7 | 8.8 | — | 60.6 | 7.5 | 9.2 | — |
| Two-stage | HAWP | 512 | 80.3 | 62.5 | 66.5 | 68.2 | 64.8 | 26.1 | 28.5 | 29.7 |
|  | L-CNN | 512 | 76.9 | 58.9 | 62.8 | 64.9 | 63.8 | 24.3 | 26.4 | 27.5 |
|  | ELSD | 512 | 83.1 | 64.3 | 68.9 | 70.9 | 64.8 | 27.6 | 30.2 | 31.8 |
| One-stage | DWP | 512 | 72.2 | 3.7 | 5.1 | 5.9 | 61.6 | 1.5 | 2.1 | 2.6 |
|  | AFM | 320 | 77.2 | 18.5 | 24.4 | 27.5 | 63.3 | 7.3 | 9.4 | 11.1 |
|  | LETR | 512 | **83.3** | — | 65.2 | 67.7 | 66.6 | — | 29.4 | 31.7 |
|  | TP-LSD | 512 | 80.6 | 57.6 | 57.2 | — | **67.2** | 27.6 | 27.7 | — |
|  | M-LSD-tiny | 512 | 77.2 | 52.3 | 58.0 | — | 62.4 | 22.1 | 25.0 | — |
|  | F-Clip | 512 | 80.9 | 64.3 | 68.3 | 70.1 | 64.5 | 28.5 | 30.8 | 31.3 |
|  | Ours | 512 | 81.2 | **65.6** | **69.5** | **71.1** | 65.3 | **29.5** | **32.2** | **33.8** |
Bold values highlight the best-performing method for each evaluation metric.
Among the existing methods for line segment detection, two-stage methods demonstrate better performance. The method described in this article utilizes displacements as prior information to guide the sampling process of deformable convolutions for the center branch and the label assignment of centers. Finally, our method achieves better detection performance with a one-stage structure than the existing two-stage methods.
Comparisons of detection results with L-CNN, HAWP, and F-Clip on the Wireframe dataset and the YorkUrban dataset are shown in Figure 9. Both L-CNN and HAWP rely on endpoint detection and sampling of line segment features, and they face challenges at connectivity points and under texture variations. F-Clip predicts line segments by angle and length, making it more sensitive to angular error, which limits its accuracy on long line segments. In contrast, the method proposed in this article achieves higher detection precision, recall, and overall performance.

Visualization of line segment detection methods on Wireframe dataset and YorkUrban dataset. (a) Label, (b) L-CNN, (c) HAWP, (d) F-Clip, and (e) Ours.
5 Conclusion
This article proposes a one-stage line segment detection method using displacement priors, in which line segments are predicted from centers and displacements. To enhance the accuracy of the centers, we utilize the displacements as prior information both to dynamically adjust the label assignment of center samples and to guide the sampling of the deformable convolution in the center branch. In addition, HRNet and UNet3+ are adopted to strengthen the backbone network, resulting in further gains in detection performance.
In the future, we will further investigate the dynamic label assignment strategy for other feature maps and use line segment angles and lengths as priors to further enhance the line segment perception capability of the model.
- Funding information: Authors state no funding involved.
- Author contributions: Xin Zhu: conceptualization, methodology, software, writing. Hancheng Yu: conceptualization, methodology, supervision, funding acquisition. Yupu Zhang: visualization, validation, writing. Ming Zhou: visualization, validation, writing.
- Conflict of interest: Authors state no conflict of interest.
- Data availability statement: The data that support the findings of this study are openly available in the following repositories: The Wireframe dataset is available at https://github.com/huangkuns/wireframe, under a permissive academic license. The YorkUrban dataset is available at https://github.com/NamgyuCho/Linelet-code-and-YorkUrban-LineSegment-DB, also under a permissive academic license. These datasets were used for training and evaluating the proposed method.
References
[1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 580–587. 10.1109/CVPR.2014.81
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 779–788. 10.1109/CVPR.2016.91
[3] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, et al., "SSD: Single shot multibox detector," In: Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 21–37. 10.1007/978-3-319-46448-0_2
[4] M. Bai and R. Urtasun, "Deep watershed transform for instance segmentation," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 5221–5229. 10.1109/CVPR.2017.305
[5] S. Peng, W. Jiang, H. Pi, H. Bao, and X. Zhou, "Deep snake for real-time instance segmentation," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 8533–8542. 10.1109/CVPR42600.2020.00856
[6] P. Denis, J. H. Elder, and F. J. Estrada, "Efficient edge-based methods for estimating manhattan frames in urban imagery," In: Proc. Eur. Conf. Comput. Vis. (ECCV), 2008, pp. 197–210. 10.1007/978-3-540-88688-4_15
[7] Y. Zhou, J. Huang, X. Dai, L. Luo, Z. Chen, and Y. Ma, HoliCity: A city-scale data platform for learning holistic 3D structures, 2020, arXiv:2008.03286.
[8] Y. Zhou, H. Qi, Y. Zhai, Q. Sun, Z. Chen, and L. Y. Wei, "Learning to reconstruct 3D manhattan wireframes from a single image," In: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 7698–7707. 10.1109/ICCV.2019.00779
[9] B. Přibyl, P. Zemčík, and M. Čadík, "Absolute pose estimation from line correspondences using direct linear transformation," Comput. Vis. Image Underst., vol. 161, pp. 130–144, Aug. 2017. 10.1016/j.cviu.2017.05.002
[10] C. Xu, L. Zhang, L. Cheng, and R. Koch, "Pose estimation from line correspondences: A complete analysis and a series of solutions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, pp. 1209–1222, 2016. 10.1109/TPAMI.2016.2582162
[11] A. Elqursh and A. Elgammal, "Line-based relative pose estimation," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2011, pp. 3049–3056. 10.1109/CVPR.2011.5995512
[12] N. Xue, G. S. Xia, X. Bai, L. Zhang, and W. Shen, "Anisotropic-scale junction detection and matching for indoor images," IEEE Trans. Image Process., vol. 27, pp. 78–91, 2017. 10.1109/TIP.2017.2754945
[13] Y. Zhou, H. Qi, J. Huang, and Y. Ma, "NeurVPS: Neural vanishing point scanning via conic convolution," In: Proc. Adv. Neural Inform. Process. Syst. (NeurIPS), vol. 32, 2019.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., "Attention is all you need," In: Proc. Adv. Neural Inform. Process. Syst. (NeurIPS), vol. 30, 2017, pp. 6000–6010.
[15] K. Huang, Y. Wang, Z. Zhou, T. Ding, S. Gao, and Y. Ma, "Learning to parse wireframes in images of man-made environments," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 626–635. 10.1109/CVPR.2018.00072
[16] Y. Zhou, H. Qi, and Y. Ma, "End-to-end wireframe parsing," In: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 962–971. 10.1109/ICCV.2019.00105
[17] N. Xue, T. Wu, S. Bai, F. Wang, G. S. Xia, L. Zhang, et al., "Holistically-attracted wireframe parsing," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 2788–2797. 10.1109/CVPR42600.2020.00286
[18] Y. Xu, W. Xu, D. Cheung, and Z. Tu, "Line segment detection using transformers without edges," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 4257–4266. 10.1109/CVPR46437.2021.00424
[19] X. Zhou, D. Wang, and P. Krähenbühl, Objects as points, 2019, arXiv:1904.07850.
[20] S. Huang, F. Qin, P. Xiong, N. Ding, Y. He, and X. Liu, "TP-LSD: Tri-points based line segment detector," In: Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 770–785. 10.1007/978-3-030-58583-9_46
[21] X. Dai, H. Gong, S. Wu, X. Yuan, and Y. Ma, "Fully convolutional line parsing," Neurocomputing, vol. 506, pp. 1–11, 2022. 10.1016/j.neucom.2022.07.026
[22] H. Zhang, Y. Luo, F. Qin, Y. He, and X. Liu, "ELSD: Efficient line segment detector and descriptor," In: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 2969–2978. 10.1109/ICCV48922.2021.00296
[23] G. Gu, B. Ko, S. H. Go, S. H. Lee, J. Lee, and M. Shin, "Towards light-weight and real-time line segment detection," In: Proc. AAAI Conf. Artif. Intell., 2022, pp. 726–734. 10.1609/aaai.v36i1.19953
[24] F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, 2015, arXiv:1511.07122.
[25] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, and H. Hu, "Deformable convolutional networks," In: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2017, pp. 764–773. 10.1109/ICCV.2017.89
[26] S. U. Rehman, S. Tu, O. U. Rehman, Y. Huang, C. M. S. Magurawalage, and C. C. Chang, "Optimization of CNN through novel training strategy for visual classification problems," Entropy, vol. 20, 2018, id. 290. 10.3390/e20040290
[27] S. U. Rehman, S. Tu, M. Waqas, Y. F. Huang, O. U. Rehman, B. Ahmad, et al., "Unsupervised pre-trained filter learning approach for efficient convolution neural network," Neurocomputing, vol. 365, pp. 171–190, 2019. 10.1016/j.neucom.2019.06.084
[28] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, et al., "Deep high-resolution representation learning for visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, pp. 3349–3364, 2020. 10.1109/TPAMI.2020.2983686
[29] H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, et al., "UNet 3+: A full-scale connected UNet for medical image segmentation," In: Proc. ICASSP, 2020, pp. 1055–1059. 10.1109/ICASSP40776.2020.9053405
[30] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 4510–4520. 10.1109/CVPR.2018.00474
© 2025 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.