Spatial objects classification using machine learning and spatial walk algorithm

Iwona Kaczmarek

doi:10.1515/geo-2022-0542

Open Access Article

Published/Copyright: September 25, 2023

From the journal Open Geosciences, Volume 15, Issue 1

Abstract

This article presents a novel method for classifying spatial objects by learning node representations via a spatial walk algorithm. The findings show that considering both the attributes of objects and their topological relationships enables more efficient and precise classification of spatial objects than methods that only consider the objects’ characteristics. The method emphasizes the importance of spatial dependencies in learning representations for spatial data. A distinctive feature of the method is its focus on local analysis of the neighborhood structure of the node under investigation. The spatial walk algorithm offers a defined path generation scheme, facilitating a deeper understanding of local spatial dependencies between objects. This approach provides a more accurate representation of the essential relationships between spatial objects than random path generation and enhances the classification results, as demonstrated in three different classification scenarios. The method proves particularly effective in the context of spatial objects, where proximity and a limited number of neighbors play a significant role. This is exemplified in the classification of planning areas in spatial development plans.

Keywords: random walk; graphs; neural networks; representation learning; geospatial data

1 Introduction

Spatial relationships between spatial features play a significant role in geospatial analysis. Neighborhood information allows for the study of spatial dependencies and patterns of object distribution on the Earth’s surface. Tobler’s law, also known as the first law of geography, emphasizes that objects are interconnected, and the strength of these connections decreases as the distance between them increases [1]. This assumption forms the basis for spatial autocorrelation analysis, which allows one to examine the mutual relationships between variable values at different locations.

The methods of neighborhood study in geospatial analysis vary depending on the data model (raster or vector) and the objectives of the analysis. In the case of raster data, the neighborhood can be analyzed by applying different raster processing methods, such as local, regional, or global analysis. In local analysis, each raster cell is analyzed in the context of its immediate neighbors, whereas, in regional analysis, larger areas are examined. Meanwhile, global analysis allows one to examine the dependencies between cells throughout the entire area of analysis.

In contrast, graphs can be used to analyze the neighborhood of spatial objects in vector form. The vertices of the graph represent spatial objects, and the edges connecting these vertices describe the spatial relations between them. Graph neighborhood analysis methods include algorithms such as graph traversal, clustering, or finding the shortest route.

New possibilities arise for neighborhood examination in machine learning. In the case of raster data, such as images, the neighborhood is used in convolution for a machine-learning classification task to consider local dependencies between pixels. Convolution is applied to detect features at various levels of abstraction using filters learned by the neural network.

However, when analyses are not based on the raster format and spatial data are represented in a vector form (e.g., cadastral parcels) that are not arranged in a regular pixel grid, the application of convolutional filters becomes more complicated. This scenario calls for the use of graph neural networks (GNNs), which suit vector data analysis related to spatial objects.

GNNs are a type of neural network that uses graphs as a data structure to represent relationships between objects. GNNs learn vector representations of vertices, taking into account the neighborhood structure, making them capable of capturing spatial dependencies between objects [2].

In machine learning for spatial object analysis in the vector model, the scope extends beyond object features (descriptive and geometric attributes) to embrace embeddings. Embedding involves transforming data, such as words, phrases, graph nodes, or other entities, into numerical vectors of reduced dimension. This process aims to reduce the dimensionality of the data, creating representations that are conducive to analysis, machine learning, and data processing. In the context of machine learning and natural language processing, word embeddings are vital for text analysis: words with similar meanings have similar vectors, which allows information about the semantics and context of words to be stored.

Furthermore, in the context of network or graph analysis, node embeddings play a crucial role. They are used to represent nodes in lower-dimensional space. Nodes sharing similarities show related vectors, which allows for in-depth insights into network structures, node clustering, and the ability to predict connections between nodes. This unique embedding representation proves to be extremely beneficial, especially in classification tasks, where it can significantly improve the performance of predictive models.

The research presented in this article is a continuation of the work previously carried out in the study [2]. In that research, we focused on the classification of spatial objects, considering information about the neighborhood of the objects. The first neighborhood representation for the purpose of spatial object classification in machine learning used an adjacency matrix. In the second method, the classification algorithm utilized a GNN. Previous studies provided significant information on the impact of neighborhood relationships on classification results. The highest model quality was recorded for GNNs. However, potential areas for further investigation were also identified.

Other studies related to the topic of future land use classification were included in our work [3], where both unsupervised and supervised learning were tested, including convolutional neural networks. However, these methods were primarily text-based and specifically designed for a distinct use case, relying on a specially prepared corpus of texts. Although effective for its intended purpose, this specificity limited the broader applicability of the method. Recognizing this limitation prompted us to seek more versatile solutions applicable to spatial data. Consequently, we embarked on a quest to identify a suitable representation and classification algorithm for machine learning applied to geospatial data.

This study aims to explore the potential of using embeddings generated through random walk techniques within machine learning to represent neighborhoods of spatial objects. Through a series of experiments, we evaluate this approach’s efficacy in spatial object classification, exploring whether graph exploration methods, like random walk, enhance classification model performance.

This article addresses two primary aspects. The first concerns neighborhood representation through embeddings. We introduce a graph-based model where spatial objects are graph nodes. These nodes feature two types of embeddings: one generated from descriptive attributes (text) and another generated via random walk, aiming to incorporate graph structure and topological relationships between nodes. To explore alternative approaches, we developed a custom path generation algorithm, the spatial walk, and use it in the proposed classification method to create features that describe the local spatial relationships of a given object.

The second aspect revolves around selecting an appropriate classifier. Our experiments encompass diverse methods of neighborhood representation via embeddings, combined with different machine-learning classification algorithms. Our developed approach employs bidirectional long short-term memory networks (BiLSTMs), treating graph paths as sentences, akin to methods in natural language processing.

2 Related works

2.1 Node embeddings in a graph

A graph is a mathematical structure that is used to represent the relationships between objects in pairs [2]. A graph consists of nodes (or vertices) along with connections, called edges, that exist between pairs of those nodes. There are many types of graphs. For example, we have undirected graphs where the edges symmetrically connect two vertices and directed graphs where the edges form an asymmetric connection between two vertices. Further data can be associated with graph elements. Such data can be attributes associated with edges or nodes.

With advances in deep learning, so-called node representation learning in graphs has become particularly important. Node representation learning in graphs, also known as node embedding learning, is a process in which we try to find vector representations (known as embeddings) for nodes in a graph. These vector representations are intended to encode information about the graph structure and relationships between nodes. The goal of these embeddings is to enable efficient execution of various types of analyses and algorithms on graphs, such as node classification, edge prediction, clustering, etc. Vector representations are particularly useful in the case of machine learning algorithms, which usually require input in the form of fixed-dimensional vectors.

Node embeddings are generated by unsupervised and supervised machine learning algorithms, such as GNNs, large-scale information network embedding (LINE) [4], DeepWalk [5], and Node2Vec [6]. The aim of these algorithms is to generate embedding vectors for nodes in a graph that preserve essential information about the graph structure. These algorithms learn vector representations of nodes that are capable of capturing dependencies, patterns, and properties in the graph. As a result, the similarity between the node embeddings can reflect the similarity between the nodes in the graph.
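To make the notion of similarity between embeddings concrete, the following minimal sketch compares toy vectors with cosine similarity. Cosine similarity is a common choice but is our assumption here, since the text does not name a specific measure; the node names and the three-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional node embeddings (real embeddings are much longer).
embeddings = {
    "housing_1": [0.9, 0.1, 0.0],
    "housing_2": [0.8, 0.2, 0.1],
    "forest_1": [0.0, 0.1, 0.9],
}

similar = cosine_similarity(embeddings["housing_1"], embeddings["housing_2"])
dissimilar = cosine_similarity(embeddings["housing_1"], embeddings["forest_1"])
print(similar > dissimilar)
```

Nodes whose embeddings point in similar directions obtain a score close to 1, whereas unrelated nodes obtain a score close to 0.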

Node2Vec is an algorithm based on the idea of random walks, enhanced with techniques borrowed from Word2Vec [3].

Word2Vec is one of the most commonly used word embedding methods. Word embedding is a key aspect in the field of natural language processing and machine learning. These techniques are aimed at converting textual data such as words, phrases, and even entire sentences into numerical vectors of reduced dimensions. This transformation not only simplifies the data representation but also captures the semantic and contextual relationships between language elements. By mapping words to numeric vectors, words with similar meanings have similar vector representations. This makes it easy to store valuable information about the semantics and context of words. Other methods to create word embeddings include GloVe [5] and FastText [7]. These methods use different strategies to capture the semantics and context of a word. For example, Word2Vec uses two basic architectures, Continuous Bag of Words and Skip-gram, to learn vector representations. GloVe, in contrast, uses global word co-occurrence statistics to capture semantic relationships between words. FastText extends this idea by considering subword information, making it particularly suitable for handling morphologically rich languages.

One of the latest and most advanced developments in word representation is bidirectional encoder representations from transformers (BERT) [8]. BERT is a language model based on the transformer architecture. Unlike previous methods, BERT analyzes sentence context bidirectionally, i.e., it takes into account the words both before and after a given word, and for this reason it is able to understand the meaning of words in more complex contexts.

Another way to analyze sentences is to use transformer-based models such as sentence transformers. These models generate embeddings that represent whole sentences. Rather than focusing solely on words, they take entire sentences and try to capture their meaning in context. For this reason, they can be used in tasks that require the analysis of meaning and similarity in sentences, not just words.

Node2Vec operates on the principle of unsupervised learning by generating vector representations, or embeddings, for nodes in graphs. The similarity to Word2Vec lies in the use of context. Just as Word2Vec learns from the context of words in sentences, Node2Vec uses the context of nodes in random walks to learn their embeddings. Importantly, Node2Vec allows for a balance between exploring the local and global structure of the network. This means that the model can focus on both direct neighbors of a node (similar to a breadth-first search [BFS] strategy) and on distant structures (analogous to a depth-first search [DFS]), making it a very flexible tool for network analysis.

DeepWalk is an unsupervised algorithm that, similar to Node2Vec, uses the idea of random walks and methods known from Word2Vec to create embeddings for nodes in a graph. The basis for DeepWalk’s operation is the random walks, which the algorithm generates from each node in the network. These walks are treated as sentences, where the nodes are words. The algorithm uses the Word2Vec approach to learn vector representations for nodes based on their context in the generated walks. The main difference between DeepWalk and Node2Vec is that DeepWalk does not allow for controlling the exploration of the local and global network structure during the generation of random walks. As a result, DeepWalk tends to focus more on the local context of the nodes.

LINE is yet another algorithm for generating vector representations of nodes in networks, which is also an unsupervised algorithm. LINE differs from Node2Vec and DeepWalk in that, instead of using random walks to generate context, it directly optimizes the preservation of the network structure. This means that LINE strives to directly preserve both local and global network structures by minimizing a loss function defined based on the nodes’ neighborhood.

Unsupervised algorithms, such as DeepWalk and Node2Vec, rely on random walks, which allows the analysis of graph structures and relationships between nodes. Random walks are sequences of nodes that navigate through the graph according to a certain strategy, for example, with a probability proportional to the weight of the edges. In this way, both local relationships and the global network structure are analyzed. The random walk algorithm works as follows: Consider a graph G = (V, E), in which V represents the collection of nodes and E represents the set of edges. A random walk of length l is an operation that begins at a specific node v_i within V, proceeding to its adjacent nodes with each time increment. This process continues until the predetermined length l is achieved [9].
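The definition above can be illustrated with a minimal sketch of a random walk on an unweighted graph stored as an adjacency dictionary. The function name, the toy graph, and the uniform choice among neighbors are our assumptions for illustration; a weighted graph would instead draw neighbors in proportion to edge weights.

```python
import random

def random_walk(graph, start, length, rng=None):
    """Walk `length` steps from `start`, choosing each neighbor with equal chance."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(length):
        neighbors = graph[walk[-1]]
        if not neighbors:          # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy undirected graph (hypothetical), stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walk = random_walk(graph, "a", 3, random.Random(0))
print(walk)
```

A walk of length l therefore contains l + 1 nodes: the starting node and one node per step.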

For random walks, there are two main methods: biased and unbiased. These different ways of traversing a graph have a significant impact on the generation of node embeddings. An unbiased random walk is a method in which every neighboring node has the same chance of being visited during the traversal. Because no neighbor is preferred, the walk samples the structure of the graph uniformly, which enables both local and global connectivity between nodes to be analyzed.

In contrast, a biased random walk applies certain preferences or rules when choosing the next node to visit. These preferences can be based on node attributes, such as their degree (number of edges leaving the node) or other relevant parameters. This method allows a controlled exploration of specific aspects of the graph, such as focusing on nodes of higher degree, and it can facilitate the identification of central nodes in important regions of the network.

Both methods have advantages and disadvantages depending on the purpose of the research. An unbiased random walk provides a detailed overview of the graph structure, while a biased random walk allows a more focused search of specific features of a graph. In practice, selecting an appropriate random walk algorithm is based on the context of the research and research objectives. A well-known example of a biased random walk is the algorithm used in Node2Vec.
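As an illustration of a biased walk, the sketch below follows the second-order transition rule of Node2Vec, in which a return parameter p and an in-out parameter q reweight the candidates for the next step. The parameter names come from the Node2Vec formulation rather than from this text, and the helper names and the toy graph are hypothetical.

```python
import random

def node2vec_weights(graph, prev, curr, p, q):
    """Unnormalized weights for moving from `curr`, given the previous node."""
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p      # step back to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0          # stay close to the previous node (BFS-like)
        else:
            weights[nxt] = 1.0 / q      # move farther away (DFS-like)
    return weights

def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Random walk whose steps after the first are biased by p and q."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(length):
        curr = walk[-1]
        neighbors = list(graph[curr])
        if not neighbors:
            break
        if len(walk) == 1:              # no previous node yet: uniform choice
            walk.append(rng.choice(neighbors))
            continue
        weights = node2vec_weights(graph, walk[-2], curr, p, q)
        walk.append(rng.choices(neighbors, weights=[weights[n] for n in neighbors])[0])
    return walk

graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

weights = node2vec_weights(graph, "a", "c", p=2.0, q=0.5)
walk = biased_walk(graph, "a", 3, p=2.0, q=0.5, rng=random.Random(0))
print(weights, walk)
```

With q below 1 the walk prefers nodes farther from where it came from, while a large q keeps it near the starting neighborhood; setting p = q = 1 reduces the rule to an unbiased walk.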

3 Learning representations for geospatial data

Embeddings serve as input features for classification algorithms such as support vector machines, decision trees, or neural networks. These techniques find application in various fields, including geospatial applications, for example, using GNNs to predict traffic intensity [10,11,12], the selection of the best location [13], or the classification of geosocial networks [14].

The approaches used to address the problem of learning representations for spatial data include techniques based on convolutional networks, graph networks, encoder–decoder frameworks, contrastive learning, recurrent networks, and hybrid approaches that use various types of data. These methods enable learning representations that capture both local and global patterns and take into account diverse spatial and semantic dependencies.

In the publication by Fang et al. [15], an approach to learning representations of spatial data was proposed using contrastive learning on graphs. This method is based on learning similarities and differences between various spatial data, leading to improved vector representations of spatial data.

In the Loc2Vec method [16], the authors use convolutional neural networks to create semantic embeddings for the location environment based on raster tilesets generated from OpenStreetMap data. Methods for creating embeddings for regions include techniques such as Zone2Vec [17] and mobility-based zone embedding [18]. Zone2Vec uses data such as road networks and taxi movement trajectories, while the mobility-based zone embedding learns the zone embeddings based on human mobility.

RegionEncoder, proposed by Jenkins et al. [19], is a holistic approach to the multimodal embedding of regions that considers various data, such as city grid division, taxi mobility data, points of interest categories, and satellite images.

Other approaches include hierarchical road network representation [20], which focuses on capturing the graph structure of road infrastructure, and representation learning for road networks [21], which introduces a road-network-to-vector model to learn representations of road networks.

Highway2Vec [22] is a method to generate embeddings for microregions based on characteristics of road infrastructure. It uses data from OpenStreetMap road networks in selected cities, and the H3 spatial index is used to learn representations. In the work of Kim and Yoon [23], the authors introduce HUGAT, an innovative approach that uses a heterogeneous urban graph to account for the heterogeneity of various urban data sets, including spatial and temporal variability in human mobility within a unified graph structure. Another example in which authors use GNNs to learn representations of regions is a previous work [24], in which mobility data are used.

4 Data

The experiments conducted in this study used data comprising spatial objects. These objects, represented as polygons, correspond to planning areas within a spatial development plan. Each area was assigned a text that constitutes the description of the provisions and conditions for land development applicable in that area. It can be said that this text represents a certain characteristic of the spatial object, which is the planning area.

The training data set consisted of planning areas represented in the form of a graph G(V, E), where V is the set of nodes representing the planning areas and E is the set of edges, representing the mutual adjacency of the areas. The connections in the graph are bidirectional, that is, each area has a mutual adjacency relationship with another area (Figure 1). In our study, the adjacency relationships between the planning areas were determined automatically through a spatial analysis process. We identified polygons that share common or touching boundaries. This approach eliminated the need for manual annotation of adjacency relationships, as they were derived directly from existing spatial data.
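The derivation of the adjacency graph can be sketched as follows. This is a simplified illustration and not the spatial analysis used in the study: it assumes that polygons are given as vertex rings and that neighboring polygons share at least one identical boundary segment, whereas a GIS library would evaluate a general topological predicate for touching boundaries. All names and coordinates are hypothetical.

```python
from itertools import combinations

def boundary_segments(ring):
    """Return the undirected boundary segments of a polygon vertex ring."""
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:] + ring[:1])}

def build_adjacency(polygons):
    """Map each polygon id to the ids of polygons sharing a boundary segment."""
    segments = {pid: boundary_segments(ring) for pid, ring in polygons.items()}
    graph = {pid: set() for pid in polygons}
    for a, b in combinations(polygons, 2):
        if segments[a] & segments[b]:   # assumption: shared edges match exactly
            graph[a].add(b)             # edges are added in both directions
            graph[b].add(a)
    return graph

# Three hypothetical planning areas: A and B share an edge, C is isolated.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}

graph = build_adjacency(polygons)
print(graph)
```

Because every edge is stored in both directions, the resulting structure matches the bidirectional adjacency relationship described above.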

Figure 1

An example of representing spatial planning areas in the form of a graph, where each area is represented as a node in the graph.

The data set was divided into subsets representing municipalities or their parts that contain spatial planning areas. This division is associated with the need to consider the adjacency of areas. In the training data, each planning area has a label representing the category of future land use. The total number of categories is 8, i.e., housing, mining and quarrying, natural areas, agriculture, communication and infrastructure, forestry, production, and services. The text assigned to each area is represented by embedding vectors created using the Sentence Transformers model [25], each with a dimensionality of 768 (Figure 2).

Figure 2

Embeddings for nodes created from node attributes (textual description of the planning area).

5 Methods

5.1 Method I

In this strategy, we used a body of plan texts as the basis to develop a classifier. The sentence transformer model was used to convert these text data into an embedded format. It is worth noting that this technique is a baseline method that does not take into account any information about spatial relationships of the objects. We use this method as a comparative basis for two subsequent methodologies, which are centered around path generation in a graph. XGBoost was selected as the classifier for this task. The same method was used for this classification task in the study of Kaczmarek et al. [2].

5.2 Method II

In Method II, we employ the random walk algorithm to derive node representations, as illustrated in Figure 3. For every node in the graph, 200 random paths of length 3 are generated. We used a biased random walk with empirically determined parameters. However, in the case analyzed, the choice of parameters was not critical, because the graph exploration path, limited to three steps, was designed to capture only the local neighborhood characteristics of nodes.

Figure 3

Generating paths using the random walk algorithm.

Subsequently, each node within these paths is converted into its embedding representation using the embeddings previously generated by the sentence transformer. After this conversion, for each node, we have 800 embeddings in total. This is because, for each of the 200 paths, we obtain four embeddings: three nodes plus the starting node itself. Then, the average of these 800 embeddings is computed. As a result, for each node, we acquire a single embedding with a dimension of 768, which is consistent with the dimension produced by the sentence transformer. These averaged embeddings depict the local graph structure surrounding individual nodes and are utilized as features characterizing the nodes. They are concatenated with the embeddings for the texts of the examined nodes. These combined features are later used for predicting node labels via the XGBoost classifier.
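The feature construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the walk is unbiased for simplicity (the study used a biased walk with empirically determined parameters), every node is assumed to have at least one neighbor, the embeddings are two-dimensional toy vectors instead of 768-dimensional ones, and all function names are hypothetical. The final XGBoost classification step is not shown.

```python
import random

def walk_embeddings(graph, text_emb, node, n_walks=200, walk_len=3, rng=None):
    """Collect the text embeddings of all nodes visited on walks from `node`."""
    rng = rng or random.Random()
    collected = []
    for _ in range(n_walks):
        current = node
        collected.append(text_emb[current])        # the starting node itself
        for _ in range(walk_len):
            current = rng.choice(graph[current])   # unbiased step (assumption)
            collected.append(text_emb[current])
    return collected

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def method2_features(graph, text_emb, node, rng=None):
    """Concatenate a node's text embedding with its averaged walk embedding."""
    neighborhood = mean_vector(walk_embeddings(graph, text_emb, node, rng=rng))
    return list(text_emb[node]) + neighborhood

# Toy graph and 2-dimensional embeddings (hypothetical values).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
text_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}

collected = walk_embeddings(graph, text_emb, "a", rng=random.Random(0))
features = method2_features(graph, text_emb, "a", rng=random.Random(0))
print(len(collected), len(features))  # 800 4
```

The 800 collected vectors correspond to 200 walks with four nodes each, and the concatenated feature vector is twice as long as a single text embedding.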

In summary, this method harnesses information embedded in the node’s text (through embeddings from the sentence transformer) as well as information about its local structure in the graph (via averaged embeddings from the random walk algorithm). Additionally, this process is illustrated in Figure 4.

Figure 4

Classification process using Method II.

5.3 Method III

In Method III, we use our path generation algorithm, called spatial walk, which focuses on examining the nearest neighborhood of the examined node. For each node, as many paths are generated as are needed to exhaust the possible neighborhood combinations. The process of generating paths can be described in the following steps:

1. For a given analyzed node u, we create a list l of its first-degree neighbors of length n. For example, if we have five neighbors, the list will look like this: l = [v₁, v₂, v₃, v₄, v₅].
2. Then, for the first element of the list l (v₁), we generate (n − 1) paths of length 3, where the first two elements of each path are constant (v₁ and the central node u), and the third element is one of the remaining neighbors in the list l (v₂, v₃, v₄, v₅). We obtain the following paths: [v₁, u, v₂], [v₁, u, v₃], [v₁, u, v₄], [v₁, u, v₅].
3. We modify the list l, removing the first element (v₁) and saving the result as a new list ll. We repeat Step 2 for the list ll until only one element remains in the input list.
4. As a result, we obtain (n − 1) + (n − 2) + … + 1 = n(n − 1)/2 three-element paths, where n is the number of neighbors of node u (10 paths for five neighbors).
5. For each generated path, we select the first and last elements and generate a path of length 2 for each of them using the random walk method.
6. We append the paths created in Step 5 to the three-element paths from Step 4, placing the path for the first node at the beginning and the path for the last node at the end.
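The steps above can be sketched as follows. This is an illustrative reading of the description, not the reference implementation: we assume that the random path of length 2 excludes its starting node, that the path attached at the beginning is reversed so that the sequence reads continuously toward the central node, and that the random steps are unbiased. The function names and the toy graph are hypothetical.

```python
import random

def random_path(graph, start, length, rng):
    """Random walk of `length` steps from `start`, excluding `start` itself."""
    path, current = [], start
    for _ in range(length):
        current = rng.choice(graph[current])
        path.append(current)
    return path

def spatial_walk(graph, u, tail_len=2, rng=None):
    """Return the three-element core paths and the extended paths for node u."""
    rng = rng or random.Random()
    neighbors = list(graph[u])                        # Step 1
    core = []
    for i, first in enumerate(neighbors):             # Steps 2 and 3
        for last in neighbors[i + 1:]:
            core.append([first, u, last])
    paths = []
    for first, center, last in core:                  # Steps 5 and 6
        head = random_path(graph, first, tail_len, rng)[::-1]
        tail = random_path(graph, last, tail_len, rng)
        paths.append(head + [first, center, last] + tail)
    return core, paths

# Toy graph (hypothetical): node u with five neighbors arranged in a chain.
graph = {
    "u": ["v1", "v2", "v3", "v4", "v5"],
    "v1": ["u", "v2"],
    "v2": ["u", "v1", "v3"],
    "v3": ["u", "v2", "v4"],
    "v4": ["u", "v3", "v5"],
    "v5": ["u", "v4"],
}

core, paths = spatial_walk(graph, "u", rng=random.Random(0))
print(len(core), core[:4])
```

For the five neighbors in this example, the sketch yields 4 + 3 + 2 + 1 = 10 three-element paths, each of which is extended to seven nodes with the central node u in the middle.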

Example paths for node 3 are presented in Figure 5. To provide a clearer illustration of the path generation process, the example has been limited to a subgraph containing five nodes and paths of length 2.

Figure 5

Generating paths using the spatial walk algorithm.

In our approach, we use a BiLSTM neural network to classify nodes in the graph. BiLSTM, due to its ability to process sequential data in both directions, is ideally suited for the analysis of paths generated by the spatial walk algorithm.

For each node in the graph, we generate a set of paths using the spatial walk algorithm. These paths represent the node’s local context, including its immediate neighborhood.

During the classification process, paths are introduced to the BiLSTM network in the form of object identifiers. The architecture of this network is shown in Figure 6. Within the BiLSTM network architecture, the embedding layer is implemented. Its primary function is to map each object identifier to its corresponding embedding. It is worth emphasizing that these embeddings were initially generated using the sentence transformer model, which converts object texts into their vector representations.

Figure 6

BiLSTM network architecture used to classify the nodes in the graph.

After converting the identifiers into embeddings, the subsequent layers of the BiLSTM network operate on these vector representations. The network uses these embeddings to make classification decisions. Owing to BiLSTM’s capability to analyze sequential data bidirectionally, the network can consider the context from both sides of each embedding in the sequence. The overall classification process is described in Figure 7.

Figure 7

Classification process using Method III.

6 Results

We have implemented our solution in Python 3.8. We used the NetworkX [26] and StellarGraph [27] libraries to create the graph. StellarGraph was also used to perform random walks. For the GNNs, we employed TensorFlow 2.10 with built-in Keras.

The classification was carried out for several variants of text lengths, specifically for the first 35, 50, 100, 150, 200, and 300 words in the text. The text, which is an attribute of the node, represents the characteristics of the planning area that we are classifying. This approach was adopted to determine the length of the text at which the classifier achieves the best results. The expectation is that the classifier will provide optimal performance at the longest text length.

The validation of models in individual methods was performed using sevenfold cross-validation. However, the division of the entire data set into seven folds had to take into account the mutual neighborhood of the planning areas. This means that the training data were prepared in such a way that the folds were drawn on groups of neighboring areas to preserve the neighborhood in space.

The data were divided into seven folds, maintaining the neighborhood of the planning areas in each fold. Then, for each of the seven iterations, six folds were used as training data and one fold as test data. Subsequently, the model was trained using the training data comprising neighboring planning areas. Its performance was assessed using the test data, with the model’s error being documented at each iteration. This process was repeated three times, which means a total of 21 different combinations of folds. After all iterations were performed, the average model error was calculated.
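The validation scheme can be sketched as a repeated, group-wise cross-validation in which whole groups of neighboring areas are assigned to the same fold. In this minimal illustration, we assume that a group corresponds to one municipality subset and that groups are distributed over the folds at random; the function names and the toy data are hypothetical.

```python
import random

def grouped_folds(groups, n_folds=7, seed=0):
    """Assign whole groups to folds so that neighboring areas stay together."""
    rng = random.Random(seed)
    names = sorted(set(groups.values()))
    rng.shuffle(names)
    fold_of_group = {name: i % n_folds for i, name in enumerate(names)}
    folds = [[] for _ in range(n_folds)]
    for item, group in groups.items():
        folds[fold_of_group[group]].append(item)
    return folds

def repeated_splits(groups, n_folds=7, n_repeats=3):
    """Yield (train, test) splits: n_folds splits for each of n_repeats repeats."""
    for repeat in range(n_repeats):
        folds = grouped_folds(groups, n_folds, seed=repeat)
        for k in range(n_folds):
            test = folds[k]
            train = [x for j, fold in enumerate(folds) if j != k for x in fold]
            yield train, test

# 14 hypothetical municipalities with three planning areas each.
groups = {f"area_{m}_{i}": f"municipality_{m}" for m in range(14) for i in range(3)}

splits = list(repeated_splits(groups))
print(len(splits))  # 21
```

Each of the 21 splits trains on six folds and tests on the remaining one, and no group of neighboring areas is ever divided between training and test data.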

Figure 8 illustrates the F1 scores for the three distinct methods being examined. The ideal F1 score is 1, which signifies perfect precision and recall, while the worst possible score is 0. The F1 scores are calculated for each document length.

Figure 8

Evaluating the F1 score for the classification of spatial objects using three methods.

Method I, the baseline method, uses plain text as the source data for a classifier, transforming these text data into numerical form through embeddings produced by sentence transformers. This method does not consider spatial relationships of objects and uses XGBoost for classification.

Method II, in contrast, takes advantage of the random walk algorithm to learn node representations. For each node, random paths are created and transformed into embedding representations. The embeddings are then averaged, resulting in one embedding for each node. These embeddings are concatenated with the embeddings of the text under investigation for the prediction of labels using XGBoost.

Method III uses a spatial walk algorithm and performs a thorough exploration of each node’s local neighborhood. The path generation method focuses on the immediate neighbors of a given node. This method aims to exhaust all possible neighborhood combinations, allowing for a more in-depth analysis of the local graph structure. Spatial walk-based embeddings are then used in a BiLSTM network for classification.

Figure 9 presents example confusion matrices for each method analyzed, corresponding to a sample fold in a sample data set, with a document length of 100 words. The confusion matrix shows the number of correct and incorrect classifications for each class. Panels (a)–(c) present the classification results for Methods I–III, respectively.

Figure 9

Confusion matrices for sample fold in a sample data set for a document length of 100. (a) Method I, (b) Method II, and (c) Method III.

The values on the diagonal represent correctly classified examples (i.e., the model predicted class 0 for true examples of class 0, class 1 for true examples of class 1, etc.). The values off the diagonal represent incorrectly classified examples (i.e., the model predicted a different class than the true one). The fewer the off-diagonal values, the better the model's performance.
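The relationship between a confusion matrix and the measures reported in a classification report can be illustrated with a short sketch. The matrix below is a hypothetical three-class example, not a result of this study, and we assume that rows hold the true classes and columns the predicted classes.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of examples of true class i predicted as class j."""
    n = len(cm)
    report = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))   # column sum
        support = sum(cm[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": support})
    return report

def macro_and_weighted_f1(report):
    """Unweighted mean of class F1 scores and mean weighted by class support."""
    total = sum(r["support"] for r in report)
    macro = sum(r["f1"] for r in report) / len(report)
    weighted = sum(r["f1"] * r["support"] for r in report) / total
    return macro, weighted

# Hypothetical confusion matrix for three classes.
cm = [
    [8, 2, 0],
    [1, 15, 4],
    [0, 0, 10],
]

report = per_class_metrics(cm)
macro_f1, weighted_f1 = macro_and_weighted_f1(report)
print(report, macro_f1, weighted_f1)
```

The macro average treats all classes equally, so small classes influence it as much as large ones, whereas the weighted average is dominated by the classes with the largest support.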

In Tables 1–3, the results of classification measures of model quality are presented for individual methods.

Table 1

Classification report for sample data set in Method I

	Precision	Recall	F1-score	Support
0	1.00	0.80	0.89	5
1	0.65	0.96	0.77	323
2	1.00	0.67	0.80	15
3	0.91	0.90	0.91	790
4	0.98	0.94	0.96	677
5	0.98	0.92	0.95	586
6	0.95	0.89	0.92	140
7	0.99	0.56	0.71	160
Accuracy			0.90	2,696
Macro avg	0.93	0.83	0.86	2,696
Weighted avg	0.92	0.90	0.90	2,696

Table 2

Classification report for sample data set in Method II

Class	Precision	Recall	F1-score	Support
0	1.00	0.60	0.75	5
1	0.67	0.95	0.79	323
2	1.00	0.67	0.80	15
3	0.88	0.89	0.89	790
4	0.97	0.95	0.96	677
5	0.98	0.92	0.95	586
6	0.95	0.87	0.91	140
7	0.99	0.56	0.71	160
Accuracy			0.90	2,696
Macro avg	0.93	0.80	0.84	2,696
Weighted avg	0.91	0.90	0.90	2,696

Table 3

Classification report for sample data set in Method III

Class	Precision	Recall	F1-score	Support
0	1.00	0.40	0.57	5
1	0.74	0.98	0.84	323
2	1.00	0.80	0.89	15
3	0.95	0.98	0.96	790
4	0.98	0.94	0.96	677
5	0.99	0.92	0.95	586
6	0.95	0.84	0.89	140
7	0.96	0.76	0.85	160
Accuracy			0.93	2,696
Macro avg	0.95	0.83	0.86	2,696
Weighted avg	0.94	0.93	0.93	2,696
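The "Macro avg" and "Weighted avg" rows can be reproduced from the per-class rows. The sketch below recomputes both averages of the F1 score from the values in Table 3; because the inputs are already rounded to two decimals, the results agree with the table only after rounding.

```python
# (class, precision, recall, f1, support) copied from Table 3 (Method III).
rows = [
    (0, 1.00, 0.40, 0.57, 5),
    (1, 0.74, 0.98, 0.84, 323),
    (2, 1.00, 0.80, 0.89, 15),
    (3, 0.95, 0.98, 0.96, 790),
    (4, 0.98, 0.94, 0.96, 677),
    (5, 0.99, 0.92, 0.95, 586),
    (6, 0.95, 0.84, 0.89, 140),
    (7, 0.96, 0.76, 0.85, 160),
]

def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)

def weighted_average(values, weights):
    """Mean weighted by class support: large classes dominate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

f1_scores = [row[3] for row in rows]
supports = [row[4] for row in rows]
macro_f1 = macro_average(f1_scores)
weighted_f1 = weighted_average(f1_scores, supports)
```

The macro average is pulled down by the small class 0 (five examples, F1 of 0.57), while the weighted average is dominated by classes 3, 4, and 5, which together account for most of the 2,696 examples.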

To compare the performance of the three approaches, we performed pairwise paired t-tests, applying the Benjamini–Hochberg correction for multiple comparisons at a significance level of 0.05 (95% confidence) [28]. To do this, we randomly selected seven folds to evaluate the models and repeated this process three times. The following results illustrate the performance for a random data set and a random fold with a document length of 100. Multiple hypothesis testing is needed to ensure the robustness and reliability of the results: it controls the rate of false discoveries, which is particularly important when several hypotheses are tested simultaneously, and so helps avoid the inflated Type I error rate (false positives) that can occur when multiple comparisons are made. Testing in this way also lets us better understand how the models perform under different conditions and validate their effectiveness.
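For readers unfamiliar with the paired t-test, the sketch below computes its test statistic for two methods evaluated on the same folds. The scores are hypothetical and are not taken from the experiments; the p-value would then be read from Student's t distribution with n − 1 degrees of freedom, for example with a statistics library such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    std_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (std_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical F1 scores of two methods on the same seven folds.
method_iii = [0.93, 0.92, 0.94, 0.91, 0.93, 0.92, 0.94]
method_i = [0.90, 0.89, 0.90, 0.89, 0.91, 0.88, 0.91]
t_stat, degrees_of_freedom = paired_t_statistic(method_iii, method_i)
```

The test is paired because both methods are scored on identical folds, so only the per-fold differences enter the statistic; a large positive value indicates that the first method is consistently better across folds.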

Table 4 lists the results of a statistical comparison between the three methods using the paired t-test and the false discovery rate (FDR) control. The FDR control is used to correct for multiple comparisons, which is a common problem in statistical analyses when we test several hypotheses simultaneously.

Table 4

Outcomes of the conducted paired t-tests

First method	Second method	fdr_bh
Method I	Method III	0.02
Method I	Method II	0.85
Method III	Method II	0.02

In this context, the values in the “fdr_bh” column represent the adjusted p-values from these comparisons. A commonly used threshold for significance in such tests is 0.05. If the adjusted p-value is less than 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference between the two methods being compared.
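The Benjamini–Hochberg adjustment and the decision rule above can be sketched as follows. The raw p-values are hypothetical (the article reports only adjusted values), and the function is a plain reimplementation for illustration. The column name matches the `fdr_bh` method identifier of the statsmodels function `multipletests`, although this section does not state which software was used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p-values for the three pairwise comparisons.
comparisons = [("Method I", "Method III"), ("Method I", "Method II"), ("Method III", "Method II")]
raw_p = [0.010, 0.850, 0.013]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

With these illustrative inputs, the two comparisons involving Method III remain below 0.05 after adjustment, while the comparison between Methods I and II does not, which is the same pattern as in Table 4.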

7 Discussion

When comparing the effectiveness of the three methods, certain patterns can be observed. As illustrated in Figure 8, Method I achieves an average F1 score of 0.660 for documents with a length of 35 words, showing an increasing trend up to 0.898 for documents of 300 words. This provides a baseline for comparing the effectiveness of the other two methods. Method II yields results comparable to Method I, with an F1 score of 0.659 for 35-word documents and a maximum F1 score of 0.901 for documents with 200 words. Method III seems to be the most promising. Beginning with an F1 score of 0.669 for 35-word documents, Method III reaches the highest score of 0.931 for 200-word documents, outperforming both Methods I and II.

All methods show similar performance with shorter text lengths, that is, 35 and 50 words (Figure 8). However, as the text length increases to 100 words, Method III starts to significantly outperform the others, and this trend persists as the text length exceeds 100 words. Specifically, at a text length of 100 words, Method III outperforms Method I by about 5.4% and Method II by about 5.7%. This suggests that an in-depth examination of each node’s local neighborhood and generating embeddings via spatial walks significantly boosts performance at this text length. While Methods I and II seem to peak at around 200 words, showing a slight performance decline at 300 words, Method III maintains relatively high performance even at the maximum text length. There is a minor drop when transitioning from 200 to 300 words, but overall, Method III seems to yield the best results among the three methods, especially as text length increases.

For all three methods, the classification scores are markedly lower for a word count of 35 than for the other lengths. This suggests that using more words from the beginning of the document gives the models additional information about the topic and context, which improves classification. However, beyond a certain threshold (e.g., 100 words), increasing the document length no longer affects the classification results significantly, which may suggest that the models already have enough information to classify well. Despite the overall trends, some anomalies are worth noting. For example, in the case of Method III, the F1 score for 200-word documents is higher (0.931) than for 300-word documents (0.917). This shows that increasing the document length does not always improve performance, and the optimal document length may depend on the specific method and data set. It may also relate to the embeddings used, which have a fixed dimensionality: a longer document may introduce more noise into these embeddings, potentially leading to a performance drop.

The F1 scores show that Method III achieves better classification performance than Methods I and II, particularly with longer texts. The statistical analysis confirms this, showing a significant difference between Method III and each of the other two methods.

The paired t-tests with the Benjamini–Hochberg correction (Table 4) confirm a statistically significant difference between Methods III and I, as well as between Methods III and II (adjusted p = 0.02 in both cases). No statistically significant difference was observed between Methods I and II (adjusted p = 0.85). Together with the F1 scores reported above, this strengthens the conclusion that Method III surpasses the other two.

A detailed analysis of the results for a sample dataset and documents of 100 words in length confirms the conclusions drawn above. As can be seen in the confusion matrices (Figure 9) and the detailed classification reports (Tables 1–3), the models in all three methods generally perform well across the eight classes. However, Method III achieves the highest classification accuracy (0.93). This suggests that Method III is generally better at correctly classifying the planning areas than Methods I and II, which both achieved an accuracy of 0.90.

Method III also matches or exceeds the precision of Methods I and II for every class except class 7 (0.96 compared with 0.99). The gain is largest for class 1, where the precision of Method III (0.74) is considerably higher than that of Methods I and II (0.65 and 0.67, respectively). Precision is the proportion of correctly predicted positive results among all predicted positive results, so Method III is more precise in predicting this class.

Method III also demonstrates better recall for some classes, for instance, for class 3, where the recall is 0.98, while for Methods I and II, it is 0.90 and 0.89, respectively. Recall is the proportion of true examples of a class that the model detects, meaning that Method III is more effective at detecting true cases of this class. Method III achieves a better F1 score for classes 1, 2, 3, and 7 and the same score as the best of the other methods for classes 4 and 5; it scores lower for classes 0 and 6, where Method I performs best. Method III also has the highest weighted averages for all three measures (0.94, 0.93, and 0.93) and the highest macro-averaged precision (0.95), while its macro-averaged recall and F1 score equal those of Method I (0.83 and 0.86). The macro average treats all classes equally and therefore reflects the consistency of results across classes, whereas the weighted average reflects the quality of results for classes with more instances; the advantage of Method III thus comes mainly from the larger classes.
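The definitions of precision, recall, and the F1 score can be checked against a single table row. The sketch below uses class 0 of Table 3; the counts of true positives, false positives, and false negatives are not reported in the article and are inferred here from the support (5), recall (0.40), and precision (1.00).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Inferred counts for class 0, Method III: 2 of 5 true examples are found
# (recall 0.40) and no other object is assigned to the class (precision 1.00).
precision, recall, f1 = precision_recall_f1(tp=2, fp=0, fn=3)
```

The resulting F1 score rounds to 0.57, the value in Table 3. The example also shows why a class with very few examples can have perfect precision and still a low F1 score.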

In summary, Method III, which utilizes a spatial walk path generation algorithm and a BiLSTM network, yields the best results, particularly for longer texts. The use of a spatial walk algorithm allows for greater control over the exploration of the graph and a more thorough examination of the local neighborhoods of the nodes under study. It can capture important dependencies between neighboring nodes that help the classification process.

8 Conclusions

This article introduces a method of classifying spatial objects, specifically planning areas within spatial development plans, that learns node representations via the spatial walk algorithm and classifies them with a BiLSTM neural network. The results confirm that taking into account both the characteristics of the object (in this case, the text describing the planning area) and its topological relationships with other objects allows for a more efficient and precise classification of spatial objects.

A key aspect of our method is the focus on local analysis of the neighborhood structure of the investigated node. The spatial walk algorithm has a defined path generation scheme, which enables a deeper understanding of local spatial dependencies between objects. Thanks to this approach, the method better represents the essential relationships between spatial objects than random path generation. As the results of the models presented in three different classification scenarios show, our approach to path generation significantly improves the classification results.

This method proves particularly effective in the context of spatial objects, where the proximity of the neighborhood plays a significant role and where the number of neighbors is limited. An example of this is the problem of classifying planning areas, where each area usually has a limited number of neighbors. This is different from the analysis of the structure of a social graph, where nodes can be linked to many other nodes.

A noteworthy factor contributing to the superiority of Method III is the utilization of the BiLSTM neural network. BiLSTM is a sequential model that excels at analyzing sequential data, such as the paths generated by the spatial walk algorithm. In this case, when the sequence represents the local neighborhood of nodes in the graph, the ability to consider relationships both backward and forward can significantly enhance the model’s capability to recognize patterns and features characterizing different node classes. In the context of analyzing planning areas, where the sequence is of significant importance, the ability to analyze both preceding and succeeding contexts for each embedding allows for the inclusion of spatial and topological relationships. This, in turn, can contribute to a more accurate characterization of nodes and more precise classification.
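The idea of reading a path in both directions can be illustrated with a deliberately simplified recurrence. The sketch below is not a BiLSTM: it uses scalar inputs and a plain tanh update without gates or learned weights, and the names `recurrent_pass`, `w_in`, and `w_rec` are hypothetical. It only shows how each position receives one state that summarizes the preceding elements and one that summarizes the succeeding elements.

```python
import math

def recurrent_pass(sequence, w_in=0.5, w_rec=0.5):
    """Simple tanh recurrence; returns the hidden state after each element."""
    state, states = 0.0, []
    for x in sequence:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional_states(sequence):
    """Pair a left-to-right state with a right-to-left state per position."""
    forward = recurrent_pass(sequence)
    backward = recurrent_pass(sequence[::-1])[::-1]
    return list(zip(forward, backward))

# Toy path of three scalar "embeddings".
states = bidirectional_states([1.0, 0.0, -1.0])
```

The forward state at the first position depends only on the first element, whereas the backward state at the same position already reflects the whole remainder of the path. A bidirectional network combines both states, so every element of a path is represented together with its preceding and succeeding neighbors.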

In future work, we plan to apply this method to the analysis of point-geometry objects. Expanding this approach to point objects could significantly broaden its potential applications and possibilities.

Acknowledgments

Partial research funding was provided by the project “TeleCyfro – methods of automated data extraction from analog unstructured engineering documentation using AI in remote work environment” POIR.01.01.01-00-0359/20, which is supported by the National Center for Research and Development in Poland.

Conflict of interest: The author states no conflict of interest.

References

[1] Tobler WR. A computer movie simulating urban growth in the Detroit region. Economic Geogr. 1970;46:234–40. doi:10.2307/143141

[2] Kaczmarek I, Iwaniak A, Świetlicka A. Classification of spatial objects with the use of graph neural networks. ISPRS Int J Geo-Inf. 2023;12:83. doi:10.3390/ijgi12030083

[3] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. Epub ahead of print; Sept 2013. doi:10.48550/arXiv.1301.3781

[4] LINE. Proceedings of the 24th International Conference on World Wide Web; (accessed 30 April 2023). doi:10.1145/2736277.2741093

[5] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics. p. 1532–43. doi:10.3115/v1/D14-1162

[6] Grover A, Leskovec J. node2vec: Scalable Feature Learning for Networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery. p. 855–64. doi:10.1145/2939672.2939754

[7] Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching Word Vectors with Subword Information. Epub ahead of print; June 2017. doi:10.48550/arXiv.1607.04606

[8] Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Epub ahead of print; May 2019. doi:10.48550/arXiv.1810.04805

[9] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online Learning of Social Representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: Association for Computing Machinery. p. 701–10. doi:10.1145/2623330.2623732

[10] Jiang W, Luo J, He M, Gu W. Graph neural network for traffic forecasting: The Research Progress. ISPRS Int J Geo-Inf. 2023;12:100. doi:10.3390/ijgi12030100

[11] Li H, Yang S, Song Y, Luo Y, Li J, Zhou T. Spatial dynamic graph convolutional network for traffic flow forecasting. Appl Intell. 2022;53:14986–98. doi:10.1007/s10489-022-04271-z

[12] Liu T, Zhang J. An adaptive traffic flow prediction model based on spatiotemporal graph neural network. J Supercomput. 2023;79:15245–69. doi:10.1007/s11227-023-05261-9

[13] Lan T, Cheng H, Wang Y, Wen B, et al. Site selection via learning graph convolutional neural networks: A case study of Singapore. Remote Sens. 2022;14:3579. doi:10.3390/rs14153579

[14] Zhao X, Wang S, Wang H. Organizational geosocial network: A graph machine learning approach integrating geographic and public policy information for studying the development of social organizations in China. ISPRS Int J Geo-Inf. 2022;11:318. doi:10.3390/ijgi11050318

[15] Fang L, Kou Z, Yang Y, Li T. Representing spatial data with graph contrastive learning. Remote Sens. 2023;15:880. doi:10.3390/rs15040880

[16] Sierra D. Loc2Vec: Learning location embeddings with triplet-loss networks. Sentiance; (2018, accessed 17 May 2023). https://sentiance.com/loc2vec-learning-location-embeddings-w-triplet-loss-networks

[17] Du J, Chen Y, Wang Y, Pu J. Zone2Vec: Distributed representation learning of urban zones. 2018 24th International Conference on Pattern Recognition (ICPR); 2018. p. 880–5. doi:10.1109/ICPR.2018.8545376

[18] Yao Z, Fu Y, Liu B, Hu W, Xiong H. Representing urban functions through zone embedding with human mobility patterns. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization. p. 3919–25. doi:10.24963/ijcai.2018/545

[19] Jenkins P, Farag A, Wang S, Li Z. Unsupervised representation learning of spatial data via multimodal embedding. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Beijing China: ACM. p. 1993–2002.

[20] Wu N, Zhao XW, Wang J, Pan D. Learning effective road network representation with hierarchical graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY, USA: Association for Computing Machinery. p. 6–14.

[21] Wang M-X, Lee W-C, Fu T-Y, Yu G. On representation learning for road networks. ACM Trans Intell Syst Technol. 2020;12:1–27. doi:10.1145/3424346

[22] Leśniara K, Szymański P. Highway2vec: Representing OpenStreetMap microregions with respect to their road network characteristics. In: Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery. Seattle Washington: ACM. p. 18–29. doi:10.1145/3557918.3565865

[23] Kim N, Yoon Y. Effective Urban Region Representation Learning Using Heterogeneous Urban Graph Attention Network (HUGAT); (2022, accessed 17 May 2023). http://arxiv.org/abs/2202.09021

[24] Seong G, Kim N, Kim S, Yoon Y. Multi-modal based region representation learning considering mobility data in Seoul. Procedia Comput Sci. 2023;220:251–8. doi:10.1016/j.procs.2023.03.153

[25] Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-Networks. Epub ahead of print; Aug 2019. doi:10.48550/arXiv.1908.10084

[26] Hagberg A, Swart P, Chult DS. Exploring network structure, dynamics, and function using NetworkX. Los Alamos, NM (United States): Los Alamos National Lab. (LANL); 2008.

[27] Data61 C. StellarGraph Machine Learning Library. GitHub Repository; 2018. https://github.com/stellargraph/stellargraph

[28] Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc: Ser B (Methodol). 1995;57:289–300. doi:10.1111/j.2517-6161.1995.tb02031.x

Received: 2023-05-28

Revised: 2023-08-11

Accepted: 2023-09-05

Published Online: 2023-09-25

This work is licensed under the Creative Commons Attribution 4.0 International License.
