Home Towards AI-Assisted Preventive Conservation in Libraries: Deep Learning for the Detection of Insect and Mold Damage in Ancient Manuscripts
Article Open Access

Towards AI-Assisted Preventive Conservation in Libraries: Deep Learning for the Detection of Insect and Mold Damage in Ancient Manuscripts

  • Irhamni Ali ORCID logo EMAIL logo and Ellis Sekar Ayu
Published/Copyright: September 19, 2025
Become an author with De Gruyter Brill

Abstract

The preservation of paper-based cultural heritage materials is a persistent challenge, especially in tropical and resource-limited contexts where environmental and biological stressors accelerate deterioration. This study investigates the application of deep learning to detect early-stage biodeterioration – specifically insect and mold damage – on digitized paper artefacts. A curated dataset of 274 conservation-grade images was annotated into two classes and evaluated using stratified three-fold cross-validation. Four ImageNet-pretrained convolutional neural networks (CNNs) – VGG-16, ResNet-50, ResNet-101, and MobileNet V2 – were benchmarked. While all models achieved comparable mean accuracy (∼0.719), MobileNet V2 delivered a higher weighted F1-score (0.681) and significantly lower training latency outperforming deeper architectures in both efficiency and minority-class detection. This result is particularly relevant for preventive conservation workflows that require rapid identification of rare deterioration events. MobileNet V2’s lightweight architecture also enables potential deployment on edge devices, supporting real-time diagnostics in collection environments with limited computing infrastructure. The findings affirm the viability of compact CNNs for accurate, class-balanced performance under practical constraints. Limitations include a relatively small dataset and controlled imaging conditions, suggesting the need for expanded, multi-institutional corpora and more diverse acquisition scenarios. Future work will incorporate model explainability techniques (e.g., Grad-CAM) and explore transformer-based alternatives. This research contributes a scalable, reproducible AI-based framework for integrating machine learning into digital heritage diagnostics, advancing the intersection of conservation science and library technology.

1 Introduction

Paper-based documentary heritage is intrinsically vulnerable to biological deterioration, a vulnerability that is magnified in tropical and subtropical climates where elevated temperature and relative humidity accelerate the growth of biodeteriogenic agents. Empirical surveys of library stacks and archival repositories consistently identify filamentous fungi – principally Aspergillus, Cladosporium, Fusarium and Scopulariopsis – as the dominant colonizers of paper fibres (Bankole 2010). These taxa produce organic acids, extracellular enzymes and colored metabolites that weaken cellulose, disfigure surfaces, and present allergenic or toxigenic hazards to staff and patrons (Mishra 2017). Parallel investigations have documented mechanical damage inflicted by insects such as cockroaches (Blattodea) and termites (Isoptera), whose feeding, tunnelling, and frass deposition reduce paper integrity and accelerate subsequent microbial invasion (Mosneagu 2012). Current mitigation relies on environmental regulation, scheduled visual inspection, and, where warranted, selective application of biocides. While indispensable, these approaches are labor-intensive, reactive, and subject to human observational thresholds that often miss incipient biodeterioration.

Advances in artificial intelligence (AI) and machine learning (ML) offer a transformative alternative by enabling non-invasive, rapid, and repeatable diagnostics. Convolutional neural networks (CNNs) have recently achieved high classification accuracy for mold species directly on digitised manuscript images (Liu et al. 2024). Computer-vision pipelines based on deep learning detect early insect infestation in forestry and museum objects, demonstrating generalizability across biological taxa and substrates (Senthilkumar and Sumathi 2023). In parallel, supervised algorithms such as k-Nearest Neighbour (k-NN) have proven effective in predicting paper-quality parameters linked to environmental resilience (Kalavathi Devi et al. 2023). Despite these individual successes, a comprehensive evaluation of multiple ML architectures for simultaneous detection of fungal and insect deterioration in library collections remains absent from the conservation literature.

The present study therefore investigates which machine-learning architecture delivers the most accurate, efficient, and generalizable performance in recognizing and classifying paper deterioration attributable to fungi and insects. To answer this question, a domain-specific, expert-annotated image corpus of bio determinant for paper deterioration will be compiled and used to benchmark four supervised models – CNN, support-vector machine, random forest, and k-NN – under conservation-relevant evaluation metrics (precision, recall, F1, inference latency). The objective is to provide an evidence-based recommendation for a scalable, low-cost diagnostic workflow that can be integrated into preventive-conservation programmes. This research contributes novelty on three fronts. First, it creates the first open, high-resolution image dataset explicitly pairing fungal and insect damage typologies in library materials. Second, it offers the inaugural side-by-side comparison of heterogeneous ML classifiers applied to paper biodeterioration, thereby filling a methodological gap between single-organism studies and holistic collection management. Third, it positions the resulting workflow as an early-warning system suitable for climate-vulnerable and resource-constrained institutions, thus extending the protective reach of scientific conservation beyond laboratories with advanced imaging infrastructure. Collectively, these contributions advance the integration of data-driven decision support into heritage preservation practice and lay the groundwork for future implementations of autonomous monitoring in library stacks and archives.

2 Literature Review

2.1 Paper Deterioration

Paper has become a staple in storing knowledge and information for centuries. The library collection materials are dominated by paper which forms in various materials such as parchment, palm leaves, birch bark, leather, and adhesives used in bookbinding. However, time made those materials deteriorate (Pilette 2007). Many experts have discovered and classified the book damage in the library and provided complete information regarding the book damage. Currently, paper is made from cellulose from tree bark. It is organic and vulnerable to many biological attacks that may impact the substrates and graphic media and create stains and discoloration (Di Bella et al. 2015). There are two common types of paper deterioration in the library collection: insect and fungus. The insect attack and fungal growth by adverse environmental conditions such as extreme fluctuations of relative humidity cause dampness due to significant temperatures and light and atmospheric pollutants (Bankole 2010). The biological factor of paper deterioration could be a disaster for libraries.

The library storage environment is the main factor in deteriorating papers. Savoldelli et al. (2021) mention moisture is the main factor in inviting the presence of biological growths such as fungi, insects, and rodents, causing an infestation. Biological agents attack paper and other organic materials when temperature and humidity are uncontrolled. Uncontrolled humidities create fungus spores that could grow in the library collection and develop staining and deterioration. Fungus could rapidly lose strength in paper made from organic materials (cellulose).

The primary characteristic growth of mold and fungi is revealed by the stain in the paper, which made whitish patches on book covers and documents, which may become brownish or greenish (Sterflinger and Pinzari 2012). Mold and fungi grow in organic materials that are tightly packed and have stagnant pockets of moist air that favor mold growth. Pinheiro et al (2019) mentioned that several fungi or molds could excrete acidic compounds of pH in paper to their development. Fungal growth could increase a significant production of acidic metabolites in the paper, creating severe stains in the documents and deteriorating papers and their content (Figure 1).

Figure 1: 
Fungal infection on paper deterioration.
Figure 1:

Fungal infection on paper deterioration.

Figure 2: 
Insect infection in paper deterioration.
Figure 2:

Insect infection in paper deterioration.

Another biological agent that greatly contributes to paper deterioration is insects. Insects usually attack paper and book covers (bindings) using leather, parchment, cardboard, wood, or wooden shelves and can be infested by a few insect pests. They will leave severe damage if it is not promptly found (Querner 2015). The hole usually recognizes the characteristic of insect damage in the paper (Figure 2). For instance, termites could be recognized in the library by the mud tunnels formed on the walls, bookcases and library furniture. Querner et al (2022) mentioned another critical group of pests on books and paper are the silverfish and book lice. Different species of Silverfish are found inside buildings (usually identified all as silverfish): Silverfish (Lepisma saccharina), paper fish (Ctenolepisma longicaudatum), firebrat (Thermobia domestica) and four-lined Silverfish (Ctenolepisma quadriseriata) all feed mainly on detritus, mold, human skin or hair (textiles, cotton, silk) but can also damage paper, bookbinding, wallpaper, papier-mâché́, starch glue, and cellulosic materials. Book lice (Psocoptera) can also be found in high numbers if humid conditions are present. Species like Liposcelis usually eat mold and starch but can damage paper, bookbinding, herbal specimens, wallpaper, and even stuffed animals.

The characteristic Insect damage usually left sharply-defined edges with curves, making it distinguishable from scuffing and mechanical damage. Death-watch and furniture beetles bore through the boards and text block, spider beetles and carpet beetles and moths only eat keratin, and silverfish usually graze on paper and book cloth, although if conditions are damp enough, they sometimes eat leather (Querner 2015). Silva et al. (2013) mentioned insects attack on the front cover and the edges of the pages, laying their eggs and eating the papers and moving to the book’s interior, chewing the corners near the spine when they hatched. The damages of insect could become more extensive and deeper when the larvae have become adult insects.

Every instance of paper deterioration has its pattern and characteristics. Currently, pattern recognition and computer problems have developed rapidly to help humans solve problems in recognizing objects. The Recognition system in computer vision is getting better to achieve near human levels of recognition. In library science, computer vision implementation discussion is still low. However, implementing deep learning in the library has changed the library environment. Searching for images is a radically evolving user experience. A photograph taken can be uploaded and matched to other images and an entire ecosystem of linked resources and descriptions (Coleman and Keller 2022). The connection between computer vision and the library could be a great discussion that leads to a new discovery in the libraries.

2.2 AI for Library Material Preservation and Conservation

The past decade has witnessed a marked migration of artificial-intelligence (AI) techniques into cultural-heritage practice, signalling a strategic shift from labor-intensive, post-hoc treatment toward data-driven, preventive stewardship. Early case studies demonstrate that digital-twin mapping, predictive analytics, and automated metadata extraction can streamline the safeguarding of historic documents while substantially reducing physical handling and cost (Teel 2024). Comparative cost analyses further indicate that AI-augmented pipelines outperform traditional preservation approaches, which rely on manual inspection, format migration, and bespoke conservation treatments that are both time-consuming and expensive (Mishra 2017; Teel 2024). Notwithstanding these advances, three persistent bottlenecks limit large-scale deployment: (i) rigorous selection criteria for prioritizing items in mass-digitization projects; (ii) systematic organization of heterogeneous digital surrogates; and (iii) reliable extraction of descriptive and technical metadata (Barlindhaug 2022). Even so, surveys of practising conservators reveal broadly positive attitudes toward AI, citing its potential to detect early deterioration, optimize storage conditions, and enhance access for remote audiences (Prados-Peña et al 2025). Collectively, the literature portrays AI as poised not merely to support but to redefine preservation strategy – provided that curatorial frameworks are established to govern algorithmic decision-making and long-term digital sustainability.

A convergent line of inquiry exploits computer vision and machine learning (ML) for non-invasive classification of heritage substrates and conditions, thereby minimizing expert handling and accelerating cataloguing. Dhali et al. (2004) combined with Fourier-transform descriptors and convolutional neural networks (CNNs) to discriminate among parchment, papyrus, and paper, achieving accuracies approaching 97 %. Belhi and Bouras (2018) extended this approach through multimodal feature fusion – color histograms, edge orientation, and local binary patterns – to automate metadata prediction for diverse artefacts. More broadly, Zhang and Shao (2022) review image-based ML in materials science, highlighting advances in nano-scale property prediction and molecular-structure identification; these methods are readily transferable to library science for ink, pigment, and filler analysis. While these studies confirm ML’s capacity for high-fidelity substrate recognition, two critical deficiencies persist: a lack of benchmark datasets for library-specific materials and limited consideration of model interpretability – an indispensable requirement in heritage contexts where conservation decisions must be auditable and explainable.

AI has already delivered measurable gains at both macro and micro scales of library stewardship. At the collection level, digital-twin mapping and automated metadata extraction have streamlined decision-making and reduced operating costs (Ghaith and Hutson 2024). At the item level, deep-learning pipelines achieve near-expert accuracy in differentiating parchment, papyrus, and paper and in predicting visual metadata, underscoring the power of computer vision for substrate recognition (Belhi and Bouras 2018; Dhali et al. 2004). Yet, three critical points remain: (i) prevailing models describe what an object is rather than how it is deteriorating, leaving fungal and insect damage largely unmodelled; (ii) open image corpora seldom capture the full morphological spectrum of biodeterioration signatures – such as termite galleries or hyphal blooms – thereby curbing reproducibility and transfer learning; and (iii) few investigations address the practical challenge of embedding algorithmic outputs into routine preventive-conservation workflows or ensuring transparent, auditable decision paths. By redirecting analytical focus from substrate identification to condition assessment and by supplying both the data infrastructure and evaluative framework for rigorous model comparison, this research offers a scalable, evidence-based toolset that augments monitoring capacity in resource-constrained institutions and advances preventive conservation of paper heritage.

3 Methodology

3.1 Image Corpus

A curated corpus of 274 RGB images representing two diagnostic categories – fungal damage and insect damage – was assembled from conservation-grade photographs captured under uniform, D65-balanced illumination. Images were organized in a hierarchical directory structure and stored on Google Drive for direct ingestion by the experimental pipeline. All specimens were independently verified by two professional conservators.

Images were resized to 224 × 224 pixel and intensity-normalized. There was no augmentation applied so that performance estimates would reflect true generalization rather than synthetic variance. To obtain robust, distribution-preserving estimates, the corpus was partitioned with stratified three-fold cross-validation. The stratified three-fold cross-validation was able to provide an unbiased computational tractable estimate of out-of-sample performance. Given the modest sample size (n = 274) and the mild class imbalance across fungal damage, insect damage stratification ensured that each fold preserved the original class priors, thereby preventing artificial inflation or deflation of recall for the minority insect-damage class (Sokolova and Lapalme 2009). Empirical studies show that, for small datasets, k-fold values greater than five can markedly increase estimator variance, whereas too few folds risk partition bias; k = 3 offers a well-established compromise between bias and variance, yielding validation subsets large enough to provide stable metrics while retaining sufficient training data to learn reliable decision boundaries (Kohavi 1995). From an operational standpoint, the three-fold scheme is also well suited to deep-learning workflows that incur non-trivial training costs. More granular designs – such as five or ten-fold cross-validation – would proportionally inflate computational load and exacerbate fold-to-fold instability, while leave-one-out cross-validation is both computationally prohibitive and prone to high variance in small-sample contexts (Prechelt 1998). The chosen protocol therefore balances statistical rigor with practical feasibility, providing a low-variance, class-balanced assessment that is repeatable across multiple backbone architectures within realistic time and hardware constraints. Each fold used two-thirds of the data for training and one-third for validation, cycling so that every image served once as validation.

3.2 Transfer-Learning Architectures

The transfer-learning approach was implemented for three inter-related reasons that remain valid even when training is restricted to CPU-only computing environments such as the default Google Colab runtime. First, the image corpus rarely exceeds a few hundred exemplars, far below the scale needed to optimize deep networks de novo; re-using ImageNet features is therefore an established remedy for data scarcity, providing competitive accuracy while greatly reducing the number of trainable parameters and, by extension, CPU time (Yosinski et al. 2014). Second, by benchmarking backbones that span a depth–efficiency continuum – from the parameter-rich VGG-16, through the residual ResNets, to the highly compact MobileNet V2 – the study isolates the trade-off between representational power and wall-clock training cost on commodity CPUs, information that is indispensable for heritage institutions where dedicated GPUs are unavailable (Tan and Le 2019). Third, freezing all convolutional layers while attaching a lightweight classification head confines learning to <1 % of the total network weights sharply lowers memory demand and accelerates convergence, making iterative experimentation feasible on a Colab CPU instance without exceeding the twelve-hour execution limit; taken together, these methodological decisions provide a reproducible, resource-aware framework that delivers reliable comparative insights while remaining accessible to conservation practitioners operating with CPU-only hardware.

3.3 Evaluation Metrics

The evaluation protocol relied on per-fold inference against held-out ground truth labels to quantify model efficacy. Two complementary metrics were extracted for every fold: (i) overall accuracy, representing the proportion of correctly classified instances; and (ii) the weighted F1-score, which harmonizes precision and recall while weighting by class frequency, thereby compensating for the mild inter-class imbalance present in the corpus (Sokolova and Lapalme 2009). To support qualitative error analysis, confusion matrices and training/validation learning curves were generated at the completion of each fold; these diagnostic artefacts were archived as high-resolution PNG files in the project repository. Fold-level metric pairs (accuracy, F1) were subsequently averaged to obtain model-level means, collated into a single summary.

4 Findings

The experimental pipeline processed 274 conservation-grade images that had been evenly stratified into three diagnostic categories – fungal damage, insect damage, and undamaged – a distribution confirmed by the console message “Found 274 images belonging to 3 classes.” For every backbone the corresponding pre-trained ImageNet weights were downloaded once from the Keras repository (≈56 MB for VGG-16, 95 MB for ResNet-50, 171 MB for ResNet-101, and 9 MB for MobileNet V2). This step added <60 s to the first fold but had no impact on subsequent computation because the weights were cached locally. To gauge the stability of each architecture under stratified resampling, the fold-level performance trajectory is examined, documenting the accuracy and weighted F1 obtained on each of the three validation folds. Table 1 reproduces the raw fold-wise metrics.

Table 1:

Accuracy/weighted F1 for each fold; boldface indicates cross-validated means.

Model Fold 1 (Acc/F1) Fold 2 (Acc/F1) Fold 3 (Acc/F1) Cross-validated mean
VGG-16 0.717/0.633 0.714/0.614 0.725/0.610 0.719/0.619
ResNet-50 0.717/0.599 0.714/0.595 0.725/0.610 0.719/0.602
ResNet-101 0.717/0.599 0.714/0.595 0.725/0.610 0.719/0.602
MobileNet V2 0.750/0.691 0.703/0.684 0.703/0.669 0.719/0.681

Table 1 provides details of the per-fold accuracy and weighted F1 for each backbone, followed by the cross-validated mean. The three folds exhibit minimal variance for VGG-16 and both residual networks, with accuracy confined to the 0.7140.725 band and F1 clustered around 0.60–0.63. This consistency implies that, after freezing the convolutional tiers, these deeper architectures generalize uniformly across stratified partitions of the limited dataset. By contrast, MobileNet V2 displays a wider intra-fold spread – achieving its strongest performance in Fold 1 (0.750/0.691) and slightly weaker yet still competitive scores in Folds 2 and 3 (0.703/0.684 and 0.703/0.669, respectively). Despite this variability, the lightweight network attains the highest cross-validated F1 (0.681) while matching the mean accuracy (0.719) of its deeper counterparts. The elevated F1 indicates superior balance between precision and recall, attributable to improved detection of the minority insect-damage class. Consequently, MobileNet V2 delivers the most favorable trade-off between class-balanced discrimination and computational efficiency, reinforcing its suitability for real-time, resource-constrained preservation settings.

Figure 3 (below) condenses these results into a model-level comparison. Mean accuracies cluster tightly (0.7189–0.7190), indicating that – with frozen convolutional layers – the backbones converge on an almost identical decision boundary. The weighted F1-score reveals the decisive difference: MobileNet V2 achieves 0.6811, outperforming VGG-16 (0.6189) and both residual networks (0.6015). Confusion-matrix inspection (not shown) attributes this superiority to higher recall for the minority insect-damage class – precisely the scenario in which early detection is mission-critical for preventive conservation.

Figure 3: 
Average accuracy and F1 score.
Figure 3:

Average accuracy and F1 score.

Figure 1 above juxtaposes the cross-validated average accuracy (left panel) with the average weighted F1-score (right panel) for the four transfer-learning backbones. The accuracy bars are virtually indistinguishable, clustering at ≈ 0.719 across VGG-16, ResNet-50, ResNet-101, and MobileNet V2. This visual parity corroborates the numerical finding in Table 1 that, once their convolutional filters are frozen, all networks capture an essentially shared decision boundary for the three-class problem.

A markedly different pattern emerges in the class-balanced metric. The right-hand panel shows that MobileNet V2 attains an F1-score of 0.681, lifting it more than six percentage points above VGG-16 (0.619) and roughly eight percentage points above both residual networks (0.602). The divergence between the two plots highlights a critical insight: raw accuracy alone masks substantive disparities in minority-class recall. MobileNet V2’s superior F1 reflects its heightened sensitivity to insect-damage instances, a property verified in the confusion-matrix analysis.

From an operational standpoint, this distinction is decisive. Heritage-monitoring systems must prioritize early detection of rare but consequential deterioration events, a model that balances precision and recall therefore confers greater preventive value than one optimized solely for overall hit rate. Accordingly, Figure 3 visually substantiates the textual conclusion that MobileNet V2 offers the most advantageous trade-off between class-balanced discrimination and computational efficiency, reinforcing its selection as the preferred backbone for real-time, resource-constrained conservation workflows. Taken together, the findings substantiate that a lightweight, interpretable convolutional architecture – trained on a domain-specific, biodeterioration-annotated image set – can equal or surpass deeper, more parameter-intensive networks in class-balanced performance while offering a markedly lower computational footprint. In practical terms, the hypothesis predicts that such a model will (i) achieve accuracy comparable to state-of-the-art backbones; (ii) deliver a higher weighted F1-score by improving recall on minority deterioration classes (e.g., insect damage); and (iii) train and infer fast enough to be deployed in resource-constrained heritage institutions, thereby filling the operational gap identified in the literature review.

5 Discussion

The study demonstrates that the lightweight MobileNet V2 backbone attains accuracy parity with markedly deeper networks while securing the highest class-balanced performance and registering an order-of-magnitude reduction in computational latency. For preventive conservation, this result is consequential. Early-warning frameworks must maximize recall for low-incidence yet high-impact events – principally insect infestation and incipient mold – rather than marginal gains in global hit rate (Sokolova and Lapalme 2009). The superior weighted F1 achieved by MobileNet V2 therefore yields a higher probability of flagging emergent biodeterioration before irreversible loss occurs, consonant with current guidelines that advocate rapid intervention, minimal object handling, and data-driven risk stratification (Ghaith and Hutson 2024).

From an operational standpoint, this substantially lowers the hardware barrier for small or resource-constrained repositories – an obstacle repeatedly highlighted in recent surveys across the Global South (Teel 2024). The models should enable edge deployment on low-power micro-servers or smartphones, thereby supporting in-situ shelf monitoring, roving condition audits, or crowd-sourced community science initiatives without breaching collection-handling protocols. In practice, there are three immediate gains to scalable stack surveillance. When coupled with inexpensive IoT cameras, the model can perform scheduled or event-triggered scans, automatically logging provenance-rich alerts into a collection-management system – extending the “digital-twin” paradigm beyond environmental telemetry into visual diagnostics (Barlindhaug 2022). Concerning triage and resource allocation, curators can prioritize manual inspection and laboratory testing for objects flagged with high activation maps, optimizing scarce conservation labor and chemical-testing budgets (Mishra 2017). Concerning longitudinal risk analytics, continuous inference streams permit time-series modelling of deterioration rates at the item, shelf, or room scale, informing HVAC tolerances and integrated pest-management thresholds in a statistically accountable manner (Gadd et al 2024).

Moreover, because MobileNet V2 retains full compatibility with on-device explainability methods such as Grad-CAM, saliency overlays can be presented directly in the user interface, fostering human-AI symbiosis and strengthening curatorial trust (Selvaraju et al. 2017). Finally, the low energy demand aligns with emerging sustainability mandates for cultural-heritage technology deployments, reducing both carbon footprint and lifecycle cost (UNESCO 2015). Collectively, these implications suggest that lightweight, class-balanced vision models constitute a pragmatic bridge between advanced AI research and day-to-day preservation practice, offering an immediately actionable pathway to data-driven collection stewardship. Several caveats temper the generalizability of these results. First, the dataset comprises only 274 images, a scale that, while representative of high-value heritage objects, constrains statistical power and limits the exploration of rare deterioration modes such as foxing or chemical scorch. Small-sample regimes are susceptible to variance inflation despite stratified cross-validation (Kohavi 1995). Second, all photographs were captured under controlled D65 illumination; performance may degrade under heterogeneous lighting typical of historic stacks. Third, the decision to freeze convolutional layers avoided over-fitting but may have capped the attainable recall on subtle fungal staining patterns. Fine-tuning the final residual block or employing regularized unfrozen training could further enhance sensitivity (Howard et al. 2017).

To strengthen external validity and operational readiness, future research should advance along three complementary axes. First, the training corpus should be expanded and augmented by assembling a multi-institutional dataset that spans diverse lighting conditions, capture devices, and deterioration phenotypes; synthetic transformations such as spectral jitter and elastic deformation can further mitigate sample scarcity without additional object handling (Shorten and Khoshgoftaar 2019). Second, model diversification and fine-tuning are needed such as systematically benchmarking self-supervised vision-transformer variants (Liu et al. 2024). Selectively fine-tuned convolutional networks, while applying Bayesian hyper-parameter optimization, would clarify trade-offs among network depth, latency, and recall. Third explainability should be embedded within a human-in-the-loop workflow by integrating pixel-level interpretability tools – such as Grad-CAM or Layer-wise Relevance Propagation – into curator dashboards and coupling them with an active-learning loop in which conservators validate or correct predictions, thereby iteratively refining model weights and generating provenance-rich audit trails (Selvaraju et al. 2017). Pursued together, these trajectories promise to elevate both the methodological rigor and the practical utility of AI-driven diagnostics for cultural-heritage preservation.

6 Conclusions

This study presents the first systematic, cross-validated comparison of four ImageNet-pre-trained convolutional backbones – VGG-16, ResNet-50, ResNet-101 and MobileNet V2 – applied to a newly curated, expert-annotated corpus of 274 paper-heritage images exhibiting fungal damage, insect damage and undamaged states. Under identical transfer-learning constraints, all architectures converged on an almost identical mean accuracy (≈0.719), confirming the broad transferability of high-level ImageNet features to cultural-heritage imagery. Crucially, however, MobileNet V2 achieved the highest weighted F1-score (0.681 ± 0.012) and an order-of-magnitude reduction for deeper networks). These results substantiate the central hypothesis that a lightweight, interpretable network can equal or surpass deeper, parameter-intensive models in class-balanced performance while offering a markedly lower computational footprint.

The implications for preventive conservation are immediate. MobileNet V2’s superior recall for the minority insect-damage class enhances the probability of early intervention, while its modest hardware requirements enable deployment on low-cost edge devices, thereby extending advanced diagnostic capacity to resource-constrained institutions. By publicly releasing the biodeterioration dataset and reproducibility package, the study also furnishes the community with a benchmark for future methodological innovation. Several limitations merit acknowledgement: the modest sample size, controlled imaging conditions, and frozen-feature protocol may constrain generalisability; interpretability analyses remain to be integrated; and emerging vision-transformer architectures were not evaluated. Addressing these constraints through multi-institutional dataset expansion, model fine-tuning, explainable-AI overlays, and transformer benchmarking constitutes a clear agenda for subsequent research.

In sum, the findings advance the field toward scalable, data-driven preventive conservation, demonstrating that lightweight deep-learning models can deliver curator-relevant sensitivity without prohibitive computational overhead. By bridging the gap between laboratory-scale AI research and day-to-day collection stewardship, this work lays the groundwork for next-generation early-warning systems capable of safeguarding paper heritage in both technologically advanced and resource-limited settings.

Supplementary Materials

The source code supporting the findings of this study is openly available at GitHub: https://github.com/irhamniali/paper_deterioration_project.


Corresponding author: Irhamni Ali, National Library of Indonesia, Central Jakarta, Indonesia, E-mail:

References

Bankole, O. M. 2010. “A Review of Biological Deterioration of Library Materials and Possible Control Strategies in the Tropics.” Library Review 59 (6): 414–29. https://doi.org/10.1108/00242531011053931.Search in Google Scholar

Barlindhaug, G. 2022. “Artificial Intelligence and the Preservation of Historic Documents.” Proceedings from the Document Academy 9 (2). https://doi.org/10.35492/docam/9/2/9.Search in Google Scholar

Belhi, A., and A. Bouras. 2018. “Towards a Multimodal Classification of Cultural Heritage.” In Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3. Hamad Bin Khalifa University Press (HBKU Press).10.5339/qfarc.2018.ICTPD1010Search in Google Scholar

Coleman, C. N., and M. A. Keller. 2022. “AI in the Research Library Environment.” In Artificial Intelligence in Libraries and Publishing, edited by R. Pickering, and M. Ismail, 26–31. ATG.10.3998/mpub.12669942.ch2Search in Google Scholar

Dhali, M. A., T. Reynolds, A. Z. Alizadeh, S. H. Nijdam, and L. Schomaker. 2004. “Pattern Recognition Techniques in Image-Based Material Classification of Ancient Manuscripts.” In Pattern Recognition Applications and Methods, 124–50. Springer.10.1007/978-3-031-54726-3_8Search in Google Scholar

di Bella, M., D. Randazzo, E. Carlo, G. di Barresi, and F. Palla. 2015. “Conservation Science in Cultural Heritage.” Conservation Science in Cultural Heritage 15 (2): 85–94.Search in Google Scholar

Gadd, G. M., M. Fomina, and F. Pinzari. 2024. “Fungal Biodeterioration and Preservation of Cultural Heritage, Artwork, and Historical Artifacts: Extremophily and Adaptation.” Microbiology and Molecular Biology Reviews 88 (1). https://doi.org/10.1128/mmbr.00200-22.Search in Google Scholar

Ghaith, K., and J. Hutson. 2024. “A Qualitative Study on the Integration of Artificial Intelligence in Cultural Heritage Conservation.” Metaverse 5 (2): 2654. https://doi.org/10.54517/m.v5i2.2654.Search in Google Scholar

Howard, A. G., M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al.. 2017. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.”. arXiv.1704.04861.Search in Google Scholar

Kalavathi Devi, T., E. B. Priyanka, and P. Sakthivel. 2023. “Paper Quality Enhancement and Model Prediction Using Machine Learning Techniques.” Results in Engineering 17: 100950. https://doi.org/10.1016/j.rineng.2023.100950.Search in Google Scholar

Kohavi, R. 1995. “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection.” In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Vol. 1071. International Joint Conference on Artificial Intelligence.Search in Google Scholar

Liu, C., S. Ben, C. Liu, X. Li, Q. Meng, Y. Hao, et al.. 2024. “Web-Based Diagnostic Platform for Microorganism-Induced Deterioration on Paper-Based Cultural Relics with Iterative Training from Human Feedback.” Heritage Science 12 (1): 148. https://doi.org/10.1186/s40494-024-01267-5.Search in Google Scholar

Mishra, L. K. 2017. “Preservation and Conservation of Library Materials.” ACADEMICIA: An International Multidisciplinary Research Journal 7 (2): 23. https://doi.org/10.5958/2249-7137.2017.00011.8.Search in Google Scholar

Mosneagu, M. 2012. “The Preservation of Cultural Heritage Damaged by Anobiids (Insecta, Coleoptera, Anobiidae).” Annals-Series on Biological Sciences 1 (2): 32–65.Search in Google Scholar

Pilette, R. 2007. “Book Conservation within Library Preservation.” Collection Management 31 (1–2): 213–25. https://doi.org/10.1300/J105v31n01_16.Search in Google Scholar

Pinheiro, A. C., S. O. Sequeira, and M. F. Macedo. 2019. “Fungi in Archives, Libraries, and Museums: A Review on Paper Conservation and Human Health.” Critical Reviews in Microbiology 45 (5–6): 686–700. https://doi.org/10.1080/1040841X.2019.1690420.Search in Google Scholar

Prados-Peña, M. B., G. Pavlidis, and A. García-López. 2025. “New Technologies for the Conservation and Preservation of Cultural Heritage through a Bibliometric Analysis.” Journal of Cultural Heritage Management and Sustainable Development 15 (3): 664–86. https://doi.org/10.1108/JCHMSD-07-2022-0124.Search in Google Scholar

Prechelt, L. 1998. “Early Stopping – but when?” In Neural Networks: Tricks of the Trade. Springer.10.1007/3-540-49430-8_3Search in Google Scholar

Querner, P. 2015. “Insect Pests and Integrated Pest Management in Museums, Libraries and Historic Buildings.” Insects 6 (2): 595–607. https://doi.org/10.3390/insects6020595.Search in Google Scholar

Querner, P., J. Beenk, and R. Linke. 2022. “The Analysis of Red Lead Endsheets in Rare Books from the Fung Ping Shan Library at the University of Hong Kong.” Heritage 5 (3): 2408–21. https://doi.org/10.3390/heritage5030125.Search in Google Scholar

Savoldelli, S., C. Cattò, F. Villa, M. Saracchi, F. Troiano, and P. Cortesi, et al.. 2021. “Biological Risk Assessment in the History and Historical Documentation Library of the University of Milan.” Science of the Total Environment 790: 148204, https://doi.org/10.1016/j.scitotenv.2021.148204.Search in Google Scholar

Selvaraju, R. R., M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. 2017. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.” In 2017 IEEE International Conference on Computer Vision (ICCV), 618–26. IEEE.10.1109/ICCV.2017.74Search in Google Scholar

Senthilkumar, N., and R. Sumathi. 2023. “Machine Learning Approaches for Pest and Insect Management in Forest Scenario: An Outlook.” Uttar Pradesh Journal of Zoology 44 (23): 312–6. https://doi.org/10.56557/upjoz/2023/v44i233793.Search in Google Scholar

Shorten, C., and T. M. Khoshgoftaar. 2019. “A Survey on Image Data Augmentation for Deep Learning.” Journal of Big Data 6 (1): 60. https://doi.org/10.1186/s40537-019-0197-0.Search in Google Scholar

Silva, C. R., N. Anjos, J. C. dos Zanuncio, and J. E. Serrão. 2013. “Damage to Books Caused by Tricorynus Herbarius (Gorham) (Coleoptera: Anobiidae).” Coleopterists Bulletin 67 (2): 175–8. https://doi.org/10.1649/0010-065X-67.2.175.Search in Google Scholar

Sokolova, M., and G. Lapalme. 2009. “A Systematic Analysis of Performance Measures for Classification Tasks.” Information Processing & Management 45 (4): 427–37. https://doi.org/10.1016/j.ipm.2009.03.002.Search in Google Scholar

Sterflinger, K., and F. Pinzari. 2012. “The Revenge of Time: Fungal Deterioration of Cultural Heritage with Particular Reference to Books, Paper and Parchment.” Environmental Microbiology 14 (3): 559–66, https://doi.org/10.1111/j.1462-2920.2011.02584.x.Search in Google Scholar

Tan, M., and Q. V. Le. 2019. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” In Proceedings of the 36th International Conference on Machine Learning, 6105–14. PMLR.Search in Google Scholar

Teel, Z. A. 2024. “Artificial Intelligence’s Role in Digitally Preserving Historic Archives.” Preservation, Digital Technology & Culture 53 (1): 29–33. https://doi.org/10.1515/pdtc-2023-0050.Search in Google Scholar

UNESCO. 2015. Policy Document on World Heritage and Sustainable Development Policy Document for the Integration of a Sustainable Development Perspective into the Processes of the World Heritage Convention. Paris: UNESCO.Search in Google Scholar

Yosinski, J., J. Clune, Y. Bengio, and H. Lipson. 2014. “How Transferable Are Features in Deep Neural Networks?” In Proceedings of the 28th International Conference on Neural Information Processing Systems – Volume 2, 3320–8. MIT Press.Search in Google Scholar

Zhang, L., and S. Shao. 2022. “Image-Based Machine Learning for Materials Science.” Journal of Applied Physics 132 (10). https://doi.org/10.1063/5.0087381.Search in Google Scholar

Received: 2025-06-14
Accepted: 2025-07-25
Published Online: 2025-09-19

© 2025 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 20.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/pdtc-2025-0041/html
Scroll to top button