Abstract
This study examines the potential of ChatGPT, notable for its skill in standard modern languages, in the context of ancient Chinese. We focus on two fundamental tasks: translating ancient Chinese into modern Chinese and recognizing person names in ancient Chinese. We conduct experiments on Shi Shuo Xin Yu, a collection of anecdotes, short conversations, and pithy observations on personalities who lived in China between about 150 and 420 A.D. We also include baseline methods, namely Baidu's ERNIE-Bot for ancient-to-modern translation and SIKU-RoBERTa for ancient Chinese name recognition. The results reveal that ChatGPT's performance on ancient Chinese still needs improvement; it performs best on ancient-to-modern translation when a few context sentences are provided. For ancient Chinese name recognition, ChatGPT can outperform the SIKU-RoBERTa model in terms of precision, but suffers from low recall.
1 Introduction
Ancient languages serve as repositories of humanity's historical and cultural heritage. As a treasure and important legacy of ancient Chinese culture (Lee and Wong 2012; Si et al. 2019), ancient Chinese carries significant cultural and historical value for the study of natural language processing.
Recently, considerable advances have been made in the analysis and interpretation of ancient Chinese through the use of deep learning models (Anderson et al. 2023; Cheng et al. 2020; Tang, Lin, and Li 2022; Tian et al. 2021; Wang et al. 2023a). However, understanding ancient Chinese remains a very challenging task in natural language processing due to its complex grammatical structures, cultural nuances, and pervasive polysemy (Guo et al. 2023; Jin, Zhao, and Liu 2023; Yu and Huangfu 2019).
Deep learning methods have achieved significant progress in many natural language processing (NLP) tasks (Brown et al. 2020; Devlin et al. 2018; Si et al. 2020). Notably, large language models like ChatGPT have shown powerful generation and understanding capabilities across many languages (Lai et al. 2023; Liu et al. 2023; Sundararaman et al. 2020), and have attracted wide attention and application around the world since their release (Fang et al. 2023; Wang and Demszky 2023; Wu et al. 2023a).
However, there are few studies on the use of ChatGPT for ancient Chinese processing. Jin, Zhao, and Liu (2023) evaluate the performance of several machine translation methods, including ChatGPT. Lan (2023) systematically examines ChatGPT at six cognitive levels on the Book of Changes (易经, Yi Jing). Following these works, we study the capability of ChatGPT on ancient-to-modern Chinese translation and person name recognition (Zhang et al. 2021) in ancient Chinese text. We leverage the ancient Chinese book Shi Shuo Xin Yu (世说新语) to assess ChatGPT's ability to understand ancient Chinese through two tasks: translating ancient Chinese into modern Chinese and recognizing person names in ancient Chinese.
In this paper, we evaluate the capability of ChatGPT on an ancient Chinese book, Shi Shuo Xin Yu, which has been largely ignored by previous research. We also study the performance of ChatGPT on ancient-to-modern translation while varying the input length of each individual query. Additionally, person name recognition has rarely been explored in ancient Chinese processing. To summarize, our contributions are as follows:
We evaluate the ancient-to-modern Chinese translation of ChatGPT on Shi Shuo Xin Yu and observe that the translation quality depends on the context length. In our experiments, ChatGPT performs best on ancient-to-modern translation when a few consecutive sentences are fed into the prompt instruction.
We also investigate the capability of ChatGPT on ancient person name recognition, finding that ChatGPT outperforms the commonly used Jieba toolkit and that its performance improves as the number of demonstrations increases. ChatGPT achieves high precision on ancient Chinese name recognition, but suffers from low recall.
The article is organized as follows. Section 2 presents related work on ChatGPT and ancient Chinese processing. Section 3 describes the experimental setup and implementation. Section 4 presents the experimental results and analysis. Section 5 concludes with a discussion.
2 Related Works
ChatGPT and other large language models (LLMs) have garnered significant attention due to their impressive capabilities (Ciesla 2024). However, these large models are not perfect: they may produce inconsistent or nonsensical responses, or generate information that appears factual but is entirely fabricated. Many studies have therefore explored and evaluated the performance of ChatGPT on a variety of tasks and examinations. We review the related work in two subsections.
2.1 Assessing the Capabilities of ChatGPT
Many studies explore the capabilities of ChatGPT across diverse fields. Ogundare and Araya (2023) use spontaneous quality (SQ) scores to compare ChatGPT with other mainstream algorithms on many natural language processing (NLP) tasks such as machine translation, summarization, question answering, and language generation. George and George (2023) explore how ChatGPT can enhance e-commerce through chat and benefit other sectors such as education, entertainment, finance, health, news, and productivity, and how it can be used to create more personalized content for users and help businesses improve the efficiency and effectiveness of customer service. Rezayi et al. (2023) explore the new field of agricultural natural language processing by studying the effectiveness of pre-training transformer-based language models on food-related text corpora. Wang et al. (2023b) investigate ChatGPT's ability to infer dynamic network structures from temporal text data, especially financial news. Khondaker et al. (2023) conduct a large-scale evaluation of ChatGPT on a wide range of Arabic NLP tasks. Wu et al. (2023b) evaluate ChatGPT's medical knowledge on the Chinese National Medical Licensing Examination (CNMLE).
Wang et al. (2023c), taking document-level machine translation as an experimental platform, conduct an in-depth evaluation of ChatGPT's discourse modeling ability in three aspects: the effect of discourse-aware prompts, a comparison of translation models, and an analysis of discourse modeling ability. Zhang, Ouni, and Eger (2024) explore ChatGPT's ability to summarize and evaluate on cross-lingual cross-temporal summarization (CLCTS) tasks. Ronanki, Cabrero-Daniel, and Berger (2023) explore the application of ChatGPT to user story quality assessment and compare its performance to existing benchmarks. Sun et al. (2023) propose a set of generic modules, attempting to push the limits of ChatGPT on various NLP tasks, including question answering, commonsense reasoning, and natural language inference. Antoun et al. (2023) propose a method for developing and evaluating ChatGPT detectors for French text, focusing on their robustness to out-of-domain data and common attack schemes. Wang and Demszky (2023) explore whether ChatGPT can be a cost-effective supplement to expert feedback by acting as an automated teacher coach.
2.2 Application of Large Language Models in Ancient Chinese
However, there is relatively little research on ChatGPT's understanding of ancient Chinese. Research on the understanding of ancient Chinese by language models has mainly been based on the BERT model. Wang et al. (2022) and Chang et al. (2023) build the pre-trained language models SikuBERT and SikuGPT for intelligent processing of ancient texts based on the BERT framework, using the verified high-quality full-text corpus of Si Ku Quan Shu as the training set. Other researchers built GuwenBERT, based on the RoBERTa model and trained on a large corpus of ancient literature. Tsinghua University has constructed BERT-CCPoem,[1] a BERT-based pre-trained model specifically for ancient Chinese poetry, which is trained on the complete collection of ancient Chinese poetry CCPC (Li et al. 2021) and can be used for intelligent poetry retrieval, recommendation, and sentiment analysis. Wang et al. (2023d) introduce two language models, GujiBERT and GujiGPT, which are foundational models specifically designed for intelligent information processing of ancient texts. Chang et al. (2022) propose SIKU-RoBERTa, pre-trained on the high-quality full-text corpus of Si Ku Quan Shu, and study part-of-speech tagging on ancient Chinese texts.
3 Experiments
This section details the experimental setup, including the datasets and the Python code for calling the ChatGPT API.
3.1 Data Preparation
The main data come from the GitHub open-source project NiuTrans/Classical-Modern,[2] a comprehensive ancient-to-modern Chinese parallel corpus covering a large number of classical works. The dataset organizes each ancient book by chapter, with the text stored in txt files under each chapter. The bilingual data are aligned at the sentence level and provided in three formats: original text, translation, and bilingual.
In this study, we mainly use the bilingual data of Shi Shuo Xin Yu, comprising 36 chapters, to meet the needs of our experiments. Most articles in this book are biographical anecdotes about famous people in ancient China, which makes it suitable for both ancient-to-modern Chinese translation and person name recognition.
Due to the input length limitation of ChatGPT, we split the whole book into 3,923 sentences and feed the model one segment at a time. For the ancient-to-modern Chinese translation task, we include all sentences; for the ancient Chinese name recognition task, we randomly select 300 sentences due to time constraints and manually label the Chinese names in them. A few examples are shown in Table 1.
Table 1: A few examples used in the experiments. The sentences are extracted from Shi Shuo Xin Yu, and the Chinese names are manually labeled by the authors.
| Sentence | Names |
|---|---|
| 孙秀既恨石崇不与绿珠, 又憾潘岳昔遇之不以礼。 | 孙秀、石崇、潘岳 |
| 后秀为中书令, 岳省内见之, 因唤曰: 孙令, 忆畴昔周旋不? | 孙秀、潘岳 |
| 岳于是始知必不免。 | 潘岳 |
| 冀神理绵绵常, 不与气运俱尽耳! | No names |
3.2 Access to ChatGPT
ChatGPT, a pre-trained large language model, simply takes a text input following a prompt and provides a response. ChatGPT supports HTTP requests from multiple programming languages. In this experiment, we access ChatGPT via the openai module in Python. Many pre-trained models are available through this module, such as gpt-4o, gpt-4, and gpt-3.5-turbo. In these experiments, we choose gpt-3.5-turbo and gpt-4o, because they are the most popular variants.
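The snippet below is a minimal sketch of this access pattern, written against the v1 interface of the openai package; the helper `ask_chatgpt` and its defaults are ours, not code published with the paper.

```python
# Minimal sketch: query ChatGPT through the openai Python module
# (OpenAI SDK v1 interface). The helper name ask_chatgpt is ours.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_chatgpt(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a single prompt and return the text of the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```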
3.3 Translation of Ancient Chinese by ChatGPT
Once we have access to ChatGPT in Python, we can simply design a sensible prompt to perform the ancient-to-modern Chinese translation task. We tried a variety of prompt formulations, such as "Please translate the following ancient Chinese:", "Translate the ancient Chinese after the colon into modern Chinese:", and "Translate the ancient Chinese into modern Chinese". We found that the prompt must include "translation" and "modern Chinese" for the task to be performed robustly; otherwise, the output of ChatGPT may contain redundant or incorrect information. We also found that even when we input only ancient Chinese sentences without any instruction, ChatGPT can sometimes return correct translations in modern Chinese.
After many trials, we chose the straightforward "将文言文翻译成现代汉语" ("translate ancient Chinese into modern Chinese") as the prompt instruction for the translation task. This prompt almost never produces erroneous or superfluous output, and we extract the text content that contains only the modern Chinese translation. Because of the input length constraints set by OpenAI, it is unfeasible to ask ChatGPT to translate the entire Shi Shuo Xin Yu at once. Consequently, we use a simple for loop to sequentially translate preloaded segments of the book.
To systematically evaluate its performance, we vary the size of the input, that is, the amount of text to be translated per query. We start with a single sentence to observe how well the model performs a basic translation task, and then gradually increase the input length to three, five, and eight sentences. This gradual scaling challenges the model, especially in maintaining cohesiveness and continuity when translating longer passages. A sketch of this loop is shown below.
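The following sketch shows one way to implement this translation loop under our assumptions: the sentence list and the helper `ask_chatgpt` come from the earlier steps, and the variable names are ours.

```python
# Sketch of the translation loop: group the pre-split sentences into
# chunks of n (n = 1, 3, 5, or 8) and translate each chunk with the
# prompt chosen above, reusing ask_chatgpt from the earlier snippet.
PROMPT = "将文言文翻译成现代汉语:\n"

def translate_book(sentences: list[str], n: int) -> list[str]:
    """Translate the book n sentences at a time; one reply per chunk."""
    translations = []
    for i in range(0, len(sentences), n):
        chunk = "".join(sentences[i : i + n])
        translations.append(ask_chatgpt(PROMPT + chunk))
    return translations
```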
For the ancient-to-modern translation task, we evaluate the performance of ChatGPT with BLEU (Papineni et al. 2002) and BERTScore (Zhang et al. 2019). BLEU is the de facto standard for machine translation evaluation because it is easy to calculate regardless of the languages involved: it compares n-grams of the candidate translation with n-grams of the reference translation and counts the matches; the more matches, the better the candidate translation. Unlike BLEU, which mainly measures token- or phrase-level overlap between hypothesis and reference, BERTScore captures the semantic aspect by using contextualized embeddings generated by the BERT model.
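As a concrete illustration, the sketch below computes both metrics for one sentence pair under two assumptions of ours: BLEU is computed at the character level (a common choice for Chinese, but not a detail stated in the paper), and BERTScore comes from the bert_score package with its Chinese model selected via `lang="zh"`.

```python
# Hedged sketch of the evaluation: character-level n-gram BLEU via NLTK
# and semantic similarity via the bert_score package.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score

def evaluate(candidate: str, reference: str) -> dict:
    """Return 1/2/3-gram BLEU and BERTScore P/R/F1 for one sentence pair."""
    smooth = SmoothingFunction().method1
    cand, ref = list(candidate), list(reference)  # character tokens
    bleu = {
        f"{n}-g": sentence_bleu(
            [ref], cand,
            weights=tuple(1.0 if i == n - 1 else 0.0 for i in range(4)),
            smoothing_function=smooth,
        )
        for n in (1, 2, 3)
    }
    P, R, F1 = bert_score([candidate], [reference], lang="zh")
    return {**bleu, "P": float(P[0]), "R": float(R[0]), "F1": float(F1[0])}
```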
3.4 Person Name Recognition from Ancient Chinese Literature
Besides translation, we assess the competence of ChatGPT in recognizing names referenced within ancient Chinese text. It is noteworthy that the effectiveness of ChatGPT in this task is contingent upon the prompt instruction. We use a trial-and-error approach to find a suitable prompt instruction, "已知文言文的原文如下, 找出该文言文中的人名", which means "Given the following ancient Chinese text, find the person names in it".
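A minimal sketch of the zero-shot extraction with this prompt follows; parsing the reply by splitting on the Chinese enumeration comma "、" is our assumption about the output format, and in practice the reply may need more robust post-processing.

```python
# Sketch of zero-shot name extraction with the prompt above, reusing
# ask_chatgpt. The reply-parsing convention is our assumption.
NER_PROMPT = "已知文言文的原文如下, 找出该文言文中的人名:\n"

def extract_names(sentence: str) -> set[str]:
    """Ask ChatGPT for person names and parse them into a set."""
    reply = ask_chatgpt(NER_PROMPT + sentence)
    normalized = reply.replace("，", "、").replace(",", "、")
    return {name.strip() for name in normalized.split("、") if name.strip()}
```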
The Jieba module in Python is widely employed for word segmentation and rudimentary named entity recognition in Chinese texts, so we include it as a baseline method for recognizing person names in ancient Chinese text. SIKU-RoBERTa is a pre-trained language model based on the high-quality full-text corpus of Si Ku Quan Shu; a word-segmented and tagged portion of Zuo Zhuan was used as the training set to build a BERT-based deep network model for part-of-speech (POS) tagging. We also include SIKU-RoBERTa as a baseline for ancient Chinese name recognition. For this task, we evaluate the performance of ChatGPT with the commonly used metrics of precision, recall, and F1 score (Jiang, Banchs, and Li 2016), as sketched below.
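The Jieba baseline and the metric computation can be sketched as follows: jieba's POS tagger labels person names with the flag "nr", which we take as its name predictions, and the set-based precision/recall/F1 is our reading of the standard definitions.

```python
# Jieba baseline and set-based metrics, as a minimal sketch.
import jieba.posseg as pseg

def jieba_names(sentence: str) -> set[str]:
    """Collect tokens that jieba's POS tagger marks as person names ('nr')."""
    return {word for word, flag in pseg.cut(sentence) if flag == "nr"}

def precision_recall_f1(predicted: set[str], gold: set[str]):
    """Set-based precision, recall, and F1 over extracted names."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```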
4 Results and Analysis
In this section, we present the experimental results and provide an in-depth analysis.
4.1 Assessing the Efficacy of ChatGPT in the Translation of Ancient Chinese Texts
As mentioned in Section 3, we deploy ChatGPT to convert sentences from the ancient Chinese text Shi Shuo Xin Yu into modern Chinese in configurations of 1, 3, 5, and 8 sentences per query. The translations are then evaluated and compared using the quantitative metrics BLEU and BERTScore. Table 2 reports the BLEU scores of ancient-to-modern Chinese translation by ChatGPT and Baidu's ERNIE-Bot under four configurations: one sentence (1-sent.), three sentences (3-sent.), five sentences (5-sent.), and eight sentences (8-sent.) per individual query. We evaluate the BLEU scores at three levels of n-gram granularity: unigram (1-g), bigram (2-g), and trigram (3-g). From Table 2, ChatGPT is on par with ERNIE-Bot on ancient-to-modern Chinese translation in terms of 2-g BLEU, with both GPT-3.5 and ERNIE-Bot reaching 0.20 when fed three sequential ancient Chinese sentences (3-sent.). This suggests that contextual information enhances the translation quality from ancient to modern Chinese. Interestingly, the translation quality does not improve further when the context increases to eight sentences; a possible reason is the increased complexity of longer pieces of ancient Chinese text. Another important finding is that the BLEU scores are still very low, with 3-g BLEU at or near zero in all four settings, which means that both ChatGPT variants and ERNIE-Bot still perform very poorly on ancient-to-modern Chinese translation in terms of higher-order n-gram overlap.
Table 2: The 1-g, 2-g, and 3-g BLEU scores of the translated modern Chinese texts from GPT-3.5, GPT-4o, and ERNIE-Bot under four settings: one sentence (1-sent.), three sentences (3-sent.), five sentences (5-sent.), and eight sentences (8-sent.) per query. The reported values are multiplied by 100. The bold values indicate the best performance across the four contextual lengths.
| | GPT-3.5 | | | GPT-4o | | | ERNIE-Bot | | |
|---|---|---|---|---|---|---|---|---|---|
| | 1-g | 2-g | 3-g | 1-g | 2-g | 3-g | 1-g | 2-g | 3-g |
| 1-sent. | **26.86** | 0.09 | 0.00 | **30.19** | 0.12 | 0.00 | **31.00** | 0.11 | 0.00 |
| 3-sent. | 22.60 | **0.20** | 0.00 | 23.58 | **0.17** | 0.00 | 27.00 | **0.20** | 0.00 |
| 5-sent. | 20.11 | 0.19 | 0.00 | 21.75 | **0.17** | **0.01** | 23.57 | 0.12 | 0.00 |
| 8-sent. | 18.22 | 0.14 | 0.00 | 18.86 | 0.12 | 0.00 | 20.20 | 0.14 | **0.01** |
Table 3 shares the same structure as Table 2, but reports the BERTScore precision, recall, and F1 of ancient-to-modern Chinese translation by the two ChatGPT variants and Baidu's ERNIE-Bot. Both ChatGPT variants perform best when translating three consecutive ancient Chinese sentences (3-sent.), with GPT-3.5 achieving an F1 score of 77.87 and GPT-4o an F1 score of 81.65. Baidu's ERNIE-Bot surpasses GPT-4o significantly, reaching an F1 score of 87.59 when translating five consecutive ancient Chinese sentences (5-sent.). This is understandable, as ERNIE-Bot was released by the Chinese company Baidu and may have been pre-trained on a larger ancient Chinese corpus than the GPT models.
Table 3: The BERTScore precision, recall, and F1 of ancient-to-modern Chinese translation from GPT-3.5, GPT-4o, and ERNIE-Bot under four settings: one sentence (1-sent.), three sentences (3-sent.), five sentences (5-sent.), and eight sentences (8-sent.) per query. The reported values are multiplied by 100. The bold values indicate the best performance across the four contextual lengths.
| | GPT-3.5 | | | GPT-4o | | | ERNIE-Bot | | |
|---|---|---|---|---|---|---|---|---|---|
| | Prec. | Rec. | F1 | Prec. | Rec. | F1 | Prec. | Rec. | F1 |
| 1-sent. | 77.28 | 75.85 | 76.51 | 81.23 | 79.20 | 80.15 | 83.86 | 83.36 | 83.56 |
| 3-sent. | **78.49** | **77.31** | **77.87** | **82.52** | **80.84** | **81.65** | 87.33 | 87.12 | 87.20 |
| 5-sent. | 78.04 | 76.96 | 77.48 | 81.90 | 80.11 | 80.97 | 87.56 | **87.64** | **87.59** |
| 8-sent. | 77.97 | 76.00 | 76.97 | 81.34 | 79.21 | 80.25 | **87.78** | 87.19 | 87.48 |
To visualize the effect of the number of input sentences on the translation quality of GPT-3.5, Figure 1 plots the BERTScores against the number of input sentences in the prompt. The BERTScores (precision, recall, and F1) peak at three sentences. This can be explained by the fact that ancient Chinese is complex and context is essential for resolving pronouns and references, while overly long context may further confuse the model; performance is therefore best at a moderate length of three sentences.

The BERT-scores of ancient-to-modern Chinese translation when ChatGPT (GPT-3.5) is fed with 1, 3, 5, and 8 consecutive sentences.
Error Analysis: To gain deeper insight, we select and analyze a specific sentence as a case study. The original ancient Chinese sentence is “孙秀既恨石崇不与绿珠, 又憾潘岳昔遇之不以礼。”, which means that Sun Xiu both resented Shi Chong for not giving him Lü Zhu (the name of a woman) and was dissatisfied with Pan Yue for his past discourteous behavior towards him. The ground-truth modern Chinese and the translations from the four settings are shown in Table 4. To help readers follow the translated Chinese, we also provide an English version for all methods. As shown in Table 4, the modern Chinese translation from GPT-3.5 is closest to the ground truth when three sentences are fed (3-sent.). The erroneous parts are underlined.
Table 4: The ground truth (GT) and translated texts from GPT-3.5 under four settings: one sentence (1-sent.), three sentences (3-sent.), five sentences (5-sent.), and eight sentences (8-sent.) per query.
| Method | Translation |
|---|---|
| GT | [CN]: 孙秀既怨恨石崇不肯送出绿珠, 又不满潘岳从前对自已不礼貌。<br>[EN]: Sun Xiu both resented Shi Chong for not giving him Lü Zhu (the name of a woman), and was dissatisfied with Pan Yue for his past discourteous behavior towards him. |
| 1-sent. | [CN]: 孙秀既恨石崇不娶绿珠, 又遗憾潘岳以前对待他不够礼貌。<br>[EN]: Sun Xiu both held resentment towards Shi Chong for not marrying Lü Zhu, and also lamented over Pan Yue’s past discourteous behavior towards him. |
| 3-sent. | [CN]: 孙秀既恨石崇不给绿珠, 又懊悔潘岳以前的不礼貌之举。<br>[EN]: Sun Xiu felt resentment towards Shi Chong for not giving him Lü Zhu and regret over Pan Yue’s past discourteous actions. |
| 5-sent. | [CN]: 孙秀既因为石崇未能跟绿珠在一起而怨恨他, 又因为潘岳当年未以礼待他而不满。<br>[EN]: Sun Xiu both resented Shi Chong for his inability to be with Lü Zhu and was dissatisfied due to Pan Yue’s lack of respect towards him in the past. |
| 8-sent. | [CN]: 孙秀既憎恨石崇没有娶绿珠为妻, 又心怀憾恨潘岳昔日见面时没有礼遇自己。<br>[EN]: Sun Xiu harbored resentment towards Shi Chong for not marrying Lü Zhu, and also felt regret over Pan Yue’s discourtesy towards him in their past encounter. |
4.2 Assessing the Efficacy of Person Name Recognition in Ancient Chinese Using ChatGPT
The person name recognition task assesses the capability of models in understanding ancient Chinese. For this task, a total of 300 sentences are randomly selected; a few examples are shown in Table 1.
Table 5 presents the performance of the ChatGPT variants (GPT-3.5 and GPT-4o), ERNIE-Bot, SIKU-RoBERTa, and Jieba in recognizing person names from ancient Chinese texts in zero-shot and few-shot settings. On this task, the ChatGPT variants significantly outperform Jieba in the zero-shot setting, with GPT-4o achieving an F1 score of 69.59. However, their performance is worse than that of SIKU-RoBERTa, which was trained on Zuo Zhuan for POS tagging. Notably, the precision of GPT-4o is 94.93, much higher than that of the other models; the main weakness of the LLMs lies in the recall of ancient names.
Table 5: Comparative performance of GPT-3.5, GPT-4o, ERNIE-Bot, SIKU-RoBERTa, and Jieba in recognizing person names from 300 sentences in Shi Shuo Xin Yu. The reported values are multiplied by 100. The bold values indicate the best performance for each metric.
| | Precision | Recall | F1 score |
|---|---|---|---|
| *Zero-shot performance* | | | |
| Jieba | 44.63 | 52.94 | 48.43 |
| SIKU-RoBERTa | 80.33 | **87.04** | **83.55** |
| GPT-3.5 | 72.66 | 55.11 | 62.68 |
| GPT-4o | **94.93** | 54.93 | 69.59 |
| ERNIE-Bot | 90.96 | 56.47 | 69.51 |
| *GPT-3.5 few-shot performance* | | | |
| One-shot | 72.76 | 55.30 | 62.90 |
| Five-shot | 72.86 | 55.68 | 63.12 |
| Ten-shot | 73.80 | 56.82 | 64.21 |
In the few-shot setting, the performance of GPT-3.5 gradually improves as the number of demonstrations in the prompt increases; Figure 2 shows this trend as well. As the number of demonstrations increases from 0 to 10, the F1 score rises from 62.68 to 64.21. A sketch of the few-shot prompt assembly follows the figure below.

The performance of GPT-3.5 on person name recognition versus the number of examples in the prompt.
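The sketch below shows a hypothetical k-shot prompt assembly implied by this setup: prepend k labeled demonstrations before the query sentence. The exact demonstration template is not printed in the paper, so the wording here is our guess.

```python
# Hypothetical sketch of k-shot prompt assembly; the demonstration
# wording is our assumption, not the paper's exact template.
def build_few_shot_prompt(demos: list[tuple[str, list[str]]], query: str) -> str:
    """demos: (ancient sentence, gold person names) pairs used as examples."""
    parts = []
    for sentence, names in demos:
        parts.append(f"原文: {sentence}\n人名: {'、'.join(names) or '无'}")
    parts.append(f"已知文言文的原文如下, 找出该文言文中的人名:\n原文: {query}\n人名:")
    return "\n\n".join(parts)
```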
Error Analysis: To gain a better understanding, we perform an error analysis of GPT-3.5 on person name recognition, as shown in Table 6. In the first two examples, GPT-3.5 suffers from low recall, missing most of the ground-truth names. In the third example, GPT-3.5 recognizes a wrong person name and misses another. In the last example, GPT-3.5 additionally extracts two spurious names, which are in fact place names.
Table 6: A few erroneous examples of person name recognition. The sentences are extracted from Shi Shuo Xin Yu; the ground-truth names are manually labeled by the authors, and the output names are produced by ChatGPT (GPT-3.5).
| Sentence | Ground truth | GPT-3.5 output |
|---|---|---|
| 后秀为中书令, 岳省内见之, 因唤曰: 孙令, 忆畴昔周旋不? | 秀、岳、孙令 | 孙令 |
| 潘后至, 石谓潘曰: 安仁, 卿亦复尔邪? | 潘、石、安仁 | 潘 |
| 有一客道: 谯王丞致祸, 非大将军意, 正是平南所为耳。 | 谯王丞、平南 | 谯王 |
| 蓝田于会稽丁艰, 停山阴治丧。 | 蓝田 | 蓝田、会稽、山阴 |
4.3 Summary
From our experiments on ancient-to-modern Chinese translation, we observe that ChatGPT performs well in terms of semantic similarity, but poorly in terms of syntactic overlap. On the person name recognition task, ChatGPT surpasses Jieba by a clear margin, showing good promise for extracting person names from ancient Chinese text.
During our experiments, we find that ChatGPT faces many challenges in both ancient-to-modern translation and person name recognition. Fully understanding an ancient Chinese sentence relies heavily on its context: a sentence may omit a person's name or replace it with another appellation. Feeding multiple consecutive sentences into the prompt of ChatGPT therefore enhances performance. We leave a deeper exploration of context construction for future research.
5 Conclusion and Discussion
Our experiments on the capability of ChatGPT in ancient-to-modern Chinese translation and person name recognition show that there is still much room for improvement. One possible explanation is that ChatGPT is mainly pre-trained on English corpora with only a limited amount of ancient Chinese. While the proficiency of ChatGPT in ancient Chinese remains somewhat restricted, we believe that deeper research and enhancements, coupled with an abundant corpus of ancient Chinese and specialized knowledge, can improve its ability to comprehend and translate ancient Chinese text. Such improvement can expand the possibilities for studying ancient history and culture, and foster the preservation and development of traditional cultural heritage.
References
Anderson, A., S. Gordin, B. Li, Y. Liu, and M. C. Passarotti, eds. 2023. Proceedings of the Ancient Language Processing Workshop.
Antoun, W., V. Mouilleron, B. Sagot, and D. Seddah. 2023. “Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?” arXiv preprint arXiv:2306.05871.
Brown, T., B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33: 1877–901.
Chang, Y., P. Zhu, C. Wang, and C. Wang. 2022. “Automatic Word Segmentation and Part-of-Speech Tagging of Ancient Chinese Based on BERT Model.” In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, edited by R. Sprugnoli and M. Passarotti, 141–5. Marseille, France: European Language Resources Association. https://aclanthology.org/2022.lt4hala-1.20.
Chang, L., W. Dongbo, Z. Zhixiao, H. Die, W. Mengcheng, L. Litao, S. Si, et al. 2023. “SikuGPT: A Generative Pre-trained Model for Intelligent Information Processing of Ancient Texts from the Perspective of Digital Humanities.” arXiv preprint arXiv:2304.07778.
Cheng, N., B. Li, L. Xiao, C. Xu, S. Ge, X. Hao, and M. Feng. 2020. “Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese Based on BiLSTM-CRF Model.” In Proceedings of LT4HALA 2020 – 1st Workshop on Language Technologies for Historical and Ancient Languages, 52–8.
Ciesla, R. 2024. The Book of Chatbots – From ELIZA to ChatGPT. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-031-51004-5.
Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv preprint arXiv:1810.04805.
Fang, T., S. Yang, K. Lan, D. F. Wong, J. Hu, L. S. Chao, and Y. Zhang. 2023. “Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation.”
George, A. S., and A. H. George. 2023. “A Review of ChatGPT AI’s Impact on Several Business Sectors.” Partners Universal International Innovation Journal 1 (1): 9–23.
Guo, G., J. Yang, F. Lu, J. Qin, T. Tang, and W. X. Zhao. 2023. “Towards Effective Ancient Chinese Translation: Dataset, Model, and Evaluation.” In CCF International Conference on Natural Language Processing and Chinese Computing, 416–27. Springer. https://doi.org/10.1007/978-3-031-44696-2_33.
Jiang, R., R. E. Banchs, and H. Li. 2016. “Evaluating and Combining Name Entity Recognition Systems.” In Proceedings of the Sixth Named Entity Workshop, 21–7. https://doi.org/10.18653/v1/W16-2703.
Jin, K., D. Zhao, and W. Liu. 2023. “Morphological and Semantic Evaluation of Ancient Chinese Machine Translation.” In Proceedings of the Ancient Language Processing Workshop, 96–102.
Khondaker, M. T. I., A. Waheed, E. M. B. Nagoudi, and M. Abdul-Mageed. 2023. “GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP.” https://doi.org/10.18653/v1/2023.emnlp-main.16.
Lai, V. D., N. T. Ngo, A. P. B. Veyseh, H. Man, F. Dernoncourt, T. Bui, and T. H. Nguyen. 2023. “ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning.” https://doi.org/10.18653/v1/2023.findings-emnlp.878.
Lan, P. 2023. “ChatGPT on I Ching at Six Levels.” International Journal of Multidisciplinary Research and Publications (IJMRAP) 5 (9): 173–83.
Lee, J. S., and T.-S. Wong. 2012. “Glimpses of Ancient China from Classical Chinese Poems.” In Proceedings of COLING 2012: Posters, 621–32.
Li, W., F. Qi, M. Sun, X. Yi, and J. Zhang. 2021. “CCPM: A Chinese Classical Poetry Matching Dataset.” CoRR abs/2106.01979. https://arxiv.org/abs/2106.01979.
Liu, J., C. Liu, P. Zhou, R. Lv, K. Zhou, and Y. Zhang. 2023. “Is ChatGPT a Good Recommender? A Preliminary Study.”
Ogundare, O., and G. Q. Araya. 2023. “Comparative Analysis of ChatGPT and the Evolution of Language Models.” https://doi.org/10.22541/au.168062641.15097484/v1.
Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu. 2002. “BLEU: A Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–8. https://doi.org/10.3115/1073083.1073135.
Rezayi, S., Z. Liu, Z. Wu, C. Dhakal, B. Ge, H. Dai, G. Mai, et al. 2023. “Exploring New Frontiers in Agricultural NLP: Investigating the Potential of Large Language Models for Food Applications.” https://doi.org/10.1109/TBDATA.2024.3442542.
Ronanki, K., B. Cabrero-Daniel, and C. Berger. 2023. “ChatGPT as a Tool for User Story Quality Evaluation: Trustworthy Out of the Box?” https://doi.org/10.1007/978-3-031-48550-3_17.
Si, S., W. Zheng, L. Zhou, and M. Zhang. 2019. “Sentence Similarity Computation in Question Answering Robot.” Journal of Physics: Conference Series 1237 (2): 022093. https://doi.org/10.1088/1742-6596/1237/2/022093.
Si, S., R. Wang, J. Wosik, H. Zhang, D. Dov, G. Wang, and L. Carin. 2020. “Students Need More Attention: BERT-Based Attention Model for Small Data with Application to Automatic Patient Message Triage.” In Machine Learning for Healthcare Conference, 436–56. PMLR.
Sun, X., L. Dong, X. Li, Z. Wan, S. Wang, T. Zhang, J. Li, et al. 2023. “Pushing the Limits of ChatGPT on NLP Tasks.”
Sundararaman, D., S. Si, V. Subramanian, G. Wang, D. Hazarika, and L. Carin. 2020. “Methods for Numeracy-Preserving Word Embeddings.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4742–53. https://doi.org/10.18653/v1/2020.emnlp-main.384.
Tang, B., B. Lin, and S. Li. 2022. “Simple Tagging System with RoBERTa for Ancient Chinese.” In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, 159–63.
Tian, H., K. Yang, D. Liu, and J. Lv. 2021. “AnchiBERT: A Pre-trained Model for Ancient Chinese Language Understanding and Generation.” In 2021 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE. https://doi.org/10.1109/IJCNN52387.2021.9534342.
Wang, R. E., and D. Demszky. 2023. “Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance for Scoring and Providing Actionable Insights on Classroom Instruction.” https://doi.org/10.35542/osf.io/5vrby.
Wang, D., C. Liu, Z. Zhu, J. Liu, H. Hu, S. Shen, and B. Li. 2022. “SikuBERT and SikuRoBERTa: The Construction and Application of Pre-trained Models for Digital Humanity Oriented SikuQuanShu.” Forum of Library 42 (6): 14.
Wang, P., S. Zhang, Z. Li, and J. Hou. 2023a. “Enhancing Ancient Chinese Understanding with Derived Noisy Syntax Trees.” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 83–92. https://doi.org/10.18653/v1/2023.acl-srw.15.
Wang, Z., S. Mao, W. Wu, Y. Xia, Y. Deng, and J. Tien. 2023b. “Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models.” https://doi.org/10.21437/Interspeech.2023-910.
Wang, L., C. Lyu, T. Ji, Z. Zhang, D. Yu, S. Shi, and Z. Tu. 2023c. “Document-Level Machine Translation with Large Language Models.” https://doi.org/10.18653/v1/2023.emnlp-main.1036.
Wang, D., C. Liu, Z. Zhao, S. Shen, L. Liu, B. Li, H. Hu, et al. 2023d. “GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts.” arXiv preprint arXiv:2307.05354.
Wu, T., S. He, J. Liu, S. Sun, K. Liu, Q.-L. Han, and Y. Tang. 2023a. “A Brief Overview of ChatGPT: The History, Status Quo and Potential Future Development.” IEEE/CAA Journal of Automatica Sinica 10 (5): 1122–36. https://doi.org/10.1109/jas.2023.123618.
Wu, J., X. Wu, Z. Qiu, M. Li, Y. Zheng, and J. Yang. 2023b. “Qualifying Chinese Medical Licensing Examination with Knowledge Enhanced Generative Pre-training Model.”
Yu, X., and W. Huangfu. 2019. “A Machine Learning Model for the Dating of Ancient Chinese Texts.” In 2019 International Conference on Asian Language Processing (IALP), 115–20. IEEE. https://doi.org/10.1109/IALP48816.2019.9037653.
Zhang, T., V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi. 2019. “BERTScore: Evaluating Text Generation with BERT.” In International Conference on Learning Representations.
Zhang, R., J. Ouni, and S. Eger. 2024. “Cross-Lingual Cross-Temporal Summarization: Dataset, Models, Evaluation.” Computational Linguistics 50 (3): 1001–47. https://doi.org/10.1162/coli_a_00519.
Zhang, H., H. Zhu, J. Ruan, and R. Ding. 2021. “People Name Recognition from Ancient Chinese Literature Using Distant Supervision and Deep Learning.” In 2021 2nd International Conference on Artificial Intelligence and Information Systems, 1–6. https://doi.org/10.1145/3469213.3470270.
© 2024 the author(s), published by De Gruyter on behalf of Shanghai International Studies University
This work is licensed under the Creative Commons Attribution 4.0 International License.