Legal Translation: An In-Depth Review of ChatGPT and Neural Machine Translation (NMT) System Performance

AI Knowledge Base | Updated 8 months ago | AI Sharing Circle


In the ever-changing landscape of translation technology, ChatGPT (Chat Generative Pre-trained Transformer) has undoubtedly attracted global attention. As an advanced large language model (LLM), ChatGPT has demonstrated impressive natural language processing capabilities, and in some translation tasks its performance is even comparable to that of professional translation tools. However, in legal translation, a field known for its rigor and specialization, can ChatGPT really challenge the position of today's mainstream neural machine translation (NMT) systems?

This article takes a closer look at a recent study that compares the performance of ChatGPT-4 with four major NMT systems in translating English-Chinese and Chinese-English legal texts. The study not only reveals how the two kinds of system differ in each translation direction, but also analyzes in detail the typical errors they make in translating terminology, grammatical structure, and stylistic conventions.

Background of the Study: Development of Machine Translation Technology and Challenges of Legal Translation

In recent years, AI translation technology has developed rapidly, with neural machine translation standing out in particular. Many scholars have devoted themselves to researching and optimizing NMT, seeking to improve machine translation further through technical innovation. Feng and Zhang (2022) point out that NMT has entered the stage of large-scale practical application: in English-Chinese translation in particular, accuracy on ordinary texts has exceeded 90%, which fully meets the needs of everyday scenarios such as news reports, product descriptions, and traffic information. Li (2021) likewise observes that five neural-network-based online machine translation (OMT) systems have reached an acceptable level of translation quality, although there is still room for improvement.

Meanwhile, the potential of large language models in translation is gradually emerging, with some studies showing that their performance on certain translation tasks is already on par with, or even better than, some professional translation platforms on the market. The picture is not uniform, however: Yang (2023) found that ChatGPT showed no significant advantage over other machine translation systems or human translators when translating Vietnamese legal texts. Even so, ChatGPT has made notable progress in natural language processing, question understanding, and user interaction, and in terms of syntactic complexity its translations were similar to those of human translators and DeepL Translator.

However, most of these studies used general-domain corpora and covered multiple language pairs. Few have focused specifically on ChatGPT's performance in English-Chinese legal translation, and in-depth comparisons of legal translation quality between ChatGPT and NMT systems are lacking.

With increasing globalization, demand for English-Chinese legal translation continues to grow. Since ChatGPT and NMT are among the most advanced translation technologies available, a comparative analysis of their strengths and weaknesses can not only inform improvements to translation systems but also help legal translation practitioners understand the limits of these technologies, so that they can choose and use translation tools more wisely.

The purpose of this study is to systematically evaluate the effectiveness of ChatGPT-4 in legal translation by comparing its performance with that of four mainstream NMT systems (Youdao Translate, Baidu Translate, Google Translate, and DeepL Translator) in translating English-Chinese and Chinese-English legal texts. The core research questions are:

Which performs better in translating English-Chinese and Chinese-English legal texts: ChatGPT or the NMT systems?
Under the same evaluation criteria, in which translation direction, English-Chinese or Chinese-English, do ChatGPT and the NMT systems perform better?
What typical errors do ChatGPT and the NMT systems each produce when translating legal texts, and how do these errors differ?

Study design: a rigorous rubric

To ensure the validity and reliability of the results, the source texts (STs) were selected strictly according to the following principles:

Comprehensiveness: The texts cover a wide range of legal subfields, including civil, criminal, commercial, and administrative law, so that the findings are broadly applicable and representative.
Timeliness: Only legal texts currently in force were selected, so that the study reflects the actual needs and challenges of present-day legal translation.
Diversity: The texts vary in structure, difficulty, and context, so that the quality of NMT and ChatGPT translations can be assessed across different types of legal text.
Authenticity: All laws and regulations come from publicly available sources, which facilitates peer review and allows the objectivity of the findings to be verified.
Referentiality: Each selected text has an official or authoritative translation, which serves as the reference for automatically assessing the NMT and ChatGPT translations.

Based on these principles, the researchers selected 15 Chinese texts from 14 different Chinese laws as source texts for Chinese-English translation, each 500 to 550 characters long. To ensure accurate and authoritative evaluation, the official English translations provided by the Chinese Legal Information Database served as the reference translations (target texts, TTs). Likewise, to parallel the Chinese-English set, 15 English legal texts of 500 to 550 words each were taken from the electronic version of Hong Kong Laws as source texts for English-Chinese translation, and the official Chinese versions of these texts, from the same source, served as their reference translations.

Methodologically, the study evaluated ChatGPT-4 and the current mainstream NMT systems using the Bilingual Evaluation Understudy (BLEU) metric. BLEU is an internationally recognized metric for evaluating machine translation: it measures n-gram overlap between a system's output and a reference translation, and higher scores indicate better translation quality. The research team calculated BLEU scores with the translation evaluation tool provided by the Trial Translation Platform in order to quantify each system's translation quality.
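The study does not describe how the Trial Translation Platform tokenizes text or combines n-gram statistics, so the following is only a minimal sketch of standard single-reference BLEU. It uses clipped n-gram precisions for n = 1 to 4, combines them with a geometric mean, and multiplies the result by a brevity penalty. The `tokenize` helper is a hypothetical assumption: it splits Chinese into characters and English on whitespace. The score here is on a 0-1 scale; many tools multiply it by 100.

```python
import math
from collections import Counter

def tokenize(text, lang):
    """Hypothetical tokenizer: characters for Chinese, whitespace for English."""
    if lang == "zh":
        return [ch for ch in text if not ch.isspace()]
    return text.split()

def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU on token lists, returned on a 0-1 scale.

    Uniform weights over 1..max_n-gram precisions, clipped counts and the
    standard brevity penalty; no smoothing, so any zero precision gives 0.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count at its count in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if matches == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)

if __name__ == "__main__":
    ref = tokenize("the cat sat on the mat today", "en")
    hyp = tokenize("the cat sat on the mat", "en")
    print(round(bleu(hyp, ref), 4))  # prints 0.8465
```

Because the sketch applies no smoothing, a text with no matching 4-gram scores 0. This is rarely a problem for 500-word passages like those in the study, but it matters when scoring single sentences.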

The study proceeded as follows. First, the 30 source texts were translated with the NMT systems (Youdao Translate, Baidu Translate, Google Translate, and DeepL Translator) and with ChatGPT-4. Next, the target texts generated by each system were copied into Word documents. The BLEU score of each target text was then calculated with the "Trial Translator - Translation Evaluation Tool". Finally, the BLEU scores were analyzed statistically in SPSS 27.

Results: quantitative assessment and statistical analysis

Chinese-English Translation Quality Comparison

In Chinese-English translation, ChatGPT had the lowest mean BLEU score and the highest standard deviation, indicating that its Chinese-English legal translations were not only lower in quality than those of the NMT systems but also less stable.
Youdao Translate achieved the highest mean BLEU score, with Google Translate close behind; DeepL Translator and Baidu Translate scored similarly to each other.
The ANOVA results showed that the differences in BLEU scores among the systems were not significant (p = 0.119).
However, multiple comparison tests revealed a significant difference between ChatGPT and Youdao Translate, as well as between Baidu Translate and Youdao Translate within the NMT group.
Overall, ChatGPT's quality in Chinese-English legal translation was slightly lower than that of the NMT systems, but the difference did not reach significance (p = 0.258).
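To make the Chinese-English statistics concrete, here is a minimal sketch of the per-system descriptive statistics (mean and standard deviation) and the one-way ANOVA F statistic, computed over per-text BLEU scores grouped by system. It is an illustration, not the study's SPSS procedure. The p-value comes from the F distribution with (k - 1, N - k) degrees of freedom, which a statistics package such as SciPy (`scipy.stats.f_oneway`) can supply. The system names and scores in the usage lines are hypothetical placeholders, not the study's data.

```python
from statistics import mean, stdev

def describe(groups):
    """Mean and sample standard deviation of BLEU scores for each system."""
    return {name: (mean(scores), stdev(scores)) for name, scores in groups.items()}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups maps a system name to its list of per-text BLEU scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group sum of squares: how far each system's mean lies from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Within-group sum of squares: spread of texts around their own system's mean.
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

if __name__ == "__main__":
    # Hypothetical placeholder scores, not data from the study.
    scores = {"System A": [1, 2, 3], "System B": [4, 5, 6]}
    print(describe(scores))
    print(one_way_anova(scores))  # prints (13.5, 1, 4)
```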

Comparison of the quality of English-Chinese translations

In English-Chinese translation, ChatGPT again had the lowest mean BLEU score, while Youdao Translate again had the highest. DeepL Translator followed Youdao Translate, with Baidu Translate and Google Translate scoring relatively close behind.
Tests of each system's scores found that the standardized values of skewness and kurtosis exceeded 1.96 in absolute value, indicating that the data were not normally distributed.
The study therefore used the nonparametric Kruskal-Wallis test, which showed a significant difference in BLEU scores among the five systems (p < 0.001).
Pairwise comparisons further revealed that the differences between ChatGPT and each of the four NMT systems were all significant, whereas the differences among the four NMT systems were not.
Taken together, the NMT systems produced significantly higher-quality translations of English-Chinese legal texts than ChatGPT did.
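The English-Chinese results rely on two further steps. The first is a normality screen: a skewness or kurtosis value more than 1.96 standard errors from zero suggests non-normality at roughly the 5% level. The second is the Kruskal-Wallis test on ranks. The sketch below implements both from their textbook formulas. It assumes the bias-adjusted skewness and kurtosis estimators and standard errors that SPSS reports. The chi-square p-value helper uses a closed form that holds only for even degrees of freedom, which covers the five-system case (df = 4). SciPy's `scipy.stats.kruskal` is a ready-made alternative. This is an illustrative sketch, not the study's exact procedure.

```python
import math
from collections import Counter
from statistics import mean, stdev

def skew_kurt_z(x):
    """Return (skewness / SE, excess kurtosis / SE) for a sample with n >= 4.

    Assumes the bias-adjusted estimators and standard errors used by
    SPSS-style descriptive statistics; |z| > 1.96 flags non-normality.
    """
    n, m, s = len(x), mean(x), stdev(x)
    sum3 = sum(((v - m) / s) ** 3 for v in x)
    sum4 = sum(((v - m) / s) ** 4 for v in x)
    skew = n / ((n - 1) * (n - 2)) * sum3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt

def average_ranks(values):
    """1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    labels = [name for name, scores in groups.items() for _ in scores]
    values = [v for scores in groups.values() for v in scores]
    n = len(values)
    rank_sums = Counter()
    for name, r in zip(labels, average_ranks(values)):
        rank_sums[name] += r
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(scores) for name, scores in groups.items()
    ) - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (n^3 - n), where t is the size of each tie group.
    tie_sizes = Counter(values).values()
    h /= 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h, len(groups) - 1

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("this closed form requires an even number of df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
```

With five systems, H has df = 4, so `chi2_sf_even_df(h, 4)` gives the large-sample p-value. The sketch does not reproduce the pairwise follow-up comparisons that the study reports.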

Overall Comparison of English-Chinese and Chinese-English Translation Quality

The results of an independent-samples t-test show that, for both ChatGPT and the NMT systems, translation quality differs significantly between the English-Chinese and Chinese-English directions (p < 0.001).
Notably, BLEU scores for Chinese-English translation were significantly higher than those for English-Chinese translation, indicating that both ChatGPT and the NMT systems performed better on the Chinese-English legal translation task.
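The direction comparison can be sketched as an independent-samples t-test between the BLEU scores of the Chinese-English texts and those of the English-Chinese texts. The article does not say whether the equal-variance (Student) or unequal-variance (Welch) form was reported, so the sketch shows both statistics. Obtaining p-values requires the t distribution, for example via `scipy.stats.ttest_ind` and its `equal_var` parameter. This is a minimal illustration, not the study's code.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Student's independent-samples t (pooled variance) and its df."""
    na, nb = len(a), len(b)
    # Pooled variance assumes both directions share one population variance.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Welch's t and Welch-Satterthwaite df (no equal-variance assumption)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A positive t with the Chinese-English scores passed as `a` corresponds to the pattern the study reports, with higher BLEU in the Chinese-English direction.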

Discussion: Error Type Analysis and System Strengths and Weaknesses

In order to gain a deeper understanding of the performance of the ChatGPT and NMT systems in legal text translation, this study further employs the case study method to scrutinize the types of errors they make in legal text translation. The study categorized the major errors into the following three main groups: terminology translation errors, grammatical and syntactic structure errors, and style and formatting errors.

Chinese-English Translation Error Analysis

Terminology: In translating legal terminology, ChatGPT and the NMT systems show similar accuracy, with no clear winner. For example, terms such as "fixed-term imprisonment" and "life imprisonment" are translated accurately by both ChatGPT and the NMT systems. For "criminal detention", however, some systems' output diverges from the reference translation "limited incarceration", and DeepL, for example, renders one term literally as "control", which is slightly less precise.
Grammar and syntactic structure: Each system also has its own strengths and weaknesses in terms of grammar and syntactic structure. For example, when translating "more than ten years' imprisonment", Google Translate's translation contains obvious logical errors and contradictions. When translating the complex phrase "causing death or inflicting serious injuries by particularly cruel means resulting in serious disabilities", ChatGPT's translation is relatively concise and clear, while the translations of some of the NMT systems are potentially ambiguous.
Style and format: Neither ChatGPT nor the NMT systems showed obvious formatting errors; the structure of each translation matched the original text and largely met the typical formatting requirements of legal documents. Stylistically, however, some NMT output falls slightly short: DeepL's rendering "intentionally inflicts bodily harm" is slightly stiff, and Baidu Translate's use of "those who..." is relatively uncommon in legal English.

English-Chinese Translation Error Analysis

Terminology: In English-Chinese translation, ChatGPT's grasp of legal terminology is somewhat weaker. For example, ChatGPT renders "with intent to murder" as the equivalent of "with murder", which is oversimplified and fails to convey the legal intent implied in the original. Similarly, it renders "be guilty of an offence triable upon indictment" as the equivalent of "commit an offence that can be prosecuted", omitting the critical procedural element of indictment. By contrast, the NMT systems render legal terminology more accurately.
Grammar and syntactic structure: The NMT systems outperform ChatGPT in grammatical accuracy and in the standardization of sentence structure. DeepL, for example, translates "shall be guilty of an offence triable upon indictment, and shall be liable to imprisonment for life" with a clear and rigorous sentence structure that matches the conventions of Chinese legal texts.
Style and format: When translating the amendment clauses common in legal texts, the NMT systems are more standardized and closer to the conventions of Chinese legal writing.

Overall, in English-Chinese legal translation the NMT systems not only translate terminology more accurately but also perform better in grammatical structure, overall accuracy, and formal expression.

Link to paper: https://tpls.academypublication.com/index.php/tpls/article/view/8692