Modern Chinese characters
According to Ethnologue,[1] "Mandarin Chinese (the modern standard Chinese, also called Putonghua) is the largest language in the world, if you count only native speakers. If you count both native and non-native speakers, English is the largest (with Chinese being the 2nd largest)." Chinese is written in Chinese characters.
While traditional studies of Chinese characters[2][3] and the existing Wikipedia article "Chinese characters" pay substantial attention to ancient Chinese characters and the historical development of the Chinese writing system, this article focuses on Chinese characters in their current state and their present-day applications.
Modern Chinese characters (Pinyin: xiàndài hànzì; Traditional Chinese: 現代漢字; Simplified Chinese: 现代汉字) are the Chinese characters used in modern society, especially in the modern Chinese language.[4] This article (and its spin-off articles) aims to provide a comprehensive introduction to the attributes, properties and applications of modern Chinese characters, including their forms (字形), sounds (字音), meanings (字義, 字义), sets (字集), numbers (字量), frequencies (字頻, 字频), orders (字序), computer processing, and teaching and learning.[5][6]
An overview of Chinese characters
Since becoming a relatively mature and complete writing system, Chinese characters have undergone more than 3,000 years of uninterrupted development, through the stages of Oracle bone script (甲骨文), Bronze script (金文), Seal script (篆書, 篆书), Clerical script (隸書, 隶书), Regular script (楷書, 楷书) and Modern Regular script,[7] as illustrated by the development of the character 馬 (马, horse):[lower-alpha 1]
| Oracle | Bronze | Big seal | Seal | Clerical | Regular | Simplified |
|---|---|---|---|---|---|---|
| [image] | [image] | [image] | [image] | [image] | 馬 | 马 |
Chinese characters were invented to write the Chinese language and were later adopted by some other East Asian languages, in which they developed in their own ways. Today there are Simplified Chinese characters (used in mainland China, Singapore, Malaysia, etc.), Traditional Chinese characters (used in Taiwan, Hong Kong, Macau, etc.), Kanji (in Japanese), Hanja (in Korean), and Chữ Nôm (in Vietnamese).[8]
Compared with the Latin letters used in English and many other languages, Chinese characters have several distinctive characteristics, for example:[9]
- There are tens of thousands of different characters.
- Each character has a two-dimensional, block-shaped structure.
- A character may have dozens of strokes.
- A character is normally ideographic and monosyllabic.
- Texts written in Chinese characters are intelligible to readers of different dialects and of different dynasties.
The sources of modern Chinese characters include: [10]
- Inherited traditional characters, about 75%, e.g., 日 (sun), 月 (moon), 金 (metal), 木 (wood), 水 (water), 火 (fire), 土 (earth);
- Newly coined characters, about 2.7%, e.g., 氨 (ammonia), 碘 (iodine), 乒乓 (ping pong, table tennis), 甭 (don't), 煲 (pot);
- Borrowed characters, about 1.3%, i.e., ancient characters re-used with pronunciations and meanings different from those of ancient times, e.g., 她 (she), 旮 (used in 旮旯, corner);
- Simplified characters, about 20%, e.g., 汉语 (漢語, Chinese language), 学习 (學習, study).
Number and sets of modern Chinese characters
Because languages develop dynamically, there is no definite number of modern Chinese characters. However, a reasonable estimate can be obtained by surveying the character sets of the relevant standard lists and influential dictionaries in the countries and regions where Chinese characters are used.[11]
In mainland China, the important standard documents include the List of Frequently Used Characters in Modern Chinese (现代汉语常用字表, 3,500 characters)[12] and the List of Commonly Used Characters in Modern Chinese (现代汉语通用字表, 7,000 characters, including the 3,500 characters of the previous list).[13] The current standard, however, is the Table of General Standard Chinese Characters (Tōngyòng Guīfàn Hànzì Biǎo, 通用规范汉字表), released by the State Council in June 2013 to replace the two previous lists and some other standards. It contains 8,105 characters of the simplified Chinese writing system: 3,500 primary, 3,000 secondary and 1,605 tertiary characters. In addition, it lists 2,574 traditional characters and 1,023 variants.[14] Another important character set is that of Xinhua Zidian, the most popular modern Chinese character dictionary, which includes over 13,000 characters, among them simplified characters, traditional characters and some variants.[15]
In Taiwan, the Ministry of Education released the Chart of Standard Forms of Common National Characters (Chángyòng Guózì Biāozhǔn Zìtǐ Biǎo, 常用國字標準字體表, 4,808 characters) and the Chart of Standard Forms of Less-Than-Common National Characters (Cì Chángyòng Guózì Biāozhǔn Zìtǐ Biǎo, 次常用國字標準字體表, 6,341 characters). Together the two lists contain 11,149 characters of the traditional Chinese writing system.
In Hong Kong, the Education Bureau released the List of Graphemes of Commonly-Used Chinese Characters (常用字字形表), a list of 4,762 characters for primary and junior secondary education. The list is very influential in education circles.
In Japan, there are the jōyō kanji (常用漢字, "Frequently-used Chinese characters", designated by the Japanese Ministry of Education, including 2,136 characters), and jinmeiyō kanji (人名用漢字, "Kanji for use in personal names", currently including 983 characters).
In Korea, there are the Basic Hanja for Educational Use (漢文敎育用基礎漢字, a set of 1,800 hanja defined in 1972 by a South Korean education standard) and the Table of Hanja for Personal Name Use (人名用追加漢字表), published by the Supreme Court of Korea in March 1991.[16] The latter list has been expanded gradually; by 2015, 8,142 hanja were permitted for use in Korean names.[17]
Taking all the character sets above into consideration, the total number of modern Chinese characters in use in the world is over 10,000, probably around 15,000.[18][19] This estimate is not unduly rough, considering that Unicode contains over 90,000 Chinese characters (CJK Unified Ideographs), and there would be even more if every Chinese character that has ever appeared were included.[20]
Frequencies of modern Chinese characters
Chinese character frequencies are calculated from corpus data. A corpus is a collection of texts representative of one or more languages. The frequency of a character is the ratio of the number of its occurrences in the corpus to the total number of characters in the corpus, expressed as a percentage. The formula is
$$F_i = \frac{n_i}{N} \times 100\%$$
where $n_i$ is the number of times the $i$-th Chinese character appears in the corpus, and $N$ is the total number of characters in the corpus.[21]
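As a minimal sketch of the formula above, the Python function below counts the characters of a short text and returns the frequency F_i = n_i / N × 100% for each distinct character. The function name `char_frequencies` and the rule of counting only characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) are assumptions for illustration; actual corpus studies define their own text-cleaning and counting rules.

```python
from collections import Counter


def char_frequencies(corpus: str) -> dict[str, float]:
    """Return F_i = n_i / N * 100 (a percentage) for each distinct character.

    Assumption for illustration: only characters in the basic CJK Unified
    Ideographs block (U+4E00..U+9FFF) are counted, so punctuation, digits
    and Latin letters do not contribute to N.
    """
    chars = [c for c in corpus if "\u4e00" <= c <= "\u9fff"]
    total = len(chars)  # N: total number of counted characters
    if total == 0:
        return {}
    counts = Counter(chars)  # n_i: occurrences of each distinct character
    return {c: n / total * 100 for c, n in counts.items()}


# Usage: a six-character sample; the full-width comma and period are skipped.
freqs = char_frequencies("我的书，你的笔。")
print(round(freqs["的"], 2))  # 33.33 (2 of 6 characters)
```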
The first person to make a statistical study of Chinese character frequency was Chen Heqin (陳鶴琴).[22] In the 1920s, he and his assistants spent two years manually counting the characters in a corpus of 554,478 characters, obtaining frequency information for 4,261 different characters. They then compiled the book "Applied Lexis of Vernacular Chinese" (語體文應用字彙).[23] The 10 most frequently used characters in their corpus were (in descending order of frequency):
"的 (of), 不 (no, not), 一 (one, a/an), 了 (particle marking a completed action), 是 (be), 我 (I, me), 上 (on, up), 他 (he, him), 有 (have, has), 人 (person, people)".
In 2001, the Chinese University of Hong Kong published a number of frequency lists on the Web,[24] entitled "Hong Kong, Mainland China and Taiwan Chinese Frequency: a trans-regional diachronic survey". The frequency data came from a large corpus whose sub-corpora represent the Chinese used in three regions (Hong Kong, mainland China and Taiwan) in two time periods (the 1960s and the 1980s/1990s). As their frequency lists show, each sub-corpus contains about 5,000 different characters.
These frequency lists reveal several notable features of written Chinese:
- "的", "一" and "是" are the three most frequently used characters in all the regions and time periods of the corpus, and "的" ranks first in every frequency list.
- The 10 most frequently used characters are highly consistent across the three regions and two time periods: a character that is frequent in one region or period is very likely to be frequent in another.
- In the 1980s/1990s, the 100 most frequently used characters cover (i.e., have an accumulated frequency of) 41.00% of the Hong Kong texts, 41.34% of the mainland texts and 41.88% of the Taiwan texts; that is, more than 4 of every 10 characters in each of the three regions.
- In the same period, the 1,000 most frequently used characters cover 89.25% of the Hong Kong texts, 90.26% of the mainland texts and 88.74% of the Taiwan texts.
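Coverage figures like those above are cumulative frequencies: the sum of the frequencies of the top-ranked characters. A minimal Python sketch, assuming the corpus has already been reduced to a list of character tokens; the function `top_n_coverage` and the toy data are hypothetical:

```python
from collections import Counter


def top_n_coverage(chars: list[str], n: int) -> float:
    """Accumulated frequency (in %) of the n most frequent characters.

    This is the sum of F_i over the top-n characters, i.e. the share of
    all character tokens in the corpus that those n characters account for.
    """
    if not chars:
        return 0.0
    counts = Counter(chars)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(chars) * 100


tokens = list("的的的是是一人")  # 7 tokens, 4 distinct characters
print(round(top_n_coverage(tokens, 2), 2))  # 71.43, i.e. (3 + 2) / 7
```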
Large-scale surveys conducted over the years by the Ministry of Education and the State Language Commission of the PRC show that the use of Chinese characters and words is highly concentrated: a relatively small number of characters accounts for most of the text. The number of characters used in modern Chinese has remained stable at about 10,000 for many years, and about 590, 960 and 2,400 characters are needed to reach coverage rates of 80%, 90% and 99%, respectively.[25]
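The reverse question, how many of the most frequent characters are needed to reach a given coverage rate such as 80%, 90% or 99%, can be answered by walking down the frequency list until the running total reaches the target. The sketch below is a hypothetical helper working on a toy token list, not the procedure used in the surveys cited above:

```python
from collections import Counter


def chars_needed_for_coverage(chars: list[str], target_percent: float) -> int:
    """Smallest k such that the k most frequent characters cover at least
    target_percent of all character tokens."""
    total = len(chars)
    covered = 0
    # Walk down the frequency list, accumulating occurrences until the
    # running coverage reaches the target.
    for k, (_, count) in enumerate(Counter(chars).most_common(), start=1):
        covered += count
        # Same test as covered / total * 100 >= target_percent,
        # rearranged to avoid dividing.
        if covered * 100 >= target_percent * total:
            return k
    return len(set(chars))  # target above 100%, or an empty corpus


sample = list("的的的的的是是是一人")  # 的 x5, 是 x3, 一 x1, 人 x1
print([chars_needed_for_coverage(sample, p) for p in (80, 90, 99)])  # [2, 3, 4]
```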
Chinese character frequency is essential to quantitative research on the Chinese language and has been applied to language teaching, dictionary compilation, word-list compilation, Chinese character information processing, etc.[26]
Orders of modern Chinese characters
Chinese character order, or Chinese character sorting, is the way in which a set of Chinese characters is arranged in a sequence for convenient information retrieval.[27] The term may also refer to the resulting sequence of Chinese characters.
English dictionaries and indexes are normally arranged in alphabetical order for quick lookup. Chinese, however, is written with tens of thousands of different characters rather than the few dozen letters of an alphabet, which makes sorting much more challenging.[lower-alpha 2] The orders, or sorting methods, of Chinese dictionaries are traditionally divided into three categories: form-based orders, sound-based orders and meaning-based orders.[28] For modern Chinese, frequency-based orders are also used, in which characters or words are sorted by their frequency of use in a text corpus.
Form-based orders
In form-based orders, characters and words are sorted according to various features of the forms, or shapes, of Chinese characters. Compared with sound-based orders, form-based orders have the advantages of (a) allowing characters and words to be looked up without knowing their pronunciation, and (b) collating large character sets effectively without support from other sorting methods. There are two subcategories of form-based orders: stroke-based orders and component-based orders, the latter including radical-based orders, etc.[29]
Sound-based orders
Mandarin Chinese (Putonghua) has two major phonetic notation systems, Pinyin and Bopomofo. Accordingly, there are two sound-based orders: Pinyin alphabetical order and Bopomofo order.[30]
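To illustrate Pinyin alphabetical order, the sketch below sorts characters by the letters of their Pinyin syllable and then by tone. The tiny `PINYIN` table and its tone-number notation (1 to 4, with 5 for the neutral tone) are assumptions made for this example; a real implementation would need a full Pinyin lexicon.

```python
# Hypothetical sample data: each character's Pinyin syllable with its tone as
# a trailing digit (1-4 for the four tones, 5 for the neutral tone).
PINYIN = {
    "妈": "ma1", "麻": "ma2", "马": "ma3", "骂": "ma4", "吗": "ma5",
    "汉": "han4", "好": "hao3", "人": "ren2", "字": "zi4",
}


def pinyin_key(char: str) -> tuple[str, int]:
    """Sort by the letters of the syllable first, then by tone number."""
    syllable = PINYIN[char]
    return syllable[:-1], int(syllable[-1])


def pinyin_sort(chars: list[str]) -> list[str]:
    return sorted(chars, key=pinyin_key)


print("".join(pinyin_sort(list("字马汉妈好骂人麻吗"))))  # 汉好妈麻马骂吗人字
```

Dictionaries that use Pinyin order also need rules this sketch leaves out, for example for ü, for characters with more than one reading, and for ordering homophones with the same tone (for instance by stroke count).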
Meaning-based orders
Meaning-based orders, also called semantics-based orders, arrange characters and words in a hierarchical structure of semantic categories. [31]
Frequency-based orders
Frequency-based orders sort Chinese characters by their frequency of use, normally in descending order, so that the most frequently used character comes first. A frequency list is created from a text corpus. In corpus linguistics, the frequency of a character is the number of its occurrences in the corpus divided by the total number of characters in the corpus, expressed as a percentage.[21]
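A frequency-based order can be derived directly from character counts. In this minimal sketch (the function name and the CJK-range filter are assumptions), ties keep the order in which characters first appear, because that is how Python's `collections.Counter.most_common` orders equal counts; published frequency lists may break ties differently.

```python
from collections import Counter


def frequency_order(corpus: str) -> list[tuple[str, int]]:
    """List distinct characters from most to least frequent.

    Only basic-block CJK ideographs (U+4E00..U+9FFF) are counted; this
    filter is an assumption for illustration. Characters with equal counts
    stay in the order they first appear.
    """
    counts = Counter(c for c in corpus if "\u4e00" <= c <= "\u9fff")
    return counts.most_common()


print(frequency_order("的一是的一的，人"))
# [('的', 3), ('一', 2), ('是', 1), ('人', 1)]
```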
Orders of words
A Chinese word consists of one or more characters. Single-character words can be sorted by a character order, and multi-character words can be sorted character by character in a similar way.[32]
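Character-by-character sorting of words amounts to comparing words as sequences of character ranks. The sketch below assumes a hypothetical rank table `RANK` standing in for any character order (stroke-based, Pinyin or frequency):

```python
def word_key(word: str, char_rank: dict[str, int]) -> list[int]:
    """Map a word to the ranks of its characters in some character order."""
    return [char_rank[c] for c in word]


def sort_words(words: list[str], char_rank: dict[str, int]) -> list[str]:
    # Python compares lists element by element, so words are ordered by their
    # first character, then the second, and so on; a word that is a prefix of
    # another (e.g. 中国 vs 中国人) comes first.
    return sorted(words, key=lambda w: word_key(w, char_rank))


# Hypothetical character order (it could come from strokes, Pinyin or frequency).
RANK = {"中": 0, "国": 1, "人": 2, "文": 3}
print(sort_words(["中文", "人", "中国人", "中国", "国人"], RANK))
# ['中国', '中国人', '中文', '国人', '人']
```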
Forms of modern Chinese characters
Modern Chinese characters take the form of square blocks. There are two methods of analyzing the forms of Chinese characters: source tracing analysis (溯源分析) and current status analysis (現狀分析, 现状分析). Source tracing analysis, also called the method of character creation (造字法), analyzes the form a character had when it was created. Current status (or current situation) analysis takes the current standard form in regular script as its object and studies the external and internal structures of the character. As an academic subject, the study of modern Chinese characters focuses more on current status analysis.[33]
The analysis of the external structure (外部結構, 外部结构) of Chinese characters, also called the configuration method (構形法, 构形法), focuses on the appearance and structure of a character and has little to do with its pronunciation and meaning. The analysis of the internal structure (內部結構, 内部结构) of Chinese characters, also called the method of character formation (構字法, 构字法), studies the relationship between form, sound and meaning in order to explain the rationale for a character's formation. Both external and internal structural analyses help in learning and applying Chinese characters.[34]
External structural analysis studies how writing units are combined, layer by layer, into a complete Chinese character. Chinese characters have three levels of structural units: strokes (筆劃, 笔画), components (部件) and whole characters (整字).[35][lower-alpha 3] For example, the character "字" (character) is composed of two components of three strokes each, which can be expressed as
字 = 宀(㇔㇔㇖) + 子(㇇㇚㇐).
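The layered decomposition of 字 can be modelled as a small lookup table from components to stroke sequences. In this sketch, strokes are recorded in the five broad classes often used for stroke sorting (heng, shu, pie, dian and zhe, coded 1 to 5), so that the example does not depend on the exact stroke shapes; the tables `COMPONENT_STROKES` and `DECOMPOSITION` are hypothetical.

```python
# Stroke classes (an encoding assumed for this sketch): 1 = heng (horizontal),
# 2 = shu (vertical), 3 = pie (left-falling), 4 = dian (dot), 5 = zhe (turning).
# Hooked and turning strokes are folded into these five classes.
COMPONENT_STROKES = {
    "宀": "445",
    "子": "521",
}

# Hypothetical decomposition table: each whole character as its components
# in writing order.
DECOMPOSITION = {
    "字": ["宀", "子"],
}


def stroke_sequence(char: str) -> str:
    """Concatenate the stroke sequences of a character's components."""
    return "".join(COMPONENT_STROKES[part] for part in DECOMPOSITION[char])


def stroke_number(char: str) -> int:
    return len(stroke_sequence(char))


print(stroke_sequence("字"), stroke_number("字"))  # 445521 6
```

Simple concatenation only works when components are written one after another, as in top-bottom or left-right structures. In enclosing structures such as 国, the strokes of the components interleave (the bottom stroke of 囗 is written last), so a fuller model would need positional information.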
Strokes
Strokes (bǐhuà; 筆劃; 笔画) are the smallest writing units of Chinese characters. When a Chinese character is written, the dot or line traced on the writing surface (such as paper) between pen-down and pen-up is called a stroke. Strokes in their present form did not come into being until the appearance of the Clerical script (隸書, 隶书).[37]
Stroke number is the number of strokes in a Chinese character, and it varies widely: the characters "一" and "乙" have only one stroke, while "齉" has 36 strokes and "龘" (composed of three 龍, "dragon") has 48.
Stroke forms are the shapes of strokes. The stroke forms of a standard Chinese character set can be classified into a stroke table (or stroke list); for instance, the Unicode CJK Strokes list contains 36 types of strokes.[38]
Stroke order has two meanings:
- the direction in which a single stroke is written; for example, heng (㇐, horizontal) is written from left to right, and shu (㇑, vertical) from top to bottom.
- the order of the strokes in a character, i.e., the order in which strokes are written to form a Chinese character; for example, the stroke order of the character 字 is "㇔㇔㇖㇇㇚㇐".
Because the direction of strokes is relatively simple, people generally refer to the latter meaning when talking about stroke order.
Chinese characters can be sorted into different orders by their strokes. The important stroke-based sorting methods include stroke-count sorting, stroke-count-stroke-order sorting, GB stroke-based sorting and YES sorting.
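As an illustration of the idea behind stroke-count-stroke-order sorting, the sketch below orders characters first by stroke count and then by comparing their stroke classes (1 to 5 for heng, shu, pie, dian, zhe) one by one. The sample `STROKES` table is hand-made, and the final tie-break by Unicode code point (needed here because 人 and 八 have identical stroke sequences) is a stand-in assumption, not the rule of GB stroke-based sorting or YES sorting.

```python
# Stroke sequences in the five classes 1-5 (heng, shu, pie, dian, zhe).
# A small hand-made sample, assumed for illustration.
STROKES = {
    "一": "1", "乙": "5", "十": "12", "人": "34", "八": "34",
    "三": "111", "土": "121", "大": "134", "字": "445521",
}


def stroke_key(char: str) -> tuple[int, str, int]:
    strokes = STROKES[char]
    # 1) fewer strokes first; 2) same count: compare stroke classes one by one
    #    (equal-length digit strings compare digit by digit);
    # 3) still tied: Unicode code point, a stand-in for the extra
    #    tie-breaking rules that published standards define.
    return len(strokes), strokes, ord(char)


def stroke_sort(chars: str) -> list[str]:
    return sorted(chars, key=stroke_key)


print("".join(stroke_sort("字大八土三人十乙一")))  # 一乙十人八三土大字
```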
Two strokes can be combined in three ways:[39]
- Separation: the strokes do not touch each other, as in 八, 三, 小.
- Connection: the strokes touch each other, as in 匕, 正, 厂, 弓, 凹, 凸.
- Intersection: the strokes cross each other, as in 十, 丈, 車.
Components
Whole characters
Internal structures of modern Chinese characters
Chinese character classification
Simplification of Chinese characters
Traditional Chinese characters
Simplified-traditional Chinese conversion and proofreading
Rationalisation of modern Chinese characters
Sounds of modern Chinese characters
Computer pinyin annotation of modern Chinese characters
Meanings of modern Chinese characters
Teaching and learning of modern Chinese characters
Computer processing of Chinese characters
Computer input of Chinese characters
Computer output of Chinese characters
Computer internal representation of Chinese characters
Information exchange of Chinese characters
Computer character set for Chinese characters
Simplified Chinese, traditional Chinese and multilingual information processing
Automatic conversion between Chinese characters and Pinyin
See also
- Chinese characters
- CJK characters
- Chinese character encoding
- Chinese input methods for computers
- Table of General Standard Chinese Characters (通用规范汉字表)
Notes
- Created with wiki image files.
- See the Wikipedia article "Chinese dictionary", section "Traditional Chinese lexicography" (paragraph 2).
- In some applications, there are smaller configuration units, e.g., stroke segments, turning points and pixels.[36]
References
- "What is the most spoken language?".
- Qiu, Xigui (2000). Chinese writing. Translated by Gilbert L. Mattos; Jerry Norman. Berkeley: Society for the Study of Early China and The Institute of East Asian Studies, University of California. ISBN 978-1-55729-071-7. (English translation of Wénzìxué Gàiyào 文字學概要, Shangwu, 1988.)
- Qiu, Xigui (裘锡圭) (2013). 文字学概要 (Chinese Writing) (in Chinese) (2nd ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-09369-9.
- Su, Peicheng (苏培成) (2014). 现代汉字学纲要 (Essentials of Modern Chinese Characters) (in Chinese) (3rd ed.). Beijing: 商务印书馆 (Commercial Press). p. 22. ISBN 978-7-100-10440-1.
- Yin, Jiming, et al. (殷寄明, 汪如东) (2007). 现代汉语文字学 (Modern Chinese Writing) (in Chinese). Shanghai: 复旦大学出版社 (Fudan University Press). ISBN 978-7-309-05525-2.
- Yang, Runlu (杨润陆) (2008). 现代汉字学 (Modern Chinese Characters) (in Chinese). Beijing: 北京师范大学出版社 (Beijing Normal University Publishing Group). ISBN 978-7-303-09437-0.
- Qiu 2014, pp. 45-101.
- Su 2014, pp. 19-21.
- Su 2014, pp. 5-9.
- Su 2014, pp. 51-52.
- Su 2014, p. 47.
- 现代汉语常用字表 Archived 2016-11-13 at the Wayback Machine [List of Frequently Used Characters in Modern Chinese], Ministry of Education of the People's Republic of China, 26 Jan 1988.
- 现代汉语通用字表 Archived 2016-11-23 at the Wayback Machine [List of Commonly Used Characters in Modern Chinese], Ministry of Education of the People's Republic of China, 26 Jan 1988.
- 国务院关于公布《通用规范汉字表》的通知 [Notice of the State Council on Promulgating the Table of General Standard Chinese Characters]. Gov.cn (in Chinese). State Council of the People's Republic of China. 5 June 2013.
- Institute of Linguistics, Chinese Academy of Social Sciences (中国社会科学院语言研究所) (2020). Xinhua Zidian (Xinhua Chinese Character Dictionary) (in Chinese) (12th ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-17093-2.
- National Academy of the Korean Language (1991) Archived March 19, 2016, at the Wayback Machine
- '인명용(人名用)' 한자 5761→8142자로 대폭 확대 ["Hanja for personal names" greatly expanded from 5,761 to 8,142 characters]. Chosun Ilbo (in Korean). 2014-10-20. Retrieved 2017-08-23.
- Su 2014, p. 51.
- Lecture notes of the subject "Modern Chinese Characters and Information Technology", Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, by Dr. Zhang Xiaoheng, June 12, 2017.
- https://www.unicode.org/reports/tr38/#BlockListing
- Su 2014, p. 34.
- Su 2014, p. 35.
- Chen, Heqin (陳鶴琴) (1928). 語體文應用字彙 (Applied Lexis of Vernacular Chinese) (in Chinese). Beijing: Shangwu (The Commercial Press).
- "Chinese Character Frequency Statistics for Hong Kong, Mainland China and Taiwan - A Trans-Regional, Diachronic Survey: 香港、大陸、台灣 - 跨地區、跨年代漢語常用字頻統計".
- Ministry of Education, State Language Commission (教育部、國家語委) (2013). 2012年中國語言生活狀况報告 (Report on Language Life in China 2012) (in Chinese). Beijing: Shangwu (The Commercial Press).
- Su 2014, p. 42.
- Yang 2008, p. 199.
- Su 2014, pp. 183-207.
- Su 2014, pp. 189-197.
- Su 2014, pp. 197-202.
- Su 2014, pp. 184-185.
- Su 2014, pp. 201-202.
- Su 2014, p. 73.
- Su 2014, pp. 73-74.
- Su 2014, p. 74.
- Zhang, Xiaoheng, et al. (张小衡, 李笑通) (2013). 一二三笔顺检字手册 (Handbook of the YES Sorting Method) (in Chinese). Beijing: 语文出版社 (The Language Press). p. 6. ISBN 978-7-80241-670-3.
- Su 2014, pp. 74-75.
- https://www.unicode.org/charts/PDF/U31C0.pdf
- Su 2014, p. 82.
External links
- Chinese Character Strokes
- https://qxk.bnu.edu.cn/#/
- https://www.chineseconverter.com/zh-cn/convert/zhuyin
- https://www.chineseconverter.com/zh-cn/convert/chinese-to-pinyin
- https://zh.wikipedia.org/wiki/漢字標準列表
- https://zh.wikipedia.org/wiki/常用國字標準字體表
- https://zh.wikipedia.org/wiki/常用字字形表
This article "Modern Chinese characters" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Modern Chinese characters. Articles copied from the Draft namespace on Wikipedia can be seen in the Draft namespace of Wikipedia, not in the main namespace.
