Sino-Vietnamese characters

Vietnamese name
Vietnamese	chữ Hán Nôm
Hán Nôm	字漢喃

Sino-Vietnamese characters (Vietnamese: Hán Nôm^[1]) are Chinese-style characters read either as Vietnamese or as Sino-Vietnamese. When they are used to write Vietnamese, they are called Nôm. The same characters may also be used to write Chinese. In that case, each character is given a Sino-Vietnamese, or Han-Viet, reading. Han-Viet is a system that allows Vietnamese speakers to read Chinese. It plays the role for Vietnamese readers that pinyin plays for English readers.

Some of these characters are also used in China; others are used only in Vietnam. Chinese characters were introduced to Vietnam when the Han Empire invaded the country in 111 BC. Even after Vietnam became independent in AD 939, the country continued to use Classical Chinese (Hán văn) for official purposes. In the 1920s, Vietnam shifted from traditional characters to the Latin alphabet. The Han-Nom Institute was founded in Hanoi in 1970 to collect and study documents written in the traditional script. The institute has submitted a list of 19,981 Sino-Vietnamese characters to Unicode for electronic encoding.^[2] This includes a core set of 9,299 characters called the Nôm Ideographs.

History

A page from the bilingual dictionary Nhật dụng thường đàm (1851). Characters representing Chinese words are explained in Nôm.

Chinese characters were introduced to Vietnam after the Han Empire conquered the country in 111 BC. Independence was achieved in 939, but the Chinese writing system was adopted for official purposes in 1010.^[3] Soon after the country achieved independence, Vietnamese began to use Chinese characters to write their own language. The Van Ban bell, engraved in 1076, is the earliest known example of a Nôm inscription.^[4] Nguyen Thuyen composed Nôm poetry in the 13th century. However, none of his work has survived.^[3] The oldest surviving Nôm text is the collected poetry of King Tran Nhan Tong, written in the 13th century.^[5]

Classical Chinese was used by the royal court and for other official purposes. The Temple of Literature in Hanoi was the best-known school for the study of Chinese. The civil service examination tested knowledge of Chinese. It was given once every three years. Students who passed the exam could go on to become magistrates. Confucian scholars saw Chinese as the language of education and looked down on Nôm. Popular opinion favored Nôm. Some kings thought that all writing should be done in Chinese. They suppressed Nôm. Other kings promoted Nôm. In 1867, King Tu Duc issued a decree encouraging the use of Nôm. Only a small percentage of the population was literate in any language. But nearly every village had at least one person who could read Nôm aloud for the other villagers.^[6] Jean-Louis Taberd wrote the first Nôm dictionary in 1838.

The blue script is modern Vietnamese, while the characters in brown and green are Nôm. Characters that are also used in Chinese are shown in green, while those specific to Vietnam are in brown. It says, "My mother eats vegetarian food at the temple every Sunday."

In 1910, the colonial school system adopted a "Franco-Vietnamese curriculum", which emphasized French and alphabetic Vietnamese. The Vietnamese alphabet is a form of the Latin alphabet that includes tone marks. On December 28, 1918, King Khai Dinh declared that the traditional writing system no longer had official status.^[7] The civil service exam was given for the last time at the imperial capital of Hue on January 4, 1919.^[7] The examination system, and the education system based on it, had been in effect for almost 900 years.^[7] China itself stopped using Classical Chinese soon afterward as part of the May Fourth Movement.

Language issues

Chinese characters are used to write various languages in China and elsewhere, including Mandarin, the most widely spoken language in China, Cantonese, spoken in Hong Kong and southern China, and Classical Chinese, traditionally used for formal writing. The characters were formerly used in Korea and in Vietnam. Japan uses a mix of Chinese characters and two native phonetic writing systems. Even characters that retain their original meaning in all languages may be read in various ways. The character 十 is pronounced as shí in Chinese romanization (pinyin), jū in Japanese romanization (Hepburn), sip in Korean romanization (Revised Romanization), and thập in the Han-Viet system used in Vietnam. In all these languages, the meaning of the character is “ten.”

The majority of the characters used in Nôm are of Chinese origin, chosen because they have an appropriate pronunciation or meaning. For example, the character used to write the word "Nôm" 喃 is pronounced nán in Chinese and means “chattering.”^{[note 1]} The fit between the Chinese character and the Vietnamese word is not always exact. The word "Nôm" does not have any negative connotation in Vietnamese, but rather suggests plain talk, something easy to understand.^[8]

Nôm includes thousands of characters not found in Chinese. In contrast, Japan developed only a few hundred kokuji, most of them describing plants and animals found only in Japan. Korea had just a small number of rarely used gukja. These characters were created by writers who combined pre-existing elements. One element, called the radical, indicates the character's meaning, or at least a semantic category. The other element, called the remainder, gives pronunciation. This is similar to how most Chinese characters are written. Like Chinese, Vietnamese is a tonal language. In contrast, Japanese and Korean can be written in phonetic scripts that do not indicate tone.

Readings

When a character is read as Vietnamese, it is romanized according to its Nôm reading. When it is read as Chinese, it can be romanized into Vietnamese as Han-Viet, or into English as pinyin. The chart below uses a darker background to display the Nôm Ideographs (V0 to V3), considered to be the core Nôm character set.

Hán Nôm Ideographs
Ideograph	Composition	Nôm reading	Han-Viet reading	Pinyin reading	English	Codepoint	V Source	Status in Chinese
媄	⿰女美	mẹ	mĩ	měi	mother	U+5A84	V0-347E	Kangxi, HDZ
傷	⿰亻⿱𠂉昜	thương	thương	shāng	to love	U+50B7	V1-4C22	Kangxi, HDZ, HK glyph
𠎬	⿰亻等	đấng	đẳng	děng	Used in đấng anh hùng (heroes)	U+203AC	V2-6E62	None
𠾾	⿰口湿	nhấp	thấp	shī	Used in nhấp nhổm (anxious)	U+20FBE	V3-3059	None
𫆡	⿰育个	dọc	dục	yù	Used in bực dọc (frustrated)	U+2B1A1	V4-5224	None
^[9]	⿰朝乙	giàu	triêu	cháo	wealthy	U+2B86F	V4-405E	None
	⿰月報	béo	báo	bào	fat	U+F04A5^{[note 2]}	V+63D0A^[10]	None
Key: Kangxi and HDZ (Hanyu Da Zidian) are comprehensive Chinese dictionaries. The HK glyphs are a set of nearly 5,000 glyphs taught in the Hong Kong school system. Sources: The Unicode Consortium 1991-2013, The Unicode Consortium 2012. The Nôm readings are from the Vietnamese Nôm Preservation Foundation, Han-Viet is from Hán Việt Từ Điển, and pinyin is from Purple Culture.
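
The Composition column uses Ideographic Description Sequences: an operator such as ⿰ (left and right) or ⿱ (top and bottom) is followed by the components it combines, and a component may itself be a nested sequence. The following Python sketch shows one way such a string could be split into its parts. The function names `arity`, `parse_ids`, and `components` are made up for this illustration, and the code does not decide which component is the radical and which gives the pronunciation.

```python
# Ideographic Description Characters occupy U+2FF0..U+2FFB.
# Two of them (U+2FF2, U+2FF3) take three operands; the rest take two.
TERNARY = {"\u2ff2", "\u2ff3"}


def arity(ch):
    """Return how many components an operator combines (0 for a plain character)."""
    if "\u2ff0" <= ch <= "\u2ffb":
        return 3 if ch in TERNARY else 2
    return 0


def parse_ids(seq):
    """Parse a description sequence into a nested tuple (operator, part, part, ...)."""

    def parse(i):
        if i >= len(seq):
            raise ValueError("sequence ends before all components are given")
        ch = seq[i]
        n = arity(ch)
        if n == 0:
            return ch, i + 1          # a plain component
        parts = []
        i += 1
        for _ in range(n):            # read exactly n operands
            part, i = parse(i)
            parts.append(part)
        return (ch, *parts), i

    tree, end = parse(0)
    if end != len(seq):
        raise ValueError("extra characters after a complete sequence")
    return tree


def components(tree):
    """List the plain components of a parsed sequence, left to right."""
    if isinstance(tree, str):
        return [tree]
    found = []
    for part in tree[1:]:
        found.extend(components(part))
    return found


if __name__ == "__main__":
    for ids in ["⿰女美", "⿰亻⿱𠂉昜"]:   # compositions of 媄 and 傷 from the table
        print(ids, parse_ids(ids), components(parse_ids(ids)))
```

For ⿰亻⿱𠂉昜, the parser returns a left-right pair whose right half is itself a top-bottom pair, which matches how the composition is written in the table.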

Encoding

In 1994, the Ideographic Rapporteur Group agreed to include Sino-Vietnamese characters in Unicode.^[11] Between 1993 and 2001, the Han-Nom Institute assembled a collection of 9,299 "Nôm Ideographs" in four sets. These are the V0, V1, V2, and V3 characters shown below. A Sino-Vietnamese character is first assigned a V Source code, and later a codepoint. These codes are used to transmit and store the character electronically. An appropriate font must be installed to render them.
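
As a rough illustration of how a codepoint is used to store a character, the Python sketch below turns a label such as U+28492 into the character it names and shows the bytes that two common encodings would write. The helper names `from_codepoint` and `describe` are invented for this example. Whether the character displays correctly still depends on the installed fonts.

```python
def from_codepoint(label):
    """Turn a label such as 'U+28492' into the one-character string it names."""
    if not label.upper().startswith("U+"):
        raise ValueError("expected a label of the form U+XXXX")
    return chr(int(label[2:], 16))


def describe(label):
    """Show the character and its bytes (as hex) in UTF-8 and UTF-16."""
    ch = from_codepoint(label)
    return {
        "char": ch,
        "utf8": ch.encode("utf-8").hex(),
        "utf16": ch.encode("utf-16-be").hex(),
    }


if __name__ == "__main__":
    # U+559C is in the Basic Block; U+28492 is in Extension B.
    for label in ["U+559C", "U+28492"]:
        print(label, describe(label))
```

Characters in the Basic Block, such as U+559C, take three bytes in UTF-8 and two in UTF-16. Characters in the extensions above U+FFFF, such as U+28492, take four bytes in either encoding.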

The Nôm Ideographs were extracted from two dictionaries published in the 1970s, one in Saigon^[12] and the other in Hanoi.^[13]^[14] V Source annotations were added to the glyphs that were already encoded. The rest were assigned codepoints in Extension B.^[15] The Hán Nôm Coded Character Repertoire (2008) integrates the work of the Han-Nom Institute with that of the U.S.-based Vietnamese Nôm Preservation Foundation.^[2] This book presents a comprehensive list of 19,981 Sino-Vietnamese characters, including the Nôm Ideographs, manuscript variants, characters formerly used by the Tay people of northern Vietnam, as well as numerous Chinese characters with Han-Viet readings.^[14]

Set	Characters	Unicode block	Standard	Date	Example	Sources
V0	2,246	Basic Block (593), A (138), B (1,515)	TCVN 5773:1993	2001	𨒒 mười ten, U+28492	Vũ Văn Kính & Nguyễn Quang Xỷ 1971
V1	3,311	Basic Block (3,110), C (1)	TCVN 6056:1995	1999	喜 hỷ happiness, U+559C	Vũ Văn Kính & Nguyễn Quang Xỷ 1971, Hồ Lê 1976
V2	3,205	Basic Block (763), A (151), B (2,291)	VHN 01:1998	2001	𣃤 vừa fit, match, U+230E4	Vũ Văn Kính & Nguyễn Quang Xỷ 1971, Hồ Lê 1976
V3	535	Basic Block (91), A (19), B (425)	VHN 02:1998	2001	𠁙 chả not, U+20059	Manuscripts
V4	785	Extension C	The V4 set is split between extensions C and E. It contains 2,230 characters.^[14]	2009	𪝌 bị to get, U+2A74C	Vũ Văn Kính 1994, Hoàng Triều Ân 2003, Nguyễn Quang Hồng 2006
V4	1,028	Extension E		2015	phở noodle soup, U+2C5BE
V5	~900	This set was proposed in 2001, but the characters were already encoded. No V Source was added.		2001	㦸 kích^[16] spear, U+39B8	Vũ Văn Kính & Nguyễn Quang Xỷ 1971, Hồ Lê 1976
V6	~8,000	Basic Block, Extension A	Assembled by the Nôm Na Group. Most of these are Chinese characters that are already encoded.	Projected	鎄 ai einsteinium, U+9384	Trần Văn Kiệm 2004
Sources: Nguyễn Quang Hồng 2008, The Unicode Consortium 1995-2013, and The Unicode Consortium 2012
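
The "Unicode block" column can be checked against a codepoint by comparing it with the published block ranges. The Python sketch below does this for the example characters in the table. The function name `block_of` is made up for this illustration, and "Basic Block" is used here, as in the table, for the CJK Unified Ideographs block.

```python
# Unicode block ranges for CJK ideographs, using the table's block names.
BLOCKS = [
    ("Extension A", 0x3400, 0x4DBF),
    ("Basic Block", 0x4E00, 0x9FFF),    # CJK Unified Ideographs
    ("Extension B", 0x20000, 0x2A6DF),
    ("Extension C", 0x2A700, 0x2B73F),
    ("Extension D", 0x2B740, 0x2B81F),
    ("Extension E", 0x2B820, 0x2CEAF),
]


def block_of(label):
    """Return the block name for a label like 'U+28492', or None if it is outside these blocks."""
    value = int(label[2:], 16) if isinstance(label, str) else label
    for name, first, last in BLOCKS:
        if first <= value <= last:
            return name
    return None


if __name__ == "__main__":
    # Example codepoints taken from the table above.
    for label in ["U+28492", "U+559C", "U+230E4", "U+20059",
                  "U+2A74C", "U+2C5BE", "U+39B8", "U+9384"]:
        print(label, block_of(label))
```

A codepoint outside these ranges, such as the private-use value U+F04A5 in the readings table, returns `None`, which is a simple way to spot characters that have not yet been given a standard codepoint.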