Duplicate Busters: Survey #2

Doug Ewell doug at ewellic.org
Fri Aug 1 07:02:31 CEST 2008


This is the second of two surveys being sent to both LTRU and 
ietf-languages on the subject of removing certain duplicate Description 
fields in the Language Subtag Registry.  Some of these issues affect the 
current Registry, while others affect only the proposed RFC 4646bis 
Registry being considered by LTRU.

Whereas the first survey dealt with eliminating duplicates across 
records by adding differentiating text, this survey deals with removing 
essentially duplicate Description fields within a record.  "Essentially 
duplicate" in this sense means either of two things:

1.  Two Description fields are identical, except for different 
punctuation marks (hyphens or apostrophes), or one contains letters with 
diacritical marks while the other is a pure-ASCII equivalent (i.e. all 
diacritical marks stripped).  No other types of spelling differences are 
considered (such as Kirghiz vs. Kyrgyz, or Dhivehi vs. Divehi).  The 
premise is that both Description fields convey the exact same content, 
but using slightly different typography.  The goal is to pick one and 
discard the other.

2.  Two Description fields are identical, except that one includes a 
parenthetical comment signifying a region or individual/macrolanguage 
status, and the other does not.  In each case, the description with 
comment is the ISO 639-3 name, while the description without comment is 
the ISO 639-1 and/or -2 name.  The premise is that the commented names 
convey the same content, but are less likely to be confused with other 
similarly named languages.  The goal (I hope) is to pick the commented 
(639-3) name and discard the uncommented (639-1/-2) name; a reasonable 
alternative would be to continue to list both names.
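
For anyone who would like to check candidate pairs mechanically, the two 
tests above could be sketched in Python roughly as follows.  This is only 
an illustration under stated assumptions, not the method used to compile 
this survey: the function names, the return labels, and the particular 
set of punctuation marks treated as interchangeable are all hypothetical.

```python
import re
import unicodedata

# Assumption: these are the marks treated as "different punctuation".
# The space is included so that "Macedo Romanian" matches "Macedo-Romanian".
PUNCTUATION = set("-' \u2018\u2019\u02bb\u02bc")


def fold_typography(description):
    """Strip diacritical marks and the punctuation marks listed above."""
    decomposed = unicodedata.normalize("NFD", description)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in PUNCTUATION
    )


def strip_comment(description):
    """Remove one trailing parenthetical comment, e.g. " (macrolanguage)"."""
    return re.sub(r"\s*\([^()]*\)$", "", description)


def classify(a, b):
    """Return "typography" (category 1), "comment" (category 2) or None."""
    if a == b:
        return None  # exact duplicates are outside this survey
    if fold_typography(a) == fold_typography(b):
        return "typography"
    if strip_comment(a) == b or strip_comment(b) == a:
        return "comment"
    return None  # any other spelling difference is not considered


print(classify("Macedo Romanian", "Macedo-Romanian"))
print(classify("Malay (macrolanguage)", "Malay"))
print(classify("Kirghiz", "Kyrgyz"))
```

Under these assumptions the three calls report a category 1 match, a 
category 2 match, and no match, respectively.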

Records are presented here in the order they will appear in the 
Registry, and are not segregated into categories 1 and 2.  Records are 
shown in hex-NCR format to allow the content to be "visible" on 
UTF-8-deprived or font-deprived systems, and in the Digest.
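
As a small illustration of the hex-NCR convention, a description can be 
converted by replacing each non-ASCII character with "&#x", its code 
point in hexadecimal, and ";".  The sketch below assumes uppercase hex 
digits with no zero padding; the function name is hypothetical.

```python
def to_hex_ncr(text):
    """Replace every non-ASCII character with a hex numeric character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in text
    )


print(to_hex_ncr("Hang\u016dl"))   # Hang&#x16D;l
print(to_hex_ncr("Hanun\u00f3o"))  # Hanun&#xF3;o
```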

For each of the subtags listed below, please examine the two Description 
fields and indicate whether you think the revised Registry should keep 
the first, the second, or both.  Only the Description fields that 
conflict are shown -- for example, both "Ge'ez" and "Geʻez" are shown, 
but not "Ethiopic", which is also listed for the same subtag.  When the 
rate of responses slows to a trickle, I will ask the Language Subtag 
Reviewer (not myself) to make the final determination, taking list 
feedback into account as appropriate.

===

Type: language
Subtag: ms
Description: Malay (macrolanguage)
Description: Malay

---

Type: language
Subtag: sw
Description: Swahili (macrolanguage)
Description: Swahili

---

Type: language
Subtag: ain
Description: Ainu (Japan)
Description: Ainu

---

Type: language
Subtag: bas
Description: Basa (Cameroon)
Description: Basa

---

Type: language
Subtag: bem
Description: Bemba (Zambia)
Description: Bemba

---

Type: language
Subtag: chm
Description: Mari (Russia)
Description: Mari

---

Type: language
Subtag: doi
Description: Dogri (macrolanguage)
Description: Dogri

---

Type: language
Subtag: fan
Description: Fang (Equatorial Guinea)
Description: Fang

---

Type: language
Subtag: gba
Description: Gbaya (Central African Republic)
Description: Gbaya

---

Type: language
Subtag: kam
Description: Kamba (Kenya)
Description: Kamba

---

Type: language
Subtag: kok
Description: Konkani (macrolanguage)
Description: Konkani

---

Type: language
Subtag: men
Description: Mende (Sierra Leone)
Description: Mende

---

Type: language
Subtag: rup
Description: Macedo Romanian
Description: Macedo-Romanian

---

Type: language
Subtag: war
Description: Waray (Philippines)
Description: Waray

---

Type: script
Subtag: Ethi
Description: Ge&#x2BB;ez
Description: Ge'ez

---

Type: script
Subtag: Hang
Description: Hangul
Description: Hang&#x16D;l
Description: Hangeul

(Technically I should not be including Hangeul, which is a different 
romanization of the same Korean word rather than a punctuation or 
diacritic variant.  Make your own judgment.)

---

Type: script
Subtag: Hano
Description: Hanunoo
Description: Hanun&#xF3;o


--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages