Duplicate Busters: Survey #2
Doug Ewell
doug at ewellic.org
Fri Aug 1 07:02:31 CEST 2008
This is the second of two surveys being sent to both LTRU and
ietf-languages on the subject of removing certain duplicate Description
fields in the Language Subtag Registry. Some of these issues affect the
current Registry, while others affect only the proposed RFC 4646bis
Registry being considered by LTRU.
Whereas the first survey dealt with eliminating duplicates across
records by adding differentiating text, this survey deals with removing
essentially duplicate Description fields within a record. "Essentially
duplicate" in this sense means either of two things:
1. Two Description fields are identical, except for different
punctuation marks (hyphens or apostrophes), or one contains letters with
diacritical marks while the other is a pure-ASCII equivalent (i.e. all
diacritical marks stripped). No other types of spelling differences are
considered (such as Kirghiz vs. Kyrgyz, or Dhivehi vs. Divehi). The
premise is that both Description fields convey the exact same content,
but using slightly different typography. The goal is to pick one and
discard the other.
2. Two Description fields are identical, except that one includes a
parenthetical comment signifying a region or individual/macrolanguage
status, and the other does not. In each case, the description with
comment is the ISO 639-3 name, while the description without comment is
the ISO 639-1 and/or -2 name. The premise is that the commented names
convey the same content, but are less likely to be confused with other
similarly named languages. The goal (I hope) is to pick the commented
(639-3) name and discard the uncommented (639-2) name; a reasonable
alternative would be to continue to list both names.
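The two criteria can be written out as a rough mechanical test. The Python sketch below is only an illustration, not part of any LTRU or registry tooling. Several details are assumptions made for the example: the set of apostrophe-like characters treated as equivalent, the rule that a hyphen compares equal to a space (as in Macedo Romanian vs. Macedo-Romanian), and the helper names typographic_key and classify.

```python
import re
import unicodedata
from typing import Optional

# Assumed mapping for category 1: a hyphen compares equal to a space, and
# these apostrophe-like characters compare equal to the ASCII apostrophe.
_PUNCT_MAP = str.maketrans({
    "-": " ",
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02bb": "'",  # MODIFIER LETTER TURNED COMMA
})

# One trailing parenthetical comment, e.g. " (macrolanguage)" or " (Japan)".
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)$")


def typographic_key(desc: str) -> str:
    """Normalize punctuation, then drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", desc.translate(_PUNCT_MAP))
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def classify(a: str, b: str) -> Optional[str]:
    """Return 'category 1' or 'category 2' for an essentially duplicate pair, else None."""
    # Category 1: same text once punctuation and diacritics are normalized away.
    if a != b and typographic_key(a) == typographic_key(b):
        return "category 1"
    # Category 2: exactly one side carries a trailing parenthetical comment,
    # and removing it leaves the other side unchanged.
    for with_comment, without in ((a, b), (b, a)):
        if _TRAILING_PAREN.search(with_comment) and not _TRAILING_PAREN.search(without):
            if _TRAILING_PAREN.sub("", with_comment) == without:
                return "category 2"
    return None
```

Under this sketch, Kirghiz/Kyrgyz and Hangul/Hangeul are not flagged, since they differ in spelling rather than in typography, which is consistent with the note on the Hang record below.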
Records are presented here in the order they will appear in the
Registry, and are not segregated into categories 1 and 2. Records are
shown in hex-NCR format to allow the content to be "visible" on
UTF-8-deprived or font-deprived systems, and in the Digest.
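In hex-NCR form, each non-ASCII character is written as "&#x", its code point in hexadecimal, and ";", so ó (U+00F3) becomes &#x00F3;. A minimal Python sketch of that escaping follows; to_hex_ncr is a hypothetical helper name, and the uppercase digits with four-digit zero padding are assumptions about the exact style, not something stated above.

```python
def to_hex_ncr(text: str) -> str:
    """Escape every non-ASCII character as a hexadecimal NCR, leaving ASCII as-is."""
    # Assumed style: uppercase hex digits, zero-padded to at least four digits.
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};" for ch in text)
```

For example, to_hex_ncr("Hanunóo") gives "Hanun&#x00F3;o". Python's standard html.unescape reverses the escaping.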
For each of the subtags listed below, please examine the two Description
fields and indicate whether you think the revised Registry should keep
the first, the second, or both. Only the Description fields that
conflict are shown -- for example, both "Ge'ez" and "Geʻez" are shown,
but not "Ethiopic", which is also listed for the same subtag. When the
rate of responses slows to a trickle, I will ask the Language Subtag
Reviewer (not myself) to make the final determination, taking list
feedback into account as appropriate.
===
Type: language
Subtag: ms
Description: Malay (macrolanguage)
Description: Malay
---
Type: language
Subtag: sw
Description: Swahili (macrolanguage)
Description: Swahili
---
Type: language
Subtag: ain
Description: Ainu (Japan)
Description: Ainu
---
Type: language
Subtag: bas
Description: Basa (Cameroon)
Description: Basa
---
Type: language
Subtag: bem
Description: Bemba (Zambia)
Description: Bemba
---
Type: language
Subtag: chm
Description: Mari (Russia)
Description: Mari
---
Type: language
Subtag: doi
Description: Dogri (macrolanguage)
Description: Dogri
---
Type: language
Subtag: fan
Description: Fang (Equatorial Guinea)
Description: Fang
---
Type: language
Subtag: gba
Description: Gbaya (Central African Republic)
Description: Gbaya
---
Type: language
Subtag: kam
Description: Kamba (Kenya)
Description: Kamba
---
Type: language
Subtag: kok
Description: Konkani (macrolanguage)
Description: Konkani
---
Type: language
Subtag: men
Description: Mende (Sierra Leone)
Description: Mende
---
Type: language
Subtag: rup
Description: Macedo Romanian
Description: Macedo-Romanian
---
Type: language
Subtag: war
Description: Waray (Philippines)
Description: Waray
---
Type: script
Subtag: Ethi
Description: Geʻez
Description: Ge'ez
---
Type: script
Subtag: Hang
Description: Hangul
Description: Hangŭl
Description: Hangeul
(Technically I should not be including Hangeul, which is a different
transcription of the same Korean word, not a genuinely different name.
Make your own judgment.)
---
Type: script
Subtag: Hano
Description: Hanunoo
Description: Hanunóo
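Anyone tallying these records mechanically could read them with a small parser. This Python sketch assumes the layout used in this message ("===" before the first record, "---" between records, one "Field: value" pair per line) and ignores every other line, such as the aside on the Hang record; parse_records is a hypothetical name.

```python
from typing import Dict, List

# Field names as they appear in the survey records above.
_FIELDS = ("Type", "Subtag", "Description")


def parse_records(text: str) -> List[Dict[str, List[str]]]:
    """Split the survey text into records mapping field name -> list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("===", "---"):
            # A separator line closes the record collected so far.
            if current:
                records.append(current)
            current = {}
            continue
        name, sep, value = line.partition(": ")
        # Keep only recognized fields; commentary lines are skipped.
        if sep and name in _FIELDS:
            current.setdefault(name, []).append(value)
    if current:
        records.append(current)
    return records
```

Each Description list can then be compared pairwise against the two criteria described at the top of this message.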
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages ˆ