Flavors of Hepburn

Mon Sep 28 02:09:54 CEST 2009

Mark Crispin <mrc plus ietf at Panda dot COM> wrote:

> I contend that "any romanization of Japanese that fits the Hepburn 
> model better than it fits other models" is a good definition, is 
> reasonably concise, and ought to be used in the registration.

I wouldn't object to having something like that in the registration 
form.  I think putting it in the Description field would look silly and 
be unnecessary.

> My experience suggests that the less normative the language, the more 
> that it will be misinterpreted regardless of whether the desired 
> interpretation is to be liberal or strict.
>
> More normative language is always a good thing.

I think "always" is overstating matters.  Sometimes a suggestion is just 
a suggestion.

> If I tag something as ja-Latn-Hepburn, I don't want to receive a bug 
> report that I misused the Hepburn tag for content that does not 
> precisely comply with Hepburn's 1887 dictionary.

If you do, you will have a lot of company.  People pretty much always 
use the term "Hepburn" to describe variant forms.

> If I tag something as ja-Latn, I don't want to receive a bug report 
> that I failed to tag it as Hepburn because it wrote "shi" instead of 
> "si" (but is not strictly Hepburn in other ways).

It is never a bug to tag something with less-than-maximum detail.  "ja" 
would also be an acceptable tag for such data -- probably not the best 
possible tag, but absolutely not a bug.
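One way to see why less detail is never an error: under the Lookup scheme of RFC 4647 (section 3.4), a tag is progressively truncated from the right until something matches, so each shorter prefix of ja-Latn-Hepburn is itself a meaningful, broader tag for the same content.  Below is a minimal sketch of that truncation step only, not full Lookup.  The function name is hypothetical.

```python
def truncation_chain(tag: str) -> list[str]:
    """Return `tag` followed by each less specific fallback, using the
    truncation rule of RFC 4647 Lookup (section 3.4): drop the last
    subtag, and if what is now last is a singleton, drop that too."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A trailing singleton (such as "x" or "u") means nothing on its own.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


print(truncation_chain("ja-Latn-Hepburn"))
# ['ja-Latn-Hepburn', 'ja-Latn', 'ja']
```

Every entry in the chain, down to plain "ja", is a valid tag for Hepburn-romanized Japanese.  It is only less specific.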

> If you are a little developer, the specification is vague, and what 
> you do disagrees with a [big vendor] product to the point that bad 
> results happen to users, then you are wrong by definition.
>
> And if you yield to [big vendor]'s interpretation, and your having 
> yielded subsequently creates bad results to users of [other big 
> vendor]'s product, then you are wrong by definition.

These scenarios can happen no matter how airtight the specification is. 
ASCII is unambiguously a 7-bit code, yet think how often you have seen 
the phrase "8-bit ASCII."

> At this point, you get teed off at the specification for being vague. 
> Vague specifications are not the developer's friend.

Below is a list of all the Description fields for all variant subtags 
that currently exist in the Registry.  Some are quite specific; others 
are quite vague.  I contend that nothing in the Registry shows a 
systematic requirement that these fields be maximally specific to a 
particular formal definition of the variant.  In particular, I contend 
that most of the Descriptions that include specific years do so mainly 
to justify the choice of subtag, not to refer to an external standard 
('1996' is an exception).

"Academic" ("governmental") variant of Belarusian as codified in 1959
Aluku dialect
Aukan dialect
Belarusian in Taraskievica orthography
Boni dialect
Boontling
Common Cornish orthography of Revived Cornish
Early Modern French
Eastern Armenian
German orthography of 1996
International Phonetic Alphabet
Late Middle French (to 1606)
Monotonic Greek
Nadiza dialect
Natisone dialect
Ndyuka dialect
Pamaka dialect
Pinyin romanization
Polytonic Greek
Resian
Resianic
Rezijan
Scottish Standard English
Scouse
Standardized Resian orthography
The Bila dialect of Resian
The Gniva dialect of Resian
The Lipovaz dialect of Resian
The Lipovec dialect of Resian
The Njiva dialect of Resian
The Oseacco dialect of Resian
The Osojane dialect of Resian
The San Giorgio dialect of Resian
The Solbica dialect of Resian
The Stolvizza dialect of Resian
Traditional German orthography
Unified Cornish orthography of Revived Cornish
Unified Cornish Revised orthography of Revived Cornish
Unified Turkic Latin Alphabet (Historical)
Uralic Phonetic Alphabet
Valencian
Wade-Giles romanization
Western Armenian

--
Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s