Variants of Japanese (was: Re: Unilingua)

Tex Texin tex at
Mon Sep 19 01:08:52 CEST 2005

Doug, John, 

The point about a Japanese variant is not that lay people need to recognize
that a document is in the variant language and tag it accordingly.
The problem is that someone who does not know a variant exists will
label other (typical) Japanese content ja, rather than ja-JP.

We will not have reliably precise tags in a system where people use one
label while they are unaware of variations and switch to a more precise tag
only once they become aware of them.

Laypeople will not know of (nor research) many variants. Members of this
list will do a better job, but may still be unaware of variants for some
languages. It really takes an expert to say with confidence that a language
exists in only one form, without variation.

A lot of people don't realize Canadian French is different from France's,
or for that matter that Canadian English is different from American and
British English.

It makes more sense to me to recommend, for tagging purposes, that people be
consistent and always use a region, to reflect as closely as possible the
author's language and/or the intended audience, and, for matching purposes,
to be only as restrictive as needed. So tag ja-JP, but match on ja (or "ja-JP,
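
To make "tag precisely, match loosely" concrete, here is a rough sketch of
prefix-style matching in the spirit of RFC 3066 language ranges. The function
name matches() and the sample tag list are made up for illustration; this is
not any particular library's API, just one way the idea could look.

```python
def matches(language_range: str, tag: str) -> bool:
    """Return True if tag falls within language_range (hypothetical helper).

    A range matches a tag when they are equal, or when the range is a
    prefix of the tag that ends at a "-" boundary. Comparison ignores case.
    """
    rng = language_range.lower()
    candidate = tag.lower()
    return rng == "*" or candidate == rng or candidate.startswith(rng + "-")


# Sample tags, invented for this example.
tagged = ["ja-JP", "ja", "fr-CA", "fr-FR", "en-001"]

# Matching on the loose range "ja" finds content tagged either way.
print([t for t in tagged if matches("ja", t)])  # ['ja-JP', 'ja']

# Matching on the precise range "ja-JP" misses content tagged only "ja".
print(matches("ja-JP", "ja"))  # False
```

The second result is the concern in a nutshell: if some authors tag ja and
others tag ja-JP for the same language, a precise range silently misses part
of the content, while a loose range finds both.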


Doug Ewell wrote:
> Tex Texin <tex at xencraft dot com> wrote:
> > How do you know there are no other varieties of Japanese, so that ja
> > is the right answer?
> > I am under the impression there is a variant in Hokkaido and maybe
> > others. In any event, it is difficult to prove that none exists, and
> > it is a claim that can only be made by experts with knowledge of where
> > Japanese speakers are and how well their language conforms, not lay
> > people tagging content.
> If there is such a variant that lay people can identify, which bears on
> the intelligibility of the text in question, the text should be tagged
> accordingly.  If no such variation is relevant, the text should be
> tagged accordingly.  Both RFC 3066 and RFC 3066bis specifically warn
> against using tags with more information than circumstances warrant.
> > Maybe it is a version of Japanese I intend to use worldwide so I
> > should use ja-001?
> If it were established that regional variants of Japanese existed, such
> that ja-JP differed from ja-wherever in a way that affected vocabulary
> or usage or spell-checking or something, then ja-001 would indicate a
> variety of Japanese that was suitable worldwide.  It may or may not be
> true that such a distinction is necessary.
> There is such a thing as en-US and en-GB, where en-US "cookie" is
> equivalent to en-GB "biscuit" in a way that affects intelligibility (a
> "biscuit" in en-US means something different).  My understanding of 001
> is that the use of en-001 would indicate that the text thus tagged is
> free of such potential ambiguities.  The same logic would apply to any
> other language with significant regional variations.
> --
> Doug Ewell
> Fullerton, California

Tex Texin   cell: +1 781 789 1898   mailto:Tex at
Xen Master                
Making e-Business Work Around the World

More information about the Ietf-languages mailing list