[YES] The Linguasphere proposal is suited to RFC 3066 (or its successors) and its consuming protocols

Tue Jun 22 00:59:57 CEST 2004

Debbie Garside wrote (and her text is included in "quotes" below):

"The LS 639 referential scale would not be used for tagging data."

1. Why then are we discussing LS 639 in this forum?

The ietf-languages at iana.org list, and indeed RFC 3066 and its predecessors
(and successors) are only for tagging data.

2. There is still no information from the LS proponents about the needs
statement for a standard based on an alpha4 mapping of Linguasphere codes,
despite requests from other list members for this.

Given 1. and 2., there are also serious doubts in my mind as to whether
what is described as LS 639 should also be part of ISO 639 work items,
as well as questions about any role in relation to RFC 3066bis etc.

Other comments by Debbie give rise to further doubts:

"For the purposes of tagging just the static Alpha4 langtag is required,"

But what is the _purpose_ of tagging?

"the Linguasphere system does the rest."

The rest of what exactly, please?

"AND... the system will (and does already) quite easily map to other
standards."

Any system can be made to map to any other, if you have a finite number of
identical entities.

Mapping to which standards?

What are the benefits of such mappings?

Are all mappings one-to-one with the codes in other standards?

All the above statements from Debbie seem rather vague.

"The crux of the matter seems to be focusing on the question of USE for a
system of such detailed granularity.  We can discuss the various
technicalities ad infinitum,"

Indeed they have been, but a USE has not been shown within the context of
RFC3066bis or ietf-languages at iana.org, as requested.

"but the system does work."

How does it work? And to what purpose? And with what relevance to either
(a) other parts of ISO 639 or (b) RFC3066 or its successors?

Currently I fail to see the relevance of it either to (a) other parts of
ISO 639 or (b) RFC3066 or its successors.

The answer to Misha's original question

   Does anyone here consider the Linguasphere stuff to be suited
   to RFC 3066* and its consuming protocols?

seems to be a resounding NO, from nearly all participants, as far as I can
work out.

John Clews

---------------------------- Original Message ----------------------------
Subject: RE: [YES] The Linguasphere proposal is suited to RFC
3066 (or its successors) and its consuming protocols
From:    "Debbie Garside" <debbie at ictmarketing.co.uk>
Date:    Mon, June 7, 2004 5:51 pm
To:      "Debbie Garside" <debbie at ictmarketing.co.uk>
         "Clay Compton" <clayco at microsoft.com>
         ietf-languages at iana.org
--------------------------------------------------------------------------

Off the cuff

Here is my response to a few of the questions/statements made during the
course of these discussions... and I apologise for the repetition in
advance... All that I say now is in rapid response, pending the return of
David Dalby, the architect of LS 639.

John Cowan sub-script

>The worst problem I see with the Linguasphere identifiers is the extreme
>difficulty of relating the more general to the less general, as must be
>done if requests are to be appropriately satisfied.  It may make sense to
>assign distinct 4-letter codes to such linguistic entities as:

	English
	Hiberno-English
	Hiberno-English, spoken
	Hiberno-English, spoken in Dublin
	Hiberno-English, spoken in Dublin on the North Circular Road
	Hiberno-English, spoken in Dublin on the North Circular Road (south side)

>but a supplier of information that has content tagged with the last code
>will not be able to reply to a request for simply "English" unless it
>grasps this particular branch of the entire system (which leads up to
>"Germanic" and "Indo-European" at higher levels, if I understand
>correctly).

>In order to do this, it must have the Linguasphere key (hierarchical
>identifier) corresponding to the 4-letter code, but this is (a) unstable
>and (b) brittle, with its fixed maximum hierarchical depth of 8 and its
>limited fanout of 10 to 26 siblings at each level.

The LS 639 referential scale would not be used for tagging data.  The
structure is as follows:

Each linguistic item within the Linguasphere is allocated a place within
the referential scale (Flexible).

Each linguistic item within the Linguasphere is allocated a category
number (usually fixed, but it can be flexible, as it would have been with
the Serbo-Croatian situation, which I am sure David Dalby will explain if
required) which denotes where within the database it fits:  e.g. 40 for
Language, 41 for Language Variety, 42 for Component of Language Variety,
50 for language written variety, 51 for component of written variety etc.
I can see a very relevant use for this in cataloguing for library purposes
so that the data inputter is not faced with thousands of codes.

Each linguistic item is allocated an alpha4 “langtag” (Fixed) COMPLETELY
FIXED.

Each linguistic item is also allocated its PRECEDING alpha4 “langtag”
(Fixed – but could possibly be changed with any changes annotated) thus
forming the relationships between the “languages” and it is this that
makes it such a simple hierarchical world language map.  It is a simple
relational database based on the Linguasphere Register 1999/2000 which is
superb for the purpose.

This system means that the referential scale can be changed, giving the
required flexibility for the purpose of linguistics, whilst providing a
fixed system (alpha4 langtag) for coding purposes.

Each aspect of the Linguasphere can be viewed/used in its own right or as
part of a hierarchy.  The system can be used as a “bare bones” system with
just “Language name” and alpha4 code or “Language Variety” and Alpha4 code
etc. or as the Linguasphere map using the preceding alpha4 langtag. It is
not recommended to use the referential scale for tagging purposes as,
quite rightly pointed out by John, this could/can change at any time with
a cascade update feature when other linguistic items are added (although
this will also be annotated within the system).

Sample Data (this is not all the fields obviously but merely the ones in
question here)

Layer  Referential Scale  Language Name          Alpha4  Preceding Alpha4
40     00AAAAa            Bamanan.kan (Bambara)  bmnk    bmnn

NB This sample data does not display the current mapping with other
standards.
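
To make the "preceding alpha4" idea concrete, here is a minimal sketch (in
Python) of how a consumer might walk those links to decide whether content
tagged with a specific code can satisfy a request for a more general one.
It is an illustration only, not part of the Linguasphere system: the names
PRECEDING, ancestors and satisfies are hypothetical, and only the
bmnk -> bmnn link comes from the sample row above.  Treating bmnn as a root
is an assumption made so that the walk terminates.

```python
from typing import Dict, List, Optional

# alpha4 code -> preceding alpha4 code (its parent in the hierarchy).
# Only the bmnk -> bmnn link is taken from the sample row; the parent of
# bmnn is not shown in the message, so it is ASSUMED to be a root here.
PRECEDING: Dict[str, Optional[str]] = {
    "bmnk": "bmnn",
    "bmnn": None,  # assumption for this sketch only
}


def ancestors(code: str, table: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of preceding codes, nearest first."""
    chain: List[str] = []
    seen = {code}
    current = table.get(code)
    while current is not None:
        if current in seen:
            # Guard against a malformed table with circular links.
            raise ValueError(f"cycle in preceding-alpha4 links at {current!r}")
        chain.append(current)
        seen.add(current)
        current = table.get(current)
    return chain


def satisfies(request: str, tagged: str, table: Dict[str, Optional[str]]) -> bool:
    """True if content tagged `tagged` is the requested item or lies below it."""
    return request == tagged or request in ancestors(tagged, table)
```

Note that the lookup depends on the consumer holding the full table of
links; the alpha4 code alone carries no information about its parent.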

For the purposes of tagging just the static Alpha4 langtag is required,
the Linguasphere system does the rest.

AND... the system will (and does already) quite easily map to other standards.

The crux of the matter seems to be focusing on the question of USE for a
system of such detailed granularity.  We can discuss the various
technicalities ad infinitum, but the system does work.

One thing I can say before answering the question on use is: Given 4
billion IP addresses, who would have predicted a need for a greater range?

So... to the question of USE... and detailed granularity – please see:
http://www.linguasphere.com/grassroots.asp

Peter Constable sub-script

>and I *really* would like to
>see better analysis justifying the need. In the absence of such
>analysis, I'm not sure I could recommend to the US TAG that they vote in
>favour of accepting a NWIP.

Michael Everson sub-script

>We have the same concern in Ireland.

I hope the issues of use and granularity are clearer.

Having received several personal emails... hate to do it again... but one or
two from "eminent professors"..., and I have to say “thank you” to the
United Nations (cos even I was impressed by that) I am beginning to
understand that some people are experiencing a certain amount of
trepidation towards entering this forum (can’t think why).  Therefore, I
invite private questions to be sent to my personal email address where
they will be treated in the strictest confidence and answered by the
appropriate person within the organisation.

One final thank you... to the 5 Corporate Directors and the Globalisation
team in Canada for “taking time” and teaching me the true “value” of the
Linguasphere and to the European Vice-President and his staff for
facilitating the process.  The table is set and the Moules Mariniere are
cooking

Re: crystal balls and Euro 2004... Sorry... I’m a tennis fan

Crystal Ball says Henman for Wimbledon Champ.... Champagne, strawberries and
cream... and the best of British sport ... and... if my crystal ball (currently
residing on my bedside cabinet) proves correct a ticket to the men’s
singles final would be gratefully received

Debbie

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no
[mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Debbie Garside
Sent: 07 June 2004 10:43
To: Clay Compton; ietf-languages at iana.org
Subject: RE: [YES] The Linguasphere proposal is suited to RFC 3066
(or its successors) and its consuming protocols

>"cy-cyde-prsl" is a perfectly valid tag in RFC 3066 today, it accurately
>reflects that the tagged language variety is related to Welsh (which makes
>it more aesthetically satisfying)

I agree... I'm not a programmer but that structure seems completely
logical to me...

I will show, later today, how the Linguasphere system can work in exactly
this way... I am compiling my response to the questions raised...

>If the implications of the proposal for RFC 3066 are to allow subtags based
>on the language varieties and communities in the LS Register, this is an
>occasion for wild celebration...

Be ready for wild celebration...

Debbie

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no
[mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Clay Compton
Sent: 04 June 2004 22:17
To: ietf-languages at iana.org
Subject: [YES] The Linguasphere proposal is suited to RFC 3066 (or
its successors) and its consuming protocols

Comments:

What can I say; maybe I just enjoy being contrary.  However, I think
adding *parts* of the Linguasphere proposal to RFC 3066 can be
beneficial.  For one thing, it would cut back on the number of custom tags
requested in this forum, which most RFC 3066 implementers don't seem to
notice, anyway. My continued support depends on how RFC 3066 gets extended
to support the LS 639 tags.  Clearly, "ineu" (Indo-European) is not a
language and should never be used for tagging content.  By the same token,
neither is "prsl" (Preseli Welsh).  However, "cy-cyde-prsl" is a perfectly
valid tag in RFC 3066 today, it accurately reflects that the tagged
language variety is related to Welsh (which makes it more aesthetically
satisfying), and legacy systems that parse the subtags in the tag (which
they shouldn't do, but do anyway) would correctly fall back to "cy".  If
the implications of the proposal for RFC 3066 are to allow subtags based on
the language varieties and communities in the LS Register, this is an
occasion for wild celebration.  Of course, I'd like to hear the
Linguasphere folks pledge that they'll avoid any tag name collisions with
ISO 15924.

It's true that there would be a lot of tags in LS 639, but I'm not
complaining.  I think they (we) can handle the change as long as RFC
3066's hypothetical successor has an
"LS639-tags-as-subtags-for-language-varieties-only" rule that generates
tags like the one I suggest above.

Clay Compton
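
The fallback behaviour described in the message above (a legacy system
parsing "cy-cyde-prsl" and ending up at "cy") can be sketched as
progressive truncation of subtags from the right.  This is an illustrative
sketch in Python, not a quotation of any particular implementation; the
function names fallback_chain and best_match are hypothetical, and the set
of supported tags in the usage note is an assumption.

```python
from typing import Iterable, List, Optional


def fallback_chain(tag: str) -> List[str]:
    """List the tag followed by each shorter prefix, dropping one subtag at a time."""
    subtags = tag.lower().split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def best_match(tag: str, supported: Iterable[str]) -> Optional[str]:
    """Return the most specific supported tag reachable by truncation, or None."""
    available = {s.lower() for s in supported}
    for candidate in fallback_chain(tag):
        if candidate in available:
            return candidate
    return None
```

For example, a system that only knows "en" and "cy" would call
best_match("cy-cyde-prsl", {"en", "cy"}), which tries "cy-cyde-prsl", then
"cy-cyde", then "cy".  A bare "prsl" tag offers no such prefix to fall back
on, which is the point of placing the Linguasphere code in a subtag.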

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no
[mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Misha Wolf
Sent: Friday, June 04, 2004 12:24 PM
To: ietf-languages at iana.org
Subject: [YES/NO] The Linguasphere proposal is suited to RFC 3066 (or its
successors) and its consuming protocols

Ooops.  This version is better :-)

Misha

-----Original Message-----
From: Misha Wolf
Sent: 04 June 2004 20:22
To: 'ietf-languages at iana.org'
Subject: The Linguasphere proposal is suited to RFC 3066 (or its
successors) and its consuming protocols -- [YES/NO]

I'd like to carry out an experiment and hope the list moderator
doesn't object.  This is based on a system Michael Sperberg-McQueen used
with the W3C XML Schema WG.  The WG had a vast number of
members and lots of decisions to make.  Sometimes email ballots
were used, with the question and the vote both placed in the
Subject line for automated processing.  I seem to recall that the
idea was that there was no need to read the mail itself, as the
only relevant information was in the Subject line.

If you agree with this experiment and have an opinion, please reply to
this mail, deleting either the "YES" or the "NO" from the Subject line.

If you agree with this experiment and do not have an opinion, please skip
to the next mail in your Inbox.

If you do not agree with this experiment and want to write a mail
saying that it is a load of nonsense, please leave both the "YES"
and the "NO" in place.

Thanks

Misha Wolf
Standards Manager
Product and Platform Architecture Group
Reuters Limited
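
The subject-line ballot described in the message above lends itself to
automated counting.  The following Python sketch shows one way the replies
might be tallied, assuming the marker always appears in square brackets as
it does in this thread.  The names BALLOT, classify and tally are
hypothetical, and the caller is assumed to exclude the original ballot
mail itself, which necessarily carries "[YES/NO]".

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the bracketed vote marker anywhere in a Subject line.
# "YES/NO" is listed first so that it is not misread as a plain "YES".
BALLOT = re.compile(r"\[(YES/NO|YES|NO)\]")

LABELS = {"YES": "yes", "NO": "no", "YES/NO": "objects"}


def classify(subject: str) -> Optional[str]:
    """Map a Subject line to 'yes', 'no', 'objects', or None if it is not a ballot."""
    match = BALLOT.search(subject)
    return LABELS[match.group(1)] if match else None


def tally(subjects: Iterable[str]) -> Counter:
    """Count the votes found in a sequence of reply Subject lines."""
    votes = (classify(subject) for subject in subjects)
    return Counter(vote for vote in votes if vote is not None)
```

A reply that leaves both words in place is counted as an objection to the
experiment, following the third rule in the message; mails on other
subjects are ignored.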

-----Original Message-----
From: Misha Wolf
Sent: 04 June 2004 19:47
To: ietf-languages at iana.org
Subject: RE: Linguasphere -- An appeal for clarity

Can we have a straw poll re Q2 ...?

   Does anyone here consider the Linguasphere stuff to be suited
   to RFC 3066* and its consuming protocols?