(no subject)

Thu, 5 Dec 2002 16:49:34 -0600

On 12/05/2002 02:37:45 PM John Cowan wrote:

>> In a message I just wrote, I made reference to HTML lang and xml:lang, but
>> the issue extends to all protocols that deal with metadata of this sort.
>> HTTP is one more example.
>
>I believe it's fundamentally different.  The first two tag text that you
>already have, whereas the last provides information about text that you
>have not yet got and may wish to influence the receipt of.

Yes, the situations are different, but my general point was that you have
to ask the question about metadata for *any* protocols in which that kind
of information might be relevant.

It is clearly an issue for cataloguing / retrieval systems of all types. I
think it still is an issue in document markup infrastructures, though. In
part, the reason is that it is an issue for *document processors* even if
it isn't for the documents themselves. I.e. even if you don't need
something akin to xml:lang that identifies script because you can simply
tell from the characters in the content what the script is, you still need
something comparable in the software that processes the data because the
software needs to manage resources related to the given writing system --
you can't inspect a software resource to see what characters it contains.
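
To illustrate the point about processors, here is a minimal Python sketch. It
assumes a hypothetical registry in which each software resource carries a
declared writing-system label; the resource names, the `Resource` class and
the `resources_for` function are all invented for illustration, and the
four-letter labels are merely example identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A software resource described only by declared metadata."""
    name: str            # hypothetical resource name
    kind: str            # what the resource does
    writing_system: str  # declared label; cannot be derived from the resource


# Hypothetical registry: nothing in a sorter or a line breaker can be
# "read" to find out which characters it is meant to handle.
REGISTRY = [
    Resource("hypothetical-latin-sorter", "collation", "Latn"),
    Resource("hypothetical-thai-breaker", "line-breaking", "Thai"),
    Resource("hypothetical-latin-speller", "spell-checking", "Latn"),
]


def resources_for(writing_system):
    """Select resources by their declared label, not by inspection."""
    return [r for r in REGISTRY if r.writing_system == writing_system]


for resource in resources_for("Latn"):
    print(resource.name, resource.kind)
```

The lookup works only because the label is stored alongside each resource,
which is the role I am suggesting something comparable to xml:lang has to play
inside the software.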

Also, there is more than just the script involved. Two English strings may
both contain characters in the range a-z, but one may be a phonetic
transcription that just happens not to contain characters beyond that
range. A system may need an attribute on an element that tells it this so
that it knows what other data within the system to associate it with. Then
there's the matter of orthography: two orthographic conventions for a given
language may well not differ at all in terms of characters (indeed, that's
probably the typical case), but the content needs to be tagged for this as
you can't always tell from a given segment of content which is being used
-- again, the need for the system to know what other data within the system
to associate it with.
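
A small Python sketch of the same argument, under stated assumptions: the
segment records, the "variety" field and its values are hypothetical, and
taking the first word of a Unicode character name is only a crude stand-in for
a real script lookup. It shows two segments that inspection cannot tell apart
but that explicit tagging keeps separate.

```python
import unicodedata


def scripts_in(text):
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


# Two hypothetical segments with identical characters. The second is meant
# as a phonetic transcription that happens to stay within a-z.
segments = [
    {"text": "best", "lang": "en", "variety": "standard-orthography"},
    {"text": "best", "lang": "en", "variety": "phonetic-transcription"},
]

# Inspection alone gives the same answer for both segments.
by_inspection = [scripts_in(s["text"]) for s in segments]

# Grouping by the declared tags keeps them apart, so each can be
# associated with the right data elsewhere in the system.
groups = {}
for s in segments:
    groups.setdefault((s["lang"], s["variety"]), []).append(s["text"])

print(by_inspection[0] == by_inspection[1])
print(len(groups))
```

The same would apply to two orthographic conventions for one language: the
distinguishing information lives in the tag, not in the characters.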

So, I still think this is relevant for HTML lang and xml:lang. The needs in
cataloguing / retrieval e.g. with HTTP are more immediately evident,
though.

(Usage-scenario issues such as these are also topics that I explored in my
IUC 21 paper.)

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485