<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

In CLDR (<a class="moz-txt-link-freetext" href="http://unicode.org/cldr/">http://unicode.org/cldr/</a>) we have recently agreed to add

transliterations (both roundtripping and non-roundtripping, the latter

aka transcriptions) to the registry of locale data, so they should be

available in the next release. Of course, the data will initially be

limited, but that is the purpose of the registry: to provide a common

location for collecting such data.<br>

<br>

Mark<br>

<br>

Luc Pardon wrote:

<blockquote cite="mid433A6104.85A5BCEF@skopos.be" type="cite">

  <blockquote type="cite">

    <pre wrap="">"it's Greek, and the script is Latin; for all other properties - guess".

    </pre>

  </blockquote>

  <pre wrap="">

  That sums it up quite nicely, I think.

  In the case of contemporary Greek, it could have been

transliterated/transcribed (TL/TS'd) according to ISO 843 or ELOT 743 or

anything else (just Google for "Greeklish" and you'll see what I mean).

Never mind that international standards tend to rely heavily on the

English language's sound system. If you're TL/TS-ing for a non-English

audience, some twists and tweaks may be needed, especially if your

purpose is educational.

  Many of the TL/TS mappings are not fully reversible, especially not

for a computer. I can see only one practical way to spell-check an

el-Latn document and that is a) to agree with the author on a set of

TL/TS rules (the hardest part ;-) and b) to check against a dictionary that

is obtained by applying those same rules to the words from a "real"

Greek dictionary. A spell checker manufacturer could provide several

such el-Latn dictionaries, each one made with a different TL/TS

"standard". In my case, I would look for "Greek in Latin script,

transliterated for a Dutch-speaking audience" in the drop-down list,

rather than "Greek transliterated with ISO 843".
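
  As a rough sketch of that two-step approach (agree on a rule set, then
derive the el-Latn dictionary from a Greek word list), something like the
following could work. Everything named here is hypothetical: TOY_RULES is
a made-up mapping for illustration, not ISO 843 or ELOT 743, and the
function names are assumptions rather than any spell checker's API.

```python
import unicodedata

# Hypothetical toy rule set, NOT ISO 843 or ELOT 743; illustration only.
# Note that eta and iota both map to "i", so the mapping is not reversible.
TOY_RULES = {
    "ου": "ou",
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def strip_accents(text):
    """Drop combining marks (e.g. the tonos) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate(word, rules):
    """Apply the rules left to right, preferring the longest matching key."""
    word = strip_accents(word.lower())
    longest = max(len(key) for key in rules)
    out = []
    i = 0
    while i != len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: copy the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def build_latin_dictionary(greek_words, rules):
    """Step b): derive an el-Latn word list from a Greek-script word list."""
    return {transliterate(word, rules) for word in greek_words}


def unknown_words(text, latin_dictionary):
    """Return the whitespace-separated words not found in the dictionary."""
    return [w for w in text.lower().split() if w not in latin_dictionary]
```

  A vendor offering several el-Latn dictionaries would call
build_latin_dictionary once per rule set on the same Greek word list, and
unknown_words would then flag anything the chosen rule set cannot
produce.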

  Of course this applies not just to Greek. I have been thinking that

it's a pity that RFC 3066bis doesn't address this issue explicitly. A

"transliteration ruleset used" subtag, underneath the script subtag,

would have solved the problem - in theory. Not that I see how that

would be practical or possible, given such an open-ended set of TL/TS

methods.  Maybe it could be handled with a registry that requires the

requester to provide a public-domain algorithm that

describes the mapping and/or working, open-source computer code. Easier

said than done. And I suppose it's off-topic for this list anyway.

  While in philosophy mode, I'll allow myself to note that I don't think

this issue is limited to script subtags. It applies to the

tagging system as a whole. Intended as it is to tag human-to-human

communication, it does not - and cannot - eliminate 100% of the

guesswork. There will always be some ambiguity.

   Luc Pardon

   Belgium

_______________________________________________

Ietf-languages mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Ietf-languages@alvestrand.no">Ietf-languages@alvestrand.no</a>

<a class="moz-txt-link-freetext" href="http://www.alvestrand.no/mailman/listinfo/ietf-languages">http://www.alvestrand.no/mailman/listinfo/ietf-languages</a>

  </pre>

</blockquote>

</body>

</html>