Unifon script?

Thu Oct 3 00:50:10 CEST 2013

From: Doug Ewell [mailto:doug at ewellic.org] 

> I understand now. This would entail a new field type in the Registry, which 
> would require resurrecting the LTRU WG so that RFC 5646 could be revised. 
> We would have to decide if that effort would be worth it.

Yes, it would involve that. And a new rev of BCP 47 is not low cost, so I'm not pushing for just this change. Something to keep in mind if we get other reasons to consider a revision.

> The new field type would apply to phonetic alphabets and transcription 
> systems like 'fonipa' and 'fonxsamp', orthographic reforms or choices like 
> 'bohoric' and 'baku1926', as well as romanizations like 'wadegile' and 
> 'jyutping' (such that the former could have its Prefix expanded from 
> "zh-Latn" to "zh"). But it would not apply to variants that have no relation 
> to orthography or script, such as 'aluku' and 'jauer' and 'valencia'.

Agreed. I don't see any particular problem in that: it would not have to be a required field, and nothing would be implied if the field is not present.
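To illustrate the "optional field, nothing implied when absent" point, here is a minimal sketch of how a consumer might read such a field from registry-style record-jar data. The field name "Implied-Script" is purely hypothetical; RFC 5646 defines no such field. The records below are abridged and for illustration only, and the parser skips continuation lines and the File-Date record.

```python
# Sketch only: "Implied-Script" is a HYPOTHETICAL field, not part of RFC 5646.
# Records are abridged (Description, Added, etc. omitted) for illustration.
SAMPLE = """\
Type: variant
Subtag: fonipa
Implied-Script: Latn
%%
Type: variant
Subtag: wadegile
Prefix: zh-Latn
Implied-Script: Latn
%%
Type: variant
Subtag: jauer
Prefix: rm
"""

def parse_records(text):
    """Split record-jar text on '%%' lines into dicts of field -> list of values.
    Simplified: folded continuation lines are not handled."""
    records = []
    for chunk in text.split("\n%%\n"):
        fields = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            fields.setdefault(name.strip(), []).append(value.strip())
        records.append(fields)
    return records

def implied_scripts(records):
    """Map variant subtag -> implied script. Variants lacking the optional
    field (e.g. 'jauer') are simply omitted: nothing is implied for them."""
    result = {}
    for rec in records:
        if rec.get("Type") == ["variant"] and "Implied-Script" in rec:
            result[rec["Subtag"][0]] = rec["Implied-Script"][0]
    return result

print(implied_scripts(parse_records(SAMPLE)))  # {'fonipa': 'Latn', 'wadegile': 'Latn'}
```

Because the field is optional, a variant like 'jauer' needs no change at all; existing records stay valid.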

>> If I see "en-fonipa", then the 'en' subtag has the suppress-script 
>> field that tells me that 'Latn' can be assumed. But if I see 
>> "huy-fonipa", nothing in the LSTR tells me that. Yet that would be an 
>> equally appropriate assumption.

> Actually, you can't assume 'Latn' from "en-fonipa" either. 
> Suppress-Script identifies the *default* script for the language when 
> written normally, but the presence of a variant for a phonetic alphabet 
> completely overrides this default. 

Well, that is not specified anywhere -- that's just something that you are assuming. Of course, if you had a tag like "ar-fonipa", then it would be entirely appropriate to ignore the fact that 'ar' has a suppress-script value of "Arab", given knowledge that 'fonipa' implies Latn script. But LSTR doesn't provide that knowledge in data; implementers would have to build in additional knowledge on their own. Hence my point: it would be appropriate for the LSTR data to include a field that provides that knowledge.

In practice, I suspect you'd find that implementations using the LSTR data assume that a tag with "en" and no contra-indicating script subtag implies "Latn", regardless of what variant is present. Perhaps some implementations out there carry custom data about the scripts implied by particular variant subtags, but I'll bet those are rare.
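The difference between the two behaviours can be sketched as follows, assuming a small hand-written table of Suppress-Script values and a HYPOTHETICAL table of variant-implied scripts (the kind of custom data mentioned above, which the registry does not currently supply). The tag parsing is deliberately simplified.

```python
# Suppress-Script values for a few language subtags (small excerpt).
SUPPRESS_SCRIPT = {"en": "Latn", "ar": "Arab", "ru": "Cyrl"}
# HYPOTHETICAL variant -> implied script data; no such registry field exists.
VARIANT_SCRIPT = {"fonipa": "Latn"}

def explicit_script(subtags):
    """Return an explicit 4-letter script subtag if present, else None."""
    for s in subtags[1:]:
        if len(s) == 1:  # singleton: extensions / private use follow
            break
        if len(s) == 4 and s.isalpha():
            return s.title()
    return None

def naive_script(tag):
    """Registry data only: explicit script, else the language's Suppress-Script."""
    subtags = tag.lower().split("-")
    return explicit_script(subtags) or SUPPRESS_SCRIPT.get(subtags[0])

def variant_aware_script(tag):
    """Explicit script, else a variant-implied script, else Suppress-Script."""
    subtags = tag.lower().split("-")
    script = explicit_script(subtags)
    if script:
        return script
    for s in subtags[1:]:
        if len(s) == 1:
            break
        if s in VARIANT_SCRIPT:
            return VARIANT_SCRIPT[s]
    return SUPPRESS_SCRIPT.get(subtags[0])

for tag in ("en-fonipa", "ar-fonipa", "huy-fonipa"):
    print(tag, naive_script(tag), variant_aware_script(tag))
```

With registry data alone, "ar-fonipa" comes out as Arab and "huy-fonipa" as unknown; only the extra variant data yields Latn for both.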

> Similarly, the fact that Unifon is attested for languages whose normal, 
> default writing system is (or may be) Latin has no bearing on whether 
> Unifon itself is Latin-based.

I certainly never intended to imply otherwise.

Peter