Language Subtag Registration Form: variant "signed"

Tue Feb 28 08:22:12 CET 2006

I had an hour-long conversation today with an SIL linguist who has been assisting teams working on sign language projects for a number of years. We discussed various things, including the phenomenon of "signed-spoken-lang-X" varieties.

In thinking about how to construct tags for these languages, the crucial question in my mind is this (using "signed English" in the US as an example): 

Is signed English 

1) basically English expressed in a different modality,
2) basically ASL with some kind of modification or qualification -- e.g. a dialect or register, or
3) a pidgin of English and ASL?

Whichever it is, we should tag it accordingly (and if (3), then possibly code it as though it were a distinct language).

I was thinking we might use a subtag "-signed" based on the assumption that (1) was applicable. I now understand that these language varieties typically fall into (2), though perhaps sometimes into (3); generally these are probably best thought of as registers of the relevant signed language. They typically use phonology and lexica from the sign language and impose elements of the syntax (possibly including morphology) of the spoken language.

Thus, here are my thoughts on a good approach to tagging signed languages:

- We treat "sgn" as though it were a macrolanguage. (My contact said that really wasn't that much of a stretch.)

- We use ISO 639-3 IDs as extlang subtags together with a primary subtag of "sgn"; e.g., "sgn-ase" for ASL. All signed languages (SLs proper, not the Signed English cases) use tags constructed this way.

- We generally treat varieties like Signed English as registers of the signed language they are associated with. Thus, the tag is formed by adding a variant subtag to the tag for the sign language. Because there can be multiple varieties for a given sign language/spoken language combination, registered variant subtags provide the necessary level of flexibility. E.g., something like "sgn-ase-enexact" and "sgn-ase-enexact2" for SEE and SEE2; and "sgn-ase-esbaja" for the signed Spanish used in southern Baja California (which is based on ASL).

- For cases of signed-spoken-lang-X that are pidgins, we treat these based on whatever general principles we adopt for pidgins. (I'm not sure at the moment what that should be; the problem with pidgins is that they are transitional and not yet stable -- they may stabilize as a creole or they may continue to mutate or they may disappear.)
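
As a rough illustration of the first three points, the tag construction might be sketched as follows. The function name make_sign_tag and its checks are hypothetical, and the variant subtags are the made-up examples from above, not registered values.

```python
def make_sign_tag(extlang, variant=None):
    """Build a tag of the proposed form sgn-<ISO 639-3 ID>[-<variant>].

    Hypothetical helper: the checks below are deliberately minimal and
    do not implement full language-tag validation.
    """
    # ISO 639-3 identifiers are three ASCII letters.
    if not (len(extlang) == 3 and extlang.isascii() and extlang.isalpha()):
        raise ValueError("expected a three-letter ISO 639-3 ID: %r" % extlang)
    parts = ["sgn", extlang.lower()]
    if variant is not None:
        # Only a basic sanity check; the variant names are illustrative.
        if not (variant.isascii() and variant.isalnum()):
            raise ValueError("variant must be ASCII alphanumeric: %r" % variant)
        parts.append(variant.lower())
    return "-".join(parts)


asl = make_sign_tag("ase")              # ASL proper
see = make_sign_tag("ase", "enexact")   # hypothetical variant for SEE
see2 = make_sign_tag("ase", "enexact2") # hypothetical variant for SEE2
```

Pidgin cases are left out of the sketch, since the approach for them is still open.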

This should give generally good results for matching algorithms. Having "sgn" at the start immediately sets apart the modality, which will generally trump any other distinction in importance to the user. A request for "sgn-ase" will return results that include (the hypothetical) "sgn-ase-enexact" and "sgn-ase-esbaja", which is likely to be reasonable: an ASL user is going to recognize all or nearly all of the lexical items in either case and so will have about as much difficulty in comprehension as we'd expect from dialect or register differences.
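
The matching behaviour described here can be sketched with simple prefix matching on whole subtags. This is only a minimal sketch, assuming a case-insensitive prefix comparison; the function names and the sample tag list are hypothetical ("sgn-bfi" stands in for British Sign Language under the same scheme).

```python
def matches(requested, tag):
    """Return True if tag equals the requested tag or extends it by
    one or more whole subtags (case-insensitive prefix match)."""
    r = requested.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


def filter_tags(requested, tags):
    """Keep only the tags that match the requested tag."""
    return [t for t in tags if matches(requested, t)]


# Illustrative content tags; the variant subtags are the hypothetical ones above.
available = ["sgn-ase", "sgn-ase-enexact", "sgn-ase-esbaja", "sgn-bfi", "en-US"]

asl_results = filter_tags("sgn-ase", available)
signed_results = filter_tags("sgn", available)
```

Under these assumptions, a request for "sgn-ase" picks up the two register variants along with ASL proper, while a request for bare "sgn" picks up everything in the signed modality and nothing else.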

This is a preliminary take. I know my understanding is still limited, and that there are likely to be several parts of what I've outlined that need work and further discussion.

Peter Constable