We are starting to get somewhere. It would help me if you would look over the strawman criteria that I put out, just to see where we agree and where we don't. Below, I substituted what you appear to be using as a criterion (and also fixed the omission that Randy noted). With these changes, is this what you are thinking of?<div>

<br></div><div>====</div><div><br></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(51, 51, 51); "><div>A. If </div><div><ol><li>X is being encoded,</li>

<li><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: arial; font-size: small; "><b>NEW: A major industry body has been tagging X as Y (rightly or wrongly),</b></span></li><ol>

<li><i>OLD: A reasonable person, based on information in the registry, could have tagged X-content as Y in the past</i></li></ol><li>There is good evidence that a substantial amount of data has been so tagged,</li><li>and X and the standard/predominant version of Y are not mutually comprehensible (at least to the degree that, say, Scots English and Mississippi English are not)</li>

</ol>Then Y should be made into a macrolanguage, and a new code Z should be encoded to represent the standard form of Y.</div><div><br></div><div>B. For matching, Y should match <b>Y, </b>X and Z. (X should match X, and Z should match Z.)</div>

<div><br></div><div>C. For lookup, Y should fetch content marked with Z. (X should fetch X, and Z should fetch Z.)</div><div><br></div></span>Mark<br>
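<div>To make B and C concrete, here is a rough Python sketch of how they might behave. It keeps the placeholder names X, Y and Z from the criteria rather than real codes. The tables MACRO_MEMBERS and STANDARD_FORM and the functions matches and lookup are hypothetical illustrations, not part of the registry or of any RFC 4647 implementation. It also assumes that, under C, content tagged Y itself is still preferred when present, with Z as the fallback.</div>

```python
from typing import Dict, List, Optional, Set

# Hypothetical tables: once Y is made a macrolanguage, it encompasses X and
# the newly encoded Z, and Z represents the standard form of Y.
# X, Y and Z are the placeholder names used in the criteria, not real codes.
MACRO_MEMBERS: Dict[str, Set[str]] = {"Y": {"X", "Z"}}
STANDARD_FORM: Dict[str, str] = {"Y": "Z"}


def matches(requested: str, tag: str) -> bool:
    """Criterion B: Y matches Y, X and Z; any other code matches only itself."""
    return tag == requested or tag in MACRO_MEMBERS.get(requested, set())


def lookup(requested: str, available: List[str]) -> Optional[str]:
    """Criterion C: a request for Y fetches Z-tagged content.

    Assumption: content tagged with the requested code itself is tried
    first, and the standard form (Z for Y) is the fallback.
    """
    candidates = [requested]
    if requested in STANDARD_FORM:
        candidates.append(STANDARD_FORM[requested])
    for code in candidates:
        if code in available:
            return code
    return None  # nothing suitable for this request


if __name__ == "__main__":
    print(matches("Y", "X"))        # True: Y matches its member X
    print(matches("X", "Y"))        # False: X matches only X
    print(lookup("Y", ["X", "Z"]))  # Z: no Y content, so fall back to Z
    print(lookup("X", ["Y", "Z"]))  # None: X fetches only X
```

<div>The asymmetry is the point: a request for Y is broad under matching (it picks up X and Z) but narrow under lookup (it falls back only to the standard form Z, never to X).</div>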

<br><br><div class="gmail_quote">On Fri, Dec 4, 2009 at 08:41, Peter Constable <span dir="ltr">&lt;<a href="mailto:petercon@microsoft.com">petercon@microsoft.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

From: <a href="mailto:ietf-languages-bounces@alvestrand.no">ietf-languages-bounces@alvestrand.no</a> [mailto:<a href="mailto:ietf-languages-bounces@alvestrand.no">ietf-languages-bounces@alvestrand.no</a>] On Behalf Of Mark Davis<br>


<div class="im"><br>

&gt; A strict approach would be that if Latgalian is indeed a different<br>

&gt; language from (mutually incomprehensible with) Latvian, then it<br>

&gt; was incorrect to tag any Latgalian with &quot;lav&quot;, and we just encode<br>

&gt; a new language and move on. Same for Walliserdeutsch.<br>

<br>

</div>That sounds entirely reasonable. It also sounded reasonable that Unicode should not encode any precomposed characters but rather use a dynamic-composition model. In both cases, legacy practice realistically keeps us from doing everything that seems most reasonable. A major industry body has clearly been using &quot;lav&quot; for Latgalian (although this appears to have started only in the past six years). By contrast, I&#39;m not aware of any evidence of use, let alone reasonably widespread use, of either &quot;de&quot; or &quot;gsw&quot; for Walliserdeutsch. So if Walliserdeutsch is deemed a separate language, I wouldn&#39;t saddle de or gsw with the hassles of a macrolanguage.<br>


<font color="#888888"><br>

<br>

<br>

Peter<br>

<br>

</font></blockquote></div><br></div>