North American Network Operators Group


Re: Non-English Domain Names Likely Delayed

  • From: Neil Harris
  • Date: Mon Jul 18 11:41:50 2005


Michael, your idea of mapping confusable characters to a single "master" character was one of the options that was considered, but it was rejected.

To see why, consider the Turkish dotless-i in your second example. To most non-Turkish readers, dotless-i is a homograph of the more common dotted-i. If we map both to ASCII code 105, we eliminate the homograph for non-Turkish users, but we deny Turkish users the useful distinction between the two letters. Adding epicycles to this scheme, such as character-set tags or filter rules based on the client's locale setting, unfortunately makes things worse, not better.
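To make the problem concrete, here is a rough Python sketch of the "master character" idea. The table and the function name are made up for illustration, and the two labels are just sample strings that differ only in dotted versus dotless-i; this is not any registry's actual procedure.

```python
# Hypothetical "master character" table (an assumption for illustration):
# fold dotless-i (U+0131) onto ASCII "i" (code 105).
MASTER = {"\u0131": "i"}


def fold_to_master(label: str) -> str:
    """Replace each confusable character with its single master character."""
    return "".join(MASTER.get(ch, ch) for ch in label)


# Two sample labels that differ only in dotted vs. dotless-i.
dotted = "ilik"
dotless = "\u0131l\u0131k"

# The labels are different strings before folding...
labels_differ = dotted != dotless
# ...but folding makes them indistinguishable, so only one could ever exist.
collide_after_folding = fold_to_master(dotted) == fold_to_master(dotless)
```

Once the fold is applied globally, the second label can never be registered anywhere, which is exactly the loss for Turkish users described above.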

This example actually illustrates rather nicely why it is so important that different TLDs, particularly ccTLDs, should be able to have different rules. For example, it's possible (I don't know Turkish) that there is some pair of names in Turkish which are distinguished entirely by the difference between dotted and dotless-i.

Any procedure for preventing spoofing must bear in mind that registries process vast numbers of registrations daily, so human oversight of each one is not generally possible.

Bundling using confusables-tables, with appropriate considerations for cultural variations in what is confusable, is a much more effective approach, and allows subtle distinctions to be retained for those labels for which they are useful.

For instance, an attempt to register a name containing a dotless-i in .fr could easily be dealt with by bundling: even if, for French purposes, dotted and dotless-i were normalized to the same equivalence set of confusable characters, the registration could go ahead, provided that no potentially confusable French name had been registered first.
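A minimal Python sketch of how bundling with per-TLD confusables tables might work follows. The table contents, the `Registry` class and the `bundle_key` function are all assumptions made up for illustration; a real registry would define its own tables and policy.

```python
from typing import Dict

# Hypothetical per-TLD confusables tables. Each maps a character to the
# representative of its equivalence set; the contents are assumptions.
CONFUSABLES: Dict[str, Dict[str, str]] = {
    "fr": {"\u0131": "i"},  # assumed: .fr treats dotless-i as confusable with i
    "tr": {},               # assumed: .tr keeps the two letters distinct
}


def bundle_key(label: str, tld: str) -> str:
    """Reduce a label to the key shared by every label in its bundle."""
    table = CONFUSABLES.get(tld, {})
    return "".join(table.get(ch, ch) for ch in label)


class Registry:
    """Toy registry: the first label registered claims its whole bundle."""

    def __init__(self, tld: str) -> None:
        self.tld = tld
        self.bundles: Dict[str, str] = {}  # bundle key -> label as registered

    def register(self, label: str) -> bool:
        key = bundle_key(label, self.tld)
        if key in self.bundles:
            return False  # a confusable name was registered first
        self.bundles[key] = label  # the label itself is stored unchanged
        return True


fr = Registry("fr")
first_fr = fr.register("l\u0131vre")  # dotless-i name, nothing confusable yet
second_fr = fr.register("livre")      # falls in the same bundle under .fr

tr = Registry("tr")
first_tr = tr.register("ilik")
second_tr = tr.register("\u0131l\u0131k")  # a distinct name under .tr
```

Note that the check is fully automatic, and that the registered label keeps its dotless-i; the table is only used to decide which later registrations collide with it.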

-- Neil