Convert domain names to punycode when needed
-
Cloudron SHOULD NOT have to automatically detect such domain names, because the user themself might enter the non-ASCII domain incorrectly. I myself have more than one language/keyboard set up on my Mac, and occasionally (rarely nowadays) I forget to switch back to my main US keyboard. Here is an example:
shaneсooke.com = shaneсooke.com
shаnecooke.com = shаnecooke.com
shanecookе.com = shanecookе.com
So what if I mistakenly type in my non-ASCII domain and end up with the wrong punycode because Cloudron did exactly what it is supposed to do? Am I then going to open up tickets claiming Cloudron got my domain wrong...?
Look, I love how much easier Cloudron makes our online digital lives. But it can't, and shouldn't, do everything! In particular, it should not be made responsible for things the user should be managing properly, like entering the punycode version of their non-ASCII domain name, which they should already know.
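To make the lookalike problem concrete, here is a minimal Python sketch. It is not Cloudron code; it relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules, and the helper name `to_punycode` is made up for illustration. The Cyrillic letters are written as escapes so they cannot be confused with their Latin twins.

```python
# Each name differs from "shanecooke.com" by a single Cyrillic lookalike.
LOOKALIKES = {
    "Cyrillic es instead of c": "shane\u0441ooke.com",
    "Cyrillic a instead of a": "sh\u0430necooke.com",
    "Cyrillic ie instead of final e": "shanecook\u0435.com",
}


def to_punycode(domain: str) -> str:
    """Return the ASCII (punycode) form of a domain name (illustrative helper)."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    print("plain ASCII:", to_punycode("shanecooke.com"))
    for label, name in LOOKALIKES.items():
        print(f"{label}: {to_punycode(name)}")
```

All three spellings look like the same domain on screen, yet each one encodes to a different `xn--` name, so an automatic conversion would faithfully register whatever was typed.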
-
I went looking for how the DNS providers treat this in the API and also UI. Just a random sampling:
- DigitalOcean requires Punycode input from user - https://docs.digitalocean.com/support/how-do-i-add-a-domain-that-contains-special-characters/
- Vultr will do automatic Punycode conversion - https://www.vultr.com/docs/introduction-to-vultr-dns/?gspk=dHJveWNsYXJrNzM4Ng&gsxid=wQpnVuVRFBud#FAQ
- Cloudflare supports it directly in the UI - https://blog.cloudflare.com/non-latinutf8-domains-now-fully-supported/
- Route53 allows you to register with unicode but you must convert for subdomain records - https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DomainNameFormat.html#domain-name-format-idns
-
@scooke I didn't follow the initial argument completely. Is the issue that the user might confuse himself and enter the wrong domain when adding it to Cloudron?
At least in German, Ü, Ä, and Ö are just normal letters of the alphabet and part of many words. So why restrict to just ASCII?
-
My point is that it is simple enough for a user to convert the domain name to punycode themselves. In fact, it is most likely already displayed as punycode in the dashboard of their registrar, which is the most useful form: just copy and paste that. If the user were to type their domain name in natively, they would need to be sure their keyboard is set to enter the correct characters. I suppose if they only ever work in two languages (their own, with the special characters, and English), it should be fine. But others, like myself, have multiple keyboards for the different languages they work in, and sometimes I will either not switch back, or accidentally hit a key that switches the keyboard, and I will get a few words in before realizing ('cause I'm looking at the keys, not the screen). The examples above didn't actually work out (I should have used "code"), but the English C looks the same as the Cyrillic letter for the s sound, the A's look the same, and so do the E's (I want to add that this isn't for Russian but for another language that uses a modified Cyrillic alphabet). Even now I still sometimes type Chahe rather than Shane.
Who would make this error? Well, I suppose the subpoint is that a user who wants to use special characters but doesn't know about IDNs and punycode is probably going to make errors like this. They will also discover, depending on the language, that not all punycode domains actually render properly in the address bar of a web browser! Will they then ask Cloudron to solve that problem too?
EDIT: I found an example: https://stackoverflow.com/questions/66335852/what-is-the-difference-between-ö-and-ö . The Ö's look the same, but are not the same. Another RtL example involves some vowels like ۇ, which can be written either with U+06C7, or U+0648 + U+064F. Technically only the first is correct, but enough users out there use the second to make this a potential IDN problem. Those two punycode domains would be totally different.
So, kudos to Cloudron once more for being open to Requests, but this one really ought to be the user's responsibility.
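The Ö example from the EDIT can be checked with a short Python sketch. The helper names `raw_punycode` and `same_after_nfc` are invented for illustration, and the sketch assumes nothing about what any particular registrar or DNS provider does. It compares the raw punycode of the two spellings, then asks whether Unicode NFC normalization would make them identical; whether a given tool normalizes before encoding is exactly the kind of detail that varies.

```python
import unicodedata

COMPOSED = "\u00d6"     # one code point: LATIN CAPITAL LETTER O WITH DIAERESIS
DECOMPOSED = "O\u0308"  # two code points: "O" + COMBINING DIAERESIS


def raw_punycode(label: str) -> str:
    """Punycode-encode a single label exactly as typed, with no normalization."""
    return label.encode("punycode").decode("ascii")


def same_after_nfc(a: str, b: str) -> bool:
    """True if two labels become identical once normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Raw encodings of the two visually identical spellings.
    print(raw_punycode(COMPOSED), raw_punycode(DECOMPOSED))
    print(same_after_nfc(COMPOSED, DECOMPOSED))
    # The Arabic-script pair from the post, checked the same way.
    print(same_after_nfc("\u06c7", "\u0648\u064f"))
```

Without normalization the two Ö spellings encode differently, so the result depends on which form the keyboard happened to produce.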
-
So as far as I understand the issue, this is only a UI/dashboard feature which checks if a domain has non-ASCII characters and then converts it to punycode for the API.
It would also work in reverse: where the UI displays punycode domains, it can convert them to Unicode domains.
@scooke would this work for you? Otherwise I may not fully understand your concern here. The only other thing I can think of is that you are concerned that domains converted to Unicode may look visually the same without actually being the same. If so, I guess we could show the punycode in brackets or in a tooltip?
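As a rough sketch of the idea in this post (convert on the way in, show both forms on the way out), something like the following Python would do. It is not Cloudron's implementation: both function names are hypothetical, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so a real dashboard might well use a different library.

```python
def to_api_domain(user_input: str) -> str:
    """Hypothetical helper: return the ASCII form to send to a DNS API."""
    name = user_input.strip().lower()
    if name.isascii():
        return name  # already ASCII (possibly already punycode); pass through
    return name.encode("idna").decode("ascii")


def to_display_domain(ascii_name: str) -> str:
    """Hypothetical helper: show Unicode with the punycode in brackets."""
    unicode_name = ascii_name.encode("ascii").decode("idna")
    if unicode_name == ascii_name:
        return ascii_name  # plain ASCII domain, nothing extra to show
    return f"{unicode_name} ({ascii_name})"


if __name__ == "__main__":
    api_name = to_api_domain("donnerd\u00f6ner.de")
    print(api_name)
    print(to_display_domain(api_name))
```

Showing the punycode next to the Unicode form keeps the exact registered name visible even when two Unicode spellings look alike.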
-
Hi @nebulon and @girish, hey, you are free to fulfill whichever Feature Request you want! I'm just trying to make the point that, what with ALL that Cloudron already does, adding this particular feature need not be done since the user has to enter it properly anyway. And if a user manages to figure out what their punycode domain is, enters it, and then discovers it is too niche of a language to actually resolve in the browser, then Cloudron will take the blame... again. My own example - Aboriginal Syllabic domains do not resolve, even with punycode. But I'm not going to come blaming you guys, or asking you guys to make it work for me, as though it is some shortcoming of Cloudron. Many of my responses on this forum lean that way - promote Cloudron, defend the devs, help where help will be used.
-
The topic is a bit bigger and is called UA (Universal Acceptance). There is a working group at ICANN on this topic: https://www.icann.org/ua / https://www.icann.org/uasg-en
@nebulon remember my flash talk at Cloudfest about "the internet is broken" a few years ago?
Most of the time it's not a problem on the browser HTTP/HTTPS side. Try sending mail from the donnerdöner.de domain.
@scooke ha ha, no worries. We are just trying to understand your concerns and whether there is something we need to watch out for if/when we implement this. Some of the other providers seem to have a Unicode field for domains already, and our UI seems to be lagging behind here. I do understand some of your concerns though, like the homograph attack.
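For the homograph concern, one possible guard is to warn when a single label mixes scripts. The Python below is only a heuristic sketch with made-up function names: it guesses the script from the first word of each character's Unicode name, which is an approximation rather than the official Unicode script property, and it cannot catch a label written entirely in one lookalike script.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Rough script guess per letter: first word of the Unicode character name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()  # ignore digits and hyphens
    }


def looks_mixed(domain: str) -> bool:
    """True if any dot-separated label mixes letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


if __name__ == "__main__":
    print(looks_mixed("shane\u0441ooke.com"))  # Latin with one Cyrillic es
    print(looks_mixed("donnerd\u00f6ner.de"))  # Latin only, including the umlaut
```

A UI could use a check like this to show a warning and the punycode form before the domain is saved, leaving the final decision to the user.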