The internationalized domain name (IDN) homograph attack (sometimes written as homoglyph attack) is a method used by malicious parties to deceive computer users about what remote system they are communicating with, by exploiting the fact that many different characters look alike (i.e., they rely on homoglyphs to deceive visitors).
For example, the Cyrillic, Greek and Latin alphabets each have a letter ⟨o⟩ that has the same shape but represents different sounds or phonemes in their respective writing systems.
Unicode incorporates numerous scripts (writing systems), and, for a number of reasons, similar-looking characters such as Greek Ο, Latin O, and Cyrillic О were not assigned the same code point.
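The distinction is easy to see at the code-point level. The following Python sketch uses only the standard `unicodedata` module to print the code point and Unicode name of each of the three capital letters mentioned above. The characters are written as escape sequences so that they stay unambiguous in source code.

```python
import unicodedata

# Greek capital omicron, Latin capital O and Cyrillic capital O:
# visually identical in most fonts, but three distinct code points.
LOOKALIKES = ["\u039f", "\u004f", "\u041e"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch))
    # The three strings compare unequal even though they render the same.
    print(len(set(LOOKALIKES)))  # 3
```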
Indeed, it would be a rare accident for a web user to type, for example, a Cyrillic letter within an otherwise English word, turning "bank" into "bаnk".
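Because this kind of mixing is almost never accidental, a simple defensive heuristic is to flag labels whose letters come from more than one script. Python's standard library does not expose the Unicode Script property, so the sketch below approximates it with the first word of each character's Unicode name (for example `LATIN`, `CYRILLIC`, `GREEK`). This is an assumption that works for the alphabets discussed here, but it does not replace the full Script property that production implementations use.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label.

    Uses the first word of each letter's Unicode name as a stand-in for
    the Unicode Script property (an approximation, not the real property).
    Digits, hyphens and other non-letters are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1

if __name__ == "__main__":
    latin = "bank"
    spoof = "b\u0430nk"  # second letter is CYRILLIC SMALL LETTER A (U+0430)
    print(latin == spoof)          # False: different code points
    print(is_mixed_script(latin))  # False
    print(is_mixed_script(spoof))  # True
    print(spoof.encode("idna"))    # ASCII-compatible (xn--) form
```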
There are cases in which a registration can be both typosquatting and homograph spoofing: the pairs l/I, i/j, and 0/O are each close together on the keyboard and, depending on the typeface, may be difficult or impossible to tell apart visually.
Such ambiguity was common in medieval blackletter, which did not connect the vertical strokes (minims) of the letters i, m, n, and u, making them difficult to distinguish when several appeared in a row.
In certain narrow-spaced fonts such as Tahoma (the default in the address bar in Windows XP), placing a c in front of an l, j, or i produces homoglyphs: cl, cj, and ci can resemble d, g, and a respectively.
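A defender comparing a candidate name with a protected one can account for such multi-character lookalikes by collapsing them before the comparison. The sketch below uses a hypothetical table containing only the three Tahoma sequences described above (cl, cj, ci). A real tool would need a much larger, font-aware table, and naive collapsing will also produce false positives on ordinary words that happen to contain these letter pairs.

```python
# Hypothetical table of the multi-character sequences described above,
# mapped to the single letter they can resemble in narrow fonts.
SEQUENCE_LOOKALIKES = {"cl": "d", "cj": "g", "ci": "a"}

def collapse_sequences(label: str) -> str:
    """Lowercase the label and replace each lookalike sequence with the
    letter it imitates. The replacements never create a new 'c' pair."""
    result = label.lower()
    for seq, letter in SEQUENCE_LOOKALIKES.items():
        result = result.replace(seq, letter)
    return result

def looks_like(candidate: str, protected: str) -> bool:
    """True if the names differ but collapse to the same string."""
    candidate, protected = candidate.lower(), protected.lower()
    return (candidate != protected
            and collapse_sequences(candidate) == collapse_sequences(protected))

if __name__ == "__main__":
    print(looks_like("clomain", "domain"))  # True
    print(looks_like("domain", "domain"))   # False: identical, not a lookalike
```

Comparing the collapsed forms of both names, rather than collapsing only the candidate, also catches the reverse case in which the protected name itself contains one of the sequences.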
To prove the feasibility of this kind of attack, the researchers successfully registered a variant of the domain name microsoft.com which incorporated Cyrillic characters.
[citation needed] On February 6, 2005, Cory Doctorow reported that this exploit had been disclosed by 3ric Johanson at the hacker conference ShmooCon.
The problematic non-Russian Cyrillic letters are і and i, ј and j, ԛ and q, ѕ and s, ԝ and w, and Ү and Y, while Ғ and F, and Ԍ and G, bear some resemblance to each other.
While Komi De (ԁ), shha (һ), palochka (Ӏ) and izhitsa (ѵ) bear strong resemblance to Latin d, h, l and v, these letters are either rare or archaic and are not widely supported in most standard fonts (they are not included in the WGL-4).
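Detection tools commonly reduce each string to a "skeleton" by replacing every character with a canonical lookalike and then compare skeletons. Unicode Technical Standard #39 (Unicode Security Mechanisms) defines such a transform over a full confusables table. The sketch below uses only a small hypothetical table built from the non-Russian Cyrillic pairs listed above, plus the common Russian lookalikes а, е, о, р, с, у, and х, and it omits the normalization steps of the standardized algorithm.

```python
# Hypothetical, deliberately small confusables table: Cyrillic letters
# mapped to the Latin letters they resemble. The non-Russian pairs come
# from the text above; the Russian ones are common additional examples.
CYRILLIC_TO_LATIN = {
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
    "\u051b": "q",  # CYRILLIC SMALL LETTER QA
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u051d": "w",  # CYRILLIC SMALL LETTER WE
    "\u04ae": "Y",  # CYRILLIC CAPITAL LETTER STRAIGHT U
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}

def skeleton(text: str) -> str:
    """Map each known lookalike to its Latin counterpart; keep the rest."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)

def confusable(a: str, b: str) -> bool:
    """True if two different strings share the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)
```

For example, a label spelled with Cyrillic ѕ and і followed by Latin "te" has the skeleton `site`, so `confusable` reports it as a lookalike of the Latin word.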
ค (A), ท (n), น (u), บ (U), ป (J), พ (W), ร (S), and ล (a) are among the Thai glyphs that can closely resemble Latin letters.
Other parts of Unicode in which homographs can be found include the blocks Number Forms (Roman numerals), CJK Compatibility and Enclosed CJK Letters and Months (certain abbreviations), Latin (certain digraphs), Currency Symbols, Mathematical Alphanumeric Symbols, and Alphabetic Presentation Forms (typographic ligatures).
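Many characters in these blocks are compatibility characters whose compatibility decomposition is an ordinary letter sequence, so NFKC normalization folds them into plain text. IDNA 2003's Nameprep step applies NFKC, which is one reason such characters end up mapped to ordinary letters during domain processing. The sketch below, using only the standard `unicodedata` module, shows the folding for one sample character from several of the blocks named above; the choice of samples is illustrative.

```python
import unicodedata

# One sample compatibility character from several of the blocks above.
SAMPLES = {
    "\u216b": "ROMAN NUMERAL TWELVE (Number Forms)",
    "\u338f": "SQUARE KG (CJK Compatibility)",
    "\U0001d41a": "MATHEMATICAL BOLD SMALL A (Mathematical Alphanumeric Symbols)",
    "\ufb01": "LATIN SMALL LIGATURE FI (Alphabetic Presentation Forms)",
}

def fold(text: str) -> str:
    """Apply NFKC normalization, which replaces compatibility characters
    with their ordinary equivalents."""
    return unicodedata.normalize("NFKC", text)

if __name__ == "__main__":
    for ch, description in SAMPLES.items():
        print(f"{description}: {ch!r} -> {fold(ch)!r}")
```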
The sole purpose of the site was to spread an April Fools' Day joke regarding the Governor of Idaho issuing a supposed ban on the sale of music by Justin Bieber.
As an additional defense, Internet Explorer 7, Firefox 2.0 and above, and Opera 9.10 include phishing filters that attempt to alert users when they visit malicious websites.
[17][18][19] As of April 2017, several browsers (including Chrome, Firefox, and Opera) displayed IDNs consisting purely of Cyrillic characters in their Unicode form rather than as Punycode, allowing spoofing attacks.
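A rule that flags mixed-script labels does not catch this case, because every character in such a label belongs to a single script. The sketch below is a simplified, hypothetical display policy, not any browser's actual algorithm: it shows a label as Punycode when the label is entirely Cyrillic and every character in it belongs to a small assumed set of Latin lookalikes. Python's built-in `punycode` codec returns the encoded label without the `xn--` prefix, so the prefix is added by hand.

```python
import unicodedata

# Hypothetical set of lowercase Cyrillic letters that closely resemble
# Latin letters (a real list would be longer): a, ie, o, er, es, u, ha,
# byelorussian-ukrainian i, je, dze, palochka, komi de, shha.
LATIN_LOOKALIKES = {
    "\u0430", "\u0435", "\u043e", "\u0440", "\u0441", "\u0443", "\u0445",
    "\u0456", "\u0458", "\u0455", "\u04cf", "\u0501", "\u04bb",
}

def is_whole_script_cyrillic(label: str) -> bool:
    """True if the label is non-empty and every character is Cyrillic."""
    return bool(label) and all(
        unicodedata.name(ch, "").startswith("CYRILLIC") for ch in label
    )

def display_form(label: str) -> str:
    """Return the label for display, falling back to its ACE (xn--) form
    when it is whole-script Cyrillic built only from Latin lookalikes."""
    if is_whole_script_cyrillic(label) and set(label) <= LATIN_LOOKALIKES:
        return "xn--" + label.encode("punycode").decode("ascii")
    return label

if __name__ == "__main__":
    spoof = "\u0430\u0440\u0440\u04cf\u0435"  # Cyrillic letters that render like "apple"
    print(display_form(spoof))  # shown in xn-- form
    print(display_form("\u043f\u0440\u0438\u043c\u0435\u0440"))  # ordinary Russian word, kept as Unicode
```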
[20][21] Browser extensions such as No Homo-Graphs, available for Google Chrome and Firefox, check whether the user is visiting a website whose domain is a homograph of another domain from a user-defined list.
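The core of such a watchlist check can be sketched in a few lines. The example below is hypothetical and is not taken from No Homo-Graphs or any other extension. It decodes the ASCII-compatible (`xn--`) hostname reported by the browser back to Unicode with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. It then maps a handful of assumed Cyrillic lookalikes to Latin letters and reports which user-listed domain, if any, the hostname imitates. The watchlist entry `copy.example` used in the checks is likewise illustrative.

```python
from typing import List, Optional

# Hypothetical lookalike map (Cyrillic -> Latin); a real tool would use a
# far larger table such as the Unicode confusables data.
LOOKALIKES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x",
    "\u0456": "i", "\u0458": "j", "\u0455": "s",
}

def to_unicode(hostname: str) -> str:
    """Decode an ASCII hostname (possibly containing xn-- labels) to Unicode."""
    return hostname.encode("ascii").decode("idna")

def skeleton(hostname: str) -> str:
    """Lowercase the hostname and replace known lookalikes with Latin letters."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in hostname.lower())

def imitated_domain(hostname: str, watchlist: List[str]) -> Optional[str]:
    """Return the watchlisted domain that hostname visually imitates, or None.

    A hostname that is itself on the watchlist is treated as legitimate.
    """
    unicode_host = to_unicode(hostname).lower()
    if unicode_host in watchlist:
        return None
    for protected in watchlist:
        if skeleton(unicode_host) == skeleton(protected):
            return protected
    return None
```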
Homographic URLs that host malicious software can still be distributed through e-mail, social networking or other websites without being displayed as Punycode, and may go undetected until the user actually clicks the link.
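One mitigation a mail or chat filter could apply is to rewrite links so that non-ASCII hostnames appear in their ASCII-compatible (Punycode) form before the user clicks. The sketch below is hypothetical. It uses a deliberately simple regular expression to find `http` and `https` URLs, drops any userinfo from rewritten links, and relies on Python's built-in `idna` codec (IDNA 2003), which may reject some newer characters that current registries allow.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Deliberately simple URL pattern; real filters need a more careful parser.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def ace_url(url: str) -> str:
    """Return url with a non-ASCII hostname replaced by its xn-- form."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None or host.isascii():
        return url  # nothing to reveal; leave the URL untouched
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))

def reveal_links(message: str) -> str:
    """Rewrite every URL in a message so spoofed hostnames become visible."""
    return URL_PATTERN.sub(lambda m: ace_url(m.group(0)), message)
```

Leaving ASCII URLs untouched avoids side effects such as lowercasing hostnames that were never suspicious.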
[citation needed] The IDN homographs database is a Python library that allows developers to defend against such attacks using machine-learning-based character recognition.
Proposed IDN TLDs .бг (Bulgaria), .укр (Ukraine) and .ελ (Greece) have been rejected or stalled because of their perceived resemblance to Latin letters.
The Russian registry operator Coordination Center for TLD RU only accepts Cyrillic names for the top-level domain .рф, forbidding a mix with Latin or Greek characters.
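A registry-side rule of this kind amounts to validating every label against a single-script character repertoire. For illustration only, the sketch below assumes that a valid label consists of Cyrillic letters, ASCII digits, and interior hyphens, and contains at least one letter. The actual .рф registration rules define their own permitted character table and additional constraints that this sketch does not reproduce. Script membership is approximated from Unicode character names, as Python's standard library does not expose the Script property.

```python
import unicodedata

def is_cyrillic_letter(ch: str) -> bool:
    """Approximate check: a letter whose Unicode name starts with CYRILLIC."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("CYRILLIC")

def is_acceptable_label(label: str) -> bool:
    """Hypothetical single-script policy: Cyrillic letters, ASCII digits and
    hyphens only, at least one letter, no leading or trailing hyphen."""
    if not label or label.startswith("-") or label.endswith("-"):
        return False
    allowed = all(is_cyrillic_letter(ch) or ch in "0123456789-" for ch in label)
    has_letter = any(is_cyrillic_letter(ch) for ch in label)
    return allowed and has_letter
```

Under this policy a label that mixes a Cyrillic а with Latin letters is rejected outright, which is the effect the registry rule aims for.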
[25] In their 2019 study, Suzuki et al. introduced ShamFinder,[26] a framework for automatically detecting IDN homographs, and used it to shed light on how prevalent such homographs are in real-world use.