For the last couple of months Ramin Yazdani has been looking into phishing
domains that use Unicode characters to resemble a target domain. In the
process he developed a new ‘confusables’ table of Unicode characters that can
easily be mistaken for their ASCII counterparts. The table is based on the
‘Unicode Confusables list’ and the ‘Unicode Similarity List’.
The proposed Unicode Confusables table can be found here.
The dataset is supplied as a CSV file in which the first column holds the
decimal code point of a Unicode character. The remaining columns together
represent the homoglyph for that character: if the character maps to a string,
the homoglyph is spread over multiple parts; otherwise there is only one part.
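
To give an idea of how such a file could be consumed, here is a minimal Python sketch. It assumes that each row looks like `decimal_codepoint,part1[,part2,...]` and that the homoglyph parts are stored as literal text; the published file may encode them differently, so check it before relying on this. The function names `load_confusables` and `to_ascii_lookalike` and the three sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def load_confusables(csv_file):
    """Build a {unicode_char: homoglyph_string} dict from an open CSV file.

    Assumed row layout (not confirmed by the dataset itself):
    decimal_codepoint,part1[,part2,...]
    """
    table = {}
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        try:
            confusable = chr(int(row[0]))
        except ValueError:
            continue  # skip a header row or any other non-numeric first field
        # Join all non-empty homoglyph parts into one replacement string.
        homoglyph = "".join(part for part in row[1:] if part)
        if homoglyph:
            table[confusable] = homoglyph
    return table


def to_ascii_lookalike(text, table):
    """Replace every character found in the table by its homoglyph."""
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical sample rows: Cyrillic 'a' (1072), Cyrillic 'e' (1077)
# and the ligature 'oe' (339), which maps to two parts.
sample = io.StringIO("1072,a\n1077,e\n339,o,e\n")
table = load_confusables(sample)

# The first and third letters below are Cyrillic, not ASCII.
print(to_ascii_lookalike("\u0435x\u0430mple", table))  # prints "example"
```

For a real file, pass something like `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. A string whose output differs from its input contains at least one character from the table, which is one simple way to flag a candidate lookalike.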
Additionally, Ramin used the confusables table to find domains that have an ASCII
counterpart. The research is aimed at finding malicious Unicode homoglyph
domains. To this end Ramin compared his findings with entries from the
following blacklists: