Monday, June 21, 2010

Removing Diacritics


I had to remove diacritics from words submitted through a form. With Kaito's help, I found this code:

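using System.Globalization;   // CharUnicodeInfo, UnicodeCategory, NormalizationForm
using System.Text;            // StringBuilder

private static string RemoveDiacritics(string s) {
   // Decompose each accented character into its base letter plus combining marks (e.g. "é" -> "e" + U+0301).
   string normalizedString = s.Normalize(NormalizationForm.FormD);
   StringBuilder stringBuilder = new StringBuilder();
   foreach (char c in normalizedString) {
      // Keep everything except the combining (non-spacing) marks, i.e. the accents themselves.
      if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
         stringBuilder.Append(c);
   }
   // Recompose what is left so characters that were decomposed but kept return to composed form.
   return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}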

The function produces the following results:

RemoveDiacritics("é") -> "e"
RemoveDiacritics("ü") -> "u"
RemoveDiacritics("á") -> "a"

Posted via email from Rocha's place
