There are various methods to remove Unicode characters outside the ASCII range (U+0000 to U+007F) from a string in .NET.
Below I will show you some of these methods and their benchmark results.
Before choosing a method, take a look at the benchmark results and the framework compatibility.
Benchmark Summary
A for loop removed the Unicode characters from the string value "ᾭHeὣlݬl♫oѪ₪ Wor♀ld" 100 000 times. This was repeated 40 times for each method. All methods returned the correct result, "Hello World".
Method                  Average runtime (ms)
Regex                   2 433 204
Regex (compiled)        1 646 337
String Normalization    1 016 305
Encodings               2 183 387
LINQ                      492 708
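The original benchmark code is not shown, so the following is only a minimal sketch of a harness matching the setup described above (100 000 calls per run, 40 runs, averaged), assuming System.Diagnostics.Stopwatch is used for timing. The class BenchmarkSketch, its method AverageMilliseconds and the choice of the LINQ method as the method under test are all hypothetical. Absolute timings will depend on hardware and runtime version.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical harness: names and structure are illustrative, not the original benchmark code.
public static class BenchmarkSketch
{
    // Sample input from the benchmark description, written with \u escapes.
    public const string Sample = "\u1FADHe\u1F63l\u076Cl\u266Bo\u046A\u20AA Wor\u2640ld";

    // LINQ method from this article, used here as the method under test.
    public static string StripUnicodeCharactersFromString(string inputValue)
    {
        return new string(inputValue.Where(c => c <= sbyte.MaxValue).ToArray());
    }

    // Times `iterations` calls of `strip`, repeats that `runs` times and returns the mean run time in ms.
    public static double AverageMilliseconds(Func<string, string> strip, string input, int iterations, int runs)
    {
        strip(input); // warm-up: JIT-compile the method before timing
        double totalMs = 0;
        for (int run = 0; run < runs; run++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                strip(input);
            }
            stopwatch.Stop();
            totalMs += stopwatch.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }
}
```

Usage, for example: BenchmarkSketch.AverageMilliseconds(BenchmarkSketch.StripUnicodeCharactersFromString, BenchmarkSketch.Sample, 100000, 40). Any of the other methods can be passed in the same way, since each takes a string and returns a string.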
Methods
Remove Unicode Characters using Regex
C# Version
using System.Text.RegularExpressions;

// Removes every character outside the ASCII range U+0000..U+007F
public static string StripUnicodeCharactersFromString(string inputValue)
{
    return Regex.Replace(inputValue, @"[^\u0000-\u007F]", string.Empty);
}
VB.NET Version
Imports System.Text.RegularExpressions

' The regex engine (not VB) interprets the \u escapes in this pattern
Private Function ConvertToASCIIUsingRegex(inputValue As String) As String
    Return Regex.Replace(inputValue, "[^\u0000-\u007F]", String.Empty)
End Function
Compatibility: .NET 2.0 .NET 3.0 .NET 3.5 .NET 4.0 .NET 4.5
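.NET strings are sequences of UTF-16 code units, so the character class [^\u0000-\u007F] is tested against each char, not each code point. A character outside the Basic Multilingual Plane, such as the emoji U+1F600, is stored as a surrogate pair, and both halves lie outside the ASCII range, so the pattern removes the whole character. The small sketch below illustrates this; the class RegexSurrogateDemo and its members are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper used only to illustrate how the pattern treats surrogate pairs.
public static class RegexSurrogateDemo
{
    // "A" + U+1F600 (stored as the surrogate pair D83D DE00) + "B": four UTF-16 code units.
    public const string EmojiSample = "A\uD83D\uDE00B";

    public static string StripNonAscii(string inputValue)
    {
        // Same pattern as above: any code unit outside U+0000..U+007F is replaced with "".
        return Regex.Replace(inputValue, @"[^\u0000-\u007F]", string.Empty);
    }
}
```

Here RegexSurrogateDemo.StripNonAscii(RegexSurrogateDemo.EmojiSample) returns "AB": each surrogate is matched and removed on its own, which leaves no broken half behind.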
Remove Unicode Characters using Regex (Compiled)
C# Version
using System.Text.RegularExpressions;

// Built once and reused; RegexOptions.Compiled trades slower construction for faster matching
private static readonly Regex _compiledUnicodeRegex = new Regex(@"[^\u0000-\u007F]", RegexOptions.Compiled);

public static string StripUnicodeCharactersFromString(string inputValue)
{
    return _compiledUnicodeRegex.Replace(inputValue, string.Empty);
}
VB.NET Version
Imports System.Text.RegularExpressions

' Reused across calls; RegexOptions.Compiled trades slower construction for faster matching
Private ReadOnly _compiledUnicodeRegex As New Regex("[^\u0000-\u007F]", RegexOptions.Compiled)

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    Return _compiledUnicodeRegex.Replace(inputValue, String.Empty)
End Function
Compatibility: .NET 2.0 .NET 3.0 .NET 3.5 .NET 4.0 .NET 4.5
Remove Unicode Characters using String Normalization
C# Version
using System.Linq;
using System.Text;

public static string StripUnicodeCharactersFromString(string inputValue)
{
    StringBuilder newStringBuilder = new StringBuilder();
    // FormKD decomposes characters (e.g. é -> e + combining accent); then only chars below 128 are kept
    newStringBuilder.Append(inputValue.Normalize(NormalizationForm.FormKD).Where(x => x < 128).ToArray());
    return newStringBuilder.ToString();
}
VB.NET Version
Imports System.Linq
Imports System.Text

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    Dim newStringBuilder As New StringBuilder()
    ' FormKD decomposes characters (e.g. é -> e + combining accent); then only chars below 128 are kept
    newStringBuilder.Append(inputValue.Normalize(NormalizationForm.FormKD).Where(Function(x) Convert.ToInt32(x) < 128).ToArray())
    Return newStringBuilder.ToString()
End Function
Compatibility: .NET 3.5 .NET 4.0 .NET 4.5 (String.Normalize exists since .NET 2.0, but the LINQ Where method requires .NET 3.5)
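Unlike the other methods, the normalization variant does more than delete characters. NormalizationForm.FormKD first decomposes each character, so an accented letter such as é (U+00E9) becomes e followed by the combining acute accent U+0301, and only the accent is filtered out. The sketch below compares it with plain filtering; the class NormalizationComparison and its method names are hypothetical.

```csharp
using System.Linq;
using System.Text;

// Hypothetical comparison class; both methods mirror snippets from this article.
public static class NormalizationComparison
{
    // Decompose first (FormKD), then keep only code units below 128.
    public static string StripWithNormalization(string inputValue)
    {
        return new string(inputValue.Normalize(NormalizationForm.FormKD).Where(c => c < 128).ToArray());
    }

    // Keep only code units below 128 without decomposing.
    public static string StripWithoutNormalization(string inputValue)
    {
        return new string(inputValue.Where(c => c < 128).ToArray());
    }
}
```

For the input "Caf\u00E9", StripWithNormalization returns "Cafe" while StripWithoutNormalization returns "Caf". The benchmark string contains no characters that decompose to ASCII letters, which is why all methods agreed on it; with accented input the choice of method changes the result.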
Remove Unicode Characters using Encodings
C# Version
using System.Text;

public static string StripUnicodeCharactersFromString(string inputValue)
{
    // ASCII encoding that replaces every non-ASCII character with an empty string (WebName is "us-ascii")
    Encoding asciiEncoding = Encoding.GetEncoding(Encoding.ASCII.WebName, new EncoderReplacementFallback(string.Empty), new DecoderExceptionFallback());
    return Encoding.ASCII.GetString(Encoding.Convert(Encoding.UTF8, asciiEncoding, Encoding.UTF8.GetBytes(inputValue)));
}
VB.NET Version
Imports System.Text
Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    ' ASCII encoding that replaces every non-ASCII character with an empty string (WebName is "us-ascii")
    Dim asciiEncoding As Encoding = Encoding.GetEncoding(Encoding.ASCII.WebName, New EncoderReplacementFallback(String.Empty), New DecoderExceptionFallback())
    Return Encoding.ASCII.GetString(Encoding.Convert(Encoding.UTF8, asciiEncoding, Encoding.UTF8.GetBytes(inputValue)))
End Function
Compatibility: .NET 2.0 .NET 3.0 .NET 3.5 .NET 4.0 .NET 4.5
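The removal in this method is done by the EncoderReplacementFallback, which replaces every character the ASCII encoder cannot represent with an empty string (the default Encoding.ASCII would write "?" instead). Under that reading, the UTF-8 round trip through Encoding.Convert is not strictly needed. A shorter variant, sketched below and not part of the benchmark above, encodes the string directly with such an encoding; the class EncodingVariant and its field name are hypothetical.

```csharp
using System.Text;

// Hypothetical variant of the Encodings method; not benchmarked in this article.
public static class EncodingVariant
{
    // ASCII encoding whose encoder drops unencodable characters instead of writing "?".
    private static readonly Encoding AsciiDropUnknown = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(string.Empty),
        new DecoderExceptionFallback());

    public static string StripUnicodeCharactersFromString(string inputValue)
    {
        // Encoding to bytes drops the non-ASCII characters; decoding turns the bytes back into a string.
        return AsciiDropUnknown.GetString(AsciiDropUnknown.GetBytes(inputValue));
    }
}
```

Creating the encoding once in a static field also avoids the Encoding.GetEncoding lookup on every call.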
Remove Unicode Characters using LINQ
C# Version
using System.Linq;

public static string StripUnicodeCharactersFromString(string inputValue)
{
    // sbyte.MaxValue is 127, the last ASCII code point
    return new string(inputValue.Where(c => c <= sbyte.MaxValue).ToArray());
}
VB.NET Version
Imports System.Linq

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    ' SByte.MaxValue is 127, the last ASCII code point
    Return New String(inputValue.Where(Function(c) Convert.ToInt32(c) <= SByte.MaxValue).ToArray())
End Function
Compatibility: .NET 3.5 .NET 4.0 .NET 4.5 (LINQ requires .NET 3.5)
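Because Where and ToArray come from LINQ, this method needs .NET 3.5 or later. If .NET 2.0 or 3.0 must be supported, the same filter can be written as a plain loop. The sketch below is an unbenchmarked alternative, and its class name LoopVariant is hypothetical.

```csharp
using System.Text;

// Hypothetical LINQ-free equivalent of the LINQ method; uses only .NET 2.0 APIs.
public static class LoopVariant
{
    public static string StripUnicodeCharactersFromString(string inputValue)
    {
        // The result can never be longer than the input, so reserve that capacity up front.
        StringBuilder result = new StringBuilder(inputValue.Length);
        foreach (char c in inputValue)
        {
            // Keep only code units in the ASCII range 0..127 (the same test as c <= sbyte.MaxValue).
            if (c <= 127)
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```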
Do you have an alternative or faster method for removing Unicode characters?