There are several ways to remove Unicode characters (that is, every character outside the ASCII range) from a string in .NET.
Below I show some of these methods along with their benchmark results.
Before choosing a method, take a look at the benchmark results and the framework compatibility.
Benchmark Summary
For each method, a for loop removed the Unicode characters from the string value
ᾭHeὣlݬl♫oѪ₪ Wor♀ld 100 000 times, and this run was repeated 40 times. All methods returned the correct result, Hello World.
Method | Average runtime (ms) |
Regex | 2 433 204 |
Regex (compiled) | 1 646 337 |
String Normalization | 1 016 305 |
Encodings | 2 183 387 |
LINQ | 492 708 |
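The measurement described above can be sketched roughly as follows. This is only an illustration: the article does not say how the time was measured, so the use of Stopwatch is an assumption, and the names StripBenchmark and AverageRunMilliseconds are hypothetical. The LINQ method from the Methods section serves as the function under test.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class StripBenchmark
{
    // Hypothetical helper: runs `strip` on `input` `iterations` times,
    // repeats that `runs` times, and returns the average time per run in ms.
    public static double AverageRunMilliseconds(
        Func<string, string> strip, string input, int iterations, int runs)
    {
        double totalMilliseconds = 0;
        for (int run = 0; run < runs; run++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                strip(input);
            }
            watch.Stop();
            totalMilliseconds += watch.Elapsed.TotalMilliseconds;
        }
        return totalMilliseconds / runs;
    }

    // The LINQ method from this article, used as the function under test.
    public static string StripWithLinq(string inputValue)
    {
        return new string(inputValue.Where(c => c <= sbyte.MaxValue).ToArray());
    }
}
```

A call such as StripBenchmark.AverageRunMilliseconds(StripBenchmark.StripWithLinq, "ᾭHeὣlݬl♫oѪ₪ Wor♀ld", 100000, 40) would mirror the 100 000 iterations and 40 repetitions mentioned above. Absolute numbers depend on the machine and runtime, so only the relative ordering of the methods is meaningful.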
Methods
Remove Unicode Characters using Regex
C# Version
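// Requires: using System.Text.RegularExpressions;
// Removes every character outside the ASCII range U+0000 to U+007F.
public static String StripUnicodeCharactersFromString(string inputValue)
{
    return Regex.Replace(inputValue, @"[^\u0000-\u007F]", String.Empty);
}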
VB.NET Version
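' Requires: Imports System.Text.RegularExpressions
' The \uXXXX escapes are interpreted by the Regex engine, not by VB.NET.
Private Function ConvertToASCIIUsingRegex(inputValue As String) As String
    Return Regex.Replace(inputValue, "[^\u0000-\u007F]", String.Empty)
End Function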
Remove Unicode Characters using Regex (Compiled)
C# Version
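// Requires: using System.Text.RegularExpressions;
// The compiled Regex is created once and reused for every call.
private static readonly Regex _compiledUnicodeRegex = new Regex(@"[^\u0000-\u007F]", RegexOptions.Compiled);

public static String StripUnicodeCharactersFromString(string inputValue)
{
    return _compiledUnicodeRegex.Replace(inputValue, String.Empty);
}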
VB.NET Version
' Requires: Imports System.Text.RegularExpressions
' The compiled Regex is created once and reused for every call.
Private _compiledUnicodeRegex As New Regex("[^\u0000-\u007F]", RegexOptions.Compiled)

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    Return _compiledUnicodeRegex.Replace(inputValue, String.Empty)
End Function
Remove Unicode Characters using String Normalization
C# Version
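// Requires: using System.Linq; using System.Text;
// FormKD splits accented letters into a base letter plus combining marks,
// so the base letter survives the filter (é becomes e).
public static String StripUnicodeCharactersFromString(string inputValue)
{
    StringBuilder newStringBuilder = new StringBuilder();
    newStringBuilder.Append(inputValue.Normalize(NormalizationForm.FormKD).Where(x => x < 128).ToArray());
    return newStringBuilder.ToString();
}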
VB.NET Version
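' Requires: Imports System.Linq, Imports System.Text
' FormKD splits accented letters into a base letter plus combining marks,
' so the base letter survives the filter (é becomes e).
Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    Dim newStringBuilder As New StringBuilder()
    newStringBuilder.Append(inputValue.Normalize(NormalizationForm.FormKD).Where(Function(x) Convert.ToInt32(x) < 128).ToArray())
    Return newStringBuilder.ToString()
End Function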
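Because the normalization method decomposes characters before filtering, it does not always return the same result as the other methods. The following minimal sketch puts the Regex and normalization approaches from this article side by side. The class name NormalizationComparison and both method names are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class NormalizationComparison
{
    // Regex approach: drops every non-ASCII character outright.
    public static string StripWithRegex(string inputValue)
    {
        return Regex.Replace(inputValue, @"[^\u0000-\u007F]", string.Empty);
    }

    // Normalization approach: decomposes first (U+00E9 becomes 'e' + U+0301),
    // then drops only the non-ASCII parts, so the base letter is kept.
    public static string StripWithNormalization(string inputValue)
    {
        string decomposed = inputValue.Normalize(NormalizationForm.FormKD);
        return new string(decomposed.Where(c => c < 128).ToArray());
    }
}
```

For the input "Caf\u00E9" (Café), StripWithRegex returns "Caf" while StripWithNormalization returns "Cafe". For the benchmark string both return the same result, since none of its non-ASCII characters decomposes to an ASCII base letter.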
Remove Unicode Characters using Encodings
C# Version
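// Requires: using System.Text;
// Converts UTF-8 bytes to ASCII; the empty replacement fallback drops
// every character that ASCII cannot represent.
public static String StripUnicodeCharactersFromString(string inputValue)
{
    return Encoding.ASCII.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.GetEncoding(
                Encoding.ASCII.EncodingName,
                new EncoderReplacementFallback(String.Empty),
                new DecoderExceptionFallback()),
            Encoding.UTF8.GetBytes(inputValue)));
}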
VB.NET Version
' Requires: Imports System.Text
' Converts UTF-8 bytes to ASCII; the empty replacement fallback drops
' every character that ASCII cannot represent.
Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    Return Encoding.ASCII.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.GetEncoding(
                Encoding.ASCII.EncodingName,
                New EncoderReplacementFallback(String.Empty),
                New DecoderExceptionFallback()),
            Encoding.UTF8.GetBytes(inputValue)))
End Function
Remove Unicode Characters using LINQ
C# Version
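// Requires: using System.Linq;
// Keeps only characters up to sbyte.MaxValue (127), the last ASCII code point.
public static String StripUnicodeCharactersFromString(string inputValue)
{
    return new string(inputValue.Where(c => c <= sbyte.MaxValue).ToArray());
}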
VB.NET Version
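' Requires: Imports System.Linq
' Keeps only characters up to SByte.MaxValue (127), the last ASCII code point.
Public Function StripUnicodeCharactersFromString(inputValue As String) As String
    Return New String(inputValue.Where(Function(c) Convert.ToInt32(c) <= SByte.MaxValue).ToArray())
End Function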
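As a point of comparison with the LINQ version, the same filter can be written as a plain loop over the characters. This is a minimal sketch that was not part of the benchmark above, so no claim is made about its speed. The names AsciiFilter and StripNonAsciiWithLoop are hypothetical.

```csharp
using System.Text;

public static class AsciiFilter
{
    // Copies every ASCII character (U+0000 to U+007F) into a StringBuilder
    // and skips everything else.
    public static string StripNonAsciiWithLoop(string inputValue)
    {
        // Pre-size the builder: the result is never longer than the input.
        StringBuilder builder = new StringBuilder(inputValue.Length);
        foreach (char c in inputValue)
        {
            if (c <= '\u007F')
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

Calling AsciiFilter.StripNonAsciiWithLoop with the benchmark string ᾭHeὣlݬl♫oѪ₪ Wor♀ld returns Hello World, the same result as the methods above.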
Do you have an alternate or faster method for removing unicode?
Related links:
- ASCII Table and Description
- Unicode
- Difference between ascii and unicode
- How does RegexOptions.Compiled work?
- To Compile or Not To Compile