There are several ways to remove Unicode characters from a String in .NET. More precisely, every method shown here removes the characters outside the 7-bit ASCII range (U+0000 to U+007F).
Below I will show you some methods and the benchmark results.
Before choosing a method, take a look at the benchmark results and the framework compatibility.

Benchmark Summary

A for loop removed the Unicode characters from the string value ᾭHeὣlݬl♫oѪ₪ Wor♀ld 100 000 times. This was repeated 40 times for each method. All methods returned the correct result, Hello World.

Method                  Average runtime (ms)
Regex                   2 433 204
Regex (compiled)        1 646 337
String Normalization    1 016 305
Encodings               2 183 387
LINQ                    492 708
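
The measurement described above can be sketched roughly as follows. This is only an assumed reconstruction, not the harness that produced the numbers in the table: the class name StripBenchmark, the helper AverageRunMilliseconds and the warm-up call are hypothetical, and the LINQ variant from this article stands in as the method under test.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class StripBenchmark
{
    // The LINQ variant from this article, used here as the method under test.
    public static string StripWithLinq(string inputValue)
    {
        return new string(inputValue.Where(c => c <= sbyte.MaxValue).ToArray());
    }

    // Hypothetical helper: calls `method` `iterations` times per run,
    // repeats that `runs` times, and returns the average duration
    // of one run in milliseconds.
    public static double AverageRunMilliseconds(
        Func<string, string> method, string input, int iterations, int runs)
    {
        method(input); // warm-up call so JIT compilation is not measured

        double totalMilliseconds = 0;
        for (int run = 0; run < runs; run++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                method(input);
            }
            watch.Stop();
            totalMilliseconds += watch.Elapsed.TotalMilliseconds;
        }
        return totalMilliseconds / runs;
    }
}
```

Calling StripBenchmark.AverageRunMilliseconds(StripBenchmark.StripWithLinq, "ᾭHeὣlݬl♫oѪ₪ Wor♀ld", 100000, 40) would correspond to the 100 000 iterations and 40 repetitions described above. Absolute numbers depend on the machine and the runtime, so only the relative order of the methods is worth comparing.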

 

Methods

Remove Unicode Characters using Regex

C# Version

public static String StripUnicodeCharactersFromString(string inputValue)
{
       return Regex.Replace(inputValue, @"[^\u0000-\u007F]", String.Empty);
}

VB.NET Version

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
	Return Regex.Replace(inputValue, "[^\u0000-\u007F]", String.Empty)
End Function

Compatibility: working .NET 2.0, working .NET 3.0, not tested .NET 3.5, not working .NET 4.0, not working .NET 4.5

Remove Unicode Characters using Regex (Compiled)

C# Version

private static readonly Regex _compiledUnicodeRegex = new Regex(@"[^\u0000-\u007F]", RegexOptions.Compiled);

public static String StripUnicodeCharactersFromString(string inputValue)
{
	return _compiledUnicodeRegex.Replace(inputValue, String.Empty);
}

VB.NET Version

Private ReadOnly _compiledUnicodeRegex As New Regex("[^\u0000-\u007F]", RegexOptions.Compiled)

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
	Return _compiledUnicodeRegex.Replace(inputValue, String.Empty)
End Function

Compatibility: working .NET 2.0, working .NET 3.0, not tested .NET 3.5, not working .NET 4.0, not working .NET 4.5

Remove Unicode Characters using String Normalization

C# Version

public static String StripUnicodeCharactersFromString(string inputValue)
{
	// FormKD splits accented characters into a base character plus combining marks,
	// so an ASCII base letter survives the filter while the marks are dropped.
	return new string(inputValue.Normalize(NormalizationForm.FormKD).Where(x => x < 128).ToArray());
}

VB.NET Version

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
	Return New String(inputValue.Normalize(NormalizationForm.FormKD).Where(Function(x) Convert.ToInt32(x) < 128).ToArray())
End Function

Compatibility: working .NET 2.0, working .NET 3.0, not tested .NET 3.5, not working .NET 4.0, not working .NET 4.5
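
The normalization method can return a different result than the other methods as soon as the input contains accented Latin letters, which the benchmark string does not. The following sketch puts the two behaviours side by side. The class and method names are hypothetical, and the expected values assume the runtime's default Unicode normalization support is available.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationComparison
{
    // Decomposes first (FormKD), then keeps only ASCII characters.
    public static string StripWithNormalization(string inputValue)
    {
        return new string(inputValue.Normalize(NormalizationForm.FormKD).Where(c => c < 128).ToArray());
    }

    // Keeps only ASCII characters without decomposing anything.
    public static string StripWithoutNormalization(string inputValue)
    {
        return new string(inputValue.Where(c => c < 128).ToArray());
    }
}
```

For the input "Caf\u00E9" (Café with a precomposed é), StripWithNormalization returns "Cafe" because é is decomposed into e plus a combining accent, while StripWithoutNormalization returns "Caf". Which of the two is the right result depends on whether you want to keep the base letters.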

 

Remove Unicode Characters using Encodings

C# Version

public static String StripUnicodeCharactersFromString(string inputValue)
{
	// WebName ("us-ascii") is the registered encoding name; EncodingName is only a human-readable description.
	return Encoding.ASCII.GetString(Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(Encoding.ASCII.WebName, new EncoderReplacementFallback(String.Empty), new DecoderExceptionFallback()), Encoding.UTF8.GetBytes(inputValue)));
}

VB.NET Version

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
	' WebName ("us-ascii") is the registered encoding name; EncodingName is only a human-readable description.
	Return Encoding.ASCII.GetString(Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(Encoding.ASCII.WebName, New EncoderReplacementFallback(String.Empty), New DecoderExceptionFallback()), Encoding.UTF8.GetBytes(inputValue)))
End Function

Compatibility: working .NET 2.0, working .NET 3.0, not tested .NET 3.5, not working .NET 4.0, not working .NET 4.5

Remove Unicode Characters using LINQ

C# Version

public static String StripUnicodeCharactersFromString(string inputValue)
{
	// sbyte.MaxValue is 127, the last code point of the ASCII range.
	return new string(inputValue.Where(c => c <= sbyte.MaxValue).ToArray());
}

VB.NET Version

Public Function StripUnicodeCharactersFromString(inputValue As String) As String
	Return New String(inputValue.Where(Function(c) Convert.ToInt32(c) <= SByte.MaxValue).ToArray())
End Function

Compatibility: working .NET 2.0, working .NET 3.0, not tested .NET 3.5, not working .NET 4.0, not working .NET 4.5

Do you have an alternative or faster method for removing Unicode characters?
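
One possible alternative is a plain loop that copies the ASCII characters into a StringBuilder. This is only a sketch: it was not part of the benchmark above, so no claim is made about how it compares in speed, and the names AsciiFilter and StripWithLoop are hypothetical.

```csharp
using System.Text;

public static class AsciiFilter
{
    // Copies every character in the ASCII range (U+0000 to U+007F)
    // into a pre-sized StringBuilder and skips everything else.
    public static string StripWithLoop(string inputValue)
    {
        StringBuilder builder = new StringBuilder(inputValue.Length);
        foreach (char c in inputValue)
        {
            if (c <= '\u007F')
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

Like the methods above, this sketch does not handle a null input; it needs neither LINQ nor regular expressions.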
