How to change the encoding of a String using C# and VB.NET

To change the encoding of a string in .NET, you can use this extension method, which is part of Fesslersoft.Extensions. The method takes a source and a target encoding. The source encoding parameter may look unnecessary, but as Joel Spolsky put it in his excellent blog post:

“It does not make sense to have a string without knowing what encoding it uses” (Joel Spolsky)

Samples

Sample C#
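
A minimal sketch of a two-encoding variant, assuming the method encodes the string with the source encoding, converts the bytes with Encoding.Convert and decodes them with the target encoding. The class name StringEncodingExtensions and the parameter names are illustrative and may differ from the Fesslersoft.Extensions implementation.

```csharp
using System;
using System.Text;

public static class StringEncodingExtensions
{
    // Hypothetical sketch: encode with sourceEncoding, convert the bytes
    // to targetEncoding, then decode the converted bytes again.
    public static string ChangeEncoding(this string input, Encoding sourceEncoding, Encoding targetEncoding)
    {
        if (input == null) throw new ArgumentNullException("input");
        byte[] sourceBytes = sourceEncoding.GetBytes(input);
        byte[] targetBytes = Encoding.Convert(sourceEncoding, targetEncoding, sourceBytes);
        return targetEncoding.GetString(targetBytes);
    }
}
```

With this sketch, "Grüße".ChangeEncoding(Encoding.UTF8, Encoding.ASCII) returns "Gr??e", because ASCII cannot represent ü or ß and substitutes a question mark for each.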

VB.NET Sample

If you have any questions or suggestions feel free to rate this snippet, post a comment or Contact Us via Email.

ChangeEncoding extension method for C# and VB.NET

This snippet will give you the ChangeEncoding extension method for C# and VB.NET.

Sample C#

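// Requires "using System.Text;" and must be declared inside a static class.
// Round-trips the string through the given encoding, so characters the encoding
// cannot represent are replaced by its fallback (for example '?' with Encoding.ASCII).
public static string ChangeEncoding(this string input, Encoding encoding)
{
	var bytes = encoding.GetBytes(input);
	return encoding.GetString(bytes);
}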

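Sample VB.NET

' Requires "Imports System.Text". Extension methods must be declared in a Module,
' where members cannot be marked Shared.
<System.Runtime.CompilerServices.Extension> _
Public Function ChangeEncoding(input As String, encoding As Encoding) As String
	Dim bytes = encoding.GetBytes(input)
	Return encoding.GetString(bytes)
End Function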

How to check if a string is unicode in C# and VB.NET

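To check whether a string contains Unicode characters outside the ASCII range in C# and VB.NET, you can use the following snippet. It compares the ASCII and UTF-8 byte counts, which differ as soon as the string holds a character above U+007F.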

Sample C#

// Returns true when the string contains at least one non-ASCII character.
public static bool IsUnicode(string input)
{
	var asciiBytesCount = Encoding.ASCII.GetByteCount(input);
	var unicodeBytesCount = Encoding.UTF8.GetByteCount(input);
	return asciiBytesCount != unicodeBytesCount;
}

Sample VB.NET

' Returns True when the string contains at least one non-ASCII character.
Public Shared Function IsUnicode(input As String) As Boolean
	Dim asciiBytesCount = Encoding.ASCII.GetByteCount(input)
	Dim unicodeBytesCount = Encoding.UTF8.GetByteCount(input)
	Return asciiBytesCount <> unicodeBytesCount
End Function
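
The byte-count comparison works because Encoding.ASCII never produces more than one byte per character, while UTF-8 needs two or more bytes for any character above U+007F. As a sketch of an equivalent check that stops at the first non-ASCII character instead of measuring the whole string twice, something like the following could be used. The names AsciiCheck and ContainsNonAscii are hypothetical and not part of the original snippet.

```csharp
using System;

public static class AsciiCheck
{
    // Returns true when at least one UTF-16 code unit lies outside U+0000..U+007F.
    public static bool ContainsNonAscii(string input)
    {
        if (input == null) throw new ArgumentNullException("input");
        foreach (char c in input)
        {
            if (c > '\u007F') return true; // first non-ASCII character found
        }
        return false;
    }
}
```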

How to remove unicode characters from a string in C# and VB.NET

There are various ways to remove Unicode (non-ASCII) characters from a string in .NET. Below I will show you some of them, together with benchmark results. Before choosing a method, take a look at the benchmark results and the framework compatibility.

Benchmark Summary

For each method, a for loop removed the Unicode characters from the string value ᾭHeὣlݬl♫oѪ₪ Wor♀ld 100 000 times. This was repeated 40 times per method. All methods returned the correct result, Hello World.

Method                 Average runtime (ms)
Regex                  2 433 204
Regex (compiled)       1 646 337
String Normalization   1 016 305
Encodings              2 183 387
LINQ                   492 708
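
A timing harness along the lines described above could be sketched as follows, assuming System.Diagnostics.Stopwatch is used for the measurement. The names RemovalBenchmark and AverageMilliseconds are hypothetical, and the harness that produced the numbers in the table may have worked differently.

```csharp
using System;
using System.Diagnostics;

public static class RemovalBenchmark
{
    // Calls 'method' on 'input' 'iterations' times per run, repeats that for
    // 'runs' runs, and returns the average elapsed milliseconds of one run.
    public static double AverageMilliseconds(Func<string, string> method, string input, int iterations, int runs)
    {
        double totalMs = 0;
        for (int run = 0; run < runs; run++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                method(input);
            }
            watch.Stop();
            totalMs += watch.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }
}
```

To mirror the setup described above, pass 100000 for iterations and 40 for runs, together with the removal method to be measured.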

 

Methods

Remove Unicode Characters using Regex

C# Version
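
A minimal sketch of the regex approach, assuming "Unicode characters" means everything outside the ASCII range U+0000 to U+007F. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class RegexRemoval
{
    // Deletes every UTF-16 code unit outside U+0000..U+007F.
    public static string RemoveNonAscii(string input)
    {
        return Regex.Replace(input, @"[^\u0000-\u007F]", string.Empty);
    }
}
```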

VB.NET Version

Compatibility: .NET 2.0: working, .NET 3.0: working, .NET 3.5: not tested, .NET 4.0: not working, .NET 4.5: not working

Remove Unicode Characters using Regex (Compiled)

C# Version
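
A sketch of the compiled variant under the same assumption (anything outside U+0000 to U+007F is removed). The Regex instance is created once with RegexOptions.Compiled and reused, which is where any speed-up over the plain regex call would come from. The names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class CompiledRegexRemoval
{
    // Built once and reused; RegexOptions.Compiled trades a higher
    // construction cost for faster matching on repeated calls.
    private static readonly Regex NonAscii =
        new Regex(@"[^\u0000-\u007F]", RegexOptions.Compiled);

    public static string RemoveNonAscii(string input)
    {
        return NonAscii.Replace(input, string.Empty);
    }
}
```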

VB.NET Version

Compatibility: .NET 2.0: working, .NET 3.0: working, .NET 3.5: not tested, .NET 4.0: not working, .NET 4.5: not working

Remove Unicode Characters using String Normalization

C# Version
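
One way to sketch the normalization approach, assuming the string is first decomposed with NormalizationForm.FormD and then filtered down to ASCII characters. The names are illustrative, and the filter step is an assumption about how the remaining characters are dropped.

```csharp
using System.Text;

public static class NormalizationRemoval
{
    public static string RemoveNonAscii(string input)
    {
        // FormD splits accented letters into a base letter plus combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep only ASCII; combining marks and other symbols are skipped.
            if (c <= '\u007F') result.Append(c);
        }
        return result.ToString();
    }
}
```

Unlike plain filtering, this sketch keeps the base letter of accented Latin characters: "Café" becomes "Cafe" rather than "Caf".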

VB.NET Version

Compatibility: .NET 2.0: working, .NET 3.0: working, .NET 3.5: not tested, .NET 4.0: not working, .NET 4.5: not working

Remove Unicode Characters using Encodings

C# Version
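
A sketch of the encoding approach, assuming an ASCII encoding configured with empty replacement fallbacks so that unencodable characters are dropped instead of turned into question marks. The class and method names are illustrative.

```csharp
using System.Text;

public static class EncodingRemoval
{
    public static string RemoveNonAscii(string input)
    {
        // An ASCII encoding whose fallbacks substitute an empty string, so
        // characters it cannot represent vanish instead of becoming '?'.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));
        return ascii.GetString(ascii.GetBytes(input));
    }
}
```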

VB.NET Version

Compatibility: .NET 2.0: working, .NET 3.0: working, .NET 3.5: not tested, .NET 4.0: not working, .NET 4.5: not working

Remove Unicode Characters using LINQ

C# Version
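
A sketch of the LINQ approach, assuming a simple Where filter over the characters of the string. It relies on System.Linq, which is available from .NET 3.5 onwards. The names are illustrative.

```csharp
using System.Linq;

public static class LinqRemoval
{
    // Keeps only the characters in the ASCII range and rebuilds the string.
    public static string RemoveNonAscii(string input)
    {
        return new string(input.Where(c => c <= '\u007F').ToArray());
    }
}
```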

VB.NET Version

Compatibility: .NET 2.0: working, .NET 3.0: working, .NET 3.5: not tested, .NET 4.0: not working, .NET 4.5: not working

Do you have an alternative or faster method for removing Unicode characters?
