Decode string with hex character codes to UTF-8 characters
From a system we receive messages that contain codes that represent UTF-8 characters.
For example:
    var str = "Test =64 =C2=AE =E1=A6=92 test";
To decode these codes to UTF-8 I've added a simple function that does 3 regex replacements:
    protected static string ReplaceHexCodesInString(string input)
    {
        var output = input;
        var encoding = Encoding.UTF8;

        // 3-byte sequences: lead byte E0..EF followed by two more bytes
        var regTripleHex = new Regex("=(E[0-9A-F])=([0-9A-F]{2})=([0-9A-F]{2})");
        output = regTripleHex.Replace(output, m => encoding.GetString(new[] {
            byte.Parse(m.Groups[1].Value, System.Globalization.NumberStyles.HexNumber),
            byte.Parse(m.Groups[2].Value, System.Globalization.NumberStyles.HexNumber),
            byte.Parse(m.Groups[3].Value, System.Globalization.NumberStyles.HexNumber)
        }));

        // 2-byte sequences: lead byte C0..DF followed by one more byte
        var regDoubleHex = new Regex("=([C-D][0-9A-F])=([0-9A-F]{2})");
        output = regDoubleHex.Replace(output, m => encoding.GetString(new[] {
            byte.Parse(m.Groups[1].Value, System.Globalization.NumberStyles.HexNumber),
            byte.Parse(m.Groups[2].Value, System.Globalization.NumberStyles.HexNumber)
        }));

        // Whatever is left is treated as a single byte
        var regRemainingHex = new Regex("=([0-9A-F]{2})");
        output = regRemainingHex.Replace(output, m => encoding.GetString(new[] {
            byte.Parse(m.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)
        }));

        return output;
    }
This seems to work as expected for what's currently in those messages.
Note that the messages don't contain 4-byte UTF-8 characters
(e.g. 0xF0 0x90 0x8C 0xB8 = 𐌸).
But can this be simplified?
Perhaps there's already a standard function?
I searched, but haven't found a good standard built-in C# function that already does this type of conversion.
Well, except for an example that uses a function from System.Net.Mail.
But it seems very error-prone and requires a very specific format.
    var input = "bl=61=C2=B0";
    var output = System.Net.Mail.Attachment.CreateAttachmentFromString("", "=?utf-8?Q?" + input.Trim() + "?=").Name;
c# utf-8
asked Feb 5 at 16:06 by LukStorms
edited Feb 5 at 17:28 by t3chb0t
Your data is encoded as quoted-printable. Maybe this keyword helps you find an existing library function. It definitely exists somewhere.
– Roland Illig
Feb 5 at 17:51
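To make the quoted-printable hint concrete, here is a minimal sketch of a decoder that collects each run of `=XX` bytes and decodes the run as UTF-8 in one go, so 2-, 3- and 4-byte characters need no separate cases. The class name `QuotedPrintableSketch` and its `Decode` method are hypothetical and not part of the framework; accepting lowercase hex digits and a bare `\n` in soft line breaks are leniency assumptions rather than strict quoted-printable behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not a framework API.
public static class QuotedPrintableSketch
{
    public static string Decode(string input)
    {
        var result = new StringBuilder(input.Length);
        var pending = new List<byte>(); // bytes of the current "=XX" run

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '=')
            {
                // "=XX": one encoded byte (assumption: lowercase hex is accepted too)
                if (i + 2 < input.Length
                    && Uri.IsHexDigit(input[i + 1])
                    && Uri.IsHexDigit(input[i + 2]))
                {
                    pending.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                    i += 2;
                    continue;
                }
                // "=" directly before a line break is a soft line break: skip both
                if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                {
                    i += 2;
                    continue;
                }
                if (i + 1 < input.Length && input[i + 1] == '\n')
                {
                    i += 1;
                    continue;
                }
            }

            // Any other character ends the byte run and is copied through unchanged
            Flush(pending, result);
            result.Append(input[i]);
        }

        Flush(pending, result);
        return result.ToString();
    }

    // Decodes the collected bytes as UTF-8 and appends them to the output
    private static void Flush(List<byte> pending, StringBuilder result)
    {
        if (pending.Count == 0) return;
        result.Append(Encoding.UTF8.GetString(pending.ToArray()));
        pending.Clear();
    }
}
```

With the sample from the question, `QuotedPrintableSketch.Decode("Test =64 =C2=AE =E1=A6=92 test")` should give `Test d ® ᦒ test`. A `=` that is not followed by two hex digits or a line break is left as it is.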
1 Answer (accepted)
Are you willing to use % instead of =?
If so, Uri.UnescapeDataString should be sufficient. If not, you can always Replace("=", "%") and use UnescapeDataString anyway.
    Uri.UnescapeDataString("Test =64 =C2=AE =E1=A6=92 test".Replace("=", "%"))
    // Test d ® ᦒ test
answered Feb 5 at 16:42 by Bruno Costa
Seems nice at first glance. I'll test that tomorrow. It probably needs some tweaking on the replace, because not all the "=" will be for a hex code.
– LukStorms
Feb 5 at 16:47
I've changed it to a one-liner that only targets those hex codes:
    new Regex("(?:=[0-9A-F]{2})+").Replace(input, m => Uri.UnescapeDataString(m.Value.Replace("=","%")))
And after looking up quoted-printable, it seems the "=" signs themselves also get hex-encoded. Thanks.
– LukStorms
Feb 6 at 9:49
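For reference, the one-liner from the last comment can be wrapped in a small method, sketched below. The class name `HexRunDecoder`, the method name `Decode`, and the cached `Regex` field are illustrative choices, not something from the thread. Only runs of `=XX` with uppercase hex digits are rewritten to `%XX` and passed to `Uri.UnescapeDataString`; everything else in the input stays untouched.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the one-liner from the comment above.
public static class HexRunDecoder
{
    // One or more consecutive "=XX" codes, uppercase hex digits only
    private static readonly Regex HexRun =
        new Regex("(?:=[0-9A-F]{2})+", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        // Rewrite each run to percent-encoding and let Uri decode the UTF-8 bytes
        return HexRun.Replace(
            input,
            m => Uri.UnescapeDataString(m.Value.Replace("=", "%")));
    }
}
```

With the sample from the question, `HexRunDecoder.Decode("Test =64 =C2=AE =E1=A6=92 test")` should give `Test d ® ᦒ test`, and a lone `=` surrounded by spaces is left alone. Because the whole run is decoded at once, 4-byte sequences should work as well. One caveat: a literal `=` that happens to be followed by two uppercase hex digits cannot be told apart from an escape, which is why quoted-printable encodes `=` itself as `=3D`.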