Decode string with hex character codes to UTF-8 characters

From a system we receive messages that contain codes representing the bytes of UTF-8 encoded characters.

For example:

    var str = "Test =64 =C2=AE =E1=A6=92 test";

To decode these codes to UTF-8 characters, I've added a simple function that does 3 regex replacements:



    // Requires: using System.Text; using System.Text.RegularExpressions;
    protected static string ReplaceHexCodesInString(string input)
    {
        var output = input;
        var encoding = Encoding.UTF8;

        // 3-byte sequences: lead byte E0-EF plus two more bytes
        var regTripleHex = new Regex("=(E[0-9A-F])=([0-9A-F]{2})=([0-9A-F]{2})");
        output = regTripleHex.Replace(output, m => encoding.GetString(new[] {
            byte.Parse(m.Groups[1].Value, System.Globalization.NumberStyles.HexNumber),
            byte.Parse(m.Groups[2].Value, System.Globalization.NumberStyles.HexNumber),
            byte.Parse(m.Groups[3].Value, System.Globalization.NumberStyles.HexNumber)
        }));

        // 2-byte sequences: lead byte C0-DF plus one more byte
        var regDoubleHex = new Regex("=([C-D][0-9A-F])=([0-9A-F]{2})");
        output = regDoubleHex.Replace(output, m => encoding.GetString(new[] {
            byte.Parse(m.Groups[1].Value, System.Globalization.NumberStyles.HexNumber),
            byte.Parse(m.Groups[2].Value, System.Globalization.NumberStyles.HexNumber)
        }));

        // Whatever is left is treated as a single byte
        var regRemainingHex = new Regex("=([0-9A-F]{2})");
        output = regRemainingHex.Replace(output, m => encoding.GetString(new[] {
            byte.Parse(m.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)
        }));

        return output;
    }


This seems to work as expected for what's currently in those messages.

Note that the messages don't contain 4-byte UTF-8 characters
(e.g. 0xf0 0x90 0x8c 0xb8 = 𐌸).
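For reference, a 4-byte sequence like the one in that example decodes to a single code point, which .NET stores as a surrogate pair (two `char` values). The following is only a minimal sketch; the class and method names are made up for illustration and are not part of the original function.

```csharp
using System;
using System.Text;

public static class FourByteExample
{
    // Decodes the four UTF-8 bytes from the example (code point U+10338).
    public static string Decode()
    {
        var bytes = new byte[] { 0xF0, 0x90, 0x8C, 0xB8 };
        return Encoding.UTF8.GetString(bytes);
    }
}
```

As far as I can tell, the three regexes above would not handle such input: none of them matches a lead byte in the F0 range, so the last regex would decode each of the four bytes on its own, and a lone byte of that kind is not valid UTF-8.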



But can this be simplified?
Perhaps there's already a standard function?

I searched, but haven't found a good standard built-in C# function that already does this type of conversion.

Well, except for an example that uses a function from System.Net.Mail.
But it seems very error-prone and requires a very specific format.

    var input = "bl=61=C2=B0";
    var output = System.Net.Mail.Attachment.CreateAttachmentFromString("", "=?utf-8?Q?" + input.Trim() + "?=").Name;






  • Your data is encoded as quoted printable. Maybe this keyword helps you find an existing library function. It definitely exists somewhere.
    – Roland Illig
    Feb 5 at 17:51
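To illustrate the comment above: in quoted-printable (RFC 2045), `=XX` stands for one byte and a `=` at the very end of a line is a soft line break. A hand-rolled decoder can collect all bytes first and run the UTF-8 decoder once, so sequences of any length (including 4-byte ones) need no special cases. This is only a sketch under those assumptions; `QuotedPrintableSketch` and its members are hypothetical names, and it does not validate the other rules of the format.

```csharp
using System;
using System.IO;
using System.Text;

public static class QuotedPrintableSketch
{
    // Decodes =XX escapes as UTF-8 bytes and drops soft line breaks.
    public static string Decode(string input)
    {
        using (var buffer = new MemoryStream())
        {
            int literalStart = 0;
            int i = 0;
            while (i < input.Length)
            {
                int value;
                int skip = EscapeLength(input, i, out value);
                if (skip == 0) { i++; continue; }

                // Flush the literal text before the escape, then the escape itself.
                WriteLiteral(buffer, input, literalStart, i - literalStart);
                if (value >= 0) buffer.WriteByte((byte)value);
                i += skip;
                literalStart = i;
            }
            WriteLiteral(buffer, input, literalStart, input.Length - literalStart);
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }

    // Returns how many chars the escape at 'index' spans (0 if there is none).
    // 'value' is the decoded byte, or -1 for a soft line break.
    private static int EscapeLength(string s, int index, out int value)
    {
        value = -1;
        if (s[index] != '=') return 0;
        if (index + 2 < s.Length && s[index + 1] == '\r' && s[index + 2] == '\n') return 3;
        if (index + 1 < s.Length && s[index + 1] == '\n') return 2;
        if (index + 2 < s.Length && Uri.IsHexDigit(s[index + 1]) && Uri.IsHexDigit(s[index + 2]))
        {
            value = Convert.ToByte(s.Substring(index + 1, 2), 16);
            return 3;
        }
        return 0; // a stray '=' is kept as literal text
    }

    // Appends a run of unescaped text as UTF-8 bytes.
    private static void WriteLiteral(MemoryStream buffer, string s, int start, int count)
    {
        if (count == 0) return;
        byte[] bytes = Encoding.UTF8.GetBytes(s.Substring(start, count));
        buffer.Write(bytes, 0, bytes.Length);
    }
}
```

With the string from the question, `QuotedPrintableSketch.Decode("Test =64 =C2=AE =E1=A6=92 test")` should give `Test d ® ᦒ test`, and an encoded equals sign such as `a=3Db` becomes `a=b`.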
















edited Feb 5 at 17:28









t3chb0t

asked Feb 5 at 16:06









LukStorms

1 Answer
accepted










Are you willing to use % instead of =?

If so, Uri.UnescapeDataString should be sufficient. If not, you can always call Replace("=", "%") first and use UnescapeDataString anyway.

    Uri.UnescapeDataString("Test =64 =C2=AE =E1=A6=92 test".Replace("=", "%"))
    // Test d ® ᦒ test





  • Seems nice at first glance. I'll test that tomorrow. Probably needs some tweaking on the replace. Because not all the "=" will be for a hex code.
    – LukStorms
    Feb 5 at 16:47










  • I've changed it to a one-liner that only targets those hex codes: new Regex("(?:=[0-9A-F]{2})+").Replace(input, m => Uri.UnescapeDataString(m.Value.Replace("=","%"))) And after looking up quoted-printable, it seems the "=" signs also get hex-encoded. Thanks.
    – LukStorms
    Feb 6 at 9:49
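A runnable sketch of the one-liner from the comment above, with the quantifier written as `{2}`. The wrapper class `HexCodeDecoder` and the field name `HexRun` are made up for illustration; only the regex and the `Uri.UnescapeDataString` call come from the discussion.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexCodeDecoder
{
    // One or more consecutive =XX codes, so multi-byte sequences are decoded together.
    private static readonly Regex HexRun =
        new Regex("(?:=[0-9A-F]{2})+", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        // Only the matched run is rewritten to %XX form and unescaped.
        return HexRun.Replace(input, m => Uri.UnescapeDataString(m.Value.Replace("=", "%")));
    }
}
```

Because only the matched runs are passed to `Uri.UnescapeDataString`, a literal `%` or a `=` that is not followed by two uppercase hex digits stays untouched. For example, `HexCodeDecoder.Decode("50% =3D half")` returns `50% = half`.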











answered Feb 5 at 16:42









Bruno Costa
