Checksum computation in x86

Checksum computation in x86_64

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
3
down vote

favorite

This is an implementation of the TCP/IP checksum computation as described in https://tools.ietf.org/html/rfc1071.

The function is writen in DynASM, an assembler preprocessor that is part of LuaJIT. I didn't use much of the fancy stuff of DynASM so pretty much it looks like native X86_64 I think.

 -- Prologue.
 | push rbp
 | mov rbp, rsp
 -- Accumulative sum (16-bit).
 | xor r9, r9 -- Clear out r9. Stores accumulated sum.
 | mov rcx, 0 -- Set index to 0.
 | 1:
 | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
 | jg >2 -- Go to branch '2' if equals.
 | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
 | mov al, [rdi + rcx+1]
 | add r9, rax -- Sum and accumulate into r9.
 | add rcx, 2 -- Increase index by 2.
 | jmp <1 -- Go to beginning of loop.
 | 2:
 | mov rax, r9 -- Save accumulated sum in rax.
 -- Sum carry until done.
 | 4:
 | shr r9, 16 -- Right-shift r9 16-bit to obtain value of carry.
 | cmp r9, 0 -- Check whether there's carry.
 | jz >5 -- In case not, go to '4'.
 | and rax, 0xffff -- Clear out higher part of rax.
 | add rax, r9 -- Sum carry to rax.
 | mov r9, rax -- Assign rax to r9 to check if there's carry again.
 | jmp <4 -- Go to beggining of loop.
 -- One-complement.
 | 5:
 | not rax -- One-complement of rax.
 | and rax, 0xffff -- Clear out higher part of rax.
 -- Epilogue.
 | mov rsp, rbp
 | pop rbp
 -- Return.
 | ret

I verified against a Lua implementation the result is correct. I was wondering if I could write the code in a different way to use less instructions. For instance, the RFC says it's possible to sum the octets as a dword. I tried but I didn't manage to get a correct checksum.

edited Jan 18 at 8:34

Heslacher

43.9k359152

asked Jan 18 at 0:14

Diego Pino

29329

1

It looks like you don't handle odd length messages correctly. Can you verify?
â€“Â vnp
Jan 18 at 0:28

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 18 at 8:34

@Hestacher Sorry, I didn't know. I appreciate you pointed that out. On the other hand, the guidelines say that is OK to post improved code as an answer ("If you want to show everyone how you improved your code, but don't want to ask another question, then post an answer to your own question. Selfie answers are acceptable on Stack Exchange sites, and even encouraged"). Thus, I posted the update as an answer.
â€“Â Diego Pino
Jan 18 at 9:02

add a commentÂ |Â

up vote
3
down vote

favorite

This is an implementation of the TCP/IP checksum computation as described in https://tools.ietf.org/html/rfc1071.

The function is writen in DynASM, an assembler preprocessor that is part of LuaJIT. I didn't use much of the fancy stuff of DynASM so pretty much it looks like native X86_64 I think.

 -- Prologue.
 | push rbp
 | mov rbp, rsp
 -- Accumulative sum (16-bit).
 | xor r9, r9 -- Clear out r9. Stores accumulated sum.
 | mov rcx, 0 -- Set index to 0.
 | 1:
 | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
 | jg >2 -- Go to branch '2' if equals.
 | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
 | mov al, [rdi + rcx+1]
 | add r9, rax -- Sum and accumulate into r9.
 | add rcx, 2 -- Increase index by 2.
 | jmp <1 -- Go to beginning of loop.
 | 2:
 | mov rax, r9 -- Save accumulated sum in rax.
 -- Sum carry until done.
 | 4:
 | shr r9, 16 -- Right-shift r9 16-bit to obtain value of carry.
 | cmp r9, 0 -- Check whether there's carry.
 | jz >5 -- In case not, go to '4'.
 | and rax, 0xffff -- Clear out higher part of rax.
 | add rax, r9 -- Sum carry to rax.
 | mov r9, rax -- Assign rax to r9 to check if there's carry again.
 | jmp <4 -- Go to beggining of loop.
 -- One-complement.
 | 5:
 | not rax -- One-complement of rax.
 | and rax, 0xffff -- Clear out higher part of rax.
 -- Epilogue.
 | mov rsp, rbp
 | pop rbp
 -- Return.
 | ret

edited Jan 18 at 8:34

Heslacher

43.9k359152

asked Jan 18 at 0:14

Diego Pino

29329

1

It looks like you don't handle odd length messages correctly. Can you verify?
â€“Â vnp
Jan 18 at 0:28

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 18 at 8:34

@Hestacher Sorry, I didn't know. I appreciate you pointed that out. On the other hand, the guidelines say that is OK to post improved code as an answer ("If you want to show everyone how you improved your code, but don't want to ask another question, then post an answer to your own question. Selfie answers are acceptable on Stack Exchange sites, and even encouraged"). Thus, I posted the update as an answer.
â€“Â Diego Pino
Jan 18 at 9:02

add a commentÂ |Â

up vote
3
down vote

favorite

This is an implementation of the TCP/IP checksum computation as described in https://tools.ietf.org/html/rfc1071.

The function is writen in DynASM, an assembler preprocessor that is part of LuaJIT. I didn't use much of the fancy stuff of DynASM so pretty much it looks like native X86_64 I think.

 -- Prologue.
 | push rbp
 | mov rbp, rsp
 -- Accumulative sum (16-bit).
 | xor r9, r9 -- Clear out r9. Stores accumulated sum.
 | mov rcx, 0 -- Set index to 0.
 | 1:
 | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
 | jg >2 -- Go to branch '2' if equals.
 | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
 | mov al, [rdi + rcx+1]
 | add r9, rax -- Sum and accumulate into r9.
 | add rcx, 2 -- Increase index by 2.
 | jmp <1 -- Go to beginning of loop.
 | 2:
 | mov rax, r9 -- Save accumulated sum in rax.
 -- Sum carry until done.
 | 4:
 | shr r9, 16 -- Right-shift r9 16-bit to obtain value of carry.
 | cmp r9, 0 -- Check whether there's carry.
 | jz >5 -- In case not, go to '4'.
 | and rax, 0xffff -- Clear out higher part of rax.
 | add rax, r9 -- Sum carry to rax.
 | mov r9, rax -- Assign rax to r9 to check if there's carry again.
 | jmp <4 -- Go to beggining of loop.
 -- One-complement.
 | 5:
 | not rax -- One-complement of rax.
 | and rax, 0xffff -- Clear out higher part of rax.
 -- Epilogue.
 | mov rsp, rbp
 | pop rbp
 -- Return.
 | ret

edited Jan 18 at 8:34

Heslacher

43.9k359152

asked Jan 18 at 0:14

Diego Pino

29329

This is an implementation of the TCP/IP checksum computation as described in https://tools.ietf.org/html/rfc1071.

The function is writen in DynASM, an assembler preprocessor that is part of LuaJIT. I didn't use much of the fancy stuff of DynASM so pretty much it looks like native X86_64 I think.

 -- Prologue.
 | push rbp
 | mov rbp, rsp
 -- Accumulative sum (16-bit).
 | xor r9, r9 -- Clear out r9. Stores accumulated sum.
 | mov rcx, 0 -- Set index to 0.
 | 1:
 | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
 | jg >2 -- Go to branch '2' if equals.
 | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
 | mov al, [rdi + rcx+1]
 | add r9, rax -- Sum and accumulate into r9.
 | add rcx, 2 -- Increase index by 2.
 | jmp <1 -- Go to beginning of loop.
 | 2:
 | mov rax, r9 -- Save accumulated sum in rax.
 -- Sum carry until done.
 | 4:
 | shr r9, 16 -- Right-shift r9 16-bit to obtain value of carry.
 | cmp r9, 0 -- Check whether there's carry.
 | jz >5 -- In case not, go to '4'.
 | and rax, 0xffff -- Clear out higher part of rax.
 | add rax, r9 -- Sum carry to rax.
 | mov r9, rax -- Assign rax to r9 to check if there's carry again.
 | jmp <4 -- Go to beggining of loop.
 -- One-complement.
 | 5:
 | not rax -- One-complement of rax.
 | and rax, 0xffff -- Clear out higher part of rax.
 -- Epilogue.
 | mov rsp, rbp
 | pop rbp
 -- Return.
 | ret

edited Jan 18 at 8:34

Heslacher

43.9k359152

asked Jan 18 at 0:14

Diego Pino

29329

edited Jan 18 at 8:34

Heslacher

43.9k359152

edited Jan 18 at 8:34

Heslacher

43.9k359152

edited Jan 18 at 8:34

Heslacher

43.9k359152

asked Jan 18 at 0:14

Diego Pino

29329

asked Jan 18 at 0:14

Diego Pino

29329

asked Jan 18 at 0:14

Diego Pino

29329

1

It looks like you don't handle odd length messages correctly. Can you verify?
â€“Â vnp
Jan 18 at 0:28

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 18 at 8:34

@Hestacher Sorry, I didn't know. I appreciate you pointed that out. On the other hand, the guidelines say that is OK to post improved code as an answer ("If you want to show everyone how you improved your code, but don't want to ask another question, then post an answer to your own question. Selfie answers are acceptable on Stack Exchange sites, and even encouraged"). Thus, I posted the update as an answer.
â€“Â Diego Pino
Jan 18 at 9:02

add a commentÂ |Â

1

It looks like you don't handle odd length messages correctly. Can you verify?
â€“Â vnp
Jan 18 at 0:28

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 18 at 8:34

@Hestacher Sorry, I didn't know. I appreciate you pointed that out. On the other hand, the guidelines say that is OK to post improved code as an answer ("If you want to show everyone how you improved your code, but don't want to ask another question, then post an answer to your own question. Selfie answers are acceptable on Stack Exchange sites, and even encouraged"). Thus, I posted the update as an answer.
â€“Â Diego Pino
Jan 18 at 9:02

It looks like you don't handle odd length messages correctly. Can you verify?
â€“Â vnp
Jan 18 at 0:28

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 18 at 8:34

@Hestacher Sorry, I didn't know. I appreciate you pointed that out. On the other hand, the guidelines say that is OK to post improved code as an answer ("If you want to show everyone how you improved your code, but don't want to ask another question, then post an answer to your own question. Selfie answers are acceptable on Stack Exchange sites, and even encouraged"). Thus, I posted the update as an answer.
â€“Â Diego Pino
Jan 18 at 9:02

add a commentÂ |Â

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

Your code, as shown, has a bug in that it will read past the end of the buffer you're computing the checksum for (by either 1 or 2 bytes, depending on if the length is odd or even). As that is easy enough to fix, I'll assume you make the appropriate changes to the code and move on to answering your question.

If you read over Section 2, paragraphs (B), of the RFC, you'll see that the addition can be done in either byte order. So rather than using two instructions to byte-swap the two bytes you're reading, just do one (mov ax,[rdi+rcx]). This will result in the final checksum being byte-swapped, so you'll have to do the swap (xchg al,ah) to get the same value to return from this function. If you're not doing anything with the checksum other than storing it into the TCP/IP packet, don't swap the return value, and don't swap the bytes when you write them out.

You can remove the cmp r9,0 instruction in the 4 block, as the shr instruction will set the z flag if the new register value is 0.

Now, to expand this method to utilize 32 or 64 bit registers, you can replace the (newly added) mov ax instruction with mov eax or mov rax (with appropriate changes to the value of ecx). This will need adjustments at the end to shift, add, and handle the carry to collapse the multiple bytes pairs into one, and the carry handling with the 64 bit register will also need to be different (inserting an adc r9,0 after the add r9,rax instruction). Also, the handling for packets with sizes that are not multiples of 4 (or 8) will need to be added.

Details of this technique are discussed in the RFC.

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

Thanks a lot for the suggestions. I updated the question with a diff of the changes implemented.
â€“Â Diego Pino
Jan 18 at 8:25

add a commentÂ |Â

up vote
0
down vote

Here is a diff with the changes suggested by @1201ProgramAlarm. The most important update is not overreading the buffer and handling the last byte in case the size of the array is an odd number. Then there are other suggestions implemented such as reading one 16-bit word in one read and not swapping the bytes. I agree it makes more sense the function returns the checksum in network-byte order and only convert it to host-byte order if it's needed to print out.

6c6
< | mov rcx, 0 -- Set index to 0.
---
> | mov rcx, rsi -- Init index to second-parameter (size).
8,11c8,12
< | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
< | jg >2 -- Go to branch '2' if equals.
< | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
< | mov al, [rdi + rcx+1]
---
> | cmp rcx, 1 -- If index is less or equal to 1.
> | jle >2 -- Go to branch '2'.
> | mov r8, rsi -- Set size to r8.
> | sub r8, rcx -- Decrease r8 by index.
> | mov ax, [rdi + r8] -- Fetch 16-bit word from data + r8.
13c14
< | add rcx, 2 -- Increase index by 2.
---
> | sub rcx, 2 -- Decrease index by 2.
14a16
> -- Handle last byte of data.
15a18,24
> | cmp rcx, 1 -- If rcx was equals to 1.
> | jne >3
> | mov rcx, rsi -- Set rcx to rsi-1.
> | sub rcx, 1
> | mov ax, [rdi + rcx] -- Read last byte of data.
> | add r9, rax -- And add it to r9.
> | 3:
20d28
< | cmp r9, 0 -- Check whether there's carry.

answered Jan 18 at 9:02

Diego Pino

29329

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185354%2fchecksum-computation-in-x86-64%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

You can remove the cmp r9,0 instruction in the 4 block, as the shr instruction will set the z flag if the new register value is 0.

Details of this technique are discussed in the RFC.

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

Thanks a lot for the suggestions. I updated the question with a diff of the changes implemented.
â€“Â Diego Pino
Jan 18 at 8:25

add a commentÂ |Â

up vote
2
down vote

accepted

You can remove the cmp r9,0 instruction in the 4 block, as the shr instruction will set the z flag if the new register value is 0.

Details of this technique are discussed in the RFC.

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

Thanks a lot for the suggestions. I updated the question with a diff of the changes implemented.
â€“Â Diego Pino
Jan 18 at 8:25

add a commentÂ |Â

up vote
2
down vote

accepted

You can remove the cmp r9,0 instruction in the 4 block, as the shr instruction will set the z flag if the new register value is 0.

Details of this technique are discussed in the RFC.

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

You can remove the cmp r9,0 instruction in the 4 block, as the shr instruction will set the z flag if the new register value is 0.

Details of this technique are discussed in the RFC.

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

answered Jan 18 at 1:28

1201ProgramAlarm

2,5852618

Thanks a lot for the suggestions. I updated the question with a diff of the changes implemented.
â€“Â Diego Pino
Jan 18 at 8:25

add a commentÂ |Â

Thanks a lot for the suggestions. I updated the question with a diff of the changes implemented.
â€“Â Diego Pino
Jan 18 at 8:25

Thanks a lot for the suggestions. I updated the question with a diff of the changes implemented.
â€“Â Diego Pino
Jan 18 at 8:25

add a commentÂ |Â

up vote
0
down vote

6c6
< | mov rcx, 0 -- Set index to 0.
---
> | mov rcx, rsi -- Init index to second-parameter (size).
8,11c8,12
< | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
< | jg >2 -- Go to branch '2' if equals.
< | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
< | mov al, [rdi + rcx+1]
---
> | cmp rcx, 1 -- If index is less or equal to 1.
> | jle >2 -- Go to branch '2'.
> | mov r8, rsi -- Set size to r8.
> | sub r8, rcx -- Decrease r8 by index.
> | mov ax, [rdi + r8] -- Fetch 16-bit word from data + r8.
13c14
< | add rcx, 2 -- Increase index by 2.
---
> | sub rcx, 2 -- Decrease index by 2.
14a16
> -- Handle last byte of data.
15a18,24
> | cmp rcx, 1 -- If rcx was equals to 1.
> | jne >3
> | mov rcx, rsi -- Set rcx to rsi-1.
> | sub rcx, 1
> | mov ax, [rdi + rcx] -- Read last byte of data.
> | add r9, rax -- And add it to r9.
> | 3:
20d28
< | cmp r9, 0 -- Check whether there's carry.

answered Jan 18 at 9:02

Diego Pino

29329

add a commentÂ |Â

up vote
0
down vote

6c6
< | mov rcx, 0 -- Set index to 0.
---
> | mov rcx, rsi -- Init index to second-parameter (size).
8,11c8,12
< | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
< | jg >2 -- Go to branch '2' if equals.
< | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
< | mov al, [rdi + rcx+1]
---
> | cmp rcx, 1 -- If index is less or equal to 1.
> | jle >2 -- Go to branch '2'.
> | mov r8, rsi -- Set size to r8.
> | sub r8, rcx -- Decrease r8 by index.
> | mov ax, [rdi + r8] -- Fetch 16-bit word from data + r8.
13c14
< | add rcx, 2 -- Increase index by 2.
---
> | sub rcx, 2 -- Decrease index by 2.
14a16
> -- Handle last byte of data.
15a18,24
> | cmp rcx, 1 -- If rcx was equals to 1.
> | jne >3
> | mov rcx, rsi -- Set rcx to rsi-1.
> | sub rcx, 1
> | mov ax, [rdi + rcx] -- Read last byte of data.
> | add r9, rax -- And add it to r9.
> | 3:
20d28
< | cmp r9, 0 -- Check whether there's carry.

answered Jan 18 at 9:02

Diego Pino

29329

add a commentÂ |Â

up vote
0
down vote

6c6
< | mov rcx, 0 -- Set index to 0.
---
> | mov rcx, rsi -- Init index to second-parameter (size).
8,11c8,12
< | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
< | jg >2 -- Go to branch '2' if equals.
< | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
< | mov al, [rdi + rcx+1]
---
> | cmp rcx, 1 -- If index is less or equal to 1.
> | jle >2 -- Go to branch '2'.
> | mov r8, rsi -- Set size to r8.
> | sub r8, rcx -- Decrease r8 by index.
> | mov ax, [rdi + r8] -- Fetch 16-bit word from data + r8.
13c14
< | add rcx, 2 -- Increase index by 2.
---
> | sub rcx, 2 -- Decrease index by 2.
14a16
> -- Handle last byte of data.
15a18,24
> | cmp rcx, 1 -- If rcx was equals to 1.
> | jne >3
> | mov rcx, rsi -- Set rcx to rsi-1.
> | sub rcx, 1
> | mov ax, [rdi + rcx] -- Read last byte of data.
> | add r9, rax -- And add it to r9.
> | 3:
20d28
< | cmp r9, 0 -- Check whether there's carry.

answered Jan 18 at 9:02

Diego Pino

29329

6c6
< | mov rcx, 0 -- Set index to 0.
---
> | mov rcx, rsi -- Init index to second-parameter (size).
8,11c8,12
< | cmp rcx, rsi -- Rsi stores size arg. Compare index to size.
< | jg >2 -- Go to branch '2' if equals.
< | mov ah, [rdi + rcx] -- Read 16-bit of data and swapping byte order.
< | mov al, [rdi + rcx+1]
---
> | cmp rcx, 1 -- If index is less or equal to 1.
> | jle >2 -- Go to branch '2'.
> | mov r8, rsi -- Set size to r8.
> | sub r8, rcx -- Decrease r8 by index.
> | mov ax, [rdi + r8] -- Fetch 16-bit word from data + r8.
13c14
< | add rcx, 2 -- Increase index by 2.
---
> | sub rcx, 2 -- Decrease index by 2.
14a16
> -- Handle last byte of data.
15a18,24
> | cmp rcx, 1 -- If rcx was equals to 1.
> | jne >3
> | mov rcx, rsi -- Set rcx to rsi-1.
> | sub rcx, 1
> | mov ax, [rdi + rcx] -- Read last byte of data.
> | add r9, rax -- And add it to r9.
> | 3:
20d28
< | cmp r9, 0 -- Check whether there's carry.

answered Jan 18 at 9:02

Diego Pino

29329

answered Jan 18 at 9:02

Diego Pino

29329

answered Jan 18 at 9:02

Diego Pino

29329

answered Jan 18 at 9:02

Diego Pino

29329

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr