string length in x64 assembly (fasm)




















Please critique this very, very basic routine which returns the length of a given char buffer or "string."



strlen: ; NOTE: RDI IS THE DEFAULT SRC FOR SCASB
push rdi
push rcx
xor rcx, rcx
mov rcx, -1

xor al, al
cld

repne scasb
neg rcx
sub rcx, 1
mov rax, rcx
pop rcx
pop rdi
ret























  • It would be helpful to indicate whether you're coding to the Sys V ABI or the Microsoft ABI for AMD64.
    – Jonathon Reinhart
    Jul 31 at 17:41











  • Don't use xor al, al. In general, avoid partial register updates like that.
    – phuclv
    Aug 1 at 2:08
























edited Jul 31 at 17:20 by 200_success
asked Jul 31 at 14:51 by the_endian

















1 Answer






























Saving rcx is usually not necessary, since it is not callee-saved in the common calling conventions. On Linux (and similar platforms) rdi does not need to be saved either; I guess you save it because the Win64 calling convention does not pass an argument in rdi. You can save them anyway if you want, which can be useful if you're using a custom calling convention. Saving an even number of registers leaves the stack not 16-byte aligned, though. You will probably get away with that here, but if you call a function that uses XMM registers, for example, it may save them at locations that it assumes are aligned (and there are some other cases where it causes trouble).
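To illustrate the alignment remark, here is a minimal sketch of a routine that does make a call. It assumes the System V AMD64 ABI, where rsp must be a multiple of 16 at each call instruction; outer and helper are hypothetical names, not taken from the question. Win64 has the same alignment rule but additionally requires 32 bytes of shadow space, which this sketch omits.

```asm
use64
; Hypothetical example, assuming the System V AMD64 ABI.
outer:                  ; on entry rsp mod 16 = 8: the caller's call pushed 8 bytes
        push rdi        ; rsp mod 16 = 0
        push rcx        ; rsp mod 16 = 8, two pushes bring the misalignment back
        sub rsp, 8      ; rsp mod 16 = 0, realign before calling
        call helper
        add rsp, 8      ; undo the padding
        pop rcx
        pop rdi
        ret

helper:                 ; stand-in for any function that may use aligned stack slots
        ret
```

With an odd number of pushes the sub/add pair would not be needed, because the pushes themselves would restore 16-byte alignment.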



xor rcx, rcx
mov rcx, -1


The xor is not useful: rcx does not need to be zeroed before it is overwritten, neither for correctness nor for performance, since simply mov-ing into a 64-bit (or 32-bit) register already has no dependency on the previous value. By the way, when you do want to zero a 64-bit register, you can use a 32-bit xor, since writing to the low 32 bits of a register zeroes the upper half of the 64-bit register. There is not really an immediate performance difference, but the 32-bit version often saves the REX prefix, unless of course one of the "numbered registers" (r8 to r15) is an operand.
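The size difference can be seen in the machine code. The bytes in the comments below are one valid encoding of each instruction, worked out by hand; an assembler may emit the equivalent 33 /r form of xor instead, which has the same length.

```asm
use64
        xor rax, rax    ; 48 31 C0   3 bytes, REX.W prefix needed
        xor eax, eax    ; 31 C0      2 bytes, also clears bits 63..32 of rax
        xor r8, r8      ; 4D 31 C0   3 bytes
        xor r8d, r8d    ; 45 31 C0   3 bytes, a numbered register needs REX either way
```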



Because -x - 1 = ~x + 1 - 1 = ~x (using the two's complement identity -x = ~x + 1), and because the flags set by the sub are not used,



neg rcx 
sub rcx, 1
mov rax, rcx


is equivalent to:



not rcx
mov rax, rcx
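A worked instance of the identity, assuming the string has n non-zero bytes before its terminator and that repne scasb decrements rcx once per byte compared, the terminator included:

```latex
\begin{align*}
\text{rcx after } \texttt{repne scasb} &= -1 - (n + 1) = -(n + 2) \\
\texttt{neg rcx; sub rcx, 1} &: \quad (n + 2) - 1 = n + 1 \\
\texttt{not rcx} &: \quad \sim\!\bigl(-(n + 2)\bigr) = (n + 2) - 1 = n + 1
\end{align*}
```

Both sequences agree, and both yield n + 1, a count that includes the terminating zero byte. If the length in the sense of C's strlen is wanted (an assumption about intent; the question does not say), one more decrement is needed.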


Putting it all together, the function could be simplified slightly to the following (assuming saving rdi and rcx is useful):



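strlen:
push rdi            ; saved for the caller's benefit, see above
push rcx
mov rcx, -1         ; largest possible count, so only the terminator stops the scan
xor eax, eax        ; al = 0 is the byte to search for
repne scasb         ; compare al with [rdi] and decrement rcx until a match
not rcx             ; same result as neg rcx / sub rcx, 1
mov rax, rcx        ; return value
pop rcx
pop rdi
ret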
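If the saves turn out not to be needed, the routine shrinks further. The following is only a sketch under assumptions the question does not state: the System V AMD64 ABI (pointer in rdi, result in rax, rcx and rdi caller-saved, direction flag clear on entry) and a result that excludes the terminator, as C's strlen does. The name strlen_sysv is hypothetical.

```asm
use64
; Hypothetical variant, assuming the System V AMD64 ABI.
strlen_sysv:
        xor eax, eax        ; al = 0 is the byte to search for, rax = 0
        lea rcx, [rax-1]    ; rcx = -1, valid because rax is zero here
        repne scasb         ; rcx drops by 1 per byte compared, terminator included
        not rcx             ; rcx = number of bytes compared = length + 1
        lea rax, [rcx-1]    ; drop the terminator from the count
        ret
```

For an empty string one byte is compared, so rcx ends at -2, not rcx gives 1, and the routine returns 0.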
























  • How do you feel about xor ecx, ecx ; dec rcx (5 bytes) instead of mov rcx, -1 (7 bytes)? Or even lea rcx, -1[rax] (4 bytes)? But more importantly: comments. When it comes to asm, I'm a big fan of lots of comments. In particular, if registers are being saved for custom calling reasons (or whatever), you'd certainly want some comments saying so.
    – David Wohlferd
    Jul 31 at 23:25











  • xor r32, r32 should be used even for the high-numbered registers, since xor r64, r64 is not recognized as a zeroing idiom on KNL. @DavidWohlferd see "Set all bits in CPU register to 1 efficiently".
    – phuclv
    Aug 1 at 2:05










  • @phuclv That link seems to like my lea rcx, -1[rax] solution, since we already have a zeroed register we can use (rax).
    – David Wohlferd
    Aug 1 at 2:53
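The byte counts quoted in the comments can be checked against the encodings. The bytes below were worked out by hand and are one valid encoding of each instruction; the lea is written in fasm's [rax-1] spelling rather than -1[rax].

```asm
use64
        mov rcx, -1         ; 48 C7 C1 FF FF FF FF   7 bytes
        xor ecx, ecx        ; 31 C9                  2 bytes
        dec rcx             ; 48 FF C9               3 bytes, 5 together with the xor
        lea rcx, [rax-1]    ; 48 8D 48 FF            4 bytes, gives -1 only while rax = 0
```

The lea form depends on rax having been zeroed first, so in the routine above it would have to come after the xor eax, eax.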
















































answered Jul 31 at 17:11 by harold
edited Jul 31 at 17:18






