string length in x64 assembly (fasm)

Please critique this very, very basic routine which returns the length of a given char buffer or "string."

strlen: ; NOTE: RDI IS THE DEFAULT SRC FOR SCASB
 push rdi
 push rcx
 xor rcx, rcx
 mov rcx, -1 

 xor al, al
 cld

 repne scasb 
 neg rcx 
 sub rcx, 1 
 mov rax, rcx 
 pop rcx
 pop rdi
 ret

asked Jul 31 at 14:51 by the_endian
edited Jul 31 at 17:20 by 200_success

It would be helpful to indicate whether you're coding to the Sys V ABI or the Microsoft ABI for AMD64.
- Jonathon Reinhart, Jul 31 at 17:41

Don't use xor al, al. In general, avoid partial-register updates like that.
- phuclv, Aug 1 at 2:08

1 Answer

Saving rcx is usually unnecessary: it is not callee-saved in the common calling conventions. On Linux (and similar systems) rdi does not need to be saved either; I guess you are saving it because the Win64 calling convention does not pass an argument in rdi. You can save them anyway if you want, which can be useful with a custom calling convention. Note, though, that saving an even number of registers leaves the stack not 16-byte aligned: the return address already takes 8 bytes, so two more 8-byte pushes leave rsp 8 bytes off a 16-byte boundary. You will probably get away with that here, since this routine calls nothing, but if you call a function that uses XMM registers, it may save them at locations it assumes are aligned (and there are other cases where misalignment causes trouble).
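
The alignment arithmetic can be checked with a small C sketch. It is only a model, not fasm: the helper name and the starting address are hypothetical, and it assumes rsp was 16-byte aligned just before the call instruction, as the common AMD64 calling conventions require.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rsp modulo 16 inside a function after `pushes`
   8-byte pushes, assuming rsp was 16-byte aligned immediately before
   the `call` that entered the function. */
static unsigned rsp_mod16_after_pushes(unsigned pushes)
{
    uint64_t rsp = UINT64_C(0x7fffffffe000); /* arbitrary aligned start */
    rsp -= 8;                                /* return address pushed by call */
    rsp -= UINT64_C(8) * pushes;             /* each push moves rsp down by 8 */
    return (unsigned)(rsp % 16);
}

/* Usage sketch: two pushes (as in the routine) versus one push. */
static void check_alignment(void)
{
    assert(rsp_mod16_after_pushes(2) == 8);  /* misaligned by 8 */
    assert(rsp_mod16_after_pushes(1) == 0);  /* 16-byte aligned */
}
```

With an odd number of pushes, the return address and the saved registers together add up to a multiple of 16, which is why the even count is the problematic one.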

xor rcx, rcx
mov rcx, -1

The xor is not useful: rcx does not need to be zeroed before it is overwritten, and a mov into a 64-bit (or 32-bit) register already has no dependency on the previous value. By the way, when you do want to zero a 64-bit register, you can use a 32-bit xor, since writing to the low 32 bits of a register zeroes the upper half of the 64-bit register. There is no real performance difference, but the 32-bit form often saves the REX prefix, unless of course one of the "numbered registers" (r8 to r15) is an operand.

Because -x - 1 = ~x + 1 - 1 = ~x (using the definition of two's complement, -x = ~x + 1), and because you don't use the flags set by the sub,

neg rcx 
sub rcx, 1 
mov rax, rcx

is equivalent to:

not rcx
mov rax, rcx
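
The identity can be spot-checked in C with unsigned 64-bit arithmetic, which wraps around the same way the registers do. This is a sketch with hypothetical function names, not fasm, and it checks a handful of values only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors `neg rcx` followed by `sub rcx, 1`. */
static uint64_t neg_then_sub1(uint64_t x) { return (0 - x) - 1; }

/* Mirrors `not rcx`. */
static uint64_t bitwise_not(uint64_t x) { return ~x; }

/* Compare both forms on a few values, including the wrap-around edges. */
static void check_identity(void)
{
    const uint64_t samples[] = { 0, 1, 5, UINT64_MAX - 4, UINT64_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(neg_then_sub1(samples[i]) == bitwise_not(samples[i]));
}
```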

So all combined, this function could be simplified slightly to (assuming saving rdi and rcx is useful):

strlen:
 push rdi
 push rcx
 mov rcx, -1
 xor eax, eax
 repne scasb 
 not rcx
 mov rax, rcx 
 pop rcx
 pop rdi
 ret
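
To see what the routine computes, here is a rough C model of it. It is not fasm: the variables rcx, rdi and al merely stand in for the registers, the function names are hypothetical, and the loop follows the documented behaviour of repne scasb (compare, advance, decrement the counter, stop on a match or when the counter reaches zero).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the simplified routine. */
static uint64_t strlen_model(const char *s)
{
    const unsigned char *rdi = (const unsigned char *)s;
    uint64_t rcx = UINT64_MAX;      /* mov rcx, -1 */
    const unsigned char al = 0;     /* xor eax, eax */

    /* repne scasb: compare al with [rdi], advance rdi, decrement rcx,
       and stop once the bytes match or rcx reaches zero. */
    while (rcx != 0) {
        unsigned char byte = *rdi++;
        rcx--;
        if (byte == al)
            break;
    }

    return ~rcx;                    /* not rcx; mov rax, rcx */
}

/* Usage sketch: the model counts the zero byte as well. */
static void check_strlen_model(void)
{
    assert(strlen_model("abc") == 4);  /* 3 characters + the zero byte */
    assert(strlen_model("") == 1);     /* only the zero byte */
}
```

In this model the result is the number of bytes scanned, terminator included, so it is one larger than what C's strlen would return. If the length without the terminator is wanted, one more decrement of the result would be needed. That observation comes from the model, not from the answer above.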

answered Jul 31 at 17:11 by harold
edited Jul 31 at 17:18

How do you feel about xor ecx, ecx ; dec rcx (5 bytes) instead of mov rcx, -1 (7 bytes)? Or even lea rcx, -1[rax] (4 bytes)? But more importantly: comments. When it comes to asm, I'm a big fan of lots of comments. In particular, if registers are being saved for custom calling reasons (or whatever), you'd certainly want some comments saying so.
- David Wohlferd, Jul 31 at 23:25

xor r32, r32 should be used even for the high-numbered registers, since xor r64, r64 is not recognized in KNL. @DavidWohlferd see "Set all bits in CPU register to 1 efficiently".
- phuclv, Aug 1 at 2:05

@phuclv That link seems to like my lea rcx, -1[rax] solution, since we already have a zeroed register we can use (rax).
- David Wohlferd, Aug 1 at 2:53
