Euler's Totient Function in x86 assembly (MASM)

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
25
down vote

favorite

My task was to implement Euler's totient function, and I'm looking for any and all criticism.

.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD 

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
 x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
 push ebp ; setup stack, save regs
 mov ebp, esp
 push ebx
 push ecx
 push edx

 mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
 mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
 mul ebx 
 cmp DWORD PTR x, eax ; loop if( x >= p * p)
 jnge endL3;

; Check if p is a prime factor
 cdq
 mov eax, x
 div ebx 
 cmp edx, 0 ; if(x % p == 0)
 jne noprime

;if yes, update x and result 
L2: cdq 
 mov eax, x
 div ebx
 cmp edx, 0 ; if(x % p == 0)
 jne endL2
 mov x , eax ; x /= p 
 jmp L2
endL2:
 cdq
 mov eax, ecx
 div ebx
 sub ecx, eax ;result -= result / p
noprime: 
 inc ebx 
 jmp L3
endL3: 

; If x has a prime factor greater than sqrt(x)
 cmp DWORD PTR x, 1
 jle finish
 cdq
 mov eax, ecx
 div DWORD PTR x
 sub ecx, eax
finish:
 mov eax, ecx

 pop edx
 pop ecx
 pop ebx
 pop ebp
 ret 4

main:
 push 100
 call _phi@4
 INVOKE ExitProcess, 0

 END main

Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?

edited Apr 5 at 18:57

asked Apr 5 at 3:22

William

28615

2

How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â€“Â David Wohlferd
Apr 5 at 8:59

3

To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â€“Â John Coleman
Apr 5 at 12:16

3

you need to define efficient so that we know if you're talking about size or speed
â€“Â Thomas
Apr 5 at 12:49

4

fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â€“Â phuclv
Apr 5 at 16:05

In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â€“Â Jules
Apr 5 at 17:32

add a commentÂ |Â

up vote
25
down vote

favorite

My task was to implement Euler's totient function, and I'm looking for any and all criticism.

.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD 

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
 x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
 push ebp ; setup stack, save regs
 mov ebp, esp
 push ebx
 push ecx
 push edx

 mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
 mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
 mul ebx 
 cmp DWORD PTR x, eax ; loop if( x >= p * p)
 jnge endL3;

; Check if p is a prime factor
 cdq
 mov eax, x
 div ebx 
 cmp edx, 0 ; if(x % p == 0)
 jne noprime

;if yes, update x and result 
L2: cdq 
 mov eax, x
 div ebx
 cmp edx, 0 ; if(x % p == 0)
 jne endL2
 mov x , eax ; x /= p 
 jmp L2
endL2:
 cdq
 mov eax, ecx
 div ebx
 sub ecx, eax ;result -= result / p
noprime: 
 inc ebx 
 jmp L3
endL3: 

; If x has a prime factor greater than sqrt(x)
 cmp DWORD PTR x, 1
 jle finish
 cdq
 mov eax, ecx
 div DWORD PTR x
 sub ecx, eax
finish:
 mov eax, ecx

 pop edx
 pop ecx
 pop ebx
 pop ebp
 ret 4

main:
 push 100
 call _phi@4
 INVOKE ExitProcess, 0

 END main

Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?

edited Apr 5 at 18:57

asked Apr 5 at 3:22

William

28615

2

How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â€“Â David Wohlferd
Apr 5 at 8:59

3

To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â€“Â John Coleman
Apr 5 at 12:16

3

you need to define efficient so that we know if you're talking about size or speed
â€“Â Thomas
Apr 5 at 12:49

4

fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â€“Â phuclv
Apr 5 at 16:05

In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â€“Â Jules
Apr 5 at 17:32

add a commentÂ |Â

up vote
25
down vote

favorite

My task was to implement Euler's totient function, and I'm looking for any and all criticism.

.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD 

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
 x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
 push ebp ; setup stack, save regs
 mov ebp, esp
 push ebx
 push ecx
 push edx

 mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
 mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
 mul ebx 
 cmp DWORD PTR x, eax ; loop if( x >= p * p)
 jnge endL3;

; Check if p is a prime factor
 cdq
 mov eax, x
 div ebx 
 cmp edx, 0 ; if(x % p == 0)
 jne noprime

;if yes, update x and result 
L2: cdq 
 mov eax, x
 div ebx
 cmp edx, 0 ; if(x % p == 0)
 jne endL2
 mov x , eax ; x /= p 
 jmp L2
endL2:
 cdq
 mov eax, ecx
 div ebx
 sub ecx, eax ;result -= result / p
noprime: 
 inc ebx 
 jmp L3
endL3: 

; If x has a prime factor greater than sqrt(x)
 cmp DWORD PTR x, 1
 jle finish
 cdq
 mov eax, ecx
 div DWORD PTR x
 sub ecx, eax
finish:
 mov eax, ecx

 pop edx
 pop ecx
 pop ebx
 pop ebp
 ret 4

main:
 push 100
 call _phi@4
 INVOKE ExitProcess, 0

 END main

Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?

edited Apr 5 at 18:57

asked Apr 5 at 3:22

William

28615

My task was to implement Euler's totient function, and I'm looking for any and all criticism.

.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD 

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
 x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
 push ebp ; setup stack, save regs
 mov ebp, esp
 push ebx
 push ecx
 push edx

 mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
 mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
 mul ebx 
 cmp DWORD PTR x, eax ; loop if( x >= p * p)
 jnge endL3;

; Check if p is a prime factor
 cdq
 mov eax, x
 div ebx 
 cmp edx, 0 ; if(x % p == 0)
 jne noprime

;if yes, update x and result 
L2: cdq 
 mov eax, x
 div ebx
 cmp edx, 0 ; if(x % p == 0)
 jne endL2
 mov x , eax ; x /= p 
 jmp L2
endL2:
 cdq
 mov eax, ecx
 div ebx
 sub ecx, eax ;result -= result / p
noprime: 
 inc ebx 
 jmp L3
endL3: 

; If x has a prime factor greater than sqrt(x)
 cmp DWORD PTR x, 1
 jle finish
 cdq
 mov eax, ecx
 div DWORD PTR x
 sub ecx, eax
finish:
 mov eax, ecx

 pop edx
 pop ecx
 pop ebx
 pop ebp
 ret 4

main:
 push 100
 call _phi@4
 INVOKE ExitProcess, 0

 END main

Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?

edited Apr 5 at 18:57

asked Apr 5 at 3:22

William

28615

edited Apr 5 at 18:57

asked Apr 5 at 3:22

William

28615

asked Apr 5 at 3:22

William

28615

asked Apr 5 at 3:22

William

28615

2

How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â€“Â David Wohlferd
Apr 5 at 8:59

3

To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â€“Â John Coleman
Apr 5 at 12:16

3

you need to define efficient so that we know if you're talking about size or speed
â€“Â Thomas
Apr 5 at 12:49

4

fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â€“Â phuclv
Apr 5 at 16:05

In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â€“Â Jules
Apr 5 at 17:32

add a commentÂ |Â

2

How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â€“Â David Wohlferd
Apr 5 at 8:59

3

To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â€“Â John Coleman
Apr 5 at 12:16

3

you need to define efficient so that we know if you're talking about size or speed
â€“Â Thomas
Apr 5 at 12:49

4

fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â€“Â phuclv
Apr 5 at 16:05

In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â€“Â Jules
Apr 5 at 17:32

How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â€“Â David Wohlferd
Apr 5 at 8:59

To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â€“Â John Coleman
Apr 5 at 12:16

you need to define efficient so that we know if you're talking about size or speed
â€“Â Thomas
Apr 5 at 12:49

fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â€“Â phuclv
Apr 5 at 16:05

In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â€“Â Jules
Apr 5 at 17:32

add a commentÂ |Â

2 Answers
2

active

oldest

votes

up vote
17
down vote

You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.

edited Apr 5 at 19:54

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

1

test vs cmp
â€“Â phuclv
Apr 5 at 16:05

@LÃ†Â°uVÃ„Â©nhPhÃºc I've incorporated that link into my answer.
â€“Â 1201ProgramAlarm
Apr 5 at 19:55

add a commentÂ |Â

up vote
3
down vote

I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.

If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.

answered Apr 5 at 23:09

Roland Illig

10.4k11543

2

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â€“Â John Coleman
Apr 6 at 13:11

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191301%2feulers-totient-function-in-x86-assembly-masm%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
17
down vote

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

edited Apr 5 at 19:54

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

1

test vs cmp
â€“Â phuclv
Apr 5 at 16:05

@LÃ†Â°uVÃ„Â©nhPhÃºc I've incorporated that link into my answer.
â€“Â 1201ProgramAlarm
Apr 5 at 19:55

add a commentÂ |Â

up vote
17
down vote

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

edited Apr 5 at 19:54

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

1

test vs cmp
â€“Â phuclv
Apr 5 at 16:05

@LÃ†Â°uVÃ„Â©nhPhÃºc I've incorporated that link into my answer.
â€“Â 1201ProgramAlarm
Apr 5 at 19:55

add a commentÂ |Â

up vote
17
down vote

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

edited Apr 5 at 19:54

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.

edited Apr 5 at 19:54

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

edited Apr 5 at 19:54

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

answered Apr 5 at 3:50

1201ProgramAlarm

2,4752618

1

test vs cmp
â€“Â phuclv
Apr 5 at 16:05

add a commentÂ |Â

1

test vs cmp
â€“Â phuclv
Apr 5 at 16:05

test vs cmp
â€“Â phuclv
Apr 5 at 16:05

add a commentÂ |Â

up vote
3
down vote

answered Apr 5 at 23:09

Roland Illig

10.4k11543

2

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â€“Â John Coleman
Apr 6 at 13:11

add a commentÂ |Â

up vote
3
down vote

answered Apr 5 at 23:09

Roland Illig

10.4k11543

2

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â€“Â John Coleman
Apr 6 at 13:11

add a commentÂ |Â

up vote
3
down vote

answered Apr 5 at 23:09

Roland Illig

10.4k11543

answered Apr 5 at 23:09

Roland Illig

10.4k11543

answered Apr 5 at 23:09

Roland Illig

10.4k11543

answered Apr 5 at 23:09

Roland Illig

10.4k11543

answered Apr 5 at 23:09

Roland Illig

10.4k11543

2

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â€“Â John Coleman
Apr 6 at 13:11

add a commentÂ |Â

2

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â€“Â John Coleman
Apr 6 at 13:11

I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â€“Â John Coleman
Apr 6 at 13:11

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr