Euler's Totient Function in x86 assembly (MASM)
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
25
down vote
favorite
My task was to implement Euler's totient function, and I'm looking for any and all criticism.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD
.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx
mov ecx, x ; init result as x
; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;
; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime
;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:
; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx
pop edx
pop ecx
pop ebx
pop ebp
ret 4
main:
push 100
call _phi@4
INVOKE ExitProcess, 0
END main
Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?
beginner algorithm primes assembly
add a comment |Â
up vote
25
down vote
favorite
My task was to implement Euler's totient function, and I'm looking for any and all criticism.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD
.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx
mov ecx, x ; init result as x
; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;
; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime
;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:
; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx
pop edx
pop ecx
pop ebx
pop ebp
ret 4
main:
push 100
call _phi@4
INVOKE ExitProcess, 0
END main
Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?
beginner algorithm primes assembly
2
How about movingx
into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â David Wohlferd
Apr 5 at 8:59
3
To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â John Coleman
Apr 5 at 12:16
3
you need to define efficient so that we know if you're talking about size or speed
â Thomas
Apr 5 at 12:49
4
fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â phuclv
Apr 5 at 16:05
In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â Jules
Apr 5 at 17:32
add a comment |Â
up vote
25
down vote
favorite
up vote
25
down vote
favorite
My task was to implement Euler's totient function, and I'm looking for any and all criticism.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD
.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx
mov ecx, x ; init result as x
; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;
; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime
;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:
; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx
pop edx
pop ecx
pop ebx
pop ebp
ret 4
main:
push 100
call _phi@4
INVOKE ExitProcess, 0
END main
Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?
beginner algorithm primes assembly
My task was to implement Euler's totient function, and I'm looking for any and all criticism.
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD
.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx
mov ecx, x ; init result as x
; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;
; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime
;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:
; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx
pop edx
pop ecx
pop ebx
pop ebp
ret 4
main:
push 100
call _phi@4
INVOKE ExitProcess, 0
END main
Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?
beginner algorithm primes assembly
edited Apr 5 at 18:57
asked Apr 5 at 3:22
William
28615
28615
2
How about movingx
into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â David Wohlferd
Apr 5 at 8:59
3
To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â John Coleman
Apr 5 at 12:16
3
you need to define efficient so that we know if you're talking about size or speed
â Thomas
Apr 5 at 12:49
4
fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â phuclv
Apr 5 at 16:05
In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â Jules
Apr 5 at 17:32
add a comment |Â
2
How about movingx
into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
â David Wohlferd
Apr 5 at 8:59
3
To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â John Coleman
Apr 5 at 12:16
3
you need to define efficient so that we know if you're talking about size or speed
â Thomas
Apr 5 at 12:49
4
fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â phuclv
Apr 5 at 16:05
In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â Jules
Apr 5 at 17:32
2
2
How about moving
x
into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.â David Wohlferd
Apr 5 at 8:59
How about moving
x
into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.â David Wohlferd
Apr 5 at 8:59
3
3
To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â John Coleman
Apr 5 at 12:16
To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â John Coleman
Apr 5 at 12:16
3
3
you need to define efficient so that we know if you're talking about size or speed
â Thomas
Apr 5 at 12:49
you need to define efficient so that we know if you're talking about size or speed
â Thomas
Apr 5 at 12:49
4
4
fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â phuclv
Apr 5 at 16:05
fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â phuclv
Apr 5 at 16:05
In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â Jules
Apr 5 at 17:32
In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â Jules
Apr 5 at 17:32
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
17
down vote
You're not using cdq
properly. cdq
sign extends EAX
into EDX
usually in preparation for an idiv
(signed division) instruction. However, you're using div
(unsigned division), so instead of using cdq
(which can potentially fill EDX
with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv
, though, the cdq
is in the wrong place since it needs to come after loading data into EAX
. And there are a couple of places where you already know that EDX
will be 0 so you don't need the instruction at all.
In place of cmp edx,0
, you can use test edx,edx
. It sets the zero and sign flags the same way but does so with a smaller instruction.
Your L2
loop can be shortened a bit. If EDX
is zero (no remainder after the division), you already know that EDX
is zero, so you don't need to zero it out, and that x
is already in EAX
. So you can loop back to the div
instruction.
1
test vs cmp
â phuclv
Apr 5 at 16:05
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
add a comment |Â
up vote
3
down vote
I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.
If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.
2
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
17
down vote
You're not using cdq
properly. cdq
sign extends EAX
into EDX
usually in preparation for an idiv
(signed division) instruction. However, you're using div
(unsigned division), so instead of using cdq
(which can potentially fill EDX
with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv
, though, the cdq
is in the wrong place since it needs to come after loading data into EAX
. And there are a couple of places where you already know that EDX
will be 0 so you don't need the instruction at all.
In place of cmp edx,0
, you can use test edx,edx
. It sets the zero and sign flags the same way but does so with a smaller instruction.
Your L2
loop can be shortened a bit. If EDX
is zero (no remainder after the division), you already know that EDX
is zero, so you don't need to zero it out, and that x
is already in EAX
. So you can loop back to the div
instruction.
1
test vs cmp
â phuclv
Apr 5 at 16:05
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
add a comment |Â
up vote
17
down vote
You're not using cdq
properly. cdq
sign extends EAX
into EDX
usually in preparation for an idiv
(signed division) instruction. However, you're using div
(unsigned division), so instead of using cdq
(which can potentially fill EDX
with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv
, though, the cdq
is in the wrong place since it needs to come after loading data into EAX
. And there are a couple of places where you already know that EDX
will be 0 so you don't need the instruction at all.
In place of cmp edx,0
, you can use test edx,edx
. It sets the zero and sign flags the same way but does so with a smaller instruction.
Your L2
loop can be shortened a bit. If EDX
is zero (no remainder after the division), you already know that EDX
is zero, so you don't need to zero it out, and that x
is already in EAX
. So you can loop back to the div
instruction.
1
test vs cmp
â phuclv
Apr 5 at 16:05
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
add a comment |Â
up vote
17
down vote
up vote
17
down vote
You're not using cdq
properly. cdq
sign extends EAX
into EDX
usually in preparation for an idiv
(signed division) instruction. However, you're using div
(unsigned division), so instead of using cdq
(which can potentially fill EDX
with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv
, though, the cdq
is in the wrong place since it needs to come after loading data into EAX
. And there are a couple of places where you already know that EDX
will be 0 so you don't need the instruction at all.
In place of cmp edx,0
, you can use test edx,edx
. It sets the zero and sign flags the same way but does so with a smaller instruction.
Your L2
loop can be shortened a bit. If EDX
is zero (no remainder after the division), you already know that EDX
is zero, so you don't need to zero it out, and that x
is already in EAX
. So you can loop back to the div
instruction.
You're not using cdq
properly. cdq
sign extends EAX
into EDX
usually in preparation for an idiv
(signed division) instruction. However, you're using div
(unsigned division), so instead of using cdq
(which can potentially fill EDX
with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv
, though, the cdq
is in the wrong place since it needs to come after loading data into EAX
. And there are a couple of places where you already know that EDX
will be 0 so you don't need the instruction at all.
In place of cmp edx,0
, you can use test edx,edx
. It sets the zero and sign flags the same way but does so with a smaller instruction.
Your L2
loop can be shortened a bit. If EDX
is zero (no remainder after the division), you already know that EDX
is zero, so you don't need to zero it out, and that x
is already in EAX
. So you can loop back to the div
instruction.
edited Apr 5 at 19:54
answered Apr 5 at 3:50
1201ProgramAlarm
2,4752618
2,4752618
1
test vs cmp
â phuclv
Apr 5 at 16:05
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
add a comment |Â
1
test vs cmp
â phuclv
Apr 5 at 16:05
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
1
1
test vs cmp
â phuclv
Apr 5 at 16:05
test vs cmp
â phuclv
Apr 5 at 16:05
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
@LðuVénhPhúc I've incorporated that link into my answer.
â 1201ProgramAlarm
Apr 5 at 19:55
add a comment |Â
up vote
3
down vote
I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.
If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.
2
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
add a comment |Â
up vote
3
down vote
I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.
If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.
2
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
add a comment |Â
up vote
3
down vote
up vote
3
down vote
I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.
If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.
I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.
If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.
answered Apr 5 at 23:09
Roland Illig
10.4k11543
10.4k11543
2
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
add a comment |Â
2
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
2
2
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
â John Coleman
Apr 6 at 13:11
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191301%2feulers-totient-function-in-x86-assembly-masm%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
2
How about moving
x
into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.â David Wohlferd
Apr 5 at 8:59
3
To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
â John Coleman
Apr 5 at 12:16
3
you need to define efficient so that we know if you're talking about size or speed
â Thomas
Apr 5 at 12:49
4
fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
â phuclv
Apr 5 at 16:05
In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
â Jules
Apr 5 at 17:32