Euler's Totient Function in x86 assembly (MASM)

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
25
down vote

favorite
4












My task was to implement Euler's totient function, and I'm looking for any and all criticism.



.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx

mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;

; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime

;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:

; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx

pop edx
pop ecx
pop ebx
pop ebp
ret 4

main:
push 100
call _phi@4
INVOKE ExitProcess, 0

END main


Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?







share|improve this question

















  • 2




    How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
    – David Wohlferd
    Apr 5 at 8:59






  • 3




    To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
    – John Coleman
    Apr 5 at 12:16






  • 3




    you need to define efficient so that we know if you're talking about size or speed
    – Thomas
    Apr 5 at 12:49






  • 4




    fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
    – phuclv
    Apr 5 at 16:05










  • In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
    – Jules
    Apr 5 at 17:32

















up vote
25
down vote

favorite
4












My task was to implement Euler's totient function, and I'm looking for any and all criticism.



.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx

mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;

; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime

;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:

; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx

pop edx
pop ecx
pop ebx
pop ebp
ret 4

main:
push 100
call _phi@4
INVOKE ExitProcess, 0

END main


Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?







share|improve this question

















  • 2




    How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
    – David Wohlferd
    Apr 5 at 8:59






  • 3




    To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
    – John Coleman
    Apr 5 at 12:16






  • 3




    you need to define efficient so that we know if you're talking about size or speed
    – Thomas
    Apr 5 at 12:49






  • 4




    fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
    – phuclv
    Apr 5 at 16:05










  • In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
    – Jules
    Apr 5 at 17:32













up vote
25
down vote

favorite
4









up vote
25
down vote

favorite
4






4





My task was to implement Euler's totient function, and I'm looking for any and all criticism.



.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx

mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;

; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime

;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:

; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx

pop edx
pop ecx
pop ebx
pop ebp
ret 4

main:
push 100
call _phi@4
INVOKE ExitProcess, 0

END main


Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?







share|improve this question













My task was to implement Euler's totient function, and I'm looking for any and all criticism.



.386 
.model flat, stdcall
.stack 4096

ExitProcess PROTO, dwExitCode:DWORD

.code
;------------------------------------------------------------------------------
; _phi@4 (int x)
;
; Calculates Euler's totient function using the product formula.
;
; param:
x TEXTEQU <[ebp+8]> ;integer argument
;
; Exit: The result is stored in eax
;------------------------------------------------------------------------------
_phi@4:
push ebp ; setup stack, save regs
mov ebp, esp
push ebx
push ecx
push edx

mov ecx, x ; init result as x

; Consider the prime factors of x and subtract their multiples
; from the result
mov ebx, 2 ; initialize loop counter p
L3: mov eax, ebx
mul ebx
cmp DWORD PTR x, eax ; loop if( x >= p * p)
jnge endL3;

; Check if p is a prime factor
cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne noprime

;if yes, update x and result
L2: cdq
mov eax, x
div ebx
cmp edx, 0 ; if(x % p == 0)
jne endL2
mov x , eax ; x /= p
jmp L2
endL2:
cdq
mov eax, ecx
div ebx
sub ecx, eax ;result -= result / p
noprime:
inc ebx
jmp L3
endL3:

; If x has a prime factor greater than sqrt(x)
cmp DWORD PTR x, 1
jle finish
cdq
mov eax, ecx
div DWORD PTR x
sub ecx, eax
finish:
mov eax, ecx

pop edx
pop ecx
pop ebx
pop ebp
ret 4

main:
push 100
call _phi@4
INVOKE ExitProcess, 0

END main


Is there any way I can make the code more efficient? - possibly with fewer instructions? Also, I'm looking for pointers on style. Is the code readable?









share|improve this question












share|improve this question




share|improve this question








edited Apr 5 at 18:57
























asked Apr 5 at 3:22









William

28615




28615







  • 2




    How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
    – David Wohlferd
    Apr 5 at 8:59






  • 3




    To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
    – John Coleman
    Apr 5 at 12:16






  • 3




    you need to define efficient so that we know if you're talking about size or speed
    – Thomas
    Apr 5 at 12:49






  • 4




    fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
    – phuclv
    Apr 5 at 16:05










  • In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
    – Jules
    Apr 5 at 17:32













  • 2




    How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
    – David Wohlferd
    Apr 5 at 8:59






  • 3




    To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
    – John Coleman
    Apr 5 at 12:16






  • 3




    you need to define efficient so that we know if you're talking about size or speed
    – Thomas
    Apr 5 at 12:49






  • 4




    fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
    – phuclv
    Apr 5 at 16:05










  • In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
    – Jules
    Apr 5 at 17:32








2




2




How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
– David Wohlferd
Apr 5 at 8:59




How about moving x into a register (say esi) rather than repeatedly read/writing memory? You could read it once on entry, and write it once at exit.
– David Wohlferd
Apr 5 at 8:59




3




3




To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
– John Coleman
Apr 5 at 12:16




To make the code more efficient would require more rather than fewer instructions. You seem to be using one of the crudest possible brute-force factoring algorithms. A better factoring algorithm would make it quicker but much harder to write in assembly. Factoring itself is somewhat unavoidable since computing the totient of a semiprime (a product of two distinct primes) is provably equivalent to factoring it.
– John Coleman
Apr 5 at 12:16




3




3




you need to define efficient so that we know if you're talking about size or speed
– Thomas
Apr 5 at 12:49




you need to define efficient so that we know if you're talking about size or speed
– Thomas
Apr 5 at 12:49




4




4




fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
– phuclv
Apr 5 at 16:05




fewer instruction doesn't mean faster execution. For example unrolling the loop will make it faster as long as it still fits in cache
– phuclv
Apr 5 at 16:05












In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
– Jules
Apr 5 at 17:32





In terms of eliminating unnecessary instructions: you're saving the caller's EBP and then setting up a stack frame pointer using the register, but you don't then have any real need to use that frame pointer at all, so you could just as easily leave the caller's EBP alone for the entire function. You'd have to rewrite your 'x' equ to be relative to ESP rather than EBP, but as your stack usage is constant throughout the function that's an easy job.
– Jules
Apr 5 at 17:32











2 Answers
2






active

oldest

votes

















up vote
17
down vote













You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.



In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.



Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.






share|improve this answer



















  • 1




    test vs cmp
    – phuclv
    Apr 5 at 16:05










  • @LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
    – 1201ProgramAlarm
    Apr 5 at 19:55

















up vote
3
down vote













I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.



If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.






share|improve this answer

















  • 2




    I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
    – John Coleman
    Apr 6 at 13:11










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191301%2feulers-totient-function-in-x86-assembly-masm%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
17
down vote













You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.



In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.



Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.






share|improve this answer



















  • 1




    test vs cmp
    – phuclv
    Apr 5 at 16:05










  • @LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
    – 1201ProgramAlarm
    Apr 5 at 19:55














up vote
17
down vote













You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.



In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.



Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.






share|improve this answer



















  • 1




    test vs cmp
    – phuclv
    Apr 5 at 16:05










  • @LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
    – 1201ProgramAlarm
    Apr 5 at 19:55












up vote
17
down vote










up vote
17
down vote









You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.



In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.



Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.






share|improve this answer















You're not using cdq properly. cdq sign extends EAX into EDX usually in preparation for an idiv (signed division) instruction. However, you're using div (unsigned division), so instead of using cdq (which can potentially fill EDX with 0xFFFFFFFF), you should zero out that register instead. If you were doing an idiv, though, the cdq is in the wrong place since it needs to come after loading data into EAX. And there are a couple of places where you already know that EDX will be 0 so you don't need the instruction at all.



In place of cmp edx,0, you can use test edx,edx. It sets the zero and sign flags the same way but does so with a smaller instruction.



Your L2 loop can be shortened a bit. If EDX is zero (no remainder after the division), you already know that EDX is zero, so you don't need to zero it out, and that x is already in EAX. So you can loop back to the div instruction.







share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 5 at 19:54


























answered Apr 5 at 3:50









1201ProgramAlarm

2,4752618




2,4752618







  • 1




    test vs cmp
    – phuclv
    Apr 5 at 16:05










  • @LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
    – 1201ProgramAlarm
    Apr 5 at 19:55












  • 1




    test vs cmp
    – phuclv
    Apr 5 at 16:05










  • @LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
    – 1201ProgramAlarm
    Apr 5 at 19:55







1




1




test vs cmp
– phuclv
Apr 5 at 16:05




test vs cmp
– phuclv
Apr 5 at 16:05












@LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
– 1201ProgramAlarm
Apr 5 at 19:55




@LÆ°uVÄ©nhPhúc I've incorporated that link into my answer.
– 1201ProgramAlarm
Apr 5 at 19:55












up vote
3
down vote













I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.



If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.






share|improve this answer

















  • 2




    I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
    – John Coleman
    Apr 6 at 13:11














up vote
3
down vote













I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.



If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.






share|improve this answer

















  • 2




    I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
    – John Coleman
    Apr 6 at 13:11












up vote
3
down vote










up vote
3
down vote









I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.



If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.






share|improve this answer













I don't see any reason to write this function in assembler. The effect is currently that you prevent the compiler from optimizing anything about the code since you are dictating every single machine instruction.



If you wrote the code in a high-level language, the compiler would be allowed to apply several optimizations that you probably never heard of. That would be my first try. Rewrite the code in C, C++, Go or a similar language and look at the generated machine code, at different optimization levels. Also try different compilers, such as GCC, Clang, Intel C++.







share|improve this answer













share|improve this answer



share|improve this answer











answered Apr 5 at 23:09









Roland Illig

10.4k11543




10.4k11543







  • 2




    I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
    – John Coleman
    Apr 6 at 13:11












  • 2




    I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
    – John Coleman
    Apr 6 at 13:11







2




2




I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
– John Coleman
Apr 6 at 13:11




I think that this answer is especially relevant since using a better factorization algorithm could be done in a minutes in C but might take many hours of work in hand-written assembly. On the other hand, this might be a project for a class in assembly language programming -- but even there looking at generated assembly might not be a bad idea.
– John Coleman
Apr 6 at 13:11












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191301%2feulers-totient-function-in-x86-assembly-masm%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Chat program with C++ and SFML

Function to Return a JSON Like Objects Using VBA Collections and Arrays

Will my employers contract hold up in court?