Reducing execution time for a Python program

The following is code that I used to do a task. The code runs fine, but it takes over two minutes to execute, and that is with only one rule (Rule_1); the rule is checked using an .exe written in C++. In practice there are many CSV files that need to pass through the rules.

My question is: will the program always take this much time? I have to implement more than 50 rules for the files, so is there any other way around it?



import os
import fnmatch
import subprocess
import xml.etree.ElementTree as ElementTree
from xml.parsers.expat import ExpatError
import sys
from shutil import copyfileobj


def locate(pattern, root="Z:/Automation/"):
    '''Locate all files matching supplied filename pattern in and below
    supplied root directory.'''
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


csv_path_unrefined = []
for xml in locate("*.csv"):
    try:
        ElementTree.parse(xml)
    except (SyntaxError, ExpatError):
        csv_path_unrefined.append(xml)

csv_path = []
for paths in csv_path_unrefined:
    if "results" in str(paths):
        csv_path.append(paths)


def check_rule1(path):
    # path = "PWLLOGGER_DEMO.csv"
    file = 'ConsoleApplication9.exe "' + path + '"'
    # print(file)
    details = os.popen(file).read()
    log_file = open("logs/Rule_1.txt")
    state = log_file.read()
    with open('results/Rule_1_log.log', 'a+') as files:
        files.write("\n========" + path + "========\n")
        files.close
    with open('results/Rule_1_log.log', 'a+') as output, open('logs/Rule_1.txt', 'r') as input:
        copyfileobj(input, output)
    if "failed" in state:
        return False
    else:
        return True


rule_1_passed = []
rule_1_failed = []

for paths in csv_path:
    result_r1 = check_rule1(paths)
    # print(result_r1)
    if result_r1 == False:
        rule_1_failed.append(paths)
        # print("Rule 1 has failed for " + paths)
    elif result_r1 == True:
        rule_1_passed.append(paths)
        # print("Rule 1 has passed for " + paths)
    open('logs/Rule_1.txt', 'w').close()


print(rule_1_failed)
print(rule_1_passed)






asked Jun 26 at 11:05 by noswear

  • The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. The site standard is for the title to simply state the task accomplished by the code. Please see How to Ask for examples, and revise the title accordingly.
    – Mathias Ettinger, Jun 26 at 11:22

  • So how large are those log files the application is creating? You create, write, read, and overwrite that file again and again...
    – Graipher, Jun 26 at 11:25

  • You're using an XML reader to parse a CSV file? Could you clarify why that's happening?
    – scnerd, Jun 26 at 21:48

  • Have you profiled your code to determine which parts are taking the most overall time?
    – scnerd, Jun 26 at 21:56

  • @scnerd by profiling, do you mean calculating the time for each function in the program, or something else?
    – noswear, Jun 27 at 4:21





1 Answer






Really, to answer your question, you need to profile your code. You need to understand what is taking up all that time. Just by looking at your code, it's impossible to tell, but my guess would be that the most time is spent in one of the following places:



  • Running every *.csv file you come across through the XML parser. I get the feeling you consider this necessary in order to discard XML files that are pretending to be CSV files. Ideally, you should do this once, then properly label your files thenceforth so you don't have to do this check every time. This strikes me as a potentially very expensive thing to do; as such, I've modified this check below so it only occurs when you might actually be interested in the file later on (that is, its path contains 'results').


  • Kicking off the external process individually for each file you want to check. Launching processes is not a cheap operation. Ideally, you'd want to EITHER launch a single process for each rule, passing it all relevant file paths for it to check all at once, OR, if you wrote your rules in Python, you could read each file exactly once and then process it through all your rules at once. Launching a new process for each rule, for each file, is probably a huge source of slowness; a rough sketch of the single-process-per-rule option follows this list.
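
As an illustration of the single-process-per-rule idea, here is a rough sketch. It assumes ConsoleApplication9.exe could accept several CSV paths in one invocation and print one line per file marked "passed" or "failed"; that command-line interface is an assumption made for illustration, not something the question states.

import subprocess


def run_rule(exe, csv_paths, chunk_size=50):
    """Launch the external rule checker once per chunk of files, not once per file.

    Assumes (hypothetically) that the executable accepts several CSV paths on one
    command line and prints one line per file containing 'passed' or 'failed'."""
    passed, failed = [], []
    for start in range(0, len(csv_paths), chunk_size):
        chunk = csv_paths[start:start + chunk_size]
        # One process launch now covers up to chunk_size files.
        result = subprocess.run([exe] + chunk, stdout=subprocess.PIPE,
                                universal_newlines=True)
        for line in result.stdout.splitlines():
            (failed if "failed" in line else passed).append(line)
    return passed, failed


# Hypothetical usage:
# passed, failed = run_rule("ConsoleApplication9.exe", list_of_csv_paths)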


There are also several parts of the code that just seem hacky and bad practice. Also, if you use Python 3.5+, you can use glob instead of your custom locate function. In the spirit of reviewing all your code anyway and making any suggestions that seem appropriate, here's how I'd suggest your code be written, though admittedly without any good suggestions about how to speed up the code (because, again, you MUST profile your code to understand what's actually taking the time):



import os
import subprocess
import xml.etree.ElementTree as ElementTree
from xml.parsers.expat import ExpatError
from glob import iglob


def is_xml(path):
    try:
        ElementTree.parse(path)
        return True
    except (SyntaxError, ExpatError):
        return False


def check_rule1(path):
    subprocess.run(['ConsoleApplication9.exe', path])
    with open("logs/Rule_1.txt") as log_file:
        state = log_file.read()

    with open('results/Rule_1_log.log', 'a+') as output:
        output.write("\n========" + path + "========\n")
        output.write(state)
    return "failed" not in state


def main():
    # Materialise the paths as a list so they can be iterated once per rule.
    csv_path = [path for path in iglob('**/*.csv', recursive=True)
                if 'results' in path and not is_xml(path)]
    rules = [check_rule1]

    for rule_num, rule in enumerate(rules, 1):  # count rules from 1 up
        passed = []
        failed = []

        for paths in csv_path:
            result = rule(paths)
            if result:
                passed.append(paths)
                # print("Rule 1 has passed for " + paths)
            else:
                failed.append(paths)
                # print("Rule 1 has failed for " + paths)

        os.remove('logs/Rule_1.txt')

        # Do something with passed/failed, presumably?


if __name__ == '__main__':
    main()

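On the repeated advice to profile: a minimal sketch using the standard library's cProfile and pstats modules. The file names rule_check.prof and check_rules.py are hypothetical placeholders for this script.

# Run the script under the profiler from the command line (hypothetical names):
#     python -m cProfile -o rule_check.prof check_rules.py
# Then inspect where the time went:
import pstats

stats = pstats.Stats("rule_check.prof")
# Show the 15 functions with the largest cumulative time, which will include
# time spent waiting on the external rule-checking process.
stats.sort_stats("cumulative").print_stats(15)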




answered Jun 26 at 22:25 by scnerd (accepted)

  • Thanks a lot for your suggestion. Your second point is also what I thought could be the major problem here; I needed to review that one. Also thanks for suggesting the glob function. The help is appreciated.
    – noswear, Jun 27 at 4:30










