Improving read/write loops in Python [closed]


Task: I'm importing a .csv file as a pandas dataframe and writing one column of that dataframe to a .txt file. Importantly, each row of the column must be written as exactly one row in the text file, never more (hence stripping "\n")!

I have a very large dataframe (2 million rows), and this loop is naturally very slow given the I/O overhead. Any suggestions for improvements?

for i in tqdm(data['column'].head(500)):
    f = open("Questions.txt", "a", newline="\n", encoding="utf-8")
    f.write(i.strip("\n"))
    f.write("\n")
    f.close()
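To make the one-row requirement concrete, this is how str.strip("\n") behaves on the values involved; note that it only trims newlines at the ends of a string:

```python
# str.strip("\n") removes newline characters only at the ends of a string.
assert "a question\n".strip("\n") == "a question"
assert "\n\npadded\n".strip("\n") == "padded"

# Embedded newlines survive, so a value like this would still span two
# rows in the output file; replace("\n", " ") would be needed instead:
assert "line one\nline two".strip("\n") == "line one\nline two"
```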






closed as off-topic by πάντα ῥεῖ, Billal BEGUERADJ, Stephen Rauch, Sam Onela, Daniel Jun 20 at 19:48


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – πάντα ῥεῖ, Billal BEGUERADJ, Stephen Rauch, Sam Onela, Daniel
If this question can be reworded to fit the rules in the help center, please edit the question.








  • You could try to open/close the file only once and call write only once per iteration. – Josay, Jun 20 at 16:37

  • Where does the magic 500 come from? What does your data look like? – Mast, Jun 20 at 18:32
















asked Jun 20 at 16:28 by F.D




2 Answers

















and this loop is naturally very slow given the I/O overhead

Most of the overhead is probably produced by opening and closing the file on each iteration. That is easily fixed by moving these operations out of the loop:

f = open("Questions.txt", "a", newline="\n", encoding="utf-8")
for i in tqdm(data['column'].head(500)):
    f.write(i.strip("\n"))
    f.write("\n")
f.close()

answered Jun 20 at 16:58 by πάντα ῥεῖ (accepted)





  • Wouldn't with open(...) be more idiomatic? – Phrancis, Jun 20 at 17:30

  • @Phrancis I'm no python expert and can't really tell, but with code blocks are the most horrible and obfuscating things with any programming language I've seen. – πάντα ῥεῖ, Jun 20 at 17:34

  • @Phrancis Such with code blocks are a strong indication you should just split up these parts, and move it into a separate function. – πάντα ῥεῖ, Jun 20 at 17:39

  • @πάνταῥεῖ think of with blocks in python as the equivalent of using blocks in C# or try-with-resources in java. They're a resource handling construct, nothing like the With blocks you may know from Basic dialects – Vogel612♦, Jun 20 at 17:43

  • @Vogel612 As mentioned, I just find such constructs horrible, defeating the readability, structure, and refactoring flexibility of code. It's quite similar in any language. – πάντα ῥεῖ, Jun 20 at 17:47
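A sketch of the with-based variant Phrancis alludes to, using a tiny hypothetical stand-in for the question's dataframe and leaving out the tqdm progress bar:

```python
import pandas as pd

# Hypothetical stand-in for the question's 2-million-row dataframe.
data = pd.DataFrame({'column': ["first\n", "second", "third\n"]})

# The context manager opens the file once and closes it automatically,
# even if an exception is raised mid-loop.
with open("Questions.txt", "a", newline="\n", encoding="utf-8") as f:
    for value in data['column'].head(500):
        f.write(value.strip("\n"))
        f.write("\n")
```

The file handle's lifetime is tied to the indented block, which is exactly the "open once, write many, close once" pattern the accepted answer recommends.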

















Long story short: you don't deal with the I/O yourself and call into numpy.savetxt instead. Consider the following code:

import numpy as np

np.savetxt("Questions.txt", data['column'].map(strip_newlines).head(500), fmt="%s", newline="\n", encoding="utf-8")

This makes abundantly clear that you only care about a newline-stripped representation of the column 'column' in your dataframe (strip_newlines being a small helper along the lines of lambda s: s.strip("\n")). Note that I removed the progress bar from this. I expect this code to be blazingly fast in comparison to yours, because it does two things:

  1. expensive I/O operations (open and close) are only done once

  2. the per-row formatting and writing is delegated to NumPy

answered Jun 20 at 18:15 by Vogel612♦





  • That's probably what separates the theorists from the experts. Great answer! – πάντα ῥεῖ, Jun 20 at 18:19
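A runnable sketch of the savetxt approach, again with a hypothetical stand-in dataframe; fmt="%s" is required because the values are strings (the default format expects floats), and strip_newlines is spelled out explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataframe.
data = pd.DataFrame({'column': ["alpha\n", "beta", "gamma\n"]})

def strip_newlines(s):
    # Trim leading/trailing newlines so each value occupies one row.
    return s.strip("\n")

# savetxt opens and closes the file once and writes one row per value.
np.savetxt(
    "Answers.txt",
    data['column'].map(strip_newlines).head(500),
    fmt="%s",        # values are strings, not floats
    newline="\n",
    encoding="utf-8",
)
```

Note that mapping before head(500) processes the whole column; on the real 2-million-row frame, applying head(500) first would avoid that.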

















