Improving read/write loops in Python [closed]
Task: I'm importing a .CSV file as a pandas dataframe. I'm writing one column of that dataframe to a .txt file. Importantly, each row of the column must be written as exactly one row in the text file, never more (hence stripping "\n")!
I have a very large dataframe (2 million rows), and this loop is naturally very slow given the I/O overhead. Any suggestions for improvements?
for i in tqdm(data['column'].head(500)):
    f = open("Questions.txt", "a", newline="\n", encoding='utf-8')
    f.write(i.strip("\n"))
    f.write("\n")
    f.close()
python
asked Jun 20 at 16:28 by F.D
closed as off-topic by πάντα ῥεῖ, Billal BEGUERADJ, Stephen Rauch, Sam Onela, Daniel Jun 20 at 19:48
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – πάντα ῥεῖ, Billal BEGUERADJ, Stephen Rauch, Sam Onela, Daniel
2 – You could try to open/close the file only once and call write only once per iteration. – Josay, Jun 20 at 16:37
1 – Where does the magic 500 come from? What does your data look like? – Mast, Jun 20 at 18:32
2 Answers
Accepted answer (score 0) – answered Jun 20 at 16:58 by πάντα ῥεῖ
and this loop is naturally very slow given the I/O overhead
Most of the overhead is probably produced by the way you are opening and closing the file on every iteration. That can easily be fixed by moving these operations out of the loop:
f = open("Questions.txt","a", newline="n",encoding='utf-8')
for i in tqdm(data['column'].head(500)):
f.write(i.strip("/n"))
f.write("n")
f.close()
2 – Wouldn't with open(...) be more idiomatic? – Phrancis, Jun 20 at 17:30
@Phrancis I'm no Python expert and can't really tell, but with blocks are the most horrible and obfuscating things in any programming language I've seen. – πάντα ῥεῖ, Jun 20 at 17:34
@Phrancis Such with blocks are a strong indication you should split up these parts and move them into a separate function. – πάντα ῥεῖ, Jun 20 at 17:39
@πάνταῥεῖ Think of with blocks in Python as the equivalent of using blocks in C# or try-with-resources in Java. They're a resource handling construct, nothing like the With blocks you may know from Basic dialects. – Vogel612♦, Jun 20 at 17:43
@Vogel612 As mentioned, I just find such constructs horrible, defeating the readability, structure, and refactoring flexibility of code. It's quite similar in any language. – πάντα ῥεῖ, Jun 20 at 17:47
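Putting the accepted answer's fix together with the with open(...) idiom from the comments above and the single write per iteration suggested under the question, a minimal sketch looks like the following. It assumes data is the dataframe from the question, that the column holds strings, and that the CSV file name used here is only a placeholder:

from tqdm import tqdm
import pandas as pd

data = pd.read_csv("input.csv")  # hypothetical input file, stands in for the questioner's CSV

# Open the file once; the with-block closes it even if writing raises an exception.
with open("Questions.txt", "a", newline="\n", encoding="utf-8") as f:
    for value in tqdm(data['column'].head(500)):
        # One write per iteration: strip leading/trailing newlines so each
        # value occupies exactly one output row, then terminate it with "\n".
        f.write(value.strip("\n") + "\n")

The with block is the main argument in Phrancis's favour here: the file is guaranteed to be closed even if an exception interrupts the loop, which an explicit close() at the end does not ensure.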
Answer (score 2) – answered Jun 20 at 18:15 by Vogel612♦
Long story short: you don't deal with the I/O yourself and call into numpy.savetxt instead. Consider the following code:
import numpy as np
np.savetxt("Questions.txt", data['column'].map(strip_newlines).head(500), fmt="%s", newline="\n", encoding="utf-8")
This makes it abundantly clear that you only care about a newline-stripped representation of the column 'column' in your dataframe. Note that I removed the progress bar. I expect this code to be blazingly fast in comparison to yours, because it does two things (see the fuller sketch after this list):
- expensive I/O operations (open and close) are only done once
- I/O is pushed from Python into C++
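A fuller, runnable sketch of this approach: strip_newlines is not defined in the answer, so the helper below is only an assumed implementation, and the input file name is made up. fmt="%s" is needed so savetxt formats the values as strings instead of applying its numeric default format (which would fail on text); the encoding keyword requires numpy 1.14 or newer:

import numpy as np
import pandas as pd

def strip_newlines(text: str) -> str:
    # Assumed helper: drop leading/trailing newlines so each value
    # ends up on exactly one line of the output file.
    return text.strip("\n")

data = pd.read_csv("input.csv")  # hypothetical input file

np.savetxt(
    "Questions.txt",
    data['column'].map(strip_newlines).head(500),
    fmt="%s",          # format each value as a plain string
    newline="\n",
    encoding="utf-8",  # available in numpy >= 1.14
)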
That's probably what separates the theorists from the experts. Great answer! – πάντα ῥεῖ, Jun 20 at 18:19