Improving read/write loops in Python [closed]


Task: I'm importing a .csv file as a pandas dataframe and writing one column of that dataframe to a .txt file. Importantly, each row of the column must be written as exactly one row in the text file, never more (hence stripping "\n")!

I have a very large dataframe (2 million rows), and this loop is naturally very slow given the I/O overhead. Any suggestions for improvements?

for i in tqdm(data['column'].head(500)):
    f = open("Questions.txt", "a", newline="\n", encoding="utf-8")
    f.write(i.strip("\n"))
    f.write("\n")
    f.close()
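To make the one-row requirement concrete, this is how str.strip("\n") behaves on the values involved; note that it only trims newlines at the ends of a string:

```python
# str.strip("\n") removes newline characters only at the ends of a string.
assert "a question\n".strip("\n") == "a question"
assert "\n\npadded\n".strip("\n") == "padded"

# Embedded newlines survive, so a value like this would still span two
# rows in the output file; replace("\n", " ") would be needed instead:
assert "line one\nline two".strip("\n") == "line one\nline two"
```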






closed as off-topic by πάντα ῥεῖ, Billal BEGUERADJ, Stephen Rauch, Sam Onela, Daniel Jun 20 at 19:48


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – πάντα ῥεῖ, Billal BEGUERADJ, Stephen Rauch, Sam Onela, Daniel
If this question can be reworded to fit the rules in the help center, please edit the question.








  • You could try to open/close the file only once and call write only once per iteration. – Josay, Jun 20 at 16:37

  • Where does the magic 500 come from? What does your data look like? – Mast, Jun 20 at 18:32
















asked Jun 20 at 16:28 by F.D




2 Answers

















and this loop is naturally very slow given the I/O overhead

Most of the overhead is probably produced by opening and closing the file on each iteration. That is easily fixed by moving these operations out of the loop:

f = open("Questions.txt", "a", newline="\n", encoding="utf-8")
for i in tqdm(data['column'].head(500)):
    f.write(i.strip("\n"))
    f.write("\n")
f.close()

answered Jun 20 at 16:58 by πάντα ῥεῖ (accepted)





  • Wouldn't with open(...) be more idiomatic? – Phrancis, Jun 20 at 17:30

  • @Phrancis I'm no python expert and can't really tell, but with code blocks are the most horrible and obfuscating things with any programming language I've seen. – πάντα ῥεῖ, Jun 20 at 17:34

  • @Phrancis Such with code blocks are a strong indication you should just split up these parts, and move it into a separate function. – πάντα ῥεῖ, Jun 20 at 17:39

  • @πάνταῥεῖ think of with blocks in python as the equivalent of using blocks in C# or try-with-resources in java. They're a resource handling construct, nothing like the With blocks you may know from Basic dialects – Vogel612♦, Jun 20 at 17:43

  • @Vogel612 As mentioned, I just find such constructs horrible, defeating the readability, structure, and refactoring flexibility of code. It's quite similar in any language. – πάντα ῥεῖ, Jun 20 at 17:47
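A sketch of the with-based variant Phrancis alludes to, using a tiny hypothetical stand-in for the question's dataframe and leaving out the tqdm progress bar:

```python
import pandas as pd

# Hypothetical stand-in for the question's 2-million-row dataframe.
data = pd.DataFrame({'column': ["first\n", "second", "third\n"]})

# The context manager opens the file once and closes it automatically,
# even if an exception is raised mid-loop.
with open("Questions.txt", "a", newline="\n", encoding="utf-8") as f:
    for value in data['column'].head(500):
        f.write(value.strip("\n"))
        f.write("\n")
```

The file handle's lifetime is tied to the indented block, which is exactly the "open once, write many, close once" pattern the accepted answer recommends.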

















Long story short: you don't deal with the I/O yourself and call into numpy.savetxt instead. Consider the following code:

import numpy as np

np.savetxt("Questions.txt", data['column'].map(strip_newlines).head(500), fmt="%s", newline="\n", encoding="utf-8")

This makes abundantly clear that you only care about a newline-stripped representation of the column 'column' in your dataframe (strip_newlines being a small helper along the lines of lambda s: s.strip("\n")). Note that I removed the progress bar from this. I expect this code to be blazingly fast in comparison to yours, because it does two things:

  1. expensive I/O operations (open and close) are only done once

  2. the per-row formatting and writing is delegated to NumPy

answered Jun 20 at 18:15 by Vogel612♦





  • That's probably what separates the theorists from the experts. Great answer! – πάντα ῥεῖ, Jun 20 at 18:19
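A runnable sketch of the savetxt approach, again with a hypothetical stand-in dataframe; fmt="%s" is required because the values are strings (the default format expects floats), and strip_newlines is spelled out explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataframe.
data = pd.DataFrame({'column': ["alpha\n", "beta", "gamma\n"]})

def strip_newlines(s):
    # Trim leading/trailing newlines so each value occupies one row.
    return s.strip("\n")

# savetxt opens and closes the file once and writes one row per value.
np.savetxt(
    "Answers.txt",
    data['column'].map(strip_newlines).head(500),
    fmt="%s",        # values are strings, not floats
    newline="\n",
    encoding="utf-8",
)
```

Note that mapping before head(500) processes the whole column; on the real 2-million-row frame, applying head(500) first would avoid that.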

















