Slow cursor.fetchall
I am trying to get 10M+ records from a database:
import pymssql
from pandas import DataFrame

conn = pymssql.connect(server='xxx.xxx.xxx.xxx', user='USERNAME',
                       password='PASS!', database='DB_NAME')
query = 'Query.sql'
cursor = conn.cursor()

with open(query, 'r') as content_file:
    SQL = content_file.read()

cursor.execute(SQL)
df = DataFrame(cursor.fetchall())
df.columns = [
    'ID',
    'String',
    'Date_time',
    'Bool',
    'Int',
]
df.String = df.String.astype('float64')

file_path = 'out.parquet'
df.to_parquet(
    file_path,
    engine='pyarrow',
    compression='brotli')
My output file is about 600 MB.
Up to the df = DataFrame(cursor.fetchall()) line, the runtime is about 2 minutes. That single line, however, takes over an hour and an enormous amount of RAM.
Any suggestions on how I can optimize that part of my code?
Thanks!
python pandas
asked Aug 2 at 9:25
no name
There could be many issues here: the SQL query itself; building the DataFrame from raw tuples instead of letting pandas do the read; doing a fetchall instead of chunking (the server has to allocate room for the full result set before sending it, and your script has to hold onto all of it until the transfer is complete); redefining the column names on the DataFrame instead of in your SQL statement; then converting all the strings to floats; and finally dumping the DataFrame into a different format with a compression routine. Are you a COBOL programmer? ;-) j/k
– C. Harley
Aug 2 at 13:51
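As an illustration of the chunking idea in the comment above, here is a minimal, untested sketch: rows are pulled in bounded batches with fetchmany() and appended to the Parquet file with pyarrow's ParquetWriter, so the full result set never sits in memory as one Python list plus one DataFrame. The connection details, column names, and output path are copied from the question; the chunk size is an assumption.

# Sketch only: chunked fetch + incremental Parquet writes.
# Placeholders (server, credentials, CHUNK_SIZE) are not real values.
import pymssql
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

COLUMNS = ['ID', 'String', 'Date_time', 'Bool', 'Int']
CHUNK_SIZE = 100_000  # rows per fetchmany() call; tune for your RAM and network

conn = pymssql.connect(server='xxx.xxx.xxx.xxx', user='USERNAME',
                       password='PASS!', database='DB_NAME')
cursor = conn.cursor()
with open('Query.sql', 'r') as content_file:
    cursor.execute(content_file.read())

writer = None
while True:
    rows = cursor.fetchmany(CHUNK_SIZE)
    if not rows:
        break
    chunk = pd.DataFrame(rows, columns=COLUMNS)
    chunk['String'] = chunk['String'].astype('float64')
    table = pa.Table.from_pandas(chunk, preserve_index=False)
    if writer is None:
        # Create the output file lazily so the schema comes from real data.
        writer = pq.ParquetWriter('out.parquet', table.schema,
                                  compression='brotli')
    writer.write_table(table)

if writer is not None:
    writer.close()
conn.close()

Whether this actually helps depends on where the time is going; if the server is slow to produce rows, chunking only changes where you wait, not how long.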
Start by running your query directly in mysql:
shell> mysql db_name < Query.sql
and check how long that takes, so at least you will know if the slowness is in SQL, Python, or both.
– blues
2 days ago
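If running the query from the database shell is inconvenient, a rough timing split can also be taken from Python itself. The sketch below assumes the conn, cursor, and SQL variables from the question's code; note that with pymssql the boundary is only approximate, since execute() may return before the server has finished streaming rows.

# Rough, untested timing sketch around the existing calls.
import time

t0 = time.perf_counter()
cursor.execute(SQL)       # query execution (server-side work starts here)
t1 = time.perf_counter()
rows = cursor.fetchall()  # transferring and materializing the rows in Python
t2 = time.perf_counter()

print(f'execute: {t1 - t0:.1f} s, fetchall: {t2 - t1:.1f} s, rows: {len(rows)}')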