Slow cursor.fetchall

I am trying to fetch 10M+ records from a database:



import pymssql
from pandas import DataFrame

conn = pymssql.connect(server='xxx.xxx.xxx.xxx', user='USERNAME',
                       password='PASS!', database='DB_NAME')

# Read the query text from file and run it.
query = 'Query.sql'
cursor = conn.cursor()
with open(query, 'r') as content_file:
    SQL = content_file.read()

cursor.execute(SQL)

# Pull the entire result set into memory and build the DataFrame.
df = DataFrame(cursor.fetchall())
df.columns = ['ID', 'String', 'Date_time', 'Bool', 'Int']

df.String = df.String.astype('float64')
file_path = 'out.parquet'
df.to_parquet(file_path, engine='pyarrow', compression='brotli')


My output file is about 600 MB.
Everything up to the df = DataFrame(...) line runs in about 2 minutes, but df = DataFrame(cursor.fetchall()) alone takes over an hour and uses a huge amount of RAM.



Any suggestions on how I can optimize that part of my code?
Thanks!







  • There could be many issues here. The SQL query, the DataFrame doing the read instead of Pandas, you doing a fetchall instead of chunking (causes the server to allocate sufficient space first before getting the results, and before sending them to your script, and holding onto the memory until your script has accepted all the data), you then redefine the columns for the dataframe instead of in your SQL statement, and then redefine all the strings into floats, before dumping the dataframe into a different database format and implementing a compression routine. Are you a COBOL programmer? ;-) j/k
    – C. Harley
    Aug 2 at 13:51










  • Start by running your query directly in mysql. shell> mysql db_name < Query.sql and check how long that takes, so at least you will know if your slowness is in sql, python, or both.
    – blues
    2 days ago
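
As a rough illustration of the chunked approach suggested in the comments, the sketch below fetches the result set in batches with the DB-API cursor.fetchmany and appends each batch to the parquet file with pyarrow.parquet.ParquetWriter, so the full 10M rows are never held in memory at once. This is a minimal sketch, not the poster's code: the connection details and column names mirror the question, and the batch size is illustrative and would need tuning.

import pymssql
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

conn = pymssql.connect(server='xxx.xxx.xxx.xxx', user='USERNAME',
                       password='PASS!', database='DB_NAME')
cursor = conn.cursor()

with open('Query.sql', 'r') as content_file:
    SQL = content_file.read()

cursor.execute(SQL)

columns = ['ID', 'String', 'Date_time', 'Bool', 'Int']
batch_size = 100_000  # illustrative; tune to the available RAM
writer = None

# Fetch in batches instead of one giant fetchall(), converting and
# writing each batch before the next one is pulled from the server.
while True:
    rows = cursor.fetchmany(batch_size)
    if not rows:
        break
    chunk = pd.DataFrame(rows, columns=columns)
    chunk['String'] = chunk['String'].astype('float64')
    table = pa.Table.from_pandas(chunk, preserve_index=False)
    if writer is None:
        # The first batch fixes the parquet schema.
        writer = pq.ParquetWriter('out.parquet', table.schema,
                                  compression='brotli')
    writer.write_table(table)

if writer is not None:
    writer.close()
conn.close()

If column types can differ between batches (for example a batch where a column is entirely NULL), passing an explicit schema to ParquetWriter is safer than inferring it from the first batch. pandas.read_sql with a chunksize argument (via a SQLAlchemy engine) is an alternative way to get the same batch-by-batch behaviour.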
















