Compute leaderboard of competitive riders throughout a season
I have a table in my DB that contains riders as rows and their points throughout the season as columns. Here are the columns:
['name', 'wwq_id', 'wwq_code', 'year', 'round1_qualy', 'round1_final',
 'round1_timed', 'round1_total_qualy', 'round1_total_qualy_position',
 'round1_total_final', 'round1_total_final_position', 'round1_total_timed',
 'round1_total_timed_position', 'round1_total', 'round1_total_position', ...etc.
The 'round1_qualy', 'round1_final' and 'round1_timed' values all come from different tables and are added using another function. Once they are in the DB, all of the total columns are calculated using generated columns in MySQL.
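For example, one of the generated total columns was declared roughly like this (a hypothetical sketch with a made-up expression, assuming an open SQLAlchemy connection; the real definitions were set up by another part of the pipeline):
    # Hypothetical sketch: how one of the generated total columns might be
    # declared (the exact expression may differ in the real schema)
    connection.execute(
        'ALTER TABLE men_overall '
        'ADD COLUMN round1_total INT '
        'GENERATED ALWAYS AS (round1_qualy + round1_final + round1_timed) STORED'
    )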
I need to calculate the position of each rider at different points in the season, i.e. after round one qualy, after round two final, and so on.
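To illustrate with made-up numbers, a position column is just the min-method rank of the matching total within each year, highest points first:
    import pandas as pd

    # Made-up example: rank round1_total within each year, best score first
    df = pd.DataFrame({
        'name': ['Ann', 'Ben', 'Cat', 'Dan'],
        'year': [2018, 2018, 2018, 2017],
        'round1_total': [75, 90, 75, 60],
    })
    df['round1_total_position'] = (df.groupby('year')['round1_total']
                                   .rank(ascending=False, method='min')
                                   .astype(int))
    print(df)
    #   name  year  round1_total  round1_total_position
    # 0  Ann  2018            75                      2
    # 1  Ben  2018            90                      1
    # 2  Cat  2018            75                      2
    # 3  Dan  2017            60                      1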
Here is the code I have right now:
import pandas as pd
import sqlalchemy as sql


def set_positions():
    """Add positions to the DB for each column in our overall table"""
    engine = sql.create_engine(MYSQL_CON_STRING)
    connection = engine.connect()
    trans = connection.begin()
    # Men, Women, Junior
    for category in CATEGORIES:
        table = category + '_overall'
        df = pd.read_sql('SELECT * FROM {0}'.format(table), engine)
        points_columns_list = [col for col in df.columns
                               if 'total' in col and 'position' not in col]
        # Calculate the rank based on current total points column and
        # insert into current column index + 1
        for points_column in points_columns_list:
            position_index = df.columns.get_loc(points_column) + 1
            position_column = df.columns[position_index]
            current_points = df.groupby('year')
            position = current_points[points_column].rank(ascending=0,
                                                          method='min').astype(int)
            df.loc[:, position_column] = position.values
            # No points awarded in this and/or previous rounds. Everyone is
            # at 0 so everyone is rank 1
            df.loc[(df[points_column] == 0)
                   & (df[position_column] == 1), position_column] = 'NULL'
            # Add rows to the DB
            for row in df.itertuples():
                connection.execute('UPDATE {0} '
                                   'SET {1} = {2} '
                                   'WHERE year = {3} and name = "{4}"'
                                   ''.format(table, position_column,
                                             getattr(row, position_column),
                                             getattr(row, 'year'),
                                             getattr(row, 'name')))
    trans.commit()
    connection.close()
The code works perfectly; it just takes a long time. This function doesn't get executed very often, but with about 8000 rows per table it still takes a while.
Is there any way to get the complexity down?
python mysql complexity pandas
asked Mar 1 at 10:26 by moto, edited Mar 1 at 10:40 by Mathias Ettinger
Accidentally posted outside of my account (as a guest) so I can't comment. Thanks for your input! The only problem is that there are generated columns and pre-defined datatypes in the table, so I'm not able to recreate it without breaking the data pipeline. Thinking about it now, that could have been a bad idea to use them...
– moto, Mar 1 at 13:16
1 Answer
You should strive to reduce the number of queries to your DB. Instead of updating the whole table, row by row, each time you compute a new value, you could:
- load the data (using pd.read_sql_table);
- perform the computation of the positions (this may require you to use None or pd.np.NaN instead of 'NULL');
- write the whole new table at once at the end (using df.to_sql).
The rewrite could look like:
def set_positions():
    """Add positions to the DB for each column in our overall table"""
    engine = sql.create_engine(MYSQL_CON_STRING)
    connection = engine.connect()
    trans = connection.begin()
    # Men, Women, Junior
    for category in CATEGORIES:
        table = category + '_overall'
        df = pd.read_sql_table(table, engine)
        compute_positions_for_table(df)
        df.to_sql(table, engine, if_exists='replace')
    trans.commit()
    connection.close()


def compute_positions_for_table(df):
    points_columns_list = [col for col in df.columns
                           if 'total' in col and 'position' not in col]
    # Calculate the rank based on current total points column and
    # insert into current column index + 1
    for points_column in points_columns_list:
        position_index = df.columns.get_loc(points_column) + 1
        position_column = df.columns[position_index]
        current_points = df.groupby('year')
        position = current_points[points_column].rank(ascending=0, method='min').astype(int)
        df.loc[:, position_column] = position.values
        # No points awarded in this and/or previous rounds. Everyone is
        # at 0 so everyone is rank 1
        df.loc[(df[points_column] == 0)
               & (df[position_column] == 1), position_column] = None
And even if you can't "replace" the table, you should still update it only once at the end of the computation rather than after each new column.
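A minimal sketch of that batched approach, assuming the same df, table and position columns as above (write_positions is a hypothetical helper name; table and column names must be trusted identifiers since they cannot be bound as parameters):
    from sqlalchemy import text

    def write_positions(connection, table, df, position_columns):
        """Hypothetical helper: push all computed positions in one
        executemany round trip per column instead of one UPDATE per row."""
        for position_column in position_columns:
            stmt = text(
                'UPDATE {0} SET {1} = :value '
                'WHERE year = :year AND name = :name'.format(table, position_column)
            )
            params = [
                {'value': getattr(row, position_column),
                 'year': row.year,
                 'name': row.name}
                for row in df.itertuples()
            ]
            # Passing a list of dicts makes SQLAlchemy use executemany
            connection.execute(stmt, params)
As a bonus, with bound parameters a Python None is sent to MySQL as a real NULL, so the 'NULL' string workaround from the original code is no longer needed.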
answered Mar 1 at 12:30 by Mathias Ettinger, edited Mar 1 at 14:04