Compute leaderboard of competitive riders throughout a season

I have a table in my DB that contains Riders as rows and their points throughout the season as columns. Here are the columns:



['name', 'wwq_id', 'wwq_code', 'year', 'round1_qualy', 'round1_final', 
'round1_timed', 'round1_total_qualy', 'round1_total_qualy_position',
'round1_total_final', 'round1_total_final_position', 'round1_total_timed',
'round1_total_timed_position', 'round1_total', 'round1_total_position'...etc


The 'round1_qualy', 'round1_final' and 'round1_timed' columns all come from different tables and are added by another function. Once they are in the DB, all of the total columns are calculated using generated columns in MySQL.



I need to calculate the position of each rider at different points in the season, e.g. after the round one qualy, after the round two final, etc.



Here is the code I have right now:



def set_positions():
    """Add positions to the DB for each column in our overall table"""
    engine = sql.create_engine(MYSQL_CON_STRING)
    connection = engine.connect()
    trans = connection.begin()

    # Men, Women, Junior
    for category in CATEGORIES:
        table = category + '_overall'
        df = pd.read_sql('SELECT * FROM {0}'.format(table), engine)
        points_columns_list = [col for col in df.columns
                               if 'total' in col and 'position' not in col]

        # Calculate the rank based on the current total points column and
        # insert it into the column at index + 1
        for points_column in points_columns_list:
            position_index = df.columns.get_loc(points_column) + 1
            position_column = df.columns[position_index]
            current_points = df.groupby('year')
            position = current_points[points_column].rank(
                ascending=False, method='min').astype(int)
            df.loc[:, position_column] = position.values

            # No points awarded in this and/or previous rounds. Everyone is
            # at 0, so everyone is rank 1
            df.loc[(df[points_column] == 0)
                   & (df[position_column] == 1), position_column] = 'NULL'

            # Add rows to the DB
            for row in df.itertuples():
                connection.execute('UPDATE {0} '
                                   'SET {1} = {2} '
                                   'WHERE year = {3} AND name = "{4}"'
                                   .format(table, position_column,
                                           getattr(row, position_column),
                                           getattr(row, 'year'),
                                           getattr(row, 'name')))
    trans.commit()
    connection.close()


The code works, but it takes a long time. This function doesn't get executed very often, but considering there are about 8000 rows per table, it takes quite a bit of time.



Is there any way to get the complexity down?
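For concreteness, here is a minimal, self-contained sketch of the ranking step on toy data (the column names and values are illustrative, not from the real DB):

```python
import pandas as pd

# Toy stand-in for one '<category>_overall' table: two seasons, three riders.
df = pd.DataFrame({
    'name': ['ann', 'bob', 'cat', 'ann', 'bob', 'cat'],
    'year': [2017, 2017, 2017, 2018, 2018, 2018],
    'round1_total': [50, 80, 50, 0, 0, 0],
})

# Rank within each season: highest points first; ties share the lowest
# rank, exactly as rank(ascending=False, method='min') does in the code above.
df['round1_total_position'] = (df.groupby('year')['round1_total']
                                 .rank(ascending=False, method='min')
                                 .astype(int))

print(df['round1_total_position'].tolist())  # [2, 1, 2, 1, 1, 1]
```

In 2017 the two riders tied on 50 points both get rank 2; in 2018 everyone is at 0, so everyone is rank 1 — the case the code then blanks out to NULL.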







  • Accidentally posted outside of my account (as a guest), so I can't comment. Thanks for your input! The only problem is that there are generated columns and pre-defined datatypes in the table, so I'm not able to recreate it without breaking the data pipeline. Thinking about it now, it could have been a bad idea to use them...
    – moto
    Mar 1 at 13:16

















asked Mar 1 at 10:26 by moto
edited Mar 1 at 10:40 by Mathias Ettinger











1 Answer

You should strive to reduce the number of queries to your DB. Instead of updating the whole table, row by row, each time you compute a new value, you could:



  1. load the data (using pd.read_sql_table);

  2. perform the computation of the positions (this may require you to use None or numpy.nan instead of 'NULL');

  3. write the whole new table at once at the end (using df.to_sql).
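As an aside on step 2 (an illustrative sketch with made-up toy data, not part of the original code): blanking out positions with a missing value upcasts the integer column, since NaN cannot live in an int64 column. NaN is also what DataFrame.to_sql writes out as SQL NULL:

```python
import pandas as pd

# Toy data: two riders with no points yet, one with points.
df = pd.DataFrame({'points': [0, 0, 10], 'position': [1, 1, 1]})

# Blank out positions where no points were scored; Series.mask fills the
# matching rows with NaN, which to_sql later writes as SQL NULL.
df['position'] = df['position'].mask(df['points'] == 0)

# The int64 column is upcast to float64 so it can hold the missing values.
print(df['position'].dtype)  # float64
```

So after this step the positions are floats in pandas; MySQL will still coerce them back to integers on write if the column type is integral.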

The rewrite could look like:



def set_positions():
    """Add positions to the DB for each column in our overall table"""
    engine = sql.create_engine(MYSQL_CON_STRING)
    connection = engine.connect()
    trans = connection.begin()

    # Men, Women, Junior
    for category in CATEGORIES:
        table = category + '_overall'
        df = pd.read_sql_table(table, engine)
        compute_positions_for_table(df)
        df.to_sql(table, engine, if_exists='replace')
    trans.commit()
    connection.close()


def compute_positions_for_table(df):
    points_columns_list = [col for col in df.columns
                           if 'total' in col and 'position' not in col]

    # Calculate the rank based on the current total points column and
    # insert it into the column at index + 1
    for points_column in points_columns_list:
        position_index = df.columns.get_loc(points_column) + 1
        position_column = df.columns[position_index]
        current_points = df.groupby('year')
        position = current_points[points_column].rank(
            ascending=False, method='min').astype(int)
        df.loc[:, position_column] = position.values

        # No points awarded in this and/or previous rounds. Everyone is
        # at 0, so everyone is rank 1
        df.loc[(df[points_column] == 0)
               & (df[position_column] == 1), position_column] = None


And even if you can't "replace" the table, you should still update it only once, at the end of the computation, rather than after each new column is computed.
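If replacing the table really is off limits (e.g. because of the generated columns mentioned in the comments), a middle ground is one parameterized executemany per column instead of one formatted string per row. A sketch using an in-memory SQLite table as a stand-in (table, column names, and data are illustrative; with MySQL through SQLAlchemy you would use the existing connection and its parameter style instead):

```python
import sqlite3

import pandas as pd

# In-memory stand-in for one of the MySQL '<category>_overall' tables.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE men_overall '
             '(name TEXT, year INTEGER, round1_total_position INTEGER)')
conn.executemany('INSERT INTO men_overall VALUES (?, ?, ?)',
                 [('ann', 2017, 0), ('bob', 2017, 0)])

# Positions as computed in pandas (hard-coded here for the example).
df = pd.DataFrame({'name': ['ann', 'bob'], 'year': [2017, 2017],
                   'round1_total_position': [2, 1]})

# One executemany round-trip for the whole column instead of one execute
# per row; placeholders also avoid the SQL-injection risk of str.format.
# int() casts because sqlite3 does not accept numpy integer types.
rows = [(int(p), int(y), n)
        for n, y, p in zip(df['name'], df['year'],
                           df['round1_total_position'])]
conn.executemany('UPDATE men_overall SET round1_total_position = ? '
                 'WHERE year = ? AND name = ?', rows)
conn.commit()

print(conn.execute('SELECT name, round1_total_position '
                   'FROM men_overall ORDER BY name').fetchall())
# [('ann', 2), ('bob', 1)]
```

The same shape works against MySQL with `%s` (or SQLAlchemy named) placeholders; the point is to batch the parameters, not to build one SQL string per row.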






answered Mar 1 at 12:30 by Mathias Ettinger
edited Mar 1 at 14:04






















             
