Getting the average score for hotel scores in different countries using pandas

I am diving into data analysis with pandas, and I have just written this Python script to calculate the average hotel review score for each country. The dataset contains an individual average score for each customer review, like 8.86 or 7.95. My goal was to average all these individual scores for a particular country.
For example, if the hotels in the United Kingdom got the following hotel review scores: 8.65, 7.89, 4.35, and 6.98, I would average these four scores and create a dataframe where the first column is "Country" and the second column is the "Overall Average Score" for that country.
I tried to make the code as concise as I could. Would you mind giving your opinions and recommendations about it? I'll be adding this to my portfolio. What should be kept and/or avoided in a professional, real-world setting?
Script:
# Average all scores that belong to a particular country.
import pandas as pd
# Reading original hotel reviews dataset.
df = pd.read_csv(DATASET_PATH)
# Getting a dataframe with two columns: 'Hotel_Address' and 'Average_Score'.
df = df.loc[:, ["Hotel_Address", "Average_Score"]]
# List of tuples.
countries_w_avg_list = []
for _, row in df.iterrows():
    address = row[0].split()
    country_name = address[len(address) - 1]
    countries_w_avg_list.append( (country_name, row[1]) )
# Getting the sum of all 'Average_Score' values for each country.
d = {}  # Empty dictionary. It will be a dictionary with list values, like: {"Netherlands": [sum, counter]}
counter = 0
for country, individual_average in countries_w_avg_list:
    if country not in d:
        d[country] = [0, 0]
    d[country][0] += individual_average
    d[country][1] += 1
# Getting the average of all 'Average_Score' values for each country.
for key, value in d.items():
    d[key] = round((d[key][0] / d[key][1]), 2)
# print(d)
# Now, I believe there are two ways to transform this dictionary in the df I want.
# 1 - Transform d in a df, and then transpose it. Then rename the columns.
# 2 - Create a dataframe with the column names "Country" and "Overall Average Score"
# and their values as d's keys as the value for the first column and d's values as the
# values for the second column.
df = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})
print(df)
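As an illustration of the dictionary-to-dataframe step described in the script's closing comments, here is a minimal sketch. The contents of `d` are made-up values, and the names `by_items` and `by_series` are hypothetical; both routes are ordinary pandas constructions rather than anything prescribed by the post.

```python
import pandas as pd

# Hypothetical per-country averages, shaped like the dictionary d in the script.
d = {"Netherlands": 8.5, "France": 7.5}

# Route A: every (key, value) pair becomes one row of a two-column dataframe.
by_items = pd.DataFrame(list(d.items()), columns=["Country", "Overall Average Score"])

# Route B: build a Series indexed by country, then move the index into a column.
by_series = (
    pd.Series(d, name="Overall Average Score")
    .rename_axis("Country")
    .reset_index()
)

print(by_items)
print(by_series)
```

Either route avoids the transpose-and-rename step mentioned as option 1.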
python python-3.x pandas
asked Jun 26 at 13:28
maufcost
1 Answer
You should probably use pandas.DataFrame.groupby.
The string manipulations to extract the countries can also be simplified using the pandas.Series.str methods.
import pandas as pd
# Reading original hotel reviews dataset.
# Reduce dataframe to two columns: 'Hotel_Address' and 'Average_Score'.
df = pd.read_csv(DATASET_PATH).loc[:, ["Hotel_Address", "Average_Score"]]
# Extract country from address
df["Country"] = df.Hotel_Address.str.split().str[-1]
df.drop(columns=["Hotel_Address"], inplace=True)
# Get average average score per country, rounded to two decimal places
print(df.groupby("Country").mean().round(2))
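To see the approach on something small, here is a sketch that runs the same steps on a few made-up rows and finishes with the two-column layout the question asks for. The sample addresses, the Netherlands score, and the `replace` mapping are assumptions for illustration only. The mapping is there because taking the last whitespace-separated token turns a multi-word name such as "United Kingdom" into "Kingdom", which may be worth checking against the real data.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV file.
sample = pd.DataFrame({
    "Hotel_Address": [
        "1 Example Road London United Kingdom",
        "2 Example Road London United Kingdom",
        "3 Example Road London United Kingdom",
        "4 Example Road London United Kingdom",
        "5 Voorbeeldstraat Amsterdam Netherlands",
    ],
    "Average_Score": [8.65, 7.89, 4.35, 6.98, 8.00],
})

# Last whitespace-separated token of each address, as in the answer.
sample["Country"] = sample["Hotel_Address"].str.split().str[-1]

# Assumption: multi-word country names need repairing; this mapping is illustrative.
sample["Country"] = sample["Country"].replace({"Kingdom": "United Kingdom"})

# Mean score per country, rounded, as a dataframe with the requested column names.
result = (
    sample.groupby("Country")["Average_Score"]
    .mean()
    .round(2)
    .rename("Overall Average Score")
    .reset_index()
)
print(result)
```

Selecting the `Average_Score` column before calling `mean()` keeps the aggregation explicit, and `reset_index()` turns the group labels back into an ordinary "Country" column.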
That helps a lot. I wrote from scratch most of the algorithms in my original code. In general, do you think I wrote them well? (I know that pandas's version is a lot more efficient :) )
– maufcost, Jun 26 at 19:18
@maufcost There would have been some small improvements possible, like using a collections.defaultdict in the calculation of the sum of average scores (no need for special casing if country not in d). Other than that it looks good and is always good practice.
– Graipher, Jun 26 at 19:21
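The defaultdict suggestion from the comment could look roughly like the sketch below. The `pairs` list is made-up data standing in for `countries_w_avg_list`, and the names `totals` and `averages` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (country, score) pairs standing in for countries_w_avg_list.
pairs = [
    ("Netherlands", 8.0),
    ("France", 7.5),
    ("Netherlands", 9.0),
]

# A missing key is created on first access as [running_sum, count],
# so the "if country not in d" check is no longer needed.
totals = defaultdict(lambda: [0.0, 0])
for country, score in pairs:
    totals[country][0] += score
    totals[country][1] += 1

# Turn each [sum, count] pair into a rounded average.
averages = {
    country: round(total / count, 2)
    for country, (total, count) in totals.items()
}
print(averages)
```

Building `averages` as a new dictionary also avoids overwriting values in the dictionary being iterated, as the original loop does.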
edited Jun 26 at 14:07
answered Jun 26 at 13:39
Graipher