Getting the average score for hotel scores in different countries using pandas
I am diving into data analysis with pandas, and I have just written this Python script to calculate the average hotel review score for each country. The dataset contains an individual average score for each customer review, such as 8.86 or 7.95. My goal was to average all these individual scores for a particular country.
For example, if the hotels in the United Kingdom got the following review scores: 8.65, 7.89, 4.35, and 6.98, I would average these four scores and create a dataframe where the first column is "Country" and the second column is the "Overall Average Score" for that country.
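The averaging described above, applied to the four United Kingdom scores from the example, is (8.65 + 7.89 + 4.35 + 6.98) / 4 = 27.87 / 4 = 6.9675, which rounds to 6.97. A minimal check of that arithmetic in plain Python, using only the example numbers from the question:

```python
# Individual 'Average_Score' values for United Kingdom hotels (example numbers from the question).
uk_scores = [8.65, 7.89, 4.35, 6.98]

# The overall country average is the plain arithmetic mean of those values.
overall_average = sum(uk_scores) / len(uk_scores)

# Rounded to two decimal places, as in the script below.
print(round(overall_average, 2))
```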
I tried to write the code as concisely as I could. Would you mind giving me your opinions and recommendations on it? I'll be adding this to my portfolio. What should be kept and/or avoided in a professional, real-world setting?
Script:
# Average all scores that belong to a particular country.
import pandas as pd

# DATASET_PATH is assumed to be defined elsewhere as the path to the hotel reviews CSV.

# Reading original hotel reviews dataset.
df = pd.read_csv(DATASET_PATH)
# Getting a dataframe with two columns: 'Hotel_Address' and 'Average_Score'.
df = df.loc[:, ["Hotel_Address", "Average_Score"]]

# List of (country_name, average_score) tuples.
countries_w_avg_list = []
for _, row in df.iterrows():
    address = row.iloc[0].split()
    # The country is taken to be the last word of the address.
    country_name = address[len(address) - 1]
    countries_w_avg_list.append((country_name, row.iloc[1]))

# Getting the sum of all 'Average_Score' values for each country.
d = {}  # Maps each country to [sum, counter], like: {"Netherlands": [sum, counter]}
for country, individual_average in countries_w_avg_list:
    if country not in d:
        d[country] = [0, 0]
    d[country][0] += individual_average
    d[country][1] += 1

# Getting the average of all 'Average_Score' values for each country.
for key, value in d.items():
    d[key] = round(value[0] / value[1], 2)

# print(d)

# Now, I believe there are two ways to transform this dictionary into the df I want.
# 1 - Transform d into a df, and then transpose it. Then rename the columns.
# 2 - Create a dataframe with the column names "Country" and "Overall Average Score",
#     using d's keys as the values of the first column and d's values as the
#     values of the second column.
df = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})
print(df)
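Both options listed in the comments above can be written with standard pandas calls. A minimal sketch, assuming d is already the dictionary of rounded country averages; the two values below are hypothetical placeholders, not results from the dataset:

```python
import pandas as pd

# Hypothetical per-country averages standing in for the dictionary d built above.
d = {"United Kingdom": 6.97, "Netherlands": 8.1}

# Option 1: one-row frame, transpose it, then rename the columns.
df1 = pd.DataFrame(d, index=[0]).T.reset_index()
df1 = df1.rename(columns={"index": "Country", 0: "Overall Average Score"})

# Option 2: build both columns directly from the keys and values.
df2 = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})

# Another common route: a Series indexed by country, turned into two columns.
df3 = pd.Series(d, name="Overall Average Score").rename_axis("Country").reset_index()

print(df2)
```

All three produce the same two-column table; option 2 and the Series route skip the intermediate transpose, which arguably makes them easier to read.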
python python-3.x pandas
asked Jun 26 at 13:28
maufcost
1 Answer
Accepted answer
You should probably use pandas.DataFrame.groupby.
The string manipulations to extract the countries can also be simplified using the pandas.Series.str methods.
import pandas as pd
# Reading original hotel reviews dataset.
# Reduce dataframe to two columns: 'Hotel_Address' and 'Average_Score'.
df = pd.read_csv(DATASET_PATH).loc[:, ["Hotel_Address", "Average_Score"]]
# Extract country from address
df["Country"] = df.Hotel_Address.str.split().str[-1]
df.drop(columns=["Hotel_Address"], inplace=True)
# Get the mean of the 'Average_Score' values per country, rounded to two decimal places
print(df.groupby("Country").mean().round(2))
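Two details may be worth checking against the question's goal. First, taking the last whitespace-separated word of the address, as both the original script and this answer do, turns a two-word country such as United Kingdom (the question's own example) into just "Kingdom". Second, groupby("Country").mean() returns Country as the index and keeps the column name Average_Score, whereas the question asks for "Country" and "Overall Average Score" columns. A minimal sketch addressing both, assuming every address ends with the country name and that United Kingdom is the only multi-word country in the data (an assumption, not checked against the dataset); the three sample addresses are made up:

```python
import pandas as pd

# Made-up sample rows standing in for the CSV (only the two columns used here).
df = pd.DataFrame(
    {
        "Hotel_Address": [
            "1 Example Street London W1 United Kingdom",
            "2 Example Road London SW1 United Kingdom",
            "3 Voorbeeldstraat 1011 Amsterdam Netherlands",
        ],
        "Average_Score": [8.65, 7.89, 8.1],
    }
)

# Last-word extraction, as in the question and the answer: UK rows become "Kingdom".
naive = df["Hotel_Address"].str.split().str[-1]
print(naive.tolist())

# Assumed fix: take "United Kingdom" as a whole, otherwise fall back to the last word.
df["Country"] = (
    df["Hotel_Address"].str.strip().str.extract(r"(United Kingdom|\S+)$", expand=False)
)

# as_index=False keeps Country as a regular column; rename gives the requested header.
result = (
    df.groupby("Country", as_index=False)["Average_Score"]
    .mean()
    .round(2)
    .rename(columns={"Average_Score": "Overall Average Score"})
)
print(result)
```

If the real data contains other multi-word country names, they would need to be added to the alternation in the regular expression in the same way.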
edited Jun 26 at 14:07
answered Jun 26 at 13:39
Graipher
That helps a lot. I wrote most of the algorithms in my original code from scratch. In general, do you think I wrote them well? (I know that pandas's version is a lot more efficient :) )
(maufcost, Jun 26 at 19:18)
@maufcost There would have been some small improvements possible, like using a collections.defaultdict in the calculation of the sum of average scores (no need for special-casing if country not in d). Other than that, it looks good, and writing it yourself is always good practice.
(Graipher, Jun 26 at 19:21)
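Following up on the collections.defaultdict suggestion, here is a minimal sketch of the summing loop from the question rewritten with defaultdict, so the if country not in d special case disappears. The (country, score) pairs are made up and stand in for countries_w_avg_list:

```python
from collections import defaultdict

# Made-up (country, score) pairs standing in for countries_w_avg_list.
countries_w_avg_list = [
    ("United Kingdom", 8.65),
    ("United Kingdom", 7.89),
    ("Netherlands", 8.1),
]

# Missing keys start as [0, 0], i.e. [sum, counter], so no membership check is needed.
d = defaultdict(lambda: [0, 0])
for country, individual_average in countries_w_avg_list:
    d[country][0] += individual_average
    d[country][1] += 1

# Turn the [sum, counter] pairs into rounded averages.
averages = {country: round(total / count, 2) for country, (total, count) in d.items()}
print(averages)
```

A lambda is needed because defaultdict takes a zero-argument factory; passing a ready-made list would share one list between all countries.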