Getting the average score for hotel scores in different countries using pandas

I am diving into Data Analysis with pandas, and I have just written this Python script to calculate the average of hotel review scores of each country. The dataset contains an individual average score for each customer review, like: 8.86 or 7.95. My goal was to average all these individual scores for a particular country.

For example, if the hotels in United Kingdom got the following hotel review scores: 8.65, 7.89, 4.35, and 6.98, I would average these four scores and create a dataframe where the first column is "Country" and the second column is the "Overall Average Score" for that country.
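The arithmetic for that example can be sketched in plain Python as follows. This is only an illustration of the goal, using the four United Kingdom scores quoted above; the variable names are hypothetical and not part of the script under review.

```python
# The four example review scores for United Kingdom hotels quoted above.
uk_scores = [8.65, 7.89, 4.35, 6.98]

# The country average is the sum of the scores divided by how many there are:
# 27.87 / 4 = 6.9675
uk_average = sum(uk_scores) / len(uk_scores)

# One row of the desired two-column result.
row = {"Country": "United Kingdom", "Overall Average Score": uk_average}
print(row)
```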

I tried to make the code as concise as I could. Would you mind giving your opinions and recommendations about it? I'll be adding this to my portfolio. What should be kept or avoided in a professional, real-world setting?

Script:

# Average all scores that belong to a particular country.

import pandas as pd

# Reading original hotel reviews dataset.
df = pd.read_csv(DATASET_PATH)

# Getting a dataframe with two columns: 'Hotel_Address' and 'Average_Score'.
df = df.loc[:, ["Hotel_Address", "Average_Score"]]

# List of tuples.
countries_w_avg_list = []
for _, row in df.iterrows():
    address = row[0].split()
    country_name = address[len(address) - 1]
    countries_w_avg_list.append((country_name, row[1]))

# Getting the sum of all 'Average_Score' values for each country.
d = {}  # Empty dictionary. It will be a dictionary with list values, like: {"Netherlands": [sum, counter]}
counter = 0
for country, individual_average in countries_w_avg_list:
    if country not in d:
        d[country] = [0, 0]

    d[country][0] += individual_average
    d[country][1] += 1

# Getting the average of all 'Average_Score' values for each country.
for key, value in d.items():
    d[key] = round((d[key][0] / d[key][1]), 2)
# print(d)

# Now, I believe there are two ways to transform this dictionary into the df I want.
# 1 - Transform d into a df, and then transpose it. Then rename the columns.
# 2 - Create a dataframe with the column names "Country" and "Overall Average Score",
# using d's keys as the values for the first column and d's values as the
# values for the second column.
df = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})
print(df)

asked Jun 26 at 13:28 by maufcost


1 Answer

You should probably use pandas.DataFrame.groupby.

The string manipulations to extract the countries can also be simplified using the pandas.Series.str methods.

import pandas as pd

# Reading original hotel reviews dataset.
# Reduce dataframe to two columns: 'Hotel_Address' and 'Average_Score'.
df = pd.read_csv(DATASET_PATH).loc[:, ["Hotel_Address", "Average_Score"]]

# Extract country from address
df["Country"] = df.Hotel_Address.str.split().str[-1]
df.drop(columns=["Hotel_Address"], inplace=True)

# Get average average score per country, rounded to two decimal places
print(df.groupby("Country").mean().round(2))
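The question asked for a dataframe with the columns "Country" and "Overall Average Score". A minimal sketch of how the groupby result could be shaped that way is below. The sample rows are hypothetical stand-ins for the real dataset, and the use of as_index=False plus rename is one possible approach, not something taken from the original answer. Note that taking the last whitespace-separated token yields "Kingdom" for an address ending in "United Kingdom".

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from DATASET_PATH.
df = pd.DataFrame({
    "Hotel_Address": [
        "1 Example Road London United Kingdom",
        "2 Sample Street London United Kingdom",
        "3 Demo Straat Amsterdam Netherlands",
    ],
    "Average_Score": [8.0, 9.0, 7.5],
})

# Last whitespace-separated token of the address, as in the answer above.
df["Country"] = df["Hotel_Address"].str.split().str[-1]

# as_index=False keeps "Country" as a regular column instead of the index.
result = (
    df.groupby("Country", as_index=False)["Average_Score"]
    .mean()
    .round(2)
    .rename(columns={"Average_Score": "Overall Average Score"})
)
print(result)
```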

answered Jun 26 at 13:39 by Graipher, edited Jun 26 at 14:07

That helps a lot. I wrote from scratch most of the algorithms in my original code. In general, do you think I wrote them well? (I know that pandas's version is a lot more efficient :) )
– maufcost, Jun 26 at 19:18

@maufcost There would have been some small improvements possible, like using a collections.defaultdict in the calculation of the sum of average scores (no need for special casing if country not in d). Other than that it looks good and is always good practice.
– Graipher, Jun 26 at 19:21
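The collections.defaultdict suggestion could look roughly like the sketch below. The sample (country, score) pairs are hypothetical stand-ins for countries_w_avg_list, and the names totals and averages are illustrative rather than taken from the original script.

```python
from collections import defaultdict

# Hypothetical (country, score) pairs standing in for countries_w_avg_list.
countries_w_avg_list = [
    ("Kingdom", 8.0),
    ("Kingdom", 9.0),
    ("Netherlands", 7.5),
]

# Missing keys start as [running sum, count], so no "if country not in d" check is needed.
totals = defaultdict(lambda: [0.0, 0])
for country, individual_average in countries_w_avg_list:
    totals[country][0] += individual_average
    totals[country][1] += 1

# Divide each running sum by its count and round to two decimal places.
averages = {
    country: round(total / count, 2)
    for country, (total, count) in totals.items()
}
print(averages)
```

Building a new averages dictionary, instead of overwriting the values while iterating, also keeps the sums and counts available if they are needed later.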
