Getting the average score for hotel scores in different countries using pandas

I am diving into Data Analysis with pandas, and I have just written this Python script to calculate the average of hotel review scores of each country. The dataset contains an individual average score for each customer review, like: 8.86 or 7.95. My goal was to average all these individual scores for a particular country.



For example, if the hotels in the United Kingdom got the following review scores: 8.65, 7.89, 4.35, and 6.98, I would average these four scores and create a dataframe where the first column is "Country" and the second column is the "Overall Average Score" for that country.



I tried to make the code as concise as I could. Would you mind giving your opinions and recommendations about it? I'll be adding this to my portfolio. What should be kept and/or avoided in a professional, real-world setting?



Script:



# Average all scores that belong to a particular country.

import pandas as pd

# Reading original hotel reviews dataset.
df = pd.read_csv(DATASET_PATH)

# Getting a dataframe with two columns: 'Hotel_Address' and 'Average_Score'.
df = df.loc[:, ["Hotel_Address", "Average_Score"]]

# List of tuples.
countries_w_avg_list = []
for _, row in df.iterrows():
    address = row[0].split()
    country_name = address[len(address) - 1]
    countries_w_avg_list.append( (country_name, row[1]) )

# Getting the sum of all 'Average_Score' values for each country.
d = {} # Empty dictionary. It will be a dictionary with list values, like: {"Netherlands": [sum, counter]}
counter = 0
for country, individual_average in countries_w_avg_list:
    if country not in d:
        d[country] = [0, 0]

    d[country][0] += individual_average
    d[country][1] += 1

# Getting the average of all 'Average_Score' values for each country.
for key, value in d.items():
    d[key] = round((d[key][0] / d[key][1]), 2)
# print(d)

# Now, I believe there are two ways to transform this dictionary in the df I want.
# 1 - Transform d in a df, and then transpose it. Then rename the columns.
# 2 - Create a dataframe with the column names "Country" and "Overall Average Score"
# and their values as d's keys as the value for the first column and d's values as the
# values for the second column.
df = pd.DataFrame({"Country": list(d.keys()), "Overall Average Score": list(d.values())})
print(df)






asked Jun 26 at 13:28 by maufcost

          1 Answer

You should probably use pandas.DataFrame.groupby.

The string manipulations to extract the countries can also be simplified using the pandas.Series.str methods.

import pandas as pd

# Reading original hotel reviews dataset.
# Reduce dataframe to two columns: 'Hotel_Address' and 'Average_Score'.
df = pd.read_csv(DATASET_PATH).loc[:, ["Hotel_Address", "Average_Score"]]

# Extract country from address (last whitespace-separated token)
df["Country"] = df.Hotel_Address.str.split().str[-1]
df.drop(columns=["Hotel_Address"], inplace=True)

# Get the mean of 'Average_Score' per country, rounded to two decimal places
print(df.groupby("Country").mean().round(2))
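
To try the groupby approach without the CSV file, here is a minimal sketch on a small in-memory frame. The addresses and the Netherlands scores are made up for illustration; the four United Kingdom scores are the ones from the question, and only the column names Hotel_Address and Average_Score come from the original script. It also returns the two-column layout the question asks for, which is an assumption about the desired output rather than part of the answer above.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from pd.read_csv(DATASET_PATH).
df = pd.DataFrame({
    "Hotel_Address": [
        "1 Example Road London United Kingdom",
        "2 Example Road London United Kingdom",
        "3 Example Road London United Kingdom",
        "4 Example Road London United Kingdom",
        "5 Example Straat Amsterdam Netherlands",
        "6 Example Straat Amsterdam Netherlands",
    ],
    "Average_Score": [8.65, 7.89, 4.35, 6.98, 9.0, 8.0],
})

# Last whitespace-separated token of each address, computed without a Python loop.
df["Country"] = df["Hotel_Address"].str.split().str[-1]

# Mean score per country, rounded, then turned back into a two-column frame.
result = (
    df.groupby("Country")["Average_Score"]
    .mean()
    .round(2)
    .rename("Overall Average Score")
    .reset_index()
)
print(result)
```

One thing this makes visible: taking the last token labels the United Kingdom rows as "Kingdom". The grouping still works because the label is consistent, but the country names may need cleaning up before the result is presented.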






• That helps a lot. I wrote from scratch most of the algorithms in my original code. In general, do you think I wrote them well? (I know that pandas's version is a lot more efficient :) )
  – maufcost, Jun 26 at 19:18

• @maufcost There would have been some small improvements possible, like using a collections.defaultdict in the calculation of the sum of average scores (no need for special casing if country not in d). Other than that it looks good and is always good practice.
  – Graipher, Jun 26 at 19:21
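
The defaultdict suggestion in the comment above could look roughly like the sketch below. The (country, score) pairs are hypothetical stand-ins for countries_w_avg_list from the question, and the names totals and averages are introduced here for illustration only.

```python
from collections import defaultdict

# Hypothetical (country, score) pairs standing in for countries_w_avg_list.
countries_w_avg_list = [
    ("Kingdom", 8.65),
    ("Kingdom", 7.89),
    ("Netherlands", 9.0),
    ("Kingdom", 4.35),
    ("Netherlands", 8.0),
    ("Kingdom", 6.98),
]

# A missing key is created on first access as [0.0, 0], i.e. [running sum, count],
# so the explicit "if country not in d" check is no longer needed.
totals = defaultdict(lambda: [0.0, 0])
for country, individual_average in countries_w_avg_list:
    totals[country][0] += individual_average
    totals[country][1] += 1

# Turn each [sum, count] pair into a mean rounded to two decimal places.
averages = {
    country: round(total / count, 2)
    for country, (total, count) in totals.items()
}
print(averages)
```

The factory is a lambda so that every country gets its own fresh list; sharing one list object between keys would mix the sums together.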










answered Jun 26 at 13:39 by Graipher (edited Jun 26 at 14:07)
