Analyzing the U.S. Births dataset in Python

I'm a data science newbie looking to improve my code. I want to calculate the following:

  • The total number of births in each month

  • The total number of births on each day of the week

Sample dataset from CSV:




year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
...



It led me into this implementation:



def weekly_births(lst):
    mon = birth_counter(lst, 3, 1, 4)
    tue = birth_counter(lst, 3, 2, 4)
    wed = birth_counter(lst, 3, 3, 4)
    thu = birth_counter(lst, 3, 4, 4)
    fri = birth_counter(lst, 3, 5, 4)
    sat = birth_counter(lst, 3, 6, 4)
    sun = birth_counter(lst, 3, 7, 4)

    births_per_week = {
        1: mon,
        2: tue,
        3: wed,
        4: thu,
        5: fri,
        6: sat,
        7: sun
    }

    return births_per_week

def monthly_births(lst):
    jan_births = birth_counter(lst, 1, 1, 4)
    feb_births = birth_counter(lst, 1, 2, 4)
    mar_births = birth_counter(lst, 1, 3, 4)
    apr_births = birth_counter(lst, 1, 4, 4)
    may_births = birth_counter(lst, 1, 5, 4)
    jun_births = birth_counter(lst, 1, 6, 4)
    jul_births = birth_counter(lst, 1, 7, 4)
    aug_births = birth_counter(lst, 1, 8, 4)
    sep_births = birth_counter(lst, 1, 9, 4)
    oct_births = birth_counter(lst, 1, 10, 4)
    nov_births = birth_counter(lst, 1, 11, 4)
    dec_births = birth_counter(lst, 1, 12, 4)

    births_per_month = {
        1: jan_births,
        2: feb_births,
        3: mar_births,
        4: apr_births,
        5: may_births,
        6: jun_births,
        7: jul_births,
        8: aug_births,
        9: sep_births,
        10: oct_births,
        11: nov_births,
        12: dec_births
    }

    return births_per_month


The birth_counter function:



def birth_counter(lst, index, head, tail):
    sum = 0
    for each in lst:
        if each[index] == head:
            sum = sum + each[tail]
    return sum


The parameters:




  • lst - The dataset, as a list of rows

  • index - The column index to filter on

  • head - The value that the column at index is compared against

  • tail - The column index whose values are summed

Example usage:



# column index:  [0]  [1] [2] [3]  [4]
# each row:     [1994, 1,  1,  6, 8096]
sample_births = birth_counter(lst, 1, 1, 4)

# For each row in lst: if row[1] == 1, add row[4] (here 8096) to the sum.


Questions regarding weekly_births and monthly_births:



  1. As you can see, I entered every weekday and month by hand and then calculated the total births for each. Is there a way to iterate over the weekdays and months to avoid such lengthy code?






edited Jan 27 at 14:23 by 200_success
asked Jan 27 at 6:00 by Yodism




















1 Answer (accepted)










If you want to do data analysis in Python, you should learn about numpy and pandas. The former implements efficient numeric calculations on whole arrays. The latter builds on numpy and introduces the DataFrame, which is a bit like a table that can be manipulated in many ways: you can sort it by one or more columns, transform columns, and group rows by one or more columns and perform operations on each group (which is what you want to do here).
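As a rough illustration of those three operations, here is a minimal sketch on the five sample rows from the question. The rows are inlined through io.StringIO so the snippet runs without the CSV file, and the names sample_csv, by_births, with_thousands and month_stats are made up for this example.

```python
import io

import pandas as pd

# The five sample rows from the question, inlined so the sketch is self-contained.
sample_csv = """year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
"""

# skipinitialspace=True strips the space that follows each comma.
df = pd.read_csv(io.StringIO(sample_csv), skipinitialspace=True)

# Sort: order the rows by a column, largest number of births first.
by_births = df.sort_values("births", ascending=False)

# Transform: derive a new column from an existing one.
with_thousands = df.assign(births_thousands=df["births"] / 1000)

# Group: split the rows by month and compute several aggregates per group.
month_stats = df.groupby("month")["births"].agg(["sum", "mean"])

print(by_births)
print(with_thousands)
print(month_stats)
```

None of these calls modify df; each returns a new object, so the steps can be tried independently.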



          Your current code boils down to very few lines with pandas:



import pandas as pd

# skipinitialspace=True strips the space that follows each comma in the CSV
df = pd.read_csv("us_birth_statistics.csv", skipinitialspace=True)
birth_per_month = df.groupby("month").births.sum()
birth_per_weekday = df.groupby("day_of_week").births.sum()

print(birth_per_month)
print()
print(birth_per_weekday)

# Output for the five sample rows from the question:
#
# month
# 1    48311
# Name: births, dtype: int64
#
# day_of_week
# 1    10142
# 2    11248
# 3    11053
# 6     8096
# 7     7772
# Name: births, dtype: int64
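To answer the literal question without pandas: the hand-written calls can also be replaced by a single loop in plain Python. The following is only a sketch, assuming lst is a list of integer rows in the column order shown in the question; the helper total_births_by and the variable sample_rows are hypothetical names, not part of the original code.

```python
from collections import defaultdict


def total_births_by(rows, key_index, value_index=4):
    """Sum the value column of every row, grouped by the key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# The five sample rows from the question:
# year, month, date_of_month, day_of_week, births
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248],
    [1994, 1, 5, 3, 11053],
]

births_per_month = total_births_by(sample_rows, 1)
births_per_weekday = total_births_by(sample_rows, 3)

print(births_per_month)    # {1: 48311}
print(births_per_weekday)  # {6: 8096, 7: 7772, 1: 10142, 2: 11248, 3: 11053}
```

This walks the data once per grouping instead of once per month or weekday, and only keys that actually occur in the data appear in the result. If the existing birth_counter should be kept, a dict comprehension such as {day: birth_counter(lst, 3, day, 4) for day in range(1, 8)} removes the repetition as well.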





          • Wow! I didn't know about that. What a good time to be alive. Thanks @Graipher!
            – Yodism
            Jan 28 at 1:11










edited Jan 28 at 11:28
answered Jan 27 at 14:13 by Graipher










