Analyzing the U.S. Births dataset in Python

I'm new to data science and am looking to improve my code. I am trying to calculate the following totals:

The total number of births in each month

The total number of births on each day of the week

Sample dataset from CSV:

year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
...

This led me to the following implementation:

def weekly_births(lst):
    mon = birth_counter(lst, 3, 1, 4)
    tue = birth_counter(lst, 3, 2, 4)
    wed = birth_counter(lst, 3, 3, 4)
    thu = birth_counter(lst, 3, 4, 4)
    fri = birth_counter(lst, 3, 5, 4)
    sat = birth_counter(lst, 3, 6, 4)
    sun = birth_counter(lst, 3, 7, 4)

    births_per_week = {
        1: mon,
        2: tue,
        3: wed,
        4: thu,
        5: fri,
        6: sat,
        7: sun
    }

    return births_per_week

def monthly_births(lst):

    jan_births = birth_counter(lst, 1, 1, 4)
    feb_births = birth_counter(lst, 1, 2, 4)
    mar_births = birth_counter(lst, 1, 3, 4)
    apr_births = birth_counter(lst, 1, 4, 4)
    may_births = birth_counter(lst, 1, 5, 4)
    jun_births = birth_counter(lst, 1, 6, 4)
    jul_births = birth_counter(lst, 1, 7, 4)
    aug_births = birth_counter(lst, 1, 8, 4)
    sep_births = birth_counter(lst, 1, 9, 4)
    oct_births = birth_counter(lst, 1, 10, 4)
    nov_births = birth_counter(lst, 1, 11, 4)
    dec_births = birth_counter(lst, 1, 12, 4)

    births_per_month = {
        1: jan_births,
        2: feb_births,
        3: mar_births,
        4: apr_births,
        5: may_births,
        6: jun_births,
        7: jul_births,
        8: aug_births,
        9: sep_births,
        10: oct_births,
        11: nov_births,
        12: dec_births
    }

    return births_per_month

The birth_counter function:

def birth_counter(lst, index, head, tail):
    sum = 0
    for each in lst:
        if each[index] == head:
            sum = sum + each[tail]
    return sum

The parameters:

lst - The dataset, as a list of rows

index - The column index within each row to compare

head - The value that the entry at index is compared against

tail - The column index of the value to add to the total

Example usage:

# columns: [0]  [1] [2] [3] [4]
lst = [
    [1994, 1, 1, 6, 8096],
    # ...
]
sample_births = birth_counter(lst, 1, 1, 4)

# For each row: if row[1] == 1, add row[4] (8096 for the row above) to the total

Question regarding weekly_births and monthly_births:

As you can see, I hard-coded every day of the week and every month, then calculated the total births for each one. Is there a way to iterate over the days and months to avoid such lengthy code?

edited Jan 27 at 14:23 by 200_success

asked Jan 27 at 6:00 by Yodism

1 Answer (accepted)

If you want to do data analysis in Python, you should learn about numpy and pandas. The former implements efficient numeric calculations on whole arrays. The latter builds on numpy and introduces the DataFrame, which is a bit like a table that can be manipulated in many ways. You can sort it by one or more columns, you can transform columns, and you can even group the rows by one or more columns and perform operations on the groups (which is what you want to do here).
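To make those three operations concrete, here is a minimal sketch that builds a DataFrame inline from the five sample rows in the question, so no CSV file is needed. The `births_thousands` column and the variable names are only illustrative assumptions, not part of the original code.

```python
import pandas as pd

# The five sample rows from the question, with the CSV header as column names.
df = pd.DataFrame(
    [
        [1994, 1, 1, 6, 8096],
        [1994, 1, 2, 7, 7772],
        [1994, 1, 3, 1, 10142],
        [1994, 1, 4, 2, 11248],
        [1994, 1, 5, 3, 11053],
    ],
    columns=["year", "month", "date_of_month", "day_of_week", "births"],
)

# Sort: rows ordered by the births column, largest first.
by_births = df.sort_values("births", ascending=False)

# Transform: derive a new column from an existing one
# (births_thousands is a hypothetical name used only for illustration).
df["births_thousands"] = df["births"] / 1000

# Group: split the rows by month and aggregate each group.
largest_per_month = df.groupby("month")["births"].max()

print(by_births)
print(largest_per_month)
```

Each call returns a new object rather than changing `df` in place; only the column assignment modifies `df` itself.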

Your current code boils down to very few lines with pandas:

import pandas as pd

df = pd.read_csv("us_birth_statistics.csv", skipinitialspace=True)
birth_per_month = df.groupby("month").births.sum()
birth_per_weekday = df.groupby("day_of_week").births.sum()

print(birth_per_month)
print()
print(birth_per_weekday)

#month
#1 48311
#Name: births, dtype: int64

#day_of_week
#1 10142
#2 11248
#3 11053
#6 8096
#7 7772
#Name: births, dtype: int64
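If you would rather stay with plain lists for now, the repetition in weekly_births and monthly_births can also be removed with a loop. The following is only a sketch, assuming the rows are lists of integers in the column order shown in the question. It keeps your function names but builds each dictionary with a dict comprehension over the day and month numbers:

```python
def birth_counter(lst, index, head, tail):
    """Sum column `tail` over the rows whose column `index` equals `head`."""
    return sum(row[tail] for row in lst if row[index] == head)


def weekly_births(lst):
    # Days of the week are numbered 1 to 7 in column 3.
    return {day: birth_counter(lst, 3, day, 4) for day in range(1, 8)}


def monthly_births(lst):
    # Months are numbered 1 to 12 in column 1.
    return {month: birth_counter(lst, 1, month, 4) for month in range(1, 13)}


# The five sample rows from the question.
sample = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248],
    [1994, 1, 5, 3, 11053],
]

print(weekly_births(sample))
print(monthly_births(sample))
```

Unlike the groupby result, this version reports a total of 0 for days and months that do not occur in the data. It also scans the whole list once per key, which is fine for a dataset of this size.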

edited Jan 28 at 11:28

answered Jan 27 at 14:13 by Graipher

Wow! I didn't know about that. What a good time to be alive. Thanks @Graipher!
– Yodism, Jan 28 at 1:11
