Analyzing the U.S. Births dataset in Python
I'm a data science newbie looking to improve my code. I am trying to calculate the following:
- The total number of births in each month
- The total number of births on each day of the week
Sample dataset from CSV:
year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
...
This led me to the following implementation:
    def weekly_births(lst):
        mon = birth_counter(lst, 3, 1, 4)
        tue = birth_counter(lst, 3, 2, 4)
        wed = birth_counter(lst, 3, 3, 4)
        thu = birth_counter(lst, 3, 4, 4)
        fri = birth_counter(lst, 3, 5, 4)
        sat = birth_counter(lst, 3, 6, 4)
        sun = birth_counter(lst, 3, 7, 4)
        births_per_week = {
            1: mon,
            2: tue,
            3: wed,
            4: thu,
            5: fri,
            6: sat,
            7: sun
        }
        return births_per_week

    def monthly_births(lst):
        jan_births = birth_counter(lst, 1, 1, 4)
        feb_births = birth_counter(lst, 1, 2, 4)
        mar_births = birth_counter(lst, 1, 3, 4)
        apr_births = birth_counter(lst, 1, 4, 4)
        may_births = birth_counter(lst, 1, 5, 4)
        jun_births = birth_counter(lst, 1, 6, 4)
        jul_births = birth_counter(lst, 1, 7, 4)
        aug_births = birth_counter(lst, 1, 8, 4)
        sep_births = birth_counter(lst, 1, 9, 4)
        oct_births = birth_counter(lst, 1, 10, 4)
        nov_births = birth_counter(lst, 1, 11, 4)
        dec_births = birth_counter(lst, 1, 12, 4)
        births_per_month = {
            1: jan_births,
            2: feb_births,
            3: mar_births,
            4: apr_births,
            5: may_births,
            6: jun_births,
            7: jul_births,
            8: aug_births,
            9: sep_births,
            10: oct_births,
            11: nov_births,
            12: dec_births
        }
        return births_per_month
The birth_counter function:
    def birth_counter(lst, index, head, tail):
        sum = 0
        for each in lst:
            if each[index] == head:
                sum = sum + each[tail]
        return sum
The parameters:
- lst: the dataset, as a list of rows
- index: the index of the column in each row that is compared
- head: the value that the column at index is compared against
- tail: the index of the column whose values are summed
Example usage:
    # column index: [0]   [1] [2] [3] [4]
    # one row:      [1994, 1,  1,  6,  8096]
    sample_births = birth_counter(lst, 1, 1, 4)
    # For every row in lst: if row[1] == 1, add row[4] (8096 for the row above) to the total.
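To make the example above concrete, here is a minimal runnable sketch that calls birth_counter on the five sample rows from the CSV excerpt. It assumes the rows have already been parsed into lists of integers; the name sample_rows is only for illustration, and the accumulator is renamed to total so that it does not shadow the built-in sum.

```python
def birth_counter(lst, index, head, tail):
    """Sum column `tail` over the rows whose column `index` equals `head`."""
    total = 0
    for each in lst:
        if each[index] == head:
            total = total + each[tail]
    return total


# The five sample rows from the CSV excerpt, parsed into lists of ints
# (columns: year, month, date_of_month, day_of_week, births).
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248],
    [1994, 1, 5, 3, 11053],
]

# Month is column 1 and births is column 4: total births in month 1.
january_births = birth_counter(sample_rows, 1, 1, 4)
print(january_births)  # 48311

# Day of week is column 3: total births on day 6.
saturday_births = birth_counter(sample_rows, 3, 6, 4)
print(saturday_births)  # 8096
```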
Question regarding weekly_births and monthly_births:
- As you can see, I manually listed every weekday and every month and then calculated the total births for each one. Is there a way to iterate over the weekdays and months to avoid such lengthy code?
python python-3.x datetime csv statistics
edited Jan 27 at 14:23
200_success
asked Jan 27 at 6:00
Yodism
1 Answer
If you want to do data analysis in Python, you should learn about numpy and pandas. The former implements efficient numeric calculations (on whole arrays). The latter uses numpy and introduces a DataFrame, which is a bit like a table that can be manipulated in many ways. You can sort it by some column(s), you can transform them, and you can even group them by some column(s) and perform operations on the groups (which is what you want to do here).
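As a rough illustration of the three operations just mentioned (sort, transform, group), here is a small sketch on the five sample rows from the question. The inline sample_csv string, the day_type column, and the reading of day_of_week 6 and 7 as weekend days (taken from the sat/sun mapping in weekly_births) are assumptions made for this example only.

```python
import io

import pandas as pd

# Stand-in for the CSV file: the five sample rows from the question.
sample_csv = """year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
"""

df = pd.read_csv(io.StringIO(sample_csv), skipinitialspace=True)

# Sort: rows ordered by number of births, largest first.
by_births = df.sort_values("births", ascending=False)

# Transform: derive a new column from an existing one.
# Assumption: day_of_week 6 and 7 are Saturday and Sunday.
df["day_type"] = df["day_of_week"].map(lambda d: "weekend" if d >= 6 else "weekday")

# Group: total births per derived category.
births_by_day_type = df.groupby("day_type")["births"].sum()

print(int(by_births.iloc[0]["births"]))  # 11248
print(births_by_day_type.to_dict())      # {'weekday': 32443, 'weekend': 15868}
```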
Your current code boils down to very few lines with pandas:
    import pandas as pd

    df = pd.read_csv("us_birth_statistics.csv", skipinitialspace=True)

    birth_per_month = df.groupby("month").births.sum()
    birth_per_weekday = df.groupby("day_of_week").births.sum()

    print(birth_per_month)
    print()
    print(birth_per_weekday)

    # month
    # 1    48311
    # Name: births, dtype: int64

    # day_of_week
    # 1    10142
    # 2    11248
    # 3    11053
    # 6     8096
    # 7     7772
    # Name: births, dtype: int64
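If pandas is not an option, the repetition in weekly_births and monthly_births can also be removed in plain Python. The following is only a sketch and not part of the original answer: the helper name total_births_by and the inline sample_csv string are assumptions made for illustration, and the data is limited to the five sample rows from the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the CSV file: the five sample rows from the question.
sample_csv = """year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
"""


def total_births_by(rows, key):
    """Sum the births column for each distinct value of column `key`, in one pass."""
    totals = defaultdict(int)
    for row in rows:
        totals[int(row[key])] += int(row["births"])
    return dict(totals)


# skipinitialspace=True drops the blank that follows each comma.
rows = list(csv.DictReader(io.StringIO(sample_csv), skipinitialspace=True))

births_per_month = total_births_by(rows, "month")
births_per_weekday = total_births_by(rows, "day_of_week")

print(births_per_month)    # {1: 48311}
print(births_per_weekday)  # {6: 8096, 7: 7772, 1: 10142, 2: 11248, 3: 11053}
```

The original birth_counter could also be kept and called from a dict comprehension such as {day: birth_counter(lst, 3, day, 4) for day in range(1, 8)}, at the cost of one pass over the data per key.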
Wow! I didn't know about that. What a good time to be alive. Thanks @Graipher!
- Yodism, Jan 28 at 1:11
edited Jan 28 at 11:28
answered Jan 27 at 14:13
Graipher