Handling periods ('.') in CSV column names with Pandas













I'm reading and processing a fairly large csv using Pandas and Python 3.7. Header names in the CSV have periods in them ('full stops', Britons say). That's a problem when you want to address data cells by column name.



test.csv:



"name","birth.place","not.important"
"John","",""
"Paul","Liverpool","blue"




# -*- coding: utf-8 -*-

import pandas as pd

infile = 'test.csv'
useful_cols = ['name', 'birth.place']
df = pd.read_csv(infile, usecols=useful_cols, encoding='utf-8-sig', engine='python')

# replace '.' by '_' (regex=False so that '.' is treated as a literal character)
df.columns = df.columns.str.replace('.', '_', regex=False)

# we may want to iterate over useful_cols later, so to keep things consistent:
useful_cols = [s.replace('.', '_') for s in useful_cols]

# now we can do this..
print(df['birth_place'])

# ... and this
for row in df.itertuples():
    print(row.birth_place)

# ain't that nice?


It works, but since Pandas is such a powerful library and the use case is quite common, I'm wondering if there isn't an even better way of doing this.




























  • This looks a bit short on code, how do you use this, as I may recommend this. However if you use it differently, then I wouldn't.
    – Peilonrayz
    Jul 19 at 0:24










  • @Peilonrayz I chose Pandas over the csv library, because it has all these powerful features that I'm keen to explore. In a world without Pandas, I'd have certainly gone for csv.
    – RolfBly
    Jul 19 at 8:05
















edited Jul 19 at 6:06 by Jamal♦
asked Jul 18 at 19:24 by RolfBly





















1 Answer













Did a little digging and found that when itertuples() cannot use a column name as an attribute name (in this example because it contains a "."), you can still reach the value as row._N, where N is the field's position in the row tuple.



I am sure you already know that you could just do df['birth.place'], since the name is inside a string; however, it becomes tricky for row.birth.place, as you mentioned. For that you can do the following:



for row in df.itertuples():
    print(row._2)


The _2 corresponds to the position of the column whose name pandas could not use as an attribute. The field is renamed to an underscore followed by its position in the row tuple (the index is at position 0, name at 1, birth.place at 2). Note that this renaming only happens when the column name is not a valid attribute name (i.e. row.name is still row.name, and you cannot use row._1 in place of it). Hope that helps! Happy pythoning!
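To make the renaming visible, here is a minimal sketch that loads the question's sample data from an in-memory string and inspects the field names of the row tuples. The io.StringIO wrapper and the variable names (csv_text, rows, fields, second_birth_place) are my own assumptions for illustration, not part of the original post.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv (assumption).
csv_text = (
    '"name","birth.place","not.important"\n'
    '"John","",""\n'
    '"Paul","Liverpool","blue"\n'
)

df = pd.read_csv(io.StringIO(csv_text), usecols=['name', 'birth.place'])

rows = list(df.itertuples())

# Each row is a namedtuple; _fields lists the attribute names it exposes.
fields = rows[0]._fields
print(fields)

# 'birth.place' is not a valid attribute name, so it is exposed by position.
second_birth_place = rows[1]._2
print(second_birth_place)
```

Printing _fields first is a quick way to check which columns were renamed before relying on a positional name such as _2.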





























  • Thanks. I didn't mention what I had already found out: df['birth.place'] only works on entire columns, not on cells. getattr(row, 'birth.place') doesn't work because the column is renamed, and row.birth.place raises an error: no attribute 'birth'.
    – RolfBly
    Jul 19 at 8:15










  • Right, getattr() would work the same then. You would say getattr(row, "_2"), but this is equivalent to saying row._2
    – PydPiper
    Jul 19 at 13:49
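If hard-coding _2 feels brittle, one possible sketch is to derive the position from df.columns rather than typing it in. This assumes the column labels are unique (so that get_loc returns a single integer) and that itertuples() is called with its default index=True, which is why 1 is added. The names pos, by_position and by_getattr are hypothetical.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv (assumption).
csv_text = (
    '"name","birth.place","not.important"\n'
    '"John","",""\n'
    '"Paul","Liverpool","blue"\n'
)
df = pd.read_csv(io.StringIO(csv_text), usecols=['name', 'birth.place'])

# Position of the column in the row tuple: +1 because field 0 is the index.
pos = df.columns.get_loc('birth.place') + 1

# Row tuples are ordinary tuples, so plain positional indexing works ...
by_position = [row[pos] for row in df.itertuples()]

# ... and so does getattr with the positional name pandas generated.
by_getattr = [getattr(row, f'_{pos}') for row in df.itertuples()]

print(by_position[1], by_getattr[1])
```

Positional indexing has the advantage that it works whether or not the column was renamed, whereas the getattr variant only applies to columns whose names could not be used as attributes.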













edited Jul 19 at 7:39 by Graipher
answered Jul 19 at 4:22 by PydPiper










