Determine file provider based on column headers, replace columns and process data
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
2
down vote
favorite
Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,
Customer1Data = 'User':['BobJones','BobJones','BobJones'],
'UserSong':['Song1','Song2','Song3'],
'UserSplit':[0.50,0.25,0.50],
'UserEarnings':[100,200,300]
Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
'SongName':['Song1','Song2','Song3'],
'SongSplit':[0.50,0.25,0.50],
'SongEarnings':[100,200,300]
I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.
from collections import namedtuple
_customers = namedtuple('CustomerName',['statement_headers','first_column'])
Customer1 = _customers(statement_headers = 'User':'Name',
'UserSong':'Song',
'UserSplit':'Split',
'UserEarnings' :'Earnings',
first_column = 'User')
Customer2 = _customers(statement_headers = 'Name':'Name',
'SongName':'Song',
'SongSplit':'Split',
'SongEarnings':'Earnings',
first_column = 'Name')
I then create a dictionary with the first column from each customers csv file:
_customertypes = 'Customer1':Customer1,'Customer2':Customer2
def _create_first_col_dictionary():
first_col =
for k,v in _customertypes.items():
first_col[v.first_column] = k
return first_col
Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict
import pandas as pd
csv_dict = k: for k in _customertypes.keys()
files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
for file in files:
x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
col = x.columns.tolist()[0]
file_type = _first_col[col]
csv_dict[file_type].append(file)
Now I am left with this:
csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
'Customer2':[r'C:locationcsv2.csv']
Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.
I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).
python python-3.x csv pandas
add a comment |Â
up vote
2
down vote
favorite
Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,
Customer1Data = 'User':['BobJones','BobJones','BobJones'],
'UserSong':['Song1','Song2','Song3'],
'UserSplit':[0.50,0.25,0.50],
'UserEarnings':[100,200,300]
Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
'SongName':['Song1','Song2','Song3'],
'SongSplit':[0.50,0.25,0.50],
'SongEarnings':[100,200,300]
I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.
from collections import namedtuple
_customers = namedtuple('CustomerName',['statement_headers','first_column'])
Customer1 = _customers(statement_headers = 'User':'Name',
'UserSong':'Song',
'UserSplit':'Split',
'UserEarnings' :'Earnings',
first_column = 'User')
Customer2 = _customers(statement_headers = 'Name':'Name',
'SongName':'Song',
'SongSplit':'Split',
'SongEarnings':'Earnings',
first_column = 'Name')
I then create a dictionary with the first column from each customers csv file:
_customertypes = 'Customer1':Customer1,'Customer2':Customer2
def _create_first_col_dictionary():
first_col =
for k,v in _customertypes.items():
first_col[v.first_column] = k
return first_col
Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict
import pandas as pd
csv_dict = k: for k in _customertypes.keys()
files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
for file in files:
x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
col = x.columns.tolist()[0]
file_type = _first_col[col]
csv_dict[file_type].append(file)
Now I am left with this:
csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
'Customer2':[r'C:locationcsv2.csv']
Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.
I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).
python python-3.x csv pandas
add a comment |Â
up vote
2
down vote
favorite
up vote
2
down vote
favorite
Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,
Customer1Data = 'User':['BobJones','BobJones','BobJones'],
'UserSong':['Song1','Song2','Song3'],
'UserSplit':[0.50,0.25,0.50],
'UserEarnings':[100,200,300]
Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
'SongName':['Song1','Song2','Song3'],
'SongSplit':[0.50,0.25,0.50],
'SongEarnings':[100,200,300]
I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.
from collections import namedtuple
_customers = namedtuple('CustomerName',['statement_headers','first_column'])
Customer1 = _customers(statement_headers = 'User':'Name',
'UserSong':'Song',
'UserSplit':'Split',
'UserEarnings' :'Earnings',
first_column = 'User')
Customer2 = _customers(statement_headers = 'Name':'Name',
'SongName':'Song',
'SongSplit':'Split',
'SongEarnings':'Earnings',
first_column = 'Name')
I then create a dictionary with the first column from each customers csv file:
_customertypes = 'Customer1':Customer1,'Customer2':Customer2
def _create_first_col_dictionary():
first_col =
for k,v in _customertypes.items():
first_col[v.first_column] = k
return first_col
Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict
import pandas as pd
csv_dict = k: for k in _customertypes.keys()
files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
for file in files:
x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
col = x.columns.tolist()[0]
file_type = _first_col[col]
csv_dict[file_type].append(file)
Now I am left with this:
csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
'Customer2':[r'C:locationcsv2.csv']
Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.
I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).
python python-3.x csv pandas
Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,
Customer1Data = 'User':['BobJones','BobJones','BobJones'],
'UserSong':['Song1','Song2','Song3'],
'UserSplit':[0.50,0.25,0.50],
'UserEarnings':[100,200,300]
Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
'SongName':['Song1','Song2','Song3'],
'SongSplit':[0.50,0.25,0.50],
'SongEarnings':[100,200,300]
I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.
from collections import namedtuple
_customers = namedtuple('CustomerName',['statement_headers','first_column'])
Customer1 = _customers(statement_headers = 'User':'Name',
'UserSong':'Song',
'UserSplit':'Split',
'UserEarnings' :'Earnings',
first_column = 'User')
Customer2 = _customers(statement_headers = 'Name':'Name',
'SongName':'Song',
'SongSplit':'Split',
'SongEarnings':'Earnings',
first_column = 'Name')
I then create a dictionary with the first column from each customers csv file:
_customertypes = 'Customer1':Customer1,'Customer2':Customer2
def _create_first_col_dictionary():
first_col =
for k,v in _customertypes.items():
first_col[v.first_column] = k
return first_col
Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict
import pandas as pd
csv_dict = k: for k in _customertypes.keys()
files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
for file in files:
x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
col = x.columns.tolist()[0]
file_type = _first_col[col]
csv_dict[file_type].append(file)
Now I am left with this:
csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
'Customer2':[r'C:locationcsv2.csv']
Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.
I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).
python python-3.x csv pandas
edited Jun 5 at 3:01
Billal BEGUERADJ
1
1
asked May 14 at 15:55
chris dorn
12815
12815
add a comment |Â
add a comment |Â
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194369%2fdetermine-file-provider-based-on-column-headers-replace-columns-and-process-dat%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password