Determine file provider based on column headers, replace columns and process data

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
2
down vote

favorite












Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,



Customer1Data = 'User':['BobJones','BobJones','BobJones'],
'UserSong':['Song1','Song2','Song3'],
'UserSplit':[0.50,0.25,0.50],
'UserEarnings':[100,200,300]

Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
'SongName':['Song1','Song2','Song3'],
'SongSplit':[0.50,0.25,0.50],
'SongEarnings':[100,200,300]


I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.



from collections import namedtuple
_customers = namedtuple('CustomerName',['statement_headers','first_column'])
Customer1 = _customers(statement_headers = 'User':'Name',
'UserSong':'Song',
'UserSplit':'Split',
'UserEarnings' :'Earnings',
first_column = 'User')

Customer2 = _customers(statement_headers = 'Name':'Name',
'SongName':'Song',
'SongSplit':'Split',
'SongEarnings':'Earnings',
first_column = 'Name')


I then create a dictionary with the first column from each customers csv file:



_customertypes = 'Customer1':Customer1,'Customer2':Customer2
def _create_first_col_dictionary():
first_col =
for k,v in _customertypes.items():
first_col[v.first_column] = k
return first_col


Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict



import pandas as pd
csv_dict = k: for k in _customertypes.keys()
files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
for file in files:
x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
col = x.columns.tolist()[0]
file_type = _first_col[col]
csv_dict[file_type].append(file)


Now I am left with this:



csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
'Customer2':[r'C:locationcsv2.csv']


Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.



I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).







share|improve this question



























    up vote
    2
    down vote

    favorite












    Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,



    Customer1Data = 'User':['BobJones','BobJones','BobJones'],
    'UserSong':['Song1','Song2','Song3'],
    'UserSplit':[0.50,0.25,0.50],
    'UserEarnings':[100,200,300]

    Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
    'SongName':['Song1','Song2','Song3'],
    'SongSplit':[0.50,0.25,0.50],
    'SongEarnings':[100,200,300]


    I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.



    from collections import namedtuple
    _customers = namedtuple('CustomerName',['statement_headers','first_column'])
    Customer1 = _customers(statement_headers = 'User':'Name',
    'UserSong':'Song',
    'UserSplit':'Split',
    'UserEarnings' :'Earnings',
    first_column = 'User')

    Customer2 = _customers(statement_headers = 'Name':'Name',
    'SongName':'Song',
    'SongSplit':'Split',
    'SongEarnings':'Earnings',
    first_column = 'Name')


    I then create a dictionary with the first column from each customers csv file:



    _customertypes = 'Customer1':Customer1,'Customer2':Customer2
    def _create_first_col_dictionary():
    first_col =
    for k,v in _customertypes.items():
    first_col[v.first_column] = k
    return first_col


    Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict



    import pandas as pd
    csv_dict = k: for k in _customertypes.keys()
    files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
    for file in files:
    x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
    col = x.columns.tolist()[0]
    file_type = _first_col[col]
    csv_dict[file_type].append(file)


    Now I am left with this:



    csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
    'Customer2':[r'C:locationcsv2.csv']


    Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.



    I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).







    share|improve this question























      up vote
      2
      down vote

      favorite









      up vote
      2
      down vote

      favorite











      Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,



      Customer1Data = 'User':['BobJones','BobJones','BobJones'],
      'UserSong':['Song1','Song2','Song3'],
      'UserSplit':[0.50,0.25,0.50],
      'UserEarnings':[100,200,300]

      Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
      'SongName':['Song1','Song2','Song3'],
      'SongSplit':[0.50,0.25,0.50],
      'SongEarnings':[100,200,300]


      I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.



      from collections import namedtuple
      _customers = namedtuple('CustomerName',['statement_headers','first_column'])
      Customer1 = _customers(statement_headers = 'User':'Name',
      'UserSong':'Song',
      'UserSplit':'Split',
      'UserEarnings' :'Earnings',
      first_column = 'User')

      Customer2 = _customers(statement_headers = 'Name':'Name',
      'SongName':'Song',
      'SongSplit':'Split',
      'SongEarnings':'Earnings',
      first_column = 'Name')


      I then create a dictionary with the first column from each customers csv file:



      _customertypes = 'Customer1':Customer1,'Customer2':Customer2
      def _create_first_col_dictionary():
      first_col =
      for k,v in _customertypes.items():
      first_col[v.first_column] = k
      return first_col


      Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict



      import pandas as pd
      csv_dict = k: for k in _customertypes.keys()
      files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
      for file in files:
      x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
      col = x.columns.tolist()[0]
      file_type = _first_col[col]
      csv_dict[file_type].append(file)


      Now I am left with this:



      csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
      'Customer2':[r'C:locationcsv2.csv']


      Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.



      I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).







      share|improve this question













      Each day I receive files from different customers relaying the same information, but the customers use proprietary formats. For example,



      Customer1Data = 'User':['BobJones','BobJones','BobJones'],
      'UserSong':['Song1','Song2','Song3'],
      'UserSplit':[0.50,0.25,0.50],
      'UserEarnings':[100,200,300]

      Customer2Data = 'Name':['BobJones','BobJones','BobJones'],
      'SongName':['Song1','Song2','Song3'],
      'SongSplit':[0.50,0.25,0.50],
      'SongEarnings':[100,200,300]


      I would like to be able to read in whatever CSV file is present in the download directory, replace the column names and then do summations based on replaced column names. Right now, I do this.



      from collections import namedtuple
      _customers = namedtuple('CustomerName',['statement_headers','first_column'])
      Customer1 = _customers(statement_headers = 'User':'Name',
      'UserSong':'Song',
      'UserSplit':'Split',
      'UserEarnings' :'Earnings',
      first_column = 'User')

      Customer2 = _customers(statement_headers = 'Name':'Name',
      'SongName':'Song',
      'SongSplit':'Split',
      'SongEarnings':'Earnings',
      first_column = 'Name')


      I then create a dictionary with the first column from each customers csv file:



      _customertypes = 'Customer1':Customer1,'Customer2':Customer2
      def _create_first_col_dictionary():
      first_col =
      for k,v in _customertypes.items():
      first_col[v.first_column] = k
      return first_col


      Create a dictionary to hold the csv based on the file type, read in the header row and assign to csv_dict



      import pandas as pd
      csv_dict = k: for k in _customertypes.keys()
      files = [r'C:locationcsv1.csv',r'C:locationcsv2.csv',r'C:locationcsv3.csv']
      for file in files:
      x = pd.read_csv(file,nrows = 2,encoding = 'LATIN-1')
      col = x.columns.tolist()[0]
      file_type = _first_col[col]
      csv_dict[file_type].append(file)


      Now I am left with this:



      csv_dict = 'Customer1':[r'C:locationcsv1.csv',r'C:locationcsv3.csv'],
      'Customer2':[r'C:locationcsv2.csv']


      Finally, I merge the like csv files together for Customer1 and Customer2, then replace column names using the dictionary above and do analysis.



      I am curious if anyone has better suggestions on how to do something like this. I think it could require a level of maintenance (adding new columns to the dictionaries, adding the first_column name, etc).









      share|improve this question












      share|improve this question




      share|improve this question








      edited Jun 5 at 3:01









      Billal BEGUERADJ

      1




      1









      asked May 14 at 15:55









      chris dorn

      12815




      12815

























          active

          oldest

          votes











          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194369%2fdetermine-file-provider-based-on-column-headers-replace-columns-and-process-dat%23new-answer', 'question_page');

          );

          Post as a guest



































          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes










           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194369%2fdetermine-file-provider-based-on-column-headers-replace-columns-and-process-dat%23new-answer', 'question_page');

          );

          Post as a guest













































































          Popular posts from this blog

          Chat program with C++ and SFML

          Function to Return a JSON Like Objects Using VBA Collections and Arrays

          Will my employers contract hold up in court?