Update the idx values if vals element match between consecutive rows


I want to update the numbers in the idx field if there is any matching letter between the vals of two consecutive rows.



Input data:

    data = '''pos\tidx\tvals
    23\t4\tabc
    25\t7\tatg
    29\t8\tctb
    35\t1\txyz
    37\t2\tmno
    39\t3\tpqr
    41\t6\trtu
    45\t5\tlfg'''


Explanation: Since the letter 'a' matches between the vals at idx 4 and idx 7, the idx at pos 25 is updated to 4. Because 't' also matches between the vals at pos 25 and pos 29, the idx at pos 29 is also updated to 4 instead of just 7.



Expected output to a file (tab separated):

    pos idx vals
    23 4 abc
    25 4 atg
    29 4 ctb
    35 1 xyz
    37 2 mno
    39 3 pqr
    41 3 rtu
    45 5 lfg


I have written the working code below so far, and would also like to:

  • write the expected output to a file

  • optimize the code for the work I am doing.

  • keep my method of reading two consecutive rows at a time as (key, value) pairs, in order. This question is a trial run for another problem I am trying to solve, so the answer has to follow that method. Other parts of the code can be optimized in any Pythonic way.

Code:

    import csv
    import itertools
    import collections
    import io
    from itertools import islice

    data = '''pos\tidx\tvals
    23\t4\tabc
    25\t7\tatg
    29\t8\tctb
    35\t1\txyz
    37\t2\tmno
    39\t3\tpqr
    41\t6\trtu
    45\t5\tlfg'''


    data_As_Dict = csv.DictReader(io.StringIO(data), delimiter='\t')
    grouped = itertools.groupby(data_As_Dict, key=lambda x: x['idx'])

    ''' Function to read the data as key, val pairs in an ordered way. '''
    def accumulate(data):
        acc = collections.OrderedDict()
        for d in data:
            for k, v in d.items():
                acc.setdefault(k, []).append(v)
        return acc


    ''' Store data as keys, values '''
    grouped_data = collections.OrderedDict()
    for k, g in grouped:
        grouped_data[k] = accumulate(g)


    ''' Print the very first k1. After this we only need to print k2 and update the idx '''
    header_with_1stK1 = io.StringIO(data).read().split('\n')[0:2]
    print('\n'.join(header_with_1stK1))

    ''' Make an empty k2_new value. This value is updated and carried on based on a match between vals from two different rows. '''
    k2_new = ''

    for n in range(2):
        if n > 0:
            break  # just to run the loop one time and to prevent resetting of k2_new to ''

        ''' Now, read as keys, values pairs for two consecutive keys '''
        for (k1, v1), (k2, v2) in zip(grouped_data.items(), islice(grouped_data.items(), 1, None)):

            v1_vals = ''.join(v1['vals'])
            v2_vals = ''.join(v2['vals'])

            v1_list = list(v1_vals)
            v2_list = list(v2_vals)

            ''' to check if there is any matching element '''
            commons = [x for x in v1_list if x in v2_list]

            v2_pos = ''.join(v2['pos'])


            ''' start updating the idx values '''
            if k2_new == '':
                if len(commons) > 0:
                    k2_new = k1
                    print('\t'.join([v2_pos, k2_new, v2_vals]))

                else:
                    k2_new = ''
                    print('\t'.join([v2_pos, k2, v2_vals]))


            elif k2_new != '':
                if len(commons) > 0:
                    k2_new = k2_new
                    print('\t'.join([v2_pos, k2_new, v2_vals]))

                else:
                    k2_new = ''
                    print('\t'.join([v2_pos, k2, v2_vals]))


    print('\nUpdated the idx values')







asked Feb 5 at 14:20 by everestial007 (edited Feb 5 at 16:22)

1 Answer (accepted)

I can contribute by first eliminating the itertools and collections modules.

    import csv
    import io




Set the data:

    data = '''pos\tidx\tvals
    23\t4\tabc
    25\t7\tatg
    29\t8\tctb
    35\t1\txyz
    37\t2\tmno
    39\t3\tpqr
    41\t6\trtu
    45\t5\tlfg'''
    print('INPUT:\n' + data)




Read the rows from csv.DictReader into a list of dictionaries (one dictionary per row; an OrderedDict or a plain dict, depending on the Python version). This is needed because data_As_Dict is not subscriptable, is awkward to work with in iteration, and can be consumed only once: each row is gone after it has been read.

    data_As_Dict = csv.DictReader(io.StringIO(data), delimiter='\t')
    list_Of_Dict = [i for i in data_As_Dict]




Now that the rows are saved in a list, each row can be accessed by index. The condition can be checked by looking at v1+v2. For example, if v1='abc' and v2='atg', then v1+v2='abcatg': 'a' occurs twice and the other letters are unique. So the condition is len(set(v1+v2)) != len(v1+v2). Note that this assumes no letter is repeated within a single vals string, as is the case in the sample data.
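The length comparison above only works when each vals string has no repeated letters of its own. A minimal sketch of a check that does not depend on that, using a set intersection instead; the helper name `shares_letter` is hypothetical and not part of the original answer:

```python
def shares_letter(v1, v2):
    """Return True if the two strings have at least one character in common."""
    # set(v1) & set(v2) is the set of characters present in both strings.
    return bool(set(v1) & set(v2))


print(shares_letter('abc', 'atg'))  # True: both contain 'a'
print(shares_letter('aab', 'xyz'))  # False: nothing in common

# The length comparison reports a match here only because 'a' repeats
# inside the first string.
print(len(set('aab' + 'xyz')) != len('aab' + 'xyz'))  # True
```

Either check gives the same result on the sample data, where no vals string repeats a letter.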



Here is the code for creating the output:

    for i in range(0, len(list_Of_Dict)-1):
        v1 = list_Of_Dict[i]['vals']
        v2 = list_Of_Dict[i+1]['vals']
        if len(set(v1+v2)) != len(v1+v2):
            # carry the (possibly already updated) idx of the previous row forward
            list_Of_Dict[i+1]['idx'] = list_Of_Dict[i]['idx']


    output_data = 'pos\tidx\tvals\n'
    for i in list_Of_Dict:
        output_data += i['pos']+'\t'+i['idx']+'\t'+i['vals']+'\n'
    print('OUTPUT:\n'+output_data)
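The question also asks for the result to be written to a tab-separated file, whereas the code above only prints it. A minimal sketch using csv.DictWriter; the helper name `write_rows`, the sample rows, and the output file name `updated_idx.tsv` (placed in the system temp directory) are assumptions made for illustration:

```python
import csv
import os
import tempfile

# Rows as they might look after the idx update (first three rows of the sample).
rows = [
    {'pos': '23', 'idx': '4', 'vals': 'abc'},
    {'pos': '25', 'idx': '4', 'vals': 'atg'},
    {'pos': '29', 'idx': '4', 'vals': 'ctb'},
]


def write_rows(path, rows, fieldnames=('pos', 'idx', 'vals')):
    """Write the row dictionaries to `path` as tab-separated text with a header."""
    # newline='' lets the csv module control the line endings itself.
    with open(path, 'w', newline='') as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                delimiter='\t', lineterminator='\n')
        writer.writeheader()
        writer.writerows(rows)


out_path = os.path.join(tempfile.gettempdir(), 'updated_idx.tsv')
write_rows(out_path, rows)
```

In the answer's code, `list_Of_Dict` could be passed in place of the sample `rows`, since it already holds dictionaries with these three keys.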




Full code:

    import csv
    import io

    data = '''pos\tidx\tvals
    23\t4\tabc
    25\t7\tatg
    29\t8\tctb
    35\t1\txyz
    37\t2\tmno
    39\t3\tpqr
    41\t6\trtu
    45\t5\tlfg'''

    print('INPUT:\n'+data)
    data_As_Dict = csv.DictReader(io.StringIO(data), delimiter='\t')

    list_Of_Dict = [i for i in data_As_Dict]

    for i in range(0, len(list_Of_Dict)-1):
        v1 = list_Of_Dict[i]['vals']
        v2 = list_Of_Dict[i+1]['vals']
        if len(set(v1+v2)) != len(v1+v2):
            list_Of_Dict[i+1]['idx'] = list_Of_Dict[i]['idx']


    output_data = 'pos\tidx\tvals\n'
    for i in list_Of_Dict:
        output_data += i['pos']+'\t'+i['idx']+'\t'+i['vals']+'\n'
    print('OUTPUT:\n'+output_data)
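The question asks that rows be read two at a time, as consecutive pairs. The index-based loop above could be rewritten in that style. The following is only a sketch under that assumption: `update_idx` is a hypothetical helper name, and a set intersection is swapped in for the length comparison.

```python
import csv
import io
from itertools import islice

data = '''pos\tidx\tvals
23\t4\tabc
25\t7\tatg
29\t8\tctb
35\t1\txyz
37\t2\tmno
39\t3\tpqr
41\t6\trtu
45\t5\tlfg'''


def update_idx(rows):
    """Copy idx from each row to the next whenever their vals share a letter."""
    # zip pairs every row with its successor. An updated idx carries forward
    # because `prev` is the same dict object that was modified one step earlier.
    for prev, curr in zip(rows, islice(rows, 1, None)):
        if set(prev['vals']) & set(curr['vals']):
            curr['idx'] = prev['idx']
    return rows


rows = update_idx(list(csv.DictReader(io.StringIO(data), delimiter='\t')))
for row in rows:
    print('\t'.join([row['pos'], row['idx'], row['vals']]))
```

On the sample data the idx column becomes 4, 4, 4, 1, 2, 3, 3, 5, which matches the expected output in the question.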








answered Feb 7 at 1:54 by Arief
