Update the idx values if vals element match between consecutive rows

I want to update the number in the idx field whenever the vals of two consecutive rows share at least one letter.



Input data = '''pos\tidx\tvals
23\t4\tabc
25\t7\tatg
29\t8\tctb
35\t1\txyz
37\t2\tmno
39\t3\tpqr
41\t6\trtu
45\t5\tlfg'''


Explanation: Since the letter a matches between the vals at idx 4 and idx 7, the idx at pos 25 is updated to 4. Then, because the letter t matches between the vals at pos 25 and pos 29, the idx at pos 29 is also updated to 4 instead of just 7.



#Expected output to a file (tab separated):
pos idx vals
23 4 abc
25 4 atg
29 4 ctb
35 1 xyz
37 2 mno
39 3 pqr
41 3 rtu
45 5 lfg
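
The carry-forward rule described above can be traced with a small sketch. This is only an illustration of the rule, not the code under review; the function name `propagate_idx` and the tuple layout are assumptions made for the example.

```python
def propagate_idx(rows):
    """Carry idx forward while consecutive vals share at least one letter.

    rows is a list of (pos, idx, vals) string tuples (a hypothetical layout).
    """
    result = []
    prev_vals = None
    prev_idx = None
    for pos, idx, vals in rows:
        # Compare against the previous row; inherit its (possibly updated) idx.
        if prev_vals is not None and set(prev_vals) & set(vals):
            idx = prev_idx
        result.append((pos, idx, vals))
        prev_vals, prev_idx = vals, idx
    return result


rows = [
    ("23", "4", "abc"), ("25", "7", "atg"), ("29", "8", "ctb"), ("35", "1", "xyz"),
    ("37", "2", "mno"), ("39", "3", "pqr"), ("41", "6", "rtu"), ("45", "5", "lfg"),
]
for row in propagate_idx(rows):
    print("\t".join(row))
```

Because each row inherits the idx that the previous row ended up with, a chain such as abc, atg, ctb keeps the first idx (4) all the way through.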


I have written the working code below so far, and would also like to:

  • write the expected output to a file

  • optimize the code for the work I am doing.

  • keep my method of reading two consecutive rows at a time, as ordered (key, value) pairs. This question is a trial run for another problem I am trying to solve, so any answer has to follow that method. Other parts of the code can be optimized in any Pythonic way.

Code:



import csv
import itertools
import collections
import io
from itertools import islice

data = '''pos\tidx\tvals
23\t4\tabc
25\t7\tatg
29\t8\tctb
35\t1\txyz
37\t2\tmno
39\t3\tpqr
41\t6\trtu
45\t5\tlfg'''


data_As_Dict = csv.DictReader(io.StringIO(data), delimiter='\t')
grouped = itertools.groupby(data_As_Dict, key=lambda x: x['idx'])

''' Function to read the data as key, val pairs in Ordered way.'''
def accumulate(data):
    acc = collections.OrderedDict()
    for d in data:
        for k, v in d.items():
            acc.setdefault(k, []).append(v)
    return acc


''' Store data as keys,values '''
grouped_data = collections.OrderedDict()
for k, g in grouped:
    grouped_data[k] = accumulate(g)


''' Print the very first k1. After this we only need to print k2 and update the idx '''
header_with_1stK1 = io.StringIO(data).read().split('\n')[0:2]
print('\n'.join(header_with_1stK1))

''' make an empty new_k2 value. This k2 value is updated and carried on based on the match between vals from two different rows. '''
k2_new = ''

for n in range(2):
    if n > 0:
        break  # just to run the loop one time and to prevent resetting of k2_new values to ''

    ''' Now, read as keys, values pairs for two consecutive keys '''
    for (k1, v1), (k2, v2) in zip(grouped_data.items(), islice(grouped_data.items(), 1, None)):

        v1_vals = ''.join(v1['vals'])
        v2_vals = ''.join(v2['vals'])

        v1_list = list(v1_vals)
        v2_list = list(v2_vals)

        ''' to check if there is any matching element '''
        commons = [x for x in v1_list if x in v2_list]

        v2_pos = ''.join(v2['pos'])


        ''' start updating the idx values '''
        if k2_new == '':
            if len(commons) > 0:
                k2_new = k1
                print('\t'.join([v2_pos, k2_new, v2_vals]))

            else:
                k2_new = ''
                print('\t'.join([v2_pos, k2, v2_vals]))


        elif k2_new != '':
            if len(commons) > 0:
                k2_new = k2_new
                print('\t'.join([v2_pos, k2_new, v2_vals]))

            else:
                k2_new = ''
                print('\t'.join([v2_pos, k2, v2_vals]))


print('\nUpdated the idx values')






asked Feb 5 at 14:20 by everestial007 (edited Feb 5 at 16:22)




















1 Answer (accepted)

I would start by eliminating the itertools and collections modules; csv and io are enough.

import csv
import io




Set the data:

data = '''pos\tidx\tvals
23\t4\tabc
25\t7\tatg
29\t8\tctb
35\t1\txyz
37\t2\tmno
39\t3\tpqr
41\t6\trtu
45\t5\tlfg'''
print('INPUT:\n'+data)




Read the rows with csv.DictReader, which yields one dictionary per row (an OrderedDict on some Python versions), and save them in a list. The reader itself is an iterator: it is not subscriptable, it is awkward to use when you need two rows at once, and it is exhausted after a single pass.

data_As_Dict = csv.DictReader(io.StringIO(data), delimiter='\t')
list_Of_Dict = [i for i in data_As_Dict]
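
A minimal sketch of why the rows are copied into a list first: csv.DictReader can only be walked once, so a second pass over the same reader yields nothing. The two-row sample and the variable names here are made up for the illustration.

```python
import csv
import io

# Shortened, hypothetical sample in the same tab-separated layout.
data = "pos\tidx\tvals\n23\t4\tabc\n25\t7\tatg\n"

reader = csv.DictReader(io.StringIO(data), delimiter="\t")
first_pass = list(reader)   # consumes every row from the reader
second_pass = list(reader)  # the reader is now exhausted

print(len(first_pass), len(second_pass))
print(first_pass[1]["vals"])  # list elements can be indexed freely
```

Once the rows live in a list, neighbouring rows can be reached by index or by zipping the list with a shifted copy of itself.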




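Now that the rows are saved as a list, we can access each one by index. We can check the condition by simply looking at v1+v2. For example, if v1='abc' and v2='atg', then v1+v2='abcatg'. Here 'a' occurs twice and every other letter is unique, so the set of letters is shorter than the string. That gives the condition: len(set(v1+v2)) != len(v1+v2). Note that this assumes no letter repeats within a single vals string; a value such as 'aab' would satisfy the condition on its own.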
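
As a hedged aside, the length comparison and a plain set intersection agree on the sample data but differ when one value repeats a letter by itself. The helper names `length_check` and `shares_letter` below are made up for this comparison; they are not part of the posted code.

```python
def length_check(v1, v2):
    """The condition from the answer: some letter occurs more than once in v1+v2."""
    return len(set(v1 + v2)) != len(v1 + v2)


def shares_letter(v1, v2):
    """Alternative check: the two strings have at least one letter in common."""
    return bool(set(v1) & set(v2))


# Sample rows: both checks agree.
print(length_check("abc", "atg"), shares_letter("abc", "atg"))
print(length_check("ctb", "xyz"), shares_letter("ctb", "xyz"))

# A value with an internal repeat: the checks disagree.
print(length_check("aab", "xyz"), shares_letter("aab", "xyz"))
```

If the real vals can contain repeated letters, the intersection form is the safer of the two.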



Here is the code for creating the output:

for i in range(0, len(list_Of_Dict)-1):
    v1 = list_Of_Dict[i]['vals']
    v2 = list_Of_Dict[i+1]['vals']
    if len(set(v1+v2)) != len(v1+v2):
        # row i may already carry an inherited idx, so matches chain forward
        list_Of_Dict[i+1]['idx'] = list_Of_Dict[i]['idx']


output_data = 'pos\tidx\tvals\n'
for i in list_Of_Dict:
    output_data += i['pos']+'\t'+i['idx']+'\t'+i['vals']+'\n'
print('OUTPUT:\n'+output_data)




Full code:

import csv
import io

data = '''pos\tidx\tvals
23\t4\tabc
25\t7\tatg
29\t8\tctb
35\t1\txyz
37\t2\tmno
39\t3\tpqr
41\t6\trtu
45\t5\tlfg'''

print('INPUT:\n'+data)
data_As_Dict = csv.DictReader(io.StringIO(data), delimiter='\t')

list_Of_Dict = [i for i in data_As_Dict]

for i in range(0, len(list_Of_Dict)-1):
    v1 = list_Of_Dict[i]['vals']
    v2 = list_Of_Dict[i+1]['vals']
    if len(set(v1+v2)) != len(v1+v2):
        list_Of_Dict[i+1]['idx'] = list_Of_Dict[i]['idx']


output_data = 'pos\tidx\tvals\n'
for i in list_Of_Dict:
    output_data += i['pos']+'\t'+i['idx']+'\t'+i['vals']+'\n'
print('OUTPUT:\n'+output_data)
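
The question also asked for the result to be written to a file and for the rows to be read two at a time. One possible sketch, under a few assumptions: the pairs come from zipping the list with a shifted copy of itself, the match test is a set intersection rather than the length comparison above, and the output path `updated_idx.tsv` in the system temp directory is a made-up name.

```python
import csv
import io
import os
import tempfile

data = '''pos\tidx\tvals
23\t4\tabc
25\t7\tatg
29\t8\tctb
35\t1\txyz
37\t2\tmno
39\t3\tpqr
41\t6\trtu
45\t5\tlfg'''

rows = list(csv.DictReader(io.StringIO(data), delimiter='\t'))

# Walk the rows as (previous, current) pairs. The dicts are updated in
# place, so an inherited idx is visible when the next pair is compared.
for prev, curr in zip(rows, rows[1:]):
    if set(prev['vals']) & set(curr['vals']):
        curr['idx'] = prev['idx']

# Hypothetical output location; replace with the real target path.
out_path = os.path.join(tempfile.gettempdir(), 'updated_idx.tsv')
with open(out_path, 'w', newline='') as handle:
    writer = csv.DictWriter(handle, fieldnames=['pos', 'idx', 'vals'],
                            delimiter='\t', lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
```

Using csv.DictWriter avoids building the output by string concatenation and keeps the delimiter handling in one place.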







answered Feb 7 at 1:54 by Arief






















                     
