Speed up script that determines if all columns in a row are the same or not

I need to speed up a script that essentially determines whether or not all the "columns" for each row are the same, then writes a new file containing either one of the identical elements, or a "no_match". The file is comma delimited, consists of around 15,000 rows, and contains varying numbers of "columns".



For example:



1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59


Writes a new file:



1-69
no_match
1-46
no_match
6-1
5-51
4-59


The second and fourth rows become no_match because they contain non-identical columns.



Here is my far from elegant script:



#!/bin/bash

ind=$1 #file in
num=`wc -l "$ind"|cut -d' ' -f1` #number of lines in 'file in'
echo "alleles" > same_alleles.txt #new file to write to

#loop over every line of 'file in'
for (( i =2; i <= "$num"; i++));do
    #take first column of row being looped over (string to check match of other columns with)
    match=`awk "FNR=="$i" {print}" "$ind"|cut -d, -f1`
    #counts how many matches there are in the looped row
    match_num=`awk "FNR=="$i" {print}" "$ind"|grep -o "$match"|wc -l|cut -d' ' -f1`
    #counts number of commas in each looped row
    comma_num=`awk "FNR=="$i" {print}" "$ind"|grep -o ","|wc -l|cut -d' ' -f1`
    #number of columns in each row
    tot_num=$((comma_num + 1))
    #writes one of the identical elements if all contents of row are identical, or writes "no_match" otherwise
    if [ "$tot_num" == "$match_num" ]; then
        echo $match >> same_alleles.txt
    else
        echo "no_match" >> same_alleles.txt
    fi
done

#END


Currently, the script takes around 11 min to do all ~15,000 rows. I'm not really sure how to speed this up (I'm honestly surprised I could even get it to work). Any time knocked off would be fantastic. Below is a smaller excerpt of 100 rows that could be used:



allele
4-39
1-46,1-46,1-46
4-39
4-4,4-4,4-4,4-4
3-23,3-23,3-23
3-21,3-21
4-34,4-34
3-33
4-4,4-4,4-4
4-59,4-59
3-23,3-23,3-23
1-45
1-46,1-46
3-23,3-23,3-23
4-61
1-8
3-7
4-4
4-59,4-59,4-59
1-18,1-18
3-21,3-21
3-23,3-23,3-23
3-23,3-23,3-23
3-30,3-30-3
4-39,4-39
4-61
2-70
4-38-2,4-38-2
1-69,1-69,1-69,1-69,1-69
1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59
1-18
3-7
1-69
4-30-4
4-39
1-69
1-69
4-39
3-23,3-23,3-23
4-39
2-5
3-30-3
4-59,4-59,4-59
3-21,3-21
4-59,4-59
3-9
4-59,4-59,4-59
4-31,4-31
1-46,1-46
1-46,1-46,1-46
5-51,5-51
3-48
4-31,4-31
3-7
4-61
4-59,4-59,4-59,4-61,4-61,4-61
4-38-2,4-38-2
3-21,3-21
1-69,1-69,1-69
3-23,3-23,3-23
4-59,4-59
3-48
3-48
1-46,1-46
3-23,3-23,3-23
3-30-3,3-30-3
1-46,1-46,1-46
3-64
3-73,3-73
4-4
1-18
3-7
1-46,1-46
1-3
4-61
2-70
4-59,4-59
5-51,5-51
3-49,3-49
4-4,4-4,4-4
4-31,4-31
1-69
1-69,1-69,1-69
4-39
3-21,3-21
3-33
3-9
3-48
4-59,4-59
4-59,4-59
4-39,4-39
3-21,3-21
1-18


My script takes ~ 7 sec to complete this.



Sorry for the long post and thank you in advance!







asked Aug 6 at 19:11 by Johnny

          4 Answers
          accepted










          $ awk -F, '{ for (i=2; i<=NF; ++i) if ($i != $1) { print "no_match"; next } print $1 }' file
          1-69
          no_match
          1-46
          no_match
          6-1
          5-51
          4-59


          I'm sorry, but I did not even look at your code, there was too much going on. When you find yourself calling awk three times in the body of a loop on the same data, you will have to look at other ways to do it more efficiently. Also, if you involve awk, you don't need grep and cut as awk would easily be able to do their tasks (which are not needed in this case though).



          The awk script above reads a comma-delimited line at a time and compares each field with the first field. If any of the tests fails, the string no_match is printed and the script continues with the next line. If the loop finishes (without finding a mismatch), the first field is printed.



          As a script:



          #!/usr/bin/awk -f

          BEGIN { FS = "," }

          {
              for (i=2; i<=NF; ++i)
                  if ($i != $1) {
                      print "no_match"
                      next
                  }

              print $1
          }




          • FS is the input field separator, also settable with the -F option on the command line. awk will split each line on this character to create the fields.


          • NF is the number of fields in the current record ("columns on the line").


          • $i refers to the i:th field in the current record, where i may be a variable or a constant (as in $1).
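
The question's script also writes an "alleles" header line and skips the first input line. A minimal sketch of how that could be folded into the same awk program, using NR (the current record number) alongside FS, NF and $i; the function name same_alleles and the choice to replace the input header with "alleles" are assumptions for illustration, not part of the answer:

```shell
#!/bin/sh
# Sketch only: same_alleles is a hypothetical helper name.
# It replaces the first input line with an "alleles" header and then
# prints one result per remaining row.
same_alleles() {
    awk -F, '
        NR == 1 { print "alleles"; next }   # header line of the input
        {
            for (i = 2; i <= NF; ++i)
                if ($i != $1) { print "no_match"; next }
            print $1
        }
    ' "$1"
}

# Hypothetical usage: same_alleles input.csv > same_alleles.txt
```

Redirecting the whole output once, rather than appending inside a loop, keeps the work in a single awk process.
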

          Related:



          • Why is using a shell loop to process text considered bad practice?


          DRY variation:



          #!/usr/bin/awk -f

          BEGIN { FS = "," }

          {
              output = $1

              for (i=2; i<=NF; ++i)
                  if ($i != output) {
                      output = "no_match"
                      break
                  }

              print output
          }
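
One side effect of comparing whole fields is worth illustrating. The sample data contains the row 3-30,3-30-3, where the first field is a substring of the second. The sketch below contrasts a substring count in the style of the question's script with the exact field comparison used in the awk programs; the variable names are made up for the example:

```shell
#!/bin/sh
# Sketch contrasting two checks on one row taken from the sample data.
row='3-30,3-30-3'

# Substring counting in the style of the question: the first field also
# occurs inside the second field, so the two counts agree.
first=$(printf '%s\n' "$row" | cut -d, -f1)
hits=$(( $(printf '%s\n' "$row" | grep -o -- "$first" | wc -l) ))
fields=$(printf '%s\n' "$row" | awk -F, '{ print NF }')
if [ "$hits" -eq "$fields" ]; then by_count=$first; else by_count=no_match; fi

# Exact field comparison, as in the awk programs.
by_field=$(printf '%s\n' "$row" | awk -F, '
    { for (i = 2; i <= NF; ++i) if ($i != $1) { print "no_match"; next }
      print $1 }')

printf 'substring count: %s\nfield compare:   %s\n' "$by_count" "$by_field"
```

Here the substring count reports 3-30 while the field comparison reports no_match, so the awk version may also change results on such rows, not only the run time.
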



















            Awk is a full programming language. You already use it. But don't use it just for simple tasks with multiple invocations per line, use it for the whole task. Use the field delimiter in awk, don't use cut. Do the full processing in awk.



            awk -F',' '
            {
                eq=1;
                for (i = 2; i <= NF; i++)
                    if ($1 != $i)
                        eq=0;
                print eq ? $1 : "no_match";
            }
            ' "$1"
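
To illustrate the point about not needing cut or grep, here is a small sketch that produces the three per-row values the question computes with separate pipelines (first field, field count, comma count) in one awk process. The function name row_stats is hypothetical:

```shell
#!/bin/sh
# Sketch: one awk process replaces the cut, grep and wc pipelines.
# row_stats is a hypothetical helper; it reads the named files or stdin.
row_stats() {
    awk -F, '{
        commas = NF - 1          # what grep -o "," | wc -l counted
        print $1, NF, commas     # first field, field count, comma count
    }' "$@"
}

printf '1-69\n4-59,4-59,4-61\n' | row_stats
```

For the two sample rows this prints "1-69 1 0" and "4-59 3 2". An empty input line would give NF = 0, which this sketch does not treat specially.
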




















              With perl List::MoreUtils, by evaluating the distinct / uniq elements in scalar context:



              perl -MList::MoreUtils=distinct -F, -lne '
              print( (distinct @F) > 1 ? "no_match" : $F[0])
              ' example
              1-69
              no_match
              1-46
              no_match
              6-1
              5-51
              4-59
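
For comparison, the same distinct-count idea can be sketched in awk with an associative array. This is a translation of the idea rather than the answerer's code, and distinct_check is a hypothetical name:

```shell
#!/bin/sh
# Sketch: count distinct fields per row, mirroring the scalar-context
# "distinct" test of the perl one-liner.
distinct_check() {
    awk -F, '{
        n = 0
        split("", seen)                 # portable way to empty the array
        for (i = 1; i <= NF; ++i)
            if (!($i in seen)) { seen[$i] = 1; ++n }
        print (n > 1 ? "no_match" : $1)
    }' "$@"
}

# Hypothetical usage: distinct_check example
```

Unlike the early-exit loops in the awk answers, this version always scans every field, so it is likely no faster; it is shown only to mirror the perl approach.
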


















                You could also do this with sed, as shown:



                sed -e '
                s/^\([^,]*\)\(,\1\)*$/\1/;t
                s/.*/NOMATCH/
                ' input.csv


                Here we rely on the regex repeating the first field, each time preceded by a comma, until it reaches the end of the line. If it can do so, the line is replaced with the first field; otherwise the line is replaced with NOMATCH.



                Explanation:



                This is how I picture the problem:

                Think of the comma-separated fields as stones of different colors, and ask whether the row can be read as the first stone followed by repetitions of that same stone, each prefixed by a comma.



                Something like:



                STONEA ,STONEA ,STONEA ,STONEA ... all the way to end of line



                Now in terms of regex terminology, it becomes:



                ^ (STONEA) (,\1) (,\1) (,\1) ... all the way to end of line



                ^ (STONEA) (,\1)* $



                Output:



                1-69
                NOMATCH
                1-46
                NOMATCH
                6-1
                5-51
                4-59





                • consider c for command two rather than s - should be nominally quicker still. smart, though.
                  – mikeserv
                  Aug 7 at 3:31






                • @mikeserv Thank you mike for your gracious words. I feel delighted.
                  – Rakesh Sharma
                  Aug 8 at 5:08
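
The suggestion in the first comment, using sed's c (change) command for the fallback instead of a second substitution, might look like the sketch below. The function name sed_same is hypothetical, and the multi-line "c\" spelling is chosen here on the assumption that it is the most portable form:

```shell
#!/bin/sh
# Sketch of the comment's suggestion: if the substitution succeeds, "t"
# branches to the end of the script and the line is printed; otherwise
# "c" replaces the whole line with NOMATCH.
sed_same() {
    sed -e 's/^\([^,]*\)\(,\1\)*$/\1/
t
c\
NOMATCH' "$@"
}

# Hypothetical usage: sed_same input.csv
```

Whether this is measurably quicker than the s/.*/NOMATCH/ form has not been timed here.
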










Accepted answer (the first awk answer above): answered Aug 6 at 19:15 by Kusalananda, edited Aug 6 at 19:56

                            answered Aug 6 at 19:21









                            RalfFriedl














                                With Perl's List::MoreUtils, by evaluating the distinct (uniq) elements in scalar context, which yields their count:



                                perl -MList::MoreUtils=distinct -F, -lne '
                                print( (distinct @F) > 1 ? "no_match" : $F[0])
                                ' example
                                1-69
                                no_match
                                1-46
                                no_match
                                6-1
                                5-51
                                4-59
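For comparison, the same count-the-distinct-values idea can be sketched in awk. This is a swapped-in technique using an awk array, not how the Perl module works internally, and the two input rows are made up for the example.

```shell
#!/bin/sh
# Sketch: count the distinct field values per row with an awk array.
result=$(printf '%s\n' '1-46,1-46' '4-59,4-61' | awk -F, '
{
    split("", seen)              # portable way to empty the array
    n = 0                        # number of distinct values in this row
    for (i = 1; i <= NF; i++)
        if (!($i in seen)) {
            seen[$i] = 1
            n++
        }
    print (n > 1 ? "no_match" : $1)
}')

printf '%s\n' "$result"
```

Unlike the loops that stop at the first mismatch, this variant always visits every field, so it is more of a readability choice than a speed-up.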













                                    edited Aug 6 at 20:34


























                                    answered Aug 6 at 20:29









                                    steeldriver














                                        You could also do this with sed:



                                        sed -e '
                                        s/^\([^,]*\)\(,\1\)*$/\1/;t
                                        s/.*/NOMATCH/
                                        ' input.csv


                                        Here we rely on the regex repeating the first field, each copy preceded by a comma, until it reaches the end of the line. If it can, the line is replaced by the first field and t ends the cycle; otherwise the second command replaces the line with NOMATCH.



                                        Explanation:



                                        This is how I picture the problem:

                                        Think of the comma-separated fields as stones of different colors, and ask whether the row is just the first stone repeated, with a comma prefixing each repeat.



                                        Something like:



                                        STONEA ,STONEA ,STONEA ,STONEA ... all the way to end of line



                                        Now in terms of regex terminology, it becomes:



                                        ^ (STONEA) (,\1) (,\1) (,\1) ... all the way to end of line

                                        ^ (STONEA) (,\1)* $



                                        Output:



                                        1-69
                                        NOMATCH
                                        1-46
                                        NOMATCH
                                        6-1
                                        5-51
                                        4-59
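A comment on this answer suggests using sed's c command instead of the second s. A minimal sketch of both forms, fed with the sample rows from the question, is below. It assumes a sed whose basic regular expressions accept a back-reference inside a repeated group, as GNU sed does.

```shell
#!/bin/sh
# Sample rows taken from the question.
sample='1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59'

# Form 1: second s command overwrites lines that did not match.
result_s=$(printf '%s\n' "$sample" | sed -e '
s/^\([^,]*\)\(,\1\)*$/\1/;t
s/.*/NOMATCH/
')

# Form 2: c replaces the whole line with fixed text, no regex needed.
result_c=$(printf '%s\n' "$sample" | sed -e '
s/^\([^,]*\)\(,\1\)*$/\1/
t
c\
NOMATCH
')

printf '%s\n' "$result_c"
```

In both forms, t jumps to the end of the script only when the substitution succeeded, so the fallback command runs just for rows whose fields differ.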













                                        edited Aug 7 at 3:08


























                                        answered Aug 7 at 2:55









                                        Rakesh Sharma












                                        • consider c for command two rather than s - should be nominally quicker still. smart, though.
                                          – mikeserv
                                          Aug 7 at 3:31

                                        • @mikeserv Thank you, mike, for your gracious words. I feel delighted.
                                          – Rakesh Sharma
                                          Aug 8 at 5:08















