Speed up script that determines if all columns in a row are the same or not
4 votes
I need to speed up a script that essentially determines whether or not all the "columns" for each row are the same, then writes a new file containing either one of the identical elements, or a "no_match". The file is comma delimited, consists of around 15,000 rows, and contains varying numbers of "columns".
For example:
1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59
Writes a new file:
1-69
no_match
1-46
no_match
6-1
5-51
4-59
The second and fourth rows are replaced with "no_match" because they contain non-identical columns.
Here is my far from elegant script:
#!/bin/bash
ind=$1 #file in
num=`wc -l "$ind"|cut -d' ' -f1` #number of lines in 'file in'
echo "alleles" > same_alleles.txt #new file to write to
#loop over every line of 'file in'
for (( i =2; i <= "$num"; i++));do
    #take first column of row being looped over (string to check match of other columns with)
    match=`awk "FNR=="$i" {print}" "$ind"|cut -d, -f1`
    #counts how many matches there are in the looped row
    match_num=`awk "FNR=="$i" {print}" "$ind"|grep -o "$match"|wc -l|cut -d' ' -f1`
    #counts number of commas in each looped row
    comma_num=`awk "FNR=="$i" {print}" "$ind"|grep -o ","|wc -l|cut -d' ' -f1`
    #number of columns in each row
    tot_num=$((comma_num + 1))
    #writes one of the identical elements if all contents of row are identical, or writes "no_match" otherwise
    if [ "$tot_num" == "$match_num" ]; then
        echo $match >> same_alleles.txt
    else
        echo "no_match" >> same_alleles.txt
    fi
done
#END
Currently, the script takes around 11 min to do all ~15,000 rows. I'm not really sure how to speed this up (I'm honestly surprised I could even get it to work). Any time knocked off would be fantastic. Below is a smaller excerpt of 100 rows that could be used:
allele
4-39
1-46,1-46,1-46
4-39
4-4,4-4,4-4,4-4
3-23,3-23,3-23
3-21,3-21
4-34,4-34
3-33
4-4,4-4,4-4
4-59,4-59
3-23,3-23,3-23
1-45
1-46,1-46
3-23,3-23,3-23
4-61
1-8
3-7
4-4
4-59,4-59,4-59
1-18,1-18
3-21,3-21
3-23,3-23,3-23
3-23,3-23,3-23
3-30,3-30-3
4-39,4-39
4-61
2-70
4-38-2,4-38-2
1-69,1-69,1-69,1-69,1-69
1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59
1-18
3-7
1-69
4-30-4
4-39
1-69
1-69
4-39
3-23,3-23,3-23
4-39
2-5
3-30-3
4-59,4-59,4-59
3-21,3-21
4-59,4-59
3-9
4-59,4-59,4-59
4-31,4-31
1-46,1-46
1-46,1-46,1-46
5-51,5-51
3-48
4-31,4-31
3-7
4-61
4-59,4-59,4-59,4-61,4-61,4-61
4-38-2,4-38-2
3-21,3-21
1-69,1-69,1-69
3-23,3-23,3-23
4-59,4-59
3-48
3-48
1-46,1-46
3-23,3-23,3-23
3-30-3,3-30-3
1-46,1-46,1-46
3-64
3-73,3-73
4-4
1-18
3-7
1-46,1-46
1-3
4-61
2-70
4-59,4-59
5-51,5-51
3-49,3-49
4-4,4-4,4-4
4-31,4-31
1-69
1-69,1-69,1-69
4-39
3-21,3-21
3-33
3-9
3-48
4-59,4-59
4-59,4-59
4-39,4-39
3-21,3-21
1-18
My script takes ~ 7 sec to complete this.
Sorry for the long post and thank you in advance!
shell-script text-processing scripting arithmetic
asked Aug 6 at 19:11
Johnny
4 Answers
5 votes, accepted
$ awk -F, '{ for (i=2; i<=NF; ++i) if ($i != $1) { print "no_match"; next } print $1 }' file
1-69
no_match
1-46
no_match
6-1
5-51
4-59
I'm sorry, but I did not even look at your code; there was too much going on. When you find yourself calling awk three times in the body of a loop on the same data, you will have to look at other ways to do it more efficiently. Also, if you involve awk, you don't need grep and cut, as awk would easily be able to do their tasks (which are not needed in this case, though).
The awk script above reads a comma-delimited line at a time and compares each field with the first field. If any of the tests fails, the string no_match is printed and the script continues with the next line. If the loop finishes (without finding a mismatch), the first field is printed.
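The question's script also skips the first line of the input and writes an alleles header to its output, which the one-liner above does not do. A minimal sketch of how that could be folded into the same awk program, assuming the first input line is always a header row; the function name same_alleles is made up for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: reads comma-separated rows on stdin and writes
# the header plus one result per data row to stdout.
same_alleles() {
    awk -F, '
        NR == 1 { print "alleles"; next }   # assumption: line 1 is a header row
        {
            for (i = 2; i <= NF; ++i)
                if ($i != $1) { print "no_match"; next }
            print $1
        }
    '
}

# Small demonstration using rows from the question.
printf '%s\n' 'allele' '1-69' '4-59,4-59,4-61' '1-46,1-46' | same_alleles
```

For a real file this could be run as same_alleles < input.csv > same_alleles.txt, where input.csv stands in for whatever the input file is called.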
As a script:
#!/usr/bin/awk -f
BEGIN { FS = "," }
{
    for (i=2; i<=NF; ++i)
        if ($i != $1) {
            print "no_match"
            next
        }
    print $1
}
- FS is the input field separator, also settable with the -F option on the command line. awk will split each line on this character to create the fields.
- NF is the number of fields in the current record ("columns on the line").
- $i refers to the i:th field in the current record, where i may be a variable or a constant (as in $1).
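To make these three names concrete, here is a small sketch that prints NF and every $i for a couple of rows from the question. The helper name show_fields is invented for this illustration and is not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper: for each comma-separated line on stdin, print the
# number of fields (NF) followed by each field with its index.
show_fields() {
    awk -F, '{
        printf "NF=%d:", NF
        for (i = 1; i <= NF; ++i)
            printf " $%d=%s", i, $i
        print ""
    }'
}

printf '%s\n' '1-69' '4-59,4-59,4-61' | show_fields
```

The first row has a single field, so the comparison loop in the answer (which starts at i=2) never runs for it and $1 is printed unchanged.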
Related:
- Why is using a shell loop to process text considered bad practice?
DRY variation:
#!/usr/bin/awk -f
BEGIN { FS = "," }
{
    output = $1
    for (i=2; i<=NF; ++i)
        if ($i != output) {
            output = "no_match"
            break
        }
    print output
}
edited Aug 6 at 19:56
answered Aug 6 at 19:15
Kusalananda
1 vote
Awk is a full programming language, and you already use it. But don't invoke it several times per line for simple tasks; use it for the whole task. Use awk's field delimiter instead of cut, and do the full processing in awk.
awk -F',' '
{
    eq=1;
    for (i = 2; i <= NF; i++)
        if ($1 != $i)
            eq=0;
    print eq ? $1 : "no_match";
}
' "$1"
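One practical reason to compare whole fields rather than count pattern matches, offered here as an observation rather than something the answer states: the row 3-30,3-30-3 from the question's sample data contains the text 3-30 twice, so counting grep -o hits gives as many matches as there are columns even though the two fields differ. The sketch below contrasts the two approaches; both function names are invented for illustration.

```shell
#!/bin/sh
# Count occurrences of the first field anywhere in the row, the way a
# grep -o based check would (substring matching).
count_substring_hits() {
    row=$1
    first=${row%%,*}
    printf '%s\n' "$row" | grep -o -- "$first" | wc -l | tr -d ' '
}

# Compare every field with the first one as a whole string.
check_fields() {
    printf '%s\n' "$1" | awk -F, '{
        eq = 1
        for (i = 2; i <= NF; i++)
            if ($1 != $i)
                eq = 0
        print (eq ? $1 : "no_match")
    }'
}

row='3-30,3-30-3'
echo "substring hits: $(count_substring_hits "$row")"
echo "field check:    $(check_fields "$row")"
```

With two columns and two substring hits, a count-based test would treat this row as identical, while the field-wise comparison reports no_match.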
answered Aug 6 at 19:21
RalfFriedl
1 vote
With perl List::MoreUtils, by evaluating the distinct / uniq elements in scalar context:
perl -MList::MoreUtils=distinct -F, -lne '
print( (distinct @F) > 1 ? "no_match" : $F[0])
' example
1-69
no_match
1-46
no_match
6-1
5-51
4-59
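The same idea, counting the distinct values in a row and reporting no_match when there is more than one, can also be sketched without the List::MoreUtils module. The version below is an illustration in awk rather than the answerer's code; the function name distinct_or_first is made up, and it assumes every row has at least one field.

```shell
#!/bin/sh
# Hypothetical helper: print the first field when a row holds exactly one
# distinct value, otherwise print no_match.
distinct_or_first() {
    awk -F, '{
        split("", seen)              # portable way to empty the array
        n = 0
        for (i = 1; i <= NF; ++i)
            if (!($i in seen)) {     # first time this value appears in the row
                seen[$i] = 1
                n++
            }
        print (n > 1 ? "no_match" : $1)
    }'
}

printf '%s\n' '1-69' '4-59,4-59,4-61' '6-1,6-1' | distinct_or_first
```

Unlike the early-exit loops in the other answers, this always scans the whole row, so it is shown for clarity rather than speed.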
edited Aug 6 at 20:34
answered Aug 6 at 20:29
steeldriver
1 vote
You could also do this with the sed editor, as shown:
sed -e '
s/^\([^,]*\)\(,\1\)*$/\1/;t
s/.*/NOMATCH/
' input.csv
Here we rely on the regex repeating itself until it reaches the end of the line. If it is able to do so, the line is replaced with the first field; otherwise it is replaced with NOMATCH.
Explanation:
This is what goes on in my head when seeing this problem:
Think of the comma-separated fields as stones of different colors, and picture whether they can be arranged in a row as repetitions of the first stone, each prefixed with a comma.
Something like:
STONEA ,STONEA ,STONEA ,STONEA ... all the way to end of line
Now in terms of regex terminology, it becomes:
^ (STONEA) (,\1) (,\1) (,\1) ... all the way to end of line
^ (STONEA) (,\1)* $
Output:
1-69
NOMATCH
1-46
NOMATCH
6-1
5-51
4-59
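To see the back-reference at work on a trickier row, the sketch below wraps the same two substitutions in a shell function (the name sed_same is hypothetical) and feeds it 3-30,3-30-3 from the question's sample data. Because the pattern is anchored at both ends, a field that merely starts with the first field should not count as a repetition.

```shell
#!/bin/sh
# Hypothetical wrapper around the two sed commands from the answer;
# reads rows on stdin instead of from a named file.
sed_same() {
    sed -e '
s/^\([^,]*\)\(,\1\)*$/\1/;t
s/.*/NOMATCH/
'
}

printf '%s\n' '4-59,4-59' '3-30,3-30-3' '1-69' | sed_same
```

The t command skips the second substitution whenever the first one succeeded, which is why matching rows keep their first field instead of being overwritten with NOMATCH.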
edited Aug 7 at 3:08
answered Aug 7 at 2:55
Rakesh Sharma
consider c for command two rather than s - should be nominally quicker still. smart, though. – mikeserv, Aug 7 at 3:31
@mikeserv Thank you mike for your gracious words. I feel delighted. – Rakesh Sharma, Aug 8 at 5:08