Regex for chess notations
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
0
down vote
favorite
I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5)
. It would be used by me for chess FEN strings.
I wounder if regex [A-Za-z]
is faster than [RNBQKrnbqk]
? I only need to check the given letters (but no other letters will appear).
My thoughts are
[A-Za-z]
the regex engine can match if a char is65 <= c <= 90
or97 <= c <= 122
. Worst case 4 comparisons.[RNBQKrnbqk]
the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.
Am I understanding regex correct?
performance regex
add a comment |Â
up vote
0
down vote
favorite
I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5)
. It would be used by me for chess FEN strings.
I wounder if regex [A-Za-z]
is faster than [RNBQKrnbqk]
? I only need to check the given letters (but no other letters will appear).
My thoughts are
[A-Za-z]
the regex engine can match if a char is65 <= c <= 90
or97 <= c <= 122
. Worst case 4 comparisons.[RNBQKrnbqk]
the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.
Am I understanding regex correct?
performance regex
4
It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â RoToRa
Aug 2 at 9:25
Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â RoToRa
Aug 2 at 9:29
1
I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between[A-Za-z]
and[RNBQKrnbqk]
wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â JGNI
Aug 2 at 9:31
add a comment |Â
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5)
. It would be used by me for chess FEN strings.
I wounder if regex [A-Za-z]
is faster than [RNBQKrnbqk]
? I only need to check the given letters (but no other letters will appear).
My thoughts are
[A-Za-z]
the regex engine can match if a char is65 <= c <= 90
or97 <= c <= 122
. Worst case 4 comparisons.[RNBQKrnbqk]
the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.
Am I understanding regex correct?
performance regex
I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5)
. It would be used by me for chess FEN strings.
I wounder if regex [A-Za-z]
is faster than [RNBQKrnbqk]
? I only need to check the given letters (but no other letters will appear).
My thoughts are
[A-Za-z]
the regex engine can match if a char is65 <= c <= 90
or97 <= c <= 122
. Worst case 4 comparisons.[RNBQKrnbqk]
the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.
Am I understanding regex correct?
performance regex
asked Aug 2 at 9:11
S.G.
217112
217112
4
It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â RoToRa
Aug 2 at 9:25
Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â RoToRa
Aug 2 at 9:29
1
I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between[A-Za-z]
and[RNBQKrnbqk]
wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â JGNI
Aug 2 at 9:31
add a comment |Â
4
It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â RoToRa
Aug 2 at 9:25
Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â RoToRa
Aug 2 at 9:29
1
I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between[A-Za-z]
and[RNBQKrnbqk]
wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â JGNI
Aug 2 at 9:31
4
4
It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â RoToRa
Aug 2 at 9:25
It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â RoToRa
Aug 2 at 9:25
Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â RoToRa
Aug 2 at 9:29
Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â RoToRa
Aug 2 at 9:29
1
1
I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between
[A-Za-z]
and [RNBQKrnbqk]
wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothingâ JGNI
Aug 2 at 9:31
I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between
[A-Za-z]
and [RNBQKrnbqk]
wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothingâ JGNI
Aug 2 at 9:31
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
4
down vote
No. The matching for a-zA-Z
would be slower than the exact character-set you supply: RNBQKrnbqk
.
You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:
(d?)[a-z](?=(d?)(.*))(?=(.* )5)
has 64 matches generated in 14323 steps
(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)
has 32 matches generated in 7512 steps
and the pattern from chess.stackexchange:
([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])
has 2 matches generated in 95 steps
Note that I have enabled the ignorecase
flag in all three of them.
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
@S.G. Adding multiple casesKQkq
checks defeats the purpose ofK?Q?k?q?
pattern.
â hjpotter92
Aug 2 at 13:39
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
No. The matching for a-zA-Z
would be slower than the exact character-set you supply: RNBQKrnbqk
.
You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:
(d?)[a-z](?=(d?)(.*))(?=(.* )5)
has 64 matches generated in 14323 steps
(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)
has 32 matches generated in 7512 steps
and the pattern from chess.stackexchange:
([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])
has 2 matches generated in 95 steps
Note that I have enabled the ignorecase
flag in all three of them.
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
@S.G. Adding multiple casesKQkq
checks defeats the purpose ofK?Q?k?q?
pattern.
â hjpotter92
Aug 2 at 13:39
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
add a comment |Â
up vote
4
down vote
No. The matching for a-zA-Z
would be slower than the exact character-set you supply: RNBQKrnbqk
.
You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:
(d?)[a-z](?=(d?)(.*))(?=(.* )5)
has 64 matches generated in 14323 steps
(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)
has 32 matches generated in 7512 steps
and the pattern from chess.stackexchange:
([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])
has 2 matches generated in 95 steps
Note that I have enabled the ignorecase
flag in all three of them.
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
@S.G. Adding multiple casesKQkq
checks defeats the purpose ofK?Q?k?q?
pattern.
â hjpotter92
Aug 2 at 13:39
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
add a comment |Â
up vote
4
down vote
up vote
4
down vote
No. The matching for a-zA-Z
would be slower than the exact character-set you supply: RNBQKrnbqk
.
You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:
(d?)[a-z](?=(d?)(.*))(?=(.* )5)
has 64 matches generated in 14323 steps
(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)
has 32 matches generated in 7512 steps
and the pattern from chess.stackexchange:
([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])
has 2 matches generated in 95 steps
Note that I have enabled the ignorecase
flag in all three of them.
No. The matching for a-zA-Z
would be slower than the exact character-set you supply: RNBQKrnbqk
.
You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:
(d?)[a-z](?=(d?)(.*))(?=(.* )5)
has 64 matches generated in 14323 steps
(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)
has 32 matches generated in 7512 steps
and the pattern from chess.stackexchange:
([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])
has 2 matches generated in 95 steps
Note that I have enabled the ignorecase
flag in all three of them.
edited Aug 2 at 9:34
answered Aug 2 at 9:28
hjpotter92
4,89611538
4,89611538
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
@S.G. Adding multiple casesKQkq
checks defeats the purpose ofK?Q?k?q?
pattern.
â hjpotter92
Aug 2 at 13:39
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
add a comment |Â
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
@S.G. Adding multiple casesKQkq
checks defeats the purpose ofK?Q?k?q?
pattern.
â hjpotter92
Aug 2 at 13:39
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â S.G.
Aug 2 at 12:52
@S.G. Adding multiple cases
KQkq
checks defeats the purpose of K?Q?k?q?
pattern.â hjpotter92
Aug 2 at 13:39
@S.G. Adding multiple cases
KQkq
checks defeats the purpose of K?Q?k?q?
pattern.â hjpotter92
Aug 2 at 13:39
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â S.G.
Aug 2 at 14:35
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
regex101.com/r/ykc7s9/9
â S.G.
Aug 2 at 14:40
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200796%2fregex-for-chess-notations%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
4
It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â RoToRa
Aug 2 at 9:25
Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â RoToRa
Aug 2 at 9:29
1
I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between
[A-Za-z]
and[RNBQKrnbqk]
wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothingâ JGNI
Aug 2 at 9:31