Regex for chess notations

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
0
down vote

favorite












I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.



I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).



My thoughts are



  1. [A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.


  2. [RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.


Am I understanding regex correct?







share|improve this question















  • 4




    It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
    – RoToRa
    Aug 2 at 9:25











  • Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
    – RoToRa
    Aug 2 at 9:29






  • 1




    I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
    – JGNI
    Aug 2 at 9:31
















up vote
0
down vote

favorite












I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.



I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).



My thoughts are



  1. [A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.


  2. [RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.


Am I understanding regex correct?







share|improve this question















  • 4




    It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
    – RoToRa
    Aug 2 at 9:25











  • Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
    – RoToRa
    Aug 2 at 9:29






  • 1




    I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
    – JGNI
    Aug 2 at 9:31












up vote
0
down vote

favorite









up vote
0
down vote

favorite











I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.



I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).



My thoughts are



  1. [A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.


  2. [RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.


Am I understanding regex correct?







share|improve this question











I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.



I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).



My thoughts are



  1. [A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.


  2. [RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.


Am I understanding regex correct?









share|improve this question










share|improve this question




share|improve this question









asked Aug 2 at 9:11









S.G.

217112




217112







  • 4




    It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
    – RoToRa
    Aug 2 at 9:25











  • Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
    – RoToRa
    Aug 2 at 9:29






  • 1




    I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
    – JGNI
    Aug 2 at 9:31












  • 4




    It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
    – RoToRa
    Aug 2 at 9:25











  • Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
    – RoToRa
    Aug 2 at 9:29






  • 1




    I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
    – JGNI
    Aug 2 at 9:31







4




4




It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
– RoToRa
Aug 2 at 9:25





It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
– RoToRa
Aug 2 at 9:25













Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
– RoToRa
Aug 2 at 9:29




Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
– RoToRa
Aug 2 at 9:29




1




1




I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
– JGNI
Aug 2 at 9:31




I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
– JGNI
Aug 2 at 9:31










1 Answer
1






active

oldest

votes

















up vote
4
down vote













No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.



You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:



(d?)[a-z](?=(d?)(.*))(?=(.* )5)


has 64 matches generated in 14323 steps



(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)


has 32 matches generated in 7512 steps



and the pattern from chess.stackexchange:



([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])


has 2 matches generated in 95 steps




Note that I have enabled the ignorecase flag in all three of them.






share|improve this answer























  • slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
    – S.G.
    Aug 2 at 12:52










  • also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
    – S.G.
    Aug 2 at 12:52











  • @S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
    – hjpotter92
    Aug 2 at 13:39










  • yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
    – S.G.
    Aug 2 at 14:35











  • regex101.com/r/ykc7s9/9
    – S.G.
    Aug 2 at 14:40











Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200796%2fregex-for-chess-notations%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
4
down vote













No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.



You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:



(d?)[a-z](?=(d?)(.*))(?=(.* )5)


has 64 matches generated in 14323 steps



(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)


has 32 matches generated in 7512 steps



and the pattern from chess.stackexchange:



([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])


has 2 matches generated in 95 steps




Note that I have enabled the ignorecase flag in all three of them.






share|improve this answer























  • slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
    – S.G.
    Aug 2 at 12:52










  • also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
    – S.G.
    Aug 2 at 12:52











  • @S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
    – hjpotter92
    Aug 2 at 13:39










  • yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
    – S.G.
    Aug 2 at 14:35











  • regex101.com/r/ykc7s9/9
    – S.G.
    Aug 2 at 14:40















up vote
4
down vote













No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.



You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:



(d?)[a-z](?=(d?)(.*))(?=(.* )5)


has 64 matches generated in 14323 steps



(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)


has 32 matches generated in 7512 steps



and the pattern from chess.stackexchange:



([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])


has 2 matches generated in 95 steps




Note that I have enabled the ignorecase flag in all three of them.






share|improve this answer























  • slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
    – S.G.
    Aug 2 at 12:52










  • also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
    – S.G.
    Aug 2 at 12:52











  • @S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
    – hjpotter92
    Aug 2 at 13:39










  • yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
    – S.G.
    Aug 2 at 14:35











  • regex101.com/r/ykc7s9/9
    – S.G.
    Aug 2 at 14:40













up vote
4
down vote










up vote
4
down vote









No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.



You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:



(d?)[a-z](?=(d?)(.*))(?=(.* )5)


has 64 matches generated in 14323 steps



(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)


has 32 matches generated in 7512 steps



and the pattern from chess.stackexchange:



([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])


has 2 matches generated in 95 steps




Note that I have enabled the ignorecase flag in all three of them.






share|improve this answer















No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.



You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:



(d?)[a-z](?=(d?)(.*))(?=(.* )5)


has 64 matches generated in 14323 steps



(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)


has 32 matches generated in 7512 steps



and the pattern from chess.stackexchange:



([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])


has 2 matches generated in 95 steps




Note that I have enabled the ignorecase flag in all three of them.







share|improve this answer















share|improve this answer



share|improve this answer








edited Aug 2 at 9:34


























answered Aug 2 at 9:28









hjpotter92

4,89611538




4,89611538











  • slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
    – S.G.
    Aug 2 at 12:52










  • also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
    – S.G.
    Aug 2 at 12:52











  • @S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
    – hjpotter92
    Aug 2 at 13:39










  • yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
    – S.G.
    Aug 2 at 14:35











  • regex101.com/r/ykc7s9/9
    – S.G.
    Aug 2 at 14:40

















  • slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
    – S.G.
    Aug 2 at 12:52










  • also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
    – S.G.
    Aug 2 at 12:52











  • @S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
    – hjpotter92
    Aug 2 at 13:39










  • yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
    – S.G.
    Aug 2 at 14:35











  • regex101.com/r/ykc7s9/9
    – S.G.
    Aug 2 at 14:40
















slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
– S.G.
Aug 2 at 12:52




slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
– S.G.
Aug 2 at 12:52












also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
– S.G.
Aug 2 at 12:52





also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
– S.G.
Aug 2 at 12:52













@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
– hjpotter92
Aug 2 at 13:39




@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
– hjpotter92
Aug 2 at 13:39












yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
– S.G.
Aug 2 at 14:35





yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
– S.G.
Aug 2 at 14:35













regex101.com/r/ykc7s9/9
– S.G.
Aug 2 at 14:40





regex101.com/r/ykc7s9/9
– S.G.
Aug 2 at 14:40













 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200796%2fregex-for-chess-notations%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Greedy Best First Search implementation in Rust

Function to Return a JSON Like Objects Using VBA Collections and Arrays

C++11 CLH Lock Implementation