Regex for chess notations

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
0
down vote

favorite

I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.

I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).

My thoughts are

[A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.

[RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.

Am I understanding regex correct?

asked Aug 2 at 9:11

S.G.

217112

4

It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â€“Â RoToRa
Aug 2 at 9:25

Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â€“Â RoToRa
Aug 2 at 9:29

1

I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â€“Â JGNI
Aug 2 at 9:31

add a commentÂ |Â

up vote
0
down vote

favorite

I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.

I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).

My thoughts are

[A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.

[RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.

Am I understanding regex correct?

asked Aug 2 at 9:11

S.G.

217112

4

It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â€“Â RoToRa
Aug 2 at 9:25

Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â€“Â RoToRa
Aug 2 at 9:29

1

I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â€“Â JGNI
Aug 2 at 9:31

add a commentÂ |Â

up vote
0
down vote

favorite

I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.

I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).

My thoughts are

[A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.

[RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.

Am I understanding regex correct?

asked Aug 2 at 9:11

S.G.

217112

I have this regex (d?)[A-Za-z](?=(d?)(.*))(?=(.* )5). It would be used by me for chess FEN strings.

I wounder if regex [A-Za-z] is faster than [RNBQKrnbqk]? I only need to check the given letters (but no other letters will appear).

My thoughts are

[A-Za-z] the regex engine can match if a char is 65 <= c <= 90 or 97 <= c <= 122. Worst case 4 comparisons.

[RNBQKrnbqk] the engine checks if the inspected char equals every given character in the group. worst case 10 comparisons.

Am I understanding regex correct?

asked Aug 2 at 9:11

S.G.

217112

asked Aug 2 at 9:11

S.G.

217112

asked Aug 2 at 9:11

S.G.

217112

asked Aug 2 at 9:11

S.G.

217112

4

It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â€“Â RoToRa
Aug 2 at 9:25

Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â€“Â RoToRa
Aug 2 at 9:29

1

I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â€“Â JGNI
Aug 2 at 9:31

add a commentÂ |Â

4

It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â€“Â RoToRa
Aug 2 at 9:25

Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â€“Â RoToRa
Aug 2 at 9:29

1

I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â€“Â JGNI
Aug 2 at 9:31

It's impossible to say, because It would depend on the implementation details of the regex engine. In any case, it's unlikely that this is something you would need to worry about. If you have noticeable performance issues because of this (you'd need to be processing millions of inputs a second), then you probably should implement a simple parser optimized to this specific task instead of using a regex.
â€“Â RoToRa
Aug 2 at 9:25

Also for any practical use, you'd have to verify the character anyway somewhere down the line, so why not do it here?
â€“Â RoToRa
Aug 2 at 9:29

I think you are looking to optimise in the wrong place, with the speed of modern computers the difference between [A-Za-z] and [RNBQKrnbqk] wont be noticeable. I'd take a look at your lookaheads (?=(d?)(.*)) can match 0 or more of any character (d?) matching 0 or 1 number so if there is no number .* matches. Also the 5 in (?=(.* )5) is redundant as .* can match nothing to the end of the input so 5 can match 5 lots of nothing
â€“Â JGNI
Aug 2 at 9:31

add a commentÂ |Â

1 Answer
1

active

oldest

votes

up vote
4
down vote

No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.

You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:

(d?)[a-z](?=(d?)(.*))(?=(.* )5)

has 64 matches generated in 14323 steps

(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)

has 32 matches generated in 7512 steps

and the pattern from chess.stackexchange:

([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])

has 2 matches generated in 95 steps

Note that I have enabled the ignorecase flag in all three of them.

edited Aug 2 at 9:34

answered Aug 2 at 9:28

hjpotter92

4,89611538

slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â€“Â S.G.
Aug 2 at 12:52

also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â€“Â S.G.
Aug 2 at 12:52

@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
â€“Â hjpotter92
Aug 2 at 13:39

yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â€“Â S.G.
Aug 2 at 14:35

regex101.com/r/ykc7s9/9
â€“Â S.G.
Aug 2 at 14:40

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200796%2fregex-for-chess-notations%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
4
down vote

No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.

You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:

(d?)[a-z](?=(d?)(.*))(?=(.* )5)

has 64 matches generated in 14323 steps

(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)

has 32 matches generated in 7512 steps

and the pattern from chess.stackexchange:

([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])

has 2 matches generated in 95 steps

Note that I have enabled the ignorecase flag in all three of them.

edited Aug 2 at 9:34

answered Aug 2 at 9:28

hjpotter92

4,89611538

slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â€“Â S.G.
Aug 2 at 12:52

also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â€“Â S.G.
Aug 2 at 12:52

@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
â€“Â hjpotter92
Aug 2 at 13:39

yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â€“Â S.G.
Aug 2 at 14:35

regex101.com/r/ykc7s9/9
â€“Â S.G.
Aug 2 at 14:40

add a commentÂ |Â

up vote
4
down vote

No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.

You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:

(d?)[a-z](?=(d?)(.*))(?=(.* )5)

has 64 matches generated in 14323 steps

(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)

has 32 matches generated in 7512 steps

and the pattern from chess.stackexchange:

([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])

has 2 matches generated in 95 steps

Note that I have enabled the ignorecase flag in all three of them.

edited Aug 2 at 9:34

answered Aug 2 at 9:28

hjpotter92

4,89611538

slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â€“Â S.G.
Aug 2 at 12:52

also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â€“Â S.G.
Aug 2 at 12:52

@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
â€“Â hjpotter92
Aug 2 at 13:39

yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â€“Â S.G.
Aug 2 at 14:35

regex101.com/r/ykc7s9/9
â€“Â S.G.
Aug 2 at 14:40

add a commentÂ |Â

up vote
4
down vote

No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.

You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:

(d?)[a-z](?=(d?)(.*))(?=(.* )5)

has 64 matches generated in 14323 steps

(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)

has 32 matches generated in 7512 steps

and the pattern from chess.stackexchange:

([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])

has 2 matches generated in 95 steps

Note that I have enabled the ignorecase flag in all three of them.

edited Aug 2 at 9:34

answered Aug 2 at 9:28

hjpotter92

4,89611538

No. The matching for a-zA-Z would be slower than the exact character-set you supply: RNBQKrnbqk.

You can observe this behaviour by checking the backtrace it generates. I compared 3 different patterns, two being your own, and the third I found on chess.stackexchange.com:

(d?)[a-z](?=(d?)(.*))(?=(.* )5)

has 64 matches generated in 14323 steps

(d?)[rnbqk](?=(d?)(.*))(?=(.* )5)

has 32 matches generated in 7512 steps

and the pattern from chess.stackexchange:

([rnbqkp1-8]+/)7([rnbqkp1-8]+)s[bw]s(-|K?Q?k?q?)s(-|[a-h][36])

has 2 matches generated in 95 steps

Note that I have enabled the ignorecase flag in all three of them.

edited Aug 2 at 9:34

answered Aug 2 at 9:28

hjpotter92

4,89611538

edited Aug 2 at 9:34

answered Aug 2 at 9:28

hjpotter92

4,89611538

answered Aug 2 at 9:28

hjpotter92

4,89611538

answered Aug 2 at 9:28

hjpotter92

4,89611538

slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â€“Â S.G.
Aug 2 at 12:52

also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â€“Â S.G.
Aug 2 at 12:52

@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
â€“Â hjpotter92
Aug 2 at 13:39

yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â€“Â S.G.
Aug 2 at 14:35

regex101.com/r/ykc7s9/9
â€“Â S.G.
Aug 2 at 14:40

add a commentÂ |Â

slight improvement to your code: ([rnbqkp1-8]+/)7([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36])
â€“Â S.G.
Aug 2 at 12:52

also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â€“Â S.G.
Aug 2 at 12:52

@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
â€“Â hjpotter92
Aug 2 at 13:39

yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â€“Â S.G.
Aug 2 at 14:35

regex101.com/r/ykc7s9/9
â€“Â S.G.
Aug 2 at 14:40

also "d+ d+" make sense. It makes it somehow faster. regex101.com/r/ykc7s9/2
â€“Â S.G.
Aug 2 at 12:52

@S.G. Adding multiple cases KQkq checks defeats the purpose of K?Q?k?q? pattern.
â€“Â hjpotter92
Aug 2 at 13:39

yes, but K?Q?k?q? matches an empty String which is not a valid. For example " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid
â€“Â S.G.
Aug 2 at 14:35

regex101.com/r/ykc7s9/9
â€“Â S.G.
Aug 2 at 14:40

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr