Is it safe to let a user type a regex as a search input?
34 votes
I was in a mall a few days ago and I searched for a shop on an indication panel.
Out of curiosity, I tried a search with (.+) and was a bit surprised to get the list of all the shops in the mall.
I've read a bit about evil regexes, but it seems that this kind of attack can only happen when the attacker controls both the text being searched and the search input (the regex).
Can we consider the mall indication panel safe from DoS, considering that the attacker only controls the search input? (Leaving aside the possibility that a shop might be called some weird name like aaaaaaaaaaaa.)
denial-of-service regex
edited 7 hours ago
asked yesterday
Xavier59
10
If the user can enter a regex, and there's an interpreted language in use, I wouldn't be worried about DOS; I'd be worried about code injection.
– gowenfawr
yesterday
37
I would not expect a mall map to be designed for sophisticated users that might use regexes. Therefore, if regexes work, it suggests the application is sort of blindly passing the input string in. That's usually a place to try various forms of code and SQL injection. It's that little voice saying "I bet they didn't do that by design..." that makes the antenna perk up. This is a Comment, not an Answer, because (for me) there's not enough info here to say anything more accurate than that.
– gowenfawr
yesterday
7
Despite the security concerns I would love to perform RegEx filtration in indication panels of huge shopping malls!
– Daniel
18 hours ago
6
Did you test any regex that should get matches to determine it actually used regex? If I were to design a mall search I would list all shops if the search result was empty. Either the user is trying to have fun (like you) and the result would not matter or the user isn't good at using the search functionality and they should see something that might be of use to them.
– Bent
16 hours ago
9
It is also possible that the search field just ignores any punctuation, and is programmed to return all shops for an essentially empty query.
– jpa
15 hours ago
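The behaviour Bent and jpa describe can be sketched in a few lines of Python. This is only a guess at how such a panel might work, not the mall's actual code: the shop list and the `search_shops` helper are hypothetical. The sketch strips punctuation, falls back to the full list for an effectively empty query or an empty result, and never touches a regex engine, yet `(.+)` still returns every shop.

```python
import string

# Hypothetical shop directory, for illustration only.
SHOPS = ["Book Nook", "Cafe Luna", "Shoe Box"]


def search_shops(query, shops=SHOPS):
    """Plain substring search that never interprets the query as a regex."""
    # Drop punctuation so "(.+)" collapses to an empty query.
    cleaned = "".join(ch for ch in query if ch not in string.punctuation)
    cleaned = cleaned.strip().lower()
    if not cleaned:
        # Essentially empty query: show everything (jpa's suggestion).
        return list(shops)
    matches = [shop for shop in shops if cleaned in shop.lower()]
    # No hits: fall back to the full list (Bent's suggestion).
    return matches or list(shops)


print(search_shops("(.+)"))  # every shop, although no regex was evaluated
print(search_shops("cafe"))  # only the shop whose name contains "cafe"
```

So getting the full list back from `(.+)` does not by itself show that the panel evaluates regular expressions; a pattern that should match only some shops would be a better test.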
4 Answers
35 votes (accepted)
I would compare accepting user-supplied regular expressions to parsing most sorts of structured user input, such as date strings or Markdown, in terms of risk of code execution. Regular expressions are much more complex than date strings or Markdown (although safely producing HTML from untrusted Markdown has its own risks) and so leave more room for exploitation, but the basic principle is the same: exploitation involves finding unexpected side effects of the parsing/compilation/matching process. Most regex libraries are mature, and many ship in their language's standard library, which is a pretty good (but not certain) indicator that they are free of major issues leading to code execution. That is to say, accepting regexes does increase your attack surface, but it's not unreasonable to make the measured decision to accept that relatively minor risk.
Denial-of-service attacks are a little trickier. I think most regular expression libraries are designed with performance in mind but do not count mitigation of intentionally slow input among their core design goals. Whether accepting user-supplied regular expressions is appropriate from the DoS perspective depends more on the library. For example, the .NET regex library accepts a timeout, which could be used to mitigate DoS attacks. RE2 guarantees execution in time linear in the input size, which may be acceptable if you know your search corpus falls within some reasonable size limit.
In situations where availability is absolutely critical, or where you're trying to minimize your attack surface as much as possible, it makes sense to avoid accepting user regexes, but I think accepting them is a defensible practice.
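For a library without a built-in timeout, one way to approximate the timeout idea is to run the match in a separate process that can be killed. The sketch below is a minimal illustration in Python, whose standard `re` module takes no timeout argument; the `search_with_timeout` helper and the `_WORKER` script are hypothetical names, and passing the text on the command line assumes short inputs such as shop names.

```python
import subprocess
import sys

# Worker script run in a child interpreter: prints "1" on a match, "0" otherwise.
_WORKER = """
import re, sys
pattern, text = sys.argv[1], sys.argv[2]
print("1" if re.search(pattern, text) else "0")
"""


def search_with_timeout(pattern, text, timeout_s=1.0):
    """Return True/False for match/no match, or None if the time budget ran out."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _WORKER, pattern, text],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The child raised, most likely re.error for an invalid pattern.
        raise ValueError(f"regex could not be evaluated: {pattern!r}")
    return result.stdout.strip() == "1"


# A benign pattern, then one that backtracks heavily in CPython's re module.
print(search_with_timeout(r"caf.", "cafe luna", timeout_s=5.0))
print(search_with_timeout(r"(a+)+$", "a" * 40 + "!", timeout_s=0.5))
```

Starting a process per query is expensive, so this is a design trade-off rather than a recommendation: it buys a hard kill at the cost of latency, which may be acceptable for a low-traffic search panel.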
3
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
– Bob
22 hours ago
4
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
– Boris the Spider
18 hours ago
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
– JimmyJames
10 hours ago
@BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
– Nat
10 hours ago
1
Here is an example of a regular expression which takes exponential execution time on Java: (0*)*A
– Philipp
9 hours ago
6 votes
The main threat in accepting regular expressions lies in your regex execution engine rather than in accepting regex itself. I'd expect the threat to be very, very low in any well-implemented engine. The engine shouldn't need access to any privileged system resources and should only need to run logic on input provided directly to it. This means that even if someone finds an exploit in the interpreter, the damage that can be done should be minimal.
Overall, all regex is designed to do is look for patterns within a value. As long as proper security is followed on the values you check against, there is no reason the engine itself should have any access to modify values. I'd classify it as generally pretty safe.
That said, I'd also only provide it in situations where it made reasonable sense to do so. Regexes are complex and potentially time-consuming to run, and used in the wrong places they could have some undesirable impacts on an application outside of a security context, but in the right use case they are hugely powerful and immensely valuable. (I'm a software architect who regularly refactors hundreds of thousands of lines of code using regex.)
4
This doesn't cover DoS attacks via, for example, catastrophic backtracking.
– Boris the Spider
18 hours ago
2
@boris I didn't consider that a security threat, as handling expensive regexes in a non-interfering manner is necessary even in normal usage. People are going to write excessively complex regexes plenty often without it being an attack. Rational timeouts are a necessary design decision for performance reasons, not just security. It would be a bit like saying a security risk of adding a complex report is that people may DoS your site by running the report. That's a performance concern, not a security one.
– AJ Henderson
12 hours ago
3 votes
As the other answers have pointed out, the attack vector would most likely be the regex engine.
While you would assume that these engines are quite mature, robust and thoroughly tested, vulnerabilities have been found in the past:
CVE-2010-1792: arbitrary code execution in Apple Safari and iOS.
Quote from the patch notes:
A memory corruption issue exists in WebKit's handling of regular expressions. Visiting a maliciously crafted website may lead to an unexpected application termination or arbitrary code execution.
But of course, the argument of a possibly flawed library holds for everything, even user-provided JPEG files.
The other aspect, albeit not inherently technical, would be the (.+) case you mentioned: should the product allow arbitrary data retrieval?
3 votes
The problem is that many regex engines "backtrack". When you have a repetition operator (e.g. + or *) in your regex, the engine will try to match it against as much of the input string as possible. If the match later fails, it will backtrack and try matching your repetition against a smaller part of the input string.
Multiple repetition operators can lead to nested backtracking, and this can make the time to evaluate the regex blow up massively, especially if the repetition operators are nested.
https://www.regular-expressions.info/catastrophic.html
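To make the blow-up concrete, here is a toy model of a backtracking matcher for the nested pattern (a+)+b, anchored at the start of the input. It is a deliberately simplified sketch, not how any particular engine is implemented, and `backtracking_attempts` is a hypothetical helper. It counts how often the matcher gets as far as testing the final b.

```python
def backtracking_attempts(text):
    """Model a naive backtracking match of (a+)+b, anchored at the start of `text`.

    Returns (matched, attempts), where attempts counts how often the
    matcher reaches the trailing 'b' and has to test it.
    """
    attempts = 0

    def match_group(pos):
        nonlocal attempts
        # Inner a+: greedily find the longest run of 'a' starting at pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Backtrack: try the longest run first, then give characters back.
        for stop in range(end, pos, -1):
            # Outer +: first try another repetition of the group...
            if match_group(stop):
                return True
            # ...and only then try to finish the pattern with 'b'.
            attempts += 1
            if text[stop:stop + 1] == "b":
                return True
        return False

    return match_group(0), attempts


# Inputs with no 'b' force the matcher to exhaust every way of splitting the run.
for n in (4, 8, 12, 16):
    print(n, backtracking_attempts("a" * n))
```

In this model an input of n letters a with no b needs 2^n - 1 attempts before the match fails, so each extra character roughly doubles the work, while an input such as aaab succeeds on the first attempt.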
Accepted answer by Ryan Jenkins (answered yesterday, edited yesterday)
3
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
â Bob
22 hours ago
4
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
â Boris the Spider
18 hours ago
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
â JimmyJames
10 hours ago
@BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
â Nat
10 hours ago
1
Here is an example of a regular expression which takes exponential execution times on Java:(0*)*A
â Philipp
9 hours ago
 |Â
show 1 more comment
3
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
â Bob
22 hours ago
4
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
â Boris the Spider
18 hours ago
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
â JimmyJames
10 hours ago
@BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
â Nat
10 hours ago
1
Here is an example of a regular expression which takes exponential execution times on Java:(0*)*A
â Philipp
9 hours ago
3
3
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
â Bob
22 hours ago
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
â Bob
22 hours ago
4
4
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
â Boris the Spider
18 hours ago
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
â Boris the Spider
18 hours ago
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
â JimmyJames
10 hours ago
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
â JimmyJames
10 hours ago
@BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
â Nat
10 hours ago
@BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
â Nat
10 hours ago
1
1
Here is an example of a regular expression which takes exponential execution times on Java:
(0*)*A
â Philipp
9 hours ago
Here is an example of a regular expression which takes exponential execution times on Java:
(0*)*A
â Philipp
9 hours ago
 |Â
show 1 more comment
up vote
6
down vote
The main threat in accepting regular expressions will be in your regex execution engine rather than accepting regex itself. I'd expect the threat to be very, very low in any well implemented engine. The engine shouldn't need access to any privileged system resources and should only need to run logic on input provided directly to the engine. This means that even if someone finds an exploit in the interpreter, the damage that can be done should be minimal.
Overall, all regex is designed to do is look for patterns within a value. As long as proper security is followed on the values you check against, there is no reason the engine itself should have any access to modify values. I'd classify it as generally pretty safe.
That said, I'd also only provide it in situations where it made reasonable sense to do so. Regex is complex, potentially time consuming to run, and used in the wrong places could have some undesirable impacts on an application outside of a security context, but in the right use case they are hugely powerful and immensely valuable. (I'm a software architect who refactors hundreds of thousands of lines of code regularly using regex.)
4
This doesn't cover DoS attacks via, for example, catastrophic backtracking.
– Boris the Spider
18 hours ago
2
@boris I didn't consider that a security threat, as handling expensive regexes in a non-interfering manner is necessary even in normal usage. People will write excessively complex regexes plenty often without it being an attack. Reasonable timeouts are a necessary design decision for performance reasons, not just security. It would be a bit like saying that a security risk of adding a complex report is that people may DoS your site by running the report. That's a performance concern, not a security one.
– AJ Henderson
12 hours ago
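To make the timeout idea from the comments concrete, here is a minimal sketch in Python. It is only one possible approach, not what the mall panel or any particular product does: the search runs in a child interpreter that can be killed when a time limit expires. The function name `search_with_timeout`, the exit-code convention, and the default limit are all assumptions made for illustration.

```python
import subprocess
import sys

# Child script (hypothetical protocol): exit code 0 = match, 1 = no match,
# 2 = the pattern could not be compiled.
_CHILD = """
import re, sys
try:
    found = re.search(sys.argv[1], sys.argv[2])
except re.error:
    sys.exit(2)
sys.exit(0 if found else 1)
"""


def search_with_timeout(pattern, text, timeout_s=1.0):
    """Return True/False for match/no match, or None if the limit is hit."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=timeout_s,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so the
        # runaway match no longer consumes CPU at this point.
        return None
    if done.returncode == 2:
        raise ValueError("invalid regular expression")
    return done.returncode == 0
```

Starting a process per search is expensive, so this is a sketch of the principle (isolate the match, bound its run time) rather than something to ship as-is. A pattern such as `(a+)+$` against a long run of `a` followed by `!` would be expected to hit the limit and return `None`.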
answered 23 hours ago
AJ Henderson
up vote
3
down vote
As the other answers have pointed out, the most likely attack vector is the regex engine.
While you would assume that these engines are quite mature, robust and thoroughly tested, exploitable flaws have been found in the past:
CVE-2010-1792 Arbitrary Code Execution in Apple Safari and iOS.
Quote from the Patch notes:
A memory corruption issue exists in WebKit's handling
of regular expressions. Visiting a maliciously crafted website may
lead to an unexpected application termination or arbitrary code
execution.
But of course, the argument of a possibly flawed library holds for everything, even user-provided JPEG files.
The other aspect, albeit not inherently technical, is the (.+) case you mentioned: should the product allow arbitrary data retrieval?
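As an illustration of the (.+) point, here is a small Python sketch contrasting a search that compiles the user's input as a regex with one that escapes it first and treats it as literal text. The shop names and both function names are made up for the example. Whether the panel in the question works like either variant is unknown.

```python
import re

# Hypothetical shop directory, for illustration only.
SHOPS = ["Bookworm", "Cafe Luna", "Shoe Box"]


def search_as_regex(query, names):
    """Compile the raw user input as a pattern (what (.+) appears to exploit)."""
    pattern = re.compile(query, re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


def search_as_literal(query, names):
    """Escape the input so metacharacters only match themselves."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [name for name in names if pattern.search(name)]
```

With these definitions, `search_as_regex("(.+)", SHOPS)` returns every shop, while `search_as_literal("(.+)", SHOPS)` returns nothing because no name contains the literal characters `(.+)`. An ordinary query such as `"luna"` behaves the same in both.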
edited 11 hours ago
Xavier59
answered 15 hours ago
PhilLab
up vote
3
down vote
The problem is that many regex engines "backtrack". When you have a repetition operator (e.g. + or *) in your regex, the engine will try to match it against as much of the input string as possible. If the match later fails, it will backtrack and try matching your repetition against a smaller part of the input string.
Multiple repetition operators can lead to nested backtracking, which can make the time to evaluate the regex blow up massively, especially if the repetition operators are nested.
https://www.regular-expressions.info/catastrophic.html
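The blow-up described above can be made visible with a toy model. The Python sketch below is not how any real engine is implemented. It hand-codes a naive backtracking search for the single pattern `(a+)+b` and counts how many ways of splitting the input it tries. The function name `count_steps` is invented for this example.

```python
def count_steps(text):
    """Count the attempts a naive backtracker makes for (a+)+b on `text`."""
    steps = 0

    def outer(pos):
        # One iteration of the outer group: the inner a+ first grabs as many
        # 'a' characters as it can, then gives them back one at a time.
        nonlocal steps
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        for stop in range(end, pos, -1):  # greedy: longest candidate first
            steps += 1
            # After this iteration, either repeat the group or match 'b'.
            if outer(stop) or (stop < len(text) and text[stop] == "b"):
                return True
        return False

    outer(0)
    return steps
```

When the input ends in `b`, the first greedy attempt succeeds and `count_steps("aaab")` is 1. When the `b` is missing, every split of the run of `a` has to be tried before giving up: for a run of n characters the count is 2^n - 1, so `count_steps("a" * 10)` is 1023 and each extra `a` roughly doubles the work.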
answered 5 hours ago
Peter Green
7
Despite the security concerns I would love to perform RegEx filtration in indication panels of huge shopping malls!
– Daniel
18 hours ago
6
Did you test any regex that should get matches to determine it actually used regex? If I were to design a mall search I would list all shops if the search result was empty. Either the user is trying to have fun (like you) and the result would not matter or the user isn't good at using the search functionality and they should see something that might be of use to them.
– Bent
16 hours ago
9
It is also possible that the search field just ignores any punctuation, and is programmed to return all shops for an essentially empty query.
– jpa
15 hours ago