Is it safe to let a user type a regex as a search input?








up vote
34
down vote

favorite
5












I was in a mall a few days ago and I searched for a shop on an indication panel.



Out of curiosity, I tried a search with (.+) and was a bit surprised to get the list of all the shops in the mall.



I've read a bit about evil regexes but it seems that this kind of attack can only happen when the attacker has both control of the entry to search and the search input (the regex).



Can we consider the mall indication panel safe from DoS, given that the attacker only controls the search input? (Leaving aside the possibility that a shop might have some weird name like aaaaaaaaaaaa.)
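To illustrate why the panel would list every shop if it really does hand the input to a regex engine, here is a minimal Python sketch. The shop names and the `search_shops` helper are invented for illustration; nothing is known about the panel's actual software.

```python
import re

# Hypothetical directory; the real panel's data is unknown.
SHOPS = ["Bakery Paul", "Book Corner", "Shoe Palace", "Tea & Co"]

def search_shops(query, shops=SHOPS):
    """Return every shop whose name contains a match for the regex `query`."""
    pattern = re.compile(query, re.IGNORECASE)
    return [name for name in shops if pattern.search(name)]

print(search_shops("(.+)"))  # every non-empty name matches
print(search_shops("^b"))    # only names that start with "b" or "B"
```

Under this assumption, `(.+)` matches any name with at least one character, so the whole directory comes back.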
























  • 10




    If the user can enter a regex, and there's an interpreted language in use, I wouldn't be worried about DOS; I'd be worried about code injection.
    – gowenfawr
    yesterday






  • 37




    I would not expect a mall map to be designed for sophisticated users that might use regexes. Therefore, if regexes work, it suggests the application is sort of blindly passing the input string in. That's usually a place to try various forms of code and SQL injection. It's that little voice saying "I bet they didn't do that by design..." that makes the antenna perk up. This is a Comment, not an Answer, because (for me) there's not enough info here to say anything more accurate than that.
    – gowenfawr
    yesterday







  • 7




    Despite the security concerns I would love to perform RegEx filtration in indication panels of huge shopping malls!
    – Daniel
    18 hours ago






  • 6




    Did you test any regex that should get matches to determine it actually used regex? If I were to design a mall search I would list all shops if the search result was empty. Either the user is trying to have fun (like you) and the result would not matter or the user isn't good at using the search functionality and they should see something that might be of use to them.
    – Bent
    16 hours ago







  • 9




    It is also possible that the search field just ignores any punctuation, and is programmed to return all shops for an essentially empty query.
    – jpa
    15 hours ago
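The non-regex explanation raised in these comments can be sketched as follows. This is only a guess at how such a panel might behave: the shop names and the `plain_search` helper are hypothetical.

```python
import string

# Hypothetical directory; the real panel's data is unknown.
SHOPS = ["Bakery Paul", "Book Corner", "Shoe Palace", "Tea & Co"]

def plain_search(query, shops=SHOPS):
    """Literal, case-insensitive substring search that ignores punctuation."""
    cleaned = "".join(ch for ch in query if ch not in string.punctuation)
    cleaned = cleaned.strip().lower()
    if not cleaned:
        return list(shops)  # effectively empty query: show everything
    return [name for name in shops if cleaned in name.lower()]

print(plain_search("(.+)"))  # all shops, although no regex engine is involved
print(plain_search("b.k"))   # the query is reduced to "bk", which no name contains
```

A probe such as `b.k` tells the two behaviours apart: it returns nothing here, whereas a case-insensitive regex search would match "Bakery Paul".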
























asked yesterday by Xavier59 (edited 7 hours ago)

















4 Answers

















up vote
35
down vote



accepted










In terms of the risk of code execution, I would compare accepting user-supplied regular expressions to parsing most sorts of structured user input, such as date strings or markdown. Regular expressions are much more complex than date strings or markdown (although safely producing HTML from untrusted markdown has its own risks) and so represent more room for exploitation, but the basic principle is the same: exploitation involves finding unexpected side effects of the parsing/compilation/matching process. Most regex libraries are mature and part of the standard library in many languages, which is a pretty good (but not certain) indicator that they are free of major issues leading to code execution. That is to say, accepting regexes does increase your attack surface, but it's not unreasonable to make the measured decision to accept that relatively minor risk.



Denial of service attacks are a little trickier. I think most regular expression libraries are designed with performance in mind but do not count mitigation of intentionally slow input among their core design goals. The appropriateness of accepting user supplied regular expressions from the DoS perspective is more library dependent. For example, the .NET regex library accepts a timeout which could be used to mitigate DoS attacks. RE2 guarantees execution in time linear to input size which may be acceptable if you know your search corpus falls within some reasonable size limit.
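As a rough illustration of the timeout idea in an environment without built-in support: Python's standard `re` module has no timeout parameter, so one option is to run the search in a child process that can be killed when the time limit passes. This is a sketch under that assumption, not a recommended production design; `search_with_timeout` and `_CHILD_SCRIPT` are names made up for this example.

```python
import subprocess
import sys

# Script run in the child interpreter: argv[1] is the pattern, argv[2] the text.
_CHILD_SCRIPT = (
    "import re, sys\n"
    "match = re.search(sys.argv[1], sys.argv[2])\n"
    "print('1' if match else '0')\n"
)

def search_with_timeout(pattern, text, seconds=1.0):
    """Return True or False for match or no match, or None if the limit is hit."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, pattern, text],
            capture_output=True, text=True, timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return None
    if done.returncode != 0:
        raise ValueError("pattern could not be compiled or searched")
    return done.stdout.strip() == "1"
```

For example, `search_with_timeout("sho+p", "coffee shop")` should report a match, while a pathological pattern such as `(a+)+$` against a long run of `a` followed by `b` should hit the limit and return `None`. Killing a process avoids the problem of threads that cannot be forcibly stopped, at the cost of starting a new interpreter for each search.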



In situations where availability is absolutely critical, or where you're trying to minimize your attack surface as much as possible, it makes sense to avoid accepting user regexes. Otherwise, I think accepting them is a defensible practice.

























  • 3




    Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
    – Bob
    22 hours ago






  • 4




    @Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
    – Boris the Spider
    18 hours ago










  • Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
    – JimmyJames
    10 hours ago










  • @BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
    – Nat
    10 hours ago







  • 1




    Here is an example of a regular expression which takes exponential execution times on Java: (0*)*A
    – Philipp
    9 hours ago


















up vote
6
down vote













The main threat in accepting regular expressions will be in your regex execution engine rather than accepting regex itself. I'd expect the threat to be very, very low in any well implemented engine. The engine shouldn't need access to any privileged system resources and should only need to run logic on input provided directly to the engine. This means that even if someone finds an exploit in the interpreter, the damage that can be done should be minimal.



Overall, all regex is designed to do is look for patterns within a value. As long as proper security is followed on the values you check against, there is no reason the engine itself should have any access to modify values. I'd classify it as generally pretty safe.



That said, I'd also only provide it in situations where it made reasonable sense to do so. Regex is complex, potentially time consuming to run, and used in the wrong places could have some undesirable impacts on an application outside of a security context, but in the right use case they are hugely powerful and immensely valuable. (I'm a software architect who refactors hundreds of thousands of lines of code regularly using regex.)























  • 4




    This doesn't cover DoS attacks via, for example, catastrophic backtracking.
    – Boris the Spider
    18 hours ago






  • 2




    @boris I didn't consider that a security threat, since handling expensive regexes in a non-interfering manner is necessary even in normal usage. People will often write excessively complex regexes without any intent to attack. Reasonable timeouts are a necessary design decision for performance reasons, not just security. It would be a bit like saying that a security risk of adding a complex report is that people may DoS your site by running the report. That's a performance concern, not a security one.
    – AJ Henderson
    12 hours ago


















up vote
3
down vote













As the other answers have pointed out, the attack vector would most likely be the regex engine.



While you would assume that these engines are quite mature, robust and thoroughly tested, vulnerabilities have been found in them in the past:



CVE-2010-1792 Arbitrary Code Execution in Apple Safari and iOS.
Quote from the Patch notes:




A memory corruption issue exists in WebKit's handling
of regular expressions. Visiting a maliciously crafted website may
lead to an unexpected application termination or arbitrary code
execution.




But of course, the argument of a possibly flawed library holds for everything - even user-provided JPEG files.



The other aspect, albeit not inherently technical, would be the (.+) case you mentioned: Should the product allow arbitrary data retrieval?




































    up vote
    3
    down vote













The problem is that regex engines "backtrack". When you have a repetition operator (e.g. + or *) in your regex, the engine will try to match it against as much of the input string as possible. If the match later fails, it will backtrack and try matching the repetition against a smaller part of the input string.

Multiple repetition operators can lead to nested backtracking, and this can make the time to evaluate the regex blow up massively, especially if the repetition operators are nested.

https://www.regular-expressions.info/catastrophic.html
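To put a number on that blow-up, here is a small Python sketch. It uses the commonly cited pattern `(a+)+$`, which is an assumption for illustration and does not come from the posts above. A run of n `a` characters can be divided among the iterations of the outer group in 2^(n-1) ways, and when the overall match fails a backtracking engine may end up trying all of them. The helper `count_splits` is hypothetical and only counts those divisions.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways to cut a run of n characters into non-empty chunks."""
    if n == 0:
        return 1  # nothing left to split: one complete division
    # The first chunk takes k characters; the remainder is split recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))

for n in (5, 10, 20, 40):
    print(n, count_splits(n))  # doubles with every extra character

# A small real case: the trailing "b" makes the match fail, so the engine
# has to give up on the divisions one by one before returning None.
print(re.match(r"(a+)+$", "a" * 18 + "b"))
```

The input is kept short on purpose: each additional `a` roughly doubles the work, so a few dozen characters are enough to stall a backtracking engine.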






Accepted answer: answered yesterday by Ryan Jenkins (edited yesterday)







      • 3




        Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
        – Bob
        22 hours ago






      • 4




        @Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
        – Boris the Spider
        18 hours ago










      • Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
        – JimmyJames
        10 hours ago










      • @BoristheSpider This StackOverflow question seems to provide a method for launching tasks with a time-out. Does that not work in this scenario?
        – Nat
        10 hours ago







      • 1




        Here is an example of a regular expression that takes exponential time to execute in Java: (0*)*A
        – Philipp
        9 hours ago


























      up vote
      6
      down vote













      The main threat in accepting regular expressions lies in your regex execution engine rather than in accepting regexes as such. I'd expect the threat to be very low in any well-implemented engine. The engine shouldn't need access to any privileged system resources and should only need to run logic on input provided directly to it. This means that even if someone finds an exploit in the interpreter, the damage that can be done should be minimal.



      Overall, all a regex is designed to do is look for patterns within a value. As long as proper security is applied to the values you check against, there is no reason the engine itself should have any ability to modify them. I'd classify it as generally pretty safe.



      That said, I'd only provide it in situations where it makes reasonable sense to do so. Regexes are complex and potentially time consuming to run, and used in the wrong places they can have undesirable effects on an application even outside of a security context. In the right use case, though, they are hugely powerful and immensely valuable. (I'm a software architect who regularly refactors hundreds of thousands of lines of code using regex.)








      answered 23 hours ago









      AJ Henderson

      38.9k553104





      • 4




        This doesn't cover DoS attacks via, for example, catastrophic backtracking.
        – Boris the Spider
        18 hours ago






      • 2




        @boris I didn't consider that a security threat, because handling expensive regexes without interfering with other work is necessary even in normal usage. People will write excessively complex regexes plenty often without it being an attack. Sensible timeouts are a necessary design decision for performance reasons, not just security. It would be a bit like saying that a security risk of adding a complex report is that people may DoS your site by running the report. That's a performance concern, not a security one.
        – AJ Henderson
        12 hours ago
























      up vote
      3
      down vote













          As the other answers have pointed out, the most likely attack vector is the regex engine itself.



          While you would assume that these engines are quite mature, robust and thoroughly tested, serious vulnerabilities have been found in them in the past:



          CVE-2010-1792: arbitrary code execution in Apple Safari and iOS.
          Quoting the patch notes:




          A memory corruption issue exists in WebKit's handling
          of regular expressions. Visiting a maliciously crafted website may
          lead to an unexpected application termination or arbitrary code
          execution.




          But of course, the argument of a possibly flawed library holds for everything - even user-provided JPEG files.



          The other aspect, albeit not inherently technical, would be the (.+) case you mentioned: Should the product allow arbitrary data retrieval?
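If the product is not meant to offer pattern search at all, one option is to treat whatever the user types as literal text. Here is a minimal Python sketch under that assumption; the shop list and the function names `search_literal` and `search_escaped` are made up for illustration.

```python
import re

SHOPS = ["Shoe Palace", "Book Nook", "C++ Cafe"]  # hypothetical shop directory


def search_literal(query, names):
    """Case-insensitive substring search; the query is never interpreted as a regex."""
    needle = query.casefold()
    if not needle:
        return []
    return [n for n in names if needle in n.casefold()]


def search_escaped(query, names):
    """Same result when the surrounding code insists on a regex: escape the query first."""
    if not query:
        return []
    rx = re.compile(re.escape(query), re.IGNORECASE)
    return [n for n in names if rx.search(n)]


print(search_literal("(.+)", SHOPS))  # no shop name contains the literal text "(.+)"
print(search_escaped("c++", SHOPS))   # the plus signs are matched literally
```

With either function, typing (.+) no longer lists every shop, because the characters are compared as plain text rather than compiled into a pattern.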















          edited 11 hours ago









          Xavier59

          1,2861525









          answered 15 hours ago









          PhilLab

          1313


















              up vote
              3
              down vote













                  The problem is that many regex engines "backtrack". When your regex contains a repetition operator (e.g. + or *), the engine first tries to match it against as much of the input string as possible. If the rest of the match later fails, the engine backtracks and retries with the repetition matching a smaller part of the input string.



                  Multiple repetition operators can lead to nested backtracking, which can make the time needed to evaluate the regex blow up massively, especially if the repetition operators are themselves nested.



                  https://www.regular-expressions.info/catastrophic.html
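To get a feel for how quickly nested repetition blows up, here is a toy Python model of a naive backtracking matcher. It is my own simplification, not how any real engine is implemented. It is hard-wired to a pattern of the shape (0+)+A and counts how many times the matcher reaches the final A and fails on a text made only of zeros.

```python
def count_failed_attempts(text, pos=0):
    """Count failed tries at the trailing 'A' for the pattern (0+)+A.

    Toy model: assumes `text` contains no 'A', so every try fails and the
    matcher has to explore every way of splitting the zeros into groups.
    """
    # Find how far the inner 0+ could stretch from this position.
    end = pos
    while end < len(text) and text[end] == "0":
        end += 1

    attempts = 0
    # Greedy first: try the longest run, then back off one character at a time.
    for stop in range(end, pos, -1):
        attempts += 1  # try to match 'A' at `stop`; it fails
        attempts += count_failed_attempts(text, stop)  # or repeat the group
    return attempts


for n in (5, 10, 15):
    print(n, count_failed_attempts("0" * n))
```

In this model a run of n zeros produces 2^n - 1 failed attempts, so each extra character roughly doubles the work. That doubling is the reason a short pattern combined with a modest input can keep a backtracking engine busy for a very long time.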


















                  answered 5 hours ago









                  Peter Green

                  3,73111421




















                       
