Functions to convert camelCase strings to snake_case

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
22
down vote

favorite
7












I have written two functions, one actual and one helper, to convert camelCase strings into snake_case strings (I call it joint-lower case). Any ideas for improvement are welcome.



def _cc2jl(string):
"""Camel case to joint-lower helper."""

for index, current in enumerate(string):

if is_upper(current):
if index > 0:
previous = string[index-1]

if is_upper(previous):
try:
next = string[index+1]
except IndexError:
yield current.lower()
continue
else:
if is_upper(next):
yield current.lower()
continue

yield '_' + current.lower()
continue

yield current.lower()
continue

yield current


def cc2jl(string):
"""Converts a camel case string to joint-lower."""

return ''.join(_cc2jl(string))


Expected behaviour:



>>> cc2jl('m')
'm'
>>> cc2jl('AA')
'aa'
>>> cc2jl('MySQLDatabase')
'my_sql_database'
>>> cc2jl('GarbageCollection')
'garbage_collection'
>>> cc2jl('AAAAAAAAAAAAAAAAA')
'aaaaaaaaaaaaaaaaa'






share|improve this question



























    up vote
    22
    down vote

    favorite
    7












    I have written two functions, one actual and one helper, to convert camelCase strings into snake_case strings (I call it joint-lower case). Any ideas for improvement are welcome.



    def _cc2jl(string):
    """Camel case to joint-lower helper."""

    for index, current in enumerate(string):

    if is_upper(current):
    if index > 0:
    previous = string[index-1]

    if is_upper(previous):
    try:
    next = string[index+1]
    except IndexError:
    yield current.lower()
    continue
    else:
    if is_upper(next):
    yield current.lower()
    continue

    yield '_' + current.lower()
    continue

    yield current.lower()
    continue

    yield current


    def cc2jl(string):
    """Converts a camel case string to joint-lower."""

    return ''.join(_cc2jl(string))


    Expected behaviour:



    >>> cc2jl('m')
    'm'
    >>> cc2jl('AA')
    'aa'
    >>> cc2jl('MySQLDatabase')
    'my_sql_database'
    >>> cc2jl('GarbageCollection')
    'garbage_collection'
    >>> cc2jl('AAAAAAAAAAAAAAAAA')
    'aaaaaaaaaaaaaaaaa'






    share|improve this question























      up vote
      22
      down vote

      favorite
      7









      up vote
      22
      down vote

      favorite
      7






      7





      I have written two functions, one actual and one helper, to convert camelCase strings into snake_case strings (I call it joint-lower case). Any ideas for improvement are welcome.



      def _cc2jl(string):
      """Camel case to joint-lower helper."""

      for index, current in enumerate(string):

      if is_upper(current):
      if index > 0:
      previous = string[index-1]

      if is_upper(previous):
      try:
      next = string[index+1]
      except IndexError:
      yield current.lower()
      continue
      else:
      if is_upper(next):
      yield current.lower()
      continue

      yield '_' + current.lower()
      continue

      yield current.lower()
      continue

      yield current


      def cc2jl(string):
      """Converts a camel case string to joint-lower."""

      return ''.join(_cc2jl(string))


      Expected behaviour:



      >>> cc2jl('m')
      'm'
      >>> cc2jl('AA')
      'aa'
      >>> cc2jl('MySQLDatabase')
      'my_sql_database'
      >>> cc2jl('GarbageCollection')
      'garbage_collection'
      >>> cc2jl('AAAAAAAAAAAAAAAAA')
      'aaaaaaaaaaaaaaaaa'






      share|improve this question













      I have written two functions, one actual and one helper, to convert camelCase strings into snake_case strings (I call it joint-lower case). Any ideas for improvement are welcome.



      def _cc2jl(string):
      """Camel case to joint-lower helper."""

      for index, current in enumerate(string):

      if is_upper(current):
      if index > 0:
      previous = string[index-1]

      if is_upper(previous):
      try:
      next = string[index+1]
      except IndexError:
      yield current.lower()
      continue
      else:
      if is_upper(next):
      yield current.lower()
      continue

      yield '_' + current.lower()
      continue

      yield current.lower()
      continue

      yield current


      def cc2jl(string):
      """Converts a camel case string to joint-lower."""

      return ''.join(_cc2jl(string))


      Expected behaviour:



      >>> cc2jl('m')
      'm'
      >>> cc2jl('AA')
      'aa'
      >>> cc2jl('MySQLDatabase')
      'my_sql_database'
      >>> cc2jl('GarbageCollection')
      'garbage_collection'
      >>> cc2jl('AAAAAAAAAAAAAAAAA')
      'aaaaaaaaaaaaaaaaa'








      share|improve this question












      share|improve this question




      share|improve this question








      edited Jan 25 at 23:06









      Jamal♦

      30.1k11114225




      30.1k11114225









      asked Jan 25 at 11:56









      Richard Neumann

      1,761620




      1,761620




















          4 Answers
          4






          active

          oldest

          votes

















          up vote
          12
          down vote



          accepted










          I think regex would be the easiest to do this with. You just need to find each word in the string.



          1. The name starts with an uppercase letter. [A-Z]. However due to the way the rest of the regex works, we can change this to ., so that we match all words, even ones that start with _.


          2. The word will either contain uppercase or lowercase letters, both with other characters - excluding _.




            • Uppercase:



              1. The word won't be lowercase or have a _. [^a-z_]+

              2. You don't want the last uppercase letter or a _. (?=[A-Z_])

              3. You want the above if it's the last in the string. (?=[A-Z_]|$)



            • Lowercase:



              1. The word won't be uppercase or have a _. [^A-Z_]+



          And so you can use:



          (.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))


          You then want to apply the following to these:




          1. Prepend the name with a single _, unless:



            • It's the first word in the name

            • The word already starts with a _


          2. Convert the word to lowercase


          Making:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()


          And so I'd use:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()

          REGEX = r'(.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))'

          def _cc2jl(string):
          return re.subn(REGEX, _jl_match, string)[0]





          share|improve this answer























          • Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
            – Graipher
            Jan 25 at 19:50







          • 2




            Here is a comparison of various regexes that could be used.
            – 200_success
            Jan 25 at 19:52










          • I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
            – Richard Neumann
            Jan 26 at 9:02






          • 1




            @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
            – Peilonrayz
            Jan 26 at 11:33







          • 2




            @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
            – Peilonrayz
            Jan 26 at 12:07

















          up vote
          16
          down vote













          A couple of things:



          1. Why 2 functions?

          2. Now that we established (1), you can get rid of the yields.
            yield is useful when you want to process stuff in chunks and do not want to wait for the entire population to be created first. This is not the case here.

          3. There are some continues that do make sense and some that are redundant. Yours are of the latter type.

          4. Nested if statements without corresponding else clauses can be merged with and if readability is not reduced.

          Putting all that together, we get the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          try:
          if my_str[i-1].islower() or my_str[i+1].islower():
          r += '_'
          except IndexError:
          pass
          r += letter.lower()
          return r

          print(cc2jl('m')) #-> m
          print(cc2jl('AA')) #-> aa
          print(cc2jl('MySQLDatabase')) #-> my_sql_database
          print(cc2jl('GarbageCollection')) #-> garbage_collection
          print(cc2jl('AAAAAAAAAAAAAAAAA')) #-> aaaaaaaaaaaaaaaaa



          Finally, some thoughts on avoiding the try block that bothers me personally.



          We can leverage from the short-circuiting used in Python's logical expressions and (re-)write the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          if my_str[i-1].islower() or (i != len(my_str)-1 and my_str[i+1].islower()):
          r += '_'
          r += letter.lower()
          return r


          Note the i != len(my_str)-1 on the left side of the and. If this returns False the my_str[i+1] is not evaluated at all and thus cannot raise the IndexError.






          share|improve this answer























          • Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
            – Graipher
            Jan 25 at 14:07










          • @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
            – Ev. Kounis
            Jan 25 at 14:11










          • You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
            – Graipher
            Jan 25 at 14:15






          • 2




            @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
            – Ev. Kounis
            Jan 25 at 14:17

















          up vote
          12
          down vote













          First a note on naming: cc2jl is a very cryptic name. Give the public function a clearer name, like to_snake_case or something similar (note that the function does not do anything to a string that already is in snake_case).




          While I quite like the fact that you used a generator approach, which guarantees that you make only on pass over the string, this seems like the perfect place for a regular expression. Some quick googling finds this rather simple function:



          import re

          def convert(name):
          s1 = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
          return re.sub('([a-z0-9])([A-Z])', r'1_2', s1).lower()


          This is definitely shorter, even though that cryptic regex is not very easy to understand. But following your code, while written quite straightforward, is also not easy, especially with the deep nesting.



          It works by first splitting runs of upper-case letters followed by one or more lowercase letters so that the last upper-case letter goes to the lower-case letter run. So "ABCdef" becomes "AB_Cdef".



          Then it separates runs of not-upper-case letters, followed by a single upper-case letter with a "_", so "abcD" becomes "abc_D".



          To get a bit more performance out of this, you should pre-compile the regexes:



          first_cap_re = re.compile('(.)([A-Z][a-z]+)')
          all_cap_re = re.compile('([a-z0-9])([A-Z])')
          def convert(name):
          s1 = first_cap_re.sub(r'1_2', name)
          return all_cap_re.sub(r'1_2', s1).lower()



          Performance wise, all algorithms are similar:



          enter image description here



          Not quite surprising, they all give a linear behavior (as a function of string length).



          My regex version needs to make two passes over the string and is therefore consistently slower. The function written by @Ev.Kounis in his answer outperforms both of ours, but the regex approach by @Peilonrayz is even faster, because it manages to do only one pass, but with a regex.



          Note that all functions are very fast, so as long as you need this less than a few thousand times per second, any is fine.




          The test strings were generated with this code:



          import random
          import string

          strings = [''.join(random.choice(string.ascii_letters) for _ in range(n))
          for n in range(10, 2000, 20)]


          In Python 3 it could have been:



          strings = [''.join(random.choices(string.ascii_letters, k=k))
          for k in range(10, 2000, 20)]





          share|improve this answer























          • I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
            – Ev. Kounis
            Jan 25 at 14:08










          • @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
            – Graipher
            Jan 25 at 14:13











          • Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
            – Richard Neumann
            Jan 26 at 9:02


















          up vote
          1
          down vote













          It can be a bit shorter:



          def cc2jl(s):
          return "".join(["_"+l if i and l.isupper() and not s[i-1:i+2].isupper() else l for i, l in enumerate(s)]).lower()


          Regular Expression Alternative:



          rx = re.compile(r"(?<=.)(((?<![A-Z])[A-Z])|([A-Z](?=[a-z])))")
          def cc2jl(s):
          return rx.sub("_\1", s).lower()





          share|improve this answer























          • My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
            – M. Utku ALTINKAYA
            Jan 26 at 6:14






          • 1




            @Graipher fixed
            – M. Utku ALTINKAYA
            Jan 26 at 8:06






          • 1




            Added your function to the timings
            – Graipher
            Jan 26 at 9:12










          • "(...) compared to regex I think it is easier to read". Nope
            – ÑÒ¯Ï…к
            Jan 26 at 11:24










          • Thanks @Graipher, I've revised that code and added a regex alternative also
            – M. Utku ALTINKAYA
            Jan 26 at 12:11










          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185966%2ffunctions-to-convert-camelcase-strings-to-snake-case%23new-answer', 'question_page');

          );

          Post as a guest






























          4 Answers
          4






          active

          oldest

          votes








          4 Answers
          4






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          12
          down vote



          accepted










          I think regex would be the easiest to do this with. You just need to find each word in the string.



          1. The name starts with an uppercase letter. [A-Z]. However due to the way the rest of the regex works, we can change this to ., so that we match all words, even ones that start with _.


          2. The word will either contain uppercase or lowercase letters, both with other characters - excluding _.




            • Uppercase:



              1. The word won't be lowercase or have a _. [^a-z_]+

              2. You don't want the last uppercase letter or a _. (?=[A-Z_])

              3. You want the above if it's the last in the string. (?=[A-Z_]|$)



            • Lowercase:



              1. The word won't be uppercase or have a _. [^A-Z_]+



          And so you can use:



          (.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))


          You then want to apply the following to these:




          1. Prepend the name with a single _, unless:



            • It's the first word in the name

            • The word already starts with a _


          2. Convert the word to lowercase


          Making:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()


          And so I'd use:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()

          REGEX = r'(.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))'

          def _cc2jl(string):
          return re.subn(REGEX, _jl_match, string)[0]





          share|improve this answer























          • Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
            – Graipher
            Jan 25 at 19:50







          • 2




            Here is a comparison of various regexes that could be used.
            – 200_success
            Jan 25 at 19:52










          • I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
            – Richard Neumann
            Jan 26 at 9:02






          • 1




            @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
            – Peilonrayz
            Jan 26 at 11:33







          • 2




            @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
            – Peilonrayz
            Jan 26 at 12:07














          up vote
          12
          down vote



          accepted










          I think regex would be the easiest to do this with. You just need to find each word in the string.



          1. The name starts with an uppercase letter. [A-Z]. However due to the way the rest of the regex works, we can change this to ., so that we match all words, even ones that start with _.


          2. The word will either contain uppercase or lowercase letters, both with other characters - excluding _.




            • Uppercase:



              1. The word won't be lowercase or have a _. [^a-z_]+

              2. You don't want the last uppercase letter or a _. (?=[A-Z_])

              3. You want the above if it's the last in the string. (?=[A-Z_]|$)



            • Lowercase:



              1. The word won't be uppercase or have a _. [^A-Z_]+



          And so you can use:



          (.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))


          You then want to apply the following to these:




          1. Prepend the name with a single _, unless:



            • It's the first word in the name

            • The word already starts with a _


          2. Convert the word to lowercase


          Making:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()


          And so I'd use:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()

          REGEX = r'(.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))'

          def _cc2jl(string):
          return re.subn(REGEX, _jl_match, string)[0]





          share|improve this answer























          • Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
            – Graipher
            Jan 25 at 19:50







          • 2




            Here is a comparison of various regexes that could be used.
            – 200_success
            Jan 25 at 19:52










          • I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
            – Richard Neumann
            Jan 26 at 9:02






          • 1




            @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
            – Peilonrayz
            Jan 26 at 11:33







          • 2




            @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
            – Peilonrayz
            Jan 26 at 12:07












          up vote
          12
          down vote



          accepted







          up vote
          12
          down vote



          accepted






          I think regex would be the easiest to do this with. You just need to find each word in the string.



          1. The name starts with an uppercase letter. [A-Z]. However due to the way the rest of the regex works, we can change this to ., so that we match all words, even ones that start with _.


          2. The word will either contain uppercase or lowercase letters, both with other characters - excluding _.




            • Uppercase:



              1. The word won't be lowercase or have a _. [^a-z_]+

              2. You don't want the last uppercase letter or a _. (?=[A-Z_])

              3. You want the above if it's the last in the string. (?=[A-Z_]|$)



            • Lowercase:



              1. The word won't be uppercase or have a _. [^A-Z_]+



          And so you can use:



          (.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))


          You then want to apply the following to these:




          1. Prepend the name with a single _, unless:



            • It's the first word in the name

            • The word already starts with a _


          2. Convert the word to lowercase


          Making:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()


          And so I'd use:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()

          REGEX = r'(.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))'

          def _cc2jl(string):
          return re.subn(REGEX, _jl_match, string)[0]





          share|improve this answer















          I think regex would be the easiest to do this with. You just need to find each word in the string.



          1. The name starts with an uppercase letter. [A-Z]. However due to the way the rest of the regex works, we can change this to ., so that we match all words, even ones that start with _.


          2. The word will either contain uppercase or lowercase letters, both with other characters - excluding _.




            • Uppercase:



              1. The word won't be lowercase or have a _. [^a-z_]+

              2. You don't want the last uppercase letter or a _. (?=[A-Z_])

              3. You want the above if it's the last in the string. (?=[A-Z_]|$)



            • Lowercase:



              1. The word won't be uppercase or have a _. [^A-Z_]+



          And so you can use:



          (.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))


          You then want to apply the following to these:




          1. Prepend the name with a single _, unless:



            • It's the first word in the name

            • The word already starts with a _


          2. Convert the word to lowercase


          Making:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()


          And so I'd use:



          def _jl_match(match):
          group = match.group()
          prefix = bool(match.start() and not group.startswith('_'))
          return '_' * prefix + group.lower()

          REGEX = r'(.(?:[^a-z_]+(?=[A-Z_]|$)|[^A-Z_]+))'

          def _cc2jl(string):
          return re.subn(REGEX, _jl_match, string)[0]






          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Jan 25 at 16:24


























          answered Jan 25 at 16:17









          Peilonrayz

          24.4k336102




          24.4k336102











          • Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
            – Graipher
            Jan 25 at 19:50







          • 2




            Here is a comparison of various regexes that could be used.
            – 200_success
            Jan 25 at 19:52










          • I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
            – Richard Neumann
            Jan 26 at 9:02






          • 1




            @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
            – Peilonrayz
            Jan 26 at 11:33







          • 2




            @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
            – Peilonrayz
            Jan 26 at 12:07
















          • Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
            – Graipher
            Jan 25 at 19:50







          • 2




            Here is a comparison of various regexes that could be used.
            – 200_success
            Jan 25 at 19:52










          • I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
            – Richard Neumann
            Jan 26 at 9:02






          • 1




            @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
            – Peilonrayz
            Jan 26 at 11:33







          • 2




            @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
            – Peilonrayz
            Jan 26 at 12:07















          Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
          – Graipher
          Jan 25 at 19:50





          Nice, I knew there should be a way to do it with a single regex, but my regex-fu was too weak to come up with one...I added your function to the timing plot, and it is the fastest (so far). I did modify it so that the regex is compiled first.
          – Graipher
          Jan 25 at 19:50





          2




          2




          Here is a comparison of various regexes that could be used.
          – 200_success
          Jan 25 at 19:52




          Here is a comparison of various regexes that could be used.
          – 200_success
          Jan 25 at 19:52












          I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
          – Richard Neumann
          Jan 26 at 9:02




          I went with your solution because of its simplicity and performance: github.com/HOMEINFO/strflib/commit/…
          – Richard Neumann
          Jan 26 at 9:02




          1




          1




          @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
          – Peilonrayz
          Jan 26 at 11:33





          @RichardNeumann Thank you for that. But it wasn't the BY part you violated - and I'm happy you didn't :) It was the SA part. If SE used CC-BY-SA 4.0 and you relicensed to GPLv3, then there wouldn't be any violation. but they use CC-BY-SA 3.0, ):
          – Peilonrayz
          Jan 26 at 11:33





          2




          2




          @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
          – Peilonrayz
          Jan 26 at 12:07




          @RichardNeumann That's a good point. I wonder if you can do that... IANAL too, so I guess it's time to go to Law.SE, ;P
          – Peilonrayz
          Jan 26 at 12:07












          up vote
          16
          down vote













          A couple of things:



          1. Why 2 functions?

          2. Now that we established (1), you can get rid of the yields.
            yield is useful when you want to process stuff in chunks and do not want to wait for the entire population to be created first. This is not the case here.

          3. There are some continues that do make sense and some that are redundant. Yours are of the latter type.

          4. Nested if statements without corresponding else clauses can be merged with and if readability is not reduced.

          Putting all that together, we get the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          try:
          if my_str[i-1].islower() or my_str[i+1].islower():
          r += '_'
          except IndexError:
          pass
          r += letter.lower()
          return r

          print(cc2jl('m')) #-> m
          print(cc2jl('AA')) #-> aa
          print(cc2jl('MySQLDatabase')) #-> my_sql_database
          print(cc2jl('GarbageCollection')) #-> garbage_collection
          print(cc2jl('AAAAAAAAAAAAAAAAA')) #-> aaaaaaaaaaaaaaaaa



          Finally, some thoughts on avoiding the try block that bothers me personally.



          We can leverage from the short-circuiting used in Python's logical expressions and (re-)write the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          if my_str[i-1].islower() or (i != len(my_str)-1 and my_str[i+1].islower()):
          r += '_'
          r += letter.lower()
          return r


          Note the i != len(my_str)-1 on the left side of the and. If this returns False the my_str[i+1] is not evaluated at all and thus cannot raise the IndexError.






          share|improve this answer























          • Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
            – Graipher
            Jan 25 at 14:07










          • @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
            – Ev. Kounis
            Jan 25 at 14:11










          • You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
            – Graipher
            Jan 25 at 14:15






          • 2




            @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
            – Ev. Kounis
            Jan 25 at 14:17














          up vote
          16
          down vote













          A couple of things:



          1. Why 2 functions?

          2. Now that we established (1), you can get rid of the yields.
            yield is useful when you want to process stuff in chunks and do not want to wait for the entire population to be created first. This is not the case here.

          3. There are some continues that do make sense and some that are redundant. Yours are of the latter type.

          4. Nested if statements without corresponding else clauses can be merged with and if readability is not reduced.

          Putting all that together, we get the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          try:
          if my_str[i-1].islower() or my_str[i+1].islower():
          r += '_'
          except IndexError:
          pass
          r += letter.lower()
          return r

          print(cc2jl('m')) #-> m
          print(cc2jl('AA')) #-> aa
          print(cc2jl('MySQLDatabase')) #-> my_sql_database
          print(cc2jl('GarbageCollection')) #-> garbage_collection
          print(cc2jl('AAAAAAAAAAAAAAAAA')) #-> aaaaaaaaaaaaaaaaa



          Finally, some thoughts on avoiding the try block that bothers me personally.



          We can leverage from the short-circuiting used in Python's logical expressions and (re-)write the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          if my_str[i-1].islower() or (i != len(my_str)-1 and my_str[i+1].islower()):
          r += '_'
          r += letter.lower()
          return r


          Note the i != len(my_str)-1 on the left side of the and. If this returns False the my_str[i+1] is not evaluated at all and thus cannot raise the IndexError.






          share|improve this answer























          • Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
            – Graipher
            Jan 25 at 14:07










          • @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
            – Ev. Kounis
            Jan 25 at 14:11










          • You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
            – Graipher
            Jan 25 at 14:15






          • 2




            @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
            – Ev. Kounis
            Jan 25 at 14:17












          up vote
          16
          down vote










          up vote
          16
          down vote









          A couple of things:



          1. Why 2 functions?

          2. Now that we established (1), you can get rid of the yields.
            yield is useful when you want to process stuff in chunks and do not want to wait for the entire population to be created first. This is not the case here.

          3. There are some continues that do make sense and some that are redundant. Yours are of the latter type.

          4. Nested if statements without corresponding else clauses can be merged with and if readability is not reduced.

          Putting all that together, we get the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          try:
          if my_str[i-1].islower() or my_str[i+1].islower():
          r += '_'
          except IndexError:
          pass
          r += letter.lower()
          return r

          print(cc2jl('m')) #-> m
          print(cc2jl('AA')) #-> aa
          print(cc2jl('MySQLDatabase')) #-> my_sql_database
          print(cc2jl('GarbageCollection')) #-> garbage_collection
          print(cc2jl('AAAAAAAAAAAAAAAAA')) #-> aaaaaaaaaaaaaaaaa



          Finally, some thoughts on avoiding the try block that bothers me personally.



          We can leverage from the short-circuiting used in Python's logical expressions and (re-)write the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          if my_str[i-1].islower() or (i != len(my_str)-1 and my_str[i+1].islower()):
          r += '_'
          r += letter.lower()
          return r


          Note the i != len(my_str)-1 on the left side of the and. If this returns False the my_str[i+1] is not evaluated at all and thus cannot raise the IndexError.






          share|improve this answer















          A couple of things:



          1. Why 2 functions?

          2. Now that we established (1), you can get rid of the yields.
            yield is useful when you want to process stuff in chunks and do not want to wait for the entire population to be created first. This is not the case here.

          3. There are some continues that do make sense and some that are redundant. Yours are of the latter type.

          4. Nested if statements without corresponding else clauses can be merged with and if readability is not reduced.

          Putting all that together, we get the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          try:
          if my_str[i-1].islower() or my_str[i+1].islower():
          r += '_'
          except IndexError:
          pass
          r += letter.lower()
          return r

          print(cc2jl('m')) #-> m
          print(cc2jl('AA')) #-> aa
          print(cc2jl('MySQLDatabase')) #-> my_sql_database
          print(cc2jl('GarbageCollection')) #-> garbage_collection
          print(cc2jl('AAAAAAAAAAAAAAAAA')) #-> aaaaaaaaaaaaaaaaa



          Finally, some thoughts on avoiding the try block that bothers me personally.



          We can leverage from the short-circuiting used in Python's logical expressions and (re-)write the following:



          def cc2jl(my_str):
          """Camel case to joint-lower"""

          r = my_str[0].lower()
          for i, letter in enumerate(my_str[1:], 1):
          if letter.isupper():
          if my_str[i-1].islower() or (i != len(my_str)-1 and my_str[i+1].islower()):
          r += '_'
          r += letter.lower()
          return r


          Note the i != len(my_str)-1 on the left side of the and. If this returns False the my_str[i+1] is not evaluated at all and thus cannot raise the IndexError.







          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Jan 25 at 14:23


























          answered Jan 25 at 14:04









          Ev. Kounis

          620313




          620313











          • Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
            – Graipher
            Jan 25 at 14:07










          • @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
            – Ev. Kounis
            Jan 25 at 14:11










          • You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
            – Graipher
            Jan 25 at 14:15






          • 2




            @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
            – Ev. Kounis
            Jan 25 at 14:17
















          • Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
            – Graipher
            Jan 25 at 14:07










          • @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
            – Ev. Kounis
            Jan 25 at 14:11










          • You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
            – Graipher
            Jan 25 at 14:15






          • 2




            @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
            – Ev. Kounis
            Jan 25 at 14:17















          Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
          – Graipher
          Jan 25 at 14:07




          Nice job cleaning up OP's code! I added your function to the timing plot in my answer, it is faster than my regex function and the OP's function (probably due to getting rid of the overhead of the generator).
          – Graipher
          Jan 25 at 14:07












          @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
          – Ev. Kounis
          Jan 25 at 14:11




          @Graipher The thing that I do not like the most about my approach is the try-except block but couldn't think of something more elegant.
          – Ev. Kounis
          Jan 25 at 14:11












          You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
          – Graipher
          Jan 25 at 14:15




          You could replace it with if not 0 < i < len(my_str) - 1, but I'm not sure that is any better, because it needs to check it every iteration, instead of only failing at the beginning or end of the string.
          – Graipher
          Jan 25 at 14:15




          2




          2




          @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
          – Ev. Kounis
          Jan 25 at 14:17




          @Graipher the 0 I do not need since I am iterating over my_str[1:].I tried going over my_str[1:-1] instead and handling the last letter in the return line but that wasn't any better either. Anyway, thanks!
          – Ev. Kounis
          Jan 25 at 14:17










          up vote
          12
          down vote













          First a note on naming: cc2jl is a very cryptic name. Give the public function a clearer name, like to_snake_case or something similar (note that the function does not do anything to a string that already is in snake_case).




          While I quite like the fact that you used a generator approach, which guarantees that you make only on pass over the string, this seems like the perfect place for a regular expression. Some quick googling finds this rather simple function:



          import re

          def convert(name):
          s1 = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
          return re.sub('([a-z0-9])([A-Z])', r'1_2', s1).lower()


          This is definitely shorter, even though that cryptic regex is not very easy to understand. But following your code, while written quite straightforward, is also not easy, especially with the deep nesting.



          It works by first splitting runs of upper-case letters followed by one or more lowercase letters so that the last upper-case letter goes to the lower-case letter run. So "ABCdef" becomes "AB_Cdef".



          Then it separates runs of not-upper-case letters, followed by a single upper-case letter with a "_", so "abcD" becomes "abc_D".



          To get a bit more performance out of this, you should pre-compile the regexes:



          first_cap_re = re.compile('(.)([A-Z][a-z]+)')
          all_cap_re = re.compile('([a-z0-9])([A-Z])')
          def convert(name):
          s1 = first_cap_re.sub(r'1_2', name)
          return all_cap_re.sub(r'1_2', s1).lower()



          Performance wise, all algorithms are similar:



          enter image description here



          Not quite surprising, they all give a linear behavior (as a function of string length).



          My regex version needs to make two passes over the string and is therefore consistently slower. The function written by @Ev.Kounis in his answer outperforms both of ours, but the regex approach by @Peilonrayz is even faster, because it manages to do only one pass, but with a regex.



          Note that all functions are very fast, so as long as you need this less than a few thousand times per second, any is fine.




          The test strings were generated with this code:



          import random
          import string

          strings = [''.join(random.choice(string.ascii_letters) for _ in range(n))
          for n in range(10, 2000, 20)]


          In Python 3 it could have been:



          strings = [''.join(random.choices(string.ascii_letters, k=k))
          for k in range(10, 2000, 20)]





          share|improve this answer























          • I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
            – Ev. Kounis
            Jan 25 at 14:08










          • @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
            – Graipher
            Jan 25 at 14:13











          • Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
            – Richard Neumann
            Jan 26 at 9:02















          up vote
          12
          down vote













          First a note on naming: cc2jl is a very cryptic name. Give the public function a clearer name, like to_snake_case or something similar (note that the function does not do anything to a string that already is in snake_case).




          While I quite like the fact that you used a generator approach, which guarantees that you make only on pass over the string, this seems like the perfect place for a regular expression. Some quick googling finds this rather simple function:



          import re

          def convert(name):
          s1 = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
          return re.sub('([a-z0-9])([A-Z])', r'1_2', s1).lower()


          This is definitely shorter, even though that cryptic regex is not very easy to understand. But following your code, while written quite straightforward, is also not easy, especially with the deep nesting.



          It works by first splitting runs of upper-case letters followed by one or more lowercase letters so that the last upper-case letter goes to the lower-case letter run. So "ABCdef" becomes "AB_Cdef".



          Then it separates runs of not-upper-case letters, followed by a single upper-case letter with a "_", so "abcD" becomes "abc_D".



          To get a bit more performance out of this, you should pre-compile the regexes:



          first_cap_re = re.compile('(.)([A-Z][a-z]+)')
          all_cap_re = re.compile('([a-z0-9])([A-Z])')
          def convert(name):
          s1 = first_cap_re.sub(r'1_2', name)
          return all_cap_re.sub(r'1_2', s1).lower()



          Performance wise, all algorithms are similar:



          enter image description here



          Not quite surprising, they all give a linear behavior (as a function of string length).



          My regex version needs to make two passes over the string and is therefore consistently slower. The function written by @Ev.Kounis in his answer outperforms both of ours, but the regex approach by @Peilonrayz is even faster, because it manages to do only one pass, but with a regex.



          Note that all functions are very fast, so as long as you need this less than a few thousand times per second, any is fine.




          The test strings were generated with this code:



          import random
          import string

          strings = [''.join(random.choice(string.ascii_letters) for _ in range(n))
          for n in range(10, 2000, 20)]


          In Python 3 it could have been:



          strings = [''.join(random.choices(string.ascii_letters, k=k))
          for k in range(10, 2000, 20)]





          share|improve this answer























          • I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
            – Ev. Kounis
            Jan 25 at 14:08










          • @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
            – Graipher
            Jan 25 at 14:13











          • Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
            – Richard Neumann
            Jan 26 at 9:02













          up vote
          12
          down vote










          up vote
          12
          down vote









          First a note on naming: cc2jl is a very cryptic name. Give the public function a clearer name, like to_snake_case or something similar (note that the function does not do anything to a string that already is in snake_case).




          While I quite like the fact that you used a generator approach, which guarantees that you make only on pass over the string, this seems like the perfect place for a regular expression. Some quick googling finds this rather simple function:



          import re

          def convert(name):
          s1 = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
          return re.sub('([a-z0-9])([A-Z])', r'1_2', s1).lower()


          This is definitely shorter, even though that cryptic regex is not very easy to understand. But following your code, while written quite straightforward, is also not easy, especially with the deep nesting.



          It works by first splitting runs of upper-case letters followed by one or more lowercase letters so that the last upper-case letter goes to the lower-case letter run. So "ABCdef" becomes "AB_Cdef".



          Then it separates runs of not-upper-case letters, followed by a single upper-case letter with a "_", so "abcD" becomes "abc_D".



          To get a bit more performance out of this, you should pre-compile the regexes:



          first_cap_re = re.compile('(.)([A-Z][a-z]+)')
          all_cap_re = re.compile('([a-z0-9])([A-Z])')
          def convert(name):
          s1 = first_cap_re.sub(r'1_2', name)
          return all_cap_re.sub(r'1_2', s1).lower()



          Performance wise, all algorithms are similar:



          enter image description here



          Not quite surprising, they all give a linear behavior (as a function of string length).



          My regex version needs to make two passes over the string and is therefore consistently slower. The function written by @Ev.Kounis in his answer outperforms both of ours, but the regex approach by @Peilonrayz is even faster, because it manages to do only one pass, but with a regex.



          Note that all functions are very fast, so as long as you need this less than a few thousand times per second, any is fine.




          The test strings were generated with this code:



          import random
          import string

          strings = [''.join(random.choice(string.ascii_letters) for _ in range(n))
          for n in range(10, 2000, 20)]


          In Python 3 it could have been:



          strings = [''.join(random.choices(string.ascii_letters, k=k))
          for k in range(10, 2000, 20)]





          share|improve this answer















          First a note on naming: cc2jl is a very cryptic name. Give the public function a clearer name, like to_snake_case or something similar (note that the function does not do anything to a string that already is in snake_case).




          While I quite like the fact that you used a generator approach, which guarantees that you make only on pass over the string, this seems like the perfect place for a regular expression. Some quick googling finds this rather simple function:



          import re

          def convert(name):
          s1 = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
          return re.sub('([a-z0-9])([A-Z])', r'1_2', s1).lower()


          This is definitely shorter, even though that cryptic regex is not very easy to understand. But following your code, while written quite straightforward, is also not easy, especially with the deep nesting.



          It works by first splitting runs of upper-case letters followed by one or more lowercase letters so that the last upper-case letter goes to the lower-case letter run. So "ABCdef" becomes "AB_Cdef".



          Then it separates runs of not-upper-case letters, followed by a single upper-case letter with a "_", so "abcD" becomes "abc_D".



          To get a bit more performance out of this, you should pre-compile the regexes:



          first_cap_re = re.compile('(.)([A-Z][a-z]+)')
          all_cap_re = re.compile('([a-z0-9])([A-Z])')
          def convert(name):
          s1 = first_cap_re.sub(r'1_2', name)
          return all_cap_re.sub(r'1_2', s1).lower()



          Performance wise, all algorithms are similar:



          enter image description here



          Not quite surprising, they all give a linear behavior (as a function of string length).



          My regex version needs to make two passes over the string and is therefore consistently slower. The function written by @Ev.Kounis in his answer outperforms both of ours, but the regex approach by @Peilonrayz is even faster, because it manages to do only one pass, but with a regex.



          Note that all functions are very fast, so as long as you need this less than a few thousand times per second, any is fine.




          The test strings were generated with this code:



          import random
          import string

          strings = [''.join(random.choice(string.ascii_letters) for _ in range(n))
          for n in range(10, 2000, 20)]


          In Python 3 it could have been:



          strings = [''.join(random.choices(string.ascii_letters, k=k))
          for k in range(10, 2000, 20)]






          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Jan 26 at 9:11


























          answered Jan 25 at 14:04









          Graipher

          20.5k43081




          20.5k43081











          • I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
            – Ev. Kounis
            Jan 25 at 14:08










          • @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
            – Graipher
            Jan 25 at 14:13











          • Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
            – Richard Neumann
            Jan 26 at 9:02

















          • I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
            – Ev. Kounis
            Jan 25 at 14:08










          • @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
            – Graipher
            Jan 25 at 14:13











          • Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
            – Richard Neumann
            Jan 26 at 9:02
















          I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
          – Ev. Kounis
          Jan 25 at 14:08




          I have the feeling that I would love this answer if I understood regex. My mind is too small for it though. +1
          – Ev. Kounis
          Jan 25 at 14:08












          @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
          – Graipher
          Jan 25 at 14:13





          @Ev.Kounis I totally agree (since I did not come up with it myself, see the link for the source). Here is at least some explanation: The first substitution takes care of continuous runs of caps (so it splits of the last capital letter from a run of capital letters), while the last substitution splits runs of lower case letters followed by an upper case letter.
          – Graipher
          Jan 25 at 14:13













          Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
          – Richard Neumann
          Jan 26 at 9:02





          Thank you for this insightful analysis and performance measurement. Because of the latter, I accepted @peilonrayz solution.
          – Richard Neumann
          Jan 26 at 9:02











          up vote
          1
          down vote













          It can be a bit shorter:



          def cc2jl(s):
          return "".join(["_"+l if i and l.isupper() and not s[i-1:i+2].isupper() else l for i, l in enumerate(s)]).lower()


          Regular Expression Alternative:



          rx = re.compile(r"(?<=.)(((?<![A-Z])[A-Z])|([A-Z](?=[a-z])))")
          def cc2jl(s):
          return rx.sub("_\1", s).lower()





          share|improve this answer























          • My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
            – M. Utku ALTINKAYA
            Jan 26 at 6:14






          • 1




            @Graipher fixed
            – M. Utku ALTINKAYA
            Jan 26 at 8:06






          • 1




            Added your function to the timings
            – Graipher
            Jan 26 at 9:12










          • "(...) compared to regex I think it is easier to read". Nope
            – ÑÒ¯Ï…к
            Jan 26 at 11:24










          • Thanks @Graipher, I've revised that code and added a regex alternative also
            – M. Utku ALTINKAYA
            Jan 26 at 12:11














          up vote
          1
          down vote













          It can be a bit shorter:



          def cc2jl(s):
          return "".join(["_"+l if i and l.isupper() and not s[i-1:i+2].isupper() else l for i, l in enumerate(s)]).lower()


          Regular Expression Alternative:



          rx = re.compile(r"(?<=.)(((?<![A-Z])[A-Z])|([A-Z](?=[a-z])))")
          def cc2jl(s):
          return rx.sub("_\1", s).lower()





          share|improve this answer























          • My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
            – M. Utku ALTINKAYA
            Jan 26 at 6:14






          • 1




            @Graipher fixed
            – M. Utku ALTINKAYA
            Jan 26 at 8:06






          • 1




            Added your function to the timings
            – Graipher
            Jan 26 at 9:12










          • "(...) compared to regex I think it is easier to read". Nope
            – ÑÒ¯Ï…к
            Jan 26 at 11:24










          • Thanks @Graipher, I've revised that code and added a regex alternative also
            – M. Utku ALTINKAYA
            Jan 26 at 12:11












          up vote
          1
          down vote










          up vote
          1
          down vote









          It can be a bit shorter:



          def cc2jl(s):
          return "".join(["_"+l if i and l.isupper() and not s[i-1:i+2].isupper() else l for i, l in enumerate(s)]).lower()


          Regular Expression Alternative:



          rx = re.compile(r"(?<=.)(((?<![A-Z])[A-Z])|([A-Z](?=[a-z])))")
          def cc2jl(s):
          return rx.sub("_\1", s).lower()





          share|improve this answer















          It can be a bit shorter:



          def cc2jl(s):
          return "".join(["_"+l if i and l.isupper() and not s[i-1:i+2].isupper() else l for i, l in enumerate(s)]).lower()


          Regular Expression Alternative:



          rx = re.compile(r"(?<=.)(((?<![A-Z])[A-Z])|([A-Z](?=[a-z])))")
          def cc2jl(s):
          return rx.sub("_\1", s).lower()






          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Jan 26 at 12:10


























          answered Jan 25 at 22:38









          M. Utku ALTINKAYA

          1193




          1193











          • My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
            – M. Utku ALTINKAYA
            Jan 26 at 6:14






          • 1




            @Graipher fixed
            – M. Utku ALTINKAYA
            Jan 26 at 8:06






          • 1




            Added your function to the timings
            – Graipher
            Jan 26 at 9:12










          • "(...) compared to regex I think it is easier to read". Nope
            – ÑÒ¯Ï…к
            Jan 26 at 11:24










          • Thanks @Graipher, I've revised that code and added a regex alternative also
            – M. Utku ALTINKAYA
            Jan 26 at 12:11
















          • My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
            – M. Utku ALTINKAYA
            Jan 26 at 6:14






          • 1




            @Graipher fixed
            – M. Utku ALTINKAYA
            Jan 26 at 8:06






          • 1




            Added your function to the timings
            – Graipher
            Jan 26 at 9:12










          • "(...) compared to regex I think it is easier to read". Nope
            – ÑÒ¯Ï…к
            Jan 26 at 11:24










          • Thanks @Graipher, I've revised that code and added a regex alternative also
            – M. Utku ALTINKAYA
            Jan 26 at 12:11















          My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
          – M. Utku ALTINKAYA
          Jan 26 at 6:14




          My observation is the fact that the code is longer than it is necessary, my answer is the alternate solution. Readability is subjective, compared to regex I think it is easier to read.
          – M. Utku ALTINKAYA
          Jan 26 at 6:14




          1




          1




          @Graipher fixed
          – M. Utku ALTINKAYA
          Jan 26 at 8:06




          @Graipher fixed
          – M. Utku ALTINKAYA
          Jan 26 at 8:06




          1




          1




          Added your function to the timings
          – Graipher
          Jan 26 at 9:12




          Added your function to the timings
          – Graipher
          Jan 26 at 9:12












          "(...) compared to regex I think it is easier to read". Nope
          – ÑÒ¯Ï…к
          Jan 26 at 11:24




          "(...) compared to regex I think it is easier to read". Nope
          – ÑÒ¯Ï…к
          Jan 26 at 11:24












          Thanks @Graipher, I've revised that code and added a regex alternative also
          – M. Utku ALTINKAYA
          Jan 26 at 12:11




          Thanks @Graipher, I've revised that code and added a regex alternative also
          – M. Utku ALTINKAYA
          Jan 26 at 12:11












           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185966%2ffunctions-to-convert-camelcase-strings-to-snake-case%23new-answer', 'question_page');

          );

          Post as a guest













































































          Popular posts from this blog

          Chat program with C++ and SFML

          Function to Return a JSON Like Objects Using VBA Collections and Arrays

          Will my employers contract hold up in court?