Filter out non-alphabetic characters from a list of words

For coding practice / interview exercises, I'd like to know if there's an optimization I can make to the following code, where I "clean" a given word by removing punctuation and any other characters that are not within "a" to "z".



There are already some great answers here on removing punctuation from a string, so my question is not about the best way to do that. Instead, I'd like to know whether I can optimize the 3 lines of code in the word_count_engine function below. Can I do this in 1 or 2 lines, or make the code more efficient so that it doesn't loop over the list twice (i.e. with 2 list comprehensions)?



def clean(word):
    returnword = ""
    for letter in word.lower():
        if letter >= 'a' and letter <= 'z':
            # not out of bounds
            returnword += letter
    return returnword


def word_count_engine(document):
    words = document.split()  # if there are extra spaces, split() still filters empty words out FYI
    words = [clean(word) for word in words]  # a word like "$33!" will result in an empty string though
    words = [word for word in words if word]  # so filter out empty strings and get the final list of clean words


document = "Practice makes perfect. you'll only get Perfect by practice. just practice! $544 test"

edited Apr 4 at 20:55 by 200_success
asked Apr 4 at 20:42 by rishijd

1 Answer

accepted

Since Python strings are immutable, appending one character at a time using += is inefficient. In the worst case, each append allocates a new string, copies all of the old string, then writes one character, so building a long word this way can take quadratic time.

Instead, clean() should be written like this:

def clean(word):
    return ''.join(letter for letter in word.lower() if 'a' <= letter <= 'z')

Note that Python supports double-ended inequalities (chained comparisons).
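
As a quick illustration of the chained comparison, the sketch below checks that `'a' <= letter <= 'z'` gives the same result as the explicit `and` form from the question for a handful of sample characters. The helper names `is_lower_ascii_chained` and `is_lower_ascii_and` are made up for this example and do not appear in the original post.

```python
def is_lower_ascii_chained(letter):
    # Chained comparison: behaves like ('a' <= letter) and (letter <= 'z'),
    # except that `letter` is evaluated only once.
    return 'a' <= letter <= 'z'


def is_lower_ascii_and(letter):
    # The explicit two-comparison form used in the question's clean().
    return letter >= 'a' and letter <= 'z'


# A few sample characters: letters, an uppercase letter, a digit,
# punctuation, and a non-ASCII letter.
samples = ['a', 'm', 'z', 'A', '5', '$', "'", 'é']
results = [(ch, is_lower_ascii_chained(ch)) for ch in samples]
print(results)
```

Both helpers should agree on every input; the chained form is simply shorter to read.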



The name of your word_count_engine function poorly describes what it does. In fact, the function doesn't print or return anything, so it's all dead code. If I had to rewrite it, though, I'd say:

words = [word for word in map(clean, document.split()) if word]
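
To make the one-liner concrete, here is a minimal sketch that wraps it in a function that actually returns its result and runs it on the sample document from the question. The function name `clean_words` is an assumption for illustration, not a name from the original post.

```python
def clean(word):
    # Keep only the characters 'a'..'z' after lowercasing.
    return ''.join(letter for letter in word.lower() if 'a' <= letter <= 'z')


def clean_words(document):
    # In Python 3, map() is lazy, so each word is cleaned and then
    # filtered in a single pass, without an intermediate list.
    return [word for word in map(clean, document.split()) if word]


document = ("Practice makes perfect. you'll only get Perfect by practice. "
            "just practice! $544 test")
print(clean_words(document))
# ['practice', 'makes', 'perfect', 'youll', 'only', 'get', 'perfect',
#  'by', 'practice', 'just', 'practice', 'test']
```

Note that "$544" cleans to an empty string and is dropped by the `if word` filter, while "you'll" becomes "youll".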


Also consider replacing all of this code with a simple regular expression substitution (for example, with re.sub).
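
One possible reading of the regular expression suggestion is sketched below. It uses `re.sub` semantics from the standard library (via a compiled pattern) to delete everything that is neither a lowercase ASCII letter nor whitespace, and then splits the result. The function name `clean_words_regex` and the exact pattern are assumptions; the answer does not spell out which substitution it has in mind.

```python
import re

# Matches any character that is neither a-z nor whitespace.
NON_LETTERS = re.compile(r'[^a-z\s]')


def clean_words_regex(document):
    # Lowercase first so uppercase letters survive the filter, delete the
    # unwanted characters, then split on whitespace. split() with no
    # arguments also discards the empty pieces left by words like "$544".
    return NON_LETTERS.sub('', document.lower()).split()


document = ("Practice makes perfect. you'll only get Perfect by practice. "
            "just practice! $544 test")
print(clean_words_regex(document))
```

For the sample document this should produce the same list of words as the `map(clean, ...)` version, with the whole document processed in one substitution rather than word by word.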






answered Apr 4 at 21:03 by 200_success

• Awesome, thanks! I learn more than I expect through Stack Exchange thanks to people like you! Re: name of function - sorry, it's because the function is actually for something more detailed than I have described, and the above lines are just the first few lines of the function. I should have made that clear in the question/renamed it. I'll practice with regex functions next.
  – rishijd, Apr 5 at 0:39