Filter out non-alphabetic characters from a list of words
For coding practice / interview exercises, I'd like to know if there's an optimization I can make to the following code, where I "clean" a given word by removing punctuation and any other characters outside "a" to "z".
There are some great answers here on removing punctuation from a string, so my question today is not about the best way to do that, but about whether I can optimize the 3 lines of code in the word_count_engine function below. Can I do this in 1 or 2 lines, or make the code more efficient so it doesn't loop over the list twice (i.e. with 2 list comprehensions)?
def clean(word):
    returnword = ""
    for letter in word.lower():
        if letter >= 'a' and letter <= 'z':
            # not out of bounds
            returnword += letter
    return returnword

def word_count_engine(document):
    words = document.split() # if there are extra spaces, split() still filters empty words out FYI
    words = [clean(word) for word in words] # a word like "$33!" will result in an empty string though
    words = [word for word in words if word] # so filter out empty strings and get the final list of clean words

document = "Practice makes perfect. you'll only get Perfect by practice. just practice! $544 test"
python strings python-3.x interview-questions
edited Apr 4 at 20:55
200_success
asked Apr 4 at 20:42
rishijd
1 Answer (accepted)
Since Python strings are immutable, appending one character at a time using += is inefficient. You end up allocating a new string, copying all of the old string, then writing one character.
Instead, clean() should be written like this:
def clean(word):
    return ''.join(letter for letter in word.lower() if 'a' <= letter <= 'z')
Note that Python supports double-ended inequalities.
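As a rough sketch of the comparison, the snippet below checks that the loop-based and join-based versions agree, then times both on a long input. The names clean_loop, clean_join, samples, and long_word are made up for this illustration. The timings are not guaranteed to differ much: CPython in particular can sometimes extend a string in place when += is used, so treat the numbers as indicative only.

```python
import timeit

def clean_loop(word):
    # Original approach: grow the result with += one character at a time.
    result = ""
    for letter in word.lower():
        if letter >= 'a' and letter <= 'z':
            result += letter
    return result

def clean_join(word):
    # Suggested approach: filter with a chained comparison, build the string once.
    return ''.join(letter for letter in word.lower() if 'a' <= letter <= 'z')

# Both versions should agree on every input, including empty results.
samples = ["Practice", "perfect.", "you'll", "$544", ""]
for sample in samples:
    assert clean_loop(sample) == clean_join(sample)

# Illustrative timing on an artificially long "word" (10,000 characters).
long_word = "a!" * 5000
print("loop:", timeit.timeit(lambda: clean_loop(long_word), number=20))
print("join:", timeit.timeit(lambda: clean_join(long_word), number=20))
```

The chained form 'a' <= letter <= 'z' evaluates letter only once and means the same as letter >= 'a' and letter <= 'z'.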
The name of your word_count_engine function poorly describes what it does. In fact, the function doesn't print or return anything, so it's all dead code. If I had to rewrite it, though, I'd say:
words = [word for word in map(clean, document.split()) if word]
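Put together as a function that actually returns its result, this might look like the sketch below. The name clean_words is hypothetical and chosen only for this example; the rest follows the code above.

```python
def clean(word):
    # Keep only the characters 'a' through 'z' after lowercasing.
    return ''.join(letter for letter in word.lower() if 'a' <= letter <= 'z')

def clean_words(document):
    # map() cleans each word lazily; the comprehension drops empty results,
    # so the list of words is traversed only once.
    return [word for word in map(clean, document.split()) if word]

document = "Practice makes perfect. you'll only get Perfect by practice. just practice! $544 test"
print(clean_words(document))
# ['practice', 'makes', 'perfect', 'youll', 'only', 'get', 'perfect', 'by', 'practice', 'just', 'practice', 'test']
```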
Also consider replacing all of this code with a simple regular expression substitution.
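One possible shape for the regular expression approach is sketched here. It is an assumption about what such a substitution could look like, not code from the question; the names NON_LETTERS and clean_words_regex are invented for the example. It deletes everything that is neither a lowercase letter nor whitespace, then lets split() discard the gaps left behind by words such as "$544". For typical text it should produce the same list as the comprehension above.

```python
import re

# Matches any character that is neither 'a'-'z' nor whitespace.
NON_LETTERS = re.compile(r'[^a-z\s]')

def clean_words_regex(document):
    # Lowercase first, strip unwanted characters in one pass, then split on
    # whitespace; split() with no arguments never yields empty strings.
    return NON_LETTERS.sub('', document.lower()).split()

document = "Practice makes perfect. you'll only get Perfect by practice. just practice! $544 test"
print(clean_words_regex(document))
# ['practice', 'makes', 'perfect', 'youll', 'only', 'get', 'perfect', 'by', 'practice', 'just', 'practice', 'test']
```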
Awesome, thanks! I learn more than I expect through Stack Exchange thanks to people like you! Re: name of function - sorry, it's because the function is actually for something more detailed than I have described, and the above lines are just the first few lines of the function. I should have made that clear in the question/renamed it. I'll practice with regex functions next.
– rishijd
Apr 5 at 0:39
answered Apr 4 at 21:03
200_success