A word replacer that uses an API to check for different words to make up a new string

This is one of the first programs I have finished to date. I am not yet very familiar with Python, so please bear with me.
I'd like to know from you:
- Can my code be called 'pythonic'?
- I've tried to get rid of calling arrays via index variables as much as possible
- I've also tried to use Python specific functions rather than to build new, redundant ones
- Is my exception logic too confusing? - If so, how to improve on that?
- Is there any way to make my code more efficient?
- Is my documenting style helpful?
word_replacer.py
import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path
def sanitize_for_url(word):
    """
    Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
    substituted by an empty set
    Args:
        word (str): Word to sanitize
    Returns:
        str: Sanitized string
    """
    return re.sub(r'[^a-zA-Z\s:]', '', word)
def remove_escapes(word):
    """
    Removes escape backslashes that are created by various security mechanisms
    Args:
        word (str): Word to sanitize
    Returns:
        Sanitized string
    """
    return re.sub(r'\\', '', word)
def fetch_words(url):
    """
    Retrieving a json result set from the API module
    An API object is instantiated and a json result set is returned by calling
    the instance specific API.object.getr() function
    Args:
        url (str): URL string to instantiate the API object
    Returns:
        dict: JSON data as python dictionary
    """
    api = API.API(url, False, '')
    return api.getr()
def find_max_len(text):
    """
    A linear search of the maximum length of a particular string
    Every string in the array is looked up by its length and consequently compared
    The string with the biggest length is then returned
    Args:
        text (arr[str]): array of strings that are compared
    Returns:
        str: Word with the biggest length
    """
    max_length = ''
    for i in text:
        if len(i) > len(max_length):
            max_length = i
    return max_length
def find_new_word(words, word_type):
    """
    Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
    and returned
    Args:
        words (dict): A json result set as dict
        word_type (str): The specific word type - this is actually needed as the key in the json result set dict
    Raises:
        API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
            non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
    Returns:
        str: New word
    """
    word_categories = ["sim", "syn"]
    word_list = words.get(word_type, "")
    for tag in (x for x in word_categories if x in word_list):
        new_word = find_max_len(word_list)
        return new_word
    raise API.requests.exceptions.HTTPError
def run(text):
    """
    Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
    of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
    First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
    calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
    is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
    and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
    and the unchanged word is added to the result array.
    If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
    and not spamming the server for the time being.
    Args:
        baseurl (str): URL to instantiate the API object
        text (str): String to replace the words from
    Returns:
        Result string if no ValueError has been found, error message if otherwise
    """
    baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886/{}/json"
    if len(text) <= 500:
        try:
            compare = pos_tag(text.split())
            result = []
            for word, tag in compare:
                if check_standard_word(tag):
                    result.append(word)
                else:
                    url_word = sanitize_for_url(word)
                    if not url_word: continue
                    url = baseurl.format(url_word)
                    try:
                        new_word = find_new_word(fetch_words(url), determine_word_type(tag))
                        match = re.match('[.,-?!()]', word[-1])
                        if match:
                            result.append(new_word + match.group())  # only copies over the last character plus the new word
                        else:
                            result.append(new_word)
                    except API.requests.exceptions.HTTPError:
                        result.append(word)  # old, unchanged word
                        continue
            return remove_escapes(' '.join(result))
        except ValueError:
            Path("/var/www/.inactive").touch()
            return "Try again later. API processing limit reached."
    else: return "The text you are typing is too long to process. Sorry."
def check_standard_word(tag):
    """
    Checks if the values from the compare tuple are found in the exclude array
    Args:
        tag (str): Tag from nltk.pos_tag(arr[str]) function
    Returns:
        bool: If found in the array return True, False if otherwise
    """
    exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]
    if tag in exclude: return True
    else: return False
def omitted_words(words):
    """
    Checks if new selected word is a composition of multiple words which might include
    nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
    actually has spaces
    Args:
        words(str): Sequence of words with spaces
    Returns:
        str: The word either unchanged or with the substitution of the grammatical words
    """
    if re.match(r'\w+\s', words):
        compare = pos_tag(splice_words(clean_word(words)))
        for word, tag in compare:
            if check_standard(tag):
                print word
                words = words.replace(word, '')
    return words
def determine_word_type(tag):
    """
    Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
    Each word in the array is marked with a special tag which can be used to find the correct type of a word.
    A selection is given in the arrays.
    Args:
        compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
    Returns:
        str: Word type as a string
    """
    noun = ["NN", "NNS", "NNPS", "FW"]
    adjective = ["JJ", "JJR", "JJS"]
    verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
    adverb = ["RB", "RBR"]
    if tag in noun: return "noun"
    elif tag in adjective: return "adjective"
    elif tag in verb: return "verb"
    elif tag in adverb: return "adverb"
    else: return "noun"

inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
    print "Try again later. API processing limit reached."
    sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])
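Regarding the question about built-in functions: the linear search in find_max_len can probably be expressed with the built-in max and key=len. The following is only a sketch with made-up sample words (not real API output). The name longest_word is hypothetical, and it assumes an empty list should still give an empty string, as the loop version does.

```python
def find_max_len(text):
    """Linear search from word_replacer.py, repeated here for comparison."""
    max_length = ''
    for i in text:
        if len(i) > len(max_length):
            max_length = i
    return max_length


def longest_word(words):
    """Hypothetical alternative: let the built-in max() do the search.

    max() returns the first item with the greatest key, which matches the
    strict '>' comparison in the loop above. The guard keeps the result
    for an empty list ('') identical.
    """
    return max(words, key=len) if words else ''


# Made-up sample data, not real API output.
sample = ["big", "enormous", "large", "colossal"]
print(find_max_len(sample))
print(longest_word(sample))
```

Both calls print "enormous" here, because on a tie in length the first candidate wins in either version.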
API.py
import requests
import argparse
"""
This module is a library for a typical API application
There are different variables to set
"""
class API(object):
    __xrequest = ''
    __api_key = ''
    params = {}
    def __init__(self, url, xrequest, api_key, **params):
        """
        Init function of the API class
        Args:
            url (str): URL for the API to call
            xrequest (bool): Switch if x-request is needed
            api_key (str): API-key as a string
            **params (dict): More parameters for the class to parse in the URL
        Returns:
            API.object: Instance of the API class
        """
        parser = argparse.ArgumentParser(description='API library that works with requests')
        parser.add_argument('text', nargs='*')
        args = parser.parse_args()
        self.url = url
        self.__xrequest = xrequest
        self.__api_key = api_key
        self.params = params
    def find_error(self, request):
        """
        Find-error function that is used to check the json return dict for any error messages
        Args:
            request (request instance): Instance of the request class
        Returns:
            bool: True for success, False otherwise
        """
        if 'message' or 'error' in request:
            return True
        else:
            return False
    def getr(self):
        """
        Get request function to build a URL and instantiate a request object with a json result set
        Returns:
            dict: content of the json-page decoded with the requests.object.json() function
        """
        if len(self.params) > 0:
            for key, value in self.params.iteritems():
                self.url += '?' + key + '=' + value
        if self.__xrequest == True:
            self.__xrequest = {'x-api-key': ''}
            self.__xrequest['x-api-key'] = self.__api_key
            r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
            r.raise_for_status()
            if r.status_code == 303: raise requests.exceptions.HTTPError
            else: return r.json()
        else:
            r = requests.get(self.url)  # ,allow_redirects=False)
            self.find_status(r, 500)
            r.raise_for_status()
            #if r.status_code == 303: raise requests.exceptions.HTTPError
            return r.json()
    def find_status(self, request, status):
        """
        Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
        Args:
            request (requests object): Requests object
            status (int): Desired status to raise an exception for
        Raises:
            ValueError
        """
        if request.status_code == status:
            raise ValueError
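The query-string step in getr can also be looked at in isolation. As written, every parameter is appended with its own '?', so passing more than one parameter would presumably give a malformed URL. Below is a minimal sketch of one possible alternative using urlencode from the standard library. The helper name build_url and the example URL are hypothetical, and the parameters are sorted only to make the output predictable.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7


def build_url(url, params):
    """Hypothetical helper: append params to url as a single query string."""
    if not params:
        return url
    # Sorting is only there to get a stable order for this example.
    return url + '?' + urlencode(sorted(params.items()))


# Example URL and parameters are made up for illustration.
print(build_url("http://example.com/api", {"lang": "en", "q": "big word"}))
```

This prints http://example.com/api?lang=en&q=big+word, with a single '?', '&' between the pairs and the space escaped by urlencode.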
The repository can be found here on GitHub.
Thank you for your help.
python python-2.7 api
add a comment |Â
up vote
3
down vote
favorite
This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.
I'd like to know from you:
- Can my code be called 'pythonic'?
- I've tried to get rid of calling arrays via index variables as much as possible
- I've also tried to use Python specific functions rather than to build new, redundant ones
- Is my exception logic too confusing? - If so, how to improve on that?
- Is there any way to make my code more efficient?
- Is my documenting style helpful?
word_replacer.py
import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path
def sanitize_for_url(word):
"""
Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
substituted by an empty set
Args:
word (str): Word to sanitize
Returns:
str: Sanitized string
"""
return re.sub('[^a-zA-Zs:]', '', word)
def remove_escapes(word):
"""
Removes escape backslashes that are created by various security mechanisms
Args:
word (str): Word to sanitize
Returns:
Sanitized string
"""
return re.sub(r'\', '', word)
def fetch_words(url):
"""
Retrieving a json result set from the API module
An API object is instantiated and a json result set is returned by calling
the instance specific API.object.getr() function
Args:
url (str): URL string to instantiate the API object
Returns:
dict: JSON data as python dictionary
"""
api = API.API(url, False, '')
return api.getr()
def find_max_len(text):
"""
A linear search of the maximum length of a particular string
Every string in the array is looked up by its length and consequently compared
The string with the biggest length is then returned
Args:
text (arr[str]): array of strings that are compared
Returns:
str: Word with the biggest length
"""
max_length = ''
for i in text:
if len(i) > len(max_length):
max_length = i
return max_length
def find_new_word(words, word_type):
"""
Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
and returned
Args:
words (dict): A json result set as dict
word_type (str): The specific word type - this is actually needed as the key in the json result set dict
Raises:
API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
Returns:
str: New word
"""
word_categories = ["sim", "syn"]
word_list = words.get(word_type, "")
for tag in (x for x in word_categories if x in word_list):
new_word = find_max_len(word_list)
return new_word
raise API.requests.exceptions.HTTPError
def run(text):
"""
Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
and the unchanged word is added to the result array.
If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
and not spamming the server for the time being.
Args:
baseurl (str): URL to instantiate the API object
text (str): String to replace the words from
Returns:
Result string if no ValueError has been found, error message if otherwise
"""
baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886//json"
if len(text) <= 500:
try:
compare = pos_tag(text.split())
result =
for word, tag in compare:
if check_standard_word(tag):
result.append(word)
else:
url_word = sanitize_for_url(word)
if not url_word: continue
url = baseurl.format(url_word)
try:
new_word = find_new_word(fetch_words(url), determine_word_type(tag))
match = re.match('[.,-?!()]', word[-1])
if match:
result.append(new_word + match.group()) # only copies over the last character plus the new word
else:
result.append(new_word)
except API.requests.exceptions.HTTPError:
result.append(word) # old, unchanged word
continue
return remove_escapes(' '.join(result))
except ValueError:
Path("/var/www/.inactive").touch()
return "Try again later. API processing limit reached."
else: return "The text you are typing is too long to process. Sorry."
def check_standard_word(tag):
"""
Checks if the values from the compare tuple are found in the exclude array
Args:
tag (str): Tag from nltk.pos_tag(arr[str]) function
Returns:
bool: If found in the array return True, False if otherwise
"""
exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]
if tag in exclude: return True
else: return False
def omitted_words(words):
"""
Checks if new selected word is a composition of multiple words which might include
nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
actually has spaces
Args:
words(str): Sequence of words with spaces
Returns:
str: The word either unchanged or with the substitution of the grammatical words
"""
if re.match('w+s', words):
compare = pos_tag(splice_words(clean_word(words)))
for word, tag in compare:
if check_standard(tag):
print word
words = words.replace(word, '')
return words
def determine_word_type(tag):
"""
Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
Each word in the array is marked with a special tag which can be used to find the correct type of a word.
A selection is given in the arrays.
Args:
compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
Returns:
str: Word type as a string
"""
noun = ["NN", "NNS", "NNPS", "FW"]
adjective = ["JJ", "JJR", "JJS"]
verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
adverb = ["RB", "RBR"]
if tag in noun: return "noun"
elif tag in adjective: return "adjective"
elif tag in verb: return "verb"
elif tag in adverb: return "adverb"
else: return "noun"
inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
print "Try again later. API processing limit reached."
sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])
API.py
import requests
import argparse
"""
This module is a library for a typical API application
There are different variables to set
"""
class API(object):
__xrequest = ''
__api_key = ''
params =
def __init__(self, url, xrequest, api_key, **params):
"""
Init function of the API class
Args:
url (str): URL for the API to call
xrequest (bool): Switch if x-request is needed
api_key (str): API-key as a string
**params (dict): More parameters for the class to parse in the URL
Returns:
API.object: Instance of the API class
"""
parser = argparse.ArgumentParser(description='API library that works with requests')
parser.add_argument('text', nargs='*')
args = parser.parse_args()
self.url = url
self.__xrequest = xrequest
self.__api_key = api_key
self.params = params
def find_error(self, request):
"""
Find-error function that is used to check the json return dict for any error messages
Args:
request (request instance): Instance of the request class
Returns:
bool: True for success, False otherwise
"""
if 'message' or 'error' in request:
return True
else:
return False
def getr(self):
"""
Get request function to build a URL and instantiate a request object with a json result set
Returns:
dict: content of the json-page decoded with the requests.object.json() function
"""
if len(self.params) > 0:
for key, value in self.params.iteritems():
self.url += '?' + key + '=' + value
if self.__xrequest == True:
self.__xrequest = 'x-api-key': ''
self.__xrequest['x-api-key'] = self.__api_key
r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
r.raise_for_status()
if r.status_code == 303: raise requests.exceptions.HTTPError
else: return r.json()
else:
r = requests.get(self.url)# ,allow_redirects=False)
self.find_status(r, 500)
r.raise_for_status()
#if r.status_code == 303: raise requests.exceptions.HTTPError
return r.json()
def find_status(self, request, status):
"""
Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
Args:
request (requests object): Requests object
status (int): Desired status to raise an exception for
Raises:
ValueError
"""
if request.status_code == status:
raise ValueError
The repository can be found here on github.
Thank you for your help.
python python-2.7 api
add a comment |Â
up vote
3
down vote
favorite
up vote
3
down vote
favorite
This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.
I'd like to know from you:
- Can my code be called 'pythonic'?
- I've tried to get rid of calling arrays via index variables as much as possible
- I've also tried to use Python specific functions rather than to build new, redundant ones
- Is my exception logic too confusing? - If so, how to improve on that?
- Is there any way to make my code more efficient?
- Is my documenting style helpful?
word_replacer.py
import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path
def sanitize_for_url(word):
"""
Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
substituted by an empty set
Args:
word (str): Word to sanitize
Returns:
str: Sanitized string
"""
return re.sub('[^a-zA-Zs:]', '', word)
def remove_escapes(word):
"""
Removes escape backslashes that are created by various security mechanisms
Args:
word (str): Word to sanitize
Returns:
Sanitized string
"""
return re.sub(r'\', '', word)
def fetch_words(url):
"""
Retrieving a json result set from the API module
An API object is instantiated and a json result set is returned by calling
the instance specific API.object.getr() function
Args:
url (str): URL string to instantiate the API object
Returns:
dict: JSON data as python dictionary
"""
api = API.API(url, False, '')
return api.getr()
def find_max_len(text):
"""
A linear search of the maximum length of a particular string
Every string in the array is looked up by its length and consequently compared
The string with the biggest length is then returned
Args:
text (arr[str]): array of strings that are compared
Returns:
str: Word with the biggest length
"""
max_length = ''
for i in text:
if len(i) > len(max_length):
max_length = i
return max_length
def find_new_word(words, word_type):
"""
Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
and returned
Args:
words (dict): A json result set as dict
word_type (str): The specific word type - this is actually needed as the key in the json result set dict
Raises:
API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
Returns:
str: New word
"""
word_categories = ["sim", "syn"]
word_list = words.get(word_type, "")
for tag in (x for x in word_categories if x in word_list):
new_word = find_max_len(word_list)
return new_word
raise API.requests.exceptions.HTTPError
def run(text):
"""
Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
and the unchanged word is added to the result array.
If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
and not spamming the server for the time being.
Args:
baseurl (str): URL to instantiate the API object
text (str): String to replace the words from
Returns:
Result string if no ValueError has been found, error message if otherwise
"""
baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886//json"
if len(text) <= 500:
try:
compare = pos_tag(text.split())
result =
for word, tag in compare:
if check_standard_word(tag):
result.append(word)
else:
url_word = sanitize_for_url(word)
if not url_word: continue
url = baseurl.format(url_word)
try:
new_word = find_new_word(fetch_words(url), determine_word_type(tag))
match = re.match('[.,-?!()]', word[-1])
if match:
result.append(new_word + match.group()) # only copies over the last character plus the new word
else:
result.append(new_word)
except API.requests.exceptions.HTTPError:
result.append(word) # old, unchanged word
continue
return remove_escapes(' '.join(result))
except ValueError:
Path("/var/www/.inactive").touch()
return "Try again later. API processing limit reached."
else: return "The text you are typing is too long to process. Sorry."
def check_standard_word(tag):
"""
Checks if the values from the compare tuple are found in the exclude array
Args:
tag (str): Tag from nltk.pos_tag(arr[str]) function
Returns:
bool: If found in the array return True, False if otherwise
"""
exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]
if tag in exclude: return True
else: return False
def omitted_words(words):
"""
Checks if new selected word is a composition of multiple words which might include
nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
actually has spaces
Args:
words(str): Sequence of words with spaces
Returns:
str: The word either unchanged or with the substitution of the grammatical words
"""
if re.match('w+s', words):
compare = pos_tag(splice_words(clean_word(words)))
for word, tag in compare:
if check_standard(tag):
print word
words = words.replace(word, '')
return words
def determine_word_type(tag):
"""
Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
Each word in the array is marked with a special tag which can be used to find the correct type of a word.
A selection is given in the arrays.
Args:
compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
Returns:
str: Word type as a string
"""
noun = ["NN", "NNS", "NNPS", "FW"]
adjective = ["JJ", "JJR", "JJS"]
verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
adverb = ["RB", "RBR"]
if tag in noun: return "noun"
elif tag in adjective: return "adjective"
elif tag in verb: return "verb"
elif tag in adverb: return "adverb"
else: return "noun"
inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
print "Try again later. API processing limit reached."
sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])
API.py
import requests
import argparse
"""
This module is a library for a typical API application
There are different variables to set
"""
class API(object):
__xrequest = ''
__api_key = ''
params =
def __init__(self, url, xrequest, api_key, **params):
"""
Init function of the API class
Args:
url (str): URL for the API to call
xrequest (bool): Switch if x-request is needed
api_key (str): API-key as a string
**params (dict): More parameters for the class to parse in the URL
Returns:
API.object: Instance of the API class
"""
parser = argparse.ArgumentParser(description='API library that works with requests')
parser.add_argument('text', nargs='*')
args = parser.parse_args()
self.url = url
self.__xrequest = xrequest
self.__api_key = api_key
self.params = params
def find_error(self, request):
"""
Find-error function that is used to check the json return dict for any error messages
Args:
request (request instance): Instance of the request class
Returns:
bool: True for success, False otherwise
"""
if 'message' or 'error' in request:
return True
else:
return False
def getr(self):
"""
Get request function to build a URL and instantiate a request object with a json result set
Returns:
dict: content of the json-page decoded with the requests.object.json() function
"""
if len(self.params) > 0:
for key, value in self.params.iteritems():
self.url += '?' + key + '=' + value
if self.__xrequest == True:
self.__xrequest = 'x-api-key': ''
self.__xrequest['x-api-key'] = self.__api_key
r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
r.raise_for_status()
if r.status_code == 303: raise requests.exceptions.HTTPError
else: return r.json()
else:
r = requests.get(self.url)# ,allow_redirects=False)
self.find_status(r, 500)
r.raise_for_status()
#if r.status_code == 303: raise requests.exceptions.HTTPError
return r.json()
def find_status(self, request, status):
"""
Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
Args:
request (requests object): Requests object
status (int): Desired status to raise an exception for
Raises:
ValueError
"""
if request.status_code == status:
raise ValueError
The repository can be found here on github.
Thank you for your help.
python python-2.7 api
This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.
I'd like to know from you:
- Can my code be called 'pythonic'?
- I've tried to get rid of calling arrays via index variables as much as possible
- I've also tried to use Python specific functions rather than to build new, redundant ones
- Is my exception logic too confusing? - If so, how to improve on that?
- Is there any way to make my code more efficient?
- Is my documenting style helpful?
word_replacer.py
import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path
def sanitize_for_url(word):
"""
Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
substituted by an empty set
Args:
word (str): Word to sanitize
Returns:
str: Sanitized string
"""
return re.sub('[^a-zA-Zs:]', '', word)
def remove_escapes(word):
"""
Removes escape backslashes that are created by various security mechanisms
Args:
word (str): Word to sanitize
Returns:
Sanitized string
"""
return re.sub(r'\', '', word)
def fetch_words(url):
"""
Retrieving a json result set from the API module
An API object is instantiated and a json result set is returned by calling
the instance specific API.object.getr() function
Args:
url (str): URL string to instantiate the API object
Returns:
dict: JSON data as python dictionary
"""
api = API.API(url, False, '')
return api.getr()
def find_max_len(text):
"""
A linear search of the maximum length of a particular string
Every string in the array is looked up by its length and consequently compared
The string with the biggest length is then returned
Args:
text (arr[str]): array of strings that are compared
Returns:
str: Word with the biggest length
"""
max_length = ''
for i in text:
if len(i) > len(max_length):
max_length = i
return max_length
def find_new_word(words, word_type):
"""
Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
and returned
Args:
words (dict): A json result set as dict
word_type (str): The specific word type - this is actually needed as the key in the json result set dict
Raises:
API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
Returns:
str: New word
"""
word_categories = ["sim", "syn"]
word_list = words.get(word_type, "")
for tag in (x for x in word_categories if x in word_list):
new_word = find_max_len(word_list)
return new_word
raise API.requests.exceptions.HTTPError
def run(text):
"""
Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
and the unchanged word is added to the result array.
If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
and not spamming the server for the time being.
Args:
baseurl (str): URL to instantiate the API object
text (str): String to replace the words from
Returns:
Result string if no ValueError has been found, error message if otherwise
"""
baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886//json"
if len(text) <= 500:
try:
compare = pos_tag(text.split())
result =
for word, tag in compare:
if check_standard_word(tag):
result.append(word)
else:
url_word = sanitize_for_url(word)
if not url_word: continue
url = baseurl.format(url_word)
try:
new_word = find_new_word(fetch_words(url), determine_word_type(tag))
match = re.match('[.,-?!()]', word[-1])
if match:
result.append(new_word + match.group()) # only copies over the last character plus the new word
else:
result.append(new_word)
except API.requests.exceptions.HTTPError:
result.append(word) # old, unchanged word
continue
return remove_escapes(' '.join(result))
except ValueError:
Path("/var/www/.inactive").touch()
return "Try again later. API processing limit reached."
else: return "The text you are typing is too long to process. Sorry."
def check_standard_word(tag):
"""
Checks if the values from the compare tuple are found in the exclude array
Args:
tag (str): Tag from nltk.pos_tag(arr[str]) function
Returns:
bool: If found in the array return True, False if otherwise
"""
exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]
if tag in exclude: return True
else: return False
def omitted_words(words):
    """
    Checks if new selected word is a composition of multiple words which might include
    nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
    actually has spaces
    Args:
        words(str): Sequence of words with spaces
    Returns:
        str: The word either unchanged or with the substitution of the grammatical words
    """
    if re.match('\w+\s', words):
        compare = pos_tag(splice_words(clean_word(words)))
        for word, tag in compare:
            if check_standard(tag):
                print word
                words = words.replace(word, '')
    return words
def determine_word_type(tag):
    """
    Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
    Each word in the array is marked with a special tag which can be used to find the correct type of a word.
    A selection is given in the arrays.
    Args:
        compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
    Returns:
        str: Word type as a string
    """
    noun = ["NN", "NNS", "NNPS", "FW"]
    adjective = ["JJ", "JJR", "JJS"]
    verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
    adverb = ["RB", "RBR"]
    if tag in noun: return "noun"
    elif tag in adjective: return "adjective"
    elif tag in verb: return "verb"
    elif tag in adverb: return "adverb"
    else: return "noun"
inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
    print "Try again later. API processing limit reached."
    sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])
API.py
import requests
import argparse
"""
This module is a library for a typical API application
There are different variables to set
"""
class API(object):
    __xrequest = ''
    __api_key = ''
    params = {}
    def __init__(self, url, xrequest, api_key, **params):
        """
        Init function of the API class
        Args:
            url (str): URL for the API to call
            xrequest (bool): Switch if x-request is needed
            api_key (str): API-key as a string
            **params (dict): More parameters for the class to parse in the URL
        Returns:
            API.object: Instance of the API class
        """
        parser = argparse.ArgumentParser(description='API library that works with requests')
        parser.add_argument('text', nargs='*')
        args = parser.parse_args()
        self.url = url
        self.__xrequest = xrequest
        self.__api_key = api_key
        self.params = params
    def find_error(self, request):
        """
        Find-error function that is used to check the json return dict for any error messages
        Args:
            request (request instance): Instance of the request class
        Returns:
            bool: True for success, False otherwise
        """
        if 'message' or 'error' in request:
            return True
        else:
            return False
    def getr(self):
        """
        Get request function to build a URL and instantiate a request object with a json result set
        Returns:
            dict: content of the json-page decoded with the requests.object.json() function
        """
        if len(self.params) > 0:
            for key, value in self.params.iteritems():
                self.url += '?' + key + '=' + value
        if self.__xrequest == True:
            self.__xrequest = {'x-api-key': ''}
            self.__xrequest['x-api-key'] = self.__api_key
            r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
            r.raise_for_status()
            if r.status_code == 303: raise requests.exceptions.HTTPError
            else: return r.json()
        else:
            r = requests.get(self.url)# ,allow_redirects=False)
            self.find_status(r, 500)
            r.raise_for_status()
            #if r.status_code == 303: raise requests.exceptions.HTTPError
            return r.json()
    def find_status(self, request, status):
        """
        Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
        Args:
            request (requests object): Requests object
            status (int): Desired status to raise an exception for
        Raises:
            ValueError
        """
        if request.status_code == status:
            raise ValueError
The repository can be found here on GitHub.
Thank you for your help.
python python-2.7 api
edited Jan 8 at 17:09
Sam Onela
5,88461545
asked Jan 8 at 16:25
Leo
183
1 Answer
up vote
2
down vote
accepted
Your code looks nice.
Here are a few details:
In find_max_len
The name max_length suggests a positive integer value corresponding to a length. It is actually used for a string, which may be slightly confusing.
At every iteration, you compute the length of two strings, which is probably more than an optimal strategy requires.
You are lucky, because the problem you are trying to solve has a generic solution: max, which in your case gives return max(text, key=len, default='') (I've kept '' as the default value as it corresponds to the current behavior, but maybe an exception is a more desirable way to handle an empty list).
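As a rough sketch of that suggestion: the default keyword of max only exists from Python 3.4 onward, so on Python 2.7 (which the question is tagged with) the empty case has to be handled separately. The name longest_word is hypothetical, chosen here only to avoid the misleading max_length.

```python
def longest_word(words):
    """Return the longest string in words, or '' when words is empty."""
    # max() raises ValueError on an empty sequence, and the default=
    # keyword is not available on Python 2.7, so guard explicitly.
    if not words:
        return ''
    # key=len computes each length exactly once; on ties the first
    # longest element is returned.
    return max(words, key=len)
```

Returning '' keeps the current behavior; dropping the guard would let the ValueError from max propagate instead.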
In sanitize_for_url
The docstring says "alphanumeric" but the regexp does not include digits. Also, if your point is just to make a URL from a string, you may find better options in the urllib.parse module.
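A minimal sketch of both options, assuming the goal is either to match the docstring or to build a safe URL path segment. The function names sanitize_alnum and quote_for_url are made up for illustration. On Python 2.7 the same quote function lives in urllib rather than urllib.parse, hence the guarded import.

```python
import re

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7

def sanitize_alnum(word):
    # Keep letters, digits, whitespace and colons; drop everything else,
    # which is what the docstring in the question describes.
    return re.sub(r'[^a-zA-Z0-9\s:]', '', word)

def quote_for_url(word):
    # Percent-encode the word instead of deleting characters;
    # safe='' means '/' is encoded as well.
    return quote(word, safe='')
```

The two are not equivalent: the first silently removes characters, while the second preserves them in encoded form, so which one fits depends on what the API expects.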
In check_standard_word
You could write: return tag in exclude.
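Spelled out, that could look like the sketch below. Moving the tags into a module-level frozenset named EXCLUDE is an extra assumption on top of the suggestion, made so the collection is built only once; the tag values are copied verbatim from the question.

```python
# Tags that are left untouched; copied as-is from the question.
# (The Penn Treebank possessive-pronoun tag is spelled "PRP$",
# so "$PRP" may be a typo in the original list.)
EXCLUDE = frozenset([
    "MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX",
    "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO",
])

def check_standard_word(tag):
    """Return True if tag is one of the excluded part-of-speech tags."""
    # The membership test already evaluates to a bool,
    # so no if/else is needed.
    return tag in EXCLUDE
```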
In determine_word_type
Instead of using lists, you could use sets, a data type better suited to the membership tests you are performing.
Also, you may want to replace the code with a dictionary structure:
def determine_word_type(tag):
    types = {
        'adjective': {"JJ", "JJR", "JJS"},
        'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
        'adverb': {"RB", "RBR"},
        'noun': {"NN", "NNS", "NNPS", "FW"},
    }
    for type_, set_ in types.iteritems():
        if tag in set_:
            return type_
    return 'noun'
Also, if you want to make the lookup faster, you could build a dictionary from the initial dict, mapping each tag to its type:
def determine_word_type(tag):
    types = {
        'adjective': {"JJ", "JJR", "JJS"},
        'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
        'adverb': {"RB", "RBR"},
        'noun': {"NN", "NNS", "NNPS", "FW"},
    }
    types2 = dict()
    for type_, set_ in types.iteritems():
        for e in set_:
            assert e not in types2
            types2[e] = type_
    return types2.get(tag, 'noun')
(You'd need the dict building part to be moved out of the function to be performed only once).
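One possible arrangement of that last remark, as a sketch: the inverted mapping is built once at import time and the function is reduced to a single lookup. The constant names WORD_TYPES and TAG_TO_TYPE are assumptions, and items() is used instead of iteritems() so the snippet runs on both Python 2.7 and Python 3.

```python
# Word type -> set of part-of-speech tags, as in the question.
WORD_TYPES = {
    'adjective': {"JJ", "JJR", "JJS"},
    'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    'adverb': {"RB", "RBR"},
    'noun': {"NN", "NNS", "NNPS", "FW"},
}

# Inverted once when the module is imported: tag -> word type.
TAG_TO_TYPE = {}
for type_, tags in WORD_TYPES.items():
    for tag_ in tags:
        # A tag listed under two types would make the lookup ambiguous.
        assert tag_ not in TAG_TO_TYPE
        TAG_TO_TYPE[tag_] = type_

def determine_word_type(tag):
    """Return the word type for a tag, falling back to 'noun'."""
    return TAG_TO_TYPE.get(tag, 'noun')
```

Each call is then a single dictionary lookup, and unknown tags still fall back to 'noun' as in the original function.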
edited Jan 8 at 22:06
answered Jan 8 at 17:46
Josay
23.8k13580