A word replacer that uses an API to check for different words to make up a new string

This is one of the first programs I have finished to date. I am not yet very familiar with Python, so please bear with me.

I'd like to know from you:

Can my code be called 'pythonic'?
- I've tried to avoid accessing arrays via index variables as much as possible
- I've also tried to use Python-specific functions rather than building new, redundant ones

Is my exception logic too confusing? If so, how can I improve it?

Is there any way to make my code more efficient?

Is my documenting style helpful?

word_replacer.py

import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path

def sanitize_for_url(word):
    """
    Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
    substituted by an empty set
    Args:
        word (str): Word to sanitize
    Returns:
        str: Sanitized string
    """
    return re.sub('[^a-zA-Z\s:]', '', word)


def remove_escapes(word):
    """
    Removes escape backslashes that are created by various security mechanisms
    Args:
        word (str): Word to sanitize
    Returns:
        Sanitized string
    """
    return re.sub(r'\\', '', word)


def fetch_words(url):
    """
    Retrieving a json result set from the API module
    An API object is instantiated and a json result set is returned by calling
    the instance specific API.object.getr() function
    Args:
        url (str): URL string to instantiate the API object
    Returns:
        dict: JSON data as python dictionary
    """
    api = API.API(url, False, '')
    return api.getr()

def find_max_len(text):
    """
    A linear search of the maximum length of a particular string
    Every string in the array is looked up by its length and consequently compared
    The string with the biggest length is then returned
    Args:
        text (arr[str]): array of strings that are compared
    Returns:
        str: Word with the biggest length
    """
    max_length = ''
    for i in text:
        if len(i) > len(max_length):
            max_length = i
    return max_length

def find_new_word(words, word_type):
    """
    Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
    and returned
    Args:
        words (dict): A json result set as dict
        word_type (str): The specific word type - this is actually needed as the key in the json result set dict
    Raises:
        API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
        non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
    Returns:
        str: New word
    """
    word_categories = ["sim", "syn"]
    word_list = words.get(word_type, "")
    for tag in (x for x in word_categories if x in word_list):
        new_word = find_max_len(word_list)
        return new_word
    raise API.requests.exceptions.HTTPError

def run(text):
    """
    Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
    of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
    First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
    calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
    is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
    and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
    and the unchanged word is added to the result array.
    If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
    and not spamming the server for the time being.
    Args:
        baseurl (str): URL to instantiate the API object
        text (str): String to replace the words from
    Returns:
        Result string if no ValueError has been found, error message if otherwise
    """
    baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886/{}/json"
    if len(text) <= 500:
        try:
            compare = pos_tag(text.split())
            result = []
            for word, tag in compare:
                if check_standard_word(tag):
                    result.append(word)
                else:
                    url_word = sanitize_for_url(word)
                    if not url_word: continue
                    url = baseurl.format(url_word)
                    try:
                        new_word = find_new_word(fetch_words(url), determine_word_type(tag))
                        match = re.match('[.,-?!()]', word[-1])
                        if match:
                            result.append(new_word + match.group()) # only copies over the last character plus the new word
                        else:
                            result.append(new_word)
                    except API.requests.exceptions.HTTPError:
                        result.append(word) # old, unchanged word
                        continue
            return remove_escapes(' '.join(result))
        except ValueError:
            Path("/var/www/.inactive").touch()
            return "Try again later. API processing limit reached."
    else: return "The text you are typing is too long to process. Sorry."

def check_standard_word(tag):
    """
    Checks if the values from the compare tuple are found in the exclude array
    Args:
        tag (str): Tag from nltk.pos_tag(arr[str]) function
    Returns:
        bool: If found in the array return True, False if otherwise
    """
    exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]

    if tag in exclude: return True
    else: return False

def omitted_words(words):
    """
    Checks if new selected word is a composition of multiple words which might include
    nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
    actually has spaces
    Args:
        words(str): Sequence of words with spaces
    Returns:
        str: The word either unchanged or with the substitution of the grammatical words
    """
    if re.match('\w+\s', words):
        compare = pos_tag(splice_words(clean_word(words)))
        for word, tag in compare:
            if check_standard(tag):
                print word
                words = words.replace(word, '')
    return words

def determine_word_type(tag):
    """
    Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
    Each word in the array is marked with a special tag which can be used to find the correct type of a word.
    A selection is given in the arrays.
    Args:
        compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
    Returns:
        str: Word type as a string
    """
    noun = ["NN", "NNS", "NNPS", "FW"]
    adjective = ["JJ", "JJR", "JJS"]
    verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
    adverb = ["RB", "RBR"]

    if tag in noun: return "noun"
    elif tag in adjective: return "adjective"
    elif tag in verb: return "verb"
    elif tag in adverb: return "adverb"
    else: return "noun"

inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
    print "Try again later. API processing limit reached."
    sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])
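
For comparison, the linear search described in the `find_max_len` docstring and the tag lookup in `check_standard_word` could probably be expressed with built-ins. This is only a minimal sketch, assuming Python 3 (the listing above is Python 2); the names `longest_word`, `EXCLUDED_TAGS` and `is_standard_word` are hypothetical and not part of the repository.

```python
# Hypothetical tag set: a subset of the tags listed in check_standard_word.
EXCLUDED_TAGS = frozenset(["MD", "DT", "PRP", "IN", "CC", "CD", "TO"])


def longest_word(words):
    """Return the longest string in words, or '' if words is empty."""
    # max() keeps the first of several equally long words, like the loop above.
    return max(words, key=len, default="")


def is_standard_word(tag):
    """Return True if the POS tag belongs to the excluded set."""
    return tag in EXCLUDED_TAGS
```

A frozenset makes the membership test independent of the number of tags, and returning the comparison directly avoids the `if ...: return True / else: return False` pattern.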

API.py

import requests
import argparse

"""
This module is a library for a typical API application
There are different variables to set 
"""

class API(object):
    __xrequest = ''
    __api_key = ''
    params = {}
    def __init__(self, url, xrequest, api_key, **params):
        """
        Init function of the API class
        Args:
            url (str): URL for the API to call
            xrequest (bool): Switch if x-request is needed
            api_key (str): API-key as a string
            **params (dict): More parameters for the class to parse in the URL
        Returns:
            API.object: Instance of the API class
        """

        parser = argparse.ArgumentParser(description='API library that works with requests')
        parser.add_argument('text', nargs='*')
        args = parser.parse_args()

        self.url = url
        self.__xrequest = xrequest
        self.__api_key = api_key
        self.params = params

    def find_error(self, request):
        """
        Find-error function that is used to check the json return dict for any error messages
        Args:
            request (request instance): Instance of the request class
        Returns:
            bool: True for success, False otherwise
        """
        if 'message' or 'error' in request:
            return True
        else:
            return False

    def getr(self):
        """
        Get request function to build a URL and instantiate a request object with a json result set
        Returns:
            dict: content of the json-page decoded with the requests.object.json() function
        """
        if len(self.params) > 0:
            for key, value in self.params.iteritems():
                self.url += '?' + key + '=' + value
        if self.__xrequest == True:
            self.__xrequest = {'x-api-key': ''}
            self.__xrequest['x-api-key'] = self.__api_key
            r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
            r.raise_for_status()
            if r.status_code == 303: raise requests.exceptions.HTTPError
            else: return r.json()
        else:
            r = requests.get(self.url)# ,allow_redirects=False)
            self.find_status(r, 500)
            r.raise_for_status()
            #if r.status_code == 303: raise requests.exceptions.HTTPError
            return r.json()

    def find_status(self, request, status):
        """
        Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
        Args:
            request (requests object): Requests object
            status (int): Desired status to raise an exception for
        Raises:
            ValueError
        """
        if request.status_code == status:
            raise ValueError
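
The `getr` docstring describes building the URL from `params`. A minimal sketch of how a query string could be assembled with the standard library follows, assuming Python 3; `build_url` is a hypothetical helper and not part of API.py. Note that `requests.get` also accepts a `params` argument that performs this encoding itself.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a single, escaped query string."""
    if not params:
        return base_url
    # Use '&' if the base URL already carries a query string, '?' otherwise.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

Joining all parameters in one step means only the first one is introduced by `?`, and values containing spaces or other reserved characters are escaped.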

The repository can be found on GitHub.

Thank you for your help.

asked Jan 8 at 16:25 by Leo
edited Jan 8 at 17:09 by Sam Onela



This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.

I'd like to know from you:

Can my code be called 'pythonic'?
- I've tried to get rid of calling arrays via index variables as much as possible
- I've also tried to use Python specific functions rather than to build new, redundant ones

Is my exception logic too confusing? - If so, how to improve on that?

Is there any way to make my code more efficient?

Is my documenting style helpful?

word_replacer.py

import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path

def sanitize_for_url(word):
 """
 Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
 substituted by an empty set
 Args:
 word (str): Word to sanitize
 Returns:
 str: Sanitized string
 """
 return re.sub('[^a-zA-Zs:]', '', word)


def remove_escapes(word):
 """
 Removes escape backslashes that are created by various security mechanisms
 Args:
 word (str): Word to sanitize
 Returns:
 Sanitized string
 """
 return re.sub(r'\', '', word)


def fetch_words(url):
 """
 Retrieving a json result set from the API module
 An API object is instantiated and a json result set is returned by calling
 the instance specific API.object.getr() function
 Args:
 url (str): URL string to instantiate the API object
 Returns:
 dict: JSON data as python dictionary
 """
 api = API.API(url, False, '')
 return api.getr()

def find_max_len(text):
 """
 A linear search of the maximum length of a particular string
 Every string in the array is looked up by its length and consequently compared
 The string with the biggest length is then returned
 Args:
 text (arr[str]): array of strings that are compared
 Returns:
 str: Word with the biggest length
 """
 max_length = ''
 for i in text:
 if len(i) > len(max_length):
 max_length = i
 return max_length

def find_new_word(words, word_type):
 """
 Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
 and returned
 Args:
 words (dict): A json result set as dict
 word_type (str): The specific word type - this is actually needed as the key in the json result set dict
 Raises:
 API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
 non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
 Returns:
 str: New word
 """
 word_categories = ["sim", "syn"]
 word_list = words.get(word_type, "")
 for tag in (x for x in word_categories if x in word_list):
 new_word = find_max_len(word_list)
 return new_word
 raise API.requests.exceptions.HTTPError

def run(text):
    """
    Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
    of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
    First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
    calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
    is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
    and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
    and the unchanged word is added to the result array.
    If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
    and not spamming the server for the time being.
    Args:
        baseurl (str): URL to instantiate the API object
        text (str): String to replace the words from
    Returns:
        Result string if no ValueError has been found, error message if otherwise
    """
    baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886/{}/json"
    if len(text) <= 500:
        try:
            compare = pos_tag(text.split())
            result = []
            for word, tag in compare:
                if check_standard_word(tag):
                    result.append(word)
                else:
                    url_word = sanitize_for_url(word)
                    if not url_word: continue
                    url = baseurl.format(url_word)
                    try:
                        new_word = find_new_word(fetch_words(url), determine_word_type(tag))
                        match = re.match('[.,-?!()]', word[-1])
                        if match:
                            result.append(new_word + match.group()) # only copies over the last character plus the new word
                        else:
                            result.append(new_word)
                    except API.requests.exceptions.HTTPError:
                        result.append(word) # old, unchanged word
                        continue
            return remove_escapes(' '.join(result))
        except ValueError:
            Path("/var/www/.inactive").touch()
            return "Try again later. API processing limit reached."
    else: return "The text you are typing is too long to process. Sorry."

def check_standard_word(tag):
    """
    Checks if the values from the compare tuple are found in the exclude array
    Args:
        tag (str): Tag from nltk.pos_tag(arr[str]) function
    Returns:
        bool: If found in the array return True, False if otherwise
    """
    exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]

    if tag in exclude: return True
    else: return False

def omitted_words(words):
    """
    Checks if new selected word is a composition of multiple words which might include
    nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
    actually has spaces
    Args:
        words(str): Sequence of words with spaces
    Returns:
        str: The word either unchanged or with the substitution of the grammatical words
    """
    if re.match('\w+\s', words):
        compare = pos_tag(splice_words(clean_word(words)))
        for word, tag in compare:
            if check_standard(tag):
                print word
                words = words.replace(word, '')
    return words

def determine_word_type(tag):
    """
    Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
    Each word in the array is marked with a special tag which can be used to find the correct type of a word.
    A selection is given in the arrays.
    Args:
        compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
    Returns:
        str: Word type as a string
    """
    noun = ["NN", "NNS", "NNPS", "FW"]
    adjective = ["JJ", "JJR", "JJS"]
    verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
    adverb = ["RB", "RBR"]

    if tag in noun: return "noun"
    elif tag in adjective: return "adjective"
    elif tag in verb: return "verb"
    elif tag in adverb: return "adverb"
    else: return "noun"

inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
    print "Try again later. API processing limit reached."
    sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])

API.py

import requests
import argparse

"""
This module is a library for a typical API application
There are different variables to set 
"""

class API(object):
    __xrequest = ''
    __api_key = ''
    params = {}

    def __init__(self, url, xrequest, api_key, **params):
        """
        Init function of the API class
        Args:
            url (str): URL for the API to call
            xrequest (bool): Switch if x-request is needed
            api_key (str): API-key as a string
            **params (dict): More parameters for the class to parse in the URL
        Returns:
            API.object: Instance of the API class
        """

        parser = argparse.ArgumentParser(description='API library that works with requests')
        parser.add_argument('text', nargs='*')
        args = parser.parse_args()

        self.url = url
        self.__xrequest = xrequest
        self.__api_key = api_key
        self.params = params

    def find_error(self, request):
        """
        Find-error function that is used to check the json return dict for any error messages
        Args:
            request (request instance): Instance of the request class
        Returns:
            bool: True for success, False otherwise
        """
        if 'message' or 'error' in request:
            return True
        else:
            return False

    def getr(self):
        """
        Get request function to build a URL and instantiate a request object with a json result set
        Returns:
            dict: content of the json-page decoded with the requests.object.json() function
        """
        if len(self.params) > 0:
            for key, value in self.params.iteritems():
                self.url += '?' + key + '=' + value
        if self.__xrequest == True:
            self.__xrequest = {'x-api-key': ''}
            self.__xrequest['x-api-key'] = self.__api_key
            r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
            r.raise_for_status()
            if r.status_code == 303: raise requests.exceptions.HTTPError
            else: return r.json()
        else:
            r = requests.get(self.url)# ,allow_redirects=False)
            self.find_status(r, 500)
            r.raise_for_status()
            #if r.status_code == 303: raise requests.exceptions.HTTPError
            return r.json()

    def find_status(self, request, status):
        """
        Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
        Args:
            request (requests object): Requests object
            status (int): Desired status to raise an exception for
        Raises:
            ValueError
        """
        if request.status_code == status:
            raise ValueError

The repository can be found here on github.

Thank you for your help.

edited Jan 8 at 17:09

Sam Onela

5,88461545

asked Jan 8 at 16:25

Leo

183


1 Answer

up vote
2
down vote

accepted

Your code looks nice.

Here are a few details:

In `find_max_len`

The name `max_length` suggests a positive integer corresponding to a length, but it actually holds a string, which may be slightly confusing.

At every iteration, you compute the length of two strings, which is probably more work than an optimal strategy requires.

You are lucky, because the problem you are trying to solve has a generic solution: `max`, which in your case gives `return max(text, key=len, default='')`. (I've kept `''` as the default value because it matches the current behavior, but an exception may be a more desirable way to handle an empty list.)
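A minimal sketch of that suggestion, assuming Python 3.4 or later (where `max` accepts the `default` argument); the helper name `find_longest` is hypothetical and only used for illustration:

```python
def find_longest(words):
    """Return the longest string in `words`, or '' when `words` is empty."""
    # key=len makes max() compare candidates by their length.
    # default='' is returned for an empty iterable instead of raising ValueError.
    return max(words, key=len, default='')


if __name__ == "__main__":
    print(find_longest(["big", "enormous", "huge"]))
    print(repr(find_longest([])))
```

When several strings share the maximum length, `max` returns the first one it encounters.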

In `sanitize_for_url`

The docstring says "alphanumeric", but the regexp does not include digits. Also, if your point is just to build a URL from a string, you may find a better option in the `urllib.parse` module.
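As a rough sketch of the `urllib.parse` route (Python 3; in Python 2 the same function is available as `urllib.quote`), a word can be percent-encoded before it is placed in the URL. The endpoint `BASE_URL` and the helper `build_url` below are assumptions made up for this example, not part of the original program:

```python
from urllib.parse import quote

# Hypothetical endpoint used only for illustration.
BASE_URL = "http://example.com/api/{}/json"


def build_url(word):
    """Insert `word` into BASE_URL as a percent-encoded path segment."""
    # safe='' also encodes '/', so the word cannot add extra path segments.
    return BASE_URL.format(quote(word, safe=''))


if __name__ == "__main__":
    print(build_url("ice cream"))
```

Encoding keeps every character of the word instead of silently dropping the ones the regex does not match.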

In `check_standard_word`

You could write: `return tag in exclude`.
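A short sketch of that simplification. The tag list here is only a subset of the one in the question, and moving it into a module-level `frozenset` named `EXCLUDED_TAGS` is an assumption of this example rather than something the original code does:

```python
# Subset of the excluded part-of-speech tags, for illustration only.
EXCLUDED_TAGS = frozenset(["MD", "DT", "PRP", "IN", "CC", "CD"])


def check_standard_word(tag):
    """Return True when `tag` belongs to the excluded tags."""
    # The `in` operator already evaluates to a bool, so no if/else is needed.
    return tag in EXCLUDED_TAGS


if __name__ == "__main__":
    print(check_standard_word("DT"))
    print(check_standard_word("NN"))
```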

In `determine_word_type`

Instead of using lists, you could use sets, which are a data type more relevant to what you are trying to achieve.

Also, you may want to replace the code with a dictionary structure:

def determine_word_type(tag):
    types = {
        'adjective': {"JJ", "JJR", "JJS"},
        'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
        'adverb': {"RB", "RBR"},
        'noun': {"NN", "NNS", "NNPS", "FW"},
    }
    for type_, set_ in types.iteritems():
        if tag in set_:
            return type_
    return 'noun'

Also, if you want to make the lookup faster, you could build a second dictionary from the initial dict, mapping each tag directly to its type:

def determine_word_type(tag):
    types = {
        'adjective': {"JJ", "JJR", "JJS"},
        'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
        'adverb': {"RB", "RBR"},
        'noun': {"NN", "NNS", "NNPS", "FW"},
    }
    types2 = dict()
    for type_, set_ in types.iteritems():
        for e in set_:
            assert e not in types2
            types2[e] = type_
    return types2.get(tag, 'noun')

(You'd need the dict building part to be moved out of the function to be performed only once).
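One way this could look with the dict built once at import time. This sketch is written for Python 3, so it uses `.items()` where the Python 2 code above uses `.iteritems()`; the constant names `WORD_TYPES` and `TAG_TO_TYPE` are hypothetical:

```python
# Word type -> set of part-of-speech tags, as in the snippets above.
WORD_TYPES = {
    'adjective': {"JJ", "JJR", "JJS"},
    'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    'adverb': {"RB", "RBR"},
    'noun': {"NN", "NNS", "NNPS", "FW"},
}

# Inverted once, when the module is imported: tag -> word type.
TAG_TO_TYPE = {
    tag: word_type
    for word_type, tags in WORD_TYPES.items()
    for tag in tags
}


def determine_word_type(tag):
    """Return the word type for `tag`, falling back to 'noun'."""
    return TAG_TO_TYPE.get(tag, 'noun')


if __name__ == "__main__":
    print(determine_word_type("VBD"))
    print(determine_word_type("XYZ"))
```

Each call is then a single dictionary lookup, and the mapping is not rebuilt on every invocation.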

edited Jan 8 at 22:06

answered Jan 8 at 17:46

Josay

23.8k13580
