A word replacer that uses an API to check for different words to make up a new string

This is one of the first programs I've finished to date. I am not yet very familiar with Python, so please bear with me.



I'd like to know from you:



  • Can my code be called 'pythonic'?

    • I've tried to avoid accessing arrays through index variables as much as possible

    • I've also tried to use Python-specific functions rather than build new, redundant ones


  • Is my exception logic too confusing? If so, how can I improve it?

  • Is there any way to make my code more efficient?

  • Is my documenting style helpful?

word_replacer.py



import API
import re
from nltk import pos_tag
import sys
from pathlib2 import Path

def sanitize_for_url(word):
    """
    Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
    substituted by an empty set
    Args:
        word (str): Word to sanitize
    Returns:
        str: Sanitized string
    """
    return re.sub('[^a-zA-Z\s:]', '', word)


def remove_escapes(word):
    """
    Removes escape backslashes that are created by various security mechanisms
    Args:
        word (str): Word to sanitize
    Returns:
        Sanitized string
    """
    return re.sub(r'\\', '', word)


def fetch_words(url):
    """
    Retrieving a json result set from the API module
    An API object is instantiated and a json result set is returned by calling
    the instance specific API.object.getr() function
    Args:
        url (str): URL string to instantiate the API object
    Returns:
        dict: JSON data as python dictionary
    """
    api = API.API(url, False, '')
    return api.getr()

def find_max_len(text):
    """
    A linear search of the maximum length of a particular string
    Every string in the array is looked up by its length and consequently compared
    The string with the biggest length is then returned
    Args:
        text (arr[str]): array of strings that are compared
    Returns:
        str: Word with the biggest length
    """
    max_length = ''
    for i in text:
        if len(i) > len(max_length):
            max_length = i
    return max_length

def find_new_word(words, word_type):
    """
    Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
    and returned
    Args:
        words (dict): A json result set as dict
        word_type (str): The specific word type - this is actually needed as the key in the json result set dict
    Raises:
        API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
        non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
    Returns:
        str: New word
    """
    word_categories = ["sim", "syn"]
    word_list = words.get(word_type, "")
    for tag in (x for x in word_categories if x in word_list):
        new_word = find_max_len(word_list)
        return new_word
    raise API.requests.exceptions.HTTPError

def run(text):
    """
    Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
    of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
    First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
    calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
    is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
    and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
    and the unchanged word is added to the result array.
    If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
    and not spamming the server for the time being.
    Args:
        baseurl (str): URL to instantiate the API object
        text (str): String to replace the words from
    Returns:
        Result string if no ValueError has been found, error message if otherwise
    """
    baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886/{}/json"
    if len(text) <= 500:
        try:
            compare = pos_tag(text.split())
            result = []
            for word, tag in compare:
                if check_standard_word(tag):
                    result.append(word)
                else:
                    url_word = sanitize_for_url(word)
                    if not url_word: continue
                    url = baseurl.format(url_word)
                    try:
                        new_word = find_new_word(fetch_words(url), determine_word_type(tag))
                        match = re.match('[.,-?!()]', word[-1])
                        if match:
                            result.append(new_word + match.group()) # only copies over the last character plus the new word
                        else:
                            result.append(new_word)
                    except API.requests.exceptions.HTTPError:
                        result.append(word) # old, unchanged word
                        continue
            return remove_escapes(' '.join(result))
        except ValueError:
            Path("/var/www/.inactive").touch()
            return "Try again later. API processing limit reached."
    else: return "The text you are typing is too long to process. Sorry."

def check_standard_word(tag):
    """
    Checks if the values from the compare tuple are found in the exclude array
    Args:
        tag (str): Tag from nltk.pos_tag(arr[str]) function
    Returns:
        bool: If found in the array return True, False if otherwise
    """
    exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]

    if tag in exclude: return True
    else: return False

def omitted_words(words):
    """
    Checks if new selected word is a composition of multiple words which might include
    nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
    actually has spaces
    Args:
        words(str): Sequence of words with spaces
    Returns:
        str: The word either unchanged or with the substitution of the grammatical words
    """
    if re.match('\w+\s', words):
        compare = pos_tag(splice_words(clean_word(words)))
        for word, tag in compare:
            if check_standard(tag):
                print word
                words = words.replace(word, '')
    return words

def determine_word_type(tag):
    """
    Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
    Each word in the array is marked with a special tag which can be used to find the correct type of a word.
    A selection is given in the arrays.
    Args:
        compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
    Returns:
        str: Word type as a string
    """
    noun = ["NN", "NNS", "NNPS", "FW"]
    adjective = ["JJ", "JJR", "JJS"]
    verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
    adverb = ["RB", "RBR"]

    if tag in noun: return "noun"
    elif tag in adjective: return "adjective"
    elif tag in verb: return "verb"
    elif tag in adverb: return "adverb"
    else: return "noun"

inactive_switch = Path("/var/www/.inactive")
if inactive_switch.is_file():
    print "Try again later. API processing limit reached."
    sys.exit()
if len(sys.argv) > 1: print run(sys.argv[1])
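
As a point of comparison for the 'pythonic' question, here is a minimal sketch of how find_max_len and check_standard_word might be condensed with built-ins. It assumes Python 3 (the code above is Python 2), and the names longest_word, EXCLUDED_TAGS and is_standard_tag are hypothetical stand-ins, not part of the module above.

```python
# Hypothetical Python 3 variants of find_max_len and check_standard_word.
# The tag values are copied from the exclude list in check_standard_word.
EXCLUDED_TAGS = frozenset([
    "MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX",
    "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO",
])


def longest_word(words):
    """Return the longest string in words, or '' if words is empty."""
    # max() keeps the first of several equally long strings, which matches
    # the strict '>' comparison in the original loop.
    return max(words, key=len, default="")


def is_standard_tag(tag):
    """Return True if the POS tag belongs to the excluded set."""
    # A membership test already yields a bool, so no if/else is needed.
    return tag in EXCLUDED_TAGS
```

The frozenset is built once at import time instead of on every call, and membership tests on it do not scan the whole list.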


API.py



import requests
import argparse

"""
This module is a library for a typical API application
There are different variables to set
"""

class API(object):
    __xrequest = ''
    __api_key = ''
    params = {}
    def __init__(self, url, xrequest, api_key, **params):
        """
        Init function of the API class
        Args:
            url (str): URL for the API to call
            xrequest (bool): Switch if x-request is needed
            api_key (str): API-key as a string
            **params (dict): More parameters for the class to parse in the URL
        Returns:
            API.object: Instance of the API class
        """

        parser = argparse.ArgumentParser(description='API library that works with requests')
        parser.add_argument('text', nargs='*')
        args = parser.parse_args()

        self.url = url
        self.__xrequest = xrequest
        self.__api_key = api_key
        self.params = params

    def find_error(self, request):
        """
        Find-error function that is used to check the json return dict for any error messages
        Args:
            request (request instance): Instance of the request class
        Returns:
            bool: True for success, False otherwise
        """
        if 'message' or 'error' in request:
            return True
        else:
            return False

    def getr(self):
        """
        Get request function to build a URL and instantiate a request object with a json result set
        Returns:
            dict: content of the json-page decoded with the requests.object.json() function
        """
        if len(self.params) > 0:
            for key, value in self.params.iteritems():
                self.url += '?' + key + '=' + value
        if self.__xrequest == True:
            self.__xrequest = {'x-api-key': ''}
            self.__xrequest['x-api-key'] = self.__api_key
            r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
            r.raise_for_status()
            if r.status_code == 303: raise requests.exceptions.HTTPError
            else: return r.json()
        else:
            r = requests.get(self.url)# ,allow_redirects=False)
            self.find_status(r, 500)
            r.raise_for_status()
            #if r.status_code == 303: raise requests.exceptions.HTTPError
            return r.json()

    def find_status(self, request, status):
        """
        Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
        Args:
            request (requests object): Requests object
            status (int): Desired status to raise an exception for
        Raises:
            ValueError
        """
        if request.status_code == status:
            raise ValueError
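
Two details in the class above stand out: the loop in getr() puts a '?' in front of every parameter, and the condition in find_error() is always true because the non-empty string 'message' is truthy. Below is a minimal sketch of one way to handle both, assuming Python 3 and only the standard library. The names build_url and has_error are hypothetical and not part of the original class.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as an encoded query string."""
    if not params:
        return base_url
    # urlencode joins the pairs with '&' and escapes unsafe characters.
    return base_url + "?" + urlencode(params)


def has_error(payload):
    """Return True if the decoded JSON dict carries an error indicator."""
    # Each key is tested on its own; "'message' or 'error' in payload"
    # would short-circuit on the truthy string 'message'.
    return any(key in payload for key in ("message", "error"))
```

When the request goes through requests anyway, passing the dict as requests.get(url, params=...) produces the same kind of encoded query string without manual concatenation.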


The repository can be found on GitHub.



Thank you for your help.







share|improve this question



























    up vote
    3
    down vote

    favorite












    This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.



    I'd like to know from you:



    • Can my code be called 'pythonic'?

      • I've tried to get rid of calling arrays via index variables as much as possible

      • I've also tried to use Python specific functions rather than to build new, redundant ones


    • Is my exception logic too confusing? - If so, how to improve on that?

    • Is there any way to make my code more efficient?

    • Is my documenting style helpful?

    word_replacer.py



    import API
    import re
    from nltk import pos_tag
    import sys
    from pathlib2 import Path

    def sanitize_for_url(word):
    """
    Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
    substituted by an empty set
    Args:
    word (str): Word to sanitize
    Returns:
    str: Sanitized string
    """
    return re.sub('[^a-zA-Zs:]', '', word)


    def remove_escapes(word):
    """
    Removes escape backslashes that are created by various security mechanisms
    Args:
    word (str): Word to sanitize
    Returns:
    Sanitized string
    """
    return re.sub(r'\', '', word)


    def fetch_words(url):
    """
    Retrieving a json result set from the API module
    An API object is instantiated and a json result set is returned by calling
    the instance specific API.object.getr() function
    Args:
    url (str): URL string to instantiate the API object
    Returns:
    dict: JSON data as python dictionary
    """
    api = API.API(url, False, '')
    return api.getr()

    def find_max_len(text):
    """
    A linear search of the maximum length of a particular string
    Every string in the array is looked up by its length and consequently compared
    The string with the biggest length is then returned
    Args:
    text (arr[str]): array of strings that are compared
    Returns:
    str: Word with the biggest length
    """
    max_length = ''
    for i in text:
    if len(i) > len(max_length):
    max_length = i
    return max_length

    def find_new_word(words, word_type):
    """
    Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
    and returned
    Args:
    words (dict): A json result set as dict
    word_type (str): The specific word type - this is actually needed as the key in the json result set dict
    Raises:
    API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
    non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
    Returns:
    str: New word
    """
    word_categories = ["sim", "syn"]
    word_list = words.get(word_type, "")
    for tag in (x for x in word_categories if x in word_list):
    new_word = find_max_len(word_list)
    return new_word
    raise API.requests.exceptions.HTTPError

    def run(text):
    """
    Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
    of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
    First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
    calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
    is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
    and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
    and the unchanged word is added to the result array.
    If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
    and not spamming the server for the time being.
    Args:
    baseurl (str): URL to instantiate the API object
    text (str): String to replace the words from
    Returns:
    Result string if no ValueError has been found, error message if otherwise
    """
    baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886//json"
    if len(text) <= 500:
    try:
    compare = pos_tag(text.split())
    result =
    for word, tag in compare:
    if check_standard_word(tag):
    result.append(word)
    else:
    url_word = sanitize_for_url(word)
    if not url_word: continue
    url = baseurl.format(url_word)
    try:
    new_word = find_new_word(fetch_words(url), determine_word_type(tag))
    match = re.match('[.,-?!()]', word[-1])
    if match:
    result.append(new_word + match.group()) # only copies over the last character plus the new word
    else:
    result.append(new_word)
    except API.requests.exceptions.HTTPError:
    result.append(word) # old, unchanged word
    continue
    return remove_escapes(' '.join(result))
    except ValueError:
    Path("/var/www/.inactive").touch()
    return "Try again later. API processing limit reached."
    else: return "The text you are typing is too long to process. Sorry."

    def check_standard_word(tag):
    """
    Checks if the values from the compare tuple are found in the exclude array
    Args:
    tag (str): Tag from nltk.pos_tag(arr[str]) function
    Returns:
    bool: If found in the array return True, False if otherwise
    """
    exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]

    if tag in exclude: return True
    else: return False

    def omitted_words(words):
    """
    Checks if new selected word is a composition of multiple words which might include
    nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
    actually has spaces
    Args:
    words(str): Sequence of words with spaces
    Returns:
    str: The word either unchanged or with the substitution of the grammatical words
    """
    if re.match('w+s', words):
    compare = pos_tag(splice_words(clean_word(words)))
    for word, tag in compare:
    if check_standard(tag):
    print word
    words = words.replace(word, '')
    return words

    def determine_word_type(tag):
    """
    Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
    Each word in the array is marked with a special tag which can be used to find the correct type of a word.
    A selection is given in the arrays.
    Args:
    compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
    Returns:
    str: Word type as a string
    """
    noun = ["NN", "NNS", "NNPS", "FW"]
    adjective = ["JJ", "JJR", "JJS"]
    verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
    adverb = ["RB", "RBR"]

    if tag in noun: return "noun"
    elif tag in adjective: return "adjective"
    elif tag in verb: return "verb"
    elif tag in adverb: return "adverb"
    else: return "noun"

    inactive_switch = Path("/var/www/.inactive")
    if inactive_switch.is_file():
    print "Try again later. API processing limit reached."
    sys.exit()
    if len(sys.argv) > 1: print run(sys.argv[1])


    API.py



    import requests
    import argparse

    """
    This module is a library for a typical API application
    There are different variables to set
    """

    class API(object):
    __xrequest = ''
    __api_key = ''
    params =
    def __init__(self, url, xrequest, api_key, **params):
    """
    Init function of the API class
    Args:
    url (str): URL for the API to call
    xrequest (bool): Switch if x-request is needed
    api_key (str): API-key as a string
    **params (dict): More parameters for the class to parse in the URL
    Returns:
    API.object: Instance of the API class
    """

    parser = argparse.ArgumentParser(description='API library that works with requests')
    parser.add_argument('text', nargs='*')
    args = parser.parse_args()

    self.url = url
    self.__xrequest = xrequest
    self.__api_key = api_key
    self.params = params

    def find_error(self, request):
    """
    Find-error function that is used to check the json return dict for any error messages
    Args:
    request (request instance): Instance of the request class
    Returns:
    bool: True for success, False otherwise
    """
    if 'message' or 'error' in request:
    return True
    else:
    return False

    def getr(self):
    """
    Get request function to build a URL and instantiate a request object with a json result set
    Returns:
    dict: content of the json-page decoded with the requests.object.json() function
    """
    if len(self.params) > 0:
    for key, value in self.params.iteritems():
    self.url += '?' + key + '=' + value
    if self.__xrequest == True:
    self.__xrequest = 'x-api-key': ''
    self.__xrequest['x-api-key'] = self.__api_key
    r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
    r.raise_for_status()
    if r.status_code == 303: raise requests.exceptions.HTTPError
    else: return r.json()
    else:
    r = requests.get(self.url)# ,allow_redirects=False)
    self.find_status(r, 500)
    r.raise_for_status()
    #if r.status_code == 303: raise requests.exceptions.HTTPError
    return r.json()

    def find_status(self, request, status):
    """
    Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
    Args:
    request (requests object): Requests object
    status (int): Desired status to raise an exception for
    Raises:
    ValueError
    """
    if request.status_code == status:
    raise ValueError


    The repository can be found here on github.



    Thank you for your help.







    share|improve this question























      up vote
      3
      down vote

      favorite









      up vote
      3
      down vote

      favorite











      This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.



      I'd like to know from you:



      • Can my code be called 'pythonic'?

        • I've tried to get rid of calling arrays via index variables as much as possible

        • I've also tried to use Python specific functions rather than to build new, redundant ones


      • Is my exception logic too confusing? - If so, how to improve on that?

      • Is there any way to make my code more efficient?

      • Is my documenting style helpful?

      word_replacer.py



      import API
      import re
      from nltk import pos_tag
      import sys
      from pathlib2 import Path

      def sanitize_for_url(word):
      """
      Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
      substituted by an empty set
      Args:
      word (str): Word to sanitize
      Returns:
      str: Sanitized string
      """
      return re.sub('[^a-zA-Zs:]', '', word)


      def remove_escapes(word):
      """
      Removes escape backslashes that are created by various security mechanisms
      Args:
      word (str): Word to sanitize
      Returns:
      Sanitized string
      """
      return re.sub(r'\', '', word)


      def fetch_words(url):
      """
      Retrieving a json result set from the API module
      An API object is instantiated and a json result set is returned by calling
      the instance specific API.object.getr() function
      Args:
      url (str): URL string to instantiate the API object
      Returns:
      dict: JSON data as python dictionary
      """
      api = API.API(url, False, '')
      return api.getr()

      def find_max_len(text):
      """
      A linear search of the maximum length of a particular string
      Every string in the array is looked up by its length and consequently compared
      The string with the biggest length is then returned
      Args:
      text (arr[str]): array of strings that are compared
      Returns:
      str: Word with the biggest length
      """
      max_length = ''
      for i in text:
      if len(i) > len(max_length):
      max_length = i
      return max_length

      def find_new_word(words, word_type):
      """
      Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
      and returned
      Args:
      words (dict): A json result set as dict
      word_type (str): The specific word type - this is actually needed as the key in the json result set dict
      Raises:
      API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
      non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
      Returns:
      str: New word
      """
      word_categories = ["sim", "syn"]
      word_list = words.get(word_type, "")
      for tag in (x for x in word_categories if x in word_list):
      new_word = find_max_len(word_list)
      return new_word
      raise API.requests.exceptions.HTTPError

      def run(text):
      """
      Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
      of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
      First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
      calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
      is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
      and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
      and the unchanged word is added to the result array.
      If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
      and not spamming the server for the time being.
      Args:
      baseurl (str): URL to instantiate the API object
      text (str): String to replace the words from
      Returns:
      Result string if no ValueError has been found, error message if otherwise
      """
      baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886//json"
      if len(text) <= 500:
      try:
      compare = pos_tag(text.split())
      result =
      for word, tag in compare:
      if check_standard_word(tag):
      result.append(word)
      else:
      url_word = sanitize_for_url(word)
      if not url_word: continue
      url = baseurl.format(url_word)
      try:
      new_word = find_new_word(fetch_words(url), determine_word_type(tag))
      match = re.match('[.,-?!()]', word[-1])
      if match:
      result.append(new_word + match.group()) # only copies over the last character plus the new word
      else:
      result.append(new_word)
      except API.requests.exceptions.HTTPError:
      result.append(word) # old, unchanged word
      continue
      return remove_escapes(' '.join(result))
      except ValueError:
      Path("/var/www/.inactive").touch()
      return "Try again later. API processing limit reached."
      else: return "The text you are typing is too long to process. Sorry."

      def check_standard_word(tag):
      """
      Checks if the values from the compare tuple are found in the exclude array
      Args:
      tag (str): Tag from nltk.pos_tag(arr[str]) function
      Returns:
      bool: If found in the array return True, False if otherwise
      """
      exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]

      if tag in exclude: return True
      else: return False

      def omitted_words(words):
      """
      Checks if new selected word is a composition of multiple words which might include
      nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
      actually has spaces
      Args:
      words(str): Sequence of words with spaces
      Returns:
      str: The word either unchanged or with the substitution of the grammatical words
      """
      if re.match('w+s', words):
      compare = pos_tag(splice_words(clean_word(words)))
      for word, tag in compare:
      if check_standard(tag):
      print word
      words = words.replace(word, '')
      return words

      def determine_word_type(tag):
      """
      Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
      Each word in the array is marked with a special tag which can be used to find the correct type of a word.
      A selection is given in the arrays.
      Args:
      compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
      Returns:
      str: Word type as a string
      """
      noun = ["NN", "NNS", "NNPS", "FW"]
      adjective = ["JJ", "JJR", "JJS"]
      verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
      adverb = ["RB", "RBR"]

      if tag in noun: return "noun"
      elif tag in adjective: return "adjective"
      elif tag in verb: return "verb"
      elif tag in adverb: return "adverb"
      else: return "noun"

      inactive_switch = Path("/var/www/.inactive")
      if inactive_switch.is_file():
      print "Try again later. API processing limit reached."
      sys.exit()
      if len(sys.argv) > 1: print run(sys.argv[1])


      API.py



      import requests
      import argparse

      """
      This module is a library for a typical API application
      There are different variables to set
      """

      class API(object):
      __xrequest = ''
      __api_key = ''
      params =
      def __init__(self, url, xrequest, api_key, **params):
      """
      Init function of the API class
      Args:
      url (str): URL for the API to call
      xrequest (bool): Switch if x-request is needed
      api_key (str): API-key as a string
      **params (dict): More parameters for the class to parse in the URL
      Returns:
      API.object: Instance of the API class
      """

      parser = argparse.ArgumentParser(description='API library that works with requests')
      parser.add_argument('text', nargs='*')
      args = parser.parse_args()

      self.url = url
      self.__xrequest = xrequest
      self.__api_key = api_key
      self.params = params

      def find_error(self, request):
      """
      Find-error function that is used to check the json return dict for any error messages
      Args:
      request (request instance): Instance of the request class
      Returns:
      bool: True for success, False otherwise
      """
      if 'message' or 'error' in request:
      return True
      else:
      return False

      def getr(self):
      """
      Get request function to build a URL and instantiate a request object with a json result set
      Returns:
      dict: content of the json-page decoded with the requests.object.json() function
      """
      if len(self.params) > 0:
      for key, value in self.params.iteritems():
      self.url += '?' + key + '=' + value
      if self.__xrequest == True:
      self.__xrequest = 'x-api-key': ''
      self.__xrequest['x-api-key'] = self.__api_key
      r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
      r.raise_for_status()
      if r.status_code == 303: raise requests.exceptions.HTTPError
      else: return r.json()
      else:
      r = requests.get(self.url)# ,allow_redirects=False)
      self.find_status(r, 500)
      r.raise_for_status()
      #if r.status_code == 303: raise requests.exceptions.HTTPError
      return r.json()

      def find_status(self, request, status):
      """
      Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
      Args:
      request (requests object): Requests object
      status (int): Desired status to raise an exception for
      Raises:
      ValueError
      """
      if request.status_code == status:
      raise ValueError


      The repository can be found here on github.



      Thank you for your help.







      share|improve this question













      This is one of my first finished programs that I've written to date. I am not yet very fond with Python, so please bare with me.



      I'd like to know from you:



      • Can my code be called 'pythonic'?

        • I've tried to get rid of calling arrays via index variables as much as possible

        • I've also tried to use Python specific functions rather than to build new, redundant ones


      • Is my exception logic too confusing? - If so, how to improve on that?

      • Is there any way to make my code more efficient?

      • Is my documenting style helpful?

      word_replacer.py



      import API
      import re
      from nltk import pos_tag
      import sys
      from pathlib2 import Path

      def sanitize_for_url(word):
      """
      Sanitizing of a word with a regex search string - everything that is not alphanumeric, a space or a colon is
      substituted by an empty set
      Args:
      word (str): Word to sanitize
      Returns:
      str: Sanitized string
      """
      return re.sub('[^a-zA-Zs:]', '', word)


      def remove_escapes(word):
      """
      Removes escape backslashes that are created by various security mechanisms
      Args:
      word (str): Word to sanitize
      Returns:
      Sanitized string
      """
      return re.sub(r'\', '', word)


      def fetch_words(url):
      """
      Retrieving a json result set from the API module
      An API object is instantiated and a json result set is returned by calling
      the instance specific API.object.getr() function
      Args:
      url (str): URL string to instantiate the API object
      Returns:
      dict: JSON data as python dictionary
      """
      api = API.API(url, False, '')
      return api.getr()

      def find_max_len(text):
      """
      A linear search of the maximum length of a particular string
      Every string in the array is looked up by its length and consequently compared
      The string with the biggest length is then returned
      Args:
      text (arr[str]): array of strings that are compared
      Returns:
      str: Word with the biggest length
      """
      max_length = ''
      for i in text:
      if len(i) > len(max_length):
      max_length = i
      return max_length

      def find_new_word(words, word_type):
          """
          Checks if the word type is found in the words dict. If so the word with the biggest length is chosen
          and returned
          Args:
              words (dict): A json result set as dict
              word_type (str): The specific word type - this is actually needed as the key in the json result set dict
          Raises:
              API.requests.exceptions.HTTPError: If the key is not found in the dict (and therefore the word type is
              non-existent) - a requests.exceptions.HTTPError is raised for easier logic in the run function
          Returns:
              str: New word
          """
          word_categories = ["sim", "syn"]
          word_list = words.get(word_type, "")
          for tag in (x for x in word_categories if x in word_list):
              new_word = find_max_len(word_list)
              return new_word
          raise API.requests.exceptions.HTTPError

      def run(text):
          """
          Main function that brings everything together - the first part of the URL is used as a parameter for the instantiation
          of the API object. The string (that may be multiple sentences) is then replaced by calling other functions.
          First the string is assigned to an array of strings calling splice_words(str). Then a tuple is assigned by
          calling NLTK.pos_tag(arr[str]). A loop to the length of the text array is then started - checking if the particular word
          is a word in the standard list - check_standard(tuple[str, str]). If not, the sanitization method clean_word[str] is called
          and the URL build. The new word is then appended to the result array. If an exception was raised, all operations are skipped
          and the unchanged word is added to the result array.
          If the API comes to a halt (due to processing limits of the API key), an empty file is set to ensure stopping
          and not spamming the server for the time being.
          Args:
              baseurl (str): URL to instantiate the API object
              text (str): String to replace the words from
          Returns:
              Result string if no ValueError has been found, error message if otherwise
          """
          baseurl = "http://words.bighugelabs.com/api/2/0311fc4c609183416bf8bae6780fb886/{}/json"
          if len(text) <= 500:
              try:
                  compare = pos_tag(text.split())
                  result = []
                  for word, tag in compare:
                      if check_standard_word(tag):
                          result.append(word)
                      else:
                          url_word = sanitize_for_url(word)
                          if not url_word: continue
                          url = baseurl.format(url_word)
                          try:
                              new_word = find_new_word(fetch_words(url), determine_word_type(tag))
                              match = re.match('[.,-?!()]', word[-1])
                              if match:
                                  result.append(new_word + match.group()) # only copies over the last character plus the new word
                              else:
                                  result.append(new_word)
                          except API.requests.exceptions.HTTPError:
                              result.append(word) # old, unchanged word
                              continue
                  return remove_escapes(' '.join(result))
              except ValueError:
                  Path("/var/www/.inactive").touch()
                  return "Try again later. API processing limit reached."
          else: return "The text you are typing is too long to process. Sorry."

      def check_standard_word(tag):
          """
          Checks if the values from the compare tuple are found in the exclude array
          Args:
              tag (str): Tag from nltk.pos_tag(arr[str]) function
          Returns:
              bool: If found in the array return True, False if otherwise
          """
          exclude = ["MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX", "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO"]

          if tag in exclude: return True
          else: return False

      def omitted_words(words):
          """
          Checks if new selected word is a composition of multiple words which might include
          nonsensical grammatical words which are substituted by an empty set. First regex check is to ensure the new word
          actually has spaces
          Args:
              words(str): Sequence of words with spaces
          Returns:
              str: The word either unchanged or with the substitution of the grammatical words
          """
          if re.match('\w+\s', words):
              compare = pos_tag(splice_words(clean_word(words)))
              for word, tag in compare:
                  if check_standard(tag):
                      print word
                      words = words.replace(word, '')
          return words

      def determine_word_type(tag):
          """
          Determines the word type by checking the tuple created by the nltk.pos_tag(arr[str]) function.
          Each word in the array is marked with a special tag which can be used to find the correct type of a word.
          A selection is given in the arrays.
          Args:
              compare (tuple[str]): Tuple of strings - the word is in the first row, the tag in the second
          Returns:
              str: Word type as a string
          """
          noun = ["NN", "NNS", "NNPS", "FW"]
          adjective = ["JJ", "JJR", "JJS"]
          verb = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
          adverb = ["RB", "RBR"]

          if tag in noun: return "noun"
          elif tag in adjective: return "adjective"
          elif tag in verb: return "verb"
          elif tag in adverb: return "adverb"
          else: return "noun"

      inactive_switch = Path("/var/www/.inactive")
      if inactive_switch.is_file():
          print "Try again later. API processing limit reached."
          sys.exit()
      if len(sys.argv) > 1: print run(sys.argv[1])


      API.py



      import requests
      import argparse

      """
      This module is a library for a typical API application
      There are different variables to set
      """

      class API(object):
          __xrequest = ''
          __api_key = ''
          params = {}
          def __init__(self, url, xrequest, api_key, **params):
              """
              Init function of the API class
              Args:
                  url (str): URL for the API to call
                  xrequest (bool): Switch if x-request is needed
                  api_key (str): API-key as a string
                  **params (dict): More parameters for the class to parse in the URL
              Returns:
                  API.object: Instance of the API class
              """

              parser = argparse.ArgumentParser(description='API library that works with requests')
              parser.add_argument('text', nargs='*')
              args = parser.parse_args()

              self.url = url
              self.__xrequest = xrequest
              self.__api_key = api_key
              self.params = params

          def find_error(self, request):
              """
              Find-error function that is used to check the json return dict for any error messages
              Args:
                  request (request instance): Instance of the request class
              Returns:
                  bool: True for success, False otherwise
              """
              if 'message' or 'error' in request:
                  return True
              else:
                  return False

          def getr(self):
              """
              Get request function to build a URL and instantiate a request object with a json result set
              Returns:
                  dict: content of the json-page decoded with the requests.object.json() function
              """
              if len(self.params) > 0:
                  for key, value in self.params.iteritems():
                      self.url += '?' + key + '=' + value
              if self.__xrequest == True:
                  self.__xrequest = {'x-api-key': ''}
                  self.__xrequest['x-api-key'] = self.__api_key
                  r = requests.get(self.url, headers=self.__xrequest, allow_redirects=False)
                  r.raise_for_status()
                  if r.status_code == 303: raise requests.exceptions.HTTPError
                  else: return r.json()
              else:
                  r = requests.get(self.url)# ,allow_redirects=False)
                  self.find_status(r, 500)
                  r.raise_for_status()
                  #if r.status_code == 303: raise requests.exceptions.HTTPError
                  return r.json()

          def find_status(self, request, status):
              """
              Find status function that checks for a certain status in the requests.object.status_code int and raise a ValueError accordingly
              Args:
                  request (requests object): Requests object
                  status (int): Desired status to raise an exception for
              Raises:
                  ValueError
              """
              if request.status_code == status:
                  raise ValueError


      The repository can be found here on github.



      Thank you for your help.

















      edited Jan 8 at 17:09









      Sam Onela










      asked Jan 8 at 16:25









      Leo





















          1 Answer

















          up vote
          2
          down vote



          accepted










          Your code looks nice.



          Here are a few details:



          In find_max_len



          The name max_length suggests a positive integer value corresponding to a length. It is actually used for a string, which may be slightly confusing.



          At every iteration, you compute the length of two strings, which is probably more than an optimal strategy requires.



          You are lucky because the problem you are trying to solve has a generic solution: max. In your case, it gives return max(text, key=len, default='') (I've kept '' as the default value because it matches the current behavior, but raising an exception may be a more desirable way to handle an empty list).
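
A minimal sketch of that suggestion, assuming Python 3.4 or later: the default keyword of max is not available in Python 2, which the posted code targets (it uses print statements and iteritems). The function name find_longest_word is hypothetical and only used for illustration.

```python
def find_longest_word(words):
    """Return the longest string in words, or '' if words is empty."""
    # max() walks the iterable once and calls len() once per element.
    # On ties it returns the first longest element, which matches the
    # strict '>' comparison used in the original loop.
    return max(words, key=len, default='')


print(find_longest_word(["big", "enormous", "huge"]))
print(repr(find_longest_word([])))
```

On Python 2, the same effect could be had by checking for an empty sequence before calling max(words, key=len).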



          In sanitize_for_url



          The docstring says "alphanumeric" but the regexp does not include digits. Also, if your point is just to make a URL from a string, you may find a better option in the urllib.parse module.
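
As a rough illustration of the urllib.parse route, assuming Python 3 (in Python 2 the same function lives at urllib.quote): quote percent-encodes characters instead of dropping them, so it is not a drop-in replacement for sanitize_for_url. The helper name word_to_url_segment is hypothetical.

```python
from urllib.parse import quote


def word_to_url_segment(word):
    """Percent-encode a word so it can be used as one URL path segment."""
    # safe='' also encodes '/', which quote() leaves untouched by default.
    return quote(word, safe='')


print(word_to_url_segment("ice cream"))  # ice%20cream
print(word_to_url_segment("a/b"))        # a%2Fb
```

Whether encoding or stripping is the right choice depends on what the thesaurus API accepts, which the question does not say.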



          In check_standard_word



          You could write: return tag in exclude.
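
A minimal sketch of this simplification. The tags are copied from the question; the constant name EXCLUDED_TAGS and the use of a frozenset are assumptions made for the example, not part of the original code.

```python
# Tags copied as written in the question. Note that "$PRP" is kept verbatim;
# the Penn Treebank tag for possessive pronouns is spelled "PRP$".
EXCLUDED_TAGS = frozenset([
    "MD", "DT", "PRP", "$PRP", "IN", "CC", "CD", "EX",
    "NNP", "NNPS", "POS", "PDT", "RP", "WDT", "SYM", "TO",
])


def check_standard_word(tag):
    """Return True if tag is one of the excluded part-of-speech tags."""
    # The 'in' expression already evaluates to a bool, so no if/else is needed.
    return tag in EXCLUDED_TAGS


print(check_standard_word("DT"))
print(check_standard_word("NN"))
```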



          In determine_word_type



          Instead of using lists, you could use sets, a data type better suited to what you are trying to achieve (membership tests).



          Also, you may want to replace the code with a dictionary structure:



          def determine_word_type(tag):
              types = {
                  'adjective': {"JJ", "JJR", "JJS"},
                  'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
                  'adverb': {"RB", "RBR"},
                  'noun': {"NN", "NNS", "NNPS", "FW"},
              }
              for type_, set_ in types.iteritems():
                  if tag in set_:
                      return type_
              return 'noun'


          Also, if you want to make lookup faster, you could build a dictionary from the initial dict, mapping tags to their types:



          def determine_word_type(tag):
              types = {
                  'adjective': {"JJ", "JJR", "JJS"},
                  'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
                  'adverb': {"RB", "RBR"},
                  'noun': {"NN", "NNS", "NNPS", "FW"},
              }
              types2 = dict()
              for type_, set_ in types.iteritems():
                  for e in set_:
                      assert e not in types2
                      types2[e] = type_
              return types2.get(tag, 'noun')


          (You'd need to move the dict-building part out of the function so that it is performed only once.)
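
One way this could look once the dict building is moved to module level. This is a sketch only: the names WORD_TYPES and TAG_TO_TYPE are hypothetical, and the dict comprehension replaces the explicit loop, so the duplicate-tag assert from the snippet above is dropped.

```python
# Part-of-speech tags grouped by word type, same data as in the review.
WORD_TYPES = {
    'adjective': {"JJ", "JJR", "JJS"},
    'verb': {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    'adverb': {"RB", "RBR"},
    'noun': {"NN", "NNS", "NNPS", "FW"},
}

# Inverted once, when the module is imported: tag -> word type.
TAG_TO_TYPE = {
    tag: type_
    for type_, tags in WORD_TYPES.items()
    for tag in tags
}


def determine_word_type(tag):
    """Return the word type for a tag, falling back to 'noun'."""
    return TAG_TO_TYPE.get(tag, 'noun')


print(determine_word_type("VBD"))
print(determine_word_type("UNKNOWN"))
```

items() is used instead of iteritems() so the sketch runs on both Python 2.7 and Python 3.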












































                edited Jan 8 at 22:06


























                answered Jan 8 at 17:46









                Josay























                     













































































