Models for analysing Twitter data (*which based on tweepy package)

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
1
down vote

favorite
2












I have three modules here models.py, functionals.py, and adjustment.py. These three may be used to perform analysis of Twitter data through tweepy objects.



The Tweets class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors class.



The Authors class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot function).



Here is an example usage :



import numpy
from models import Authors
from models import Tweets

import matplotlib.pyplot as plt

stats = numpy.load('testfile.npy')
tweets = Tweets(stats)

Model = Authors(tweets)

fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))

fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
plt.show()


The example above plots the barplots, with bar colors representing the incolor_measurement variable.



The testfile.npy is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy, collected a few days ago.



My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?




The modules :



adjustment.py:



import copy

class Adjustment(object):

def __init__(self, title = "an adjustment.", **kwargs):

self.title = title

if 'freq_lim' in kwargs:
self.freq_lim = kwargs['freq_lim']
else:
self.freq_lim = None

if 'char_lim' in kwargs:
self.char_lim = kwargs['char_lim']
else:
self.char_lim = None

if 'exclude' in kwargs:
self.exclude = kwargs['exclude']
else:
self.exclude = None


functionals.py:



def filter_unique(tweet_stats_list, output = 'status'):

stats = tweet_stats_list
unique =

for tweet in stats:

try:
if not (tweet.retweeted_status in unique):
if output == 'status':
unique.append(tweet.retweeted_status)
elif output == 'text':
unique.append(tweet.retweeted_status.text)

except:
if not (tweet in unique):
if output == 'status':
unique.append(tweet)
elif output == 'text':
unique.append(tweet.text)

return unique

def split_texts(texts, adjustment = None):
split_list =
for text in texts:
split = text.split()
split_list.extend(split)
return split_list


models.py:



import copy
import numpy
import matplotlib.pyplot as plt
from matplotlib import colors as mplcolors
from . import adjustment as adjust
from . import functionals as func


class Tweets(list):

"""

Tweets model.

"""

def __init__(self, *args, filter_by_unique = False, **kwargs):

if filter_by_unique:
tweets = func.filter_unique(args[0])
else:
tweets = args[0]

list.__init__(self, tweets, **kwargs)

@property
def sorted_by_time(self):
return sorted(self, key = lambda x: x.created_at)

@property
def oldest(self):
return self.sorted_by_time[0]

@property
def newest(self):
return self.sorted_by_time[-1]


class Authors(object):

"""

Authors model.

"""

def __init__(self, tweets):

self.tweets = tweets
self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.followers_count = author: self.authors[author].followers_count for author in self.authors
self.following_count = author: self.authors[author].friends_count for author in self.authors
self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author

def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):

measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])

if type(color) == str:
color = mplcolors.hex2color(mplcolors.cnames[color])

colors = len(self.authors)*[color]
if incolor_measurement != None:
minor_max = max(measurements[incolor_measurement].values())
transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
colors = [(color[0], color[1], color[2], trans) for trans in transparency]

var = [i+height for i in range(len(self.authors))]
ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
ax.set_yticks(var)
ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)

if incolor_measurement != None:
ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
else :
ax.set_xlabel(measurement)
plt.tight_layout()




Result of example usage :



enter image description here



enter image description here







share|improve this question



























    up vote
    1
    down vote

    favorite
    2












    I have three modules here models.py, functionals.py, and adjustment.py. These three may be used to perform analysis of Twitter data through tweepy objects.



    The Tweets class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors class.



    The Authors class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot function).



    Here is an example usage :



    import numpy
    from models import Authors
    from models import Tweets

    import matplotlib.pyplot as plt

    stats = numpy.load('testfile.npy')
    tweets = Tweets(stats)

    Model = Authors(tweets)

    fig, ax = plt.subplots(1, 1)
    Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))

    fig, ax = plt.subplots(1, 1)
    Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
    plt.show()


    The example above plots the barplots, with bar colors representing the incolor_measurement variable.



    The testfile.npy is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy, collected a few days ago.



    My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?




    The modules :



    adjustment.py:



    import copy

    class Adjustment(object):

    def __init__(self, title = "an adjustment.", **kwargs):

    self.title = title

    if 'freq_lim' in kwargs:
    self.freq_lim = kwargs['freq_lim']
    else:
    self.freq_lim = None

    if 'char_lim' in kwargs:
    self.char_lim = kwargs['char_lim']
    else:
    self.char_lim = None

    if 'exclude' in kwargs:
    self.exclude = kwargs['exclude']
    else:
    self.exclude = None


    functionals.py:



    def filter_unique(tweet_stats_list, output = 'status'):

    stats = tweet_stats_list
    unique =

    for tweet in stats:

    try:
    if not (tweet.retweeted_status in unique):
    if output == 'status':
    unique.append(tweet.retweeted_status)
    elif output == 'text':
    unique.append(tweet.retweeted_status.text)

    except:
    if not (tweet in unique):
    if output == 'status':
    unique.append(tweet)
    elif output == 'text':
    unique.append(tweet.text)

    return unique

    def split_texts(texts, adjustment = None):
    split_list =
    for text in texts:
    split = text.split()
    split_list.extend(split)
    return split_list


    models.py:



    import copy
    import numpy
    import matplotlib.pyplot as plt
    from matplotlib import colors as mplcolors
    from . import adjustment as adjust
    from . import functionals as func


    class Tweets(list):

    """

    Tweets model.

    """

    def __init__(self, *args, filter_by_unique = False, **kwargs):

    if filter_by_unique:
    tweets = func.filter_unique(args[0])
    else:
    tweets = args[0]

    list.__init__(self, tweets, **kwargs)

    @property
    def sorted_by_time(self):
    return sorted(self, key = lambda x: x.created_at)

    @property
    def oldest(self):
    return self.sorted_by_time[0]

    @property
    def newest(self):
    return self.sorted_by_time[-1]


    class Authors(object):

    """

    Authors model.

    """

    def __init__(self, tweets):

    self.tweets = tweets
    self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
    self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
    self.followers_count = author: self.authors[author].followers_count for author in self.authors
    self.following_count = author: self.authors[author].friends_count for author in self.authors
    self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
    self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
    self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author

    def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):

    measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
    sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])

    if type(color) == str:
    color = mplcolors.hex2color(mplcolors.cnames[color])

    colors = len(self.authors)*[color]
    if incolor_measurement != None:
    minor_max = max(measurements[incolor_measurement].values())
    transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
    colors = [(color[0], color[1], color[2], trans) for trans in transparency]

    var = [i+height for i in range(len(self.authors))]
    ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
    ax.set_yticks(var)
    ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)

    if incolor_measurement != None:
    ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
    else :
    ax.set_xlabel(measurement)
    plt.tight_layout()




    Result of example usage :



    enter image description here



    enter image description here







    share|improve this question























      up vote
      1
      down vote

      favorite
      2









      up vote
      1
      down vote

      favorite
      2






      2





      I have three modules here models.py, functionals.py, and adjustment.py. These three may be used to perform analysis of Twitter data through tweepy objects.



      The Tweets class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors class.



      The Authors class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot function).



      Here is an example usage :



      import numpy
      from models import Authors
      from models import Tweets

      import matplotlib.pyplot as plt

      stats = numpy.load('testfile.npy')
      tweets = Tweets(stats)

      Model = Authors(tweets)

      fig, ax = plt.subplots(1, 1)
      Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))

      fig, ax = plt.subplots(1, 1)
      Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
      plt.show()


      The example above plots the barplots, with bar colors representing the incolor_measurement variable.



      The testfile.npy is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy, collected a few days ago.



      My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?




      The modules :



      adjustment.py:



      import copy

      class Adjustment(object):

      def __init__(self, title = "an adjustment.", **kwargs):

      self.title = title

      if 'freq_lim' in kwargs:
      self.freq_lim = kwargs['freq_lim']
      else:
      self.freq_lim = None

      if 'char_lim' in kwargs:
      self.char_lim = kwargs['char_lim']
      else:
      self.char_lim = None

      if 'exclude' in kwargs:
      self.exclude = kwargs['exclude']
      else:
      self.exclude = None


      functionals.py:



      def filter_unique(tweet_stats_list, output = 'status'):

      stats = tweet_stats_list
      unique =

      for tweet in stats:

      try:
      if not (tweet.retweeted_status in unique):
      if output == 'status':
      unique.append(tweet.retweeted_status)
      elif output == 'text':
      unique.append(tweet.retweeted_status.text)

      except:
      if not (tweet in unique):
      if output == 'status':
      unique.append(tweet)
      elif output == 'text':
      unique.append(tweet.text)

      return unique

      def split_texts(texts, adjustment = None):
      split_list =
      for text in texts:
      split = text.split()
      split_list.extend(split)
      return split_list


      models.py:



      import copy
      import numpy
      import matplotlib.pyplot as plt
      from matplotlib import colors as mplcolors
      from . import adjustment as adjust
      from . import functionals as func


      class Tweets(list):

      """

      Tweets model.

      """

      def __init__(self, *args, filter_by_unique = False, **kwargs):

      if filter_by_unique:
      tweets = func.filter_unique(args[0])
      else:
      tweets = args[0]

      list.__init__(self, tweets, **kwargs)

      @property
      def sorted_by_time(self):
      return sorted(self, key = lambda x: x.created_at)

      @property
      def oldest(self):
      return self.sorted_by_time[0]

      @property
      def newest(self):
      return self.sorted_by_time[-1]


      class Authors(object):

      """

      Authors model.

      """

      def __init__(self, tweets):

      self.tweets = tweets
      self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
      self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
      self.followers_count = author: self.authors[author].followers_count for author in self.authors
      self.following_count = author: self.authors[author].friends_count for author in self.authors
      self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
      self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
      self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author

      def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):

      measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
      sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])

      if type(color) == str:
      color = mplcolors.hex2color(mplcolors.cnames[color])

      colors = len(self.authors)*[color]
      if incolor_measurement != None:
      minor_max = max(measurements[incolor_measurement].values())
      transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
      colors = [(color[0], color[1], color[2], trans) for trans in transparency]

      var = [i+height for i in range(len(self.authors))]
      ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
      ax.set_yticks(var)
      ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)

      if incolor_measurement != None:
      ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
      else :
      ax.set_xlabel(measurement)
      plt.tight_layout()




      Result of example usage :



      enter image description here



      enter image description here







      share|improve this question













      I have three modules here models.py, functionals.py, and adjustment.py. These three may be used to perform analysis of Twitter data through tweepy objects.



      The Tweets class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors class.



      The Authors class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot function).



      Here is an example usage :



      import numpy
      from models import Authors
      from models import Tweets

      import matplotlib.pyplot as plt

      stats = numpy.load('testfile.npy')
      tweets = Tweets(stats)

      Model = Authors(tweets)

      fig, ax = plt.subplots(1, 1)
      Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))

      fig, ax = plt.subplots(1, 1)
      Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
      plt.show()


      The example above plots the barplots, with bar colors representing the incolor_measurement variable.



      The testfile.npy is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy, collected a few days ago.



      My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?




      The modules :



      adjustment.py:



      import copy

      class Adjustment(object):

      def __init__(self, title = "an adjustment.", **kwargs):

      self.title = title

      if 'freq_lim' in kwargs:
      self.freq_lim = kwargs['freq_lim']
      else:
      self.freq_lim = None

      if 'char_lim' in kwargs:
      self.char_lim = kwargs['char_lim']
      else:
      self.char_lim = None

      if 'exclude' in kwargs:
      self.exclude = kwargs['exclude']
      else:
      self.exclude = None


      functionals.py:



      def filter_unique(tweet_stats_list, output = 'status'):

      stats = tweet_stats_list
      unique =

      for tweet in stats:

      try:
      if not (tweet.retweeted_status in unique):
      if output == 'status':
      unique.append(tweet.retweeted_status)
      elif output == 'text':
      unique.append(tweet.retweeted_status.text)

      except:
      if not (tweet in unique):
      if output == 'status':
      unique.append(tweet)
      elif output == 'text':
      unique.append(tweet.text)

      return unique

      def split_texts(texts, adjustment = None):
      split_list =
      for text in texts:
      split = text.split()
      split_list.extend(split)
      return split_list


      models.py:



      import copy
      import numpy
      import matplotlib.pyplot as plt
      from matplotlib import colors as mplcolors
      from . import adjustment as adjust
      from . import functionals as func


      class Tweets(list):

      """

      Tweets model.

      """

      def __init__(self, *args, filter_by_unique = False, **kwargs):

      if filter_by_unique:
      tweets = func.filter_unique(args[0])
      else:
      tweets = args[0]

      list.__init__(self, tweets, **kwargs)

      @property
      def sorted_by_time(self):
      return sorted(self, key = lambda x: x.created_at)

      @property
      def oldest(self):
      return self.sorted_by_time[0]

      @property
      def newest(self):
      return self.sorted_by_time[-1]


      class Authors(object):

      """

      Authors model.

      """

      def __init__(self, tweets):

      self.tweets = tweets
      self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
      self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
      self.followers_count = author: self.authors[author].followers_count for author in self.authors
      self.following_count = author: self.authors[author].friends_count for author in self.authors
      self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
      self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
      self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author

      def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):

      measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
      sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])

      if type(color) == str:
      color = mplcolors.hex2color(mplcolors.cnames[color])

      colors = len(self.authors)*[color]
      if incolor_measurement != None:
      minor_max = max(measurements[incolor_measurement].values())
      transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
      colors = [(color[0], color[1], color[2], trans) for trans in transparency]

      var = [i+height for i in range(len(self.authors))]
      ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
      ax.set_yticks(var)
      ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)

      if incolor_measurement != None:
      ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
      else :
      ax.set_xlabel(measurement)
      plt.tight_layout()




      Result of example usage :



      enter image description here



      enter image description here









      share|improve this question












      share|improve this question




      share|improve this question








      edited Jul 30 at 17:51
























      asked Jul 17 at 14:06









      Arief

      405112




      405112




















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          2
          down vote



          accepted










          You’ve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py into models.py and radial_bar.py into functional.py (and, btw, this kind of file is usually named utils.py).



          Out of the 5 units presented here, 2 are unused: split_texts and Adjustment. So a quick word about them:



          • You can drop the adjustment parameter of split_texts as it is unused;

          • You can use itertools.chain.from_iterable to flatten nested iterables;

          • You can simplify optional arguments by dropping **kwargs in favor of default values.

          All in all, these two can be condensed to:



          import itertools


          class Adjustment:
          def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
          self.title = title
          self.freq_lim = freq_lim
          self.char_lim = char_lim
          self.exclude = exclude


          def split_texts(texts):
          return list(itertools.chain.from_iterable(map(str.split, texts)))


          But the Adjustment class would be even more succint and easily extensible if it were one of the new dataclasses:



          from dataclasses import dataclass


          @dataclass(frozen=True)
          class Adjustment:
          title: str = "An adjustment."
          freq_lim: int = None
          char_lim: int = None
          exclude: str = None



          filter_unique doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output is 'text' because you store the .text attribute of your objects in unique but you are only checking if the object itself is in unique, not its text. Second, you return unique which may contain tweets, statuses or text; but your usage in the Tweet class suggest that you want to return tweets every time.



          To improve things, we could:



          • check unicity using a set whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ for lists;

          • extract the desired attribute out of the object to test for unicity but return the actual tweet;

          • extract attributes step by step instead of having 4 explicit cases;

          • turn the function into a generator, since it will be fed into the list constructor anyway.

          Proposed improvements:



          from contextlib import suppress


          def filter_unique(tweet_stats_list, output='status'):
          uniques = set()
          for tweet in tweet_stats_list:
          tweet_attr = tweet
          with suppress(AttributeError):
          tweet_attr = tweet_attr.retweeted_status
          if output == 'text':
          tweet_attr = tweet_attr.text
          if tweet_attr not in uniques:
          uniques.add(tweet_attr)
          yield tweet



          The Tweet class feels somewhat fine but I don't understand how it improves value over a simple list based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet in the module models, sure this is some kind of """Tweet model"""…



          I also don't understand why you allow for variable number of arguments (using *args) but only ever use args[0]: you’re not even guaranteed that there is at least 1 element in args. Better use an explicit argument here.



          Lastly, you should use super() instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet and list).



          Proposed improvements:



          import operator


          class Tweets(list):
          def __init__(self, tweets, filter_by_unique=False, **kwargs):
          if filter_by_unique:
          tweets = filter_unique(tweets)

          super().__init__(tweets, **kwargs)

          @property
          def sorted_by_time(self):
          return sorted(self, key=operator.attrgetter('created_at'))

          @property
          def oldest(self):
          return min(self, key=operator.attrgetter('created_at'))

          @property
          def newest(self):
          return max(self, key=operator.attrgetter('created_at'))



          The Author class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.



          Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:



          • Sort the array of tweets by author names before processing it;


          • Group the array by author names to gather together the tweets of a single author.

          Note that I don't know tweepy enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)



          Proposed improvements:



          import operator


          class Authors(object):
          def __init__(self, tweets):
          unicity_key = operator.attrgetter('author.name')
          tweets = sorted(tweets, key=unicity_key)

          self.followers_count =
          self.following_count =
          self.total_tweets =
          self.tweets_count =
          for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
          author_tweets = list(author_tweets)
          author = author_tweets[0].author
          self.followers_count[author.name] = author.followers_count
          self.following_count[author.name] = author.friends_count
          self.total_tweets[author.name] = author.statuses_count
          self.tweets_count[author.name] = len(author_tweets)

          def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
          measurements =
          'Followers': self.followers_count,
          'Following': self.following_count,
          'Total Tweets': self.total_tweets,
          'Sample Tweets': self.tweets_count,

          author_measurement = measurements[measurement]
          sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)

          if isinstance(color, str):
          color = mplcolors.hex2color(mplcolors.cnames[color])

          if incolor_measurement is not None:
          color_measurement = measurements[incolor_measurement]
          minor_max = max(color_measurement.values())
          colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
          measurement = ' (color: )'.format(measurement, incolor_measurement)
          else:
          colors = [color] * len(author_measurement)

          ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
          ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
          ax.set_yticks(ticks)
          ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
          ax.set_xlabel(measurement)
          plt.tight_layout()





          share|improve this answer























          • Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
            – Arief
            Jul 18 at 10:06











          • @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
            – Mathias Ettinger
            Jul 18 at 10:22










          • Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
            – Arief
            Jul 26 at 10:44







          • 1




            @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
            – Mathias Ettinger
            Jul 26 at 11:45







          • 1




            @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
            – Mathias Ettinger
            Jul 31 at 6:41










          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199686%2fmodels-for-analysing-twitter-data-which-based-on-tweepy-package%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          2
          down vote



          accepted










          You’ve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py into models.py and radial_bar.py into functional.py (and, btw, this kind of file is usually named utils.py).



          Out of the 5 units presented here, 2 are unused: split_texts and Adjustment. So a quick word about them:



          • You can drop the adjustment parameter of split_texts as it is unused;

          • You can use itertools.chain.from_iterable to flatten nested iterables;

          • You can simplify optional arguments by dropping **kwargs in favor of default values.

          All in all, these two can be condensed to:



          import itertools


          class Adjustment:
          def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
          self.title = title
          self.freq_lim = freq_lim
          self.char_lim = char_lim
          self.exclude = exclude


          def split_texts(texts):
          return list(itertools.chain.from_iterable(map(str.split, texts)))


          But the Adjustment class would be even more succint and easily extensible if it were one of the new dataclasses:



          from dataclasses import dataclass


          @dataclass(frozen=True)
          class Adjustment:
          title: str = "An adjustment."
          freq_lim: int = None
          char_lim: int = None
          exclude: str = None



          filter_unique doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output is 'text' because you store the .text attribute of your objects in unique but you are only checking if the object itself is in unique, not its text. Second, you return unique which may contain tweets, statuses or text; but your usage in the Tweet class suggest that you want to return tweets every time.



          To improve things, we could:



          • check unicity using a set whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ for lists;

          • extract the desired attribute out of the object to test for unicity but return the actual tweet;

          • extract attributes step by step instead of having 4 explicit cases;

          • turn the function into a generator, since it will be fed into the list constructor anyway.

          Proposed improvements:



          from contextlib import suppress


          def filter_unique(tweet_stats_list, output='status'):
          uniques = set()
          for tweet in tweet_stats_list:
          tweet_attr = tweet
          with suppress(AttributeError):
          tweet_attr = tweet_attr.retweeted_status
          if output == 'text':
          tweet_attr = tweet_attr.text
          if tweet_attr not in uniques:
          uniques.add(tweet_attr)
          yield tweet



          The Tweet class feels somewhat fine but I don't understand how it improves value over a simple list based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet in the module models, sure this is some kind of """Tweet model"""…



          I also don't understand why you allow for variable number of arguments (using *args) but only ever use args[0]: you’re not even guaranteed that there is at least 1 element in args. Better use an explicit argument here.



          Lastly, you should use super() instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet and list).



          Proposed improvements:



          import operator


          class Tweets(list):
          def __init__(self, tweets, filter_by_unique=False, **kwargs):
          if filter_by_unique:
          tweets = filter_unique(tweets)

          super().__init__(tweets, **kwargs)

          @property
          def sorted_by_time(self):
          return sorted(self, key=operator.attrgetter('created_at'))

          @property
          def oldest(self):
          return min(self, key=operator.attrgetter('created_at'))

          @property
          def newest(self):
          return max(self, key=operator.attrgetter('created_at'))



          The Author class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.



          Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:



          • Sort the array of tweets by author names before processing it;


          • Group the array by author names to gather together the tweets of a single author.

          Note that I don't know tweepy enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)



          Proposed improvements:



          import operator


          class Authors(object):
          def __init__(self, tweets):
          unicity_key = operator.attrgetter('author.name')
          tweets = sorted(tweets, key=unicity_key)

          self.followers_count =
          self.following_count =
          self.total_tweets =
          self.tweets_count =
          for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
          author_tweets = list(author_tweets)
          author = author_tweets[0].author
          self.followers_count[author.name] = author.followers_count
          self.following_count[author.name] = author.friends_count
          self.total_tweets[author.name] = author.statuses_count
          self.tweets_count[author.name] = len(author_tweets)

          def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
          measurements =
          'Followers': self.followers_count,
          'Following': self.following_count,
          'Total Tweets': self.total_tweets,
          'Sample Tweets': self.tweets_count,

          author_measurement = measurements[measurement]
          sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)

          if isinstance(color, str):
          color = mplcolors.hex2color(mplcolors.cnames[color])

          if incolor_measurement is not None:
          color_measurement = measurements[incolor_measurement]
          minor_max = max(color_measurement.values())
          colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
          measurement = ' (color: )'.format(measurement, incolor_measurement)
          else:
          colors = [color] * len(author_measurement)

          ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
          ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
          ax.set_yticks(ticks)
          ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
          ax.set_xlabel(measurement)
          plt.tight_layout()





          share|improve this answer























          • Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
            – Arief
            Jul 18 at 10:06











          • @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
            – Mathias Ettinger
            Jul 18 at 10:22










          • Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
            – Arief
            Jul 26 at 10:44







          • 1




            @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
            – Mathias Ettinger
            Jul 26 at 11:45







          • 1




            @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
            – Mathias Ettinger
            Jul 31 at 6:41














          up vote
          2
          down vote



          accepted










          You’ve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py into models.py and radial_bar.py into functional.py (and, btw, this kind of file is usually named utils.py).



          Out of the 5 units presented here, 2 are unused: split_texts and Adjustment. So a quick word about them:



          • You can drop the adjustment parameter of split_texts as it is unused;

          • You can use itertools.chain.from_iterable to flatten nested iterables;

          • You can simplify optional arguments by dropping **kwargs in favor of default values.

          All in all, these two can be condensed to:



          import itertools


          class Adjustment:
          def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
          self.title = title
          self.freq_lim = freq_lim
          self.char_lim = char_lim
          self.exclude = exclude


          def split_texts(texts):
          return list(itertools.chain.from_iterable(map(str.split, texts)))


          But the Adjustment class would be even more succint and easily extensible if it were one of the new dataclasses:



          from dataclasses import dataclass


          @dataclass(frozen=True)
          class Adjustment:
          title: str = "An adjustment."
          freq_lim: int = None
          char_lim: int = None
          exclude: str = None



          filter_unique doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output is 'text' because you store the .text attribute of your objects in unique but you are only checking if the object itself is in unique, not its text. Second, you return unique which may contain tweets, statuses or text; but your usage in the Tweet class suggest that you want to return tweets every time.



          To improve things, we could:



          • check unicity using a set whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ for lists;

          • extract the desired attribute out of the object to test for unicity but return the actual tweet;

          • extract attributes step by step instead of having 4 explicit cases;

          • turn the function into a generator, since it will be fed into the list constructor anyway.

          Proposed improvements:



          from contextlib import suppress


          def filter_unique(tweet_stats_list, output='status'):
          uniques = set()
          for tweet in tweet_stats_list:
          tweet_attr = tweet
          with suppress(AttributeError):
          tweet_attr = tweet_attr.retweeted_status
          if output == 'text':
          tweet_attr = tweet_attr.text
          if tweet_attr not in uniques:
          uniques.add(tweet_attr)
          yield tweet



          The Tweet class feels somewhat fine but I don't understand how it improves value over a simple list based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet in the module models, sure this is some kind of """Tweet model"""…



          I also don't understand why you allow for variable number of arguments (using *args) but only ever use args[0]: you’re not even guaranteed that there is at least 1 element in args. Better use an explicit argument here.



          Lastly, you should use super() instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet and list).



          Proposed improvements:



          import operator


          class Tweets(list):
          def __init__(self, tweets, filter_by_unique=False, **kwargs):
          if filter_by_unique:
          tweets = filter_unique(tweets)

          super().__init__(tweets, **kwargs)

          @property
          def sorted_by_time(self):
          return sorted(self, key=operator.attrgetter('created_at'))

          @property
          def oldest(self):
          return min(self, key=operator.attrgetter('created_at'))

          @property
          def newest(self):
          return max(self, key=operator.attrgetter('created_at'))



          The Author class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.



          Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:



          • Sort the array of tweets by author names before processing it;


          • Group the array by author names to gather together the tweets of a single author.

          Note that I don't know tweepy enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)



          Proposed improvements:



          import operator


          class Authors(object):
          def __init__(self, tweets):
          unicity_key = operator.attrgetter('author.name')
          tweets = sorted(tweets, key=unicity_key)

          self.followers_count =
          self.following_count =
          self.total_tweets =
          self.tweets_count =
          for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
          author_tweets = list(author_tweets)
          author = author_tweets[0].author
          self.followers_count[author.name] = author.followers_count
          self.following_count[author.name] = author.friends_count
          self.total_tweets[author.name] = author.statuses_count
          self.tweets_count[author.name] = len(author_tweets)

          def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
          measurements =
          'Followers': self.followers_count,
          'Following': self.following_count,
          'Total Tweets': self.total_tweets,
          'Sample Tweets': self.tweets_count,

          author_measurement = measurements[measurement]
          sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)

          if isinstance(color, str):
          color = mplcolors.hex2color(mplcolors.cnames[color])

          if incolor_measurement is not None:
          color_measurement = measurements[incolor_measurement]
          minor_max = max(color_measurement.values())
          colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
          measurement = ' (color: )'.format(measurement, incolor_measurement)
          else:
          colors = [color] * len(author_measurement)

          ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
          ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
          ax.set_yticks(ticks)
          ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
          ax.set_xlabel(measurement)
          plt.tight_layout()





          share|improve this answer























          • Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
            – Arief
            Jul 18 at 10:06











          • @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
            – Mathias Ettinger
            Jul 18 at 10:22










          • Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
            – Arief
            Jul 26 at 10:44







          • 1




            @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
            – Mathias Ettinger
            Jul 26 at 11:45







          • 1




            @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
            – Mathias Ettinger
            Jul 31 at 6:41












          up vote
          2
          down vote



          accepted







          up vote
          2
          down vote



          accepted






          You’ve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py into models.py and radial_bar.py into functional.py (and, btw, this kind of file is usually named utils.py).



          Out of the 5 units presented here, 2 are unused: split_texts and Adjustment. So a quick word about them:



          • You can drop the adjustment parameter of split_texts as it is unused;

          • You can use itertools.chain.from_iterable to flatten nested iterables;

          • You can simplify optional arguments by dropping **kwargs in favor of default values.

          All in all, these two can be condensed to:



          import itertools


          class Adjustment:
          def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
          self.title = title
          self.freq_lim = freq_lim
          self.char_lim = char_lim
          self.exclude = exclude


          def split_texts(texts):
          return list(itertools.chain.from_iterable(map(str.split, texts)))


          But the Adjustment class would be even more succint and easily extensible if it were one of the new dataclasses:



          from dataclasses import dataclass


          @dataclass(frozen=True)
          class Adjustment:
          title: str = "An adjustment."
          freq_lim: int = None
          char_lim: int = None
          exclude: str = None



          filter_unique doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output is 'text' because you store the .text attribute of your objects in unique but you are only checking if the object itself is in unique, not its text. Second, you return unique which may contain tweets, statuses or text; but your usage in the Tweet class suggest that you want to return tweets every time.



          To improve things, we could:



          • check unicity using a set whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ for lists;

          • extract the desired attribute out of the object to test for unicity but return the actual tweet;

          • extract attributes step by step instead of having 4 explicit cases;

          • turn the function into a generator, since it will be fed into the list constructor anyway.

          Proposed improvements:



          from contextlib import suppress


          def filter_unique(tweet_stats_list, output='status'):
          uniques = set()
          for tweet in tweet_stats_list:
          tweet_attr = tweet
          with suppress(AttributeError):
          tweet_attr = tweet_attr.retweeted_status
          if output == 'text':
          tweet_attr = tweet_attr.text
          if tweet_attr not in uniques:
          uniques.add(tweet_attr)
          yield tweet



          The Tweet class feels somewhat fine but I don't understand how it improves value over a simple list based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet in the module models, sure this is some kind of """Tweet model"""…



          I also don't understand why you allow for variable number of arguments (using *args) but only ever use args[0]: you’re not even guaranteed that there is at least 1 element in args. Better use an explicit argument here.



          Lastly, you should use super() instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet and list).



          Proposed improvements:



          import operator


          class Tweets(list):
          def __init__(self, tweets, filter_by_unique=False, **kwargs):
          if filter_by_unique:
          tweets = filter_unique(tweets)

          super().__init__(tweets, **kwargs)

          @property
          def sorted_by_time(self):
          return sorted(self, key=operator.attrgetter('created_at'))

          @property
          def oldest(self):
          return min(self, key=operator.attrgetter('created_at'))

          @property
          def newest(self):
          return max(self, key=operator.attrgetter('created_at'))



          The Author class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.



          Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:



          • Sort the array of tweets by author names before processing it;


          • Group the array by author names to gather together the tweets of a single author.

          Note that I don't know tweepy enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)



          Proposed improvements:



          import operator


          class Authors(object):
          def __init__(self, tweets):
          unicity_key = operator.attrgetter('author.name')
          tweets = sorted(tweets, key=unicity_key)

          self.followers_count =
          self.following_count =
          self.total_tweets =
          self.tweets_count =
          for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
          author_tweets = list(author_tweets)
          author = author_tweets[0].author
          self.followers_count[author.name] = author.followers_count
          self.following_count[author.name] = author.friends_count
          self.total_tweets[author.name] = author.statuses_count
          self.tweets_count[author.name] = len(author_tweets)

          def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
          measurements =
          'Followers': self.followers_count,
          'Following': self.following_count,
          'Total Tweets': self.total_tweets,
          'Sample Tweets': self.tweets_count,

          author_measurement = measurements[measurement]
          sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)

          if isinstance(color, str):
          color = mplcolors.hex2color(mplcolors.cnames[color])

          if incolor_measurement is not None:
          color_measurement = measurements[incolor_measurement]
          minor_max = max(color_measurement.values())
          colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
          measurement = ' (color: )'.format(measurement, incolor_measurement)
          else:
          colors = [color] * len(author_measurement)

          ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
          ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
          ax.set_yticks(ticks)
          ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
          ax.set_xlabel(measurement)
          plt.tight_layout()





          share|improve this answer















          You’ve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py into models.py and radial_bar.py into functional.py (and, btw, this kind of file is usually named utils.py).



          Out of the 5 units presented here, 2 are unused: split_texts and Adjustment. So a quick word about them:



          • You can drop the adjustment parameter of split_texts as it is unused;

          • You can use itertools.chain.from_iterable to flatten nested iterables;

          • You can simplify optional arguments by dropping **kwargs in favor of default values.

          All in all, these two can be condensed to:



          import itertools


          class Adjustment:
          def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
          self.title = title
          self.freq_lim = freq_lim
          self.char_lim = char_lim
          self.exclude = exclude


          def split_texts(texts):
          return list(itertools.chain.from_iterable(map(str.split, texts)))


          But the Adjustment class would be even more succint and easily extensible if it were one of the new dataclasses:



          from dataclasses import dataclass


          @dataclass(frozen=True)
          class Adjustment:
          title: str = "An adjustment."
          freq_lim: int = None
          char_lim: int = None
          exclude: str = None



          filter_unique doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output is 'text' because you store the .text attribute of your objects in unique but you are only checking if the object itself is in unique, not its text. Second, you return unique which may contain tweets, statuses or text; but your usage in the Tweet class suggest that you want to return tweets every time.



          To improve things, we could:



          • check unicity using a set whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ for lists;

          • extract the desired attribute out of the object to test for unicity but return the actual tweet;

          • extract attributes step by step instead of having 4 explicit cases;

          • turn the function into a generator, since it will be fed into the list constructor anyway.

          Proposed improvements:



          from contextlib import suppress


          def filter_unique(tweet_stats_list, output='status'):
          uniques = set()
          for tweet in tweet_stats_list:
          tweet_attr = tweet
          with suppress(AttributeError):
          tweet_attr = tweet_attr.retweeted_status
          if output == 'text':
          tweet_attr = tweet_attr.text
          if tweet_attr not in uniques:
          uniques.add(tweet_attr)
          yield tweet



          The Tweet class feels somewhat fine but I don't understand how it improves value over a simple list based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet in the module models, sure this is some kind of """Tweet model"""…



          I also don't understand why you allow for variable number of arguments (using *args) but only ever use args[0]: you’re not even guaranteed that there is at least 1 element in args. Better use an explicit argument here.



          Lastly, you should use super() instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet and list).



          Proposed improvements:



          import operator


          class Tweets(list):
          def __init__(self, tweets, filter_by_unique=False, **kwargs):
          if filter_by_unique:
          tweets = filter_unique(tweets)

          super().__init__(tweets, **kwargs)

          @property
          def sorted_by_time(self):
          return sorted(self, key=operator.attrgetter('created_at'))

          @property
          def oldest(self):
          return min(self, key=operator.attrgetter('created_at'))

          @property
          def newest(self):
          return max(self, key=operator.attrgetter('created_at'))



          The Author class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.



          Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:



          • Sort the array of tweets by author names before processing it;


          • Group the array by author names to gather together the tweets of a single author.

          Note that I don't know tweepy enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)



          Proposed improvements:



          import operator


          class Authors(object):
          def __init__(self, tweets):
          unicity_key = operator.attrgetter('author.name')
          tweets = sorted(tweets, key=unicity_key)

          self.followers_count =
          self.following_count =
          self.total_tweets =
          self.tweets_count =
          for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
          author_tweets = list(author_tweets)
          author = author_tweets[0].author
          self.followers_count[author.name] = author.followers_count
          self.following_count[author.name] = author.friends_count
          self.total_tweets[author.name] = author.statuses_count
          self.tweets_count[author.name] = len(author_tweets)

          def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
          measurements =
          'Followers': self.followers_count,
          'Following': self.following_count,
          'Total Tweets': self.total_tweets,
          'Sample Tweets': self.tweets_count,

          author_measurement = measurements[measurement]
          sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)

          if isinstance(color, str):
          color = mplcolors.hex2color(mplcolors.cnames[color])

          if incolor_measurement is not None:
          color_measurement = measurements[incolor_measurement]
          minor_max = max(color_measurement.values())
          colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
          measurement = ' (color: )'.format(measurement, incolor_measurement)
          else:
          colors = [color] * len(author_measurement)

          ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
          ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
          ax.set_yticks(ticks)
          ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
          ax.set_xlabel(measurement)
          plt.tight_layout()






          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Jul 18 at 10:34


























          answered Jul 18 at 0:40









          Mathias Ettinger

          21.7k32875




          21.7k32875











          • Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
            – Arief
            Jul 18 at 10:06











          • @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
            – Mathias Ettinger
            Jul 18 at 10:22










          • Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
            – Arief
            Jul 26 at 10:44







          • 1




            @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
            – Mathias Ettinger
            Jul 26 at 11:45







          • 1




            @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
            – Mathias Ettinger
            Jul 31 at 6:41
















          • Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
            – Arief
            Jul 18 at 10:06











          • @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
            – Mathias Ettinger
            Jul 18 at 10:22










          • Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
            – Arief
            Jul 26 at 10:44







          • 1




            @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
            – Mathias Ettinger
            Jul 26 at 11:45







          • 1




            @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
            – Mathias Ettinger
            Jul 31 at 6:41















          Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
          – Arief
          Jul 18 at 10:06





          Thanks for the review. The Adjustment class may be updated with more inputs for more possible adjustments.
          – Arief
          Jul 18 at 10:06













          @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
          – Mathias Ettinger
          Jul 18 at 10:22




          @Arief Possibly. But you don't extract anything else from kwargs and you don't store it either; so there is no need in having it around.
          – Mathias Ettinger
          Jul 18 at 10:22












          Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
          – Arief
          Jul 26 at 10:44





          Whyfilter_unique is not useful..? the argument output=... is for preference whether the output is in form of Status with full features, or only the text. Each Status object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
          – Arief
          Jul 26 at 10:44





          1




          1




          @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
          – Mathias Ettinger
          Jul 26 at 11:45





          @Arief In your original version, if output is text, you check for status but store text in unique; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
          – Mathias Ettinger
          Jul 26 at 11:45





          1




          1




          @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
          – Mathias Ettinger
          Jul 31 at 6:41




          @Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the operator module is implemented in C.
          – Mathias Ettinger
          Jul 31 at 6:41












           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199686%2fmodels-for-analysing-twitter-data-which-based-on-tweepy-package%23new-answer', 'question_page');

          );

          Post as a guest













































































          Popular posts from this blog

          Chat program with C++ and SFML

          Function to Return a JSON Like Objects Using VBA Collections and Arrays

          Will my employers contract hold up in court?