Models for analysing Twitter data (*which based on tweepy package)
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
1
down vote
favorite
I have three modules here models.py
, functionals.py
, and adjustment.py
. These three may be used to perform analysis of Twitter data through tweepy
objects.
The Tweets
class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors
class.
The Authors
class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot
function).
Here is an example usage :
import numpy
from models import Authors
from models import Tweets
import matplotlib.pyplot as plt
stats = numpy.load('testfile.npy')
tweets = Tweets(stats)
Model = Authors(tweets)
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
plt.show()
The example above plots the barplots, with bar colors representing the incolor_measurement
variable.
The testfile.npy
is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy
, collected a few days ago.
My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?
The modules :
adjustment.py
:
import copy
class Adjustment(object):
def __init__(self, title = "an adjustment.", **kwargs):
self.title = title
if 'freq_lim' in kwargs:
self.freq_lim = kwargs['freq_lim']
else:
self.freq_lim = None
if 'char_lim' in kwargs:
self.char_lim = kwargs['char_lim']
else:
self.char_lim = None
if 'exclude' in kwargs:
self.exclude = kwargs['exclude']
else:
self.exclude = None
functionals.py
:
def filter_unique(tweet_stats_list, output = 'status'):
stats = tweet_stats_list
unique =
for tweet in stats:
try:
if not (tweet.retweeted_status in unique):
if output == 'status':
unique.append(tweet.retweeted_status)
elif output == 'text':
unique.append(tweet.retweeted_status.text)
except:
if not (tweet in unique):
if output == 'status':
unique.append(tweet)
elif output == 'text':
unique.append(tweet.text)
return unique
def split_texts(texts, adjustment = None):
split_list =
for text in texts:
split = text.split()
split_list.extend(split)
return split_list
models.py
:
import copy
import numpy
import matplotlib.pyplot as plt
from matplotlib import colors as mplcolors
from . import adjustment as adjust
from . import functionals as func
class Tweets(list):
"""
Tweets model.
"""
def __init__(self, *args, filter_by_unique = False, **kwargs):
if filter_by_unique:
tweets = func.filter_unique(args[0])
else:
tweets = args[0]
list.__init__(self, tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key = lambda x: x.created_at)
@property
def oldest(self):
return self.sorted_by_time[0]
@property
def newest(self):
return self.sorted_by_time[-1]
class Authors(object):
"""
Authors model.
"""
def __init__(self, tweets):
self.tweets = tweets
self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.followers_count = author: self.authors[author].followers_count for author in self.authors
self.following_count = author: self.authors[author].friends_count for author in self.authors
self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author
def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):
measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])
if type(color) == str:
color = mplcolors.hex2color(mplcolors.cnames[color])
colors = len(self.authors)*[color]
if incolor_measurement != None:
minor_max = max(measurements[incolor_measurement].values())
transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
colors = [(color[0], color[1], color[2], trans) for trans in transparency]
var = [i+height for i in range(len(self.authors))]
ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
ax.set_yticks(var)
ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)
if incolor_measurement != None:
ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
else :
ax.set_xlabel(measurement)
plt.tight_layout()
Result of example usage :
python object-oriented functional-programming twitter tweepy
add a comment |Â
up vote
1
down vote
favorite
I have three modules here models.py
, functionals.py
, and adjustment.py
. These three may be used to perform analysis of Twitter data through tweepy
objects.
The Tweets
class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors
class.
The Authors
class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot
function).
Here is an example usage :
import numpy
from models import Authors
from models import Tweets
import matplotlib.pyplot as plt
stats = numpy.load('testfile.npy')
tweets = Tweets(stats)
Model = Authors(tweets)
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
plt.show()
The example above plots the barplots, with bar colors representing the incolor_measurement
variable.
The testfile.npy
is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy
, collected a few days ago.
My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?
The modules :
adjustment.py
:
import copy
class Adjustment(object):
def __init__(self, title = "an adjustment.", **kwargs):
self.title = title
if 'freq_lim' in kwargs:
self.freq_lim = kwargs['freq_lim']
else:
self.freq_lim = None
if 'char_lim' in kwargs:
self.char_lim = kwargs['char_lim']
else:
self.char_lim = None
if 'exclude' in kwargs:
self.exclude = kwargs['exclude']
else:
self.exclude = None
functionals.py
:
def filter_unique(tweet_stats_list, output = 'status'):
stats = tweet_stats_list
unique =
for tweet in stats:
try:
if not (tweet.retweeted_status in unique):
if output == 'status':
unique.append(tweet.retweeted_status)
elif output == 'text':
unique.append(tweet.retweeted_status.text)
except:
if not (tweet in unique):
if output == 'status':
unique.append(tweet)
elif output == 'text':
unique.append(tweet.text)
return unique
def split_texts(texts, adjustment = None):
split_list =
for text in texts:
split = text.split()
split_list.extend(split)
return split_list
models.py
:
import copy
import numpy
import matplotlib.pyplot as plt
from matplotlib import colors as mplcolors
from . import adjustment as adjust
from . import functionals as func
class Tweets(list):
"""
Tweets model.
"""
def __init__(self, *args, filter_by_unique = False, **kwargs):
if filter_by_unique:
tweets = func.filter_unique(args[0])
else:
tweets = args[0]
list.__init__(self, tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key = lambda x: x.created_at)
@property
def oldest(self):
return self.sorted_by_time[0]
@property
def newest(self):
return self.sorted_by_time[-1]
class Authors(object):
"""
Authors model.
"""
def __init__(self, tweets):
self.tweets = tweets
self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.followers_count = author: self.authors[author].followers_count for author in self.authors
self.following_count = author: self.authors[author].friends_count for author in self.authors
self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author
def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):
measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])
if type(color) == str:
color = mplcolors.hex2color(mplcolors.cnames[color])
colors = len(self.authors)*[color]
if incolor_measurement != None:
minor_max = max(measurements[incolor_measurement].values())
transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
colors = [(color[0], color[1], color[2], trans) for trans in transparency]
var = [i+height for i in range(len(self.authors))]
ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
ax.set_yticks(var)
ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)
if incolor_measurement != None:
ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
else :
ax.set_xlabel(measurement)
plt.tight_layout()
Result of example usage :
python object-oriented functional-programming twitter tweepy
add a comment |Â
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I have three modules here models.py
, functionals.py
, and adjustment.py
. These three may be used to perform analysis of Twitter data through tweepy
objects.
The Tweets
class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors
class.
The Authors
class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot
function).
Here is an example usage :
import numpy
from models import Authors
from models import Tweets
import matplotlib.pyplot as plt
stats = numpy.load('testfile.npy')
tweets = Tweets(stats)
Model = Authors(tweets)
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
plt.show()
The example above plots the barplots, with bar colors representing the incolor_measurement
variable.
The testfile.npy
is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy
, collected a few days ago.
My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?
The modules :
adjustment.py
:
import copy
class Adjustment(object):
def __init__(self, title = "an adjustment.", **kwargs):
self.title = title
if 'freq_lim' in kwargs:
self.freq_lim = kwargs['freq_lim']
else:
self.freq_lim = None
if 'char_lim' in kwargs:
self.char_lim = kwargs['char_lim']
else:
self.char_lim = None
if 'exclude' in kwargs:
self.exclude = kwargs['exclude']
else:
self.exclude = None
functionals.py
:
def filter_unique(tweet_stats_list, output = 'status'):
stats = tweet_stats_list
unique =
for tweet in stats:
try:
if not (tweet.retweeted_status in unique):
if output == 'status':
unique.append(tweet.retweeted_status)
elif output == 'text':
unique.append(tweet.retweeted_status.text)
except:
if not (tweet in unique):
if output == 'status':
unique.append(tweet)
elif output == 'text':
unique.append(tweet.text)
return unique
def split_texts(texts, adjustment = None):
split_list =
for text in texts:
split = text.split()
split_list.extend(split)
return split_list
models.py
:
import copy
import numpy
import matplotlib.pyplot as plt
from matplotlib import colors as mplcolors
from . import adjustment as adjust
from . import functionals as func
class Tweets(list):
"""
Tweets model.
"""
def __init__(self, *args, filter_by_unique = False, **kwargs):
if filter_by_unique:
tweets = func.filter_unique(args[0])
else:
tweets = args[0]
list.__init__(self, tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key = lambda x: x.created_at)
@property
def oldest(self):
return self.sorted_by_time[0]
@property
def newest(self):
return self.sorted_by_time[-1]
class Authors(object):
"""
Authors model.
"""
def __init__(self, tweets):
self.tweets = tweets
self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.followers_count = author: self.authors[author].followers_count for author in self.authors
self.following_count = author: self.authors[author].friends_count for author in self.authors
self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author
def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):
measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])
if type(color) == str:
color = mplcolors.hex2color(mplcolors.cnames[color])
colors = len(self.authors)*[color]
if incolor_measurement != None:
minor_max = max(measurements[incolor_measurement].values())
transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
colors = [(color[0], color[1], color[2], trans) for trans in transparency]
var = [i+height for i in range(len(self.authors))]
ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
ax.set_yticks(var)
ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)
if incolor_measurement != None:
ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
else :
ax.set_xlabel(measurement)
plt.tight_layout()
Result of example usage :
python object-oriented functional-programming twitter tweepy
I have three modules here models.py
, functionals.py
, and adjustment.py
. These three may be used to perform analysis of Twitter data through tweepy
objects.
The Tweets
class is a simple one which takes 'tweet' objects and take it as a 'list', which is necessary (not yet implemented as necessary) as an argument of Authors
class.
The Authors
class is used to perform analysis of tweets by their authors/users. For example, we can plot the number of Followers of each user appearing in the data (the plot is by hbar_plot
function).
Here is an example usage :
import numpy
from models import Authors
from models import Tweets
import matplotlib.pyplot as plt
stats = numpy.load('testfile.npy')
tweets = Tweets(stats)
Model = Authors(tweets)
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Followers', incolor_measurement = 'Following', height = 0.5, color = (1,0,0,1))
fig, ax = plt.subplots(1, 1)
Model.hbar_plot(ax, measurement = 'Sample Tweets', incolor_measurement = 'Followers', height = 0.5, color = (1,0,0,1))
plt.show()
The example above plots the barplots, with bar colors representing the incolor_measurement
variable.
The testfile.npy
is available in https://github.com/anbarief/statistweepy/blob/master/examples/testfile.npy . This file consists of about 200 'tweet' objects of tweepy
, collected a few days ago.
My question is, how to make better organization of the classes and their methods, also the functions here? How to write and organize it better?
The modules :
adjustment.py
:
import copy
class Adjustment(object):
def __init__(self, title = "an adjustment.", **kwargs):
self.title = title
if 'freq_lim' in kwargs:
self.freq_lim = kwargs['freq_lim']
else:
self.freq_lim = None
if 'char_lim' in kwargs:
self.char_lim = kwargs['char_lim']
else:
self.char_lim = None
if 'exclude' in kwargs:
self.exclude = kwargs['exclude']
else:
self.exclude = None
functionals.py
:
def filter_unique(tweet_stats_list, output = 'status'):
stats = tweet_stats_list
unique =
for tweet in stats:
try:
if not (tweet.retweeted_status in unique):
if output == 'status':
unique.append(tweet.retweeted_status)
elif output == 'text':
unique.append(tweet.retweeted_status.text)
except:
if not (tweet in unique):
if output == 'status':
unique.append(tweet)
elif output == 'text':
unique.append(tweet.text)
return unique
def split_texts(texts, adjustment = None):
split_list =
for text in texts:
split = text.split()
split_list.extend(split)
return split_list
models.py
:
import copy
import numpy
import matplotlib.pyplot as plt
from matplotlib import colors as mplcolors
from . import adjustment as adjust
from . import functionals as func
class Tweets(list):
"""
Tweets model.
"""
def __init__(self, *args, filter_by_unique = False, **kwargs):
if filter_by_unique:
tweets = func.filter_unique(args[0])
else:
tweets = args[0]
list.__init__(self, tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key = lambda x: x.created_at)
@property
def oldest(self):
return self.sorted_by_time[0]
@property
def newest(self):
return self.sorted_by_time[-1]
class Authors(object):
"""
Authors model.
"""
def __init__(self, tweets):
self.tweets = tweets
self.authors = author.name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.username = author.screen_name : author for author in list(set([tweet.author for tweet in self.tweets]))
self.followers_count = author: self.authors[author].followers_count for author in self.authors
self.following_count = author: self.authors[author].friends_count for author in self.authors
self.totaltweets = author: self.authors[author].statuses_count for author in self.authors
self.tweets_by_author = author: [tweet for tweet in self.tweets if tweet.author.name == author] for author in self.authors
self.tweets_count = author: len(self.tweets_by_author[author]) for author in self.tweets_by_author
def hbar_plot(self, ax, measurement = 'Followers', color = (0,0,1,1), incolor_measurement = None, height = 1, textsize = 7, **kwargs):
measurements = 'Followers': self.followers_count, 'Following' : self.following_count, 'Total Tweets' : self.totaltweets, 'Sample Tweets' : self.tweets_count
sorted_authors = sorted(measurements[measurement], key = lambda x : measurements[measurement][x])
if type(color) == str:
color = mplcolors.hex2color(mplcolors.cnames[color])
colors = len(self.authors)*[color]
if incolor_measurement != None:
minor_max = max(measurements[incolor_measurement].values())
transparency = [measurements[incolor_measurement][author]/minor_max for author in sorted_authors]
colors = [(color[0], color[1], color[2], trans) for trans in transparency]
var = [i+height for i in range(len(self.authors))]
ax.barh([i-height/2 for i in var], [measurements[measurement][author] for author in sorted_authors], height = height, color = colors, **kwargs)
ax.set_yticks(var)
ax.set_yticklabels(sorted_authors, rotation = 'horizontal', size = textsize)
if incolor_measurement != None:
ax.set_xlabel(measurement + ' (color : '+incolor_measurement+')')
else :
ax.set_xlabel(measurement)
plt.tight_layout()
Result of example usage :
python object-oriented functional-programming twitter tweepy
edited Jul 30 at 17:51
asked Jul 17 at 14:06
Arief
405112
405112
add a comment |Â
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
2
down vote
accepted
YouâÂÂve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py
into models.py
and radial_bar.py
into functional.py
(and, btw, this kind of file is usually named utils.py
).
Out of the 5 units presented here, 2 are unused: split_texts
and Adjustment
. So a quick word about them:
- You can drop the
adjustment
parameter ofsplit_texts
as it is unused; - You can use
itertools.chain.from_iterable
to flatten nested iterables; - You can simplify optional arguments by dropping
**kwargs
in favor of default values.
All in all, these two can be condensed to:
import itertools
class Adjustment:
def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
self.title = title
self.freq_lim = freq_lim
self.char_lim = char_lim
self.exclude = exclude
def split_texts(texts):
return list(itertools.chain.from_iterable(map(str.split, texts)))
But the Adjustment
class would be even more succint and easily extensible if it were one of the new dataclasses
:
from dataclasses import dataclass
@dataclass(frozen=True)
class Adjustment:
title: str = "An adjustment."
freq_lim: int = None
char_lim: int = None
exclude: str = None
filter_unique
doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output
is 'text'
because you store the .text
attribute of your objects in unique
but you are only checking if the object itself is in unique, not its text. Second, you return unique
which may contain tweets, statuses or text; but your usage in the Tweet
class suggest that you want to return tweets every time.
To improve things, we could:
- check unicity using a
set
whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ forlist
s; - extract the desired attribute out of the object to test for unicity but return the actual tweet;
- extract attributes step by step instead of having 4ÃÂ explicit cases;
- turn the function into a generator, since it will be fed into the
list
constructor anyway.
Proposed improvements:
from contextlib import suppress
def filter_unique(tweet_stats_list, output='status'):
uniques = set()
for tweet in tweet_stats_list:
tweet_attr = tweet
with suppress(AttributeError):
tweet_attr = tweet_attr.retweeted_status
if output == 'text':
tweet_attr = tweet_attr.text
if tweet_attr not in uniques:
uniques.add(tweet_attr)
yield tweet
The Tweet
class feels somewhat fine but I don't understand how it improves value over a simple list
based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet
in the module models
, sure this is some kind of """Tweet model"""
â¦
I also don't understand why you allow for variable number of arguments (using *args
) but only ever use args[0]
: youâÂÂre not even guaranteed that there is at least 1 element in args
. Better use an explicit argument here.
Lastly, you should use super()
instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet
and list
).
Proposed improvements:
import operator
class Tweets(list):
def __init__(self, tweets, filter_by_unique=False, **kwargs):
if filter_by_unique:
tweets = filter_unique(tweets)
super().__init__(tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key=operator.attrgetter('created_at'))
@property
def oldest(self):
return min(self, key=operator.attrgetter('created_at'))
@property
def newest(self):
return max(self, key=operator.attrgetter('created_at'))
The Author
class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.
Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:
- Sort the array of tweets by author names before processing it;
Group the array by author names to gather together the tweets of a single author.
Note that I don't know tweepy
enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)
Proposed improvements:
import operator
class Authors(object):
def __init__(self, tweets):
unicity_key = operator.attrgetter('author.name')
tweets = sorted(tweets, key=unicity_key)
self.followers_count =
self.following_count =
self.total_tweets =
self.tweets_count =
for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
author_tweets = list(author_tweets)
author = author_tweets[0].author
self.followers_count[author.name] = author.followers_count
self.following_count[author.name] = author.friends_count
self.total_tweets[author.name] = author.statuses_count
self.tweets_count[author.name] = len(author_tweets)
def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
measurements =
'Followers': self.followers_count,
'Following': self.following_count,
'Total Tweets': self.total_tweets,
'Sample Tweets': self.tweets_count,
author_measurement = measurements[measurement]
sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)
if isinstance(color, str):
color = mplcolors.hex2color(mplcolors.cnames[color])
if incolor_measurement is not None:
color_measurement = measurements[incolor_measurement]
minor_max = max(color_measurement.values())
colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
measurement = ' (color: )'.format(measurement, incolor_measurement)
else:
colors = [color] * len(author_measurement)
ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
ax.set_yticks(ticks)
ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
ax.set_xlabel(measurement)
plt.tight_layout()
Thanks for the review. TheAdjustment
class may be updated with more inputs for more possible adjustments.
â Arief
Jul 18 at 10:06
@Arief Possibly. But you don't extract anything else fromkwargs
and you don't store it either; so there is no need in having it around.
â Mathias Ettinger
Jul 18 at 10:22
Whyfilter_unique
is not useful..? the argumentoutput=...
is for preference whether the output is in form ofStatus
with full features, or only thetext
. EachStatus
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
â Arief
Jul 26 at 10:44
1
@Arief In your original version, ifoutput
is text, you check for status but store text inunique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
â Mathias Ettinger
Jul 26 at 11:45
1
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and theoperator
module is implemented in C.
â Mathias Ettinger
Jul 31 at 6:41
 |Â
show 3 more comments
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
YouâÂÂve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py
into models.py
and radial_bar.py
into functional.py
(and, btw, this kind of file is usually named utils.py
).
Out of the 5 units presented here, 2 are unused: split_texts
and Adjustment
. So a quick word about them:
- You can drop the
adjustment
parameter ofsplit_texts
as it is unused; - You can use
itertools.chain.from_iterable
to flatten nested iterables; - You can simplify optional arguments by dropping
**kwargs
in favor of default values.
All in all, these two can be condensed to:
import itertools
class Adjustment:
def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
self.title = title
self.freq_lim = freq_lim
self.char_lim = char_lim
self.exclude = exclude
def split_texts(texts):
return list(itertools.chain.from_iterable(map(str.split, texts)))
But the Adjustment
class would be even more succint and easily extensible if it were one of the new dataclasses
:
from dataclasses import dataclass
@dataclass(frozen=True)
class Adjustment:
title: str = "An adjustment."
freq_lim: int = None
char_lim: int = None
exclude: str = None
filter_unique
doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output
is 'text'
because you store the .text
attribute of your objects in unique
but you are only checking if the object itself is in unique, not its text. Second, you return unique
which may contain tweets, statuses or text; but your usage in the Tweet
class suggest that you want to return tweets every time.
To improve things, we could:
- check unicity using a
set
whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ forlist
s; - extract the desired attribute out of the object to test for unicity but return the actual tweet;
- extract attributes step by step instead of having 4ÃÂ explicit cases;
- turn the function into a generator, since it will be fed into the
list
constructor anyway.
Proposed improvements:
from contextlib import suppress
def filter_unique(tweet_stats_list, output='status'):
uniques = set()
for tweet in tweet_stats_list:
tweet_attr = tweet
with suppress(AttributeError):
tweet_attr = tweet_attr.retweeted_status
if output == 'text':
tweet_attr = tweet_attr.text
if tweet_attr not in uniques:
uniques.add(tweet_attr)
yield tweet
The Tweet
class feels somewhat fine but I don't understand how it improves value over a simple list
based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet
in the module models
, sure this is some kind of """Tweet model"""
â¦
I also don't understand why you allow for variable number of arguments (using *args
) but only ever use args[0]
: youâÂÂre not even guaranteed that there is at least 1 element in args
. Better use an explicit argument here.
Lastly, you should use super()
instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet
and list
).
Proposed improvements:
import operator
class Tweets(list):
def __init__(self, tweets, filter_by_unique=False, **kwargs):
if filter_by_unique:
tweets = filter_unique(tweets)
super().__init__(tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key=operator.attrgetter('created_at'))
@property
def oldest(self):
return min(self, key=operator.attrgetter('created_at'))
@property
def newest(self):
return max(self, key=operator.attrgetter('created_at'))
The Author
class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.
Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:
- Sort the array of tweets by author names before processing it;
Group the array by author names to gather together the tweets of a single author.
Note that I don't know tweepy
enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)
Proposed improvements:
import operator
class Authors(object):
def __init__(self, tweets):
unicity_key = operator.attrgetter('author.name')
tweets = sorted(tweets, key=unicity_key)
self.followers_count =
self.following_count =
self.total_tweets =
self.tweets_count =
for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
author_tweets = list(author_tweets)
author = author_tweets[0].author
self.followers_count[author.name] = author.followers_count
self.following_count[author.name] = author.friends_count
self.total_tweets[author.name] = author.statuses_count
self.tweets_count[author.name] = len(author_tweets)
def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
measurements =
'Followers': self.followers_count,
'Following': self.following_count,
'Total Tweets': self.total_tweets,
'Sample Tweets': self.tweets_count,
author_measurement = measurements[measurement]
sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)
if isinstance(color, str):
color = mplcolors.hex2color(mplcolors.cnames[color])
if incolor_measurement is not None:
color_measurement = measurements[incolor_measurement]
minor_max = max(color_measurement.values())
colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
measurement = ' (color: )'.format(measurement, incolor_measurement)
else:
colors = [color] * len(author_measurement)
ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
ax.set_yticks(ticks)
ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
ax.set_xlabel(measurement)
plt.tight_layout()
Thanks for the review. TheAdjustment
class may be updated with more inputs for more possible adjustments.
â Arief
Jul 18 at 10:06
@Arief Possibly. But you don't extract anything else fromkwargs
and you don't store it either; so there is no need in having it around.
â Mathias Ettinger
Jul 18 at 10:22
Whyfilter_unique
is not useful..? the argumentoutput=...
is for preference whether the output is in form ofStatus
with full features, or only thetext
. EachStatus
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
â Arief
Jul 26 at 10:44
1
@Arief In your original version, ifoutput
is text, you check for status but store text inunique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
â Mathias Ettinger
Jul 26 at 11:45
1
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and theoperator
module is implemented in C.
â Mathias Ettinger
Jul 31 at 6:41
 |Â
show 3 more comments
up vote
2
down vote
accepted
YouâÂÂve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py
into models.py
and radial_bar.py
into functional.py
(and, btw, this kind of file is usually named utils.py
).
Out of the 5 units presented here, 2 are unused: split_texts
and Adjustment
. So a quick word about them:
- You can drop the
adjustment
parameter ofsplit_texts
as it is unused; - You can use
itertools.chain.from_iterable
to flatten nested iterables; - You can simplify optional arguments by dropping
**kwargs
in favor of default values.
All in all, these two can be condensed to:
import itertools
class Adjustment:
def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
self.title = title
self.freq_lim = freq_lim
self.char_lim = char_lim
self.exclude = exclude
def split_texts(texts):
return list(itertools.chain.from_iterable(map(str.split, texts)))
But the Adjustment
class would be even more succint and easily extensible if it were one of the new dataclasses
:
from dataclasses import dataclass
@dataclass(frozen=True)
class Adjustment:
title: str = "An adjustment."
freq_lim: int = None
char_lim: int = None
exclude: str = None
filter_unique
doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output
is 'text'
because you store the .text
attribute of your objects in unique
but you are only checking if the object itself is in unique, not its text. Second, you return unique
which may contain tweets, statuses or text; but your usage in the Tweet
class suggest that you want to return tweets every time.
To improve things, we could:
- check unicity using a
set
whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ forlist
s; - extract the desired attribute out of the object to test for unicity but return the actual tweet;
- extract attributes step by step instead of having 4ÃÂ explicit cases;
- turn the function into a generator, since it will be fed into the
list
constructor anyway.
Proposed improvements:
from contextlib import suppress
def filter_unique(tweet_stats_list, output='status'):
uniques = set()
for tweet in tweet_stats_list:
tweet_attr = tweet
with suppress(AttributeError):
tweet_attr = tweet_attr.retweeted_status
if output == 'text':
tweet_attr = tweet_attr.text
if tweet_attr not in uniques:
uniques.add(tweet_attr)
yield tweet
The Tweet
class feels somewhat fine but I don't understand how it improves value over a simple list
based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet
in the module models
, sure this is some kind of """Tweet model"""
â¦
I also don't understand why you allow for variable number of arguments (using *args
) but only ever use args[0]
: youâÂÂre not even guaranteed that there is at least 1 element in args
. Better use an explicit argument here.
Lastly, you should use super()
instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet
and list
).
Proposed improvements:
import operator
class Tweets(list):
def __init__(self, tweets, filter_by_unique=False, **kwargs):
if filter_by_unique:
tweets = filter_unique(tweets)
super().__init__(tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key=operator.attrgetter('created_at'))
@property
def oldest(self):
return min(self, key=operator.attrgetter('created_at'))
@property
def newest(self):
return max(self, key=operator.attrgetter('created_at'))
The Author
class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.
Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:
- Sort the array of tweets by author names before processing it;
Group the array by author names to gather together the tweets of a single author.
Note that I don't know tweepy
enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)
Proposed improvements:
import operator
class Authors(object):
def __init__(self, tweets):
unicity_key = operator.attrgetter('author.name')
tweets = sorted(tweets, key=unicity_key)
self.followers_count =
self.following_count =
self.total_tweets =
self.tweets_count =
for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
author_tweets = list(author_tweets)
author = author_tweets[0].author
self.followers_count[author.name] = author.followers_count
self.following_count[author.name] = author.friends_count
self.total_tweets[author.name] = author.statuses_count
self.tweets_count[author.name] = len(author_tweets)
def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
measurements =
'Followers': self.followers_count,
'Following': self.following_count,
'Total Tweets': self.total_tweets,
'Sample Tweets': self.tweets_count,
author_measurement = measurements[measurement]
sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)
if isinstance(color, str):
color = mplcolors.hex2color(mplcolors.cnames[color])
if incolor_measurement is not None:
color_measurement = measurements[incolor_measurement]
minor_max = max(color_measurement.values())
colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
measurement = ' (color: )'.format(measurement, incolor_measurement)
else:
colors = [color] * len(author_measurement)
ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
ax.set_yticks(ticks)
ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
ax.set_xlabel(measurement)
plt.tight_layout()
Thanks for the review. TheAdjustment
class may be updated with more inputs for more possible adjustments.
â Arief
Jul 18 at 10:06
@Arief Possibly. But you don't extract anything else fromkwargs
and you don't store it either; so there is no need in having it around.
â Mathias Ettinger
Jul 18 at 10:22
Whyfilter_unique
is not useful..? the argumentoutput=...
is for preference whether the output is in form ofStatus
with full features, or only thetext
. EachStatus
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
â Arief
Jul 26 at 10:44
1
@Arief In your original version, ifoutput
is text, you check for status but store text inunique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
â Mathias Ettinger
Jul 26 at 11:45
1
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and theoperator
module is implemented in C.
â Mathias Ettinger
Jul 31 at 6:41
 |Â
show 3 more comments
up vote
2
down vote
accepted
up vote
2
down vote
accepted
YouâÂÂve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py
into models.py
and radial_bar.py
into functional.py
(and, btw, this kind of file is usually named utils.py
).
Out of the 5 units presented here, 2 are unused: split_texts
and Adjustment
. So a quick word about them:
- You can drop the
adjustment
parameter ofsplit_texts
as it is unused; - You can use
itertools.chain.from_iterable
to flatten nested iterables; - You can simplify optional arguments by dropping
**kwargs
in favor of default values.
All in all, these two can be condensed to:
import itertools
class Adjustment:
def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
self.title = title
self.freq_lim = freq_lim
self.char_lim = char_lim
self.exclude = exclude
def split_texts(texts):
return list(itertools.chain.from_iterable(map(str.split, texts)))
But the Adjustment
class would be even more succint and easily extensible if it were one of the new dataclasses
:
from dataclasses import dataclass
@dataclass(frozen=True)
class Adjustment:
title: str = "An adjustment."
freq_lim: int = None
char_lim: int = None
exclude: str = None
filter_unique
doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output
is 'text'
because you store the .text
attribute of your objects in unique
but you are only checking if the object itself is in unique, not its text. Second, you return unique
which may contain tweets, statuses or text; but your usage in the Tweet
class suggest that you want to return tweets every time.
To improve things, we could:
- check unicity using a
set
whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ forlist
s; - extract the desired attribute out of the object to test for unicity but return the actual tweet;
- extract attributes step by step instead of having 4ÃÂ explicit cases;
- turn the function into a generator, since it will be fed into the
list
constructor anyway.
Proposed improvements:
from contextlib import suppress
def filter_unique(tweet_stats_list, output='status'):
uniques = set()
for tweet in tweet_stats_list:
tweet_attr = tweet
with suppress(AttributeError):
tweet_attr = tweet_attr.retweeted_status
if output == 'text':
tweet_attr = tweet_attr.text
if tweet_attr not in uniques:
uniques.add(tweet_attr)
yield tweet
The Tweet
class feels somewhat fine but I don't understand how it improves value over a simple list
based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet
in the module models
, sure this is some kind of """Tweet model"""
â¦
I also don't understand why you allow for variable number of arguments (using *args
) but only ever use args[0]
: youâÂÂre not even guaranteed that there is at least 1 element in args
. Better use an explicit argument here.
Lastly, you should use super()
instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet
and list
).
Proposed improvements:
import operator
class Tweets(list):
def __init__(self, tweets, filter_by_unique=False, **kwargs):
if filter_by_unique:
tweets = filter_unique(tweets)
super().__init__(tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key=operator.attrgetter('created_at'))
@property
def oldest(self):
return min(self, key=operator.attrgetter('created_at'))
@property
def newest(self):
return max(self, key=operator.attrgetter('created_at'))
The Author
class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.
Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:
- Sort the array of tweets by author names before processing it;
Group the array by author names to gather together the tweets of a single author.
Note that I don't know tweepy
enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)
Proposed improvements:
import operator
class Authors(object):
def __init__(self, tweets):
unicity_key = operator.attrgetter('author.name')
tweets = sorted(tweets, key=unicity_key)
self.followers_count =
self.following_count =
self.total_tweets =
self.tweets_count =
for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
author_tweets = list(author_tweets)
author = author_tweets[0].author
self.followers_count[author.name] = author.followers_count
self.following_count[author.name] = author.friends_count
self.total_tweets[author.name] = author.statuses_count
self.tweets_count[author.name] = len(author_tweets)
def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
measurements =
'Followers': self.followers_count,
'Following': self.following_count,
'Total Tweets': self.total_tweets,
'Sample Tweets': self.tweets_count,
author_measurement = measurements[measurement]
sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)
if isinstance(color, str):
color = mplcolors.hex2color(mplcolors.cnames[color])
if incolor_measurement is not None:
color_measurement = measurements[incolor_measurement]
minor_max = max(color_measurement.values())
colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
measurement = ' (color: )'.format(measurement, incolor_measurement)
else:
colors = [color] * len(author_measurement)
ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
ax.set_yticks(ticks)
ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
ax.set_xlabel(measurement)
plt.tight_layout()
YouâÂÂve presented 5 units (classes, functions) in 3 different files. I understand that you have a few more in your repository but I think this is over-splitting. You should at least merge adjustment.py
into models.py
and radial_bar.py
into functional.py
(and, btw, this kind of file is usually named utils.py
).
Out of the 5 units presented here, 2 are unused: split_texts
and Adjustment
. So a quick word about them:
- You can drop the
adjustment
parameter ofsplit_texts
as it is unused; - You can use
itertools.chain.from_iterable
to flatten nested iterables; - You can simplify optional arguments by dropping
**kwargs
in favor of default values.
All in all, these two can be condensed to:
import itertools
class Adjustment:
def __init__(self, title="An adjustment.", freq_lim=None, char_lim=None, exclude=None):
self.title = title
self.freq_lim = freq_lim
self.char_lim = char_lim
self.exclude = exclude
def split_texts(texts):
return list(itertools.chain.from_iterable(map(str.split, texts)))
But the Adjustment
class would be even more succint and easily extensible if it were one of the new dataclasses
:
from dataclasses import dataclass
@dataclass(frozen=True)
class Adjustment:
title: str = "An adjustment."
freq_lim: int = None
char_lim: int = None
exclude: str = None
filter_unique
doesn't feel like it produce any usable output. For starter, you are comparing apples and oranges when output
is 'text'
because you store the .text
attribute of your objects in unique
but you are only checking if the object itself is in unique, not its text. Second, you return unique
which may contain tweets, statuses or text; but your usage in the Tweet
class suggest that you want to return tweets every time.
To improve things, we could:
- check unicity using a
set
whose contain check is done in $mathcalO(1)$ instead of $mathcalO(n)$ forlist
s; - extract the desired attribute out of the object to test for unicity but return the actual tweet;
- extract attributes step by step instead of having 4ÃÂ explicit cases;
- turn the function into a generator, since it will be fed into the
list
constructor anyway.
Proposed improvements:
from contextlib import suppress
def filter_unique(tweet_stats_list, output='status'):
uniques = set()
for tweet in tweet_stats_list:
tweet_attr = tweet
with suppress(AttributeError):
tweet_attr = tweet_attr.retweeted_status
if output == 'text':
tweet_attr = tweet_attr.text
if tweet_attr not in uniques:
uniques.add(tweet_attr)
yield tweet
The Tweet
class feels somewhat fine but I don't understand how it improves value over a simple list
based on the provided model (I didn't read the other models on your repository, so it may be more obvious there). Other than that the docstring doesn't add any value: this is the class Tweet
in the module models
, sure this is some kind of """Tweet model"""
â¦
I also don't understand why you allow for variable number of arguments (using *args
) but only ever use args[0]
: youâÂÂre not even guaranteed that there is at least 1 element in args
. Better use an explicit argument here.
Lastly, you should use super()
instead of explicitly calling the parent class. It doesn't matter much in such case but it's a good habit to get into if you ever use multiple inheritance (or just in case you decide to add a layer of abstraction between Tweet
and list
).
Proposed improvements:
import operator
class Tweets(list):
def __init__(self, tweets, filter_by_unique=False, **kwargs):
if filter_by_unique:
tweets = filter_unique(tweets)
super().__init__(tweets, **kwargs)
@property
def sorted_by_time(self):
return sorted(self, key=operator.attrgetter('created_at'))
@property
def oldest(self):
return min(self, key=operator.attrgetter('created_at'))
@property
def newest(self):
return max(self, key=operator.attrgetter('created_at'))
The Author
class feels very messy. You extract a handful of information out of your tweets stored as attributes, but half of them end up unused.
Since you're only interested in counting some stats about your author, you should store only those. And to help you along the way (especially counting the number of tweets of a particular author), you can:
- Sort the array of tweets by author names before processing it;
Group the array by author names to gather together the tweets of a single author.
Note that I don't know tweepy
enough to know if there is a more "unique" identification method than the name of the author. (I do imagine that several author can share a single name, so there must be some.)
Proposed improvements:
import operator
class Authors(object):
def __init__(self, tweets):
unicity_key = operator.attrgetter('author.name')
tweets = sorted(tweets, key=unicity_key)
self.followers_count =
self.following_count =
self.total_tweets =
self.tweets_count =
for _, author_tweets in itertools.groupby(tweets, key=unicity_key):
author_tweets = list(author_tweets)
author = author_tweets[0].author
self.followers_count[author.name] = author.followers_count
self.following_count[author.name] = author.friends_count
self.total_tweets[author.name] = author.statuses_count
self.tweets_count[author.name] = len(author_tweets)
def hbar_plot(self, ax, measurement='Followers', color=(0,0,1,1), incolor_measurement=None, height=1, textsize=7, **kwargs):
measurements =
'Followers': self.followers_count,
'Following': self.following_count,
'Total Tweets': self.total_tweets,
'Sample Tweets': self.tweets_count,
author_measurement = measurements[measurement]
sorted_authors = sorted(author_measurement, key=author_measurement.__getitem__)
if isinstance(color, str):
color = mplcolors.hex2color(mplcolors.cnames[color])
if incolor_measurement is not None:
color_measurement = measurements[incolor_measurement]
minor_max = max(color_measurement.values())
colors = [(*color[:3], color_measurement[author] / minor_max) for author in sorted_authors]
measurement = ' (color: )'.format(measurement, incolor_measurement)
else:
colors = [color] * len(author_measurement)
ticks, values = zip(*((i + height, author_measurement[author]) for i, author in enumerate(sorted_authors)))
ax.barh([i - height / 2 for i in ticks], values, height=height, color=colors, **kwargs)
ax.set_yticks(ticks)
ax.set_yticklabels(sorted_authors, rotation='horizontal', size=textsize)
ax.set_xlabel(measurement)
plt.tight_layout()
edited Jul 18 at 10:34
answered Jul 18 at 0:40
Mathias Ettinger
21.7k32875
21.7k32875
Thanks for the review. TheAdjustment
class may be updated with more inputs for more possible adjustments.
â Arief
Jul 18 at 10:06
@Arief Possibly. But you don't extract anything else fromkwargs
and you don't store it either; so there is no need in having it around.
â Mathias Ettinger
Jul 18 at 10:22
Whyfilter_unique
is not useful..? the argumentoutput=...
is for preference whether the output is in form ofStatus
with full features, or only thetext
. EachStatus
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
â Arief
Jul 26 at 10:44
1
@Arief In your original version, ifoutput
is text, you check for status but store text inunique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
â Mathias Ettinger
Jul 26 at 11:45
1
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and theoperator
module is implemented in C.
â Mathias Ettinger
Jul 31 at 6:41
 |Â
show 3 more comments
Thanks for the review. TheAdjustment
class may be updated with more inputs for more possible adjustments.
â Arief
Jul 18 at 10:06
@Arief Possibly. But you don't extract anything else fromkwargs
and you don't store it either; so there is no need in having it around.
â Mathias Ettinger
Jul 18 at 10:22
Whyfilter_unique
is not useful..? the argumentoutput=...
is for preference whether the output is in form ofStatus
with full features, or only thetext
. EachStatus
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.
â Arief
Jul 26 at 10:44
1
@Arief In your original version, ifoutput
is text, you check for status but store text inunique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.
â Mathias Ettinger
Jul 26 at 11:45
1
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and theoperator
module is implemented in C.
â Mathias Ettinger
Jul 31 at 6:41
Thanks for the review. The
Adjustment
class may be updated with more inputs for more possible adjustments.â Arief
Jul 18 at 10:06
Thanks for the review. The
Adjustment
class may be updated with more inputs for more possible adjustments.â Arief
Jul 18 at 10:06
@Arief Possibly. But you don't extract anything else from
kwargs
and you don't store it either; so there is no need in having it around.â Mathias Ettinger
Jul 18 at 10:22
@Arief Possibly. But you don't extract anything else from
kwargs
and you don't store it either; so there is no need in having it around.â Mathias Ettinger
Jul 18 at 10:22
Why
filter_unique
is not useful..? the argument output=...
is for preference whether the output is in form of Status
with full features, or only the text
. Each Status
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.â Arief
Jul 26 at 10:44
Why
filter_unique
is not useful..? the argument output=...
is for preference whether the output is in form of Status
with full features, or only the text
. Each Status
object has unique ID, two with same ID could be mixed in without notice.. the filtering is on ID by default.â Arief
Jul 26 at 10:44
1
1
@Arief In your original version, if
output
is text, you check for status but store text in unique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.â Mathias Ettinger
Jul 26 at 11:45
@Arief In your original version, if
output
is text, you check for status but store text in unique
; thus no filtering taking place. Now I don't know the inner details of your objects but if statuses have a unique ID, you should store and check for that ID instead.â Mathias Ettinger
Jul 26 at 11:45
1
1
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the
operator
module is implemented in C.â Mathias Ettinger
Jul 31 at 6:41
@Arief Because 1) we reuse the expression twice, so better give it a name; 2) namming a lambda defeat the purpose and is better done by defining a function; 3) instead of defining a function here and reinventing the wheel, it's better to use what's already in the language; 4) lambdas are slow anyway and the
operator
module is implemented in C.â Mathias Ettinger
Jul 31 at 6:41
 |Â
show 3 more comments
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199686%2fmodels-for-analysing-twitter-data-which-based-on-tweepy-package%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password