I've collected the rewrites I have used most often since I started writing Python.
namedtuple
I use a class when I want to reuse some structure. However, I never liked it much because the code gets long.
class Twitter:
    def __init__(self, account, user, followers, followings, tweets):
        self.account = account
        self.user = user
        self.followers = followers
        self.followings = followings
        self.tweets = tweets

    def __repr__(self):
        return f"{type(self).__name__}(account={repr(self.account)}, user={repr(self.user)}, followers={repr(self.followers)}, followings={repr(self.followings)}, tweets={repr(self.tweets)})"

t = Twitter("Yuriko Koike", "@ecoyuri", 790000, 596, 3979)
print(t)
Twitter(account='Yuriko Koike', user='@ecoyuri', followers=790000, followings=596, tweets=3979)
With namedtuple, the same thing can be written as follows:
from collections import namedtuple
Twitter = namedtuple('Twitter', 'account user followers followings tweets')
t = Twitter('Yuriko Koike', '@ecoyuri', 790000, 596, 3979)
print(t)
# Output is the same as above
That makes the code much shorter, so I find it very useful.
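One thing to keep in mind when swapping a class for namedtuple is that the result is still a tuple, so instances are immutable. The following is a minimal sketch of that behaviour using the same Twitter definition as above; the updated follower count of 800000 is only an illustrative value, not real data.

```python
from collections import namedtuple

Twitter = namedtuple('Twitter', 'account user followers followings tweets')
t = Twitter('Yuriko Koike', '@ecoyuri', 790000, 596, 3979)

# Fields can be read by name or by position.
by_name = t.followers
by_index = t[2]

# Instances are immutable: assigning to a field raises AttributeError.
try:
    t.followers = 800000
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance with the given fields changed.
t2 = t._replace(followers=800000)  # 800000 is an illustrative value

# _asdict() converts the instance to a dict keyed by field name.
d = t._asdict()

print(mutated, t2.followers, d['user'])
```

If the fields need to be updated in place, a regular class is probably still the better fit.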
yield
For example, consider the following code.
import re

def omit_stopwords(tweets):
    omitted_tweets = []
    # Remove URLs, @{user name} mentions, and #{tag name} hashtags
    reg = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+|[@@][A-Za-z0-9._-]+|[##][一-龥_ぁ-ん_ァ-ヺーa-zA-Za-zA-Z0-9]+'
    for t in tweets:
        text_mod = re.sub(reg, '', t['text'])
        omitted_tweets.append(text_mod)
    return omitted_tweets

# get_tweets() is a function that returns data in the format [{'text': {tweet1}}, {'text': {tweet2}}, ..., {'text': {tweetN}}]
ots = omit_stopwords(get_tweets())
for ot in ots:
    print(f"analyzing the tweet: {ot}")
analyzing the tweet: Today's live stream from 18:45 will be joined by Governor Yoshimura of Osaka Prefecture. ~~~
...
analyzing the tweet: ~~~. We will continue to conduct field surveys to prevent the spread of infection.
Tweet data is usually large, so omitted_tweets becomes a fairly large list, which is bad for both memory and speed. In that case, you can write:
def omit_stopwords(tweets):
    # Remove URLs, @{user name} mentions, and #{tag name} hashtags
    reg = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+|[@@][A-Za-z0-9._-]+|[##][一-龥_ぁ-ん_ァ-ヺーa-zA-Za-zA-Z0-9]+'
    for t in tweets:
        text_mod = re.sub(reg, '', t['text'])
        yield text_mod

ots = omit_stopwords(get_tweets())
for ot in ots:
    print(f"analyzing the tweet: {ot}")
By using yield instead of return, the replacement process in omit_stopwords is not executed until the for statement iterates over the result, which should keep memory usage down. As evidence, printing the variable ots gives
<generator object omit_stopwords at 0x10f957468>
which shows that it is a generator object.
You can also pull the items out one at a time with next():
print(f"analyzing the tweet: {next(ots)}")
print(f"analyzing the tweet: {next(ots)}")
print(f"analyzing the tweet: {next(ots)}")
# ...
(If you do this after the for loop has finished, it raises StopIteration because the generator is already used up.)
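To make the lazy evaluation described above visible, here is a small sketch. It is not the original code: the pattern is simplified to strip only @mentions, the function name omit_mentions and the sample list are hypothetical, and the processed list exists only to record when each tweet is actually handled.

```python
import re

# Simplified pattern (an assumption for this sketch): strips only @mentions.
MENTION = re.compile(r'@[A-Za-z0-9._-]+')

processed = []  # records which tweets have actually been processed


def omit_mentions(tweets):
    for t in tweets:
        processed.append(t['text'])
        yield MENTION.sub('', t['text']).strip()


# Hypothetical sample data in the same shape that get_tweets() returns.
sample = [
    {'text': '@alice hello'},
    {'text': 'good morning @bob'},
    {'text': 'no mention'},
]

gen = omit_mentions(sample)
count_before = len(processed)     # nothing has run yet

first = next(gen)                 # processes exactly one tweet
count_after_one = len(processed)

rest = list(gen)                  # consumes the remaining items
leftover = next(gen, None)        # exhausted: the default is returned instead of raising StopIteration

print(count_before, first, count_after_one, rest, leftover)
```

Passing a default as the second argument to next() is one way to avoid the StopIteration error once the generator has been used up.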
List comprehension
reg = r'[@@][A-Za-z0-9._-]+'
target_tweets = []
# Extract only tweets that do not contain @{user name}
for t in get_tweets():
    if not re.search(reg, t['text']):
        target_tweets.append(t)
The code above can be rewritten as
reg = r'[@@][A-Za-z0-9._-]+'
target_tweets = [t for t in get_tweets() if not re.search(reg, t['text'])]
which is much cleaner. This form is often used when you want to create one list from another, so I want to use it actively.
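As a quick check that the loop and the comprehension really produce the same result, here is a minimal sketch using hypothetical sample data in place of get_tweets(). It also shows the parenthesised form (a generator expression), which applies the same filter lazily, in the same spirit as the yield section.

```python
import re

reg = r'[@@][A-Za-z0-9._-]+'

# Hypothetical sample data in the same shape that get_tweets() returns.
tweets = [
    {'text': 'hello @alice'},
    {'text': 'plain tweet'},
    {'text': '@bob hi'},
]

# Loop version.
loop_result = []
for t in tweets:
    if not re.search(reg, t['text']):
        loop_result.append(t)

# List comprehension: builds the whole list at once.
target_tweets = [t for t in tweets if not re.search(reg, t['text'])]

# Generator expression: same filter, evaluated only when iterated.
lazy_targets = (t for t in tweets if not re.search(reg, t['text']))
texts = [t['text'] for t in lazy_targets]

print(loop_result == target_tweets, texts)
```

The list form is convenient when the result is reused several times; the generator form may be preferable when the data is large and only iterated once.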
I may add more items as appropriate.