Introduction

I've collected the rewrites I often reach for when writing Python.

namedtuple

A class is the usual choice when you want to reuse some structure. However, I didn't like it very much because the code gets long.

class Twitter:
	def __init__(self, account, user, followers, followings, tweets):
		self.account = account
		self.user = user
		self.followers = followers
		self.followings = followings
		self.tweets = tweets

	def __repr__(self):
		return f"{type(self).__name__}(account={repr(self.account)}, user={repr(self.user)}, followers={repr(self.followers)}, followings={repr(self.followings)}, tweets={repr(self.tweets)})"

t = Twitter("Yuriko Koike", "@ecoyuri", 790000, 596, 3979)
print(t)

Twitter(account='Yuriko Koike', user='@ecoyuri', followers=790000, followings=596, tweets=3979)

The same thing can be written with namedtuple:

from collections import namedtuple

Twitter = namedtuple('Twitter', 'account user followers followings tweets')
t = Twitter('Yuriko Koike', '@ecoyuri', 790000, 596, 3979)
print(t)

# Output is the same as above

Short code
Simple and easy to understand
The contents can be displayed with print()

So I think it is a very good option.
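
Beyond printing, a namedtuple also behaves like a regular tuple and comes with a few helper methods. The following is a minimal sketch using the same Twitter fields as above; `_replace` and `_asdict` are standard namedtuple methods, and the updated follower count is a made-up value for illustration.

```python
from collections import namedtuple

Twitter = namedtuple('Twitter', 'account user followers followings tweets')
t = Twitter('Yuriko Koike', '@ecoyuri', 790000, 596, 3979)

# Fields are readable by name or by position, like a regular tuple.
print(t.followers)
print(t[1])

# Unpacking works because a namedtuple is still a tuple.
account, user, followers, followings, tweets = t

# Instances are immutable: assigning to a field raises AttributeError.
try:
    t.followers = 800000
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new instance with some fields changed (hypothetical value).
t2 = t._replace(followers=800000)

# _asdict converts the record to a dict, which is handy for JSON output.
d = t._asdict()
```

One trade-off to keep in mind: because instances are immutable, a regular class may still fit better when fields need to change in place.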

yield

For example, consider the following code.

import re

def omit_stopwords(tweets):
	omitted_tweets = []
	# Remove URLs, @{user name}, and #{tag name}
	reg = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+|[@＠][A-Za-z0-9._-]+|[#＃][一-龥_ぁ-ん_ァ-ヺーａ-ｚＡ-Ｚa-zA-Z0-9]+'
	for t in tweets:
		text_mod = re.sub(reg,'',t['text'])
		omitted_tweets.append(text_mod)
	return omitted_tweets

# get_tweets() is a function that returns data in the format [{'text': tweet1}, {'text': tweet2}, ..., {'text': tweetN}]
ots = omit_stopwords(get_tweets())

for ot in ots:
	print(f"analyzing the tweet: {ot}")

analyzing the tweet: Today's live stream from 18:45 will be joined by Governor Yoshimura of Osaka Prefecture. ~~~
...
analyzing the tweet: ~~~. We will continue to conduct field surveys to prevent the spread of infection.

Tweet data is usually large, so omitted_tweets becomes a fairly large list, which is bad for both memory use and speed. In such cases, you can write:

def omit_stopwords(tweets):
	# Remove URLs, @{user name}, and #{tag name}
	reg = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+|[@＠][A-Za-z0-9._-]+|[#＃][一-龥_ぁ-ん_ァ-ヺーａ-ｚＡ-Ｚa-zA-Z0-9]+'
	for t in tweets:
		text_mod = re.sub(reg,'',t['text'])
		yield text_mod

ots = omit_stopwords(get_tweets())

for ot in ots:
	print(f"analyzing the tweet: {ot}")

By using yield instead of return as shown, the replacement in omit_stopwords first runs inside the for statement, one tweet at a time, so memory use is apparently kept down. As proof, if you print the variable ots, you get:

<generator object omit_stopwords at 0x10f957468>

As shown, ots is a generator, so you can call __next__() on it:

print(f"analyzing the tweet: {ots.__next__()}")
print(f"analyzing the tweet: {ots.__next__()}")
print(f"analyzing the tweet: {ots.__next__()}")
#・ ・ ・

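This way you can output the data one item at a time. (If you do this after running the for loop, a StopIteration error is raised because the generator has already been used up.) The built-in next(ots) does the same thing and is the more common spelling.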
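
To see the lazy behavior directly, here is a small sketch. It assumes a simplified pattern that only strips @mentions, a hypothetical `omit_mentions` function, and made-up sample data in the same shape as the get_tweets() result; the `processed` list exists only to make visible when the generator body actually runs.

```python
import re

# Simplified pattern for illustration: matches only @mentions.
MENTION = re.compile(r'[@＠][A-Za-z0-9._-]+')

processed = []  # records which tweets have actually been processed

def omit_mentions(tweets):
    for t in tweets:
        processed.append(t['text'])  # side effect that makes laziness visible
        yield MENTION.sub('', t['text'])

# Hypothetical sample data in the same shape as get_tweets() returns.
sample = [{'text': '@alice hello'}, {'text': 'no mention'}, {'text': 'hi @bob'}]

gen = omit_mentions(sample)
before = len(processed)       # nothing has run yet at this point

first = next(gen)             # runs the body up to the first yield
after_first = len(processed)  # exactly one tweet has been processed

rest = list(gen)              # consumes the remaining items

# The generator is now exhausted. A bare next(gen) would raise StopIteration,
# but passing a default value returns that default instead.
leftover = next(gen, None)
print(before, after_first, first, rest, leftover)
```

Passing a default to next() is one way to avoid the error on an exhausted generator; if the data needs to be walked more than once, calling the generator function again or building a list may be simpler.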

List comprehensions

reg = r'[@＠][A-Za-z0-9._-]+'
target_tweets = []
# Extract only tweets that do not contain @{user name}
for t in get_tweets():
	if not re.search(reg, t['text']):
		target_tweets.append(t)

The code above can be written as:

reg = r'[@＠][A-Za-z0-9._-]+'
target_tweets = [t for t in get_tweets() if not re.search(reg, t['text'])]

This is much cleaner. Comprehensions are often used when you want to create another list from an existing list.

Less code
Fast (apparently)

So I want to use it actively.
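
The speed claim is easy to check on your own machine. The sketch below is one way to do it, assuming a made-up list of 1,000 tweet dicts in place of get_tweets() and two hypothetical helper functions; timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

reg = re.compile(r'[@＠][A-Za-z0-9._-]+')

# Hypothetical stand-in for get_tweets(): odd-numbered tweets contain a mention.
tweets = [{'text': f'@user{i} hi' if i % 2 else f'tweet {i}'} for i in range(1000)]

def with_loop():
    result = []
    for t in tweets:
        if not reg.search(t['text']):
            result.append(t)
    return result

def with_comprehension():
    return [t for t in tweets if not reg.search(t['text'])]

# Both versions produce the same list.
same = with_loop() == with_comprehension()

# Measure each version; results vary between runs and environments.
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print(f"loop: {loop_time:.3f}s, comprehension: {comp_time:.3f}s")

# The same syntax also builds dicts (and sets) from an iterable.
lengths = {t['text']: len(t['text']) for t in tweets[:3]}
```

For very large inputs, swapping the square brackets for parentheses gives a generator expression, which yields items lazily instead of building the whole list.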

Conclusion

I may add more items as appropriate.

A note for writing Python-like code
