Extraction of tweet.js (json.loads and eval) (Python)

Hello, this is my first post. Since last year (2019), it has no longer been possible to download your tweets in CSV format, so as a first step I wanted to extract the posting time and text of my own tweets from tweet.js, the tweet-part files, and so on.


If you only have tweet.js:


#### **`read_js.py`**
```python

# Put tweet.js in the same directory as this script.
# If you have MeCab installed, uncomment the MeCab lines to also write word-segmented text.
import re
import datetime
import json
#import MeCab
tw_open = open("tweet.js","r",encoding="utf-8")
tw_time = open("tweet_mytext_time.txt","a",encoding="utf-8")
tw_a = open("tweet_mytext.txt","a",encoding="utf-8")
#tw_mecab = open("tweet_mytext_mecab.txt","a",encoding="utf-8")
twr = tw_open.read()
twr = re.sub("window.YTD.tweet.part0 = ","",twr)
twrj=json.loads(twr)
big=[]
small=[]
#mecab = MeCab.Tagger ("-Owakati")

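for n in range(len(twrj)):
    # json.loads has already parsed each entry into a dict;
    # the str/eval round-trip rebuilds an equal dict.
    tw=eval(str(twrj[n]["tweet"]))
    twf=str(tw["full_text"])
    # Strip URLs and line breaks from the tweet text.
    twf=re.sub(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-…]+","",twf)
    twf=twf.replace("\n","")
    twc=str(tw["created_at"])
    # created_at looks like "Sun Jan 19 06:49:10 +0000 2020".
    tim=datetime.datetime.strptime(twc,"%a %b %d %H:%M:%S %z %Y").replace(tzinfo=None)
    tim_r=str(tim).replace(" ","_")
    small=[]
    # Text before the first colon; a retweet starts with "RT @user:".
    twf_b=twf.split(":")[0]
    # Keep only tweets that are neither retweets nor mentions.
    if "RT" not in twf_b and "@" not in twf_b:
        # tim is naive here, so timestamp() interprets it as local time.
        small.append(str(tim.timestamp()).replace(".0",""))
        small.append(tim_r)
        small.append(twf)
        big.append(small)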

big.sort(key=lambda x: x[1],reverse=True)

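for num in range(len(big)):
    tw_a.write(big[num][2]+"\n")
    tw_time.write(big[num][1]+" "+big[num][2]+"\n")
    #text=big[num][2]
    #text_m = mecab.parse(text)
    #tw_mecab.write(str(text_m))

# Close the files so that everything is flushed to disk.
tw_open.close()
tw_time.close()
tw_a.close()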
```

If you also have tweet-part1.js and MeCab installed, use this version instead:


#### **`read_js.py`**
```python

import re
import datetime
import json
import MeCab
tw_open = open("tweet.js","r",encoding="utf-8")
tw1_open = open("tweet-part1.js","r",encoding="utf-8")
tw_time = open("tweet_mytext_time.txt","a",encoding="utf-8")
tw_a = open("tweet_mytext.txt","a",encoding="utf-8")
tw_mecab = open("tweet_mytext_mecab.txt","a",encoding="utf-8")
twr = tw_open.read()
tw1r = tw1_open.read()
twr = re.sub("window.YTD.tweet.part0 = ","",twr)
tw1r = re.sub("window.YTD.tweet.part1 = ","",tw1r)
twrj=json.loads(twr)
tw1rj=json.loads(tw1r)
big=[]
small=[]
mecab = MeCab.Tagger ("-Owakati")


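# Tweets from tweet.js
for n in range(len(twrj)):
    # json.loads has already parsed each entry into a dict;
    # the str/eval round-trip rebuilds an equal dict.
    tw=eval(str(twrj[n]["tweet"]))
    twf=str(tw["full_text"])
    # Strip URLs and line breaks from the tweet text.
    twf=re.sub(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-…]+","",twf)
    twf=twf.replace("\n","")
    twc=str(tw["created_at"])
    tim=datetime.datetime.strptime(twc,"%a %b %d %H:%M:%S %z %Y").replace(tzinfo=None)
    tim_r=str(tim).replace(" ","_")
    small=[]
    # Text before the first colon; a retweet starts with "RT @user:".
    twf_b=twf.split(":")[0]
    # Keep only tweets that are neither retweets nor mentions.
    if "RT" not in twf_b and "@" not in twf_b:
        small.append(str(tim.timestamp()).replace(".0",""))
        small.append(tim_r)
        small.append(twf)
        big.append(small)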

# Tweets from tweet-part1.js (same processing as above)
for n in range(len(tw1rj)):
    tw1=eval(str(tw1rj[n]["tweet"]))
    twf1=str(tw1["full_text"])
    twf1=re.sub(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-…]+","",twf1)
    twf1=twf1.replace("\n","")
    twc1=str(tw1["created_at"])
    tim1=datetime.datetime.strptime(twc1,"%a %b %d %H:%M:%S %z %Y").replace(tzinfo=None)
    tim_r1=str(tim1).replace(" ","_")
    small=[]
    twf_b1=twf1.split(":")[0]
    if "RT" not in twf_b1 and "@" not in twf_b1:
        small.append(str(tim1.timestamp()).replace(".0",""))
        small.append(tim_r1)
        small.append(twf1)
        big.append(small)

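#print(big)
# Sort by the formatted time string, newest first.
big.sort(key=lambda x: x[1],reverse=True)
for num in range(len(big)):
    tw_a.write(big[num][2]+"\n")
    tw_time.write(big[num][1]+" "+big[num][2]+"\n")
    text=big[num][2]
    # With "-Owakati", parse() returns the text split into space-separated words.
    text_m = mecab.parse(text)
    tw_mecab.write(str(text_m))

# Close the files so that everything is flushed to disk.
for f in (tw_open, tw1_open, tw_time, tw_a, tw_mecab):
    f.close()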
```

Running the script should produce files like the following.

#### **`tweet_mytext.txt`**
```text
text
.....
```


#### **`tweet_mytext_time.txt`**
```text
2020-01-19_05:47:57 text
.....
```

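The `2020-01-19_05:47:57` style of timestamp comes from parsing each tweet's `created_at` value. The sketch below isolates that step. The function name `format_created_at` is hypothetical. Unlike the script, it takes the epoch value while the datetime is still timezone-aware, because calling `timestamp()` after `replace(tzinfo=None)` makes the result depend on the machine's local timezone.

```python
from datetime import datetime

# Format of created_at in the archive, e.g. "Sun Jan 19 06:49:10 +0000 2020".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def format_created_at(created_at):
    """Return (epoch_seconds, 'YYYY-MM-DD_HH:MM:SS') for a created_at string."""
    aware = datetime.strptime(created_at, CREATED_AT_FORMAT)
    # Compute the epoch value from the timezone-aware datetime.
    epoch = int(aware.timestamp())
    # The formatted string uses the offset given in created_at (+0000, i.e. UTC).
    return epoch, aware.strftime("%Y-%m-%d_%H:%M:%S")


print(format_created_at("Sun Jan 19 06:49:10 +0000 2020"))
# → (1579416550, '2020-01-19_06:49:10')
```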
## Description

Apart from the basics, what gave me the most trouble was how to handle the quotes in the JSON:


#### **`python`**
```python

twrj=json.loads(twr)
tw=eval(str(twrj[n]["tweet"]))
```

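Here, `open` and `read` load the whole file as one string, the extra header (`window.YTD.tweet.part0 = `) is removed, and `json.loads` converts that string into Python lists and dictionaries.
The script then passes each entry's `tweet` value through `str` and `eval` to get it as a dictionary. Note that `json.loads` already parses nested objects, so `twrj[n]["tweet"]` is a dictionary before `eval` is called; the round-trip returns an equal dictionary, and indexing directly would work as well.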
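A small sketch of this step follows. It uses a made-up one-tweet string in the same shape as tweet.js (the sample data is an assumption for illustration, not taken from a real archive) and compares the dictionary returned by `json.loads` with the result of the `str`/`eval` round-trip.

```python
import json

# Made-up sample in the same shape as tweet.js (assumption for illustration).
raw = (
    'window.YTD.tweet.part0 = [ {"tweet": {"retweeted": false, '
    '"full_text": "hello", "created_at": "Sun Jan 19 06:49:10 +0000 2020"}} ]'
)

# Drop everything up to and including the first "=".
payload = raw.split("=", 1)[1]
entries = json.loads(payload)

tweet = entries[0]["tweet"]       # already a dict after json.loads
roundtrip = eval(str(tweet))      # what the script does; gives an equal dict

print(type(tweet).__name__, tweet["full_text"], tweet == roundtrip)
# → dict hello True
```

Since `eval` executes whatever text it is given, plain indexing is the safer choice when the file comes from somewhere you do not fully trust.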



#### **`python`**
```python

{'tweet': {
    'retweeted': False,
    'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    'entities': {
        'hashtags': [],
        'symbols': [],
        'user_mentions': [{'name': 'Saidjon', 'screen_name': 'noppo6', 'indices': ['3', '10'], 'id_str': '240638809', 'id': '240638809'}],
        'urls': []},
    'display_text_range': ['0', '140'],
    'favorite_count': '0',
    'id_str': '1218787465110024192',
    'truncated': False,
    'retweet_count': '0',
    'id': '1218787465110024192',
    'created_at': 'Sun Jan 19 06:49:10 +0000 2020',
    'favorited': False,
    'full_text': 'RT @noppo6:Samarkand is blue because it was painted blue to attract tourists after independence. Almost the level of ruins destruction. Congratulations to the Japanese guidebooks and media that praise Samarkand as the "blue city". I think this is good for this simple Samarkand.\nEven if it\'s, Shahizi ...',
    'lang': 'ja'}}
```

The block above is an example of one entry after parsing.

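Q. How do you tell whether a tweet is your own?
A. A retweet's text starts with `RT @user:`, so the part before the first colon contains both `RT` and `@`. The script splits the text at `:` and skips the tweet if element 0 contains `RT` or `@`, which also drops replies that begin with a mention.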
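As a minimal sketch, the same check can be written as a small function and tried on a few made-up texts. The name `is_own_tweet` and the sample strings are hypothetical. The last sample shows a limitation of this simple rule: any text with `RT` before the first colon is skipped, even when it is not a retweet.

```python
import re

# Same URL pattern as in the script.
URL_PATTERN = re.compile(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-…]+")


def is_own_tweet(full_text):
    """Apply the script's RT/@ check to one tweet text."""
    text = URL_PATTERN.sub("", full_text).replace("\n", "")
    # Element 0 is the whole text when there is no colon at all.
    head = text.split(":")[0]
    return "RT" not in head and "@" not in head


samples = [
    "RT @noppo6: Samarkand is blue because ...",  # retweet
    "@someone thanks!",                           # reply
    "Good morning https://example.com/x",         # own tweet with a URL
    "ART: a new exhibition",                      # skipped: "RT" is inside "ART"
]
for s in samples:
    print(is_own_tweet(s), s)
```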




That is all I have written so far, but I would be happy if it helps. I'm always on Twitter ([@kenkensz9](https://twitter.com/kenkensz9)), so feel free to ask if you have any questions.
I hope you like it!

