Extracting papers from ACL 2020, an international conference on natural language processing, using the arXiv API from Python

Introduction

ACL 2020 papers have started to appear on arXiv, so I used the arXiv API to list them (for my own reference).

Environment

$ pip install "arxiv<1.0"  # arxiv.query(), used below, was removed in arxiv 1.0
import arxiv
import pandas as pd

Search conditions

Extract the papers in the Computation and Language (cs.CL) category whose comment field contains "ACL2020" or "ACL 2020".

I referred to the following page for how to use the API:
- Get paper information using arXiv API in Python, download PDF

# Search query: comment field (co:) mentions ACL 2020, category (cat:) is cs.CL
l = arxiv.query(query='(co:"ACL2020" OR co:"ACL 2020") AND cat:cs.CL', sort_by='submittedDate')

# The results included some unrelated papers, so load them into a DataFrame and filter on the comment.
df = pd.json_normalize(l)  # pd.io.json.json_normalize is deprecated since pandas 1.0
acl_df = df[df["arxiv_comment"].str.contains("ACL", na=False)]
acl2020_df = acl_df[acl_df["arxiv_comment"].str.contains("2020", na=False)]

len(acl2020_df)
# 102 papers matched
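
The two `str.contains` checks above keep any comment that mentions both "ACL" and "2020" anywhere, so a comment such as "NAACL 2020" also passes. As a minimal sketch, and not what was used to build the list in this post, the check could be tightened into a single regular expression applied to the raw records. The helper names `is_acl2020` and `filter_acl2020` and the sample records are hypothetical; the only assumption about the API is that each result is a dict with an `arxiv_comment` key, as the DataFrame column above suggests.

```python
import re

# Matches "ACL 2020", "ACL2020" and "ACL-2020". The leading word boundary
# keeps "NAACL 2020" or "EACL 2020" from matching.
ACL2020_PATTERN = re.compile(r"\bACL[\s-]?2020\b", re.IGNORECASE)


def is_acl2020(comment):
    """Return True if an arXiv comment string mentions ACL 2020."""
    return bool(comment) and ACL2020_PATTERN.search(comment) is not None


def filter_acl2020(records):
    """Keep only the records whose 'arxiv_comment' mentions ACL 2020."""
    return [r for r in records if is_acl2020(r.get("arxiv_comment"))]


# Hypothetical records, shaped like the dicts assumed above.
sample = [
    {"title": "Paper A", "arxiv_comment": "Accepted at ACL 2020"},
    {"title": "Paper B", "arxiv_comment": "NAACL 2020 camera-ready"},
    {"title": "Paper C", "arxiv_comment": None},
    {"title": "Paper D", "arxiv_comment": "To appear in ACL2020 (short paper)"},
    {"title": "Paper E", "arxiv_comment": "Extended version of our ACL 2019 paper, 2020"},
]

kept = [r["title"] for r in filter_acl2020(sample)]
print(kept)
```

This version is stricter than the DataFrame filter, so running it on the real results would likely give a count different from the one reported above.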

List of papers (as of April 26, 2020)

I used the arXiv API to extract the papers accepted (or submitted) to ACL 2020. System demonstration papers, Student Research Workshop papers, and long and short papers are all mixed together.

The list may contain some errors. Use it for reference only.

References

- I tried to list the ones in arXiv in ACL 2019
- Get paper information using arXiv API in Python, download PDF
