[Python] How to read the SNLI dataset

What is SNLI?

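- Stanford Natural Language Inference [^1]
- An annotated corpus for learning natural language inference
- Each example is a pair of sentences, a premise (`sentence1`) and a hypothesis (`sentence2`), with a manually assigned label:
  - neutral: the premise neither supports nor contradicts the hypothesis
  - contradiction: the hypothesis contradicts the premise
  - entailment: the hypothesis follows from the premise
  - -: no label (the annotators did not reach a consensus)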
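To train a classifier, the three labels are usually mapped to integer ids and the pairs labeled `-` are skipped. The sketch below shows one way to do that; the id assignment in `LABEL_TO_ID` is a hypothetical choice, since SNLI itself does not define integer ids for its labels.

```python
# Hypothetical mapping: SNLI does not define integer ids for its labels.
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def encode_label(gold_label):
    """Return the integer id for a gold label, or None for "-" (no consensus)."""
    return LABEL_TO_ID.get(gold_label)


labels = ["entailment", "neutral", "contradiction", "-"]
ids = [encode_label(label) for label in labels]
print(ids)  # [0, 1, 2, None]
```

Returning `None` for `-` makes it easy to filter those pairs out before building training batches.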

| Premise | Label | Hypothesis |
|---|---|---|
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A black race car starts up in front of a crowd of people. | contradiction | A man is driving down a lonely road. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
| A smiling costumed woman is holding an umbrella. | neutral | A happy woman in a fairy costume holds an umbrella. |

- Number of sentence pairs: about 570,000 in total
  - Training: about 550,000
  - Validation (dev): 10,000
  - Test: 10,000

- Each pair also includes parse trees of both sentences. In the .jsonl files, every line is one record in the following format:

{
	"annotator_labels": ["neutral"], 
	"captionID": "3416050480.jpg#4", 
	"gold_label": "neutral", 
	"pairID": "3416050480.jpg#4r1n", 
	"sentence1": "A person on a horse jumps over a broken down airplane.",
	"sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
	"sentence1_parse": "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))", 
	"sentence2": "A person is training his horse for a competition.", 
	"sentence2_binary_parse": "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )", 
	"sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))"
}
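Each line of a .jsonl file can be decoded with the standard `json` module. The sketch below also recovers the tokens from `sentence1_binary_parse` by dropping the bracket tokens, which works because brackets and words are separated by spaces in this field. The helper name `tokens_from_binary_parse` is hypothetical, and the record is a shortened copy of the example above with only three fields kept.

```python
import json


def tokens_from_binary_parse(parse):
    """Return the leaf tokens of a binary parse by dropping the bracket tokens."""
    return [tok for tok in parse.split() if tok not in ("(", ")")]


# Shortened record: only three of the fields from the example above.
line = (
    '{"gold_label": "neutral", '
    '"sentence1": "A person on a horse jumps over a broken down airplane.", '
    '"sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) '
    '( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )"}'
)
record = json.loads(line)
tokens = tokens_from_binary_parse(record["sentence1_binary_parse"])
print(record["gold_label"], tokens)
```

Note that the tokens are already split (the final period is its own token), so they can differ slightly from a naive `sentence1.split()`.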

Download

You can download it from the official page, "The Stanford Natural Language Inference (SNLI) Corpus", or directly from the command line:

wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
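If `wget` or `unzip` is not available (for example on Windows), the same two steps can be done with the Python standard library. This is a minimal sketch; the function name `download_and_extract` is hypothetical, and it assumes the archive unpacks into a `snli_1.0/` directory, as the paths used later in this post suggest.

```python
import urllib.request
import zipfile
from pathlib import Path

SNLI_URL = "https://nlp.stanford.edu/projects/snli/snli_1.0.zip"


def download_and_extract(url=SNLI_URL, dest_dir="."):
    """Download the zip archive at `url` into `dest_dir` (unless it is already
    there) and extract its contents into the same directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]  # e.g. "snli_1.0.zip"
    if not zip_path.exists():  # skip the download on repeated runs
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest
```

Calling `download_and_extract()` with no arguments works in the current directory, so the extracted files should end up where the `read_csv` call below expects them.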

Reading the data

The data is provided in two formats: JSON Lines (.jsonl) and tab-separated values (.txt). The .txt files can be read with pandas:

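import csv
import pandas as pd
# QUOTE_NONE: treat double quotes inside sentences as ordinary characters
df = pd.read_csv("snli_1.0/snli_1.0_train.txt", sep="\t", quoting=csv.QUOTE_NONE)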
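The .jsonl files can be read without pandas, one record per line. The sketch below keeps only the premise, hypothesis and label and skips the `-` pairs by default; the function name `read_snli_jsonl` and its `drop_unlabeled` parameter are hypothetical.

```python
import json


def read_snli_jsonl(path, drop_unlabeled=True):
    """Read an SNLI .jsonl file into (premise, hypothesis, label) tuples.

    Pairs whose gold_label is "-" (no annotator consensus) are skipped
    unless drop_unlabeled is False.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():  # tolerate blank lines, e.g. at the end of the file
                continue
            record = json.loads(line)
            if drop_unlabeled and record["gold_label"] == "-":
                continue
            rows.append((record["sentence1"], record["sentence2"], record["gold_label"]))
    return rows


# Example, assuming the archive was extracted into ./snli_1.0/:
# train = read_snli_jsonl("snli_1.0/snli_1.0_train.jsonl")
```

With pandas, `pd.read_json(path, lines=True)` loads the same file into a DataFrame, and `df[df["gold_label"] != "-"]` drops the unlabeled pairs.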

References
