[PYTHON] MongoDB Shortest Introduction (1) Installed & started on EC2 & suddenly inserted tens of thousands of records

Install & start MongoDB & bulk insert

I wanted to get it done in two lines. MongoDB was also available in my existing EPEL repository, but that package didn't work.

~~yum install --enablerepo=epel mongodb-org~~

Use the official repository. The repository settings posted by various people around the web are unreliable, so go with the official one.

Even when the EPEL package installs, running it fails with an error:

/usr/bin/mongod: symbol lookup error: /usr/bin/mongod: undefined symbol: _ZN7pcrecpp2RE4InitEPKcPKNS_10RE_OptionsE

This seems to be specific to EC2. I don't want to swap out /lib64/libpcre.so.0.0.1 or anything like that, so installing from the official repository is the better option.

http://stackoverflow.com/questions/20872774/epel-mongodb-will-not-start-on-ec2-amazon-ami

Step by step, from adding the repository to starting the service

This time I followed the official page for Amazon Linux:

https://docs.mongodb.org/manual/tutorial/install-mongodb-on-amazon/

Add the repository file to yum.repos.d

sudo vi /etc/yum.repos.d/mongodb.repo

[mongodb]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.2/x86_64/
gpgcheck=0
enabled=0

Installation

sudo yum install -y --enablerepo=mongodb mongodb-org

Service start

sudo service mongod start

Yes, it's up.

Console launch

mongo

Try it out

The official documentation for the commands is here:

https://docs.mongodb.org/manual/reference/command/

Create data: put JSON-format data into honyarara

The initial database is called test, and you can switch to another one with use "dbname". A database that doesn't exist yet is created when data is first stored in it. I like that kind of thing. By the way, honyarara is the name of the collection, which is MongoDB's term for a set of documents.

db.honyarara.insert(
   {
     item: "test"
   }
)

Try to find the data: "honyarara, give me everything, please"

db.honyarara.find()

It's in there.

Look for documents whose item is test

db.honyarara.find({"item":"test"})

Found.
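
Since the rest of this post drives MongoDB from Python, here is a minimal sketch of the same insert-and-find steps using pymongo. The helper name insert_and_find and the main wrapper are made up for illustration, and the connection assumes mongod is running on localhost with the default port.

```python
def insert_and_find(collection):
    """Insert one document, then return every document whose item is "test"."""
    collection.insert_one({"item": "test"})
    # The filter is a dict, the Python counterpart of a query document in the shell.
    return list(collection.find({"item": "test"}))


def main():
    # Imported here so the helper above can be tried without pymongo installed.
    import pymongo

    client = pymongo.MongoClient()  # assumption: mongod on localhost:27017
    # The test database and honyarara collection are created on the first write.
    collection = client["test"]["honyarara"]
    for doc in insert_and_find(collection):
        print(doc)
```

Calling main() with mongod running should print the matching documents, each with an _id field that MongoDB adds automatically.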

Finished!

quit()

Bulk insert

The mongo shell feels like JavaScript. Bulk operations use a command like this:

db.collection.bulkWrite(
   [
      { insertOne : { "document" : { name : "sue", age : 26 } } },
      { insertOne : { "document" : { name : "joe", age : 24 } } },
      { insertOne : { "document" : { name : "ann", age : 25 } } },
      { insertOne : { "document" : { name : "bob", age : 27 } } },
      { updateMany: {
         "filter" : { age : { $gt : 25} },
         "update" : { $set : { "status" : "enrolled" } }
         }
      },
      { deleteMany : { "filter" : { "status" : { $exists : true } } } }
   ]
)

Writing that much JSON by hand isn't practical, so instead I'll decompress a gzip-compressed CSV and insert tens of thousands of records as it is read.

Bulk insert every 1000 rows while reading a gz-compressed text file with Python

Install pymongo, since the script uses it

pip install pymongo

Run the script to read the CSV file and save it to mongodb

python insert.py "filename.csv.gz"

Wow, that's fast! It's 6 million records, and it's going smoothly.
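
To try the script without a real dataset, a small gzip-compressed CSV can be generated first. This is only a sketch: the helper names write_sample and read_sample are made up, and the three columns are assumed to be name, age, and gender with no header row, matching the column names used in insert.py.

```python
import csv
import gzip


def write_sample(path, rows):
    """Write rows as a gzip-compressed, comma-separated file without a header."""
    # "wt" opens the compressed stream in text mode; newline="" suits the csv module.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_sample(path):
    """Read the file back as a list of rows, each a list of strings."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

For example, write_sample("filename.csv.gz", [["sue", "26", "f"], ["joe", "24", "m"]]) produces a two-row file (the values are invented) that can be passed to the script.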

insert.py


# coding: utf-8
# Written for Python 3
import csv
import gzip
import sys

import pymongo

# Get the client (connects to localhost:27017 by default)
client = pymongo.MongoClient()
# Get the database
db = client.test
# Get the collection
mongo = db.hoge

# The first argument is the path to the gzip-compressed CSV file
infilename = sys.argv[1]
# Column names, in the order they appear in the CSV file
keynames = ('name', 'age', 'gender')
# Rows accumulate in this list and are bulk inserted once 1000 have piled up
items = []
# Open the gzip file in text mode ('rt') so that csv.reader receives strings
with gzip.open(infilename, 'rt', newline='') as f:
    # Read the CSV; change the delimiter argument to match the file
    reader = csv.reader(f, delimiter=",")
    for row in reader:
        # Pair the column names with the values of each row to make a dictionary
        data = dict(zip(keynames, row))
        # Add it to the list
        items.append(data)
        # When 1000 rows have accumulated, flush them and reset the list
        if len(items) == 1000:
            result = mongo.insert_many(items)
            # result.inserted_ids holds the IDs of the inserted documents
            # print(result.inserted_ids)
            items = []
# If there is a remainder at the end, insert it too
if items:
    mongo.insert_many(items)
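
The "accumulate 1000, flush, reset" pattern in the script can also be pulled out into small reusable functions. The following is a sketch, not the script's actual structure: batches, rows_to_docs, and insert_in_batches are made-up names, and the collection argument is assumed to be any object with an insert_many method, such as a pymongo collection.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def rows_to_docs(rows, keynames):
    """Pair each CSV row with the column names to build one dict per row."""
    return (dict(zip(keynames, row)) for row in rows)


def insert_in_batches(collection, rows, keynames, size=1000):
    """Call insert_many once per batch and return the number of documents sent."""
    total = 0
    for chunk in batches(rows_to_docs(rows, keynames), size):
        collection.insert_many(chunk)
        total += len(chunk)
    return total
```

Because the last batch is simply shorter than the others, no separate "insert the remainder" step is needed, and rows can be a csv.reader so the file is still processed as it is read.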

