I want to do some scraping.
So I'll scrape with Nuxt on Docker. On the Node side, puppeteer seems to be the recommended library for scraping, so let's do some quick scraping from Nuxt's serverMiddleware.
I was raised not to cause trouble for other people, so the target will be my own site. (No login required, feel free to use it ♪) toribure | Simple is the best brainstorming tool that can be used alone or as a team
(A little self-promotion there.) The top page shows a cute bird image (from Irasutoya), so this time I'll scrape that image and display it.
There are plenty of other articles on this topic as well, so have a look at those too. By the way, the environment was:
$ docker -v
Docker version 19.03.13-beta2, build ff3fbc9d55
$ docker-compose -v
docker-compose version 1.26.2, build eefe0d31
$ docker run --rm -it -w /app -v `pwd`:/app node yarn create nuxt-app scraping
? Project name: scraping
? Programming language: JavaScript
? Package manager: Yarn
? UI framework: None
? Nuxt.js modules: Axios
? Linting tools:
? Testing framework: None
? Rendering mode: Universal (SSR / SSG)
? Deployment target: Server (Node.js hosting)
? Development tools:
Of the modules, only axios will be used later, so I made a point of including it.
By the way, at the time of writing, the node:latest image was at 14.9.0, create-nuxt-app at v3.2.0, and nuxt at 2.14.0.
From here on, the newly created scraping/ directory is the working directory.
$ cd scraping
Dockerfile
FROM node

ENV HOME=/app \
    LANG=C.UTF-8 \
    TZ=Asia/Tokyo \
    HOST=0.0.0.0

WORKDIR ${HOME}

RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
       --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

COPY package.json ${HOME}
COPY yarn.lock ${HOME}
RUN yarn install
COPY . ${HOME}

EXPOSE 3000

CMD ["yarn", "run", "dev"]
The RUN apt-get ... part follows the Troubleshooting guide in the puppeteer repository: if a browser and the necessary fonts are not available inside the container, puppeteer will fail with an error when it tries to launch.
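As an aside, since the image now contains Google Chrome, you could in principle point puppeteer at that system browser instead of the Chromium it downloads during installation. This is only a hypothetical variant, not something this article relies on; the executablePath below assumes the default location used by the google-chrome-stable package.

```js
// Hypothetical variant: launch the system Chrome installed by the Dockerfile
// instead of puppeteer's bundled Chromium. Not used in the rest of this article.
const browser = await puppeteer.launch({
  executablePath: '/usr/bin/google-chrome-stable',
  args: ['--no-sandbox', '--disable-dev-shm-usage']
})
```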
docker-compose.yml
version: "3"
services:
  nuxt:
    build: .
    volumes:
      - .:/app
    ports:
      - 3000:3000
Once this is done, build the container once.
$ docker-compose build
Then add puppeteer with yarn.
$ docker-compose run --rm nuxt yarn add puppeteer
We will use serverMiddleware. Referring to the officially provided express-template, we'll expose the scraping logic as an API through serverMiddleware.
nuxt.config.js
export default {
  // ...
  serverMiddleware: {
    '/api': '~/api'
  }
}
This routes /api requests to ~/api/index.js, so let's create the files.
$ mkdir api
$ touch api/index.js api/scraping.js
I made two files: index.js is the receiver, and scraping.js does the actual processing. Note that index.js uses express, so if express is not already among the project's dependencies, add it with yarn add express as well.
api/index.js
const app = require('express')()
const scraping = require('./scraping')

// GET /api/get_image: scrape the image URL and return it
app.get('/get_image', async (req, res) => {
  const image = await scraping.getImage()
  res.send(image)
})

module.exports = {
  path: '/api',
  handler: app
}
When /api/get_image is accessed, this calls the getImage() method of scraping.js and returns the result.
api/scraping.js
const puppeteer = require('puppeteer')

async function getImage() {
  // --no-sandbox and --disable-dev-shm-usage are needed to run Chromium
  // inside the Docker container
  const browser = await puppeteer.launch({
    args: [
      '--no-sandbox',
      '--disable-dev-shm-usage'
    ]
  })
  const page = await browser.newPage()
  await page.goto("https://toribure.herokuapp.com/")
  // Run in the page context and pull out the src of the first img under main
  const image = await page.evaluate(() => {
    return document.getElementsByTagName("main")[0].getElementsByTagName("img")[0].src
  })
  // Close the browser so each request doesn't leak a Chromium process
  await browser.close()
  return image
}

module.exports = {
  getImage
}
It almost follows the puppeteer official README.
You can get at and manipulate page elements with page.evaluate.
If you inspect the HTML structure of the scraping target (https://toribure.herokuapp.com/) with the developer tools, you'll see there is exactly one img element under the main element, and that is the bird image we're after. (The markup is a bit messy, but the bird is cute.)
Once you know that, all that's left is to grab the element just as you would in ordinary JavaScript.
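For instance, the same extraction could also be written with querySelector; this is just a hypothetical equivalent, assuming the page structure described above.

```js
// Hypothetical equivalent of the extraction in scraping.js, using querySelector
const image = await page.evaluate(() => {
  const img = document.querySelector('main img')
  return img ? img.src : null
})
```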
This is the end of coding on the API side.
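If you want a quick sanity check before touching the front end, you could hit the endpoint directly with a throwaway script like the one below. This is a hypothetical helper, not part of the article's code, and it assumes the dev server is already running and reachable at localhost:3000.

```js
// check-api.js: hypothetical helper, not part of the article's code.
const http = require('http')

http.get('http://localhost:3000/api/get_image', (res) => {
  let body = ''
  res.on('data', (chunk) => { body += chunk })
  res.on('end', () => console.log('image URL:', body))
}).on('error', (err) => console.error('request failed:', err.message))
```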
I'm getting a bit tired at this point, so the front end will be minimal: press a button and the image appears.
pages/index.vue
<template>
  <div>
    <button @click="showBird">Scraping!!</button>
    <br>
    <img v-if="src" :src="src">
  </div>
</template>

<script>
export default {
  data() {
    return {
      src: ""
    }
  },
  methods: {
    async showBird() {
      this.src = await this.$axios.$get("/api/get_image")
    }
  }
}
</script>
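As a small optional refinement that is not in the original component, showBird() could be wrapped in a try/catch so a failed scrape doesn't fail silently; a minimal sketch:

```js
// Hypothetical variant of showBird() with basic error handling
async showBird() {
  try {
    this.src = await this.$axios.$get('/api/get_image')
  } catch (e) {
    console.error('Failed to fetch image:', e)
  }
}
```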
**Complete!!**
A cute bird came out ♪
Once you've gotten this far, the rest is just DOM manipulation: if you understand the structure of the target page and write the JavaScript for it, you can scrape pretty much anything. Some sites prohibit scraping, so please keep that in mind while you bring your ideas to life!
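For example, extending the same pattern to pull several elements at once is just more DOM code inside page.evaluate. The sketch below is hypothetical and assumes an arbitrary page with links on it.

```js
// Hypothetical sketch: collect the text and href of every link on a page,
// the same way the single image URL was extracted above
const links = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('a')).map((a) => ({
    text: a.textContent.trim(),
    href: a.href
  }))
})
```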
- [Nuxt] Scraping with Puppeteer - From data acquisition to display (serverMiddleware) - 7839
- [[Procedure explanation] Easy scraping with JavaScipt! - Take a screenshot with Puppeteer | ProgLearn](https://blog.proglearn.com/2019/06/20/javascipt%E3%81%A7%E7%B0%A1%E5%8D%98%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0%EF%BC%81-puppeteer%E3%81%A7%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%BC%E3%83%B3%E3%82%B7%E3%83%A7/)
- Scraping with Puppeteer | grgr-dkrk's blog
- I tried scraping with Docker + docker-compose + puppeteer - Qiita
- Data acquisition by Axios with Nuxt.js - Qiita