Set up a scraping environment in Java. I don't want to install Chrome locally, though, so I'll let Docker (selenium/standalone-chrome) handle it.
I relied on the following pages. Thank you very much.
https://qiita.com/wizpra-koyasu/items/7b7e0938ad6d36caf4be
https://stackoverflow.com/questions/12836114/selenium-webdriver-remote-setup
https://www.seleniumhq.org/docs/03_webdriver.jsp
This is very easy. In Kitematic, click + NEW, search for standalone-chrome, and click CREATE.
Once the container starts successfully, ACCESS URL shows which port it is published on, so make a note of it. If you know how, it is probably better to set the port explicitly under Ports.
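Because the published port changes from one container to the next, it can help to build the hub URL from the host and port rather than hard-coding it. The following is only a minimal sketch: `HubUrlExample` and `hubUrl` are hypothetical names of my own, not part of Selenium, and it assumes the server exposes the usual `/wd/hub` path as in the URL used in this article.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of Selenium): builds the hub endpoint from
// the host and port shown under ACCESS URL in Kitematic.
class HubUrlExample {

    static URL hubUrl(String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        try {
            // Assumption: the server listens on the /wd/hub path.
            return new URL("http", host, port, "/wd/hub");
        } catch (MalformedURLException e) {
            // Rethrow unchecked so callers do not need their own try/catch.
            throw new IllegalArgumentException("invalid hub address", e);
        }
    }

    public static void main(String[] args) {
        // 32778 is the port that appeared in this article; yours will differ.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 32778;
        System.out.println(hubUrl(host, port));
    }
}
```

The returned `URL` can be passed straight to the `RemoteWebDriver` constructor in place of a literal.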
The code is almost identical to the official sample.
package org.openqa.selenium.example;
import java.net.MalformedURLException;
import java.net.URL;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;
public class Selenium2Example {
    public static void main(String[] args) {
        // Request a Chrome session from the remote Selenium server.
        // Notice that the remainder of the code relies on the interface,
        // not the implementation.
        DesiredCapabilities capability = DesiredCapabilities.chrome();
        WebDriver driver = null;
        try {
            // Use the port shown under ACCESS URL in Kitematic (32778 here)
            driver = new RemoteWebDriver(new URL("http://localhost:32778/wd/hub"),
                    capability);
        } catch (MalformedURLException e) {
            // Only thrown when the hub URL literal above is malformed
            e.printStackTrace();
        }
        if (driver != null) {
            try {
                // And now use this to visit Google
                driver.get("http://www.google.com");
                // Alternatively the same thing can be done like this
                // driver.navigate().to("http://www.google.com");
                // Find the text input element by its name
                WebElement element = driver.findElement(By.name("q"));
                // Enter something to search for
                element.sendKeys("Cheese!?");
                // Now submit the form. WebDriver will find the form for us from the element
                element.submit();
                // Check the title of the page
                System.out.println("Page title is: " + driver.getTitle());
                // Google's search is rendered dynamically with JavaScript.
                // Wait for the page to load, timeout after 10 seconds
                (new WebDriverWait(driver, 10)).until(new ExpectedCondition<Boolean>() {
                    public Boolean apply(WebDriver d) {
                        return d.getTitle().toLowerCase().startsWith("cheese!");
                    }
                });
                // Should see: "cheese! - Google Search"
                System.out.println("Page title is: " + driver.getTitle());
            } finally {
                // Close the browser and release the remote session,
                // even if the wait above times out
                driver.quit();
            }
        }
    }
}
Since this uses Selenium, add the library with Maven or Gradle. I used Maven.
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>hoge</groupId>
<artifactId>fuga</artifactId>
<version>0.0.1-SNAPSHOT</version>
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependencies>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>2.41.0</version>
</dependency>
</dependencies>
</project>
If you run it like this, you should get the following output. Easy.
Page title is: Cheese!? - Google Search
Page title is: Cheese!? - Google Search
Where is a screenshot stored when I take one? This is my first time using a remote WebDriver, and I wonder whether the generated image can be handled on the client side, or whether it is kept on the server and has to be collected from there. Well, I'm sure it's the former... I'll find out tomorrow.
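As far as I know, the WebDriver protocol returns a screenshot to the client as a Base64-encoded PNG in the command response, so the client decides where the file ends up. With Selenium that string would come from something like `((TakesScreenshot) driver).getScreenshotAs(OutputType.BASE64)`; on older versions such as the 2.41.0 used here, the remote driver may first need to be wrapped with `new Augmenter().augment(driver)`. The sketch below covers only the client-side half, using the standard library: `ScreenshotSaveExample`, `decode`, and `save` are hypothetical names, and the payload in `main` is a stand-in rather than a real screenshot.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper (not part of Selenium): stores a Base64 screenshot
// payload on the client machine.
class ScreenshotSaveExample {

    // Turns the Base64 text into raw image bytes. The MIME decoder is used
    // because it tolerates line breaks inside the encoded text.
    static byte[] decode(String base64Png) {
        return Base64.getMimeDecoder().decode(base64Png);
    }

    // Writes the decoded bytes to a path chosen by the client.
    static Path save(String base64Png, Path target) {
        try {
            return Files.write(target, decode(base64Png));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload: only the 8-byte PNG signature, not a viewable image.
        byte[] signature = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        String payload = Base64.getEncoder().encodeToString(signature);

        Path out = save(payload, Files.createTempFile("screenshot", ".png"));
        System.out.println("Saved " + Files.size(out) + " bytes to " + out);
    }
}
```

In a real run, `payload` would be replaced by the string returned from the driver, and the target path can be anywhere the client process is allowed to write.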