About me: I graduated from the Faculty of Arts and Sciences in March 2013, worked as an infrastructure engineer at a systems integrator (SIer) on Linux and Azure until April 2018, and have been a web engineer since May 2018. I changed jobs because I wanted to be able to develop software, and I am now studying Java at a highly regarded company.
Now that I am learning how to develop, I want to do my best to acquire full-stack skills.
Much of what I write may be at a beginner's level, but I hope that by continuing to produce output, my engineering skills will steadily build up.
I want to scrape a certain web page: access its search screen, search by keyword, and then scrape the resulting data.
I want to access it like this:
// Search request: send the headers, cookies and form data (keyword etc.) to the search page
Connection.Response response = Jsoup.connect(Url)
.headers(header)
.cookies(cookies)
.data(formData)
.timeout(3000)
.execute();
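Because no method() is set, Jsoup sends this request as a GET, and for a GET it appends the data() pairs to the URL as a URL-encoded query string instead of putting them in a request body. A minimal stdlib sketch of that encoding step, assuming hypothetical form fields `keyword` and `page` (the real search form's field names are not shown here); Jsoup's own encoding may differ in small details.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringSketch {

    // Encode each key/value pair as application/x-www-form-urlencoded
    // and join the pairs with '&'.
    static String toQueryString(Map<String, String> formData) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : formData.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // Append the encoded data to the URL, using '&' if it already has a query
    // (e.g. ?validation_no=...), otherwise '?'.
    static String withQuery(String baseUrl, Map<String, String> formData) {
        if (formData.isEmpty()) {
            return baseUrl;
        }
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + toQueryString(formData);
    }

    public static void main(String[] args) {
        // Hypothetical search form fields, for illustration only.
        Map<String, String> formData = new LinkedHashMap<>();
        formData.put("keyword", "java jsoup");
        formData.put("page", "1");
        System.out.println(withQuery("https://hoge.com/top.aspx", formData));
    }
}
```

Spaces become `+` and reserved characters are percent-encoded, which is why a raw keyword can be passed to data() as-is.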
The URL looks like this.
https://hoge.com/fuga.aspx?validation_no=123456789
Of course, accessing this URL as-is results in an error, and I could not figure out how to obtain a valid validation_no.
After struggling for about six hours, I was looking at the browser's developer tools when I noticed that the page (*) I access just before
https://hoge.com/fuga.aspx?validation_no=123456789
returns a "Location" header! Looking it up, this header specifies the redirect destination. That means I don't have to work out validation_no myself at all!?
(*) https://hoge.com/top.aspx
So, first access
https://hoge.com/top.aspx
and read its Location header with the code below.
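// Request top.aspx without following the redirect, so the 3xx response itself comes back
Connection.Response res = Jsoup.connect(Url)
.headers(header)
.timeout(3000)
.cookies(cookies)
.method(Connection.Method.GET)
.followRedirects(false) // return the redirect response instead of following it
.execute();
// The Location header holds the redirect destination (the URL with validation_no)
System.out.println(res.header("Location"));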
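The same idea can be sketched without Jsoup, using the JDK's `java.net.http.HttpClient` (Java 11+) with redirects disabled. The pure `redirectTarget` method captures the two checks that matter: only a 3xx status is a redirect, and the Location value may be relative, so it is resolved against the requested URL. The local `HttpServer`, its `/top.aspx` path and the `validation_no` value are hypothetical stand-ins for the real site, not its actual behavior; in Jsoup the corresponding values would come from `res.statusCode()` and `res.header("Location")`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

public class LocationHeaderSketch {

    // Given the requested URL, the status code and the Location header (if any),
    // return the absolute redirect target, or empty if this is not a redirect.
    static Optional<URI> redirectTarget(URI requested, int status, Optional<String> location) {
        if (status < 300 || status >= 400) {
            return Optional.empty();
        }
        // Location may be relative (e.g. "/fuga.aspx?..."), so resolve it
        // against the URL that was requested.
        return location.map(requested::resolve);
    }

    // One GET with redirects disabled, comparable to Jsoup's followRedirects(false).
    static Optional<URI> fetchRedirectTarget(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpResponse<Void> res = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.discarding());
        return redirectTarget(uri, res.statusCode(), res.headers().firstValue("Location"));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical local stand-in for hoge.com so the sketch runs offline.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/top.aspx", exchange -> {
            exchange.getResponseHeaders().add("Location", "/fuga.aspx?validation_no=123456789");
            exchange.sendResponseHeaders(302, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            URI top = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/top.aspx");
            System.out.println(fetchRedirectTarget(top).map(URI::toString).orElse("no redirect"));
        } finally {
            server.stop(0);
        }
    }
}
```

Resolving against the requested URL matters if the real site sends a relative Location: passing a bare "/fuga.aspx?..." string back to a client would otherwise fail.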
This gives you the URL with validation_no, which you can then use to do whatever you wanted to do.
Once you see it, it is obvious at a glance, but I got completely stuck on it...
Reference: https://stackoverflow.com/questions/16243455/capture-header-location-with-jsoup-or-other-html-parser
I had been calling Jsoup a second time with the Location I received, but it turns out that simply setting
.followRedirects(true)
from the start would have been enough.
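For comparison, a minimal sketch of letting the client follow the redirect automatically. It again uses the JDK's `java.net.http.HttpClient` (Java 11+) and a local `com.sun.net.httpserver` stand-in instead of Jsoup and the real hoge.com, so the paths and the `validation_no` value are hypothetical. `Redirect.NORMAL` plays the role of `followRedirects(true)`, and `HttpResponse.uri()` reports the URL the final response came from; Jsoup's response `url()` should likewise reflect the final address after redirects.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class FollowRedirectSketch {

    // Hypothetical local stand-in: /top.aspx redirects (302) to a URL carrying
    // validation_no, and /fuga.aspx returns a small page.
    static HttpServer startServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/top.aspx", exchange -> {
                exchange.getResponseHeaders().add("Location", "/fuga.aspx?validation_no=123456789");
                exchange.sendResponseHeaders(302, -1); // -1: no response body
                exchange.close();
            });
            server.createContext("/fuga.aspx", exchange -> {
                byte[] body = "search page".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // GET the URI and let the client follow redirects, comparable to followRedirects(true).
    static HttpResponse<String> get(URI uri) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer();
        try {
            URI top = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/top.aspx");
            HttpResponse<String> res = get(top);
            // uri() is the redirect target (with validation_no), not the requested top.aspx.
            System.out.println(res.statusCode() + " " + res.uri() + " " + res.body());
        } finally {
            server.stop(0);
        }
    }
}
```

A single request to top.aspx is enough here: the client reads the Location header itself, so there is no need to extract it and send a second request by hand.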