Parsing a redirected page (following the Location header with Jsoup)

Introduction

My background:

Until March 2013: graduated from the Faculty of Arts and Sciences
Until April 2018: infrastructure engineer at an SIer (Linux, Azure)
From May 2018: web engineer (I wanted to be able to develop)

After this career change, I am currently studying Java on the job.

Once I have learned how to develop, I would like to work toward full-stack skills.

Much of what I write may be beginner-level, but I hope that by continuing to publish my output I will build up my engineering skills.

What I wanted to do

I am scraping a certain web page and want to access its search screen (search by keyword, then scrape the results).

I want to access it like this:

Connection.Response response = Jsoup.connect(Url)
        .headers(header)
        .cookies(cookies)
        .data(formData)
        .timeout(3000)
        .execute();

The URL looks like this.

https://hoge.com/fuga.aspx?validation_no=123456789

Of course, accessing this URL as-is results in an error, and I could not figure out how to generate validation_no myself.

After struggling for about six hours, I was looking at the developer tools and noticed that the page (*) requested just before

https://hoge.com/fuga.aspx?validation_no=123456789

has a "Location" entry in its response headers. When I looked it up, it turned out that Location specifies the redirect destination. So does that mean I don't have to work out validation_no myself at all?

(*) https://hoge.com/top.aspx

What I did

First, request

https://hoge.com/top.aspx

once, without following the redirect, and read the Location header from the response, as follows.

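// Request the top page without following the redirect automatically
Connection.Response res = Jsoup.connect(Url)
        .headers(header)
        .timeout(3000)
        .cookies(cookies)
        .method(Connection.Method.GET)
        .followRedirects(false)
        .execute();

// The redirect destination (the URL with validation_no) is in the Location header
System.out.println(res.header("Location"));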

This gives you the URL with validation_no, so use it for the request you actually want to make.
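One thing to watch for when reusing the header value: Location is allowed to be a relative reference rather than a full URL. The following is a minimal sketch, assuming that case, of resolving it against the URL that was requested, using only java.net.URI. The class name LocationResolver and the sample values are hypothetical, and whether this particular site returns a relative or an absolute value has not been checked here.

```java
import java.net.URI;

final class LocationResolver {
    // Resolves a Location header value against the URL that was requested.
    // An absolute Location is returned unchanged; a relative one is
    // combined with the scheme, host, and path of the requested URL.
    static String resolve(String requestedUrl, String location) {
        return URI.create(requestedUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        // Hypothetical values modelled on the URLs in this post
        System.out.println(resolve("https://hoge.com/top.aspx",
                "/fuga.aspx?validation_no=123456789"));
    }
}
```

The resolved string can then be passed to Jsoup.connect(...) in place of a hand-built URL.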

In closing

Once you see it, it is obvious at a glance, but I was stuck on it for a long time...

Reference: https://stackoverflow.com/questions/16243455/capture-header-location-with-jsoup-or-other-html-parser

Postscript (2018/6/18)

I had been sending a second Jsoup request to the Location I received, but in the first place it seems that

        .followRedirects(true)

alone would have been enough. (Jsoup follows redirects by default, so the call can also simply be left out.)
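To see what the redirect setting changes without touching the real site, here is a rough sketch that reproduces the situation locally. It does not use Jsoup: it swaps in the JDK's built-in HttpServer and HttpURLConnection, whose setInstanceFollowRedirects plays the same role as followRedirects. The class name RedirectDemo, the paths, and the response body are all made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class RedirectDemo {
    // Starts a throwaway local server on a free port.
    // /top.aspx answers 302 with a Location header; /fuga.aspx answers 200.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/top.aspx", exchange -> {
            exchange.getResponseHeaders().add("Location", "/fuga.aspx?validation_no=123456789");
            exchange.sendResponseHeaders(302, -1); // -1 means no response body
            exchange.close();
        });
        server.createContext("/fuga.aspx", exchange -> {
            byte[] body = "search screen".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends GET /top.aspx and returns "<status> <Location header>" of the
    // response that the client ends up with.
    static String fetch(boolean followRedirects) {
        try {
            HttpServer server = startServer();
            try {
                int port = server.getAddress().getPort();
                HttpURLConnection conn = (HttpURLConnection)
                        URI.create("http://127.0.0.1:" + port + "/top.aspx").toURL().openConnection();
                conn.setInstanceFollowRedirects(followRedirects);
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                return status + " " + location;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch(false)); // the redirect response itself
        System.out.println(fetch(true));  // the page the redirect leads to
    }
}
```

With redirects turned off, the client stops at the 302 and the Location header is there to read. With redirects turned on, the client lands on the final page directly, which is why the manual second request was unnecessary.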
