Read from a URL in Java with the correct character encoding

Background

-- JRuby's #open throws an error in a Windows 10 environment.
-- So let's implement it on the Java side.
-- If you use JISAutoDetect, you can't handle UTF-8.
-- Hence the code below.

Input / output

-- Arguments
   -- URL (variable name: link)
   -- Time-out period (variable name: time_limit)
-- Return value
   -- The fetched HTML as a String (an empty string on failure)
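The code in this article multiplies time_limit by 300 and 700 before passing it to setConnectTimeout and setReadTimeout, both of which take milliseconds. That suggests time_limit is meant as a total in seconds, split 3:7 between connecting and reading. A minimal sketch of that arithmetic follows; the class name TimeoutSplit and the method split are hypothetical and not part of JavaOpen.

```java
/** Hypothetical helper for illustration; not part of the original JavaOpen class. */
class TimeoutSplit {
    /**
     * Splits a total time limit given in seconds into
     * {connectMillis, readMillis} at a 3:7 ratio.
     */
    static int[] split(int timeLimitSeconds) {
        if (timeLimitSeconds < 0) {
            throw new IllegalArgumentException("time limit must not be negative");
        }
        int connectMillis = timeLimitSeconds * 300; // 30% of the total, in ms
        int readMillis = timeLimitSeconds * 700;    // 70% of the total, in ms
        return new int[] { connectMillis, readMillis };
    }

    public static void main(String[] args) {
        int[] ms = split(10);
        // prints "connect=3000 ms, read=7000 ms"
        System.out.println("connect=" + ms[0] + " ms, read=" + ms[1] + " ms");
    }
}
```

Note that the read timeout applies to each blocking read rather than to the whole download, so the two values together do not strictly bound the total elapsed time.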

Code

JavaOpen.java


import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Arrays;

public class JavaOpen {
    /**
     * Fetches the resource at link and returns it as a decoded string.
     * time_limit is split 3:7 between connecting and reading.
     * Returns "" if an I/O error occurs.
     */
    public static String open(String link, int time_limit) {
        String html = "";
        try {
            URL url = new URL(link);
            URLConnection con = url.openConnection();
            // Both setters take milliseconds.
            con.setConnectTimeout(time_limit * 300);
            con.setReadTimeout(time_limit * 700);
            try (InputStream is = con.getInputStream()) {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                byte[] byteChunk = new byte[8192];
                int n;

                // read() returns -1 at end of stream.
                while ((n = is.read(byteChunk)) != -1) {
                    baos.write(byteChunk, 0, n);
                }
                byte[] bytes = baos.toByteArray();
                html = bytesToHtml(bytes);
            }
        } catch (IOException e) {
            // Covers malformed URLs, timeouts, and unsupported encodings.
            e.printStackTrace();
        }
        return html;
    }

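    /**
     * Decodes src with the first candidate encoding whose decode/encode
     * round trip reproduces the input bytes exactly.
     * Returns "" if no candidate round-trips.
     */
    public static String bytesToHtml(byte[] src) throws UnsupportedEncodingException {
        // Candidates are tried in order, so UTF-8 wins whenever it round-trips.
        String[] char_codes = { "UTF8","SJIS","EUC_JP","EUC_JP_LINUX","EUC_JP_Solaris" };
        for (String cc : char_codes) {
            // Bytes that are invalid in cc decode to replacement characters,
            // so re-encoding them no longer matches src.
            String s_tmp = new String(src, cc);
            byte[] b_tmp = s_tmp.getBytes(cc);
            if (Arrays.equals(src, b_tmp)) {
                return s_tmp;
            }
        }
        return "";
    }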
}
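
The round-trip check in bytesToHtml can be tried in isolation. The following is a minimal sketch, not the article's code: the class name RoundTripDemo and the method detect are hypothetical, the candidate list is shortened, and the canonical names "Shift_JIS" and "EUC-JP" are used in place of the "SJIS" and "EUC_JP" aliases. It assumes a JDK that ships the extended Japanese charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo class; mirrors the idea of bytesToHtml but returns the charset name. */
class RoundTripDemo {
    // Shortened candidate list (an assumption for this sketch); order matters.
    static final String[] CANDIDATES = { "UTF-8", "Shift_JIS", "EUC-JP" };

    /** Returns the first candidate whose decode/encode round trip reproduces src, or null. */
    static String detect(byte[] src) {
        for (String name : CANDIDATES) {
            Charset cs = Charset.forName(name);
            // Invalid byte sequences decode to replacement characters,
            // so the re-encoded bytes differ from src.
            byte[] back = new String(src, cs).getBytes(cs);
            if (Arrays.equals(src, back)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "konnichiwa" in hiragana, written with Unicode escapes so the
        // example does not depend on the source file encoding.
        String text = "\u3053\u3093\u306b\u3061\u306f";
        System.out.println(detect(text.getBytes(StandardCharsets.UTF_8)));       // prints "UTF-8"
        System.out.println(detect(text.getBytes(Charset.forName("Shift_JIS")))); // prints "Shift_JIS"
    }
}
```

Because candidates are tried in order, input that is valid in more than one encoding (pure ASCII, for example) is always attributed to the first one that round-trips, so this remains a heuristic rather than a reliable detector.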

Comment

-- String[] char_codes = {"UTF8", "SJIS", "EUC_JP", "EUC_JP_LINUX", "EUC_JP_Solaris"}; lists the character encodings you are likely to encounter, so adjust it as you see fit.
-- I wanted to come up with a better variable name than time_limit ...
-- I split the time between setConnectTimeout and setReadTimeout at 3:7, but I don't know what a typical split is.
-- I wanted to find out why the data is read 8192 bytes at a time, but I ran out of motivation to look into it.
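
On the 8192-byte reads: the number is only the size of the temporary buffer, and 8192 is a common choice (it is also the default buffer size of java.io.BufferedInputStream). The result should not depend on it. A minimal sketch, using a hypothetical class ChunkedRead and an in-memory stream in place of a network connection:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo class; the loop has the same shape as the one in JavaOpen.open. */
class ChunkedRead {
    /** Reads the stream to its end, at most chunkSize bytes per read call. */
    static byte[] readAll(InputStream in, int chunkSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        int n;
        // read() fills up to chunk.length bytes and returns -1 at end of stream.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // copy only the bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copy = readAll(new ByteArrayInputStream(data), 8192);
        System.out.println(copy.length); // prints 20000
    }
}
```

A smaller buffer means more read calls and a larger one means more memory held at once; the bytes returned are the same either way. On Java 9 and later, InputStream.readAllBytes() performs the same job without a hand-written loop.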

Reference

-- try-with-resources statement - ORACLE
-- Simple character code judgment in Java - Qiita
-- Supported encodings - ORACLE
