What tripped me up when doing SFTP communication with Python

Overview

Be careful: some libraries from overseas handle character encodings rather carelessly.

The problem

For work, I needed to use SFTP from Python, so I used paramiko to read a text file directly on the server and feed its contents into some statistical processing. According to the API documentation, the file() function can be used just like a regular Python file object, so I passed in the path and other arguments as usual, and got the following error.

UnicodeDecodeError: 'utf-8' codec can't decode byte ～～ in position ～～: invalid start byte

The code looks like the following.


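import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()  # trust hosts already listed in ~/.ssh/known_hosts
# HOSTNAME, USERNAME, PASSWORD are placeholders for the real connection information
client.connect(HOSTNAME, username=USERNAME, password=PASSWORD)
sftp_connection = client.open_sftp()

# REMOTE_PATH is a placeholder for the path of the file on the server
with sftp_connection.open(REMOTE_PATH) as f:
    for line in f:  # text mode: each line is decoded as UTF-8
        print(line)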

Running this raised a UnicodeDecodeError at the for loop.

Cause

In a nutshell, the cause was that there is no way to specify the encoding when reading a file's contents. I was reading a text file encoded in "ANSI" (a Windows code page, not UTF-8) as if it were UTF-8, which raised the error. I checked the source, and at the moment open() does not accept an encoding. Python's built-in open(), by contrast, does.
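To see the mismatch without touching a server, here is a minimal sketch that assumes the "ANSI" file is cp932 (the code page used by Japanese Windows). In cp932 the character あ is the byte pair 0x82 0xA0, and 0x82 is only allowed as a continuation byte in UTF-8, so decoding those bytes as UTF-8 fails with the same kind of error as above, while decoding them as cp932 works.

```python
# Assumption: "ANSI" here means cp932, the code page used by Japanese Windows.
data = "あ\r\n".encode("cp932")  # b'\x82\xa0\r\n'

try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    # 0x82 is a UTF-8 continuation byte, so it cannot start a character.
    utf8_error = str(e)
    print(utf8_error)  # 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

text = data.decode("cp932")  # the matching codec decodes without error
print(repr(text))
```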

Workaround

Rewriting the file's contents in English or re-creating the file as UTF-8 was not an option, so this time I decided to open it in binary mode and decode it myself as ANSI. Concretely, I replaced the with statement in the code above with something like the following. (It is written to resemble reading a local file with the built-in open() because I wanted to be able to debug locally instead of over SFTP; debugging is naturally faster without network communication.)

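import codecs

# Assumption: "ANSI" means the Windows code page the file was written in;
# on Japanese Windows that is cp932. Python has no codec named "ANSI".
ANSI_ENCODING = "cp932"

def readlines():
    # "rb" makes paramiko return raw bytes instead of decoding them as UTF-8.
    # REMOTE_PATH is a placeholder for the remote file path.
    with sftp_connection.open(REMOTE_PATH, "rb") as f:
        return codecs.decode(f.read(), ANSI_ENCODING).splitlines()  # handles CRLF

for line in readlines():
    print(line)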

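Passing "rb" as the second argument (the mode) of open() makes paramiko read the file as binary, so read() returns raw bytes instead of trying to decode them as UTF-8. I then decode those bytes with the encoding I actually want and return the result split into lines.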
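Since the decoding step only needs an object whose read() returns bytes, it can be pulled out into a function that takes the file object as an argument. The same code can then be fed a local file opened with open(path, "rb") while debugging and the paramiko file object opened with "rb" in production. This is an illustrative sketch rather than the original code: decode_lines and the cp932 default are assumptions, and io.BytesIO stands in for a real file.

```python
import io
from typing import BinaryIO, List


def decode_lines(binary_file: BinaryIO, encoding: str = "cp932") -> List[str]:
    """Read every byte from a binary file object and decode it ourselves.

    Works with anything whose read() returns bytes, e.g. a local
    open(path, "rb") or paramiko's sftp_connection.open(path, "rb").
    """
    raw = binary_file.read()                   # bytes, no implicit UTF-8 decoding
    return raw.decode(encoding).splitlines()   # splitlines() also handles CRLF


# Local debugging: an in-memory stand-in for a cp932-encoded file.
sample = io.BytesIO("あいう\r\nかきく\r\n".encode("cp932"))
lines = decode_lines(sample)
print(lines)  # ['あいう', 'かきく']
```

Swapping the local file for the SFTP one then only changes the line that opens the file, not the decoding logic.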

Because this reads the whole file into memory at once, a very large file could cause problems, but the files I deal with in practice are small enough, so it was fine this time.
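If the files ever do get large, one option is to read fixed-size chunks and decode them with an incremental decoder from the codecs module, which holds on to a multibyte character that is cut off at a chunk boundary until the next chunk arrives. This is a sketch under the same cp932 assumption; iter_decoded_lines and the 64 KiB chunk size are hypothetical choices. It splits lines on "\n" only and strips a trailing "\r" from CRLF endings, which is safe for cp932 because "\n" never appears inside a multibyte character there.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def iter_decoded_lines(binary_file: BinaryIO, encoding: str = "cp932",
                       chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield decoded lines one by one instead of reading the whole file."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""  # decoded text that does not yet end in a newline
    while True:
        chunk = binary_file.read(chunk_size)
        final = not chunk  # an empty read means end of file
        # The decoder buffers an incomplete multibyte sequence until the next call;
        # final=True makes it raise if the file ends in the middle of a character.
        pending += decoder.decode(chunk, final=final)
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line.rstrip("\r")  # drop the CR of a CRLF line ending
        if final:
            break
    if pending:
        yield pending.rstrip("\r")  # last line without a trailing newline


# Demo with an in-memory file; chunk_size=1 forces characters to be split across reads.
data = "あいう\r\nかきく\r\nおわり".encode("cp932")
lines = list(iter_decoded_lines(io.BytesIO(data), chunk_size=1))
print(lines)  # ['あいう', 'かきく', 'おわり']
```

With a real connection, the argument would be the object returned by opening the remote file in "rb" mode.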

Root cause

I think the root of this problem is that paramiko's BufferedFile gives you no way to specify the encoding, so the real fix would be to add one.

I'd like to take this opportunity to read the source properly and send a pull request to paramiko on GitHub.
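Until paramiko offers something like that, a thin wrapper can approximate an open() with an encoding parameter by opening the remote file in "rb" mode and wrapping it in a codecs.getreader() stream reader. This is a sketch, not paramiko's API: open_text is a hypothetical helper, cp932 is the same assumed default as above, and FakeSFTP is only a local stand-in for SFTPClient so the example runs without a server.

```python
import codecs
import contextlib
import io
from typing import Iterator


@contextlib.contextmanager
def open_text(sftp, path: str, encoding: str = "cp932",
              errors: str = "strict") -> Iterator[codecs.StreamReader]:
    """Hypothetical helper: open a remote file in binary mode and decode it.

    paramiko's SFTPClient.open() has no encoding parameter, so the decoding
    is layered on top of the binary file object with a codecs StreamReader.
    """
    raw = sftp.open(path, "rb")
    try:
        yield codecs.getreader(encoding)(raw, errors)
    finally:
        raw.close()


class FakeSFTP:
    """Local stand-in for SFTPClient so the example runs without a server."""

    def open(self, path: str, mode: str = "r") -> io.BytesIO:
        return io.BytesIO("あいう\r\nかきく\r\n".encode("cp932"))


with open_text(FakeSFTP(), "dummy/path.txt") as f:
    lines = [line.rstrip("\r\n") for line in f]  # iterating yields decoded str lines
print(lines)  # ['あいう', 'かきく']
```

With a real connection, FakeSFTP() would be replaced by the SFTPClient returned from client.open_sftp(), and the loop body stays the same as the original for statement.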
