[PYTHON] Try debugging natural language processing on Windows with VS Code

Merry Christmas. When I get old, I want to join the Santa Claus Association and give children dreams.

By the way, do you like Windows? I love it.

However, probably because few people in the data analysis field use Windows machines, the latest analysis libraries and frameworks often have inadequate Windows support, and building an environment can be difficult.

Some developers who only have a Windows machine are eager to try data analysis or natural language processing, but worry that they cannot even get started because no Mac is provided ('· × ·`) (a common situation at system integrators). For them, this article introduces the procedure for running a natural language processing program with VS Code + Docker. Let's Dive into Docker for Debugging!!!

1. The natural language processing task to debug

This time, I will use a rudimentary "sentiment analysis" program, which is one of the standard tasks in natural language processing.

The framework used is Hugging Face Transformers, which specializes in natural language processing. Reference article: Transformers of Hugging Face attracting attention in natural language processing (NLP)

2. Execution example

# INPUT
text = ['Very yeah',
        "I'm not feeling well today",
        "It's subtle",
        'Okay',
        "I don't think it's good"]

------------------------------------------------------------------------
# OUTPUT
[{'label': 'positive', 'score': 0.9899728894233704}]  #Very yeah
[{'label': 'Negative', 'score': 0.8069409132003784}]  #I'm not feeling well today
[{'label': 'Negative', 'score': 0.7249351143836975}]  #It's subtle
[{'label': 'positive', 'score': 0.6537005305290222}]  #Okay
[{'label': 'Negative', 'score': 0.9345374703407288}]  #I don't think it's good

Enter any text, and the program determines whether it is Positive or Negative. That's exciting.
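Each result in the output is a list holding one dictionary with a `label` and a `score`. As a minimal sketch of how such a result could be turned into a readable line, the helper below (the name `summarize` is hypothetical and not part of Transformers) formats one result using the values from the execution example:

```python
def summarize(text, result):
    """Format one pipeline result (a list with one {'label', 'score'} dict)."""
    top = result[0]  # the result is a list even for a single input text
    return f"{text}: {top['label']} ({top['score']:.2f})"


# Sample values copied from the OUTPUT block in the execution example.
sample = [{'label': 'positive', 'score': 0.9899728894233704}]
print(summarize('Very yeah', sample))
```

Rounding the score to two decimals is only a display choice; the full value stays available in the dictionary.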

3. Prerequisites

It is assumed that the following environment is already prepared.

  1. WSL2 is installed → Procedure: Windows Subsystem for Linux Installation Guide for Windows 10
  2. Docker Desktop (WSL 2 backend) is installed → Procedure: Docker Desktop WSL 2 backend * English
  3. VS Code is installed → Procedure: Visual Studio Code
  4. The VS Code Extension "Remote Development (ms-vscode-remote.vscode-remote-extensionpack)" is installed.

4. Procedure

4-1. Program creation

Let's write the program first.

Only two files need to be prepared: Dockerfile and main.py.

First, the Dockerfile.

Dockerfile


FROM continuumio/anaconda3
WORKDIR /app
#RUN conda install -y tensorflow
# Upgrade pip, then install the Japanese tokenizer dependencies, PyTorch, and Transformers
RUN pip install -U pip && \
    pip install mecab-python3 && \
    pip install fugashi && \
    pip install ipadic && \
    pip install torch && \
    pip install transformers

Next is the main program (Python) to be executed.

main.py


from transformers import pipeline
from transformers import BertForSequenceClassification
from transformers import BertJapaneseTokenizer

def nlp_main():

    # Texts to analyze (double quotes are used where the text contains an apostrophe)
    text_list = ['Very yeah', "I'm not feeling well today", "It's subtle", 'Okay', "I don't think it's good"]

    # Load the pretrained sentiment model and its matching tokenizer
    model = BertForSequenceClassification.from_pretrained('daigo/bert-base-japanese-sentiment')
    tokenizer = BertJapaneseTokenizer.from_pretrained('daigo/bert-base-japanese-sentiment')

    # Build the sentiment analysis pipeline
    nlp_sentiment_analyzer = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

    # Analyze each text and print the result
    for index, text in enumerate(text_list):

        print(f"No{index}『{text}』:{nlp_sentiment_analyzer(text)}")

if __name__ == '__main__':
    nlp_main()

~~Simple is best.~~

4-2. Docker build

Now that the Dockerfile is defined, let's build it. Without Docker, the libraries would have to be installed directly in the native Windows environment, but with Docker the environment can be built very easily inside a container.

  1. Click the green area at the bottom left of VS Code
  2. Select "Reopen Folder in Container"
  3. Select "Dockerfile"

That is all it takes to build the Docker image from VS Code. Isn't it easy? The build takes about 10 minutes. * For reference, my environment is a Core i7-1065G7 @ 1.3GHz 1.5GHz with 16GB of memory.
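Once the container is up, it can be worth confirming that the packages from the Dockerfile are actually importable before running main.py. The following is a minimal sketch, not part of the original procedure: the function name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumed from the pip package names (mecab-python3 is imported as `MeCab`).

```python
import importlib.util

# Import names assumed for the packages installed in the Dockerfile
# (mecab-python3 is imported as "MeCab").
REQUIRED = ["MeCab", "fugashi", "ipadic", "torch", "transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All packages found." if not missing else f"Missing: {missing}")
```

Running this in the VS Code terminal inside the container gives a quick signal of whether the build installed everything, without downloading the model.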

4-3. Program execution

As a trial, first run the program normally from the console, without Debug.

python main.py

5. Run the program with Debug

Now for the main topic: how to run the program with Debug.

5-1. Installing the Extension

Install the VS Code Extension that makes it possible to run Debug.

Extension name

5-2. Debug settings

Now let's configure Debug.

Click the "Debug icon", then click the "create a launch.json file" link.

After clicking, a selection screen asks what to debug. Select "Python".

Then select "Python File".

The automatically generated file "launch.json" is displayed.

Rewrite it as follows. Change: "program": "${workspaceRoot}/main.py"

launch.json


{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${workspaceRoot}/main.py", 
            "console": "integratedTerminal"
        }
    ]
}

For details on Debug settings, refer to: Visual Studio Code Debugging

5-3. Start Debug

Now run Debug.

  1. Add a breakpoint to main.py at the point where you want processing to stop.
  2. Click the Debug icon.
  3. Click the Debug Run button.

5-4. How to debug

When the debug session starts, the Debug toolbar appears at the top of the editor.

5-5. Execution result

If you have come this far, anyone who has debugged with Visual Studio or Eclipse (old-school) should feel right at home, right?

For each entered text, the Positive/Negative classification and its confidence score are displayed, and the results look reasonable to the human eye. It's amazing.
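The scores in the execution example vary quite a bit, so it may help to flag the less certain results when inspecting them. The following is a minimal sketch under stated assumptions: the function name `low_confidence` and the threshold of 0.8 are arbitrary choices for illustration, and the values are copied from the execution example rather than produced by the model here.

```python
# Results copied from the execution example; in main.py these would come
# from nlp_sentiment_analyzer(text).
results = {
    "Very yeah": {"label": "positive", "score": 0.9899728894233704},
    "It's subtle": {"label": "Negative", "score": 0.7249351143836975},
    "Okay": {"label": "positive", "score": 0.6537005305290222},
}


def low_confidence(results, threshold=0.8):
    """Return the texts whose score falls below the (arbitrary) threshold."""
    return [text for text, r in results.items() if r["score"] < threshold]


print(low_confidence(results))
```

Setting a breakpoint inside `low_confidence` is also a convenient way to look at each `score` in the debugger.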

6. Conclusion

Building a natural language processing environment directly on native Windows is difficult, but putting Docker in between makes it easy. You can also run Debug with VS Code.

If you want to do analysis in a Windows environment, please give it a try.

7. Supplement

Being able to develop with Docker means the same setup can also run on a powerful instance such as AWS EC2. In fact, VS Code can debug remotely against such a running instance. In other words, GPU instances can also be used.
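If the container does run on a GPU instance, the pipeline has to be told to use the GPU. The following is a minimal sketch, assuming the `device` argument of the Transformers `pipeline` function, where -1 means CPU and 0 means the first CUDA device; the helper name `pipeline_device` is hypothetical and is not part of main.py in this article.

```python
def pipeline_device(cuda_available):
    """Map GPU availability to the integer expected by the pipeline's device argument."""
    # -1 selects the CPU; 0 selects the first CUDA device.
    return 0 if cuda_available else -1


if __name__ == "__main__":
    try:
        import torch  # installed by the Dockerfile in this article
        available = torch.cuda.is_available()
    except ImportError:
        available = False
    print(f"device={pipeline_device(available)}")
    # In main.py this value could then be passed along, for example:
    # pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=...)
```

On a machine without a GPU the same code falls back to the CPU, so main.py would not need separate versions for local and remote runs.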
