I want to easily delete columns containing NA in R

Data processing starts with reading the data. When you read in a dataset, you often end up with empty columns or columns of unequal length, and you want to delete the ones you do not need. Suppose you read the following table.

a b c d
1 2 3 NA
2 3 4 NA
3 4 NA NA

df <- data.frame(a = c(1,2,3),
                 b = c(2,3,4),
                 c = c(3,4,NA),
                 d = c(NA,NA,NA))
> df
  a b  c  d
1 1 2  3 NA
2 2 3  4 NA
3 3 4 NA NA

Use dplyr::select_if

library(dplyr)

Apply `anyNA()` to each column to check whether it contains NA, then negate the result to get a logical vector that is TRUE for the columns to keep.

df %>% lapply(.,anyNA) %>% unlist %>% !.
    a     b     c     d 
 TRUE  TRUE FALSE FALSE 

Pass this to the function select_if(), which selects the columns that meet a condition.

df %>% select_if(lapply(.,anyNA) %>% unlist %>% !.)
  a b
1 1 2
2 2 3
3 3 4

That removes the unnecessary columns with a one-line script. Thanks to the pipe, I think it is reasonably readable.

Postscript: Use purrr::negate()

The comment from @hkzm was helpful.

Using purrr::negate(), which returns the negation of a given predicate function, makes the code simpler and more direct.

library(tidyverse)
df %>% select_if(negate(anyNA))
  a b
1 1 2
2 2 3
3 3 4

This is equivalent to negating with the ! operator and writing the predicate as a formula with ~.

df %>% select_if(~ !anyNA(.))
  a b
1 1 2
2 2 3
3 3 4

Personally, I find the version that uses purrr::negate() more readable.

It's easy with Python, but I have a hard time with R for some reason

In Python's pandas, the dropna() method removes missing values, and it works in both the row and column directions.

Reference: Exclude (delete) / replace (fill in) / extract missing value NaN with pandas

--Delete columns that contain at least one NA
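As a minimal sketch of the pandas side, assuming the same table as the R example above (the variable names are my own, chosen for illustration), dropna() can be pointed at columns with axis=1:

```python
import numpy as np
import pandas as pd

# Same table as the R data frame above; NaN plays the role of R's NA.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": [3, 4, np.nan],
    "d": [np.nan, np.nan, np.nan],
})

# axis=1 drops every column that contains at least one NaN.
cols_dropped = df.dropna(axis=1, how="any")
print(list(cols_dropped.columns))  # ['a', 'b']

# axis=0 (the default) drops every row that contains at least one NaN.
# Column d is NaN in every row, so no rows survive here.
rows_dropped = df.dropna(axis=0, how="any")
print(len(rows_dropped))  # 0

# how="all" drops only the columns that are entirely NaN.
all_na_dropped = df.dropna(axis=1, how="all")
print(list(all_na_dropped.columns))  # ['a', 'b', 'c']
```

The how="any" call with axis=1 gives the same result as the select_if examples in R: only columns a and b remain.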

However, in R, the functions that delete missing values work only in the row direction.

--Delete rows that contain at least one NA

I searched Google for a way to delete in the column direction, but could not find one easily. I finally found this article.

Means to remove columns containing NA in R

Every programming language has its strengths and weaknesses. Why isn't there a similar function in R?
