[PYTHON] Check the data summary in CASTable

SAS Viya is an AI platform that you can use from languages such as Python, Java, and R. Data in SAS Viya is handled through a table object called CASTable (CAS stands for Cloud Analytic Services). This article shows how to summarize the data in a CASTable and how to change what the summary includes.

Get a table from the database

First, connect to SAS Viya.

import swat
conn = swat.CAS('server-name.mycompany.com', 5570, 'username', 'password')
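The connection call above hardcodes the user name and password. As a minimal sketch, assuming you would rather keep them out of the script, the values could be read from environment variables. The variable names `MY_CAS_HOST`, `MY_CAS_PORT`, `MY_CAS_USER`, and `MY_CAS_PASSWORD` and the helper `cas_connection_args` are hypothetical and are not part of SWAT.

```python
import os


def cas_connection_args(env=os.environ):
    """Build the positional arguments for swat.CAS from a mapping.

    The MY_CAS_* names are made up for this example; pick your own.
    """
    host = env["MY_CAS_HOST"]
    # Fall back to the port used in this article when none is set.
    port = int(env.get("MY_CAS_PORT", "5570"))
    user = env["MY_CAS_USER"]
    password = env["MY_CAS_PASSWORD"]
    return host, port, user, password


# Usage (needs the swat package and a reachable server):
#   import swat
#   conn = swat.CAS(*cas_connection_args())
```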

Then get the CASTable. This example loads the iris dataset from a CSV file.

tbl = conn.loadtable('data/iris.csv', caslib='casuser').casTable

Check the information

Use the describe method to get summary statistics for the data.

tbl.describe()

The result looks like the following. For each numeric column you can see the row count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.

       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000
mean       5.843333     3.054000      3.758667
std        0.828066     0.433594      1.764420
min        4.300000     2.000000      1.000000
25%        5.100000     2.800000      1.600000
50%        5.800000     3.000000      4.350000
75%        6.400000     3.300000      5.100000
max        7.900000     4.400000      6.900000

Change the percentiles

Pass the percentiles argument to choose which percentiles are reported. The following example requests the 30th and 80th percentiles; the 50th percentile is still included in the output.

tbl.describe(percentiles=[0.3, 0.8])
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000
mean       5.843333     3.054000      3.758667
std        0.828066     0.433594      1.764420
min        4.300000     2.000000      1.000000
30%        5.250000     2.800000      1.700000
50%        5.800000     3.000000      4.350000
80%        6.550000     3.400000      5.350000
max        7.900000     4.400000      6.900000
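
The article does not say how the percentile values are calculated. The 30% and 80% values for sepal_length (5.25 and 6.55) fall halfway between one-decimal measurements, which would be consistent with a rule that averages two neighbouring values when n × p is a whole number. The sketch below implements that rule in plain Python as an assumption, not as the confirmed CAS algorithm; the function name `percentile_avg` is made up for this example. Note that this rule can give different results from the linear interpolation pandas uses by default.

```python
import math


def percentile_avg(values, p):
    """Percentile by the 'average the neighbours on an exact hit' rule.

    Assumed rule: with the data sorted as x1..xn, if n*p is a whole
    number j, return (xj + xj+1) / 2; otherwise return the next value
    above position n*p.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must not be empty")
    pos = round(n * p, 9)  # guard against floating-point noise
    j = math.floor(pos)
    if j >= n:             # p == 1: the maximum
        return xs[-1]
    if pos == j:           # exact hit: average the two neighbours
        if j == 0:         # p == 0: the minimum
            return xs[0]
        return (xs[j - 1] + xs[j]) / 2
    return xs[j]           # otherwise take the next value up


sample = list(range(1, 11))  # illustrative data, not the iris table
for p in (0.25, 0.3, 0.5, 0.8):
    print(p, percentile_avg(sample, p))
```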

See all columns

Specify `include='all'` to summarize every column, including non-numeric ones.

tbl.describe(include='all')
        sepal_length  sepal_width  petal_length  petal_width  species
count            150          150           150          150
unique            35           23            43           22
top                5            3           1.5          0.2
freq              10           26            14           28
mean         5.84333        3.054       3.75867      1.19867
std         0.828066     0.433594       1.76442     0.763161
min              4.3            2             1          0.1
25%              5.1          2.8           1.6          0.3
50%              5.8            3          4.35          1.3
75%              6.4          3.3           5.1          1.8
max              7.9          4.4           6.9          2.5
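
With `include='all'`, the output gains the rows unique, top, and freq. For sepal_length, for example, there are 35 distinct values, and the most frequent one is 5, which appears 10 times. As a rough illustration of what these three rows mean, here is a plain-Python sketch on a small made-up list; the helper name `categorical_summary` is hypothetical and this is not how CAS computes the result.

```python
from collections import Counter


def categorical_summary(values):
    """Return count, number of distinct values, most common value and its frequency."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]
    return {
        "count": len(values),   # number of rows
        "unique": len(counts),  # number of distinct values
        "top": top,             # most frequent value
        "freq": freq,           # how often the top value occurs
    }


# Illustrative values only, not the real iris column.
print(categorical_summary([5.0, 5.1, 5.0, 4.9, 5.0]))
```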

Specify `stats='all'` to compute every available statistic. The values are displayed in floating-point (scientific) notation.

tbl.describe(stats='all')
         sepal_length    sepal_width  petal_length  petal_width
count    1.500000e+02   1.500000e+02  1.500000e+02
unique   3.500000e+01   2.300000e+01  4.300000e+01
mean     5.843333e+00   3.054000e+00  3.758667e+00
std      8.280661e-01   4.335943e-01  1.764420e+00
min      4.300000e+00   2.000000e+00  1.000000e+00
25%      5.100000e+00   2.800000e+00  1.600000e+00
50%      5.800000e+00   3.000000e+00  4.350000e+00
75%      6.400000e+00   3.300000e+00  5.100000e+00
max      7.900000e+00   4.400000e+00  6.900000e+00
nmiss    0.000000e+00   0.000000e+00  0.000000e+00
sum      8.765000e+02   4.581000e+02  5.638000e+02
stderr   6.761132e-02   3.540283e-02  1.440643e-01
var      6.856935e-01   1.880040e-01  3.113179e+00
uss      5.223850e+03   1.427050e+03  2.583000e+03
cv       1.417113e+01   1.419759e+01  4.694272e+01
tvalue   8.642537e+01   8.626430e+01  2.609020e+01
probt   3.331256e-129  4.374977e-129  1.994305e-57
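
Several of the extra rows can be derived from count, mean, and std using standard textbook formulas. The sketch below is an illustration under that assumption, not a description of how CAS computes them internally; the helper name `derived_stats` is made up. Fed with the sepal_length values from the output above, it reproduces the sum, var, stderr, uss, cv, and tvalue rows to the displayed precision.

```python
import math


def derived_stats(count, mean, std):
    """Derive further statistics from count, mean and sample standard deviation."""
    stderr = std / math.sqrt(count)  # standard error of the mean
    var = std ** 2                   # sample variance (n - 1 denominator)
    return {
        "sum": mean * count,
        "var": var,
        "stderr": stderr,
        # uncorrected sum of squares: sum of x**2 over all rows
        "uss": var * (count - 1) + count * mean ** 2,
        "cv": 100 * std / mean,      # coefficient of variation, in percent
        "tvalue": mean / stderr,     # t statistic for the hypothesis mean == 0
    }


# count, mean and std of sepal_length, taken from the output above
for name, value in derived_stats(150, 5.843333, 0.8280661).items():
    print(f"{name:7s} {value:.6e}")
```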

Summary

The describe method gives you an overview of the data in a CASTable. Use it as a starting point for data analysis.

