The Power of Pandas: Python


### Pandas Basics

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.


pandas is well suited for many different kinds of data:

・ Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
・ Ordered and unordered (not necessarily fixed-frequency) time series data
・ Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
・ Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure

Here are just a few of the things that pandas does well:

・ Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
・ Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
・ Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data in computations
・ Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
・ Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
・ Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
・ Intuitive merging and joining of data sets
・ Flexible reshaping and pivoting of data sets
・ Hierarchical labeling of axes (possible to have multiple labels per tick)
・ Robust IO tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving / loading data from the ultrafast HDF5 format
・ Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging
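Three of the features listed above (missing-data handling, split-apply-combine, and automatic alignment) can be sketched in a few lines. This is only a minimal illustration; the `sales` table, its column names, and the two small Series are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column names and values are invented for illustration.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units": [10.0, np.nan, 5.0, 8.0],
})

# Missing data: NaN is skipped by default in reductions such as sum().
total_units = sales["units"].sum()
print(total_units)  # 23.0

# Split-apply-combine: split by region, sum each group, combine the results.
units_by_region = sales.groupby("region")["units"].sum()
print(units_by_region)

# Automatic alignment: values are matched by label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])
aligned_sum = a + b  # x: 1 + 30, y: 2 + 20, z: 3 + 10
print(aligned_sum)
```

Note that the second `west` row contributes nothing to the group total because its value is missing.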


To load the pandas package and start working with it, import the package.

In [1]: import pandas as pd

■Creating data

The two primary data structures of pandas are Series (1-dimensional) and DataFrame (2-dimensional). Each column in a DataFrame is a Series.

** ・ DataFrame **

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column. For example, consider the following simple DataFrame:

In [2]: pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Out [2]:

   Yes   No
0   50  131
1   21    2


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [3]: pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

Out [3]:

             Bob           Sue
0    I liked it.  Pretty good.
1  It was awful.        Bland.

There are several ways to create a DataFrame. One way, used in the examples above, is to pass a dictionary whose keys become the column names and whose values are lists of column entries.
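As a small sketch of the dictionary approach, the example below rebuilds the first DataFrame with explicit row labels and then pulls out one column to show that it is a Series. The row labels `Product A` and `Product B` are hypothetical and are not part of the original data.

```python
import pandas as pd

# Same values as the first example; the row labels are hypothetical.
df = pd.DataFrame(
    {"Yes": [50, 21], "No": [131, 2]},
    index=["Product A", "Product B"],
)

# Selecting one column returns a Series that shares the DataFrame's index.
yes_column = df["Yes"]
print(type(yes_column).__name__)  # Series

# A single entry is addressed by its row label and column name.
print(df.loc["Product A", "No"])  # 131
```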

** ・ Series **

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:


In [4]: pd.Series([1, 2, 3, 4, 5])
Out [4]: 
0    1
1    2
2    3
3    4
4    5
dtype: int64
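A Series can also carry its own row labels and a name, which is roughly what a single DataFrame column looks like. The sketch below assumes made-up labels and a made-up name purely for illustration.

```python
import pandas as pd

# The index labels and the name are hypothetical.
s = pd.Series(
    [30, 35, 40],
    index=["2015 Sales", "2016 Sales", "2017 Sales"],
    name="Product A",
)

# Values are looked up by label rather than by position.
print(s["2016 Sales"])  # 35

# to_frame() turns the Series into a one-column DataFrame named after it.
as_frame = s.to_frame()
print(list(as_frame.columns))  # ['Product A']
```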

■Reading data files

Another way to create a DataFrame is to import a CSV file. Data can be stored in any of a number of different forms and formats; by far the most basic of these is the humble CSV file. For example, a file named cars.csv can be loaded into a DataFrame with df = pd.read_csv('cars.csv').

We can then examine the contents of the resulting DataFrame (called df here) using the head() method, which returns the first five rows:

In [5]: df.head()
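Since cars.csv itself is not included here, the following sketch reads the same kind of data from an in-memory string instead. The column names and values are hypothetical stand-ins, not the real file contents; with a real file you would pass its path to `pd.read_csv` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of cars.csv.
csv_text = """country,cars_per_cap,drives_right
United States,809,True
Australia,731,False
Japan,588,False
India,18,False
Russia,200,True
Morocco,70,True
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (6, 3)
print(df.head())   # first five rows
print(df.head(2))  # head() takes an optional row count
```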

■ Other Useful Tricks

** ・ Get the current working directory **

In [6]: import os
In [7]: os.getcwd()

** ・ Check how many rows and columns are present in the data **

The output is a tuple of (number of rows, number of columns). Here df is the DataFrame being inspected.

In [8]: df.shape
Out [8]: (2200, 15)
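As a quick sketch on a tiny, made-up DataFrame, `shape` can be unpacked directly into row and column counts. Note that it is an attribute, so it is written without parentheses.

```python
import pandas as pd

# Hypothetical three-row, two-column table.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# len() gives the row count alone.
print(len(df))  # 3
```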

** ・ Rename the columns **

In [9]: df_new = df.rename(columns={'Amount.Requested': 'Amount.Requested_NEW'})
In [10]: df_new.head()
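One detail worth seeing in action is that `rename` returns a new DataFrame and leaves the original untouched. The sketch below uses a small made-up table; the `Loan.Length` column and all values are hypothetical.

```python
import pandas as pd

# Hypothetical loan data; only 'Amount.Requested' comes from the text above.
df = pd.DataFrame({
    "Amount.Requested": [20000, 19200],
    "Loan.Length": [36, 60],
})

# rename() returns a renamed copy; columns not listed in the mapping are kept.
df_new = df.rename(columns={"Amount.Requested": "Amount.Requested_NEW"})

print(list(df_new.columns))  # ['Amount.Requested_NEW', 'Loan.Length']
print(list(df.columns))      # ['Amount.Requested', 'Loan.Length']
```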

** ・ Write a DataFrame to CSV or Excel **

df.to_csv("filename.csv", index = False)
df.to_excel("filename.xlsx", index = False)

There are two ways to handle the situation where we do not want the index to be stored in the CSV file.

1. Pass index=False when saving the DataFrame to a CSV file.

df.to_csv("file_name.csv", index=False)

2. Or save the DataFrame with its index and, when reading the file back, drop the column named 'Unnamed: 0' that holds the previous index.

df_new = pd.read_csv("file_name.csv").drop(['Unnamed: 0'], axis=1)
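Both options can be tried end to end with a temporary file. The sketch below is only an illustration with a small made-up DataFrame; it also shows `index_col=0`, an alternative for the second option that reads the saved index back as the index instead of dropping it.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file_name.csv")

    # Option 1: do not write the index at all.
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

    # Option 2: write the index, then handle it when reading.
    df.to_csv(path)
    with_extra_column = pd.read_csv(path)  # gains an 'Unnamed: 0' column
    dropped = with_extra_column.drop(columns=["Unnamed: 0"])

    # Alternative: treat the first column as the index while reading.
    restored = pd.read_csv(path, index_col=0)

print(list(with_extra_column.columns))  # ['Unnamed: 0', 'Yes', 'No']
print(list(dropped.columns))            # ['Yes', 'No']
```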

Here is the cheat-sheet for pandas.

Enjoy the Power of Pandas. I hope you found this helpful. Thank you for taking the time to read this article. See you in the next topic. :grinning: :grinning:
