Python course for data science: useful techniques

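python ・ set(): Use it to find duplicates in a list (a list has duplicates if len(set(x)) is smaller than len(x)).

・ A flat index can be expressed as coordinates: dividing by the number of columns gives the row (quotient) and the column (remainder).

・ Shift + Tab: Check a function's reference (docstring) in Jupyter.

・ _ : Holds the return value of the last executed expression.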
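A minimal sketch of the set and quotient/remainder notes above. The sample values and the names items, n_cols and flat_index are made up for illustration.

```python
# Hypothetical sample data for illustration.
items = [3, 1, 4, 1, 5, 9, 2, 6, 5]

# A list contains duplicates if converting it to a set makes it shorter.
has_duplicates = len(set(items)) != len(items)

# Collect the duplicated values themselves.
seen = set()
duplicates = set()
for value in items:
    if value in seen:
        duplicates.add(value)
    seen.add(value)

# Convert a flat index into (row, column) for a grid with n_cols columns.
n_cols = 4
flat_index = 10
row, col = divmod(flat_index, n_cols)  # quotient is the row, remainder the column

print(has_duplicates, sorted(duplicates), (row, col))  # True [1, 5] (2, 2)
```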

numpy ・ np.uint8 (unsigned integer, 8 bit): Holds 0 to 255. Used for image data, etc.

・ np.float32: Used when saving data for machine learning.

・ np.float64: Used when training a model.

・ np.expand_dims(): Add a dimension (an axis of size 1) to an ndarray.

・ np.squeeze(): Remove the dimensions of size 1 from an ndarray.

・ .flatten(): Make the array one-dimensional.

・ np.arange(start, stop, step): The ndarray version of range. stop is not included.

・ np.linspace(start, stop, num): Create num evenly spaced values from start to stop (stop is included).

・ np.logspace(start, stop, num, base=10): Raise base to each of num evenly spaced exponents from start to stop.

・ np.zeros(), np.ones(), np.eye(): Create an array whose elements are all 0, an array whose elements are all 1, and a matrix whose diagonal elements are all 1.
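The dtype, shape and array-creation notes above can be tried out as follows. This is only a sketch with arbitrary small shapes and ranges chosen for illustration.

```python
import numpy as np

# uint8 holds 0..255, which matches 8-bit image data.
img = np.zeros((2, 3), dtype=np.uint8)

# expand_dims adds an axis of size 1; squeeze removes axes of size 1.
batched = np.expand_dims(img, axis=0)   # shape (1, 2, 3)
restored = np.squeeze(batched)          # shape (2, 3)
flat = img.flatten()                    # shape (6,)

a = np.arange(0, 10, 2)                 # 0, 2, 4, 6, 8 (stop is excluded)
b = np.linspace(0, 1, 5)                # 0, 0.25, 0.5, 0.75, 1 (stop is included)
c = np.logspace(0, 3, 4, base=10)       # 10**0, 10**1, 10**2, 10**3
identity = np.eye(3)                    # 3x3 matrix with 1 on the diagonal

print(batched.shape, restored.shape, flat.shape)
print(a, b, c)
```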

・ np.random.rand(): Generate uniform random numbers from 0 (inclusive) to 1 (exclusive).

・ np.random.seed(): Fix the seed of the random number generator so that the same random numbers are generated each time.

・ np.random.randn(): Generate values from the standard normal distribution (mean 0, variance 1).

・ np.random.normal(mean, standard deviation): Generate values from the normal distribution with the given mean and standard deviation.

・ np.random.randint(low, high): Randomly generate integers that are low or more and less than high. If only low is given, the values are 0 or more and less than low.

・ np.random.choice(list): Get a random value from the specified list.
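A short sketch of the random-number notes above. The seed value, sizes and ranges are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)                       # fix the seed for reproducibility
first = np.random.rand(3)               # uniform on [0, 1)

np.random.seed(0)                       # same seed, so the same sequence again
second = np.random.rand(3)

standard = np.random.randn(1000)                    # mean 0, variance 1
normal = np.random.normal(10.0, 2.0, size=1000)     # mean 10, standard deviation 2
ints = np.random.randint(1, 7, size=100)            # integers 1..6 (7 is excluded)
only_low = np.random.randint(5, size=100)           # integers 0..4
picked = np.random.choice(["a", "b", "c"])          # one element chosen at random

print(np.array_equal(first, second))    # True
```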

・ argmax(), argmin(): Get the index of the maximum value and of the minimum value.

・ Difference between median and mean: The median takes longer to calculate because sorting is required, but it is robust against outliers.

・ time.time(): Returns the current time in seconds. Take the difference of two calls to measure elapsed time.

・ 68-95-99.7 rule: For a normal distribution, the probability that the data falls within ±1, ±2, and ±3 standard deviations of the mean is about 68%, 95%, and 99.7%.
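The median/mean and 68-95-99.7 notes above can be checked numerically. This is a sketch on made-up data; the sample size and seed are arbitrary.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])   # 1000.0 is an outlier
mean_value = data.mean()          # 202.0, pulled up by the outlier
median_value = np.median(data)    # 3.0, unaffected by the outlier
largest_at = data.argmax()        # 4, the index of the outlier

# Empirical check of the 68-95-99.7 rule on standard normal samples.
np.random.seed(0)
samples = np.random.randn(100_000)
within = [np.mean(np.abs(samples) <= k) for k in (1, 2, 3)]

print(mean_value, median_value, largest_at)
print(within)   # roughly 0.68, 0.95, 0.997
```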

・ np.clip(array, min, max): Convert values below min to min and values above max to max.

・ np.where(condition, x, y): Where the condition is True, take the value from x; where it is False, take the value from y.

・ .all(), .any(): Judge whether all elements are True, or whether at least one is True.

・ np.unique(array, return_counts=True): Returns the unique elements and the count of each.

・ np.bincount(): Returns the number of occurrences of each of 0, 1, 2, 3, ... in an array of non-negative integers.

・ np.concatenate(): Concatenate arrays along an existing axis.

・ np.stack(): Create a new axis and concatenate along it. axis=-1 is often used.

・ np.transpose(), .T: Transpose.

・ np.save(path, array), np.load(path): Save and load an array.

・ np.save(path, dictionary), np.load(path, allow_pickle=True)[()]: Save and load a dictionary.
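A sketch of the conversion, counting, joining and saving notes above. The array values, the dictionary contents and the file name params.npy are made up for illustration.

```python
import os
import tempfile

import numpy as np

x = np.array([-2, 0, 3, 7, 3, 0, 0])

clipped = np.clip(x, 0, 5)                          # [0 0 3 5 3 0 0]
labels = np.where(x > 0, 1, 0)                      # [0 0 1 1 1 0 0]
values, counts = np.unique(x, return_counts=True)   # values [-2 0 3 7], counts [1 3 2 1]
bins = np.bincount(clipped)                         # [4 0 0 2 0 1]

a = np.zeros((2, 3))
b = np.ones((2, 3))
joined = np.concatenate([a, b], axis=0)             # existing axis: shape (4, 3)
stacked = np.stack([a, b], axis=-1)                 # new last axis: shape (2, 3, 2)

# np.save wraps a dict in a 0-dimensional object array; [()] unwraps it on load.
path = os.path.join(tempfile.mkdtemp(), "params.npy")
np.save(path, {"lr": 0.01, "epochs": 10})
loaded = np.load(path, allow_pickle=True)[()]

print(loaded["lr"], loaded["epochs"])
```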

pandas ・ pd.set_option("display.max_columns", num), pd.set_option("display.max_rows", num): Specify the number of columns and rows to display.

・ .describe(): Display summary statistics of the numeric columns.

・ .columns: Display the list of columns (an attribute, not a method).

・ inplace=True: Update the original data frame instead of returning a new one.

・ reset_index(drop=True): Reassign the index from 0. With drop=True the original index is discarded instead of being kept as a column.

・ set_index(column name): Set the specified column as the index.

・ dropna(subset=[column name]): Delete the rows where the specified column is NaN.

・ df[np.isnan(df["column"])], df[df["column"].isna()]: Get the rows where the specified column is NaN.

・ df.groupby("column").mean() (or another statistic): Display statistics grouped by the specified column.

・ pd.concat([df1, df2], axis): Combine data frames, passed as a list, in the direction of the specified axis.

・ df1.merge(df2, how, on, right_on, left_on, suffixes): Combine data frames with the specified join method and key.

・ unique(): Get only the unique values.

・ nunique(): Get the number of unique values.

・ value_counts(): Get how many records each value has.

・ sort_values(by): Sort the data by the specified column.

・ apply(function): Apply the function to each row of a data frame (with axis=1) or to each element of a Series.

・ iterrows(): Returns an iterator that yields (index, Series) pairs, one per row.
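Several of the pandas notes above can be combined in one small sketch. The data frames, column names (team, score, owner) and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "c"],
    "score": [10.0, np.nan, 30.0, 40.0, 50.0],
})
owners = pd.DataFrame({"team": ["a", "b"], "owner": ["Ann", "Bob"]})

missing = df[df["score"].isna()]                             # rows where score is NaN
clean = df.dropna(subset=["score"]).reset_index(drop=True)   # 4 rows, index 0..3
mean_by_team = clean.groupby("team")["score"].mean()         # a: 20.0, b: 40.0, c: 50.0

merged = clean.merge(owners, how="left", on="team")          # team c has no owner (NaN)
stacked = pd.concat([clean, clean], axis=0)                  # rows appended: 8 rows
counts = df["team"].value_counts()                           # a: 2, b: 2, c: 1

# apply with axis=1 passes each row to the function as a Series.
clean["label"] = clean.apply(lambda r: f"{r['team']}-{int(r['score'])}", axis=1)

for index, row in clean.iterrows():
    print(index, row["team"], row["score"], row["label"])
```

iterrows() is convenient for inspection but slow on large frames, so vectorized operations or apply are usually preferable for real processing.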

matplotlib ・ %matplotlib inline: Lets graphs be drawn inline in Jupyter.

・ plt.plot(x, y): Draw a graph of y against x.

・ plt.xlabel(), plt.ylabel(): Display the axis labels.

・ plt.title(): Show the title.

・ plt.legend(): Display the legend.

・ plt.xticks(), plt.yticks(): Display the specified ticks.

・ plt.subplot(row, column, index): Draw multiple graphs by specifying rows, columns, and index.

・ plt.figure(): fig = plt.figure(), then ax1 = fig.add_subplot(row, column, index)

・ plt.subplots(rows, columns): fig, axes = plt.subplots(rows, columns), then axes[0].plot(x, y)

・ plt.scatter(), plt.hist(), plt.bar(), plt.boxplot(): Draw scatter plots, histograms, bar charts, and box plots. A data frame column can also be plotted directly: df["column"].value_counts().plot(kind="bar")
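A minimal sketch of the plt.subplots style described above, with labels, title and legend set on each Axes. The Agg backend line is an assumption so that the sketch also runs outside Jupyter without a display; in a notebook it can be dropped in favor of %matplotlib inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# One row, two columns of plots.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.sin(x), label="sin")
axes[0].plot(x, np.cos(x), label="cos")
axes[0].set_xlabel("x")
axes[0].set_ylabel("value")
axes[0].set_title("Line plot")
axes[0].legend()

np.random.seed(0)
axes[1].hist(np.random.randn(500), bins=20)
axes[1].set_title("Histogram")

fig.tight_layout()
```

On an Axes object the setters are set_xlabel, set_ylabel and set_title, whereas the plt-level functions are plt.xlabel, plt.ylabel and plt.title.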

seaborn ・ sns.distplot(array, norm_hist, kde): Show a histogram. By default the probability density function estimated by KDE is also drawn. (distplot is deprecated since seaborn 0.11; histplot and displot are its replacements.)

・ Kernel density estimation (KDE): A method for estimating the probability density function.

・ sns.jointplot(): Display a scatter plot of two variables, together with the histogram of each. Display the regression line with kind="reg".

・ sns.pairplot(): Display scatter plots for all pairs of numeric columns. Color-code by category with hue.

・ sns.barplot(x=categorical variable, y=numeric column, data=df): The mean of y for each value of x is displayed as a bar graph, with a 95% confidence interval.

・ sns.countplot(x): Display the number of records for each value of the specified variable.

・ sns.boxplot(x, y): Display the box plot of the specified variable.

・ sns.violinplot(x, y): Display the distribution density of the specified variable.

・ sns.swarmplot(x, y): Show the actual data points of the specified variable.

・ df.corr(): Compute the correlation coefficients between columns.

・ sns.heatmap(df.corr(), annot=True, cmap="coolwarm"): Display the correlation table as a heat map.

・ sns.set(context, style, palette): Change the style of seaborn.
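The corr() and heatmap notes above, sketched on synthetic data. The column names x, y, z and the way they are generated are assumptions for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(0)
x = np.random.randn(200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + 0.5 * np.random.randn(200),   # strongly correlated with x
    "z": np.random.randn(200),                 # independent noise
})

corr = df.corr()                                      # 3x3 correlation table
ax = sns.heatmap(corr, annot=True, cmap="coolwarm")   # annot=True writes the values in the cells

print(corr.round(2))
```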

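OpenCV ・ cv2.imread(): Read an image file as an ndarray. The channel order is BGR.

・ plt.imshow(): Display an ndarray as an image. It interprets the channels as RGB, so an image read with cv2.imread() is shown with red and blue swapped unless it is converted first.

・ cv2.cvtColor(im, cv2.COLOR_BGR2RGB): Convert from BGR to RGB.

・ cv2.imwrite(): Save an ndarray as an image.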

・ Binarization
① Specify the threshold and binarize: cv2.threshold(ndarray, threshold, 255, cv2.THRESH_BINARY)
② Otsu's binarization: cv2.threshold(ndarray, threshold, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) The threshold is set automatically, and the threshold passed as an argument is ignored. This corresponds to applying linear discriminant analysis (LDA) to the image.
③ Adaptive thresholding: cv2.adaptiveThreshold(ndarray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, size, constant) The threshold for each pixel is the mean brightness within the specified neighborhood (size) minus the constant.

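glob ・ glob.glob(pattern): Get a list of the file paths that match the pattern.

os&pathlib ・ pathlib.Path(): Create a path object. Its iterdir() and glob() methods return iterators over paths.

・ os.path.split(): Split a path into head (directory) and tail (last component).

・ os.path.join(): Concatenate the folder path and the file name.

・ os.path.exists(): Check whether a file or directory exists.

・ os.makedirs(): Create a folder, including any missing parent folders.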
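A sketch of the glob, os.path and pathlib notes above, run inside a temporary directory. The folder names data/raw and the file names are made up for illustration.

```python
import glob
import os
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing real is touched.
base = tempfile.mkdtemp()
out_dir = os.path.join(base, "data", "raw")
os.makedirs(out_dir, exist_ok=True)      # creates data/ and data/raw/

for name in ("a.txt", "b.txt", "c.csv"):
    Path(out_dir, name).write_text("example")

txt_files = sorted(glob.glob(os.path.join(out_dir, "*.txt")))   # only the .txt files
head, tail = os.path.split(txt_files[0])                        # directory, "a.txt"
names = sorted(p.name for p in Path(out_dir).iterdir())         # iterdir() yields Path objects

print(os.path.exists(out_dir), tail, names)
```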

tqdm ・ tqdm(iterator, total=len(df)): Show a progress bar.

nibabel ・ nib.load(): Load a NIfTI image.

・ get_fdata(): Get the ndarray of the image.

multiprocessing ・ map(func, iter): Returns an iterator with func applied to each element of iter.

・ cpu_count(): Check the number of CPUs in the system (logical cores, not physical cores).

・ Pool.map(), Pool.imap(): Apply the map function in parallel. map() returns a list and imap() returns an iterator.

・ Pool.imap_unordered(): Returns each result as soon as its processing is completed, so the order is not guaranteed.

・ zip(): Returns the elements of multiple iterable objects as tuples.

・ p.close(), p.join(): Finish parallel processing (stop accepting tasks, then wait for the workers to exit).
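A minimal sketch of the multiprocessing notes above. The worker function square and the inputs are hypothetical; the pool is created under the usual if __name__ == "__main__": guard, which is needed on platforms that start workers by spawning a new interpreter.

```python
from multiprocessing import Pool, cpu_count


def square(n):
    """Toy worker; defined at module level so worker processes can find it."""
    return n * n


numbers = [1, 2, 3, 4]
letters = ["a", "b", "c", "d"]

serial = list(map(square, numbers))     # built-in map returns an iterator
pairs = list(zip(numbers, letters))     # (1, 'a'), (2, 'b'), ...

if __name__ == "__main__":
    p = Pool(processes=min(2, cpu_count()))
    try:
        as_list = p.map(square, numbers)                        # list, in input order
        in_order = list(p.imap(square, numbers))                # iterator, in input order
        unordered = sorted(p.imap_unordered(square, numbers))   # yielded as each finishes
    finally:
        p.close()   # no more tasks will be submitted
        p.join()    # wait for the worker processes to exit
    print(as_list, in_order, unordered)
```

To pass several arguments per task, one common approach is to zip the inputs into tuples and unpack them inside the worker.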

・ %load_ext autoreload, %autoreload 2: Reflect changes made in other files without restarting.

・ np.rollaxis(array, axis, start): Moves the specified axis to the position specified by start.
