Python course for data science

python

・ set: Used to check a list for duplicates.

・ A flat index can be expressed as coordinates: dividing it by the number of columns gives the row (quotient) and the column (remainder).

・ Shift + Tab: Show a function's reference in Jupyter.

・ _: Holds the return value of the last executed cell.
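A minimal sketch of the duplicate check and the quotient/remainder note above. The list contents, the grid width `n_cols`, and the flat index are made-up values for illustration.

```python
# A set drops repeated items, so a shorter set means the list has duplicates.
items = [3, 1, 4, 1, 5]
has_duplicates = len(set(items)) != len(items)

# Convert a flat index into (row, column) for a grid with n_cols columns.
n_cols = 4        # assumed grid width
flat_index = 10   # assumed flat position
row, col = divmod(flat_index, n_cols)   # quotient -> row, remainder -> column

print(has_duplicates, row, col)
```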

numpy

・ np.uint8 (unsigned integer, 8 bit): Holds values from 0 to 255. Used for image data, etc.

・ np.float32: Used when saving data for machine learning.

・ np.float64: Used when training a model.

・ np.expand_dims(): Add a dimension to an ndarray.

・ np.squeeze(): Remove size-1 dimensions from an ndarray.

・ .flatten(): Make an array one-dimensional.
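The shape changes made by the three functions above are easiest to see on a small array. A minimal sketch, with an arbitrary (3, 4) array as the starting point:

```python
import numpy as np

a = np.zeros((3, 4))             # shape (3, 4)
b = np.expand_dims(a, axis=0)    # add a leading axis  -> (1, 3, 4)
c = np.squeeze(b)                # drop size-1 axes    -> (3, 4)
d = a.flatten()                  # one-dimensional copy -> (12,)

print(b.shape, c.shape, d.shape)
```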

・ np.arange(start, stop, step): The ndarray version of range.

・ np.linspace(start, stop, num): Create an array of num evenly spaced values from start to stop.

・ np.logspace(start, stop, num, base=10): Raise base to each of num evenly spaced exponents from start to stop.

・ np.zeros(), np.ones(), np.eye(): Create an array of all 0s, an array of all 1s, and a matrix whose diagonal elements are 1 (identity matrix).
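A short sketch of the array-creation functions above. The argument values are arbitrary and chosen only to make the results easy to read.

```python
import numpy as np

r = np.arange(0, 10, 2)     # like range(0, 10, 2): stop is excluded
l = np.linspace(0, 1, 5)    # 5 evenly spaced values, stop is included
g = np.logspace(0, 3, 4)    # 10 raised to the exponents 0, 1, 2, 3
i = np.eye(3)               # 3x3 matrix with 1 on the diagonal

print(r, l, g, i, sep="\n")
```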

・ np.random.rand(): Generate random values uniformly distributed from 0 to 1 (1 excluded).

・ np.random.seed(): Fix the seed so that the same random numbers are generated on every run.

・ np.random.randn(): Generate values from the standard normal distribution (mean 0, variance 1).

・ np.random.normal(mean, standard deviation): Generate values from the normal distribution with the given mean and standard deviation.

・ np.random.randint(low, high): Generate random integers from low (inclusive) up to high (exclusive). If only low is given, the values range from 0 up to low (exclusive).

・ np.random.choice(list): Get a random value from the specified list.
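A minimal sketch of how the random functions above behave. The seed value 0, the dice-like range, and the list of letters are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)               # fix the seed
first = np.random.rand(3)       # uniform values in [0, 1)

np.random.seed(0)               # same seed again
second = np.random.rand(3)      # reproduces the same values as `first`

ints = np.random.randint(1, 7, size=100)     # integers 1..6 (7 is excluded)
pick = np.random.choice(["a", "b", "c"])     # one random element of the list
```

Resetting the seed before each call is only done here to show reproducibility; normally the seed is set once at the top of a notebook.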

・ argmax(), argmin(): Get the index of the maximum value and of the minimum value.

・ Difference between median and mean: The median takes longer to calculate because it requires sorting, but it is robust to outliers.
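The robustness of the median can be checked on a toy array. In this sketch the data values are invented, and one value is replaced by an extreme outlier:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
with_outlier = np.array([1, 2, 3, 4, 500])   # last value replaced by an outlier

mean_before, median_before = np.mean(data), np.median(data)
mean_after, median_after = np.mean(with_outlier), np.median(with_outlier)

# The mean moves a lot, the median does not move at all.
print(mean_before, mean_after, median_before, median_after)
```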

・ time.time(): Get the current time; the difference between two calls measures elapsed time.

・ 68-95-99.7 rule: The probability that data falls within ±1, ±2, and ±3 standard deviations of the mean (normal distribution).
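The 68-95-99.7 rule can be checked empirically with np.random.randn(). A rough sketch, assuming a sample size of 200,000 and seed 0 (both arbitrary):

```python
import numpy as np

np.random.seed(0)                 # fixed seed so the run is repeatable
x = np.random.randn(200_000)      # samples with mean 0, standard deviation 1

# Fraction of samples within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: float(np.mean(np.abs(x) <= k)) for k in (1, 2, 3)}
print(within)                     # roughly 0.68, 0.95, 0.997
```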

・ np.clip(array, min, max): Convert values below min to min and values above max to max.

・ np.where(condition, true, false): Where the condition is True, use the value given for true; where it is False, use the value given for false.

・ .all(), .any(): Check whether all elements are True, or whether at least one is True.

・ np.unique(array, return_counts=True): Return the unique elements and the count of each.

・ np.bincount(): Return the counts of 0, 1, 2, 3, ... in an array of non-negative integers.
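A small sketch of the element-wise helpers above, using an invented five-element array:

```python
import numpy as np

a = np.array([-2, 0, 3, 3, 7])

clipped = np.clip(a, 0, 5)          # [0, 0, 3, 3, 5]
labels = np.where(a > 2, 1, 0)      # [0, 0, 1, 1, 1]
all_in_range = (clipped >= 0).all() # every element satisfies the condition
any_large = (a > 5).any()           # at least one element satisfies it

values, counts = np.unique(a, return_counts=True)   # unique values and their counts
bins = np.bincount(np.array([0, 1, 1, 3]))          # counts of 0, 1, 2, 3
```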

・ np.concatenate(): Concatenate arrays along an existing axis.

・ np.stack(): Create a new axis and concatenate along it. axis=-1 is often used.

・ np.transpose(), .T: Transpose.
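The difference between np.concatenate() and np.stack() shows up in the resulting shapes. A minimal sketch with two arbitrary (2, 3) arrays:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

joined = np.concatenate([a, b], axis=0)   # existing axis grows  -> (4, 3)
stacked = np.stack([a, b], axis=-1)       # new last axis added  -> (2, 3, 2)
transposed = a.T                          # axes reversed        -> (3, 2)

print(joined.shape, stacked.shape, transposed.shape)
```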

・ np.save(path, array), np.load(path): Save and load an array.

・ np.save(path, dictionary), np.load(path, allow_pickle=True)[()]: Save and load a dictionary.
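A sketch of the dictionary round trip noted above. The dictionary contents and the file name params.npy are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import numpy as np

params = {"lr": 0.01, "epochs": 10}   # hypothetical dictionary to store

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.npy")
    np.save(path, params)                           # stored as a pickled 0-d object array
    loaded = np.load(path, allow_pickle=True)[()]   # [()] unwraps the 0-d array

print(loaded)
```

Without the trailing [()], np.load() returns a 0-dimensional object array rather than the dictionary itself.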

pandas

・ pd.set_option("display.max_columns", num), pd.set_option("display.max_rows", num): Specify the number of columns and rows to display.

・ .describe(): Display summary statistics of the numerical columns.

・ .columns: Display the list of columns.

・ inplace=True: Update the original DataFrame in place.

・ reset_index(drop=True): Reassign the index. With drop=True, the original index is discarded instead of being kept as a column.

・ set_index(column name): Set the specified column as the index.

・ dropna(subset=[column name]): Delete the rows where the specified column is NaN.

・ df[np.isnan(df["column"])], df[df["column"].isna()]: Get the rows where the specified column is NaN.

・ df.groupby("column") followed by a statistic such as .mean(): Display statistics grouped by the specified column.
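A minimal sketch of the NaN handling and group-by notes above. The DataFrame, its column names group and score, and the values are all invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "score": [1.0, np.nan, 3.0, 5.0],
})

missing = df[df["score"].isna()]                             # rows where score is NaN
clean = df.dropna(subset=["score"]).reset_index(drop=True)   # drop them, renumber the index
means = clean.groupby("group")["score"].mean()               # mean score per group

print(means)
```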

・ pd.concat([df1, df2], axis): Combine DataFrames along the specified axis.

・ df1.merge(df2, how, on, left_on, right_on, suffixes): Combine DataFrames using the specified join method and keys.
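To contrast the two ways of combining DataFrames, here is a small sketch. The frames left and right, the key column id, and the suffix names are assumptions made for this example.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
right = pd.DataFrame({"id": [2, 3, 4], "x": [200, 300, 400]})

# concat simply places the frames one after another along the axis.
stacked = pd.concat([left, right], axis=0)

# merge matches rows on the key; suffixes rename the overlapping column x.
inner = left.merge(right, how="inner", on="id", suffixes=("_left", "_right"))

print(stacked.shape)
print(inner)
```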

・ unique(): Get only the unique values.

・ nunique(): Get the number of unique values.

・ value_counts(): Get the number of records for each value.

・ sort_values(by): Sort the data by the specified column.

・ apply(function): Apply the function to each row (axis=1 for a DataFrame).

・ iterrows(): Create an iterator that yields the index and the row as a Series.
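A short sketch of the column-level helpers above, on an invented DataFrame with columns city and temp:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Tokyo", "Osaka", "Tokyo"], "temp": [20, 25, 22]})

counts = df["city"].value_counts()     # number of records per city
n_cities = df["city"].nunique()        # number of distinct cities
ordered = df.sort_values(by="temp")    # rows sorted by temperature

# apply with axis=1 passes each row to the function as a Series.
df["label"] = df.apply(lambda row: f'{row["city"]}:{row["temp"]}', axis=1)

# iterrows yields (index, row) pairs.
seen = [(index, row["city"]) for index, row in df.iterrows()]
```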

matplotlib

・ %matplotlib inline: Render plots inline in Jupyter.

・ plt.plot(x, y): Draw a line graph of y against x.

・ plt.xlabel(), plt.ylabel(): Display the axis labels.

・ plt.title(): Show the title.

・ plt.legend(): Display the legend.

・ plt.xticks(), plt.yticks(): Display the specified ticks.

・ plt.subplot(row, column, index): Draw multiple graphs by specifying rows, columns, and index.

・ plt.figure(): fig = plt.figure(), then ax1 = fig.add_subplot(row, column, index)

・ plt.subplots(rows, columns): fig, axes = plt.subplots(rows, columns), then axes[0].plot(x, y)

・ plt.scatter(), plt.hist(), plt.bar(), plt.boxplot(): Draw scatter plots, histograms, bar charts, and box plots. A DataFrame column can also be plotted directly: df["column"].value_counts().plot(kind="bar")
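A minimal sketch of the plt.subplots() pattern above. The Agg backend is selected only so that the example runs without a display (in Jupyter, %matplotlib inline takes its place), and the sine data is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, axes = plt.subplots(1, 2)        # 1 row, 2 columns -> array of 2 Axes
axes[0].plot(x, np.sin(x), label="sin")
axes[0].set_title("sin")
axes[0].set_xlabel("x")
axes[0].legend()
axes[1].hist(np.sin(x), bins=10)      # histogram of the same values
axes[1].set_xlabel("value")
```

On an Axes object the label and title functions are prefixed with set_ (set_title, set_xlabel), unlike the plt.title() and plt.xlabel() forms.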

seaborn

・ sns.distplot(array, norm_hist, kde): Show a histogram. By default, the probability density function estimated by KDE is also displayed.

・ Kernel density estimation (KDE): A method for estimating the probability density function.

・ sns.jointplot(): Display a scatter plot of two variables together with the histogram of each. kind="reg" adds the regression line.

・ sns.pairplot(): Display scatter plots for all pairs of numerical columns. hue colors the points by category.

・ sns.barplot(x=categorical variable, y=numerical variable, data=df): Display the mean of y for each value of x as a bar graph, with the 95% confidence interval.

・ sns.countplot(x): Display the number of records for each value of the specified variable.

・ sns.boxplot(x, y): Display a box plot of the specified variables.

・ sns.violinplot(x, y): Display the distribution density of the specified variables.

・ sns.swarmplot(x, y): Display the actual data points of the specified variables.

・ corr(): Display the correlation coefficients.

・ sns.heatmap(df.corr(), annot=True, cmap="coolwarm"): Display a heat map of the correlation table.

・ sns.set(context, style, palette): Change the style of seaborn.

OpenCV

・ cv2.imread(): Read an image file as an ndarray.

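・ plt.imshow(): Display an ndarray as an image. It interprets the channels as RGB, so an image read with cv2.imread() (which is in BGR order) is shown with red and blue swapped.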

・ cv2.cvtColor(im, cv2.COLOR_BGR2RGB): Convert from BGR to RGB.

・ cv2.imwrite(): Save an ndarray as an image.

・ Binarization

① Specify the threshold and binarize: cv2.threshold(ndarray, threshold, 255, cv2.THRESH_BINARY)

② Otsu's binarization: cv2.threshold(ndarray, threshold, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) The threshold is set automatically by applying linear discriminant analysis (LDA) to the image.

③ Adaptive thresholding: cv2.adaptiveThreshold(ndarray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, size, constant) The threshold for each pixel is the mean brightness within the specified range around it, minus a constant.
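To make the rules behind ① and ② concrete, here is a plain NumPy sketch. It is not OpenCV's implementation: binarize and otsu_threshold are hypothetical helper names, the toy image is invented, and the Otsu part simply searches for the threshold that maximizes the between-class variance of the histogram.

```python
import numpy as np

def binarize(img, thresh, maxval=255):
    # Same rule as THRESH_BINARY: pixels above thresh become maxval, the rest 0.
    return np.where(img > thresh, maxval, 0).astype(np.uint8)

def otsu_threshold(img):
    # Pick the threshold t that maximizes the between-class variance
    # of the two groups "pixel <= t" and "pixel > t".
    prob = np.bincount(img.ravel(), minlength=256) / img.size
    levels = np.arange(256)
    w0 = np.cumsum(prob)                   # share of pixels <= t
    w1 = 1.0 - w0                          # share of pixels > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    between = np.zeros(256)
    valid = (w0 > 1e-12) & (w1 > 1e-12)    # skip splits with an empty group
    between[valid] = (
        (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    )
    return int(np.argmax(between))

img = np.array([[50, 50, 200, 200],
                [50, 200, 50, 200]], dtype=np.uint8)   # toy two-level image
t = otsu_threshold(img)
binary = binarize(img, t)
```

The dark pixels (50) end up as 0 and the bright pixels (200) as 255 without a threshold being chosen by hand.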

glob

・ glob.glob(): Get a list of file paths that match a pattern.

os & pathlib

・ Path: Create a path object. Methods such as iterdir() return an iterator over paths.

・ os.path.split(): Split a path into head and tail.

・ os.path.join(): Join a folder path and a file name.

・ os.path.exists(): Check whether a file or directory exists.

・ os.makedirs(): Create a folder.
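A sketch of the path helpers above working together. The folder names data/raw and the file name sample.txt are hypothetical, and everything happens inside a temporary directory.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "data", "raw")        # hypothetical folder names
    os.makedirs(folder, exist_ok=True)               # creates intermediate folders too
    file_path = os.path.join(folder, "sample.txt")
    Path(file_path).write_text("hello")

    head, tail = os.path.split(file_path)            # (folder, file name)
    exists = os.path.exists(file_path)               # checked while the file still exists
    names = sorted(p.name for p in Path(folder).iterdir())   # iterate over the folder

print(tail, exists, names)
```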

tqdm

・ tqdm(iterator, total=len(df)): Show a progress bar.

nibabel

・ nib.load(): Load a NIfTI image.

・ get_fdata(): Get the image data as an ndarray.

multiprocessing

・ map(func, iter): Returns an iterator that applies func to each element of iter.

・ cpu_count(): Check the number of CPUs in the system. This counts logical cores, not physical ones.

・ Pool.map(), Pool.imap(): Apply the map function with parallel processing. map() returns a list and imap() returns an iterator.

・ Pool.imap_unordered(): Return each result as soon as its processing is completed, regardless of input order.

・ zip(): Returns the elements of multiple iterable objects as tuples.

・ p.close(), p.join(): Finish parallel processing.

・ %load_ext autoreload, %autoreload 2: Reflect changes made in other files.

・ np.rollaxis(array, axis, start): Roll the specified axis to the position specified by start.
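The effect of np.rollaxis() is easiest to read from the shape. A minimal sketch, assuming an arbitrary (3, 4, 5) array whose last axis is moved to the front:

```python
import numpy as np

a = np.zeros((3, 4, 5))
rolled = np.rollaxis(a, 2, 0)   # axis 2 moves to position 0 -> (5, 3, 4)

print(a.shape, rolled.shape)
```

The remaining axes keep their relative order, which is why the result is (5, 3, 4) and not (5, 4, 3).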

Python course for data science_useful techniques