[PYTHON] Installing dependent libraries for Alibaba Cloud Function Compute

This article describes how to use existing tools to install dependent libraries in a **Function Compute** project with minimal manual intervention.

Package manager installation directory

Currently, Function Compute (https://www.alibabacloud.com/en/products/function-compute) supports Java, Python, and Node.js environments. The package managers for these three languages are Maven, pip, and npm, respectively. The following describes the installation directories for each of these package managers.

Maven

Maven is a Java package manager. Maven downloads the dependencies declared in the project file pom.xml from the central or private repositories into the $M2_HOME/repository directory. The default value of M2_HOME is $HOME/.m2. All Java projects on the development machine share the JAR packages in this local repository. At the mvn package stage, all dependent JAR packages are bundled into the final artifact. Therefore, a Java project runs independently of the files in the $M2_HOME/repository directory.

pip

pip is currently the most popular and recommended package manager for Python. Before looking at how to install packages into a local directory, it helps to understand how Python package management evolved. The following is a brief history.

Prior to 2004, setup.py was recommended for Python installations. To use it, download any module and use the setup.py file that came with that module.

python setup.py install

setup.py is built on Distutils. Released in 2000 as part of the Python standard library, Distutils is used to build and install Python modules.

You can therefore also use setup.py to package a Python module as a source distribution:

python setup.py sdist

You can also package the module into an RPM or EXE file.

python setup.py bdist_rpm
python setup.py bdist_wininst

Like a Makefile, setup.py can be used for both builds and installations. However, because the build and install steps are coupled, the module must be rebuilt every time it is installed, which wastes resources. In 2004, the Python community released setuptools, which includes the easy_install tool. Python then began supporting the EGG format and introduced the online repository PyPI.

The online module repository PyPI has two important advantages.

  1. You only need to install the prebuilt EGG package, which improves efficiency.
  2. Dependent packages are downloaded from PyPI and installed automatically.

Since its release in 2008, pip has gradually replaced easy_install and become the de facto standard Python package manager. pip remains compatible with the EGG format but prefers the Wheel format, and it supports installing modules from version control repositories (e.g. GitHub).

The following describes the directory structure of Python modules. The directories for both EGG and Wheel installation files are divided into five categories: purelib, platlib, headers, scripts, and data.

| Directory | Installation location | Purpose |
| --- | --- | --- |
| purelib | $prefix/lib/pythonX.Y/site-packages | Pure-Python libraries |
| platlib | $exec-prefix/lib/pythonX.Y/site-packages | Platform-specific libraries (compiled extension modules) |
| headers | $prefix/include/pythonX.Yabiflags/distname | C header files |
| scripts | $prefix/bin | Executable files |
| data | $prefix | Data files, such as .conf configuration files and SQL initialization files |

$prefix and $exec-prefix are set when Python is compiled and can be read at runtime from sys.prefix and sys.exec_prefix. On Linux, both default to /usr/local.
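To check where these five categories resolve on a particular interpreter, the standard-library sysconfig module can be queried. The following is a minimal sketch rather than part of the original workflow; the helper name install_dirs is made up for this example, and note that sysconfig calls the headers category "include".

```python
import sys
import sysconfig


def install_dirs():
    """Return the install directories for the five file categories."""
    paths = sysconfig.get_paths()
    # sysconfig names the C header directory "include" rather than "headers".
    keys = ["purelib", "platlib", "include", "scripts", "data"]
    return {key: paths[key] for key in keys}


if __name__ == "__main__":
    print("prefix      =", sys.prefix)
    print("exec_prefix =", sys.exec_prefix)
    for name, path in install_dirs().items():
        print(f"{name:8} -> {path}")
```

The printed values differ between distributions and virtual environments, which is why no sample output is shown here.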

npm

npm is the Node.js package manager. Running npm install downloads the dependent packages into the node_modules subdirectory of the current directory, so all Node.js runtime dependencies live in the current directory. However, some Node.js libraries depend on native components that are built for the local environment when the module is installed. If the build environment (such as Windows) differs from the runtime environment (such as Linux), these native libraries cannot be executed. Also, if development and runtime libraries are installed at build time through the operating system's package manager (such as apt-get), the dynamic-link libraries installed that way may not exist in the container of the runtime environment.

Troubleshooting issues

Next, let's look at how to resolve issues that occur when Function Compute's dependent libraries are installed.

Dependent libraries installed in the global system directory

Maven and pip install dependent packages in a system directory outside the project directory. When the project is built, Maven packages all external dependencies into the final artifact, so Maven-managed projects are free from dependency issues at run time. For Java projects that are not managed by Maven, it is also common to place dependent JAR packages in the current directory or its subdirectories and package them into the final artifact, which likewise prevents dependency issues at run time. A pip-managed Python project, however, does run into such issues: pip installs dependencies in a system directory, but the Function Compute production environment is read-only (except for the /tmp directory), so the environment cannot be prepared in advance.

Native dependencies

Many common Python and Node.js libraries depend on native libraries in the system. The dynamic-link libraries must be installed in both the build environment and the runtime environment, which makes such libraries poorly portable in both cases.

When Function Compute runs on Debian or Ubuntu, APT is used to manage system programs and libraries. By default, these programs and libraries are installed in system directories (for example, /usr/bin, /usr/lib, /usr/local/bin, and /usr/local/lib). Therefore, native dependencies must also be installed in the local directory.

Recommended solution

Below are some intuitive solutions.

  1. Make sure that the development system used to install the dependencies matches the production execution system. Use fcli sbox to install the dependencies.

  2. Place all dependency files in your local directory. Copy the modules, executables, and .dll or .so files installed by pip to the current directory.

In practice, however, it is difficult to put the dependency files in the current directory.

  1. Library files installed by pip or apt-get are scattered across different directories, so you must be familiar with each package manager to find them.

  2. Library files have transitive dependencies. When a library is installed, the libraries it depends on are installed as well, so collecting these dependencies by hand is tedious.

So how can the dependencies be installed in the current directory with minimal manual intervention? The following describes several methods for pip and the APT package manager and compares their strengths and weaknesses.

Installing dependencies in the current directory

Python

**Method 1: Use the --install-option parameter**

pip install --install-option="--install-lib=$(pwd)" PyMySQL

--install-option passes its arguments through to setup.py. However, .egg and .whl files do not include a setup.py file, so using --install-option forces an installation from the source package, and setup.py runs the module build process. Note that recent pip releases have removed --install-option, so this method applies only to older pip versions.

--install-option supports the following options:

| File type | Option |
| --- | --- |
| Python modules | --install-purelib |
| Extension modules | --install-platlib |
| All modules | --install-lib |
| Scripts | --install-scripts |
| Data | --install-data |
| C headers | --install-headers |

If you use --install-lib, it overrides the values of --install-purelib and --install-platlib.

Also, --install-option="--prefix=$(pwd)" installs into the current directory, but creates a lib/python2.7/site-packages subdirectory under it.

Advantages:

  1. You can flexibly install each file category (such as purelib) into a local directory of your choice.

Disadvantages:

  1. It does not work for modules that do not provide a source package.

  2. The module is built from source, so the Wheel package is not used.

  3. Installing a module completely requires setting several parameters, which is tedious.

**Method 2: Use the --target or -t parameter**

pip install --target=$(pwd) PyMySQL

--target is a newer parameter provided by pip. It installs the module directly in the specified directory (here, the current directory) without creating a lib/python2.7/site-packages subdirectory. This method is easy to use and works well for modules with a few dependencies.
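If the install step is driven from a Python build script rather than typed by hand, the same command can be assembled and run with subprocess. This is only a sketch under that assumption; the function names pip_target_command and install_to_target are invented for illustration.

```python
import subprocess
import sys


def pip_target_command(package, target_dir):
    """Build the pip command that installs `package` into `target_dir`."""
    # "python -m pip" runs pip under the same interpreter as this script.
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, package]


def install_to_target(package, target_dir):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_target_command(package, target_dir), check=True)


# Example (needs network access): install_to_target("PyMySQL", ".")
```

Running pip through sys.executable helps ensure the packages are installed for the same Python version that will later import them.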

**Method 3: Use PYTHONUSERBASE with --user**

PYTHONUSERBASE=$(pwd) pip install --user PyMySQL

If you use --user, the module is installed under the site.USER_BASE directory. The default value of this directory is ~/.local on Linux, ~/Library/Python/X.Y on macOS, and %APPDATA%\Python on Windows. You can change the value of site.USER_BASE with the PYTHONUSERBASE environment variable.

As with --prefix, using --user creates a lib/python2.7/site-packages subdirectory.
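To confirm which directory --user would write to for a given PYTHONUSERBASE, you can ask a child interpreter for its user site-packages path. The sketch below uses only the standard site module; the helper name user_site_for is an assumption made for this example.

```python
import os
import subprocess
import sys


def user_site_for(base_dir):
    """Return the user site-packages path that a child Python reports
    when PYTHONUSERBASE is set to `base_dir`."""
    env = dict(os.environ, PYTHONUSERBASE=base_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getusersitepackages())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(user_site_for(os.getcwd()))
```

A child process is used because the site module computes its paths once at interpreter startup, so changing the variable inside an already running process would not be a reliable check.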

**Method 4: Use virtualenv**

pip install virtualenv
virtualenv path/to/my/virtual-env
source path/to/my/virtual-env/bin/activate
pip install PyMySQL

virtualenv is the method recommended by the Python community for keeping the global environment clean. With virtualenv, both the desired module (such as PyMySQL) and the packaging tools (such as setuptools, pip, and wheel) are stored in a local directory. The packaging tools increase the size of the package but are not used at runtime.

apt-get

The dynamic-link libraries and executables installed with apt-get also need to be installed in the local directory. We tried the methods recommended online, chroot and apt-get -o RootDir=$(pwd), but abandoned them because they were flawed. Building on those attempts, we designed a method that downloads the DEB packages with apt-get and extracts them into the current directory with dpkg.

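# Download the .deb packages (without installing them) into ./archives
apt-get install -d -o=dir::cache=$(pwd) libx11-6 libx11-xcb1 libxcb1
# Unpack each downloaded package into the current directory
for f in ./archives/*.deb
do
    dpkg -x "$f" "$(pwd)"
done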

Running the program

Java loads JAR and class files from the configured classpath, and Node.js automatically loads packages from node_modules in the current directory. These common cases are not covered here.

Python

Python loads module files from the list of directories in sys.path.

>>> import sys
>>> print('\n'.join(sys.path))

/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/usr/lib/python2.7/lib-old
/usr/lib/python2.7/lib-dynload
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages

By default, sys.path includes the directory that contains the script being run (or the current directory in an interactive session). So if you use the --target or -t parameter from Method 2 to install modules next to your code, you do not need to modify sys.path.

Since sys.path is an editable list, you can call sys.path.append(dir) when the program starts. Alternatively, you can set the PYTHONPATH environment variable, which keeps the program more portable.

export PYTHONPATH=$PYTHONPATH:$(pwd)/lib/python2.7/site-packages
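When the environment variable cannot be set, the same effect can be achieved in code at startup. The following sketch assumes the dependencies were installed under ./lib/python2.7/site-packages as in the earlier methods; the helper name add_local_packages is hypothetical. It inserts at the front of sys.path instead of appending, so that bundled modules take priority over system-wide copies, which is a design choice rather than a requirement.

```python
import os
import sys


def add_local_packages(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # Position 0 gives the bundled modules priority over
        # any copies installed system-wide.
        sys.path.insert(0, directory)
    return directory


# Assumed layout: dependencies live under ./lib/python2.7/site-packages
add_local_packages(os.path.join(os.getcwd(), "lib", "python2.7", "site-packages"))
```

Call the helper before importing any of the bundled modules, for example at the top of the function's entry file.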

apt-get

Make sure that the executables and dynamic-link libraries installed using apt-get can be found in the directory lists set by the PATH and LD_LIBRARY_PATH environment variables.

PATH

The PATH variable holds the list of directories that the system searches for executable programs. Add the local bin and sbin directories, such as bin, usr/bin, and usr/local/bin, to the PATH variable.

export PATH=$(pwd)/bin:$(pwd)/usr/bin:$(pwd)/usr/local/bin:$PATH

The command above uses Bash syntax. In Java, Python, or Node.js, modify the PATH environment variable of the current process in the way appropriate for that language.
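In Python, for instance, the current process's PATH can be extended through os.environ. This is a minimal sketch; the helper name prepend_to_path is invented here, and the three subdirectories simply mirror the export command.

```python
import os


def prepend_to_path(directories, env=None):
    """Prepend `directories` to PATH in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    current = env.get("PATH", "")
    parts = [os.path.abspath(d) for d in directories]
    if current:
        parts.append(current)
    # os.pathsep is ":" on Linux and ";" on Windows.
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Mirror the export command for the current process.
root = os.getcwd()
prepend_to_path(os.path.join(root, sub) for sub in ("bin", "usr/bin", "usr/local/bin"))
```

Child processes started afterwards, for example with subprocess, inherit the updated PATH.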

LD_LIBRARY_PATH

Like PATH, LD_LIBRARY_PATH is a list of directories that are searched for dynamic-link libraries. The system typically places shared libraries in the /lib, /usr/lib, and /usr/local/lib directories. Some libraries are placed in subdirectories of these directories, such as /usr/lib/x86_64-linux-gnu. These subdirectories are usually recorded in the files under /etc/ld.so.conf.d/.

cat /etc/ld.so.conf.d/x86_64-linux-gnu.conf
# Multiarch support
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

Therefore, the directories declared in the files under $(pwd)/etc/ld.so.conf.d/ must also be added to the directory list in the LD_LIBRARY_PATH environment variable so that the .so files in them can be found.
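Collecting those directories by hand is error-prone, so a small script can read the .conf files and build the LD_LIBRARY_PATH value. The sketch below is an illustration under simplifying assumptions: it handles only absolute directory entries and comments, ignores any include directives, and uses made-up function names.

```python
import glob
import os


def library_dirs_from_conf(root):
    """Collect library directories declared under <root>/etc/ld.so.conf.d/,
    re-rooted so that they point inside `root`."""
    dirs = []
    pattern = os.path.join(root, "etc", "ld.so.conf.d", "*.conf")
    for conf in sorted(glob.glob(pattern)):
        with open(conf) as handle:
            for line in handle:
                entry = line.split("#", 1)[0].strip()  # drop comments
                # Only absolute paths are handled; other directives are skipped.
                if entry.startswith("/"):
                    local = os.path.join(root, entry.lstrip("/"))
                    if local not in dirs:
                        dirs.append(local)
    return dirs


def ld_library_path(root):
    """Join the re-rooted directories into an LD_LIBRARY_PATH value."""
    return ":".join(library_dirs_from_conf(root))
```

Calling ld_library_path(os.getcwd()) in the directory where the DEB packages were unpacked yields a string suitable for exporting as LD_LIBRARY_PATH.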

Note that changing the LD_LIBRARY_PATH environment variable at run time may not take effect. The LD_LIBRARY_PATH environment variable is preset to include the /code/lib directory, so you can instead create symbolic links to all dependent .so files in the /code/lib directory.
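The linking step can be scripted as well. The sketch below assumes the project's lib directory ends up as /code/lib after deployment and creates relative symlinks so that they stay valid when the project directory is moved; the function name link_shared_objects is hypothetical, and whether symlinks survive packaging depends on the deployment tool you use.

```python
import glob
import os


def link_shared_objects(source_dirs, target_dir):
    """Symlink every .so file (including versioned names such as libfoo.so.1)
    found in `source_dirs` into `target_dir`. Returns the links created."""
    os.makedirs(target_dir, exist_ok=True)
    created = []
    for directory in source_dirs:
        for path in sorted(glob.glob(os.path.join(directory, "*.so*"))):
            link = os.path.join(target_dir, os.path.basename(path))
            if os.path.lexists(link):
                continue  # keep the first library found under a given name
            # A relative target keeps the link valid if the project root moves.
            os.symlink(os.path.relpath(path, target_dir), link)
            created.append(link)
    return created
```

For example, link_shared_objects(["usr/lib/x86_64-linux-gnu"], "lib") run from the project root links the unpacked libraries into ./lib.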

Conclusion

This article describes how to run the pip and apt-get commands to install libraries in a local directory, and how to set environment variables at run time so that the program can find those local library files.

The four Python methods described above cover the common scenarios. They differ only slightly, as noted, so you can choose whichever best fits your needs.

The apt-get approach is another option. Compared with other methods, it does not reinstall DEB packages that are already present on the system, which reduces the package size. To reduce the size further, you can remove unnecessary installed files such as user manuals.

This article is part of our ongoing work toward building better tooling. Based on it, we hope to provide better tools and simplify development in the future.

