
[OpenVINO™ Getting Started] Install Tensorflow Training Model with TensorMan (English)

LattePanda · 2020-11-30 15:43:09 · 9671 Views · 0 Replies

Hello, fellow panda lovers!

Here is yet another wonderful post from the phenomenal AI Developer Contest that DFRobot held back in August. This post has just been translated from the original Chinese to English for your convenience, but the original post by community user pATAq can be found here. When reposting this article, please give credit where credit is due, and please enjoy!
 

{Tensorman}
 

Foreword

This is an original article; when reposting or referencing it, please be sure to include a link to the original. If anything is missing or incorrect, please let me know.

This post aims to rescue those stuck in the quagmire of configuring a TensorFlow GPU development environment.

Recently, I participated in the "Industry AI Developer Competition" jointly organized by DFRobot and Intel. Because my foundation in this particular field is weak, I hit a lot of pitfalls along the way and, of course, gained a lot of experience. For transfer learning I used TensorFlow 1.14 GPU + the Object Detection API. Later, I happened to learn about TensorMan (hereinafter referred to as tm), a tool that can simplify development environment configuration. An advantage of using Linux is that this kind of development is more convenient overall; for example, setting system environment variables is easier than on Windows. The hardware and software environment of this article is as follows:

Pop!_OS / Ubuntu MATE / Lubuntu 20.04
TensorFlow 1.14 (although tf 1.15 exists, 1.14 is the most stable version before tf 2.x)
Python 3.6
PC [AMD R5 2600 + Nvidia GTX 1660] (for TF GPU)
LattePanda Delta / v1 (for TF CPU)

1. The Conventional Method of Installing TensorFlow and Object Detection API

Because transfer learning is needed, we used an Nvidia GPU and harnessed CUDA to speed up training. You can refer to the official installation guide; below we summarize three different possible setups.

AMD also has a project called ROCm (full name: Radeon Open Compute platform), whose goal is to build an ecosystem that can replace CUDA, but its adoption is not yet widespread. For more details, please refer to the relevant introduction.

To use TensorFlow's GPU, the following NVIDIA® software must be installed in the system:
 

NVIDIA® GPU driver: CUDA 10.1 requires version 418.x or higher.
CUDA® Toolkit: TensorFlow supports CUDA 10.1 (for TensorFlow 2.1.0 and higher).
CUPTI, which ships with the CUDA Toolkit.
cuDNN SDK (version 7.6 or higher).
(Optional) TensorRT 6.0, which can reduce inference latency for certain models and improve throughput.
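Once these are installed, each component can be sanity-checked from a terminal. A minimal sketch (header and file locations vary with how CUDA and cuDNN were installed, so adjust the paths to your setup):

Code: Select all

nvidia-smi | head -n 3                                  # driver version and highest supported CUDA
nvcc --version                                          # CUDA Toolkit version
grep -A 2 'define CUDNN_MAJOR' /usr/include/cudnn.h     # cuDNN version, if headers are under /usr/include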

1.1 Local Installation

First, you need to download and install the above packages yourself. Compared with commercially maintained software, open source software does not necessarily follow the logic of "the newer, the better". For example, Nvidia recently released CUDA 11, which the latest graphics drivers already support, but we still need to use CUDA 10.x for the CUDA Toolkit, along with the corresponding version of the cuDNN SDK.

You can check the CUDA version supported by the installed driver with the nvidia-smi tool, which also shows the graphics card's current computational load, temperature, memory usage, and other information.
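For example, besides the default status table, specific fields can be queried (field names as supported by current nvidia-smi releases):

Code: Select all

nvidia-smi --query-gpu=name,driver_version,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv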

1.2 Anaconda Package

Anaconda is a Python distribution for scientific computing that supports Linux, Mac, and Windows and bundles many popular scientific computing and data analysis packages. To install tensorflow through the conda tool, you only need to install the GPU driver on the system; everything else can be installed automatically in one step, without worrying much about dependencies. However, because of the deep coupling with the system, there are still many problems in actual use, such as the "cuDNN not initialized" error. This approach is recommended mainly for Windows computers.
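As an illustrative sketch only (package names and versions as they existed on the defaults channel for the TF 1.x generation; I did not test this route here):

Code: Select all

conda create -n tf114 python=3.6
conda activate tf114
conda install tensorflow-gpu=1.14   # conda pulls matching cudatoolkit/cudnn builds automatically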

1.3 Docker Container

Faced with the headaches of all these dependency problems, one naturally thinks of a solution that packs the various software packages together and works out of the box: Docker. The official TensorFlow Docker images come preconfigured to run TensorFlow. Docker containers run in a virtual environment and are the easiest way to set up GPU support.
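For reference, running the official image with plain Docker looks roughly like this (tag from Docker Hub; --gpus requires Docker 19.03+ with the NVIDIA container toolkit installed):

Code: Select all

docker run --gpus all -it --rm tensorflow/tensorflow:1.14.0-gpu-py3 \
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"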

2. Pop!_OS and Tensorman

Pop!_OS is a derivative of Ubuntu maintained by System76, an American company that specializes in manufacturing computers that run Linux. The latest version is 20.04.

You can see that Pop!_OS makes it easy to install and run various productivity tools, one of which is TensorMan. It lets you easily manage the tf toolchain, simplifying the docker commands for installing and using the official images. For details, please refer to the project homepage and introduction document.

2.1 When Starting Out

First, let's set up some software mirror sources to speed up the download.

0x01 Speed Up the Apt Software Warehouse

It is recommended to use Lemonitor's speed-test tool to find the fastest mirror for your current network. The example below uses Alibaba Cloud; remember to adjust it to your own situation:
 

Code: Select all

sudo sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list
sudo sed -i 's/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list

Reference: mirrors.ustc.edu.cn

0x02 Speed Up Pip to Download the Python Library

After all modifications, the image created with tensorman will inherit these settings.
 

Code: Select all

# Create directory and configuration file
mkdir ~/.pip && touch ~/.pip/pip.conf
# Modify the configuration file
cat << _EOF_ >> ~/.pip/pip.conf
[global]
index-url = https://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host=mirrors.aliyun.com
_EOF_

Reference: developer.aliyun.com
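To confirm the mirror is in effect, newer pip versions (10 and above) can print the active configuration:

Code: Select all

pip config list              # should show global.index-url and install.trusted-host
pip install --upgrade pip    # downloads should now come from the mirror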

0x03 Speed Up the Docker Hub's Mirror Warehouse

Modify or create the /etc/docker/daemon.json file and add the following content, paying attention to the JSON format and commas:
 

Code: Select all

{ "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } }, "registry-mirrors": [ "https://2h3po24q.mirror.aliyuncs.com" ] } [Choose one of two] If the CPU version is installed, then the "registry-mirrors" is:
 

Code: Select all

{ "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"] } When I use Aliyun here, it is much faster. Also, ustc can be used, but it is sometimes unstable.
 

Code: Select all

# Restart the dockerd
sudo systemctl restart docker

References:

mirrors.ustc.edu.cn
dockerd configuration reference
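After restarting, you can verify that the mirror is active; docker info lists configured registry mirrors near the end of its output:

Code: Select all

sudo docker info | grep -A 1 'Registry Mirrors'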

0x04 Speed Up the Ubuntu PPA

PPA stands for Personal Package Archive. By adding a PPA source, Ubuntu can install newer versions of software. However, because PPAs are personally maintained, software quality is uneven; for a detailed introduction, see the Ubuntu PPA User Guide. Because PPA software sources are fragmented, direct mirrors are rare, but a reverse proxy can achieve much the same acceleration. Of course, you can also set up your own proxy to speed things up, for example with tsocks.
 

Code: Select all

sudo su
# Find all the software repository addresses in the sources.list.d directory and rewrite them to the ustc reverse-proxy address.
find /etc/apt/sources.list.d/ -type f -name "*.list" -exec sed -i.bak -r 's#deb(-src)?\s*http(s)?://ppa.launchpad.net#deb\1 https://launchpad.proxy.ustclug.org#ig' {} \;

References: "Can you mirror the software in the ppa warehouse?" and "Help with reverse proxy use"

Note that plain http is subject to pollution here, so https must be used. It is recommended that all sources use https, even though some speed is lost.

0x05 Speed Up Anaconda (Optional)
 

Code: Select all

nano ~/.condarc
# Add the following content
channels:
  - defaults
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/pro
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  intel: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

# Clear the index cache to ensure that the index provided by the mirror site is used.
conda clean -i
# Test: install numpy, then openvino from the intel channel (the channel must be specified with -c)
# conda install numpy
# conda install openvino -c intel

Reference: mirrors.tuna.tsinghua.edu.cn
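You can confirm the channel configuration before installing anything (standard conda subcommands):

Code: Select all

conda config --show channels
conda config --show default_channels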

2.2 Using TensorMan on Pop!_OS

Now everything is ready, so let's show off what we've prepared:

# Tensorman can be installed directly only on Pop!_OS; to use it on other releases, see sections 2.3 and 2.4 below
sudo apt install tensorman
# $USER can be root or the current user name
sudo usermod -aG docker $USER
# The following needs root privileges: use the official image to create a container with version 1.14, expose the container's port 8888, use Python 3,
# and install Jupyter Notebook; the container is named tf-gpu. I run this on the PC host
sudo tensorman +1.14.0 run -p 8888:8888 --root --python3 --jupyter --gpu --name tf-gpu bash
# [Choose one of two] If there is no Nvidia GPU, create the TensorFlow CPU version instead by removing the --gpu parameter
# I run this on the LattePanda v1 and LattePanda Delta
sudo tensorman +1.14.0 run -p 8888:8888 --root --python3 --jupyter --name lp-cpu bash

The image is pulled when the container is first created. If you skip the speed-up steps above and pull directly from Docker Hub, it will be very time-consuming and will often fail. After installation, it automatically drops into the following interface.

Looking at sources.list inside the container, you can see that the image is based on Ubuntu 18.04 (Bionic). Execute the following commands inside the container (not on the host) to update it:
 

Code: Select all

sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list
sed -i 's/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list
apt-get update && apt upgrade -y
pip install --upgrade pip setuptools

Using TensorFlow CPU on a CPU That Does Not Support the avx/avx2 Instruction Set

If you are using the GPU version, everything works normally: importing tensorflow in Python does not report an error, so you can skip this part. On the LattePanda v1 and Delta, as shown in the figure above, an "Illegal instruction (core dumped)" error is reported. The reason is that the official builds of TensorFlow v1.6 and later require a CPU that supports the avx instruction set, and our Intel Celeron N4100 / Atom x5-Z8350 do not support it, hence the error.

How do I know if my CPU supports the avx instruction set? There are three ways to find out.

1. Visit the advanced search page of Intel's product specifications, select Instruction Set Extensions -> Intel AVX as the first filter, then filter by processor family. You can check whether the Atom/Celeron/Core series processors support the avx instruction set. You will see that current Atom and Celeron consumer processors do not support it.

2. You can search for keywords like "cpuz + N4100" to view screenshots of the processor's specifications.

{CPU-Z Snapshot}
We can see that the N4100 does not support the AVX instruction set, while AMD's low-end Athlon 200GE, which costs less than 200 yuan in bulk, does support it, which is quite conscientious. Intel, nicknamed the toothpaste factory for its tiny incremental upgrades, really lives up to the name. Via a/b

3. If you are using a Linux system, you can also use cat /proc/cpuinfo | grep avx to see if it is supported
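For example, the following prints a clear yes/no based on the same /proc/cpuinfo check:

Code: Select all

# grep -q sets the exit status only; no avx flag means stock TF >= 1.6 wheels will crash
grep -q avx /proc/cpuinfo && echo "AVX supported" || echo "AVX not supported"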

Solutions

1. Compile TensorFlow yourself, or use a third-party TensorFlow package that does not require the avx instruction set. For the former, search for projects such as docker-tensorflow-builder. Below we introduce the second method.
2. There is a project called tensorflow-community-wheels, where you can find compiled tensorflow packages shared by others in the Issues tab. We need to find a no-avx build matching our Python version, such as this one: Tensorflow 1.14.1, Python 3.6, libc-2.27, linux_x86_64, whl (B970 noAVX, noCUDA, CPU-only). Download the whl file.
 

Code: Select all

# Uninstall the old tf
pip uninstall tensorflow
# /Projects is the host home directory mount; install the downloaded third-party precompiled whl package from there
pip install tensorflow-1.14.1-cp36-cp36m-linux_x86_64.whl
# Upgrade pip and setuptools so that grpcio installs as a precompiled package instead of getting stuck compiling for a long time
pip install --upgrade pip setuptools

1. Some netizens have said that installing tf with conda install -c conda-forge tensorflow also solves the problem, but I did not test this.
2. When OpenVINO installs the Model Optimizer dependencies, the TF CPU version installed by default also requires the avx instruction set; that problem can be solved by the same methods above.

Test the Installation

{Test tf}
You can see that tf imports and runs normally.
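A minimal smoke test from the shell, along the same lines (TF 1.x session API, so deprecation warnings are expected but no crash):

Code: Select all

python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.Session().run(tf.constant('Hello, TF')))"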
 

Code: Select all

jupyter notebook --ip=0.0.0.0 --no-browser --allow-root

 

{Test JupyterNotebook 1}

{Test JupyterNotebook 2}

Opening the browser, you can see that Jupyter runs correctly.

We can install tmux to make Jupyter run in the background.
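For example (tmux defaults; detaching leaves the notebook server running):

Code: Select all

apt-get install -y tmux      # inside the container, if not installed yet
tmux new -s jupyter          # open a named session, then start Jupyter inside it
jupyter notebook --ip=0.0.0.0 --no-browser --allow-root
# Detach with Ctrl-b then d; reattach later with: tmux attach -t jupyter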

Create a Custom Image

With the method above, changes are not persisted: the container is destroyed after each run and recreated from the official image next time, so anything you installed inside it is lost. Here we create a custom image to save the changes.
 

Code: Select all

# Because we want to use the tensorflow object detection api, install the following packages
apt-get install -y libsm6 libxext6 libxrender-dev tmux
pip install opencv-python cython lxml pillow
pip install --upgrade protobuf
# Execute the following command in the terminal of the host to save the custom image, named lp-cpu
tensorman save lp-cpu lp-cpu
# Execute the following commands in the terminal of the host to view images and containers
sudo docker images
tensorman show
sudo tensorman list
docker ps
# Execute the following command to start the custom image next time
sudo tensorman '=lp-cpu' run -p 8888:8888 --root --name lp-cpu2 bash
# Execute the following command in the container to start jupyter
jupyter notebook --allow-root --ip=0.0.0.0 --no-browser

Note:
 

The name of the saved image can be customized; I use lp-cpu here.
It is recommended to specify a name each time you start a container, for easier management.
For more usage, please refer to the introduction document.

Test the Object Detection API

You can refer to this unofficial tutorial for installation. Since the preliminary work has already been done, you can start from the Downloading the TensorFlow Models section. Key points:
 

The COCO API needs to be installed.
Add the following to set the GPU usage, otherwise an error will be reported:

Code: Select all

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

OR
 

Code: Select all

config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.3
tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))

With this in place, you should be able to execute the classic demo correctly and get the resulting output graph.

Recently, the new version of the Object Detection API added support for TensorFlow 2.x. You can refer to the updated documentation to learn more.

2.3 Using Tensorman on Other Ubuntu 19.10+ and Derivative Releases

TensorMan is exclusive to Pop!_OS, but we can try to use it on regular Ubuntu releases with some modification, bearing in mind that the software sources differ. I found this PPA source:
https://launchpad.net/~system76/+archive/ubuntu/pop
There is also the apt.pop-os.org software repository.
 

Code: Select all

sudo add-apt-repository ppa:system76/pop
sudo su
# Find all the software repository addresses in the sources.list.d directory and rewrite them to the ustc reverse-proxy address.
find /etc/apt/sources.list.d/ -type f -name "*.list" -exec sed -i.bak -r 's#deb(-src)?\s*http(s)?://ppa.launchpad.net#deb\1 https://launchpad.proxy.ustclug.org#ig' {} \;
# Add the apt.pop-os.org software repository; adjust "focal" to your release code name (focal = 20.04, eoan = 19.10)
cat << _EOF_ >> /etc/apt/sources.list.d/pop-os.list
deb http://apt.pop-os.org/proprietary focal main
_EOF_
apt-get update

Then you can use TensorMan normally. The PPA source is built for Pop!_OS, so using it on other Ubuntu distributions may cause unknown problems, but it has worked well for me.

2.4 Using Tensorman on Ubuntu 18.04

The main problem here is that the Docker version that ships with 18.04 is too old and does not support the GPU-related functions, but Tensorman can still be used after adding a PPA source to update docker. For details, refer to the following link: When is it coming to 18.04 LTS?
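One commonly used route (a sketch based on Docker's own install documentation, not taken from the linked thread) is to replace the distro package with docker-ce:

Code: Select all

# Remove the distro-packaged docker, then install docker-ce via Docker's convenience script
sudo apt-get remove docker docker-engine docker.io containerd runc
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# GPU passthrough additionally needs NVIDIA's container runtime from NVIDIA's own repository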