Getting started with AI with the NVIDIA Jetson Nano

Artificial intelligence (AI) is the buzzword of our time. As the central driver of digitization, it is fundamentally changing society, the economy and almost all other areas of life. Many companies are already using AI: in development, in production, in administration. But artificial intelligence also helps us in everyday life, in some areas quite visibly, in others rather hidden. In this article, you will learn more about AI and how you can use NVIDIA's Jetson Nano to implement an AI project of your own.

Artificial intelligence

What is artificial intelligence?

Artificial intelligence is the ability of a computer to solve tasks that would normally require a human. The system should be able to act intelligently, similar to a human, and learn independently. However, this definition is imprecise, as the term "intelligence" is itself difficult to define. There are a number of different descriptions of human intelligence. Gardner developed a theory of multiple intelligences that lists eight dimensions of intelligence:

Figure 1: Multiple intelligences: eight dimensions of intelligence

As you will see as this article progresses, AI applies to many, but not all, of these dimensions today.

What makes a computer intelligent?

A classic algorithm is usually hard-coded and makes decisions based on sensor data, user input and trigger events. Every action of the computer has to be programmed by a human. Nowadays, however, the requirements placed on a computer are so complex that this method quickly reaches its limits. For example, it is impossible to accurately predict the behavior of a user in advance, or to hard-code every object on earth for real-time object recognition.

Computer scientists recognized this problem very early on and therefore tried to develop adaptive computer programs: a computer can be trained for the desired functionality using sample data. This approach is now used, for example, to diagnose diseases based on symptoms or X-rays.

There are various test procedures for assessing whether a computer system is intelligent – one of them is the "Turing test". A human judge communicates electronically with two partners: another human and a computer. The judge asks both of them a comprehensive catalog of questions. If the judge cannot reliably identify the computer based on the answers given, the computer is considered intelligent.

Measured against the dimensions of intelligence presented above, however, this test would have to be expanded so that dimensions other than pure knowledge (images, movement, language) can also be captured.

Machine learning

How does the computer learn?

Figure 2: Source: https://datasolut.com/machine-learning-vs-deep-learning/

By human definition, humans are considered intelligent beings. We solve a multitude of real-world problems without being able to state exactly how we do it. One example is telling an apple from a pear. The same applies to complex movement sequences such as cycling: hardly anyone can describe in detail which movements they use to keep their balance on a bicycle.

These are examples of implicit knowledge that we cannot put into rules or instructions. If there are no explicit rules with which a computer can be programmed for a task, learning from experience or data is an alternative. In "machine learning", a system analyzes the available data and modifies itself step by step so that it can perform its task better. Three types of learning tasks are distinguished:

Supervised learning

In supervised learning, the system is told what it should learn, such as how to differentiate between bicycles and motorcycles. To this end, it is presented with many pictures of bicycles and motorcycles that have already been manually labeled as one or the other. After processing a large number of such examples, the system looks for patterns that can be used to distinguish the objects. In this way, it learns to apply these patterns to new images in order to tell bicycle images from motorcycle images.

Within supervised learning, a distinction is made between two important use cases: In classification, the system has to choose the answer from a usually small number of alternatives or classes; one example is categorizing a product as "good" or "defective". In the other use case, prediction (regression), the system has to predict one or more continuous variables, e.g. the maximum temperature and wind strength for the next day in Munich.
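The two supervised-learning tasks can be sketched in a few lines of pure Python. This is a minimal illustration, not a real ML pipeline: the classifier is a simple nearest-neighbor lookup, the regression a least-squares line fit, and all numbers are made-up toy data.

```python
# Minimal sketch of the two supervised-learning tasks:
# classification (discrete classes) vs. regression (continuous values).

def classify_1nn(train, label_of, x):
    """Classification: return the class of the nearest training example."""
    nearest = min(train, key=lambda t: abs(t - x))
    return label_of[nearest]

def fit_line(points):
    """Regression: least-squares fit y = a*x + b to (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Classification: products weighing around 100 g are labeled "good",
# around 150 g "defective" (hypothetical toy data).
train = [98, 101, 103, 148, 151, 155]
labels = {98: "good", 101: "good", 103: "good",
          148: "defective", 151: "defective", 155: "defective"}
print(classify_1nn(train, labels, 99))      # -> good

# Regression: predict a continuous value, e.g. temperature on day 4.
a, b = fit_line([(1, 10.0), (2, 12.0), (3, 14.0)])
print(round(a * 4 + b, 1))                  # -> 16.0
```

Real systems replace these hand-written routines with trained models, but the distinction stays the same: classification picks a class, regression predicts a number.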

Unsupervised learning

In unsupervised learning, the system has to manage without manual specifications. It reads the available data and independently tries to find patterns and regularities in it. It can, for example, group data based on similarities. This approach is particularly promising when the data consists of several components – such as the individual words in a written sentence. In the best case, the components seen so far can be used to predict the next one.

This type of learning is also called self-supervised learning: a form of supervised learning that does not require humans to provide annotations. Self-supervised learning is widespread in the world of artificial intelligence. For example, if a system is given the task of forecasting the next images in a video, it must develop a representation of the scenes and predict the possible movements and actions of the objects. Given enough training videos, a basic "understanding" of the processes in videos emerges, which the system can use as a basis for forecasting new video scenes.
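The grouping-by-similarity idea can be sketched as a tiny k-means clustering in pure Python. This is a minimal illustration with made-up one-dimensional data and a deliberately naive initialization; no labels are given, and the algorithm finds the two groups on its own.

```python
# Minimal sketch of unsupervised grouping: 1-D k-means with k = 2.

def kmeans_1d(data, k=2, iters=20):
    # Naive initialization: centers at the smallest and largest value.
    centers = [min(data), max(data)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled measurements that happen to form two groups.
data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
centers, clusters = kmeans_1d(data)
print(sorted(round(c, 2) for c in centers))  # -> [1.03, 8.07]
```

The system was never told that there are two groups of "small" and "large" values; it discovered that structure from the data alone, which is exactly the unsupervised setting described above.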

Reinforcement learning

In reinforcement learning, the system must first perform a whole series of actions before it learns the final result. Examples would be board games or controlling robots. After each action, the environment reacts (e.g. the opponent in chess) and the system receives new information about the state (e.g. the board position), possibly also rewards (e.g. points, or victory and defeat quantified by a numerical value).

The aim of the system is to develop an action strategy with which it can react to any situation in such a way that the highest possible total reward is obtained. The technique of learning with the help of rewards and punishments is borrowed from psychology and is used, among other things, in the training of dogs.
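The loop of action, environment reaction and reward can be sketched with tabular Q-learning, one of the simplest reinforcement-learning algorithms. The "world" below is a made-up toy corridor, not one of the article's examples: states 0 to 3, actions left/right, and a reward of 1 only for reaching state 3.

```python
import random

# Minimal sketch of reinforcement learning: tabular Q-learning in a
# tiny corridor world. States 0..3, actions -1 (left) / +1 (right);
# reaching state 3 yields reward 1, every other step yields 0.

N, GOAL = 4, 3
ACTIONS = [-1, +1]
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

random.seed(0)
for _ in range(500):                        # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit the best known action,
        # sometimes explore a random one.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)      # the environment reacts
        r = 1.0 if s2 == GOAL else 0.0      # reward signal
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned action strategy: "right" should now score higher everywhere.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)   # -> {0: 1, 1: 1, 2: 1}
```

The system was never told that "go right" is the solution; the strategy emerged purely from the rewards, just as described above.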

Deep learning

Technically speaking, deep learning is a subset of machine learning (see Figure 4). It encompasses all methods that search for patterns and relationships in data using a deep neural network. Artificial neural networks are algorithms loosely modeled on the biological example of the human brain.

The structure of an artificial neural network consists of an input layer, one or more hidden layers and an output layer. Despite their possible complexity, such networks essentially always have the structure of a directed graph. If an artificial neural network has particularly deep structures, i.e. many hidden layers, one speaks of deep learning.

Figure 3: Source: https://datasolut.com/was-ist-deep-learning/

Such a system can, for example, predict tomorrow's amount of precipitation based on today's readings of air pressure, temperature and wind direction. Although machine learning and deep learning are often used synonymously, there are subtle but important differences: while machine learning works well with structured input data, deep learning is particularly suitable for large amounts of unstructured data.
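The forward pass through such a layered network can be sketched in a few lines of pure Python. The weights and the three input readings below are made-up illustration numbers, not a trained model; a real network would learn these weights from data, and a "deep" network would stack many such hidden layers.

```python
import math

# Minimal sketch of the layer structure: a tiny feed-forward network
# with one hidden layer and fixed (untrained) example weights.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of all inputs, then an activation function.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Input layer: 3 values, e.g. air pressure, temperature, wind direction
# (hypothetical numbers, scaled to [0, 1]).
x = [0.5, 0.8, 0.2]

# Hidden layer: 4 neurons; output layer: 1 neuron (e.g. precipitation).
W_hidden = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2],
            [-0.1, 0.4, 0.6], [0.5, -0.3, 0.8]]
b_hidden = [0.1, -0.2, 0.0, 0.3]
W_out, b_out = [[0.6, -0.4, 0.2, 0.5]], [-0.1]

h = layer(x, W_hidden, b_hidden)   # hidden layer activations
y = layer(h, W_out, b_out)         # output layer activation
print(len(h), len(y))              # -> 4 1
```

Training then consists of adjusting the weights (typically via backpropagation) so that the output matches the observed precipitation values; only the data flow shown here stays the same.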

Structured input data can be ordered records from a database (for example, a cat described by color, shape and face). Machine learning automatically extracts the relevant information from them and independently develops algorithms to classify these objects. With unstructured input data such as texts, images or music, a deep learning algorithm recognizes previously unknown structures and uses them to build a complex model that, in image recognition for example, enables objects to be classified. To achieve high model quality, however, this method requires very large amounts of data.

Figure 4: Source: https://datasolut.com/was-ist-deep-learning/

NVIDIA Jetson Nano Kits

The Jetson Nano development board from NVIDIA provides an introduction to the topic of AI and makes it possible to develop cost-effective and energy-efficient AI systems. The single-board computer, with four ARM cores and a Maxwell GPU serving as CUDA compute accelerator and video engine, opens up new possibilities for graphics- and computation-intensive projects.

NVIDIA offers two versions of the Jetson Nano; the only differences are the size of the working memory (2 or 4 GB RAM), the display connectors and the power connection. With a footprint of around 70 x 45 mm, the Jetson Nano module is the smallest Jetson device. This production-ready System-on-Module (SOM) offers great advantages for various industries when deploying AI: it delivers 472 GFLOPS, so modern AI algorithms can be executed quickly, several neural networks can run in parallel, and numerous high-resolution sensors can be processed at the same time. With these properties, it is ideal for applications such as entry-level network video recorders, household robots, and intelligent gateways with full analytical capabilities.

CPU: Quad-core ARM Cortex-A57 MPCore processor
GPU: NVIDIA Maxwell architecture with 128 NVIDIA CUDA® cores
Main memory: 2/4 GB 64-bit LPDDR4, 1600 MHz, 25.6 GB/s
Storage: microSD (card not included)
Video encoding: 4Kp30 | 4x 1080p30 | 9x 720p30 (H.264/H.265)
Video decoding: 4Kp60 | 2x 4Kp30 | 8x 1080p30 | 18x 720p30 (H.264/H.265)
Network: Gigabit Ethernet
Camera: 1x MIPI CSI-2 connector
Display: HDMI 2.0
USB: 1x USB 3.0 Type A, 2x USB 2.0 Type A, USB 2.0 Micro-B
Connections: 1x SDIO | 2x SPI | 4x I2C | 2x I2S | GPIOs
Size: 69.6 mm x 45 mm
Mechanical: 260-pin edge connector

What is a CUDA core?

The term CUDA stands for "Compute Unified Device Architecture", an architecture developed by NVIDIA for parallel computation. Offloading work to the GPU relieves the CPU and increases the overall computing power of a system. Like CPU cores, CUDA cores are processing units built in semiconductor technology, and both can process data; however, the CPU is designed for serial data processing, while the GPU is designed for parallel data processing, and individual CUDA cores are far less complex. Accordingly, 128 CUDA cores are built into the GPU of the Jetson Nano, whereas its CPU contains only four cores. CUDA is used in many fields: in image and video processing, but also in medicine, for example for CT image reconstruction. With its well-established development environment, CUDA is also often used in the areas of AI, deep learning and machine learning. Other areas of application are computational biology and chemistry, ray tracing and seismic analysis.
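CUDA code itself only runs on NVIDIA GPUs, so as a stand-in this sketch contrasts the two processing styles in plain Python with NumPy: an explicit loop (serial, one element after another, CPU-style) versus a vectorized operation applied to the whole array at once, which is the same data-parallel pattern a GPU executes across its CUDA cores.

```python
import numpy as np

# Serial vs. data-parallel processing of the same computation y = 2x + 1.

data = np.arange(100_000, dtype=np.float32)

# Serial style: one element per step, like a single CPU core.
serial = np.empty_like(data)
for i in range(len(data)):
    serial[i] = data[i] * 2.0 + 1.0

# Data-parallel style: one operation over all elements at once -
# conceptually what a GPU does with thousands of lightweight cores.
parallel = data * 2.0 + 1.0

print(np.allclose(serial, parallel))  # -> True
```

Both produce identical results; the difference is purely how the work is organized, and it is this organization that makes GPUs so much faster for large, uniform workloads.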

What is possible with the Jetson Nano?

Due to its compact design, the Jetson Nano can be integrated perfectly into robotics projects. With 128 CUDA cores, the single-board computer can carry out many operations in parallel and thus supports several sensors with real-time processing. In principle, this means it would even be possible to develop an autonomously driving car with a Jetson Nano. Thanks to CUDA support, a neural network can also be trained directly on the Jetson Nano; with a Raspberry Pi, by contrast, such a project could only be implemented with an additional GPU.

Classic AI tasks, such as object or person recognition in real time, are also possible with the Jetson Nano. With the help of the open-source libraries OpenCV and YOLO, you can download pre-trained models and then carry out object or person recognition. It is also possible to train your own model: with enough training data, for example, people or car license plates can be reliably identified on surveillance cameras, e.g. to automate the opening of a door.

Real-time object recognition with the Jetson Nano

With the help of a simple project, we would like to demonstrate the strengths of the Jetson Nano mentioned above. Object recognition, a typical AI workload, can in principle also be handled by single-board computers without CUDA cores, such as the Raspberry Pi. To solve this task in real time, however, CUDA cores are indispensable, since large amounts of data have to be processed in parallel. With its 128 CUDA cores, the Jetson Nano is ideally suited for this and was therefore chosen for this project.

Effort

Since the AI for real-time object recognition is already preprogrammed within the open-source environments of OpenCV and YOLO, the effort for this how-to is limited. As long as no new objects are to be taught to the AI, you do not have to train the neural network any further. If the AI is to be adapted, however, some in-depth technical knowledge is required. The real effort lies in downloading and installing the AI model and preparing the necessary setup (camera connection) for the application.

Level of difficulty

  • Medium: without training the AI
  • Difficult: with training the AI (knowledge of OpenCV/YOLO and programming skills in Python and C)

Expenditure of time

  • Download and installation of the operating system: approx. 45 min
  • Download and installation of the AI model (depending on the read/write speed of the microSD card and internet bandwidth): approx. 4-6 h
  • Configuration: approx. 15 min
  • Total: approx. 5-7 h

Costs

  • Total price: hardware approx. €130, plus standard equipment (monitor, keyboard, mouse, Android phone with camera)

Hardware

Software

The software used is completely open source and therefore available free of charge.

Installation

To install the operating system, you need a microSD card with a capacity of at least 64 GB. For this project we recommend a fast microSD card, as the installation of the training data can take more than six hours depending on the write speed. If you also use the 4 GB version of the Jetson Nano, you can achieve a higher FPS (frames per second) rate when detecting objects.

First, download the appropriate version of the operating system from the NVIDIA developer website. The program BalenaEtcher can be used to flash the image: all you have to do is select the downloaded image and the microSD card (Figure 5), and the flash process can be started. Don't be surprised, though: the process can take up to 30 minutes to complete.

Figure 5: Flash process with BalenaEtcher

After the flash process, the microSD card can be inserted into the corresponding microSD card slot of the Jetson Nano. Before you supply the Jetson Nano with power, connect a keyboard, mouse and screen via the HDMI / display port. You can now start configuring the operating system.

To do this, follow the instructions on the screen. Make sure you choose "nvidia" as your username: the code of this project only works with this name. It is possible to work around this, but you would then have to change the code in the appropriate places.

Figure 6: Setup: username "nvidia"

Once the configuration process has been completed, the Jetson Nano will automatically reboot. Then log in with your password and the desktop environment will open.

Start the terminal and download the corresponding Github repository. To do this, enter the following command in the command line:

git clone https://github.com/wiegehtki/nvjetson_opencv_gsi.git

Use the command cp nvjetson_opencv_gsi/Installv2.3.8.sh . to copy the file "Installv2.3.8.sh" from the "nvjetson_opencv_gsi" directory into the current one (note the trailing dot for the current directory). Then copy all "sudoers" shell scripts into the current directory with cp nvjetson_opencv_gsi/nv*sh . as well. Finally, make the shell scripts executable with chmod +x *sh.

To prevent the system from constantly asking you to enter your password during the installation, you can grant yourself “superuser rights” with the following command:

sudo su
./nvidia2sudoers.sh
exit

Now open the terminal again and start the installation with ./Installv2.3.8.sh. To follow the progress of the installation, you can open a second terminal window.

Please note: the installation can take several hours (approx. four to six), since all pre-trained models have to be downloaded. When the process is complete, the Jetson Nano will restart automatically. If you then see the login screen, the installation was successful.

Configuration of the IP webcam app

In this project, a cell phone camera is used for object recognition. So that the Jetson Nano can access the phone's camera over Wi-Fi, the app "IP Webcam" must be installed on the phone. To keep the data volume manageable and not push the smartphone's memory to its limit, a few settings may have to be made in the app.

Figure 7: Deactivate audio mode and start the server
Figure 8: Setting the resolution, quality and FPS
Figure 9: Deactivate automatic camera detection
Figure 10: IP address and port of the camera

First, deactivate the audio mode (Figure 7: Audio Modus = audio mode), since only images are to be processed in this project. Next, adjust the resolution, quality and FPS under "Video Settings" (Figure 8: Videoauflösung = video resolution; Qualität = quality; FPS Limitierung = FPS limitation). If you are working with the 4 GB version of the Jetson Nano, you can leave the resolution at Full HD; a lower resolution is recommended for the less powerful 2 GB version. In addition, the automatic camera detection ONVIF can be deactivated (Figure 9: Unterstützung von ONVIF aktivieren = activate ONVIF support), as this function only adds overhead to the data packets.

When all the settings have been made, you can start the camera under "Start Server" (Figure 7: Server starten = start server; Videostreaming starten = start video streaming). Note down the IP address via which the Jetson Nano can connect, and the port (to the right of the IP address), for the next step (Figure 10).
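The IP address and port you noted are combined into the stream address that the Jetson Nano connects to. As a small aside, this can be sketched in Python; the /video path is the MJPEG endpoint that the IP Webcam app typically exposes, and the IP address and port below are made-up examples, so use the values shown on your phone.

```python
# Hypothetical helper: build the IP Webcam stream URL from the values
# shown in the app (example IP and port, not real ones).

def stream_url(ip, port):
    return f"http://{ip}:{port}/video"

url = stream_url("192.168.0.101", 8080)
print(url)  # -> http://192.168.0.101:8080/video

# With OpenCV installed, the stream could then be opened like this
# (not executed here):
#   import cv2
#   cap = cv2.VideoCapture(url)
```

In this project, however, the URL is not built by hand but entered into the shell script, as described in the next section.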

Start real-time object recognition

Before object recognition can be started, you have to save the IP address and the port of the cell phone camera in the shell script. Start the terminal and do the following:

cd darknet
nano smartcam.sh

A window now opens (Figure 11) in which you enter the appropriate IP address and port at the marked position.

Save the change and close the shell script. Then, in the same directory, enter the command ./smartcam.sh to start real-time object recognition.

Figure 11: Store the IP address and port of the mobile phone camera

The Jetson Nano will automatically connect to the camera, and you will see the picture in a separate window (see Figure 12). The AI recognizes various objects in real time and classifies them with a certain confidence level (e.g. 94%). Some objects are still unknown to the AI; with a little programming work, you can train the system in the development environment and extend its knowledge as you wish.

Figure 12: Real-time object recognition with the confidence level indicated
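If you build on this project, you will usually not act on every detection but only on sufficiently confident ones, e.g. only opening a door when a person is recognized with high confidence. A minimal sketch of such a filter; the detection list is a made-up example of the (label, confidence) pairs a YOLO-style detector reports, not real output.

```python
# Hypothetical post-processing: keep only detections at or above a
# confidence threshold before acting on them.

def confident(detections, threshold=0.9):
    return [(label, conf) for label, conf in detections if conf >= threshold]

# Made-up example detections from one frame.
detections = [("person", 0.94), ("bicycle", 0.62), ("car", 0.97)]
print(confident(detections))  # -> [('person', 0.94), ('car', 0.97)]
```

Choosing the threshold is a trade-off: a high value avoids false alarms but may miss real objects, a low value does the opposite.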

Summary

In this project, we have shown you how to put your own AI project into practice with little effort. All you need is a Jetson Nano from NVIDIA, a computer for the initial setup, and a smartphone camera. Using real-time recognition, the Jetson Nano can classify objects and, with a little help, even learn new object classes. It doesn't take much to immerse yourself in the world of AI. Try it yourself.

Pictures: reichelt elektronik GmbH & Co. KG
