Frank Hofmann – Linux Hint

Introduction to Apache Solr. Part 2: Querying Solr

Frank Hofmann — Tue, 02 Mar 2021 06:04:00 +0000

Apache Solr [1] is a search engine framework written in Java and based on the Lucene search library [6]. In the previous article, we set up Apache Solr on the soon-to-be-released Debian GNU/Linux 11, initiated a single data core, uploaded example data, and demonstrated how to do a basic search within the data set using a simple query.

This is a follow-up article to the previous one. We will cover how to refine the query, formulate more complex search criteria with different parameters, and understand the Apache Solr query page’s different web forms. Also, we will discuss how to post-process the search result using different output formats such as XML, CSV, and JSON.

Querying Apache Solr

Apache Solr is designed as a web application and service that runs in the background. The result is that any client application can communicate with Solr by sending queries to it (the focus of this article), manipulating the document core by adding, updating, and deleting indexed data, and optimizing core data. There are two options — via dashboard/web interface or using an API by sending a corresponding request.

It is common to use the first option for testing purposes and not for regular access. The figure below shows the Dashboard from the Apache Solr Administration User Interface with the different query forms in the web browser Firefox.

First, from the menu under the core selection field, choose the menu entry “Query”. Next, the dashboard will display several input fields as follows:

Request handler (qt):
Define which kind of request you would like to send to Solr. You can choose between the default request handlers “/select” (query indexed data), “/update” (update indexed data), and “/delete” (remove the specified indexed data), or a self-defined one.
Query event (q):
Define which field names and values to be selected.
Filter queries (fq):
Restrict the superset of documents that can be returned without affecting the document score.
Sort order (sort):
Define the sort order of the query results to either ascending or descending
Output window (start and rows):
Limit the output to the specified elements
Field list (fl):
Limits the information included in a query response to a specified list of fields.
Output format (wt):
Define the desired output format. The default value is JSON.

Clicking on the Execute Query button runs the desired request. For practical examples, have a look below.

As the second option, you can send a request using an API. This is an HTTP request that can be sent to Apache Solr by any application. Solr processes the request and returns an answer. A special case of this is connecting to Apache Solr via Java API. This has been outsourced to a separate project called SolrJ [7] — a Java API without requiring an HTTP connection.

Query syntax

The query syntax is best described in [3] and [5]. The different parameter names directly correspond with the names of the entry fields in the forms explained above. The table below lists them, plus practical examples.

Query Parameters Index

Parameter	Description	Example
q	The main query parameter of Apache Solr — the field names and values. Their similarity scores document to terms in this parameter.	Id:5 cars:adilla *:X5
fq	Restrict the result set to the superset documents that match the filter, for example, defined via Function Range Query Parser	model id,model
start	Offsets for page results (begin). The default value of this parameter is 0.	5
rows	Offsets for page results (end). The value of this parameter is 10 by default	15
sort	It specifies the list of fields separated by commas, based on which the query results are to be sorted	model asc
fl	It specifies the list of the fields to return for all the documents in the result set	model id,model
wt	This parameter represents the type of response writer we wanted to view the result. The value of this is JSON by default.	json xml

Searches are done via HTTP GET request with the query string in the q parameter. The examples below will clarify how this works. In use is curl to send the query to Solr that is installed locally.

Retrieve all the datasets from the core cars
curl http://localhost:8983/solr/cars/query?q=*:*
Retrieve all the datasets from the core cars that have an id of 5
curl http://localhost:8983/solr/cars/query?q=id:5
Retrieve the field model from all the datasets of the core cars
Option 1 (with escaped &):

curl http://localhost:8983/solr/cars/query?q=id:*\&fl=model

Option 2 (query in single ticks):

curl 'http://localhost:8983/solr/cars/query?q=id:*&fl=model'
Retrieve all datasets of the core cars sorted by price in descending order, and output the fields make, model, and price, only (version in single ticks):
curl http://localhost:8983/solr/cars/query -d '
q=*:*&
sort=price desc&
fl=make,model,price '
Retrieve the first five datasets of the core cars sorted by price in descending order, and output the fields make, model, and price, only (version in single ticks):
curl http://localhost:8983/solr/cars/query -d '
q=*:*&
rows=5&
sort=price desc&
fl=make,model,price '
Retrieve the first five datasets of the core cars sorted by price in descending order, and output the fields make, model, and price plus its relevance score, only (version in single ticks):
curl http://localhost:8983/solr/cars/query -d '
q=*:*&
rows=5&
sort=price desc&
fl=make,model,price,score '
Return all stored fields as well as the relevance score:
curl http://localhost:8983/solr/cars/query -d '
q=*:*&
fl=*,score '

Furthermore, you can define your own request handler to send the optional request parameters to the query parser in order to control what information is returned.

Query Parsers

Apache Solr uses a so-called query parser — a component that translates your search string into specific instructions for the search engine. A query parser stands between you and the document that you are searching for.

Solr comes with a variety of parser types that differ in the way a submitted query is handled. The Standard Query Parser works well for structured queries but is less tolerant of syntax errors. At the same time, both the DisMax and Extended DisMax Query Parser are optimized for natural language-like queries. They are designed to process simple phrases entered by users and to search for individual terms across several fields using different weighting.

Furthermore, Solr also offers so-called Function Queries that allow a function to be combined with a query in order to generate a specific relevance score. These parsers are named Function Query Parser and Function Range Query Parser. The example below shows the latter one to pick all the data sets for “bmw” (stored in the data field make) with the models from 318 to 323:

curl http://localhost:8983/solr/cars/query -d '
q=make:bmw&
fq=model:[318 TO 323] '

Post-processing of results

Sending queries to Apache Solr is one part, but post-processing the search result from the other one. First, you can choose between different response formats — from JSON to XML, CSV, and a simplified Ruby format. Simply specify the corresponding wt parameter in a query. The code example below demonstrates this for retrieving the dataset in CSV format for all the items using curl with escaped &:

curl http://localhost:8983/solr/cars/query?q=id:5\&wt=csv

The output is a comma-separated list as follows:

In order to receive the result as XML data but the two output fields make and model, only, run the following query:

curl http://localhost:8983/solr/cars/query?q=*:*\&fl=make,model\&wt=xml

The output is different and contains both the response header and the actual response:

Wget simply prints the received data on stdout. This allows you to post-process the response using standard command-line tools. To list a few, this contains jq [9] for JSON, xsltproc, xidel, xmlstarlet [10] for XML as well as csvkit [11] for CSV format.

Conclusion

This article shows different ways of sending queries to Apache Solr and explains how to process the search result. In the next part, you will learn how to use Apache Solr to search in PostgreSQL, a relational database management system.

About the authors

Jacqui Kabeta is an environmentalist, avid researcher, trainer, and mentor. In several African countries, she has worked in the IT industry and NGO environments.

Frank Hofmann is an IT developer, trainer, and author and prefers to work from Berlin, Geneva, and Cape Town. Co-author of the Debian Package Management Book available from dpmb.org

Links and References

[1] Apache Solr, https://lucene.apache.org/solr/
[2] Frank Hofmann and Jacqui Kabeta: Introduction to Apache Solr. Part 1, http://linuxhint.com
[3] Yonik Seelay: Solr Query Syntax, http://yonik.com/solr/query-syntax/
[4] Yonik Seelay: Solr Tutorial, http://yonik.com/solr-tutorial/
[5] Apache Solr: Querying Data, Tutorialspoint, https://www.tutorialspoint.com/apache_solr/apache_solr_querying_data.htm
[6] Lucene, https://lucene.apache.org/
[7] SolrJ, https://lucene.apache.org/solr/guide/8_8/using-solrj.html
[8] curl, https://curl.se/
[9] jq, https://github.com/stedolan/jq
[10] xmlstarlet, http://xmlstar.sourceforge.net/
[11] csvkit, https://csvkit.readthedocs.io/en/latest/

Apache Solr: Setup a Node

Frank Hofmann — Sun, 21 Feb 2021 14:03:46 +0000

Part 1: Setting up a single node

Today, electronically storing your documents or data on a storage device is both quick and easy, it is comparably cheap, too. In use is a filename reference that is meant to describe what the document is about. Alternatively, data is kept in a Database Management System (DBMS) like PostgreSQL, MariaDB, or MongoDB to just name a few options. Several storage mediums are either locally or remotely connected to the computer, such as USB stick, internal or external hard disk, Network Attached Storage (NAS), Cloud Storage, or GPU/Flash-based, as in an Nvidia V100 [10].

In contrast, the reverse process, finding the right documents in a document collection, is rather complex. It mostly requires detecting the file format without fault, indexing the document, and extracting the key concepts (document classification). This is where the Apache Solr framework comes in. It offers a practical interface to do the steps mentioned — building a document index, accepting search queries, doing the actual search, and returning a search result. Apache Solr thus forms the core for effective research on a database or document silo.

In this article, you will learn how Apache Solr works, how to set up a single node, index documents, do a search, and retrieve the result.

The follow-up articles build on this one, and, in them, we discuss other, more specific use cases such as integrating a PostgreSQL DBMS as a data source or load balancing across multiple nodes.

About the Apache Solr project

Apache Solr is a search engine framework based on the powerful Lucene search index server [2]. Written in Java, it is maintained under the umbrella of the Apache Software Foundation (ASF) [6]. It is freely available under the Apache 2 license.

The topic “Find documents and data again” plays a very important role in the software world, and many developers deal with it intensively. The website Awesomeopensource [4] lists more than 150 search engine open-source projects. As of early 2021, ElasticSearch [8] and Apache Solr/Lucene are the two top dogs when it comes to searching for larger data sets. Developing your search engine requires a lot of knowledge, Frank does that with the Python-based AdvaS Advanced Search [3] library since 2002.

Setting up Apache Solr:

The installation and operation of Apache Solr are not complicated, it is simply a whole series of steps to be carried out by you. Allow about 1 hour for the result of the first data query. Furthermore, Apache Solr is not just a hobby project but is also used in a professional environment. Therefore, the chosen operating system environment is designed for long-term use.

As the base environment for this article, we use Debian GNU/Linux 11, which is the upcoming Debian release (as of early 2021) and expected to be available in mid-2021. For this tutorial, we expect that you have already installed it,–either as the native system, in a virtual machine like VirtualBox, or an AWS container.

Apart from the basic components, you need the following software packages to be installed on the system:

Curl
Default-java
Libcommons-cli-java
Libxerces2-java
Libtika-java (a library from the Apache Tika project [11])

These packages are standard components of Debian GNU/Linux. If not yet installed, you can post-install them in one go as a user with administrative rights, for example, root or via sudo, shown as follows:

# apt-get install curl default-java libcommons-cli-java libxerces2-java libtika-java

Having prepared the environment, the 2nd step is the installation of Apache Solr. As of now, Apache Solr is not available as a regular Debian package. Therefore, it is required to retrieve Apache Solr 8.8 from the download section of the project website [9] first. Use the wget command below to store it in the /tmp directory of your system:

$ wget -O /tmp https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz

The switch -O shortens –output-document and makes wget store the retrieved tar.gz file in the given directory. The archive has a size of roughly 190M. Next, unpack the archive into the /opt directory using tar. As a result, you will find two subdirectories — /opt/solr and /opt/solr-8.8.0, whereas /opt/solr is set up as a symbolic link to the latter one. Apache Solr comes with a setup script that you execute next, it is as follows:

# /opt/solr-8.8.0/bin/install_solr_service.sh

This results in the creation of the Linux user solr runs in the Solr service plus his home directory under /var/solr establishes the Solr service, added with its corresponding nodes, and starts the Solr service on port 8983. These are the default values. If you are unhappy with them, you can modify them during installation or even latersince the installation script accepts corresponding switches for setup adjustments. We recommend you to have a look at the Apache Solr documentation regarding these parameters.

The Solr software is organized in the following directories:

bin
contains the Solr binaries and files to run Solr as a service
contrib
external Solr libraries such as data import handler and the Lucene libraries
dist
internal Solr libraries
docs
link to the Solr documentation available online
example
example datasets or several use cases/scenarios
licenses
software licenses for the various Solr components
server
server configuration files, such as server/etc for services and ports

In more detail, you can read about these directories in the Apache Solr documentation [12].

Managing Apache Solr:

Apache Solr runs as a service in the background. You can start it in two ways, either using systemctl (first line) as a user with administrative permissions or directly from the Solr directory (second line). We list both terminal commands below:

# systemctl start solr
$ solr/bin/solr start

Stopping Apache Solr is done similarly:

# systemctl stop solr
$ solr/bin/solr stop

The same way goes in restarting the Apache Solr service:

# systemctl restart solr
$ solr/bin/solr restart

Furthermore, the status of the Apache Solr process can be displayed as follows:

# systemctl status solr
$ solr/bin/solr status

The output lists the service file that was started, both the corresponding timestamp and log messages. The figure below shows that the Apache Solr service was started on port 8983 with process 632. The process is successfully running for 38 minutes.

To see if the Apache Solr process is active, you may also cross-check using the ps command in combination with grep. This limits the ps output to all the Apache Solr processes that are currently active.

# ps ax | grep --color solr

The figure below demonstrates this for a single process. You see the call of Java that is accompanied by a list of parameters, for example memory usage (512M) ports to listen on 8983 for queries, 7983 for stop requests, and type of connection (http).

Adding users:

The Apache Solr processes run with a specific user named solr. This user is helpful in managing Solr processes, uploading data, and sending requests. Upon setup, the user solr does not have a password and is expected to have one to log in to proceed further. Set a password for the user solr like user root, it is shown as follows:

# passwd solr

Solr Administration:

Managing Apache Solr is done using the Solr Dashboard. This is accessible via web browser from http://localhost:8983/solr. The figure below shows the main view.

On the left, you see the main menu that leads you to the subsections for logging, administration of the Solr cores, the Java setup, and the status information. Choose the desired core using the selection box below the menu. On the right side of the menu, the corresponding information is displayed. The Dashboard menu entry shows further details regarding the Apache Solr process, as well as the current load and memory usage.

Please know that the contents of the Dashboard changes depending on the number of Solr cores, and the documents that have been indexed. Changes affect both the menu items and the corresponding information that is visible on the right.

Understanding How Search Engines Work:

Simply speaking, search engines analyze documents, categorize them, and allow you to do a search based on their categorization. Basically, the process consists of three stages, which are termed as crawling, indexing, and ranking [13].

Crawling is the first stage and describes a process by which new and updated content is collected. The search engine uses robots that are also known as spiders or crawlers, hence the term crawling to go through available documents.

The second stage is called indexing. The previously collected content is made searchable by transforming the original documents into a format the search engine understands. Keywords and concepts are extracted and stored in (massive) databases.

The third stage is called ranking and describes the process of sorting the search results according to their relevance with a search query. It is common to display the results in descending order so that the result that has the highest relevance to the searcher’s query comes first.

Apache Solr works similarly to the previously described three-stage process. Like the popular search engine Google, Apache Solr uses a sequence of gathering, storing, and indexing documents from different sources and makes them available/searchable in near real-time.

Apache Solr uses different ways to index documents including the following [14]:

Using an Index Request Handler when uploading the documents directly to Solr. These documents should be in JSON, XML/XSLT, or CSV formats.
Using the Extracting Request Handler (Solr Cell). The documents should be in PDF or Office formats, which are supported by Apache Tika.
Using the Data Import Handler, which conveys data from a database and catalogs it using column names. The Data Import Handler fetches data from emails, RSS feeds, XML data, databases, and plain text files as sources.

A query handler is used in Apache Solr when a search request is sent. The query handler analyzes the given query based on the same concept of the index handler to match the query and previously indexed documents. The matches are ranked according to their appropriateness or relevance. A brief example of querying is demonstrated below.

Uploading Documents:

For the sake of simplicity, we use a sample dataset for the following example that is already provided by Apache Solr. Uploading documents is done as the user solr. Step 1 is the creation of a core with the name techproducts (for a number of tech items).

$ solr/bin/solr create -c techproducts

Everything is fine if you see the message “Created new core ‘techproducts’”. Step 2 is adding data (XML data from exampledocs) to the previously created core techproducts. In use is the tool post that is parameterized by -c (name of the core) and the documents to be uploaded.

$ solr/bin/post -c techproducts solr/example/exampledocs/*.xml

This will result in the output shown below and will contain the entire call plus the 14 documents that have been indexed.

Also, the Dashboard shows the changes. A new entry named techproducts is visible in the dropdown menu on the left side, and the number of corresponding documents changed on the right side. Unfortunately, a detailed view of the raw datasets is not possible.

In case the core/collection needs to be removed, use the following command:

$ solr/bin/solr delete -c techproducts

Querying Data:

Apache Solr offers two interfaces to query data: via the web-based Dashboard and command-line. We will explain both methods below.

Sending queries via Solr dashboard is done as follows:

Choose the node techproducts from the dropdown menu.
Choose the entry Query from the menu below the dropdown menu.
Entry fields pop up on the right side to formulate the query like request handler (qt), query (q), and the sort order (sort).
Choose the entry field Query, and change the content of the entry from “*:*” to “manu:Belkin”. This limits the search from “all fields with all entries” to “datasets that have the name Belkin in the manu field”. In this case, the name manu abbreviates manufacturer in the example data set.
Next, press the button with Execute Query. The result is a printed HTTP request on top, and a result of the search query in JSON data format below.

The command-line accepts the same query as in the Dashboard. The difference is that you must know the name of the query fields. In order to send the same query like above, you have to run the following command in a terminal:

$ curl
http://localhost:8983/solr/techproducts/query?q=”manu”:”Belkin

The output is in JSON format, as shown below. The result consists of a response header and the actual response. The response consists of two data sets.

Wrapping Up:

Congratulations! You have achieved the first stage with success. The basic infrastructure is set up, and you have learned how to upload and query documents.

The next step will cover how to refine the query, formulate more complex queries, and understand the different web forms provided by the Apache Solr query page. Also, we will discuss how to post-process the search result using different output formats such as XML, CSV, and JSON.

About the authors:

Jacqui Kabeta is an environmentalist, avid researcher, trainer, and mentor. In several African countries, she has worked in the IT industry and NGO environments.

Frank Hofmann is an IT developer, trainer, and author and prefers to work from Berlin, Geneva, and Cape Town. Co-author of the Debian Package Management Book available from dpmb.org

[1] Apache Solr, https://lucene.apache.org/solr/
[2] Lucene Search Library, https://lucene.apache.org/
[3]AdvaS Advanced Search, https://pypi.org/project/AdvaS-Advanced-Search/
[4] The Top 165 Search Engine Open Source Projects, https://awesomeopensource.com/projects/search-engine
[5] ElasticSearch, https://www.elastic.co/de/elasticsearch/
[6]Apache Software Foundation (ASF), https://www.apache.org/
[7]FESS, https://fess.codelibs.org/index.html
[8] ElasticSearch, https://www.elastic.co/de/
[9] Apache Solr, Download section, https://lucene.apache.org/solr/downloads.htm
[10] Nvidia V100, https://www.nvidia.com/en-us/data-center/v100/
[11] Apache Tika, https://tika.apache.org/
[12] Apache Solr directory layout, https://lucene.apache.org/solr/guide/8_8/installing-solr.html#directory-layout
[13] How Search Engines Work: Crawling, Indexing, and Ranking. The beginners guide to SEO https://moz.com/beginners-guide-to-seo/how-search-engines-operate
[14] Get Started with Apache Solr, https://sematext.com/guides/solr/#:~:text=Solr%20works%20by%20gathering%2C%20storing,with%20huge%20volumes%20of%20data

Exporting Bash Variables

Frank Hofmann — Mon, 08 Feb 2021 16:08:36 +0000

Understanding variables in the Bash shell is essential in working with Linux in a professional manner. It is one of the key requirements for programming as well as achieving the Linux Professional Institute Certification (LPIC) Level 1 [2].

The previously published article by Fahmida Yesmin [4] gives you a wonderful introduction into Bash variables. Here we step further, and explain how to declare variables in Bash in such a way that you can use them in other environments on your Linux system, and which corresponding side effects you have to take into account.

A brief description of Bash

The Bash shell was first released in 1989 and has been used as the default login shell for most Linux distributions. Brian Fox wrote Bash as a UNIX shell and command language for the GNU Project as a free software replacement for the Bourne shell. It is an acronym for Bourne Again Shell. Bash is largely compatible with sh and incorporates useful features from the Korn shell ksh and the C shell csh [6].

While the GNU operating system provides other shells, including a version of csh, Bash is the default interactive shell. It is designed with portability in mind, and currently runs on nearly every version of UNIX plus other operating systems [9].

Bash variables in a nutshell

Variables are essential components of programming languages. They are referenced and manipulated in a computer program. Simply put, variables represent named memory cells. This is the same in Bash as in any programming language. This makes it possible for us as humans and users of the computer to store values in the “brain” of the computer and find them again via the assigned name of the variable.

The term variable refers to a combined form of two words, i.e., vary + able, which means its value can be changed, and it can be used for multiple times. In contrast to this, variables that cannot be changed are called constants. [10]

As long as there is enough memory available for your script you can freely create and use variables. You can simply set them by defining a variable name and then assigning its value. A variable name in Bash can include letters, digits, and underscores. Its name can be started with a letter and an underscore, only. Valid variable names are size, tax5, and _tax20 but not 5rules.

A variable value in Bash can contain a number, a single character, a string of characters, or a list of items (called array). It does not have a visible data type, and the internal data type of the variable will be automatically figured out (or derived) upon assignment of a value. Furthermore, there is no need to declare the variable — assigning a value to its reference will create the variable automatically. The example Bash script below demonstrates this for a string assignment, and a numeric number assignment.

#! /bin/bash

welcomeMessage="Hello World!"

echo $welcomeMessage

price=145

echo $price

Naming Conventions Of Bash Variables

There are no fixed rules for the spelling of names of variables, only conventions. These conventions are used:

Lowercase names — variables that are local to a script or function.
No matter whether spelt lower_case/snake case [8], or camel case style [7]. The example above uses camel case style.
Uppercase names — constants, environment variables, shell built-in variables.
Keep in mind that these variables might already be in use by other programs. Examples are $PATH, $LANG, $PWD, $PS4, and $SHELL.

For global IT companies it is common to work with style guides to ensure a common coding style among the company. See the Developer Editorial for IBM, and the Google Style Guide [3] for more information about the conventions they follow.

Variable Visibility

The default case is that a variable is locally bound to a structure, function, script, or process, and cannot be accessed from outside of it. The example below shows this for the variable $message that belongs to the script, and $welcome that belongs to the function outputWelcomeMessage().

#!/bin/bash

# define a variable message to the script

message=”Hello, again!”

outputWelcomeMessage () {

# define a local variable

welcome=”Hello!”

echo $welcome

}

outputWelcomeMessage () # prints Hello!

echo $message # prints Hello, again!

To make sure a previously defined variable with the same name is locally bound use the keyword local as demonstrated next. Without the keyword local the assignment in line 8 would relate to the globally defined variable with the same name defined earlier.

#!/bin/bash

# define a variable message to the script

message=”Hello, again!”

outputWelcomeMessage () {

# define a local variable with the same name

Local message=”Hello!”

echo $message

}

outputWelcomeMessage () # prints Hello!

echo $message # prints Hello, again!

Extending the scope of a variable

In order to make an internal variable visible to other child processes an additional step is needed. This step is called exporting a variable. Bash offers the usage of the keyword export followed by the variable name. The listing below demonstrates this for the variable backupPath.

$ backupPath=”/opt/backup/”

$ export backupPath

The export command is a shell built-in that is used to define the variable as one that subshells (shells spawned from the original) inherit. Variables that are exported can be read and written by more than one process, then.

The second option is to declare the variable as an environment variable right from the start. You can do that by using the keyword declare followed by the option “-x” (see [5] for more info about the declare command). The effect is similar to the export command that was introduced before.

$ declare -x BACKUPPATH=”/opt/backup/”

Inherit from other sessions

When you execute a program it automatically inherits its environment variables from the parent process. For instance if $HOME is set to /root in the parent then the child’s $HOME variable is also set to /root.

Further Commands

Among others, Linux comes with useful commands and options that relate to variables. The first two ones are called env and printenv. They list all the environment variables.

The image below shows the output of the command env in a terminal that is run in an X session. It contains variables like $XTERM (terminal type), $SHELL (the program that is called upon login, and shows /bin/bash for the path to the Bash interpreter), $LS_COLORS (the colours that are in use to highlight different file types when calling ls), and $DESKTOP_SESSION (the current X Desktop Environment).

The third and the fourth one are options of the export command — -p and -n. -p is short for print and just displays all the exported variables in the current shell using the declare command.

$ export -p

declare -x DESKTOP_SESSION="xfce"

declare -x DISPLAY=":0"

declare -x GLADE_CATALOG_PATH=":"

declare -x GLADE_MODULE_PATH=":"

declare -x GLADE_PIXMAP_PATH=":"

declare -x HOME="/home/frank"

declare -x LANG="de_DE.UTF-8"

The option -n is used to unset an environment variable. The listing below demonstrates this for the previously defined variable BACKUPPATH.

$ export -n BACKUPPATH

Conclusion

Bash is a very clever but sometimes also a bit complex environment. Variables control how the different tools interact. Exporting variables helps communicating between processes and is easy to use in everyday life.

About the authors

Jacqui Kabeta is an environmentalist, avid researcher, trainer and mentor. In several African countries she has worked in the IT industry and NGO environments.

Frank Hofmann is an IT developer, trainer, and author and prefers to work from Berlin, Geneva and Cape Town. Co-author of the Debian Package Management Book available from dpmb.org

Links and References

[1] Bash programming, Variables, https://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-5.html
[2] Linux Professional Institute LPIC-1, https://www.lpi.org/our-certifications/lpic-1-overview
[3] Google Shell Style Guide, Variable names, https://google.github.io/styleguide/shellguide.html#s7-naming-conventions
[4] Fahmida Yesmin: How to use Variables in Bash Programming, https://linuxhint.com/variables-bash-programming/
[5] The Bash Hackers Wiki, https://wiki.bash-hackers.org/
[6] The Bash, Wikipedia, https://en.wikipedia.org/wiki/Bash_(Unix_shell)
[7] Camel Case, Wikipedia, https://en.wikipedia.org/wiki/Camel_case
[8] Snake Case, Wikipedia, https://en.wikipedia.org/wiki/Snake_case
[9] What is Bash. https://www.gnu.org/software/bash/manual/html_node/What-is-Bash_003f.html
[10] Using variables in Bash https://opensource.com/article/19/8/using-variables-bash
Understanding Bash Elements of Programming https://www.linuxjournal.com/content/understanding-bash-elements-programming
Bash Variables https://www.javatpoint.com/bash-variables

Managing Linux Kernel Modules

Frank Hofmann — Mon, 01 Feb 2021 11:43:04 +0000

Understanding the Linux kernel

The Linux kernel is the core of the Linux operating system. It contains the main components to address the hardware and allows both communication and interaction between the user and the hardware. The Linux kernel is not a monolithic system but quite flexible, and the kernel is extended by so-called kernel modules.

What is a kernel module?

In general, a kernel module is a “piece of code that can be loaded and unloaded into the kernel upon demand. They extend the functionality of the kernel without the need to reboot the system” [1]. This leads to very great flexibility during operation.

Furthermore, “a kernel module can be configured as built-in or loadable. To dynamically load or remove a module, it has to be configured as a loadable module in the kernel configuration” [1]. This is done in the kernel source file /usr/src/linux/.config [2]. Built-in modules are marked with “y” and loadable modules with “m”. As an example, listing 1 demonstrates this for the SCSI module:

Listing 1: SCSI module usage declaration

CONFIG_SCSI=y # built-in module

CONFIG_SCSI=m # loadable module

# CONFIG_SCSI # variable is not set

We do not recommend editing the configuration file directly, but to use either the command “make config”, “make menuconfig”, or “make xconfig” to define the usage of the corresponding module in the Linux kernel.

Module commands

The Linux system comes with a number of different commands to handle kernel modules. This includes listing the modules currently loaded into the Linux kernel, displaying module information, as well as loading and unloading kernel modules. Below we will explain these commands in more detail.

For the current Linux kernels, the following commands are provided by the kmod package [3]. All the commands are symbolic links to kmod.

The list currently loaded modules with lsmod

We start with the lsmod command. lsmod abbreviates “list modules” and displays all modules currently loaded into the Linux kernel by nicely formatting the contents of the file /proc/modules. Listing 2 shows its output that consists of three columns: module name, the size used in memory, and other kernel modules that use this specific one.

Listing 2: Using lsmod

$ lsmod

Module Size Used by

ctr 12927 2

ccm 17534 2

snd_hrtimer 12604 1

snd_seq 57112 1

snd_seq_device 13132 1 snd_seq

...

$

Find available modules for your current kernel

There might be kernel modules available that you are not aware of yet. They are stored in the directory /lib/modules. With the help of find, combined with the uname command, you can print a list of these modules. “uname -r” just prints the version of the currently running Linux kernel. Listing 3 demonstrates this for an older 3.16.0-7 Linux
kernel, and shows modules for IPv6 and IRDA.

Listing 3: Displaying available modules (selection)

$ find /lib/modules/$(uname -r) -name '*.ko'

/lib/modules/3.16.0-7-amd64/kernel/net/ipv6/ip6_vti.ko

/lib/modules/3.16.0-7-amd64/kernel/net/ipv6/xfrm6_tunnel.ko

/lib/modules/3.16.0-7-amd64/kernel/net/ipv6/ip6_tunnel.ko

/lib/modules/3.16.0-7-amd64/kernel/net/ipv6/ip6_gre.ko

/lib/modules/3.16.0-7-amd64/kernel/net/irda/irnet/irnet.ko

/lib/modules/3.16.0-7-amd64/kernel/net/irda/irlan/irlan.ko

/lib/modules/3.16.0-7-amd64/kernel/net/irda/irda.ko

/lib/modules/3.16.0-7-amd64/kernel/net/irda/ircomm/ircomm.ko

/lib/modules/3.16.0-7-amd64/kernel/net/irda/ircomm/ircomm-tty.ko

...

$

Display module information using modinfo

The command modinfo tells you more about the requested kernel module (“module information”). As a parameter, modinfo requires either the full module path or simply the module name. Listing 4 demonstrates this for the IrDA kernel module dealing with the Infrared Direct Access protocol stack.

Listing 4: Display module information

$ /sbin/modinfo irda

filename: /lib/modules/3.16.0-7-amd64/kernel/net/irda/irda.ko

alias:    net-pf-23

license:    GPL

description: The Linux IrDA Protocol Stack

author: Dag Brattli <dagb@cs.uit.no> & Jean Tourrilhes <jt@hpl.hp.com>

depends:    crc-ccitt

vermagic: 3.16.0-7-amd64 SMP mod_unload modversions

$

The output contains different information fields such as the full path for the kernel module, its alias name, software license, module description, authors, as well as kernel internals. The field “depends” shows which other kernel modules it depends on.

The information fields differ from module to module. In order to limit the output to a specific information field, modinfo accepts the parameter “-F” (short for “–field”) followed by the field name. In Listing 5, the output is limited to the license information made available using the license field.

Listing 5: Display a specific field only.

$ /sbin/modinfo -F license irda

GPL

$

In newer Linux kernels, a useful security feature is available. This covers cryptographically signed kernel modules. As explained on the Linux kernel project website [4], “this allows increased kernel security by disallowing the loading of unsigned modules or modules
signed with an invalid key. Module signing increases security by making it harder to load a malicious module into the kernel. The module signature checking is done by the kernel so that it is not necessary to have “trusted userspace bits.” The figure below shows this for the
parport_pc module.

Show module configuration using modprobe

Every kernel module comes with a specific configuration. The command modprobe followed by the option “-c” (short for “–showconfig”) lists the module configuration. In combination with grep, this output is limited to a specific symbol. Listing 6 demonstrates this for IPv6 options.

Listing 6: Show module configuration

$ /sbin/modprobe -c | grep ipv6

alias net_pf_10_proto_0_type_6 dccp_ipv6

alias net_pf_10_proto_33_type_6 dccp_ipv6

alias nf_conntrack_10 nf_conntrack_ipv6

alias nf_nat_10 nf_nat_ipv6

alias nft_afinfo_10 nf_tables_ipv6

alias nft_chain_10_nat nft_chain_nat_ipv6

alias nft_chain_10_route nft_chain_route_ipv6

alias nft_expr_10_reject nft_reject_ipv6

alias symbol:nf_defrag_ipv6_enable nf_defrag_ipv6

alias symbol:nf_nat_icmpv6_reply_translation nf_nat_ipv6

alias symbol:nft_af_ipv6 nf_tables_ipv6

alias symbol:nft_reject_ipv6_eval nft_reject_ipv6

$

Show module dependencies

The Linux kernel is designed to be modular, and functionality is distributed over a number of modules. This leads to several module dependencies that can be displayed using modprobe again. Listing 7 uses the option “–show-depends” in order to list the dependencies for the i915 module.

Listing 7: Show module dependencies

$ /sbin/modprobe --show-depends i915

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/i2c/i2c-core.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/i2c/algos/i2c-algo-bit.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/thermal/thermal_sys.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/gpu/drm/drm.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/acpi/video.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/acpi/button.ko

insmod /lib/modules/3.16.0-7-amd64/kernel/drivers/gpu/drm/i915/i915.ko

$

In order to display the dependencies as a tree similar to the “tree” or “lsblk” command, the modtree project [5] can help (see figure below for the i915 module tree). Although it is freely available on GitHub, it requires some adaptations to comply with the rules for free software and to become part of a Linux distribution as a package.

Loading modules

Loading a module to a running kernel can be done by two commands — insmod (“insert module”) and modprobe. Be aware that there is a slight but important difference between these two: insmod does not resolve module dependencies, but modprobe is cleverer and does that.

Listing 8 shows how to insert the IrDA kernel module. Please note that insmode works with the full module path, whereas modprobe is happy with the name of the module and looks it up itself in the module tree for the current Linux kernel.

Listing 8: Inserting a kernel module

# insmod /lib/modules/3.16.0-7-amd64/kernel/net/irda/irda.ko

...

# modprobe irda

Unloading modules

The last step deals with unloading modules from a running kernel. Again, there are two commands available for this task — modprobe and rmmod (“remove module”). Both commands expect the module name as a parameter. Listing 9 shows this for removing the IrDA module from the running Linux kernel.

Listing 9: Removing a kernel module

# rmmod irda

...

# modprobe -r irda

...

Conclusion

Handling Linux kernel modules is not big magic. Just a few commands to learn, and you are the master of the kitchen.

Thank you

The author would like to thank Axel Beckert (ETH Zürich) and Saif du Plessis (Hothead Studio Cape Town) for their help while preparing the article.

Links and References

[1] Kernel module, Arch Linux wiki, https://wiki.archlinux.org/index.php/Kernel_module
[2] Kernel Configuration, https://tldp.org/HOWTO/SCSI-2.4-HOWTO/kconfig.html
[3] kmod, https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
[4] Kernel module signing facility, https://www.kernel.org/doc/html/v4.15/admin-guide/module-signing.html
[5] modtree, https://github.com/falconindy/modtree

Compiling Code in Parallel using Make

Frank Hofmann — Sun, 24 Jan 2021 22:56:14 +0000

Whoever you ask how to build software properly will come up with Make as one of the answers. On GNU/Linux systems, GNU Make [1] is the Open-Source version of the original Make that was released more than 40 years ago — in 1976. Make works with a Makefile — a structured plain text file with that name that can be best described as the construction manual for the software building process. The Makefile contains a number of labels (called targets) and the specific instructions needed to be executed to build each target.

Simply speaking, Make is a build tool. It follows the recipe of tasks from the Makefile. It allows you to repeat the steps in an automated fashion rather than typing them in a terminal (and probably making mistakes while typing).

Listing 1 shows an example Makefile with the two targets “e1” and “e2” as well as the two special targets “all” and “clean.” Running “make e1” executes the instructions for target “e1” and creates the empty file one. Running “make e2” does the same for target “e2” and creates the empty file two. The call of “make all” executes the instructions for target e1 first and e2 next. To remove the previously created files one and two, simply execute the call “make clean.”

Listing 1

all: e1 e2

e1:

touch one

e2:

touch two

clean:

rm one two

Running Make

The common case is that you write your Makefile and then just run the command “make” or “make all” to build the software and its components. All the targets are built in serial order and without any parallelization. The total build time is the sum of time that is required to build every single target.

This approach works well for small projects but takes rather long for medium and bigger projects. This approach is no longer up-to-date as most of the current cpus are equipped with more than one core and allow the execution of more than one process at a time. With these ideas in mind, we look at whether and how the build process can be parallelized. The aim is to simply reduce the build time.

Make Improvements

There are a few options we have — 1) simplify the code, 2) distribute the single tasks onto different computing nodes, build the code there, and collect the result from there, 3) build the code in parallel on a single machine, and 4) combine options 2 and 3.

Option 1) is not always easy. It requires the will to analyze the runtime of the implemented algorithm and knowledge about the compiler, i.e., how does the compiler translate the instructions in the programming language into processor instructions.

Option 2) requires access to other computing nodes, for example, dedicated computing nodes, unused or less used machines, virtual machines from cloud services like AWS, or rented computing power from services like LoadTeam [5]. In reality, this approach is used to build software packages. Debian GNU/Linux uses the so-called Autobuilder network [17], and RedHat/Fedors uses Koji [18]. Google calls its system BuildRabbit and is perfectly explained in the talk by Aysylu Greenberg [16]. distcc [2] is a so-called distributed C compiler that allows you to compile code on different nodes in parallel and to set up your own build system.

Option 3 uses parallelization at the local level. This may be the option with the best cost-benefit ratio for you, as it does not require additional hardware as in option 2. The requirement to run Make in parallel is adding the option -j in the call (short for –jobs). This specifies the number of jobs that are run at the same time. The listing below asks to Make to run 4 jobs in parallel:

Listing 2

$ make --jobs=4

According to Amdahl’s law [23], this will reduce the build time by nearly 50%. Keep in mind that this approach works well if the single targets do not depend on each other; for example, the output of target 5 is not required to build target 3.

However, there is one side effect: the output of the status messages for each Make target appears arbitrary, and these can no longer be clearly assigned to a target. The output order depends on the actual order of the job execution.

Define Make Execution Order

Are there statements that help Make to understand which targets depend on each other? Yes! The example Makefile in Listing 3 says this:

* to build target “all,” run the instructions for e1, e2, and e3

* target e2 requires target e3 to be built before

This means that the targets e1 and e3 can be built in parallel, first, then e2 follows as soon as the building of e3 is completed, finally.

Listing 3

all: e1 e2 e3

e1:

touch one

e2: e3

touch two

e3:

touch three

clean:

rm one two three

Visualize the Make Dependencies

The clever tool make2graph from makefile2graph [19] project visualizes the Make dependencies as a directed acyclic graph. This helps to understand how the different targets depend on each other. Make2graph outputs graph descriptions in dot format that you can transform into a PNG image using the dot command from the Graphviz project [22]. The call is as follows:

Listing 4

$ make all -Bnd | make2graph | dot -Tpng -o graph.png

Firstly, Make is called with the target “all” followed by the options “-B” to unconditionally build all the targets, “-n” (short for “–dry-run”) to pretend running the instructions per target, and “-d” (“–debug”) to display debug information. The output is piped to make2graph that pipes its output to dot that generates the image file graph.png in PNG format.

The build dependency graph for listing 3

More Compilers and Build Systems

As already explained above, Make was developed more than four decades ago. Over the years, executing jobs in parallel has become more and more important, and the number of specially designed compilers and build systems to achieve a higher level of parallelization has grown since then. The list of tools includes these:

Bazel [20]
CMake [4]: abbreviates cross-platform Make and creates description files later used by Make
distmake [12]
Distributed Make System (DMS) [10] (seems to be dead)
dmake [13]
LSF Make [15]
Apache Maven
Meson
Ninja Build
NMake [6]: Make for Microsoft Visual Studio
PyDoit [8]
Qmake [11]
redo [14]
SCons [7]
Waf [9]

Most of them have been designed with parallelization in mind and offer a better result regarding build time than Make.

Conclusion

As you have seen, it is worth thinking about parallel builds as it significantly reduces build time up to a certain level. Still, it is not easy to achieve and comes with certain pitfalls [3]. It is recommended to analyse both your code and its build path before stepping into parallel builds.

Links and References

[1] GNU Make Manual: Parallel Execution, https://www.gnu.org/software/make/manual/html_node/Parallel.html
[2] distcc: https://github.com/distcc/distcc
[3] John Graham-Cumming: The Pitfalls and Benefits of GNU Make Parallelization, https://www.cmcrossroads.com/article/pitfalls-and-benefits-gnu-make-parallelization
[4] CMake, https://cmake.org/
[5] LoadTeam, https://www.loadteam.com/
[6] NMake, https://docs.microsoft.com/en-us/cpp/build/reference/nmake-reference?view=msvc-160
[7] SCons, https://www.scons.org/
[8] PyDoit, https://pydoit.org/
[9] Waf, https://gitlab.com/ita1024/waf/
[10] Distributed Make System (DMS), http://www.nongnu.org/dms/index.html
[11] Qmake, https://doc.qt.io/qt-5/qmake-manual.html
[12] distmake, https://sourceforge.net/projects/distmake/
[13] dmake, https://docs.oracle.com/cd/E19422-01/819-3697/dmake.html
[14] redo, https://redo.readthedocs.io/en/latest/
[15] LSF Make, http://sunray2.mit.edu/kits/platform-lsf/7.0.6/1/guides/kit_lsf_guide_source/print/lsf_make.pdf
[16] Aysylu Greenberg: Building a Distributed Build System at Google Scale, GoTo Conference 2016, https://gotocon.com/dl/goto-chicago-2016/slides/AysyluGreenberg_BuildingADistributedBuildSystemAtGoogleScale.pdf
[17] Debian Build System, Autobuilder network, https://www.debian.org/devel/buildd/index.en.html
[18] koji – RPM building and tracking system, https://pagure.io/koji/
[19] makefile2graph, https://github.com/lindenb/makefile2graph
[20] Bazel, https://bazel.build/
[21] Makefile tutorial, https://makefiletutorial.com/
[22] Graphviz, http://www.graphviz.org
[23] Amdahl’s law, Wikipedia, https://en.wikipedia.org/wiki/Amdahl%27s_law

Building your own Network Monitor with PyShark

Frank Hofmann — Mon, 18 Jan 2021 14:40:45 +0000

Existing tools

Many tools for network analysis have existed for quite some time. Under Linux, for example, these are Wireshark, tcpdump, nload, iftop, iptraf, nethogs, bmon, tcptrack as well as speedometer and ettercap. For a detailed description of them, you may have a look at Silver Moon’s comparison [1].

So, why not use an existing tool, and write your own one, instead? Reasons I see are a better understanding of TCP/IP network protocols, learning how to code properly, or implementing just the specific feature you need for your use case because the existing tools do not give you what you actually need. Furthermore, speed and load improvements to your application/system can also play a role that motivates you to move more in this direction.

In the wild, there exist quite several Python libraries for network processing and analysis. For low-level programming, the socket library [2] is the key. High-level protocol-based libraries are httplib, ftplib, imaplib, and smtplib. In order to monitor network ports and the packet stream competitive candidates, are python-nmap [3], dpkt [4], and PyShark [5] are used. For both monitoring and changing the packet stream, the scapy library [6] is widely in use.

In this article, we will have a look at the PyShark library and monitor which packages arrive at a specific network interface. As you will see below, working with PyShark is straightforward. The documentation on the project website will help you for the first steps — with it, you will achieve a usable result very quickly. However, when it comes to the nitty-gritty, more knowledge is necessary.

PyShark can do a lot more than it seems at first sight, and unfortunately, at the time of this writing, the existing documentation does not cover that in full. This makes it unnecessarily difficult and provides a good reason to look deeper under the bonnet.

About PyShark

PyShark [8] is a Python wrapper for Tshark [10]. It simply uses its ability to export XML data using its parsing. Tshark itself is the command-line version of Wireshark. Both Tshark and PyShark depend on the Pcap library that actually captures network packages and is maintained under the hood of Tcpdump [7]. PyShark is developed and continuously maintained by Dan (he uses the name KimiNewt on Twitter).

In order to prevent possible confusion, there exists a similar-sounding tool, Apache Spark [11], which is a unified analytics engine for large-scale data processing. The name PySpark is used for the Python interface to Apache Spark, which we do not discuss here.

Installing PyShark

PyShark requires both the Pcap library and Tshark to be installed. The corresponding packages for Debian GNU/Linux 10 and Ubuntu are named libpcap0.8 and tshark and can be set up as follows using apt-get:

Listing 1: Installing the Pcap library and Tshark

# pip3 install python-pyshark

If not installed yet, Python3 and Pip have to be added too. The corresponding packages for Debian GNU/Linux 10 and Ubuntu are named python3 and python3-pip and can be installed as follows using apt-get:

Listing 2: Install Python 3 and PIP for Python 3

# apt-get install python3 python3-pip

Now it is time to add PyShark. Based on our research PyShark is not packaged for any major Linux distribution yet. The installation of it is done using the Python package installer pip3 (pip for Python 3) as a system-wide package as follows:

Listing 3: Install PyShark using PIP

# pip3 install python-pyshark

Now, PyShark is ready to be used in Python scripts on your Linux system. Please note to execute the Python scripts below as an administrative user, for example, using sudo because the Pcap library does not permit you to look for packages as a regular user.

The following statement adds the content of the PyShark module to the namespace of your Python script:

Listing 4: Import the PyShark module

import pyshark

Methods of Capturing Packages

Out of the box, PyShark comes with two different modes with which it offers to collect packets from the observed network interface. For continuous collection, use the LiveCapture() method, and for saving to a local file, use the FileCapture() method from the PyShark module. The result is a package list (Python iterator object) that allows you to go through the captured data package by package. The listings below demonstrate how to use the two methods.

Listing 5: Use PyShark to capture from the first Wifi interface wlan0

import pyshark
capture = pyshark.LiveCapture(interface='wlan0')

With the previous statements, the captured network packages are kept in memory. The available memory might be limited, however, storing the captured packages in a local file is an alternative. In use is the Pcap file format [9]. This allows you to process and interpret the captured data by other tools that are linked to the Pcap library too.

Listing 6: Use PyShark to store the captured packages in a local file

import pyshark
capture = pyshark.FileCapture('/tmp/networkpackages.cap')

Running listings 5 and 6, you will not have any output yet. The next step is to narrow down the packages to be collected more precisely based on your desired criteria.

Selecting Packets

The previously introduced capture object establishes a connection to the desired interface. Next, the two methods sniff() and sniff_continuously() of the capture object collect the network packets. sniff() returns to the caller as soon as all the requested packets have been collected. In contrast, sniff_continuously() delivers a single packet to the caller as soon as it was collected. This allows a live stream of the network traffic.

Furthermore, the two methods allow you to specify various limitations and filtering mechanism of packages, for example, the number of packages using the parameter packet_count, and the period during which the packages are to be collected using the parameter timeout. Listing 7 demonstrates how to collect 50 network packages, only, as a live stream, using the method sniff_continuously().

Listing 7: Collect 50 network packages from wlan0

import pyshark

capture = pyshark.LiveCapture(interface='wlan0')
for packet in capture.sniff_continuously(packet_count=5):
print(packet)

Various packet details are visible using the statement print(packet) (see Figure 1).

Figure 1: package content

In listing 7, you collected all kinds of network packets no matter what protocol or service port. PyShark allows you to do advanced filtering, using the so-called BPF filter [12]. Listing 8 demonstrates how to collect 5 TCP packages coming in via port 80 and printing the packet type. The information is stored in the packet attribute highest_layer.

Listing 8: Collecting TCP packages, only

import pyshark

capture = pyshark.LiveCapture(interface='wlan0', bpf_filter='tcp port 80')
capture.sniff(packet_count=5)
print(capture)
for packet in capture:
print(packet.highest_layer)

Save listing 8, as the file tcp-sniff.py, and run the Python script. The output is as follows:

Listing 9: The output of Listing 8

# python3 tcp-sniff.py
<LiveCapture (5 packets)>
TCP
TCP
TCP
OCSP
TCP
#

Unboxing the captured packets

The captured object works as a Russian Matroska doll — layer by layer, it contains the content of the corresponding network packet. Unboxing feels a bit like Christmas — you never know what information you find inside until you opened it. Listing 10 demonstrates capturing 10 network packets and revealing its protocol type, both the source and destination port and address.

Listing 10: Showing source and destination of the captured packet

import pyshark
import time

# define interface
networkInterface = "enp0s3"

# define capture object
capture = pyshark.LiveCapture(interface=networkInterface)

print("listening on %s" % networkInterface)

for packet in capture.sniff_continuously(packet_count=10):
# adjusted output
try:
# get timestamp
localtime = time.asctime(time.localtime(time.time()))

# get packet content
protocol = packet.transport_layer # protocol type
src_addr = packet.ip.src # source address
src_port = packet[protocol].srcport # source port
dst_addr = packet.ip.dst # destination address
dst_port = packet[protocol].dstport # destination port

# output packet info
print ("%s IP %s:%s <-> %s:%s (%s)" % (localtime, src_addr, src_port, dst_addr, dst_port, protocol))
except AttributeError as e:
# ignore packets other than TCP, UDP and IPv4
pass
print (" ")

The script generates an output, as shown in Figure 2, a single line per received packet. Each line starts with a timestamp, followed by the source IP address and port, then the destination IP address and port, and, finally, the type of network protocol.

Figure 2: Source and destination for captured packages

Conclusion

Building your own network scanner has never been easier than that. Based on the foundations of Wireshark, PyShark offers you a comprehensive and stable framework to monitor the network interfaces of your system in the way you require it.

Links and References

[1] Silver Moon: 18 Commands to Monitor Network Bandwidth on Linux server, https://www.binarytides.com/linux-commands-monitor-network/
[2] Python socket library, https://docs.python.org/3/library/socket.html
[3] python-nmap, https://pypi.org/project/python3-nmap/
[4] dpkt, https://pypi.org/project/dpkt/
[5] PyShark, https://pypi.org/project/pyshark/
[6] scapy, https://pypi.org/project/scapy/
[7] Tcpdump and libpcap, http://www.tcpdump.org/
[8] PyShark, project website, http://kiminewt.github.io/pyshark/
[9] Libpcap File Format, Wireshark Wiki, https://gitlab.com/wireshark/wireshark/-/wikis/Development/LibpcapFileFormat
[10] Tshark, https://www.wireshark.org/docs/man-pages/tshark.html
[11] Apache Spark, https://spark.apache.org/
[12] BPF filter, https://wiki.wireshark.org/CaptureFilters

Is Linux POSIX-Compliant?

Frank Hofmann — Wed, 13 Jan 2021 06:58:57 +0000

Software is written by numerous developers with various backgrounds. General algorithms are available under a free license or have been scientifically published, and they might also be available for free for studying purposes. This results in different implementations and software versions that fit a variety of needs. A standardization of interfaces and data formats is necessary to make these different implementations both interchangeable and modular.

In short, POSIX [1] does exactly that for UNIX and UNIX-like systems (see Zak H’s article [4] for a more detailed history on this topic). It defines the exchange interfaces, calling mechanisms, and transferred data for the software but leaves the internal implementation to the developer or maintainer of the software. The aim is to unify all the various UNIX forks and UNIX-like systems in such a way that different software implementations can interact with one another. The main advantage of POSIX is to have a binding documentation for these components – interfaces, mechanisms, and data – available in written form.

An operating system that follows the POSIX standard in its entirety is classified as being POSIX-compliant. In this article, we explain what POSIX stands for, determine whether Linux belongs to this category, and list which Linux components must be excluded from this classification.

What Does the Term POSIX Stand for?

POSIX is an abbreviation for Portable Operating System Interface. As briefly explained above, POSIX is the name for a collection of standards that are required to maintain compatibility between operating systems. As stated in [1], “[it] defines the application programming interface (API), along with command-line shells and utility interfaces, for software compatibility with variants of Unix and other operating systems.” The first version of POSIX was published in 1988. Since then, POSIX has been continuously expanded and updated by the Austin Common Standards Revision Group (also known simply as The Austin Group) [7].

As of 2021, the POSIX standard contains the following parts:

Core Services (Incorporates Standard ANSI C) (IEEE std 1003.1-1988) – Process Creation and Control, Signals, File and Directory Operations, Pipes, C library, I/O Port Interface and Control, Process Triggers

Extensions (Symbolic Links)
Real-time and I/O extensions (IEEE Std 1003.1b-1993) – Priority Scheduling, Real-Time Signals, Clocks and Timers, Semaphores, Message Passing, Shared Memory, Asynchronous and Synchronous I/O, Memory Locking Interface
Threads extensions (IEEE Std 1003.1c-1995) – Thread Creation, Control, and Clean-up, Thread Scheduling, Thread Synchronization, Signal Handling
More real-time extensions
Security extensions (Access control lists)

Shell and Utilities (IEEE Std 1003.2-1992) – Command Interpreter, Utility Programs

The standard is regularly reviewed to reflect technical changes and improvements. It can sometimes take several years before a new version is published and the changes are incorporated. This can be disadvantageous, but it is understandable given the scope of the standard.

In recent years, extensions to real-time processing have been added. The current version was released in early 2018 [3]. The authors of SibylFS [5] have also published many annotations to the POSIX standard to determine higher-order logic and interactions.

What Does Being POSIX-Compliant Mean?

The term “POSIX-compliant” means that an operating system meets all the POSIX criteria. An operating system can run UNIX programs natively, or an application can be ported from the UNIX system to another system. Porting an application from UNIX to the target operating system is easy, or at least easier, than if it does not support POSIX. To be on the safe side, an operating system should have successfully achieved the POSIX certification [2]. This step is achieved (at a cost) by passing an automated certification test. The corresponding test suite can be found here [11].

As of 2021, the list of POSIX-certified operating systems contains AIX from IBM, HP-UX from HP, IRIX from SGI, EulerOS [6] from Huawei, Mac OS X from Apple (since 10.5 Leopard), Solaris and QNX Neutrino from Oracle, Inspur’s K-UX [11], and the real-time OS INTEGRITY from Green Hills Software [15]. It is currently unclear whether newer versions of the three Solaris successors, OpenSolaris, Illumos, and OpenIndiana, are classified as fully POSIX-compliant, as well. These operating systems were POSIX-compliant until POSIX 2001.

Other operating systems that are seen as mostly (but not fully) POSIX-compliant include Android, BeOS, FreeBSD, Haiku, Linux (see below), and VMWare ESXi. For Microsoft Windows, Cygwin provides a largely POSIX-compliant development and run-time environment.

Is Linux POSIX-Compliant?

The term “Linux” refers to the entire Linux operating system, regardless of flavor, such as Debian GNU/Linux, RedHat Linux, Linux Mint, Ubuntu Linux, Fedora, and CentOS, for example. To be precise, Linux is just the name of the kernel that is the core component of this free operating system.

As Linus Torvalds described in the book “Just For Fun” [8], to develop the Linux kernel, he requested a copy of the POSIX standard. This helped him to implement the same mechanisms that are used in commercial UNIX systems. Furthermore, this allowed him to link the Linux kernel with the GNU tools that mainly followed the same approach. To be fair, the software on a Linux system is contributed from a variety of sources that respect the POSIX standard, but that also sometimes implement their own concepts. At the same time, however, this also shows the diversity that makes up Linux as an operating system.

One example of this is the way in which command-line arguments are written. Arguments with two dashes (e.g., “–help”) are GNU conventions, whereas POSIX commands never use two-dash arguments but instead only a single (e.g., “-help”). Right from the start, Linux was designed with GNU in mind, and that is why the commands contain GNU-style

arguments. To achieve POSIX compliance, POSIX-style arguments have been added step-by-step. Still, the final decision is made by the developer. As of today, most commands accept both short and long arguments, or even arguments without any dashes, such as the “find” command, for example. To be fair, there is no consistency between the commands on one system, and this can be a problem when you intend to use the same command on a different UNIX-based system, particularly when switching between Linux, OS X, and Solaris.

For now, Linux is not POSIX-certified due to high costs, except for the two commercial Linux distributions Inspur K-UX [12] and Huawei EulerOS [6]. Instead, Linux is seen as being mostly POSIX-compliant.

This assessment is due to the fact that major Linux distributions follow the Linux Standard Base (LSB) instead of POSIX [9]. LSB aims “to minimize the differences between individual Linux distributions” [14]. This refers to the software system structure, including the Filesystem Hierarchy Standard (FHS) used in the Linux kernel. LSB is based on the POSIX specification, the Single UNIX Specification (SUS) [10], and several other open standards, but also extends them in certain areas.

LSB-based Linux distributions include RedHat Linux, Debian GNU/Linux (2002-2015), and Ubuntu (until 2015), to name a few.

Developing with POSIX in mind

To understand POSIX in greater detail, we recommend obtaining a copy of the POSIX standard and reading it in full. You can get the book from the Open Group website. This requires a registration fee but gives you full access to this valuable resource. Standards help since they allow you to develop software in such a way that it behaves in the same way on all UNIX platforms.

Links and References

[1] POSIX, Wikipedia, https://en.wikipedia.org/wiki/POSIX
[2] POSIX Certification, http://get.posixcertified.ieee.org/
[3] POSIX Standard, Open Group, https://publications.opengroup.org/t101
[4] Zak H: POSIX Standard, https://linuxhint.com/posix-standard/
[5] POSIX Annotations, SybilFS, https://github.com/sibylfs/sibylfs_src
[6] EulerOS, https://developer.huaweicloud.com/ict/en/site-euleros/euleros
[7] The Austin Common Standards Revision Group, https://www.opengroup.org/austin/
[8] Torvalds, Linus; Diamond, David (2001). Just for Fun: The Story of an Accidental Revolutionary. New York City, United States: HarperCollins. ISBN 0-06-662072-4
[9] Linux Standard Base (LSB), Wikipedia, https://en.wikipedia.org/wiki/Linux_Standard_Base
[10] Single UNIX Specification (SUS), Wikipedia, https://en.wikipedia.org/wiki/Single_UNIX_Specification
[11] POSIX Test Suites, https://www.opengroup.org/testing/testsuites/vsx4.htm
[12] Inspur K-UX, Wikipedia, https://en.wikipedia.org/wiki/Inspur_K-UX
[14] Linux Standard Base (LSB), https://wiki.linuxfoundation.org/lsb/start
[15] INTEGRITY, https://www.ghs.com/products/rtos/integrity.html

Thank You

The author would like to thank Axel Beckert and Veit Schiele for their help and advice while preparing this article.

Understanding Bash Shell Configuration On Startup

Frank Hofmann — Wed, 30 Dec 2020 16:15:34 +0000

For years, the Bash shell [1] has been an integral part of many Linux distributions. In the beginning, Bash was chosen as the official GNU shell because it was well-known, quite stable, and offered a decent set of features.

Today the situation is somewhat different — Bash is still present everywhere as a software package but has been replaced by alternatives in the standard installation. These include, for example, Debian Almquist shell (Dash) [2] (for Debian GNU/Linux) or Zsh [3] (for GRML [5]). In the well-known distributions Ubuntu, Fedora, Arch Linux, and Linux Mint, Bash has so far remained the standard shell.

It is quite helpful to understand Bash startup and to know how to configure this properly. This includes the customization of your shell environment, for example, setting the $PATH variable, adjusting the look of the shell prompt, and creating aliases. Also, we will have a look at the two files .bashrc and .bash_profile that are read on startup. The corresponding knowledge is tested in Exam 1 of the Linux Professional Institute Certification [4].

Comparing an Interactive Login and Non-interactive batch Shell

In general, a shell has two modes of operation. It can run as an interactive login shell and as a non-interactive batch shell. The mode of operation defines the Bash startup and which configuration files are read [7]. The mode of operation can be differentiated as follows [6] — interactive login shell, interactive non-login-shell, non-interactive login shell, and non-interactive (batch) non-login shell.

To put it simply, an interactive shell reads and writes to a user’s terminal. In contrast, a non-interactive shell is not associated with a terminal, like when executing a batch shell script. An interactive shell can be either login or a non-login shell.

Interactive Login shell

This mode refers to logging into your computer on a local machine using a terminal that ranges from tty1 to tty4 (depends on your installation — there may be more or less terminals). Also, this mode covers remotely logging into a computer, for example, via a Secure Shell (ssh) as follows:

$ ssh user@remote-system

$ ssh user@remote-system remote-command

The first command connects to the remote system and opens an interactive shell only. In contrast, the second command connects to the remote system, executes the given command in a non-interactive login shell, and terminates the ssh connection. The example below shows this in more detail:

$ ssh localhost uptime

user@localhost's password:

11:58:49 up 23 days, 11:41, 6 users, load average: 0,10, 0,14, 0,20

$

In order to find out if you are logged into your computer using a login shell, type the following echo command in your terminal:

$ echo $0

-bash

$

For a login shell, the output starts with a “-” followed by the name of the shell, which results in “-bash” in our case. For a non-login shell, the output is just the name of the shell. The example below shows this for the two commands echo $0, and uptime is given to ssh as a string parameter:

$ ssh localhost "echo $0; uptime"

user@localhost's password:

bash

11:58:49 up 23 days, 11:41, 6 users, load average: 0,10, 0,14, 0,20

$

As an alternative, use the built-in shopt command [8] as follows:

$ shopt login_shell

login_shell off

$

For a non-login shell, the command returns “off”, and for a login shell, “on”.

Regarding the configuration for this type of shell, three files are taken into account. These are /etc/profile, ~/.profile, and ~/.bash_profile. See below for a detailed description of these files.

Interactive Non-login shell

This mode describes opening a new terminal, for example, xterm or Gnome Terminal, and executing a shell in it. In this mode, the two files /etc/bashrc and ~/.bashrc are read. See below for a detailed description of these files.

Non-interactive Non-login shell

This mode is in use when executing a shell script. The shell script runs in its own subshell. It is classified as non-interactive unless it asks for user input. The shell only opens to execute the script and closes it immediately once the script has terminated.

./local-script.sh

Non-interactive Login shell

This mode covers logging into a computer from a remote, for example, via Secure Shell (ssh). The shell script local-script.sh is run locally, first, and its output is used as the input of ssh.

./local-script.sh | ssh user@remote-system

Starting ssh without any further command starts a login shell on the remote system. In case the input device (stdin) of the ssh is not terminal, ssh starts a non-interactive shell and interprets the output of the script as commands to be executed on the remote system. The the example below runs the uptime command on the remote system:

$ echo "uptime" | ssh localhost

Pseudo-terminal will not be allocated because stdin is not a terminal.

frank@localhost's password:

The programs included with the Debian GNU/Linux system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent

permitted by applicable law.

You have new mail.

11:58:49 up 23 days, 11:41, 6 users, load average: 0,10, 0,14, 0,20

$

Interestingly, ssh complains about stdin not being a terminal and shows the message of the day (motd) that is stored in the global configuration file /etc/motd. In order to shorten the terminal output, add the “sh” option as a parameter of the ssh command, as shown below. The result is that a shell is opened first, and the two commands are run without displaying the motd, first.

$ echo "uptime" | ssh localhost sh

frank@localhost's password:

12:03:39 up 23 days, 11:46, 6 users, load average: 0,07, 0,09, 0,16

$$

Next, we will have a look at the different configuration files for Bash.

Bash Startup Files

The different Bash modes define which configuration files are read on startup:

interactive login shell
- /etc/profile: if it exists, it runs the commands listed in the file.
- ~/.bash_profile, ~/.bash_login, and ~/.profile (in that order). It executes the commands from the first readable file found from the list. Each individual user can have their own set of these files.

interactive non-login shell
- /etc/bash.bashrc: global Bash configuration. It executes the commands if that file exists, and it is readable. Only available in Debian GNU/Linux, Ubuntu, and Arch Linux.
- ~/.bashrc: local Bash configuration. It executes the commands if that file exists, and it is readable.

It may be helpful to see this as a graph. During the research, we found the picture below, which we like very much [9].

image: config-path.png
text: Evaluation process for Bash configuration

The different configuration files explained

For the files explained below, there is no general ruleset on which option to store in which file (except for global options vs. local options). Furthermore, the order the configuration files are read is designed with flexibility in mind so that a change of the shell you use ensures that you can still use your Linux system. That’s why several files are in use that configures the same thing.

/etc/profile

This file is used by the Bourne shell (sh) as well as Bourne compatible shells like Bash, Ash, and Ksh. It contains the default entries for the environment variables for all users that login interactively. For example, this influences the $PATH and the prompt design for regular users as well the user named “root”. The example below shows a part of /etc/profile from Debian GNU/Linux.

setuserpath(){

# Common directories to executables for all users

PATH="/usr/local/bin:/usr/bin:/bin"

# Test for root user to add for system administration programs

if [ "`id -u`" -eq 0 ]; then

PATH="/usr/local/sbin:/usr/sbin:/sbin:$PATH"

else

PATH="/usr/local/games:/usr/games:$PATH"

fi

export PATH

}

setuserpath()

# PS1 is the primary command prompt string

if [ "$PS1" ]; then

if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then

# The file bash.bashrc already sets the default PS1.

# PS1='\h:\w\$ '

if [ -f /etc/bash.bashrc ]; then

. /etc/bash.bashrc

fi

else

if [ "`id -u`" -eq 0 ]; then

PS1='# '

else

PS1='$ '

fi

fi

fi

Further configuration files can be saved in the directory /etc/profile.d. They are sourced into the Bash configuration as soon as /etc/profile is read.

~/.bash_profile

This local configuration file is read and executed when Bash is invoked as an interactive login shell. It contains commands that should run only once, such as customizing the $PATH environment variable.

It is quite common to fill ~/.bash_profile just with lines like below that source the .bashrc file. This means each time you log in to the terminal, the contents of your local Bash configuration is read.

if [ -f ~/.bashrc ]; then

. ~/.bashrc

fi

If the file ~/.bash_profile exists, then Bash will skip reading from ~/.bash_login (or ~/.profile).

~/.bash_login

The two files ~/.bash_profile and ~/.bash_login are analogous.

~/.profile

Most Linux distributions are using this file instead of ~/.bash_profile. It is used to locate the local file .bashrc and to extend the $PATH variable.

# if running bash

if [ -n "$BASH_VERSION" ]; then

# include .bashrc if it exists

if [ -f "$HOME/.bashrc" ]; then

. "$HOME/.bashrc"

fi

fi

# set PATH so it includes user's private bin if it exists

if [ -d "$HOME/bin" ] ; then

PATH="$HOME/bin:$PATH"

fi

In general, ~/.profile is read by all shells. If either ~/.bash_profile or ~/.bash_login exists, Bash will not read this file.

/etc/bash.bashrc and ~/.bashrc

This file contains the Bash configuration and handles local aliases, history limits stored in .bash_history (see below), and Bash completion.

# don't put duplicate lines or lines starting with space in history.

# See bash(1) for more options

HISTCONTROL=ignoreboth

# append to the history file, don't overwrite it

shopt -s histappend

# for setting history length, see HISTSIZE and HISTFILESIZE in bash(1)

HISTSIZE=1000

HISTFILESIZE=2000

What to configure in which file

As you have learned so far, there is not a single file but a group of files to configure Bash. These files just exist for historical reasons — especially the way the different shells evolved and borrowed useful features from each other. Also, there are no strict rules known that

define which file is meant to keep a certain piece of the setup. These are the recommendations we have for you (based on TLDP [10]):

All settings that you want to apply to all your users’ environments should be in /etc/profile.
All global aliases and functions should be stored in /etc/bashrc.
The file ~/.bash_profile is the preferred configuration file for configuring user environments individually. In this file, users can add extra configuration options or change default settings.
All local aliases and functions should be stored in ~/.bashrc.

Also, keep in mind that Linux is designed to be very flexible: if any of the startup files named above is not present on your system, you can create it.

Links and References

[1] GNU Bash, https://www.gnu.org/software/bash/
[2] Debian Almquist shell (Dash), http://gondor.apana.org.au/~herbert/dash/
[3] Zsh, https://www.zsh.org/
[4] Linux Professional Institute Certification (LPIC), Level 1, https://www.lpice.eu/en/our-certifications/lpic-1
[5] GRML, https://grml.org/
[6] Differentiate Interactive login and non-interactive non-login shell, AskUbuntu, https://askubuntu.com/questions/879364/differentiate-interactive-login-and-non-interactive-non-login-shell
[7] Bash Startup Files, https://www.gnu.org/software/bash/manual/html_node/Bash-Startup-Files.html#Bash-Startup-Files
[8] The Shopt Builtin, https://www.gnu.org/software/bash/manual/html_node/The-Shopt-Builtin.html
[9] Unix Introduction — Bash Startup Files Loading Order, https://medium.com/@youngstone89/unix-introduction-bash-startup-files-loading-order-562543ac12e9
[10] The Linux Documentation Project (TLDP), https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_01.html

Thank you

The author would like to thank Gerold Rupprecht for his advice while writing this article.

Compared: Raspberry Pi OS vs. Armbian vs. Debian GNU/Linux

Frank Hofmann — Sun, 27 Dec 2020 00:53:46 +0000

Many programmers may have the same question: Is Armbian just another flavor of Debian GNU/Linux, or is it something completely different? What are the differences between Raspberry Pi OS, Armbian, and Debian? In this article, we will discuss the Armbian, Debian, and Raspberry Pi operating systems in detail, including a comparison between these different systems.

Fruity Awakening

In 2012, Raspberry Pi popularized the single-board computers (SBC) class for the general public. Back then, anyone with knowledge about devices like the RouterBOARD from Mikrotik [9] or the ALIX Board from PC Engines [11] was seen as exotic. Today, it is impossible to imagine everyday existence without these powerful mini-computers. You can find these devices everywhere — in wifi routers, weather stations, home automation devices, and fine dust measuring instruments. These devices are run with specially adapted Linux or BSD distributions, of which Armbian and RaspberryPi OS are only two representatives of many.

‘Armbian’ is an artificial word that combines the words ‘ARM,’ for the corresponding RISC processor architecture [3], and the last two syllables, ‘bian,’ from ‘Debian.’ This makes it very clear what sets Armbian apart from Debian GNU/Linux; unlike Debian, Armbian is focused and optimized for the ARM architecture.

Moreover, while the Debian GNU/Linux distribution supports a variety of hardware architectures, including ARM7 (32 bit) [4] and ARM8, the Armbian distribution focuses only on a wide range of ARM-based development boards. From the project website, you can download distribution images for the Orange Pi [5], the Cubieboard [6],

and the Asus Tinkerboard [7], among other images. Cubian [12], a fork of Debian GNU/Linux for the Cubieboard, seems no longer to be maintained, as the last release dates back to 2014.

Raspberry Pi OS [8] is the official operating system of the Raspberry Pi Foundation [17] for their SBCs. Initially, it was named Raspbian, for the Raspbian project [15] on which it is based. The Raspberry Pi Foundation later added another package repository with partially closed source software to their images. The Raspbian project never published its own images, but instead always referred to the images of the Raspberry Pi Foundation. The foundation eventually added their own desktop flavor and many more customizations, reaching far beyond Raspbian’s rebuilding and minimal patching of Debian packages. To clearly distinguish between the Raspbian project and the Raspberry Pi Foundation derivative, the latter was renamed to Raspberry Pi OS in 2019.

Compared to Armbian, the Raspbian project and Raspberry Pi OS follow an opposite approach: these distributions rely on dozens of contributors to focus on a single SBC platform. Based on the 32-bit ‘armhf’ version of Debian GNU/Linux, it is meant to run on all versions of the Raspberry Pi board but is not designed to work on any other ARM SBCs. The Raspberry Pi 3 and 4 hardware can run 64-bit operating systems. Meanwhile, the Raspberry Pi OS always runs 32-bit, with the exception of the Linux kernel, which can be a 64-bit kernel. Some packages made specifically for the Raspberry Pi OS are also available for the Intel architecture (32- and 64-bit variants) and can even run on a normal desktop PC running Debian GNU/Linux.

For a limited time only, there are also (unofficial) Debian GNU/Linux images offered for the Raspberry Pi family of SBCs [16]. The main difference to the Raspberry Pi OS is that the images for those Raspberry Pi systems, capable of running a 64-bit OS (Raspberry Pi 3 and 4), also contain a 64-bit OS (‘arm64’ in Debian); while the other images run the 32-bit ‘armhf’ (Raspberry Pi 2) or ‘armel’ (Raspberry Pi 1 and Zero) architectures. The latter two differ from the ‘armhf’ packages provided by Raspbian and Raspberry Pi OS. Historically, several distributions, including Debian GNU/Linux and Fedora, decided on a minimum set of CPU instructions [19] needed for the ‘armhf’ architecture. The first Raspberry Pi OS was published shortly afterward and supported all but one of the required CPU instructions.

So, there were two options: either 1) use the much slower but not optimized ‘armel’ architecture, as Debian GNU/Linux still does for Raspberry Pi 1 and 0, or 2) redefine the ‘armhf’ architecture. Debian GNU/Linux did not want to do the second option, as this option would deviate from what had already been decided and implemented. This was the moment when the Raspbian project was born: the Debian Developer Peter Green (also known by the tag plugwash in IRC) recompiled all ‘armhf’ Debian packages for Raspberry Pi 1 CPUs (back then, only Raspberry Pi 1 existed) with the single CPU instruction missing. This is also the reason why you cannot mix Debian’s ‘armhf’ and Raspbian’s ‘armhf’ releases.

Image Size

The installation images offered by the three projects are quite different. Armbian requires you to select a category (such as General, IOT, NAS, Networking, or Desktop) and the SBC, first. Next, you will choose the corresponding image offered with either the 4.9 or 5.9 Linux kernel for oldstable (previous release), stable (current release), and testing (upcoming release). The image size is between 270 and 600 M. Each image file can be retrieved as a direct download or via BitTorrent from the project website. Updating an existing Armbian installation is done using the same instructions as those used for maintaining Debian GNU/Linux.

In contrast, the options for Raspberry Pi OS are a bit more limited. Raspberry Pi requires you to choose between OS Lite, OS with desktop, and OS with desktop and recommended software. All images are equipped with the 32-bit version of a 5.4 Linux kernel. The image size varies from 440 M to 3 G. Downloading the image can be done directly, as a torrent data stream, or via the Raspberry Pi Imager, a GUI-based setup tool available for Windows, macOS, and Ubuntu. As with Armbian, updating an existing version of Raspberry Pi is done using the same instructions as those used for maintaining Debian GNU/Linux.

Finally, for most devices, including most ARM devices, Debian GNU/Linux offers a variety of ready-made installer images, including a basic setup, a tiny image for network-based installation, different desktop variants that fit on one CD or DVD, live CDs, and even a set of full CD/DVD images. Though these images are not ready-to-run images, they contain the Debian Installer, a minimal OS that is solely for performing the OS installation. The live images run directly from a read-only installation also contain the Debian Installer.

The image size is between 250 M and 3 G. Downloading an image is possible as a direct download or via BitTorrent. The regular Debian packaging commands are used to update an existing installation.

This is not so for the Raspberry Pi operating system. In fact, there are no official Debian GNU/Linux images for Rasberry Pi. There are, however, unofficial ready-to-run images (no installer images) with Debian GNU/Linux for Raspberry Pi, made by the same developers behind the official (but “non-free”) Raspberry Pi firmware packages in Debian GNU/Linux [16].

First, you will decide between daily built images based on the most current packages in Debian GNU/Linux 10 Buster (the current stable release at the time of writing this article) or “tested” images that are guaranteed to run. In comparison to the Raspberry Pi OS, which offers images that work on all Raspberry Pi boards, with this distribution, you have to choose which Raspberry Pi board will contain the image. The images for the Raspberry Pi 1 and Raspberry Pi 0 (not 0W) operating systems are roughly the same, as they use more or less the same CPU and have no Wi-Fi components. Depending on that, you also get different OS architectures; namely, ‘armel’ for Raspberry Pi 1, 0, and 0W; the original ‘armhf’ for Raspberry Pi 2; and ‘arm64’ for Raspberry Pi 3 and 4.

Supported Devices

Regarding supported platforms and devices, the three projects go in slightly different directions. For Armbian, the device information for every supported SBC can be found at the Armbian website. This is accompanied by a list of tested third-party hardware to ensure that all hardware components work well together. Overall, Armbian supports several different ARM SBCs, but it does not support the Raspberry Pi family of SBCs.

For Raspberry Pi OS, device information for every Raspberry Pi version is available online, at the Raspberry Pi website. And, of course, Raspberry Pi OS provides support for all Raspberry Pi devices.

For Debian GNU/Linux, the information is organized in a wiki, sorted by OS architecture, with specialized sections for more specific information. Debian currently supports nine OS architectures officially (of which three are for ARM devices). Debian also builds its packages and installer images for 13 further OS architectures that are not officially supported, running under the label ‘Debian Ports’ [21].

Development

Furthermore, the methods by which each of the three Linux distributions are developed differ significantly. Armbian and Debian GNU/Linux are community-based projects. For Armbian, the corresponding GitHub project page is key. Debian GNU/Linux uses its own distributed infrastructure that allows for development of the Linux distribution from all over the world.

Meanwhile, Raspberry Pi OS is maintained by the non-profit Raspberry Pi Foundation as an in-house project. Contributions to the Raspberry Pi Foundation can be made via the Raspberry Pi Forum [20]. The Raspbian project is largely a recompilation of the Debian packages created for Raspberry Pi and does not seem to have a big community of its own. The outdated Raspbian website [16] often refers users to either of the Debian GNU/Linux or Raspberry Pi Foundation websites.

Licensing

Armbian is licensed under GPL2, whereas both Raspberry Pi OS and Debian GNU/Linux use a mix of licenses, including GPL and others. The Raspberry Pi OS image “with recommended software” contains several “free-to-use” commercial software packages, most of which are limited demo versions. The plan is of these free package offerings is to hook users so that they buy that software for their other computers.

Also, some firmware blobs needed for Raspberry Pi and other ARM SBCs are only available as “binary only,” i.e., without source code. In the software world, these software packages are considered “non-free.” The previously mentioned unofficial Debian images for Raspberry Pi contain Debian’s “non-free” repository, enabled by default because it includes the ‘raspi-firmware’ software package.

Software Packages and Setup

Armbian describes itself as a “Lightweight Debian or Ubuntu based Linux distribution specialized for ARM development boards.” It comes as a ready-to-run image optimized for memory flash devices, such as NAND, SATA, eMMC, and USB. Both SSH and DHCP services are activated right from the start. A wireless adapter supports DHCP (if present), but this feature needs to be enabled by the user. This allows for an easy setup to connect this system to your router or create an individual access point. XFCE is used as the Desktop Environment [18].

To increase execution speed for code and data and minimize I/O operations, several functionalities have been transferred to work as much as possible from memory. For example, the log2ram service keeps log files in the memory and saves them to the disk daily and on shutdown [13]. Disk caches are kept in the memory for ten minutes using the option “commit=600” in the directory configuration in the file /etc/fstab [14].

As previously noted, the Raspberry Pi OS targets the different Raspberry Pi models, which started out with quite limited hardware components. To deal with these limitations as a Desktop Environment, the default setup starts a modified LXDE Desktop named PIXEL (Pi Improved X-windows Environment Lightweight), which is also available from the Raspberry Pi Foundation for Intel-based Linux PCs.

By default, a user named “pi” with the password “raspberry” exists, and the SSH service is disabled for this user. You can enable it for a single boot up by editing the file config.txt on the first partition. It is strongly advised to change the password immediately after the first login. Only then can you can enable the SSH service permanently to avoid well-known default passwords accessible via SSH.

Debian’s unofficial Raspberry Pi images also come wired with the network enabled by default via DHCP, but the Wi-Fi does not come pre-configured, as of this writing. Another difference with Raspberry Pi OS images is that there is no normal user, just a root user with no password and the SSH root login disabled. Setting the root password or an SSH public key for root login in advance is supported by editing “sysconf.txt” on the first partition. These settings are wiped after they have been applied to the booted system to avoid leakage of the plain text password.

Currently, the option to configure access to a Wi-Fi network is in the planning stages. Future versions of Raspberry Pi OS images will come equipped with this feature.

Conclusion

The programming community has been using Debian GNU/Linux and Armbian in production-like environments without fail for many years; for example, a CubieTruck as a mobile collaboration platform (“mobile cloud”). Devices with Raspberry Pi OS have been used in experimental stages, and we were very happy about them, too. It is a great pleasure to have access to such small, reliable, affordable, and powerful machines. We wish to have more time to explore them in even more detail.

Links and References

[1] The Debian GNU/Linux project, https://www.debian.org/
[2] The Armbian project, https://www.armbian.com/
[3] ARM, Wikipedia, https://en.wikipedia.org/wiki/ARM_architecture
[4] ARM7, Wikipedia, https://en.wikipedia.org/wiki/ARM7
[5] Orange Pi, http://www.orangepi.org/
[6] Cubieboard, http://cubieboard.org/
[7] Tinkerboard, https://www.asus.com/us/Single-Board-Computer/Tinker-Board/
[8] Raspberry Pi OS, https://www.raspberrypi.org/software/operating-systems/
[9] Mikrotik, https://mikrotik.com/
[10] Frank Hofmann: Zwergenaufstand. Das Cubietruck im Alltagstest, RaspberryPi Geek 04/2016, https://www.raspberry-pi-geek.de/ausgaben/rpg/2016/04/das-cubietruck-im-alltagstest/
[11] PC Engines, https://www.pcengines.ch/
[12] Cubian, http://cubian.org/
[13] Log2Ram, https://github.com/azlux/log2ram
[14] Advantages/disadvantages of increasing “commit” in fstab, https://unix.stackexchange.com/questions/155784/advantages-disadvantages-of-increasing-commit-in-fstab
[15] Raspbian Project, https://www.raspbian.org/
[16] Unofficial Debian images for the Raspberry Pi SBC family, https://raspi.debian.net/
[17] RaspberryPi Foundation, https://www.raspberrypi.org/about/
[18] XFCE, https://xfce.org/
[19] “armhf” on Wikipedia, https://en.wikipedia.org/wiki/ARM_architecture#VFP
[20] RaspberryPi Forum, https://www.raspberrypi.org/forums/
[21] Debian Ports, https://www.ports.debian.org/

About the authors

Frank Hofmann works on the road – preferably from Berlin (Germany), Geneva (Switzerland), and Cape Town (South Africa) – as a developer, trainer, and author for magazines like Linux-User and Linux Magazine.

Axel Beckert works as a Linux system administrator and specialist for network security with the central IT services of ETH Zurich. He is also a volunteer with the Debian GNU/Linux distribution, Linux User Group Switzerland (LUGS), Hackerfunk radio show and podcast, and various open-source projects.

Hofmann and Beckert have also authored a Debian package management book

(http://www.dpmb.org).

Understanding NUMA Architecture

Frank Hofmann — Thu, 22 Oct 2020 23:28:40 +0000

Designing computers is always a compromise. The four basic components of a computer – the central processing unit (CPU) or processor, the memory, the storage, and the board for connecting the components (I/O bus system) — are combined as cleverly as possible to create a machine that is both cost-effective and powerful. The design process mostly involves an optimization towards processors (co-processors, multi-core setup), memory type and amount, storage (disks, file system), as well as price.The idea behind co-processors and multi-core architecture is to distribute operations to as many single computing units in the smallest space possible and to make parallel execution of computing instructions more available and affordable. In terms of memory, it is a question of the amount or size that can be addressed by the individual computing unit, and which memory type works with the lowest latency possible. Storage belongs to the external memory, and its performance depends on the disk type, the file system that is in use, threading, transfer protocol, communication fabric, and the number of attached memory devices.

The design of I/O buses represents the computer arteries and significantly determines how much and how quickly data can be exchanged between the single components listed above. The top category is led by components used in the field of High Performance Computing (HPC). As of mid-2020, among the contemporary representatives of HPC are Nvidia Tesla and DGX, Radeon Instinct, and Intel Xeon Phi GPU-based accelerator products (see [1,2] for product comparisons).

Understanding NUMA

Non-Uniform Memory Access (NUMA) describes a shared memory architecture used in contemporary multiprocessing systems. NUMA is a computing system composed of several single nodes in such a way that the aggregate memory is shared between all nodes: “each CPU is assigned its own local memory and can access memory from other CPUs in the system” [12,7].

NUMA is a clever system used for connecting multiple central processing units (CPU) to any amount of computer memory available on the computer. The single NUMA nodes are connected over a scalable network (I/O bus) such that a CPU can systematically access memory associated with other NUMA nodes.

Local memory is the memory that the CPU is using in a particular NUMA node. Foreign or remote memory is the memory that a CPU is taking from another NUMA node. The term NUMA ratio describes the ratio of the cost of accessing foreign memory to the cost of accessing local memory. The greater the ratio, the greater the cost, and thus the longer it takes to access the memory.

However, it takes longer than when that CPU is accessing its own local memory. Local memory access is a major advantage, as it combines low latency with high bandwidth. In contrast, accessing memory belonging to any other CPU has higher latency and lower bandwidth performance.

Looking Back: Evolution of Shared-Memory Multiprocessors

Frank Dennemann [8] states that modern system architectures do not allow truly Uniform Memory Access (UMA), even though these systems are specifically designed for that purpose. Simply speaking, the idea of parallel computing was to have a group of processors that cooperate to compute a given task, thereby speeding up an otherwise classical sequential computation.

As explained by Frank Dennemann [8], in the early 1970s, “the need for systems that could service multiple concurrent user operations and excessive data generation became mainstream” with the introduction of relational database systems. “Despite the impressive rate of uniprocessor performance, multiprocessor systems were better equipped to handle this workload. To provide a cost-effective system, shared memory address space became the focus of research. Early on, systems using a crossbar switch were advocated, however with this design complexity scaled along with the increase of processors, which made the bus-based system more attractive. Processors in a bus system [can] access the entire memory space by sending requests on the bus, a very cost-effective way to use the available memory as optimally as possible.”

However, bus-based computer systems come with a bottleneck — the limited amount of bandwidth that leads to scalability problems. The more CPUs that are added to the system, the less bandwidth per node available. Furthermore, the more CPUs that are added, the longer the bus, and the higher the latency as a result.

Most CPUs were constructed in a two-dimensional plane. CPUs also had to have integrated memory controllers added. The simple solution of having four memory buses (top, bottom, left, right) to each CPU core allowed full available bandwidth, but that goes only so far. CPUs stagnated with four cores for a considerable time. Adding traces above and below allowed direct buses across to the diagonally opposed CPUs as chips became 3D. Placing a four-cored CPU on a card, which then connected to a bus, was the next logical step.

Today, each processor contains many cores with a shared on-chip cache and an off-chip memory and has variable memory access costs across different parts of the memory within a server.

Improving the efficiency of data access is one of the main goals of contemporary CPU design. Each CPU core was endowed with a small level one cache (32 KB) and a larger (256 KB) level 2 cache. The various cores would later share a level 3 cache of several MB, the size of which has grown considerably over time.

To avoid cache misses — requesting data that is not in the cache — a lot of research time is spent on finding the right number of CPU caches, caching structures, and corresponding algorithms. See [8] for a more detailed explanation of the protocol for caching snoop [4] and cache coherency [3,5], as well as the design ideas behind NUMA.

Software Support for NUMA

There are two software optimization measures that may improve the performance of a system supporting NUMA architecture — processor affinity and data placement. As explained in [19], “processor affinity […] enables the binding and unbinding of a process or a thread to a single CPU, or a range of CPUs so that the process or thread will execute only on the designated CPU or CPUs rather than any CPU.” The term “data placement” refers to software modifications in which code and data are kept as close as possible in memory.

The different UNIX and UNIX-related operating systems support NUMA in the following ways (the list below is taken from [14]):

Silicon Graphics IRIX support for ccNUMA architecture over 1240 CPU with Origin server series.
Microsoft Windows 7 and Windows Server 2008 R2 added support for NUMA architecture over 64 logical cores.
Version 2.5 of the Linux kernel already contained basic NUMA support, which was further improved in subsequent kernel releases. Version 3.8 of the Linux kernel brought a new NUMA foundation that allowed for the development of more efficient NUMA policies in later kernel releases [13]. Version 3.13 of the Linux kernel brought numerous policies that aim at putting a process near its memory, together with the handling of cases, such as having memory pages shared between processes, or the use of transparent huge pages; new system control settings allow NUMA balancing to be enabled or disabled, as well as the configuration of various NUMA memory balancing parameters [15].
Both Oracle and OpenSolaris model NUMA architecture with the introduction of logical groups.
FreeBSD added Initial NUMA affinity and policy configuration in version 11.0.

In the book “Computer Science and Technology, Proceedings of the International Conference (CST2016)” Ning Cai suggests that the study of NUMA architecture was mainly focused on the high-end computing environment and proposed NUMA-aware Radix Partitioning (NaRP), which optimizes the performance of shared caches in NUMA nodes to accelerate business intelligence applications. As such, NUMA represents a middle ground between shared memory (SMP) systems with a few processors [6].

NUMA and Linux

As stated above, the Linux kernel has supported NUMA since version 2.5. Both Debian GNU/Linux and Ubuntu offer NUMA support for process optimization with the two software packages numactl [16] and numad [17]. With the help of the numactl command, you can list the inventory of available NUMA nodes in your system [18]:

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 8157 MB
node 0 free: 88 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 8191 MB
node 1 free: 5176 MB
node distances:
node 0 1
0: 10 20
1: 20 10

NumaTop is a useful tool developed by Intel for monitoring runtime memory locality and analyzing processes in NUMA systems [10,11]. The tool can identify potential NUMA-related performance bottlenecks and hence help to re-balance memory/CPU allocations to maximise the potential of a NUMA system. See [9] for a more detailed description.

Usage Scenarios

Computers that support NUMA technology allow all CPUs to access the entire memory directly — the CPUs see this as a single, linear address space. This leads to more efficient use of the 64-bit addressing scheme, resulting in faster movement of data, less replication of data, and easier programming.

NUMA systems are quite attractive for server-side applications, such as data mining and decision support systems. Furthermore, writing applications for gaming and high-performance software becomes much easier with this architecture.

Conclusion

In conclusion, NUMA architecture addresses scalability, which is one of its main benefits. In a NUMA CPU, one node will have a higher bandwidth or lower latency to access the memory on that same node (e.g., the local CPU requests memory access at the same time as the remote access; the priority is on the local CPU). This will dramatically improve memory throughput if the data are localized to specific processes (and thus processors). The disadvantages are the higher costs of moving data from one processor to another. As long as this case does not happen too often, a NUMA system will outperform systems with a more traditional architecture.

Links and References

Compare NVIDIA Tesla vs. Radeon Instinct, https://www.itcentralstation.com/products/comparisons/nvidia-tesla_vs_radeon-instinct
Compare NVIDIA DGX-1 vs. Radeon Instinct, https://www.itcentralstation.com/products/comparisons/nvidia-dgx-1_vs_radeon-instinct
Cache coherence, Wikipedia, https://en.wikipedia.org/wiki/Cache_coherence
Bus snooping, Wikipedia, https://en.wikipedia.org/wiki/Bus_snooping
Cache coherence protocols in multiprocessor systems, Geeks for geeks, https://www.geeksforgeeks.org/cache-coherence-protocols-in-multiprocessor-system/
Computer science and technology – Proceedings of the International Conference (CST2016), Ning Cai (Ed.), World Scientific Publishing Co Pte Ltd, ISBN: 9789813146419
Daniel P. Bovet and Marco Cesati: Understanding NUMA architecture in Understanding the Linux Kernel, 3rd edition, O’Reilly, https://www.oreilly.com/library/view/understanding-the-linux/0596005652/
Frank Dennemann: NUMA Deep Dive Part 1: From UMA to NUMA, https://frankdenneman.nl/2016/07/07/numa-deep-dive-part-1-uma-numa/
Colin Ian King: NumaTop: A NUMA system monitoring tool, http://smackerelofopinion.blogspot.com/2015/09/numatop-numa-system-monitoring-tool.html
Numatop, https://github.com/intel/numatop
Package numatop for Debian GNU/Linux, https://packages.debian.org/buster/numatop
Jonathan Kehayias: Understanding Non-Uniform Memory Access/Architectures (NUMA), https://www.sqlskills.com/blogs/jonathan/understanding-non-uniform-memory-accessarchitectures-numa/
Linux Kernel News for Kernel 3.8, https://kernelnewbies.org/Linux_3.8
Non-uniform memory access (NUMA), Wikipedia, https://en.wikipedia.org/wiki/Non-uniform_memory_access
Linux Memory Management Documentation, NUMA, https://www.kernel.org/doc/html/latest/vm/numa.html
Package numactl for Debian GNU/Linux, https://packages.debian.org/sid/admin/numactl
Package numad for Debian GNU/Linux, https://packages.debian.org/buster/numad
How to find if NUMA configuration is enabled or disabled?, https://www.thegeekdiary.com/centos-rhel-how-to-find-if-numa-configuration-is-enabled-or-disabled/
Processor affinity, Wikipedia, https://en.wikipedia.org/wiki/Processor_affinity

Thank You

The authors would like to thank Gerold Rupprecht for his support while preparing this article.

About the Authors

Plaxedes Nehanda is a multiskilled, self-driven versatile person who wears many hats, among them, an events planner, a virtual assistant, a transcriber, as well as an avid researcher, based in Johannesburg, South Africa.

Prince K. Nehanda is an Instrumentation and Control (Metrology) Engineer at Paeflow Metering in Harare, Zimbabwe.

10 Reasons to Use Open Source

Frank Hofmann — Mon, 20 Jul 2020 13:28:36 +0000

For more than 50 years, the production and use of software and hardware have been almost entirely commercial. This is in stark contrast to the principles of the Free Open Source Software (FOSS) model. FOSS is based on communities and does not require the exchange of material goods to participate in the development process or to share the results.

Rather, the interaction of individual actors is based on a shared philosophy in which common goods are created (abbreviated as “commons”) for the benefit of all. Behaviour is controlled by social norms, rather than legal regulations. The motivation in participating is less profit, but greater meaningful contributions to society for the benefit of all.

Contribution in Open Source/FOSS projects is based on several factors, for example:

Interest-based
What would I like to contribute to? What do I want to use?

Non-binding
Not a must. What do I like to do? What do I feel like doing?

According to ability
What am I particularly good at? What do I want to learn as I try new things?

The results are very interesting, diverse projects that arise from the personal will of developers and are cultivated by these individuals or by their collaborators. Passion and enthusiasm are reflected in these projects, without any material incentive necessary.

License Models

Without the appropriate license models, the realization and maintenance of FOSS projects would be much more difficult. A license model is a usage agreement chosen by the developer for the project that gives all of us a reliable, stable framework to work with. License models set clear guidelines and specify what you can do with the open-source code. The general goal is to keep the software or artwork available for everyone. License models are much less restrictive than other commercial license agreements.

For software, licenses like the GNU Public License (GPL) or BSD License are in use. Information goods, drawings, and audio and video data are commonly licensed under Creative Commons [1]. All license models are legally verified. The use of license models has continually risen during the last decade and is widely accepted nowadays.

10 Reasons for Open Source

The central questions around open source software include, “Why is open source software a good thing for you?” “What are the advantages of using an open source license for software or Creative Commons for artwork?” and “How can using open source software put you ahead of your competitors as a company?” Below, you will find our list of the top ten reasons to use open source coding.

1. Availability of Source Code
You can see the source code of software entirely, download it, get inspired, and use the basic structure for your own projects. Open Source is highly configurable and allows you as a developer to create your own custom variants for meeting your specific needs and requirements.

2. Availability of Software
Everyone can download and use open source software. There are no limitations regarding the user group or intended audience, purpose, frequency of use, and devices on which open source software can be installed. There are no license fees to pay, either.

3. Lower Total Cost of Ownership (TCO)
With open source code, there are no license or usage fees. As a commercial service, costs apply only to implementation, setup, configuration, maintenance, documentation, and support services.

4. Brings the World Closer

Through open source communities, you can easily contact other developers from other countries, ask them questions, and learn from them, as well as the code or artwork they have written and published. This encourages global teamwork and collaboration which improves and diversifies the applications of shared technology. You will find that open source communities are created and thrive because everyone has a common goal to support and improve the code more quickly, more innovatively, and more effectively, such that the community and beyond can reap the benefits.

5. FOSS Offers Diversity

The use of open source standards does not limit the available software pool to a single software, but widens it. Using open source, you can choose from among a variety of different implementations and software solutions according to your own unique needs.

6. Educational Possibilities

Open source is vital to the educational advancement of all because both information and resources are now freely available. You can learn from other developers how they are creating code and using the software that they have shared through open source.

7. Creates Opportunities & Community

As open source software brings new ideas and contributions, the developer community becomes an increasingly vibrant community that can share ideas freely. Through the community, you can meet people with similar interests. It is said that many hands make light work; similarly, it is much easier to deliver outstanding outcomes if the code is developed by an “army” of talented individuals working as a team to troubleshoot and deliver in record time.

8. FOSS Encourages Innovation

FOSS fosters a culture of sharing and experimentation. You are encouraged to be innovative by coming up with new ideas, products, and methods. Be inspired by what you learn from others. Solutions and options can also be marketed much more quickly, and open source allows developers to try, test, and experiment with the best available solutions.

9. Trust
By testing your software through open source, customers and users can see what your product is doing what are its limitations. Customers can take a look at how the software works, validate it, and customize it if necessary. This creates trust in what the product or software is doing. Nobody likes solutions or software products that are mysterious and difficult to understand.

10. Reliability and Security

The more people that are working together on the code, the higher is the reliability of that code. A code based on collaboration will be superior because it is easier to pick up any bugs and select the best fix. Security is also improved, as the code is thoroughly assessed and evaluated by the community of developers that have access to it. It is common to have tester groups who check new releases. Any issues that may arise are fixed diligently by the community.

Examples of Successful Usage of Open Source (Use Cases)

FOSS has not been a niche market for long. The most prominent examples are Linux-based computer systems that are in use everywhere — from web servers, to TVs, to network appliances like wireless access points. This immensely reduces licensing costs and increases the stability of the core infrastructure on which many fields, companies, and industries depend. Companies like Facebook and Google use FOSS to run their services — this includes the website, the Android phone, as well as the search engine, and the Chrome web browser.

The list remains incomplete without mentioning the Open Source Car (OSCar) [4,5], OpenStreetMap [6], Wikimedia [7] as well as LibriVox [8], a service that provides free audiobooks read by volunteers from all over the world. Below, you will find a selection of case studies that we think might inspire you to use FOSS-based solutions.

Case Studies

1. Makoko, Nigeria

The shantytown slum community of Makoko in Lagos, Nigeria houses nearly 95,000 people. A complete map of this town is now available on Google maps due to the availability of Open Source coding in Africa, courtesy of the Code for Africa Initiative together with the World Bank [9]. Originally, Makoko did not appear on any maps or city planning documents [23]. At one point, it was only 3 dots on the map, regardless of the fact that it is one of the largest slums in Africa with a complex system of waterways and houses.

Through data collection, this initiative created jobs for women from the community, who were taught to use drones to collect the data needed to create a map of the community. The collected data, which included highly detailed pictures and information about the waterways, streets, and buildings, were analyzed by data analysts before being uploaded online using OpenStreetMap.

This initiative is improving the lives and the view of this society with the aim to improve Makoko’s information infrastructure. If this initiative had not been performed using closed source software, the costs and funds required to do this would have been prohibitive due to the additional cost of items such as data, funds to pay the staff, buying the hardware, transport, logistics costs, licensing, and permits.

2. Computing Cluster at Mésocentre de Calcul, Université de Franche-Comté, France

The Université de Franche-Comté, located in Besancon, France, runs a computing center for scientific computing [10]. The primary areas of research include nanomedicine, chemical-physical processes and materials, and genetic simulations. CentOS and Ubuntu Linux are used to provide a high-performance, parallel computing infrastructure.

3. GirlHype Coders (Women Who Code), Cape Town, South Africa

Baratang Miya [11] — a self-taught coder — started GirlHype Coders [12,24] in 2003 as an initiative to empower young girls in Africa. This is a software engineering school that is focused on training young women and girls on how to program and develop apps to improve their digital literacy and economic mobility. Baratang Miya aims to increase the percentage of women in the science, engineering, and technology industries. Clubs are operated so that girls can attend free after-school classes to explore and learn coding.

GirlHype is helping to improve not just the lives of the girls and women that are in this initiative, but also their communities, through a global tech entrepreneurship competition called Technovation, of which GirlHype is the regional ambassador. In this program, girls find a problem in their communities, design a solution for it, and using Open Source coding, build an app for that solution. Other women who are qualified coders have the opportunity to mentor and lead younger women in the industry. GirlHype also teaches women in business how to use the web to market their businesses online. This initiative has helped girls to get jobs in an industry they would otherwise not have been able to work in.

Twitter VP of Engineering visit to GirlHype in Khayelitsha, Cape Town, South Africa [25]

4. Cartoons and Open Source

Open Source is becoming the norm for software development for the sake of collaboration and contribution. Companies are increasingly moving towards using Open Source technologies for their programming needs. In the world of cartoons and animation, this is because this approach allows the industry to attract outside talent in independent developers and artists, as well as creating an industry standard where diverse individuals collaborate on and adopt the same technology.

Among those in the industry that have embraced this technology idea include Pixar Animation Studios [13], which has open sourced their Universal Scene Description (USD) technology [14]. USD helps filmmakers with reading, writing, and previewing 3D scene data, allowing many different artists to work on the same project. Pixar has also released the software RenderMan [15], a photorealistic 3D rendering software free for non-commercial purposes such as educational purposes and personal projects.

From Free Software to a Free Society

Ten years ago, Thomas Winde and Frank Hofmann asked the question, “What would happen if FOSS principles were transferred to society and thus changed the model of society?” [3] The implementation of this step is often doubted and classified as utopia. We wanted to know more about it. The result of our investigation was a curious look at our society (from a predominantly European view) that observed the evolution of processes that consciously or unconsciously followed FOSS principles. We found a long list of surprising examples, ranging from free wireless networks like Freifunk [16] to open libraries, free hardware projects (RaspberryPi, Arduino, BeagleBoard), non-profit office communities, the Global Village Construction Set (GVCS) [17], and the sharing of recipes such as FreeBeer [18] and OpenCola [19].

Our conclusion was that a more general, systemic adoption of FOSS principles promises to make a significant positive difference to our global society. A transition from wage labor to voluntary, community-based work could help to achieve, step by step, a free society, in which the needs of all can be recognized and met. On the African continent, this idea of community is very strong (“Ubuntu” [20]), while in Europe and North America, it has been lost over the centuries in favor of a profit-oriented approach.

Conclusion

People for whom the FOSS philosophy is new, and who grew up with a capitalistic, profit-based model of society, may come up with a number of reasonable questions in regard to open source content. Here, we will answer some of the most common questions:

Can someone steal my “invention”?
Through open source, we simply share our ideas, and we benefit from each other through this sharing of ideas. It is common practice, however, to give credit to the people who helped us to develop the idea.

How much can we learn from each other?
There is so much knowledge and there are so many ways of doing things to simplify and develop society. In using open source, we are learning together and teaching society, so that everyone benefits at the same time. The best solutions come from collaboration, as it multiplies and expands upon individual knowledge. Everyone has an idea that may inspire the other users, boost creativity, and encourage innovation.

We are standing on the shoulders of giants to make something great. Our work is based on the work of others. What can we give back to the community?
As individuals, we can evaluate a solution and report what is missing or whether the code is not working as expected. This feedback helps creators look at specific points, and repair or improve their code. This may include the insertion of missing parts in the documentation that can make it difficult to understand the idea behind the solution and the code’s intended use.

As a company that uses FOSS, you can also contribute support for hardware (running in a computing center), or sponsor events by providing meeting rooms or co-organizing conferences. Many scientific institutes and companies allow their employees to work on FOSS projects while being at work — the time spent improving open source code helps to improve the software that is used by the company.

A charity organization called Architecture for Humanity, recently renamed to Open Architecture Network [21, 22], is a free, online, open source community dedicated to improving global living conditions through innovative and sustainable building designs. This network includes project management, file sharing, a resource database, and online collaborative design tools. Through the use of open source software, this organization seeks to bring solutions to humanitarian crises by building community schools, homes, centers, etc. They do this by making professional architectural designs freely available, allowing architects, designers, innovators, and community leaders to share innovative and sustainable ideas, designs, and plans that support eco-friendly, humanitarian design and architecture. This organization was started as an initiative to help communities and was not focused on code, but rather on practical help.

References

[1] Creative Commons, https://creativecommons.org/
[2] Open Source Licenses comparison, https://choosealicense.com/licenses/
[3] Thomas Winde, Frank Hofmann: Von der Freien Software zur Freien Gesellschaft, Linux-User 12/2012, https://www.linux-community.de/ausgaben/linuxuser/2012/12/von-der-freien-software-zur-freien-gesellschaft/
[4] The Open Source Car (OSCar), theoscarproject.org
[5] The Open Source Car (OSCar), Wikipedia, https://en.wikipedia.org/wiki/OScar
[6] OpenStreetMap, http://www.openstreetmap.org/
[7] Wikimedia, https://www.wikimedia.org/
[8] Librivox, https://librivox.org/
[9] Code for Africa: Using Drones to Map Makoko, One of Africa’s Largest Slums, https://www.hotosm.org/projects/code-for-africa-using-drones-to-map-makoko-one-of-africas-largest-slums/
[10] Mesocentre de calcul, Université de Franche-Comté, Besancon, http://meso.univ-fcomte.fr/
[11] Baratang Miya, https://storyengine.io/baratang-miya/
[12] GirlHype Coders, https://girlhype.co.za/
[13] Pixar Animation Studios, https://www.pixar.com/
[14] Universal Scene Description Technology, https://graphics.pixar.com/usd/docs/index.html
[15] RenderMan, https://renderman.pixar.com/
[16] Freifunk, https://freifunk.net/
[17] Global Village Construction Set (GVCS), https://www.opensourceecology.org/gvcs/
[18] FreeBeer, http://freebeer.org/blog/
[19] OpenCola, https://www.artofdrink.com/soda/open-cola-recipe
[20] Jacom Mucumbate and Andrew Nyanguru: Exploring African Philosophy: The Value of Ubuntu in Social Work, African Journals Online, https://www.ajol.info/index.php/ajsw/article/download/127543/117068
[21] Alan G Brake: Architecture for humanity, https://www.dezeen.com/2016/03/10/architecture-for-humanity-relaunches-as-open-architecture-collaborative-humanitarian-charity/
[22] Open Architecture Collaborative, http://openarchcollab.org/
[23] The Slum that doesn’t exist, Deutsche Welle, https://www.dw.com/en/the-slum-that-doesnt-exist/av-51519062
[24] GirlHype South Africa, Youtube video, https://youtu.be/hfRINsiBhng
[25] Image taken from https://girlhype.co.za/index.php/blog

AUTHORS

Plaxedes Nehanda is a multiskilled, self-driven versatile person who wears many hats among them an events planner, a virtual assistant, transcriber as well as an avid researcher on any topic based in Johannesburg, South Africa.

Frank Hofmann works on the road – preferably from Berlin, Geneva, and Cape Town – as a developer, trainer, and author for magazines like Linux-User and Linux Magazine. He is also the co-author of the Debian package management book (http://www.dpmb.org).

Converting Documents From Markdown Into Microsoft Word Format

Frank Hofmann — Wed, 25 Dec 2019 07:28:32 +0000

Among other activities, writing and editing text documents belongs to the most common actions we use our (desktop) computers for. The exact way it is done follows different paths — from using a bare text editor like Vim to graphical applications like Open/Libre Office or cloud-based services that are accessible via webbrowser like Google Docs. To our disadvantage, every tool comes with its own native document format as well as selection of other supported document formats. The quality of the conversion between these formats varies widely, and can lead to a lot of frustration when crossing format boundaries.

In this article we have a look at the conversion between Markdown [1] and DOCX — the native document format of Microsoft Word that is in use since 2007. You may wonder why an enthusiast of Markdown and Asciidoc (like me) deals with this case. Well, collaborating with a group of other writers can lead to a situation whereas one or more participants request DOCX as the output format. Don’t let anybody down, and find out which limitations exist, instead, and how we can try to make all group members happy.

What is Markdown?

As already pointed out in “An Introduction into Markdown” [2], the intention for Markdown is a simple text to HTML conversion. The idea behind it was to make writing web pages, documentation and especially blog entries as easy as writing an e-mail. As of today it is the de facto-synonym for a class of lightweight markup description languages, and the goal can be seen as reached.

Markdown uses a plain text formatting syntax. With a similar approach as HTML a number of markers indicate headlines, lists, images, and references in your text. The few lines below illustrate a basic document that contains two headlines (1st and 2nd level) as well as two paragraphs, and a list environment.

# Recommended Places To Visit In Europe
## France
This is a selection of places:
* Paris (_Ile de France_)
* Strasbourg (_Alsace_)
For a proper visit plan about a week.

Conversion to DOCX

In order to convert your Markdown document to DOCX, use the tool pandoc [3]. Pandoc is a Haskell library, and describes itself as “the universal document converter”, or the “Swiss army knife for document conversions”. It is available for a variety of platforms such as Linux, Microsoft Windows, Mac OS X, and BSD. Pandoc is commonly included as a package for Linux distributions like Debian GNU/Linux, Ubuntu, and CentOS.

A simple call for a conversion is as follows:

$ pandoc -o test.docx test.md

The first parameter `-o` refers to the output file, followed by the name of the file (`test.docx`). The file extension helps pandoc to identify the desired output format. The second parameter names the input file — in our case it is simply `test.md`.

The long version of the command shown above contains the two parameters `-f markdown` and `-t docx`. The first one abbreviates the term `flavour`, and describes the format of the input file. The second one does the same for the output file, and abbreviates `-to`.

The full command is as follows:

$ pandoc -o test.docx -f markdown -t docx test.md

Opening the converted file using Microsoft Word results in the following output:

For the different text elements Pandoc uses stylesheets. This allows you to adjust these elements later according to your needs throughout the entire document. The newer versions of Pandoc also offer the other way around — you can convert a DOCX file into Markdown as follows:

$ pandoc -o test.md test.docx

Then, the generated file has the following content:

Recommended Places To Visit In Europe
=====================================
France
------
This is a selection of places:
- Paris (*Ile de France*)
- Strasbourg (*Alsace*)
For a proper visit plan about a week.

Useful Command-line Options

The list of Pandoc options is rather long. The following ones help you to produce better results, and make your life much easier:

* `-P` (long version `–preserve-tabs`): Preserve tabs instead of converting them to spaces. This is useful for code blocks with indented lines that are part of your text.

* `-S` (long version `–smart`): Produce typographically correct output.

This option corrects quotes, hyphens/dashes as well as ellipses (“…”). Additional, non-breaking spaces are added after certain abbreviations such as “Mr.”.

* `–track-changes=value`: Specifies what to do with insertions, deletions, and comments that are produced with the help of the Microsoft Word “Track Changes” feature. The value can be either accept, reject, or all in order to include or remove the changes made in the document. The result is a flat file.

For more options have a look at the documentation, and the manual page of Pandoc.

Summary

The conversion between Markdown and DOCX is no longer a mystery. It is done within a few steps, and works very well. Happy hacking

Links and References

* [1] Markdown
* [2] Frank Hofmann: Introduction to Markdown
* [3] Pandoc

Acknowledgements

The author would like to thank Annette Kalbow for her help while preparing the article.

Enabling IP-Forwarding for IPv4 in Debian GNU/Linux

Frank Hofmann — Mon, 25 Nov 2019 10:39:25 +0000

Setting up a computer network can be tricky sometimes. Enabling IPv4 Forwarding on a Linux machine is a rather simple task, luckily.

The term IP Forwarding describes sending a network package from one network interface to another one on the same device. It should be enabled when you want your system to act as a router that transfers IP packets from one network to another.

On a Linux system the Linux kernel has a variable named `ip_forward` that keeps this value. It is accessible using the file `/proc/sys/net/ipv4/ip_forward`. The default value is 0 which means no IP Forwarding, because a regular user who runs a single computer without further components is not in need of that, usually. In contrast, for routers, gateways and VPN servers it is quite an essential feature.

Next, we will explain to you how to enable IP Forwarding temporarily, and permanently.

IP Forwarding As A Temporary Solution

In order to enable this kernel parameter on the fly you have two options. Option 1 simply stores the value of 1 in the variable from above as follows:

# echo 1 > /proc/sys/net/ipv4/ip_forward

Option 2 uses the `sysctl` command that allows you to adjust different kernel parameters at runtime, too [2]. As an administrative user run the following command:

# sysctl -w net.ipv4.ip_forward=1

Keep in mind that this setting is changed instantly. Also, the result will not be preserved after rebooting the system.

You can query the stored value as follows:

# cat /proc/sys/net/ipv4/ip_forward

This command returns a value of 0 for no IP Forwarding, and a value of 1 for IP Forwarding enabled. As an alternative, using `sysctl` also shows you the current status:

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
#

Enabling IP Forwarding Permanently

In order to achieve this some other steps have to be done. First, edit the file `/etc/sysctl.conf`. Search for a line containing the entry “#net.ipv4.ip_forward=1”, and remove the # at the beginning of the line.

Then, save the file, and run the `sysctl` command in order to enable the adjusted settings:

# sysctl -p /etc/sysctl.conf

The option `-p` is short for `–load`, and requires a name for the configuration file to be followed.

Next, restart the proc file system that provides information about the status of the Linux kernel using the following command:

# /etc/init.d/procps restart

In about 2015 the file name was shortened from `procps.sh` to `procps`. So, on elderly Debian systems the script that you have to invoke is named `procps.sh`, instead.

Dealing With Systemd

The next hurdle came with the release of Systemd version 221. IP Forwarding is disabled by default, and enabling requires an additional file to be there. If it is not there yet, just add it. The file name consists of the name of the network interface followed by the suffix `.network`, for example `eth0.network` for the network interface `/dev/eth0`. As stated in the documentation [4], other extensions are ignored.

The following code snippet shows the setup for the network interface `/dev/tun0`. It contains of two sections — `Match` and `Network`. In the Match section define the name of the network interface, and in the network section enable IP Forwarding.

# cat /etc/systemd/network/tun0.network
[Match]
Name=tun0
[Network]
IPForward=ipv4

Conclusion

Activating IP Forwarding for IPv4 is not a mystery. Just a few steps, and your are there. Happy hacking!

Links and references

* [1] Setting up Systemd-Networkd, Debian Wiki
* [2] Juergen Haas: Learn the Linux sysctl command
* [3] Systemd News for version 221
* [4] Documentation for Systemd

Comparing ISO Images

Frank Hofmann — Mon, 07 Oct 2019 08:48:41 +0000

In order to setup and maintain computing devices, Linux distributors regularly provide according ISO images for their releases. This simplifies keeping our systems up-to-date with the help of a full compilation of software that actually fits together, in ideal circumstances.

Imagine that you have several of these ISO images stored locally. How do you figure out that the retrieved ISO images are authentic? In this article we show you how to verify the integrity and authenticity of an ISO image that has been downloaded before, and how to figure out what are the differences between the actual content of two ISO images. This helps you to verify the building process for the ISO image, and allows you to see what may have changed between two builds, or releases that are available.

Image formats

The format of disk images has its own history [11]. The common standard is ISO 9660 [12] that describes the contents of an optical disc as a whole. In use is the file extension .iso in order to identify an image file (cloned copy).

The original ISO 9660 format comes with a number of limitations such as 8 directory levels as well as the length of file names. These limitations have been reduced by the introduction of a number of extensions such as Rock Ridge [13] (preservation of POSIX permissions and longer names), Joliet [14] (storage of Unicode names in UCS-2), and Apple ISO 9660 Extensions [15] that introduced HFS support.

In order to get more details regarding an image file use the `file` command followed by the name of the data file as follows:

.Listing 1: Displaying the details for an ISO file

$ file *.iso
debian-10.1.0-amd64-netinst.iso: DOS/MBR boot sector;
partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63),
startsector 3808, 5664 sectors
xubuntu-18.04.3-desktop-amd64.iso: DOS/MBR boot sector;
partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63),
startsector 11688, 4928 sectors $

Verifying downloaded ISO files

Trustworthy software providers always offer you two things for download — the actual ISO image as well as the according checksum of the image in order to do an integrity check for the downloaded file. The latter one allows you to confirm that your local file is an exact copy of the file present on the download servers, and nothing went wrong during the download. In case of an error during the download the local file is corrupted, and can trigger random issues during the installation [16].

Furthermore, in case the ISO image has been compromised (as it happened with Linux Mint in early 2016 [17]) the two checksums will not match. You can calculate the checksums using `md5sum` (deprecated, no longer recommended) and `sha256sum` as follows:

.Listing 2: Calculating the checksum for ISO files

$ md5sum *.iso
b931ef8736c98704bcf519160b50fd83 debian-10.1.0-amd64-netinst.iso
0c268a465d5f48a30e5b12676e9f1b36 xubuntu-18.04.3-desktop-amd64.iso

$ sha256sum *.iso
7915fdb77a0c2623b4481fc5f0a8052330defe1cde1e0834ff233818dc6f301e debian-10.1.0-amd64-netinst.iso
3c9e537ee1cf64088251e56b4ca1694944ad59126f298f24a78cd43af152b5b3 xubuntu-18.04.3-desktop-amd64.iso

$

You can invoke the comparison between the provided checksum file and the locally stored ISO image as displayed in listing 3. The output of OK at the end of a line signalizes that both checksums are the same.

.Listing 3: Compare provided checksums

$ sha256sum --check sha256sum.txt xubuntu-18.04.3-desktop-amd64.iso: OK
$

Comparing two locally stored ISO files

It may happen that you have downloaded two ISO files, and you would like to figure out if they are entirely the same. The `sha256sum` command is useful, again, and we recommend you to encapsulate this check in a shell script. In Listing 4 you see an according bash script that combines the four commands `sha256sum`, `cut`, `uniq`, and `wc` in order to separate the first column for all the output lines, merge them in case they are identical, and count the number of lines that remain. If the two (or more) ISO files are the same then its checksums are identical, only a single line will remain, and the bash script will output the message “the files are the same”, eventually:

.Listing 4: Automatically comparing checksums of ISO files using `sha256sum`

#!/bin/bash

if [ `sha256sum *.iso | cut -d' ' -f1 | uniq | wc -l` eq 1 ]
then
echo "the files are the same"
else
echo "the files are not identical"
fi

In case the script returns that the two files are different you may be interested in the exact position of inequality. A byte-order comparison can be done using the `cmp` command that outputs the first byte that differs between the files:

.Listing 5: See the differences between two or more files using `cmp`

$ cmp *.iso
debian-10.1.0-amd64-netinst.iso xubuntu-18.04.3-desktop-amd64.iso differ: byte 433, line 4
$

Comparing the actual content

So far, we did a byte-order comparison, and now we will have a closer look inside — at the actual content of the ISO files to be compared with each other. At this point a number of tools come into play that help to compare single files, entire directory structures as well as compressed archives, and ISO images.

The `diff` command helps to compare a directory using the two switches `-r` (short for `–recursive`) and `-q` (short for `–brief`) followed by the two directories to be compared with each other. As seen in

Listing 6, `diff` reports which files are unique to either directory, and if a file with the same name has changed.

.Listing 6: Comparing two directories using `diff`

$ diff -qr t1/ t2/
Only in t1/: blabla.conf.
The files t1/nsswitch.conf and t2/nsswitch.conf are different.
Only in t2/: pwd.conf.
$

In order to compare two ISO images simply mount the two image files to separate directories, and go from there.

A more colourful output on the commandline is provided by the tools `colordiff` [1,2] and `icdiff` [18,19]. Figure 1 shows the output of `icdiff` in which the differences between the two files of `nsswitch.conf` are highlighted in either green or red.

Figure 1: Comparing two directories using `icdiff`

Graphical tools for a comparison of directories include `fldiff` [5], `xxdiff` [6] and `dirdiff` [7]. `xxdiff` was inspired by `fldiff`, and that’s why they look rather similar. Entries that have a similar content come with a white or gray background, and entries that differ come with a light-yellow background, instead. Entries with a bright-yellow or green background are unique to a directory.

Figure 2: Comparing two directories using `fldiff`

`xxdiff` displays the file differences in a separate window by clicking on an entry (see Figure 3).

Figure 3: Comparing two directories using `xxdiff`

The next candidate is `dirdiff`. It builds on top of the functionality of `xxdiff`, and can compare up to five directories. Files that exist in either directory are marked with an X. Interestingly, the colour scheme that is in use for the output window is the same one as `icdiff` uses (see Figure 4).

Figure 4: Comparing two directories using `dirdiff`

Comparing compressed archives and entire ISO images is the next step. While the `adiff` command from the `atool` package [10] might be already known to you, we will have a look at the `diffoscope` command [8,9], instead. It describes itself as “a tool to get to the bottom of what makes files or directories different. It recursively unpacks archives of many kinds and transforms various binary formats into more human readable forms to compare them”. The origin of the tool is The Reproducible Builds Project [19,20] which is “a set of software development practices that create an independently-verifiable path from source to binary code”. Among others, it supports the following file formats:

* Android APK files and boot images
* Berkeley DB database files
* Coreboot CBFS filesystem images
* Debian .buildinfo and .changes files
* Debian source packages (.dsc)
* ELF binaries
* Git repositories
* ISO 9660 CD images
* MacOS binaries
* OpenSSH public keys
* OpenWRT package archives (.ipk)
* PGP signed/encrypted messages
* PDF and PostScript documents
* RPM archives chives

Figure 5 shows the output of `diffoscope` when comparing two different versions of Debian packages — you will exactly see the changes that have been made. This includes both file names, and contents.

Figure 5: Comparing two Debian packages using `diffoscope` (excerpt)

Listing 7 shows the output of `diffoscope` when comparing two ISO images with a size of 1.9G each. In this case the two ISO images belong to Linux Mint Release 19.2 whereas one image file was retrieved from a French server, and the other one from an Austrian server (hence the letters `fr` and `at`). Within seconds `diffoscope` states that the two files are entirely identical.

.Listing 7: Comparing two ISO images using `diffoscope`

$ diffoscope linuxmint-19.2-xfce-64bit.fr.iso linuxmint-19.2-xfce-64bit.at.iso
|####################################################| 100% Time: 0:00:00
$

In order to look behind the scenes it helps to call `diffoscope` with the two options `–debug` and `–text -` for both more verbose output to the terminal. This allows you to learn what the tool is doing. Listing 8 shows the according output.

.Listing 8: Behind the scenes of `diffoscope`

$ diffoscope --debug --text - linuxmint-19.2-xfce-64bit.fr.iso
linuxmint-19.2-xfce-64bit.at.iso

2019-10-03 13:45:51 D: diffoscope.main: Starting diffoscope 78
2019-10-03 13:45:51 D: diffoscope.locale: Normalising locale, timezone, etc.
2019-10-03 11:45:51 D: diffoscope.main: Starting comparison
2019-10-03 11:45:51 D: diffoscope.progress: Registering < diffoscope.progress.ProgressBar object at 0x7f4b26310588> as a progress observer
2019-10-03 11:45:52 D: diffoscope.comparators: Loaded 50 comparator classes64bit.fr.iso ETA: --:--:--
2019-10-03 11:45:52 D: diffoscope.comparators.utils.specialize: Unidentified file. Magic says: DOS/MBR boot sector; partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63), startsector 652, 4672 sectors
2019-10-03 11:45:52 D: diffoscope.comparators.utils.specialize: Unidentified file. Magic says: DOS/MBR boot sector; partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63), startsector 652, 4672 sectors
2019-10-03 11:45:52 D: diffoscope.comparators.utils.compare: Comparing linuxmint-19.2-xfce-64bit.fr.iso (FilesystemFile) and linuxmint-19.2-xfce-64bit.at.iso (FilesystemFile)
2019-10-03 11:45:52 D: diffoscope.comparators.utils.file: Binary.has_same_content: <<class 'diffoscope.comparators.binary.FilesystemFile'> linuxmint-19.2-xfce-64bit.fr.iso> <<class 'diffoscope.comparators. binary.FilesystemFile'> linuxmint-19.2-xfce-64bit.at.iso>
2019-10-03 11:45:53 D: diffoscope.comparators.utils.compare: has_same_content_as returned True; skipping further comparisons
|####################################################| 100% Time: 0:00:01
2019-10-03 11:45:53 D: diffoscope.tempfiles: Cleaning 0 temp files
2019-10-03 11:45:53 D: diffoscope.tempfiles: Cleaning 0 temporary directories
$

Well, so far, so good. The next tests have been done on images from different releases and with different file sizes. All of them resulted in an internal error that traces back to the `diff` command running out of internal memory. It looks like that there is a file size limit of about 50M. That’s why I have built two smaller images of 10M each, and handed it over to `diffoscope` for a comparison. Figure 6 shows the result. The output is a tree structure containing the file `nsswitch.conf` with the highlighted differences.

Figure 6: Comparing two ISO images using `diffoscope`

Also, an HTML version of the output can be provided. Figure 7 shows the output as an HTML file in a webbrowser. It is achievable via the switch

`--html output.html`.

Figure 7: Comparing two ISO images using `diffoscope` (HTML output)

In case you do not like the output style, or would like to match it with the corporate identity of your company, you can customize the output by your own CSS file using the switch `–css style.css` that loads the style from the referenced CSS file.

Conclusion

Finding differences between two directories or even entire ISO images is a bit tricky. The tools shown above help you mastering this task. So, happy hacking!

Thank you
The author would like to thank Axel Beckert for his help while preparing the article.

Links and references

* [1] colordiff
* [2] colordiff, Debian package,
* [3] diffutils
* [4] diffutils, Debian package,
* [5] fldiff
* [6] xxdiff
* [7] dirdiff
* [8] diffoscope
* [9] diffoscope, Debian package
* [10] atool, Debian package
* [11] Brief introduction of some common image file formats
* [12] ISO 9660, Wikipedia
* [13] Rock Ridge, Wikipedia
* [14] Joliet, Wikipedia
* [15] Apple ISO 9660 Extensions, Wikipedia
* [16] How to verify ISO images, Linux Mint
* [17] Beware of hacked ISOs if you downloaded Linux Mint on February 20th!
* [18] icdiff
* [19] icdiff, Debian package
* [20] The Reproducible Builds Project
* [21] The Reproducible Builds Project, Debian Wiki

Setting up PostgreSQL with PostGIS on Debian GNU/Linux 10

Frank Hofmann — Fri, 27 Sep 2019 12:31:54 +0000

As symbolized by the blue elephant with its disctinctive project symbol, PostgreSQL belongs to the most stable Open Source SQL Database Management Systems (DBMS) ever: an elephant is well known to have a great memory, and never forgets what he has observed.

Available for more than 20 years now, PostgreSQL has proven its remarkable reliability in use cases ranging from small to huge datasets. The list of satisfied commercial and non-commercial users is quite long, and among others it includes the United Nations Children’s Fund (UNICEF), the Creative Commons archive, Skype, and the BMW Group.

Its built-in transaction management model as well as the set of geometric data types helped to stand out the software from other developments such as MySQL/MariaDB, Redis , or SQLite . In this article we focus on the setup of PostgreSQL 11.5 in combination with PostGIS 2.5 .

PostGIS is the spatial extension of PostgreSQL which adds both geometric functions and geographic features to PostgreSQL. Simply speaking, these spatial datatypes act as shapes, and both abstract and encapsulate spatial structures such as boundary and dimension. Among others, newly available datatypes are Point, Surface, and Curve.

One of the most prominent users of PostGIS is the Institute Géographique National (IGN) of France which collects, integrates, manages and distributes reference geographical information for the entire country. Since July 2006, PostGIS is in extensive use. Up to now the IGN’s database holds more than 100 million spatial objects.

We will set up PostgreSQL/PostGIS on Debian GNU/Linux 10 “Buster” using the XFCE desktop environment .

Setting up PostgreSQL

Setting up the PostgreSQL DBMS on a Debian GNU/Linux requires only a moderate level of knowledge of system administration. The challenge here is the right order of steps that are required (see for a full list with images). As with every other Linux distribution, there are default settings and package names that can be a bit troublesome. We don’t moan, and just start, instead.

Installing PostgreSQL as a software

Step one is the installation of the PostgreSQL package. In a terminal you can do that as follows:

# apt-get install postgresql

Using the Chef configuration management system, a basic recipe that leads to the same result contains just the following lines:

package ‘postgresql’ do action :install end
service ‘postgresql’ do action : [ :enable, :start ] end

These lines lead to the installation of the postgresql package (plus package dependencies), and enabling the according service. In order to check the PostgreSQL service for being running, this command should give you a positive output, then:

# service postgresql status

Completing the setup for the administrator’s account

The user postgres administrates the PostgreSQL databases. Step two is finalizing this account, and begins with adding a password to his credentials as follows:

# passwd postgres
New password:
Retype new password:
passwd: password updated successfully
#

Logging in as the user postgres allows you to grant other users access to the PostgreSQL database. Subsequently, we have to add a user in step three. Please be aware of the fact that both the Linux system and PostgreSQL keep their user databases separately. That’s why you have to make sure that a regular Linux user with the same name exists on your system, too, before enabling access to PostgreSQL for him.

Adding a user account

Step four is done as the user postgres. Change from root to postgres, and create a new account for the user linuxhint in the PostgreSQL database with the help of this command:

postgres $ createuser –interactive linuxhint
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role ve allowed to create new roles? (y/n) n
postgres $

Next, set a password for the newly created user linuxhint. Login to the database shell using psql, and set the new password using the command \password. After that type in \q in order to quit the database shell, and to return to the shell in the terminal:

postgres $ psql psql (11.5 (Debian 11.5-1+deb10u1)) Type “help” for further help.
postgres=# linuxhint Enter new password: Retype the new password: postgres=# postgres $

Step five is the creation of a separate database for the user linuxhint. In order to do so type in the command createdb as user postgres:

postgres $ createdb linuxhint

Now, the user linuxhint has its own database, and can work with it according to his needs.

Adding PostGIS

Step six consists of the installation of the PostGIS package. As done for PostgreSQL before, it can be done as follows using apt-get:

# apt-get install postgis

Alternatively, a simple recipe for Chef would be this one:

package ‘postgis’ do
action :install
end

The PostGIS package has a dependency for the Debian package postgresql-11-postgis-2.5-scripts (automatically installed) that connects PostGIS to PostgreSQL, and eliminates a number of manual steps needed in other distributions. No matter which one of the two installation methods you choose – apt-get or Chef – , the Debian package management will make sure that all the depending packages are both installed, and configured correctly.

Step seven is the enabling of the PostGIS extension. As explained in the PostGIS documentation, do not install it in the database named postgres as this one is in use for the internal datastructures of PostgreSQL, and only enable it in each user database you actually need it in. Login as the user postgres, connect to the desired database, and create the two extensions postgis and postgis_topology as shown below. The command \c connects you to the desired database, and CREATE EXTENSION makes the desired extension available:

postgres=#

Now you are connected with the database “linuxhint” as user “postgres”.

linuxhint=# CREATE EXTENSION postgis;
CREATE EXTENSION
linuxhint=# CREATE EXTENSION postgis_topology;
CREATE EXTENSION
linuxhint=#

Step seven is for validation that the activation of the extension was successful. The PostgreSQL command \dx lists the extensions that are installed, and both postgis and postgis_topology should be in the list, now.

PostGIS provides other extensions, too. We recommend to install only what you need. See the PostGIS documentation for more information regarding the extensions.

Adding Data

Having setup PostGIS successfully it is time to add tables, and fill them with data. Quite a lot of geographic data is available online for free, for example from Geofabrik. The data is provided as shape files which is a common vector data format for GIS software.

Having downloaded the shape file, load the content of the shape file into PostGIS with the help of the special commandline tool shp2pgsql. The example below demonstrates how to convert the shape file into a sequence of SQL commands, first, and upload the list of SQL commands to the database using psql, next:

linuxhint $ shp2pgsql -cDiI railways.shp railway > railway.sql
Shapefile type: arc
Postgis type: MULTILINESTRING[2]
linuxhint $
linuxhint $ psql -f railway.sql

The figure below shows the output that is printed on screen as soon as you upload the data.

Now, PostgreSQL/PostGIS is at your service, and ready to recieve your SQL queries. For example, pgadmin allows you a look under the hood within minutes. The figure below shows this for the uploaded data. The rightmost column has a geometric type MultiLineString.

Conclusion

Setting up PostgreSQL/PostGIS is not rocket science. With the steps explained above you can do this in less than an hour, and have results quickly. Et voila!

Links and References

Understanding the ELF File Format

Frank Hofmann — Mon, 09 Sep 2019 10:16:22 +0000

From Source Code To Binary Code

Programming starts with having a clever idea, and writing source code in a programming language of your choice, for example C, and saving the source code in a file. With the help of an adequate compiler, for example GCC, your source code is translated into object code, first. Eventually, the linker translates the object code into a binary file that links the object code with the referenced libraries. This file contains the single instructions as machine code that are understood by the CPU, and are executed as soon the compiled program is run.

The binary file mentioned above follows a specific structure, and one of the most common ones is named ELF that abbreviates Executable and Linkable Format. It is widely used for executable files, relocatable object files, shared libraries, and core dumps.

Twenty years ago – in 1999 – the 86open project has chosen ELF as the standard binary file format for Unix and Unix-like systems on x86 processors. Luckily, the ELF format had been previously documented in both the System V Application Binary Interface, and the Tool Interface Standard [4]. This fact enormously simplified the agreement on standardization between the different vendors and developers of Unix-based operating systems.

The reason behind that decision was the design of ELF – flexibility, extensibility, and cross-platform support for different endian formats and address sizes. ELF’s design is not limited to a specific processor, instruction set, or hardware architecture. For a detailed comparison of executable file formats, have a look here [3].

Since then, the ELF format is in use by several different operating systems. Among others, this includes Linux, Solaris/Illumos, Free-, Net- and OpenBSD, QNX, BeOS/Haiku, and Fuchsia OS [2]. Furthermore, you will find it on mobile devices running Android, Maemo or Meego OS/Sailfish OS as well as on game consoles like the PlayStation Portable, Dreamcast, and Wii.

The specification does not clarify the filename extension for ELF files. In use is a variety of letter combinations, such as .axf, .bin, .elf, .o, .prx, .puff, .ko, .so, and .mod, or none.

The Structure of an ELF File

On a Linux terminal, the command man elf gives you a handy summary about the structure of an ELF file:

Listing 1: The manpage of the ELF structure

$ man elf

ELF(5) Linux Programmer's Manual ELF(5)

NAME
elf - format of Executable and Linking Format (ELF) files

SYNOPSIS
#include

DESCRIPTION
The header file defines the format of ELF executable binary
files. Amongst these files are normal executable files, relocatable
object files, core files and shared libraries.

An executable file using the ELF file format consists of an ELF header,
followed by a program header table or a section header table, or both.
The ELF header is always at offset zero of the file. The program
header table and the section header table's offset in the file are
defined in the ELF header. The two tables describe the rest of the
particularities of the file.

...

As you can see from the description above, an ELF file consists of two sections – an ELF header, and file data. The file data section can consist of a program header table describing zero or more segments, a section header table describing zero or more sections, that is followed by data referred to by entries from the program header table, and the section header table. Each segment contains information that is necessary for run-time execution of the file, while sections contain important data for linking and relocation. Figure 1 illustrates this schematically.

The ELF Header

The ELF header is 32 bytes long, and identifies the format of the file. It starts with a sequence of four unique bytes that are 0x7F followed by 0x45, 0x4c, and 0x46 which translates into the three letters E, L, and F. Among other values, the header also indicates whether it is an ELF file for 32 or 64-bit format, uses little or big endianness, shows the ELF version as well as for which operating system the file was compiled for in order to interoperate with the right application binary interface (ABI) and cpu instruction set.

The hexdump of the binary file touch looks as follows:

.Listing 2: The hexdump of the binary file

$ hd /usr/bin/touch | head -5
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 e3 25 40 00 00 00 00 00 |..>......%@.....|
00000020 40 00 00 00 00 00 00 00 28 e4 00 00 00 00 00 00 |@.......(.......|
00000030 00 00 00 00 40 00 38 00 09 00 40 00 1b 00 1a 00 |....@.8...@.....|
00000040 06 00 00 00 05 00 00 00 40 00 00 00 00 00 00 00 |........@.......|

Debian GNU/Linux offers the readelf command that is provided in the GNU ‘binutils’ package. Accompanied by the switch -h (short version for “–file-header”) it nicely displays the header of an ELF file. Listing 3 illustrates this for the command touch.

.Listing 3: Displaying the header of an ELF file

$ readelf -h /usr/bin/touch
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4025e3
Start of program headers: 64 (bytes into file)
Start of section headers: 58408 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 27
Section header string table index: 26

The Program Header

The program header shows the segments used at run-time, and tells the system how to create a process image. The header from Listing 2 shows that the ELF file consists of 9 program headers that have a size of 56 bytes each, and the first header starts at byte 64.

Again, the readelf command helps to extract the information from the ELF file. The switch -l (short for –program-headers or –segments) reveals more details as shown in Listing 4.

.Listing 4: Display information about the program headers

$ readelf -l /usr/bin/touch

Elf file type is EXEC (Executable file)
Entry point 0x4025e3
There are 9 program headers, starting at offset 64

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000d494 0x000000000000d494 R E 200000
LOAD 0x000000000000de10 0x000000000060de10 0x000000000060de10
0x0000000000000524 0x0000000000000748 RW 200000
DYNAMIC 0x000000000000de28 0x000000000060de28 0x000000000060de28
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x000000000000bc40 0x000000000040bc40 0x000000000040bc40
0x00000000000003a4 0x00000000000003a4 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x000000000000de10 0x000000000060de10 0x000000000060de10
0x00000000000001f0 0x00000000000001f0 R 1

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got

The Section Header

The third part of the ELF structure is the section header. It is meant to list the single sections of the binary. The switch -S (short for –section-headers or –sections) lists the different headers. As for the touch command, there are 27 section headers, and Listing 5 shows the first four of them plus the last one, only. Each line covers the section size, the section type as well as its address and memory offset.

.Listing 5: Section details revealed by readelf

$ readelf -S /usr/bin/touch
There are 27 section headers, starting at offset 0xe428:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
...
...
[26] .shstrtab STRTAB 0000000000000000 0000e334
00000000000000ef 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)

Tools to Analyze an ELF file

As you may have noted from the examples above, GNU/Linux is fleshed out with a number of useful tools that help you to analyze an ELF file. The first candidate we will have a look at is the file utility.

file displays basic information about ELF files, including the instruction set architecture for which the code in a relocatable, executable, or shared object file is intended. In listing 6 it tells you that /bin/touch is a 64-bit executable file following the Linux Standard Base (LSB), dynamically linked, and built for the GNU/Linux kernel version 2.6.32.

.Listing 6: Basic information using file

$ file /bin/touch
/bin/touch: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l,
for GNU/Linux 2.6.32, BuildID[sha1]=ec08d609e9e8e73d4be6134541a472ad0ea34502, stripped
$

The second candidate is readelf. It displays detailed information about an ELF file. The list of switches is comparably long, and covers all the aspects of the ELF format. Using the switch -n (short for –notes) Listing 7 shows the note sections, only, that exist in the file touch – the ABI version tag, and the build ID bitstring.

.Listing 7: Display Selected sections of an ELF file

$ readelf -n /usr/bin/touch

Displaying notes found at file offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.32

Displaying notes found at file offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: ec08d609e9e8e73d4be6134541a472ad0ea34502

Note that under Solaris and FreeBSD, the utility elfdump [7] corresponds with readelf. As of 2019, there has not been a new release or update since 2003.

Number three is the package named elfutils [6] that is purely available for Linux. It provides alternative tools to GNU Binutils, and also allows validating ELF files. Note that all the names of the utilities provided in the package start with eu for ‘elf utils’.

Last but not least we will mention objdump. This tool is similar to readelf but focuses on object files. It provides a similar range of information about ELF files and other object formats.

.Listing 8: File information extracted by objdump

$ objdump -f /bin/touch

/bin/touch: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004025e3

$

There is also a software package called ‘elfkickers’ [9] which contains tools to read the contents of an ELF file as well as manipulating it. Unfortunately, the number of releases is rather low, and that’s why we just mention it, and do not show further examples.

As a developer you may have a look at ‘pax-utils’ [10,11], instead. This set of utilities provides a number of tools that help to validate ELF files. As an example, dumpelf analyzes the ELF file, and returns a C header file containing the details – see Figure 2.

Conclusion

Thanks to a combination of clever design and excellent documentation the ELF format works very well, and is still in use after 20 years. The utilities shown above allow you an insight view into an ELF file, and let you figure out what a program is doing. These are the first steps for analyzing software – happy hacking!

Links and References

[1] Executable and Linkable Format (ELF), Wikipedia
[2] Fuchsia OS
[3] Comparison of executable file formats, Wikipedia
[4] Linux Foundation, Referenced Specifications
[5] Ciro Santilli: ELF Hello World Tutorial
[6] elfutils Debian package
[7] elfdump
[8] Michael Boelen: The 101 of ELF files on Linux: Understanding and Analysis
[9] elfkickers
[10] Hardened/PaX Utilities
[11] pax-utils, Debian package

Acknowledgements

The writer would like to thank Axel Beckert for his support regarding the preparation of this article.

Understanding the Locales on Debian GNU/Linux

Frank Hofmann — Thu, 29 Aug 2019 18:38:27 +0000

Each computer system comes with its specific setup regarding the system language, and character encoding that is in use. Based on this configuration the error messages, the help system as well as the program’s feedback is displayed on screen.

On UNIX/Linux systems this setup is called POSIX [7] locales, and standardized as IEEE Std 1003.1-2017 [3]. Such a locale can vary for the system as a whole, and the single user accounts as every single user can individualize his working environment. In this article we will explain to you how to figure out the current locale setup on Debian GNU/Linux, to understand its single adjusting screws, and how to adapt the system to your needs.

Note that this article is tailored to Debian GNU/Linux Release 10 “Buster”. Unless otherwise stated the techniques described here also work for its derivates like Ubuntu or Linux Mint [8].

What is a locale?

Generally speaking, a locale is a set of values that reflect the nature and the conventions of a country, or a culture. Among others these values are stored as environment variables that represent the language, the character encoding, the date and time formatting, the default paper size, the country’s currency as well as the first day of the week.

As touched on before, there is a general setting known as ‘default locale’, and a user-defined setting. The default locale works system-wide and is stored in the file /etc/default/locale. Listing 1 displays the default locale on a Debian GNU/Linux using German as the main language, and 8 bit unicode (UTF-8) as the character set [11].

Listing 1: The default locale on a German Debian GNU/Linux

$ cat /etc/default/locale # File generated by update-locale LANG=“de_DE.UTF-8” $ —-

Please note that in contrast to Debian GNU/Linux, on some earlier Ubuntu versions the system-wide locale setup is stored at /etc/locale.conf.

The user-defined settings are stored as a hidden file in your home directory, and the actual files that are evaluated depend on the login shell that you use [6]. The traditional Bourne shell (/bin/sh) [4] reads the two files /etc/profile and ~/.profile, whereas the Bourne-Again shell (Bash) (/bin/bash) [5] reads /etc/profile and ~/.bash_profile. If your login shell is Z shell (/bin/zsh) [9], the two files ~/.zprofile and ~/.zlogin are read, but not ~/.profile unless invoked in Bourne shell emulation mode [10].

Starting a shell in a terminal in an existing session results in an interactive, non-login shell. This may result in reading the following files – ~/.bashrc for Bash, and /etc/zshrc as well as ~/.zshrc for Z shell [6].

Naming a locale

As explained here [12], the name of a locale follows a specific pattern. The pattern consists of language codes, character encoding, and the description of a selected variant.

A name starts with an ISO 639-1 lowercase two-letter language code [13], or an ISO 639-2 three-letter language code [14] if the language has no two-letter code. For example, it is de for German, fr for French, and cel for Celtic. The code is followed for many but not all languages by an underscore _ and by an ISO 3166 uppercase two-letter country code [15]. For example, this leads to de_CH for Swiss German, and fr_CA for a French-speaking system for a Canadian user likely to be located in Québec.

Optionally, a dot . follows the name of the character encoding such as UTF-8, or ISO-8859-1, and the @ sign followed by the name of a variant. For example, the name en_IE.UTF-8@euro describes the setup for an English system for Ireland with UTF-8 character encoding, and the Euro as the currency symbol.

Commands and Tools

The number of commands related to locales is relatively low. The list contains locale that purely displays the current locale settings. The second one is localectl that can be used to query and change the system locale and keyboard layout settings. In order to activate a locale the tools dpkg-reconfigure and locale-gen come into play – see the example below.

Show the locale that is in use

Step one is to figure out the current locale on your system using the locale command as follows:

Listing 2: Show the current locale

$ locale LANG=de_DE.UTF-8 LANGUAGE= LC_CTYPE=“de_DE.UTF-8” LC_NUMERIC=“de_DE.UTF-8”
LC_TIME=“de_DE.UTF-8” LC_COLLATE=“de_DE.UTF-8” LC_MONETARY=“de_DE.UTF-8”
LC_MESSAGES=“de_DE.UTF-8” LC_PAPER=“de_DE.UTF-8” LC_NAME=“de_DE.UTF-8”
LC_ADDRESS=“de_DE.UTF-8” LC_TELEPHONE=“de_DE.UTF-8” LC_MEASUREMENT=“de_DE.UTF-8”
LC_IDENTIFICATION=“de_DE.UTF-8” LC_ALL= $ —-

Please note that other Linux distributions than Debian GNU/Linux may use additional environment variables not listed above. The single variables have the following meaning:

LANG: Determines the default locale in the absence of other locale related environment variables
LANGUAGE: List of fallback message translation languages
LC_CTYPE: Character classification and case conversion
LC_NUMERIC: Numeric formatting
LC_TIME: Date and time formats
LC_COLLATE: Collation (sort) order
LC_MONETARY: Monetary formatting
LC_MESSAGES: Format of interactive words and responses
LC_PAPER: Default paper size for region
LC_NAME: Name formats
LC_ADDRESS: Convention used for formatting of street or postal addresses
LC_TELEPHONE: Conventions used for representation of telephone numbers
LC_MEASUREMENT: Default measurement system used within the region
LC_IDENTIFICATION: Metadata about the locale information
LC_RESPONSE: Determines how responses (such as Yes and No) appear in the local language (not in use by Debian GNU/Linux but Ubuntu)
LC_ALL: Overrides all other locale variables (except LANGUAGE)

List available locales

Next, you can list the available locales on your system using the locale command accompanied by its option -a. -a is short for –all-locales:

Listing 3: Show available locales

$ locale -a C C.UTF-8 de_DE@euro de_DE.utf8 en_US.utf8 POSIX $ —-

Listing 3 contains two locale settings for both German (Germany) and English (US). The three entries C, C.UTF-8, and POSIX are synonymous and represent the default settings that are appropriate for data that is parsed by a computer program. The output in Listing 3 is based on the list of supported locales stored in /usr/share/i18n/SUPPORTED.

Furthermore, adding the option -v (short for –verbose) to the call leads to a much more extensive output that includes the LC_IDENTIFICATION metadata about each locale. Figure 1 shows this for the call from Listing 3.

In order to see which locales already exist, and which ones need further help to be completed you may also have a look at the map of the Locale Helper Project [20]. Red markers clearly show which locales are unfinished. Figure 2 displays the locales for South Africa that look quite complete.

Show available character maps

The locale command comes with the option -m that is short for –charmaps. The output shows the available character maps, or character set description files [16]. Such a file is meant to “define characteristics for the coded character set and the encoding for the characters specified in Portable Character Set, and may define encoding for additional characters supported by the implementation” [16]. Listing 4 illustrates this with an extract of the entire list.

Listing 4: Character set description files

$ locale -m ANSI_X3.110-1983 ANSI_X3.4-1968 ARMSCII-8 ASMO_449 BIG5 BIG5-HKSCS … $ —-

Show the definitions of locale variables

Each variable used for a locale comes with its own definition. Using the option -k (short for –keyword-name) the locale command displays this setting in detail. Listing 5 illustrates this for the variable LC_TELEPHONE as it is defined in a German environment – the phone number format, the domestic phone format, the international selection code as well as the country code (international prefix), and the code set. See the Locale Helper Project [20] for a detailed description of the values.

Listing 5: The details of LC_TELEPHONE

$ locale -k LC_TELEPHONE tel_int_fmt=“+%c %a %l” tel_dom_fmt=“%A %l”
int_select=“00” int_prefix=“49” telephone-codeset=“UTF-8” $ —-

Changing the current locale

The knowledge regarding the locale becomes necessary as soon as you run a system that comes with a different locale than you are used to – for example, on a Linux live system. Changing the locale can be done in two ways – reconfiguring the Debian locales package [19], and adding the required locale using the command locale-gen. For option one, running the following command opens a text-based configuration dialog shown in Figure 3:

# dpkg-reconfigure locales

Press the space bar in order to choose the desired locale(s) from the list shown in the dialog box, and choose “OK” to confirm your selection. The next dialog window offers you a list of locales that are available for the default locale. Select the desired one, and choose “OK”. Now, the according locale files are generated, and the previously selected locale is set for your system.

For option two, generating the desired locale is done with the help of the command locale-gen. Listing 6 illustrates this for a French setup:

Listing 6: Generating a French locale

locale-gen fr_FR.UTF-8
Generating locales… fr_FR.UTF-8… done Generation complete. # —-

In order to use the previously generated locale as the default one, run the command in Listing 7 to set it up properly:

Listing 7: Manually setting the locale

# update-locale LANG=fr_FR.UTF-8

As soon as you open a new terminal session, or re-login to your system, the changes are activated.

Compile a locale definition file

The command localectl helps you to manually compile a locale definition file. In order to create a French setting run the command as follows:

Listing 8: Compile a locale definition

# localedef -i fr_FR -f UTF-8 fr_FR.UTF-8

Conclusion

Understanding locales can take a while as it is a setup that is influenced by several factors. We explained how to figure out your current locale, and how to change it properly. Adpating the Linux system to your needs should be much easier for you from now on.

Links and References

[1] Locale, Debian Wiki
[2] ChangeLanguage, How to change the language of your Debian system
[3] POSIX Locale, The Open Group Base Specifications Issue 7, 2018 edition
[4] Bourne shell, Wikipedia
[5] Bourne-Again shell, Wikipedia
[6] Difference between Login Shell and Non-Login Shell?, StackExchange
[7] Portable Operating System Interface (POSIX), Wikipedia
[8] Linux Mint
[9] Z shell, Wikipedia
[10] Zsh Shell Builtin Commands
[11] UTF-8, Wikipedia
[12] What should I set my locale to and what are the implications of doing so?
[13] ISO 639-1, Wikipedia
[14] ISO 639-2, Wikipedia
[15] ISO 3166, Wikipedia
[16] Character Set Description Files
[17] Locale, Ubuntu Wiki
[19] locales Debian package
[20] Locale Helper Project

Debian Changing Hostname

Frank Hofmann — Wed, 06 Feb 2019 11:15:14 +0000

The Hostname

The hostname is the label assigned to a device on a network – a desktop computer, database server, tablet pc, wifi router, or smartphone. This name is used to distinguish the devices from each another on a specific network or over the internet.

Mostly, the chosen name is human-readable, and has to be unique among the other machines in the local network. Hostnames must not contain a space since they can only contain letters, digits and a hyphen.

In institutions with a large number of users like universities it is quite common to name a computer after fruits, favourite places, greek letters, geographical regions, or musical instruments. For private networks there are no name conventions to be followed, and hostnames like “FamiliyPC”, “dads-tablet”, or “printer” can be found.

The computer’s hostname is set initially during the installation, and stored in the file “/etc/hostname”. The screenshot below is taken from the graphical setup of Debian GNU/Linux 9, and uses the label “debian95” as a hostname referring to the release of Debian GNU/Linux 9.5.

As soon as your computer starts several services are initialized. This also includes the network, and the hostname, which can be used to address the device from then onwards. Using the UNIX command “hostname” reveals its name as follows:

$ hostname
debian95
$

More information can be retrieved using the command hostnamectl as follows:

$ hostnamectl
Static hostname: debian95
Icon name: computer-laptop
Chassis: laptop
Machine ID: 7c61402c22bf4cf2a9fcb28a4210da0b
Boot ID: 6e8ca49158ff4bc4afaa26763f42793b
Operating System: Debian GNU/Linux 8 (jessie)
Kernel: Linux 3.16.0-4-amd64
Architecture: x86-64
$

The hostname plus domain name result in the fully qualified domain name (FQDN) [1] that is needed to identify a computer without fail. In order to get the FQDN of the device use the switch “-f” (short for “–fqdn” or “–long”), instead:

$ hostname -f
debian95.wunderwerk.net
$

Changing The Hostname

At first sight, changing the hostname (or renaming a computer) is comparably easy and takes a few minutes, only. It can be done in the following ways:

temporary change (valid until reboot) open a terminal window, change to user root, and invoke the command “hostname” followed by the new hostname:
# hostname cucumber
# hostname
cucumber
#
permanent change open the file “/etc/hostname” with a text editor as user “root”, change the hostname, and save the file
permanent change for users of systemd open a terminal window, change to user root, and invoke the command “hostnamectl” as follows:
# hostnamectl set-hostname cucumber

The picture below illustrates this step using “hostnamectl”.

Being aware of side-effects

Still, it is half of the story. The file “/etc/hostname” is not the only place in which programs on your computer store the hostname. Using the “grep” command we find out which other files are affected, and need to be adjusted. The command below shows this for the hostname “debian95”:

# grep –color -l -r debian95 /*
/boot/grub/grub.cfg
/etc/hostname
/etc/hosts
/etc/wicd/wired-settings.conf
/etc/wicd/wireless-settings.conf
/etc/mailname
/etc/exim4/update-exim4.conf.conf
/etc/initramfs-tools/conf.d/resume
/etc/ssh/ssh_host_rsa_key.pub
/etc/ssh/ssh_host_ed25519_key.pub
/etc/ssh/ssh_host_ecdsa_key.pub
/etc/ssh/ssh_host_dsa_key.pub
/etc/fstab
/home/debian/.ssh/id_rsa.pub
…
#

The file “/etc/hosts” is essential for networking, and needs to be adjusted. Change “debian95” to “cucumber” to have the following result:

$ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 cucumber
# The following lines are desirable for IPv6 capable hosts ::1 localhost ip6-localhost
ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters $

Next, reload the network configuration as follows:

# invoke-rc.d hostname.sh start
# invoke-rc.d networking force-reload

In order to check your new network configuration you may ping your machine with the new hostname:

Et voila – it worked well. The final step is to check your applications according to the list above. The referring page in the Debian Wiki [2] gives you a good overview what to do with each application, and shall work as a reference guide for you.

Links and References

[1] FQDN, Wikipedia
[2] How To Change The Hostname, Debian Wiki

How to keep a Debian Network installation up-to-date

Frank Hofmann — Thu, 17 Jan 2019 10:40:42 +0000

The Linux distribution Debian GNU/Linux [1] is made available as different CD/DVD ISO images. These images are prepared to fit to the needs of different interests and usage cases — desktop environment, server, or mobile devices. At present, the following image variants are offered from the website of the Debian project and the according mirror network:

a full set of CD/DVD images that contains all the available packages[2]
a single CD/DVD image with a selection of packages that are tailor-made for a specific desktop environment — GNOME [3], XFCE [4], and for the commandline, only.
a smaller CD image for network-based installation [5]
a tiny CD image for network-based installation [5]
a live CD/DVD [6] in order to test Debian GNU/Linux before installing it
a cloud image [7]

Downloading the right image file depends on your internet connection (bandwidth), which combination of packages fits your needs, and your level of experience in order to setup and maintain your installation. All the images are available from the mirror network behind the website of the Debian project [8].

What is Debian Netinstall?

As already briefly discussed above a Netinstall image is a smaller CD/DVD image with a size between 150Mb and 300Mb. The actual image size depends on the processor architecture used on your system. Solely, the image contains the setup routines (called Debian Installer) for both text-only and graphical installation as well as the software packages in order to setup a very basic but working Debian GNU/Linux installation. In contrast, the tiny image with a size of about 120Mb contains the Debian Installer, and the network configuration, only.

During the setup, the Debian Installer will ask you which Apt repository you would like to use. An Apt repository is a place that provides the Debian software packages. The tools for package management will retrieve the selected software packages from this location, and install them locally on your system. In this case as an Apt repository we do not use the CD/DVD but a so-called package mirror. This package mirror is a server that is connected to the internet, and that is why internet access is required during the setting up of your system. Furthermore, installing new software or updating existing software packages needs to meet the same technical requirements as above — the packages are retrieved from the same Apt repository too.

Choosing the desired package mirror in Debian GNU/Linux 9

Apt Repositories

The address of the chosen Apt repository is stored in the file /etc/apt/sources.list. In general, this is a text file and contains several entries. According to the previously chosen package mirror it looks as follows:

deb http://ftp.us.debian.org/debian/ stretch main contrib
deb-src http://ftp.us.debian.org/debian/ stretch main contrib

deb http://security.debian.org/ stretch/updates main contrib
deb-src http://security.debian.org/ stretch/updates main contrib

# stretch-updates, previously known as 'volatile'
deb http://ftp.us.debian.org/debian/ stretch-updates main contrib

The first group of lines refers to regular software packages, the second group to the according security updates, and the third group to software updates for these packages. Each line refers to Debian packages (a line starting with deb), or Debian source packages (a line starting with deb-src). Source packages are of interest for you in case you would like to download the source code of the software you use.

The Debian GNU/Linux release is either specified by the alias name of the release — here it is Stretch from Toy Story [9] –, or its release state, for example stable, testing, or unstable. At the end of each line, main and contrib reflect the chosen package categories. The keyword main refers to free software, contrib refers to free software that depends on non-free software, and non-free indicates software packages that do not meet the Debian Free Software Guidelines (DFSG)[10].

Finding the right package mirror

Up until now our setup is based on static entries, only, that are not intended to change. This works well for computers that are kept mostly at the same place during their entire usage.

As of a Debian network installation, the right package mirror plays an important role. When choosing a package mirror take the following criteria into account:

your network connection
your geographic location
the desired availability of the package mirror
reliability

Experiences from managing Linux systems for the last decade show, that chosing a primary package mirror in the same country as the system works best. Such a package mirror should be network-wise nearby, and provide software packages for all the architectures we need. Reliability refers to the person, institute, or company that is responsible for the package mirror we retrieve software from.

A rather dynamic setup can be helpful for mobile devices such as laptops and notebooks. The two commands netselect [11] and netselect-apt [12] come into play. netselect simply expects a list of package mirrors, and validates them regarding availability, ping time as well as the packet loss between the package mirror and your system. The example below demonstrates this for five different mirrors. The last line of the output contains the result — the recommended package mirror is ftp.debian.org.

# netselect -vv ftp.debian.org http.us.debian.org ftp.at.debian.org download.unesp.br
ftp.debian.org.br netselect: unknown host ftp.debian.org.br
Running netselect to choose 1 out of 8 addresses.
...............................................................
128.61.240.89 141 ms 8 hops 88% ok ( 8/ 9) [ 284]
ftp.debian.org 41 ms 8 hops 100% ok (10/10) [ 73]
128.30.2.36 118 ms 19 hops 100% ok (10/10) [ 342]
64.50.233.100 112 ms 14 hops 66% ok ( 2/ 3) [ 403]
64.50.236.52 133 ms 15 hops 100% ok (10/10) [ 332]
ftp.at.debian.org 47 ms 13 hops 100% ok (10/10) [ 108]
download.unesp.br 314 ms 10 hops 75% ok ( 3/ 4) [ 836]
ftp.debian.org.br 9999 ms 30 hops 0% ok
73 ftp.debian.org
#

In contrast, netselect-apt uses netselect to find the best package mirror for your location. netselect-apt asks for the country (-c), the number of package mirrors (-t), the architecture (-a), and the release state (-n). The example below discovers the top-five package mirrors in France that offer stable packages for the amd64 architecture:

# netselect-apt -c france -t 5 -a amd64 -n stable
Using distribution stable.
Retrieving the list of mirrors from www.debian.org...

--2019-01-09 11:47:21-- http://www.debian.org/mirror/mirrors_full
Aufl√∂sen des Hostnamen ¬ªwww.debian.org (www.debian.org)¬´... 130.89.148.14,
5.153.231.4, 2001:41c8:1000:21::21:4, ...
Verbindungsaufbau zu www.debian.org (www.debian.org)|130.89.148.14|:80... verbunden.
HTTP-Anforderung gesendet, warte auf Antwort... 302 Found
Platz: https://www.debian.org/mirror/mirrors_full[folge]
--2019-01-09 11:47:22-- https://www.debian.org/mirror/mirrors_full
Verbindungsaufbau zu www.debian.org (www.debian.org)|130.89.148.14|:443... verbunden.
HTTP-Anforderung gesendet, warte auf Antwort... 200 OK
L√§nge: 189770 (185K) [text/html]
In ¬ª¬ª/tmp/netselect-apt.Kp2SNk¬´¬´ speichern.

/tmp/netselect-apt.Kp2SNk 100%[==========================================>]
185,32K 1,19MB/s in 0,2s

2019-01-09 11:47:22 (1,19 MB/s) - ¬ª¬ª/tmp/netselect-apt.Kp2SNk¬´¬´ gespeichert
[189770/189770

Choosing a main Debian mirror using netselect.
(will filter only for mirrors in country france)
netselect: 19 (19 active) nameserver request(s)...
Duplicate address 212.27.32.66 (http://debian.proxad.net/debian/,
http://ftp.fr.debian.org/debian/); keeping only under first name.
Running netselect to choose 5 out of 18 addresses.
.....................................................................................
............................................
The fastest 5 servers seem to be:

http://debian.proxad.net/debian/
http://debian.mirror.ate.info/
http://debian.mirrors.ovh.net/debian/
http://ftp.rezopole.net/debian/
http://mirror.plusserver.com/debian/debian/

Of the hosts tested we choose the fastest valid for HTTP:
http://debian.proxad.net/debian/

Writing sources.list.
Done.
#

The output is a file called sources.list that is stored in the directory you run the command from. Using the additional option “-o filename” you specify an output file with a name and path of your choice. Nevertheless, you can directly use the new file as a replacement for your original file /etc/apt/sources.list.

Software Strategy

Doing a setup from a smaller installation image gives you the opportunity to make decisions which software to use. We recommend to install what you need on your system, only. The less software packages are installed, the less updates have to be done. So far, this strategy works well for server, desktop systems, routers (specialized devices), and mobile devices.

Keeping your system up-to-date

Maintaining a system means taking care of your setup, and keeping it up-to-date. Install security patches and do software updates regularly, with the help of the package manager like apt.

Often the next step is forgotten — tidying up your system. This includes removing unused software packages, and cleaning the package cache that is located in /var/cache/apt/archives. In the first case the commands “apt autoremove”, “deborphan” [13] and “debfoster” [14] help — they detect unused packages, and let you specify which software shall be kept. Mostly, the removed packages belong to the categories library (lib and oldlib), or development (libdevel). The following example demonstrates this for the tool deborphan. The output columns represent the package size, the package category, the package name, and the package priority.

$ deborphan -Pzs
20 main/oldlibs mktemp extra
132 main/libs liblwres40 standard
172 main/libs libdvd0 optional
...
$

In order to remove the orphaned packages you can use the following command:

# apt remove $(deborphan)
...
#

Still, it will ask you to confirm before removal of the software packages. Next, cleaning the package cache needs to be done. You may either remove the files by “rm /var/cache/apt/archives/*.deb”), or use apt or apt-get as follows:

# apt-get clean

Dealing with Release Changes

In contrast to other Linux distributions, Debian GNU/Linux does not have a fixed release cycle. A new release is available about every two years. Version 10 is expected to be published in mid-2019.

Updating your existing setup is comparable easy. Take the following thoughts into account and follow these steps:

Read the documentation for the release change, the so-called Release Notes. They are available from the website of the Debian project, and also part of the image you have chosen before.
Have your credentials for administrative actions at hand.
Open a terminal, and run the next steps in a terminal multiplexer like screen [15] or tmux [16].
Backup the most important data of your system, and validate the backup for being complete.
Update your current package list using “apt-get update” or “apt update”.
Check your system for orphans and unused software packages using deborphan, or “apt-get autoremove”. Unused packages do not need to be updated.
Run the command “apt-get upgrade” to install the latest software updates.
Edit the file /etc/apt/sources.list, and set the new distribution name, for example from Stretch to Buster.
Update the package list using “apt update” or “apt-get update”.
Start the release change by running “apt-get dist-upgrade”. All existing packages are updated.

The last step may take a while, but leads to a new Debian GNU/Linux system. It might be helpful to reboot the system once in order to start with a new Linux kernel.

Conclusion

Setting up a network-based installation, and keeping it alive is simple. Follow the recommendations we gave you in this article, and using your Linux system will be fun.

Links and References

* [1] Debian GNU/Linux, http://debian.org/
* [2] Debian on CDs/DVDs, https://www.debian.org/CD/index.en.html
* [3] GNOME, https://www.gnome.org/
* [4] XFCE, https://xfce.org/
* [5] Installing Debian via the Internet, https://www.debian.org/distrib/netinst.en.html
* [6] Debian Live install images, https://www.debian.org/CD/live/index.en.html
* [7] Debian Official Cloud Images, https://cloud.debian.org/images/cloud/
* [8] Debian mirror network, https://cdimage.debian.org/
* [9] Stretch at the Pixar Wiki, http://pixar.wikia.com/wiki/Stretch
* [10] Debian Free Software Guidelines (DFSG), https://wiki.debian.org/DFSGLicenses
* [11] netselect Debian package, https://packages.debian.org/stretch/netselect
* [12] netselect-apt Debian package, https://packages.debian.org/stretch/netselect-apt
* [13] deborphan Debian package, https://packages.debian.org/stretch/deborphan
* [14] debfoster Debian package, https://packages.debian.org/stretch/debfoster
* [15] screen, https://www.gnu.org/software/screen/
* [16] tmux, https://github.com/tmux/tmux/wiki

Acknowledgements

The author would like to thank Axel Beckert and Zoleka Hatitongwe for their help and critical remarks while preparing this article.

Understanding Debian GNU/Linux Releases

Frank Hofmann — Thu, 10 Jan 2019 11:27:53 +0000

The universe of the Debian GNU/Linux distribution comes with its own odds and ends. In this article we explain what a release of Debian is, how it is named, and what are the basic criteria for a software package to become part of a regular release.

What is a Debian release?

Debian GNU/Linux is a non-commercial Linux distribution that was started in 1993 by Ian Murdock. Currently, it consists of about 51,000 software packages that are available for a variety of architectures such as Intel (both 32 and 64 bit), ARM, PowerPC, and others [2]. Debian GNU/Linux is maintained freely by a large number of contributors from all over the world. This includes software developers and package maintainers – a single person or a group of people that takes care of a package as a whole [3].

A Debian release is a collection of stable software packages that follow the Debian Free Software Guidelines (DFSG) [4]. These packages are well-tested and fit together in such a way that all the dependencies between the packages are met and you can install und use the software without problems. This results in a reliable operating system needed for your every-day work. Originally targeted for server systems it has no more a specific target (“The Universal OS”) and is widely used on desktop systems as well as mobile devices, nowadays.

In contrast to other Linux distributions like Ubuntu or Linux Mint, the Debian GNU/Linux distribution does not have a release cycle with fixed dates. It rather follows the slogan “Release only when everything is ready” [1]. Nethertheless, a major release comes out about every two years [8]. For example, version 9 came out in 2017, and version 10 is expected to be available in mid-2019. Security updates for Debian stable releases are provided as soon as possible from a dedicated APT repository. Additionally, minor stable releases are published in between, and contain important non-security bug fixes as well as minor security updates. Both the general selection and the major version number of software packages do not change within a release.

In order to see which version of Debian GNU/Linux you are running on your system have a look at the file /etc/debian_version as follows:

$ cat /etc/debian_version
9.6
$

This shows that the command was run on Debian GNU/Linux 9.6. Having installed the package “lsb-release” [14], you can get more detailed information by running the command “lsb_release -a”:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 9.6 (stretch)
Release: 9.6
Codename: stretch
$

What about these funny release names?

This shows that the command was run on Debian GNU/Linux 9.6. Having installed the package “lsb-release” [14], you can get more detailed information by running the command “lsb_release -a”:

You may have noted that for every Debian GNU/Linux release there is a funny release name. This is called an alias name which is taken from a character of the film series Toy Story [5] released by Pixar [6]. When the first Debian 1.x release was due, the Debian Project Leader back then, Bruce Perens, worked for Pixar [9]. Up to now the following names have been used for releases:

Debian 1.0 was never published officially, because a CD vendor shipped a development version accidentially labeled as “1.0” [10], so Debian and the CD vendor jointly announced that “this release was screwed” and Debian released version 1.1 about half a year later, instead.
Debian 1.1 Buzz (17 June 1996) – named after Buzz Lightyear, the astronaut
Debian 1.2 Rex (12 December 1996) – named after Rex the plastic dinosaur
Debian 1.3 Bo (5 June 1997) – named after Bo Peep the shepherd
Debian 2.0 Hamm (24 July 1998) – named after Hamm the piggy bank
Debian 2.1 Slink (9 March 1999) – named after the dog Slinky Dog
Debian 2.2 Potato (15 August 2000) – named after the puppet Mr Potato Head
Debian 3.0 Woody (19 July 2002) – named after the cowboy Woody Pride who is the main character of the Toy Story film series
Debian 3.1 Sarge (6 June 2005) – named after the Seargeant of the green plastic soldiers
Debian 4.0 Etch (8 April 2007) – named after the writing board Etch-A-Sketch
Debian 5.0 Lenny (14 February 2009) – named after the pull-out binocular
Debian 6.0 Squeeze (6 February 2011) – named after the green three-eyed aliens
Debian 7 Wheezy (4 May 2013) – named after Wheezy the penguin with the red bow tie
Debian 8 Jessie (25 April 2015) – named after the cowgirl Jessica Jane “Jessie” Pride
Debian 9 Stretch (17 June 2017) – named after the lila octopus
Debian 10 Buster (no release date known so far) – named after the puppy dog from Toy Story 2

As of the beginning of 2019, the release names for two future releases are also already known [8]:

Debian 11 Bullseye – named after Bullseye, the horse of Woody Pride
Debian 12 Bookworm – named after Bookworm, the intelligent worm toy with a built-in flashlight from Toy Story 3.

Relation between alias name and development state

New or updated software packages are uploaded to the unstable branch, first. After some days a package migrates to the testing branch if it fulfills a number of criterias. This later becomes the basis for the next stable release. The release of a distribution contains stable packages, only, that are actually a snapshot of the current testing branch.

At the same moment as a new release is out the so-far stable release becomes oldstable, and an oldstable release becomes the oldoldstable release. The packages of any end-of-life release get removed from the normal APT repositories and mirrors, and are transferred to the Debian Archive [11], and are no longer maintained. Debian is currently developing a site to search through archived packages at Historical Packages Search [12]. This site is though still under development and known to be not yet fully functional.

As with the other releases, the unstable branch has the alias name Sid which is short for “still in development”. In Toy Story, Sid is the name of the evil neighbours child who always damages the toys. The name Sid accurately describes the condition of a package in the unstable branch.

Additionally, there is also the “experimental” branch which is not a complete distribution but an add-on repository for Debian Unstable. This branch contains packages which do not yet fulfill the quality expectations of Debian unstable. Furthermore, packages are placed there in order to prepare library transitions so that packages from Debian unstable can be checked for build issues with a new version of a library without breaking Debian unstable.

The exprimental branch of Debian also has a Toy Story name – “RC-Buggy”. On the one hand this is Andy’s remote-controlled car, and on the other hand it abbreviates the description “contains release-critical bugs” [13].

Parts of the Debian GNU/Linux Distribution

Debian software packages are categorized by their license as follows:

main: entirely free
contrib: entirely free but the packages depend on non-free packages
non-free: free software that does not conform to the Debian Free Software Guidelines (DFSG)

An official release of Debian GNU/Linux consists of packages from the main branch, only. The packages classified under contrib and non-free are not part of the release, and seen as additions that are just made available to you. Which packages you use on your system is defined in the file /etc/apt/sources.list as follows:

$ cat /etc/apt/sources.list deb
http://ftp.us.debian.org/debian/
stretch main contrib non-free
deb http://security.debian.org/
stretch/updates main contrib
non-free

# stretch-updates, previously
known as ‘volatile’ deb
http://ftp.us.debian.org/debian/
stretch-updates main contrib
non-free

# stretch-backports deb
http://ftp.debian.org/debian
stretch-backports main contrib
non-free

Debian Backports

From the listing above you may have noted the entry titled stretch-backports. This entry refers to software packages that are ported back from Debian testing to the current Debian stable release. The reason for this package repository is that the release cycle of a stable release of Debian GNU/Linux can be quite long, and sometimes a newer version of a software is required for a specific machine. Debian Backports [7] allows you to use packages from future releases in your current setup. Be aware that these packages might not be on par with the quality of Debian stable packages. Also, take into account that there might be the need to switch to a newer upstream release every once in a while even during a stable release cycle, as these packages follow Debian testing, which is a kind of a rolling release (similar to Debian unstable).Debian Backports

Debian Network Interface Setup

Frank Hofmann — Wed, 02 Jan 2019 15:51:17 +0000

The knowledge regarding the setup of a network interface in Debian GNU/Linux and Debian-related distributions is essential for every Linux engineer. In this article we explain to you where to find the appropriate information, and how to set it up for IPv4 IPv4 [2] and IPv6 [3]. The number of options is quite long but gives you a lot of flexibility for your specific situation.

Debian Network setup

The entire configuration for the network interfaces is stored in plain text files in a single directory named /etc/network. This directory contains a number of files and subdirectories to cover both the setup for IPv4 and IPv6.

interfaces and interfaces.d: general configuration per interface
if-down.d: scripts that are run in case the interface goes down
if-post-down.d: scripts that are run after the interface goes down
if-up.d: scripts that are run if the interface goes up
if-pre-up.d: scripts that are run before the interface goes up

The specific configuration is done per network interface. You can store all of it in the single file named interfaces, or as separate files in the directory interfaces.d. A typical IPv4 configuration from a portable device is shown below. It consists of one loopback interface (/dev/lo), an ethernet interface (/dev/eth0), and a wireless interface (/dev/wlan0). Line 1 refers to include all the scripts that are stored in the directory /etc/network/interfaces.d/. The lines 3 to 5 configure /dev/lo, lines 7 to 9 /dev/eth0, and line 11 the interface /dev/wlan0. A detailed explanation for the single commands is given below.

1 source /etc/network/interfaces.d/*
2
3 # The loopback network interface
4 auto lo
5 iface lo inet loopback
6
7 # The primary network interface
8 allow-hotplug eth0
9 iface eth0 inet dhcp
10
11 iface wlan0 inet dhcp

For other Debian GNU/Linux releases or distributions based on it the file “interfaces” may look similar but with different names for the network devices. As of Debian 9 “Stretch” the old network names like /dev/eth0, /dev/eth1 and /dev/wlan0 have gone away as the device name can change. The new names are similar to these ones — /dev/enp6s0, /dev/enp8s0, /dev/enp0s31f6, and /dev/enp5s0 [1]. For the network interfaces available have a look at the file “/sys/class/net” — in our case the interfaces are named /dev/lo and /dev/enp0s3.

The list of available network interfaces:

The configuration for these interfaces looks as follows. The image below is taken from a Debian GNU/Linux 9.5.’

The basic network configuration on a Debian GNU/Linux 9.5:

As the next step we will have a look at the single statements to configure a desired interface.

Debian Network Configuration in detail

Automatic enabling of an interface on startup

At startup of your system the setup scripts go through the configuration files for the network interfaces. In order to automatically enable an interface add the keyword “auto” (short for “allow-auto”) followed by the logical name of the interface(s). The setup scripts will call the command “ifup -a” (short for “–all”) that will activate the mentioned interfaces. The following line will bring up the loopback interface /dev/lo, only:

auto lo

The network interfaces are brought up in the order they are listed. The following line brings up /dev/lo followed by /dev/wlan0, and /dev/eth0, eventually.

auto lo wlan0 eth0

Activate an interface if the network cable is plugged in

The keyword “allow-hotplug” leads to a event based on a physical connection. The named network interface is activated as soon as the network cable is plugged in, and deactivated as soon as the the network cable is unplugged. The next line demonstrates this for the Ethernet interface /dev/eth0 (similar to line 8 of listing 1).

allow-hotplug eth0

Static interface configuration

In order to communicate with other computers in a network an interface is assigned an IP address. This address is obtained either dynamically (via DHCP) or set in a fixed way (static configuration). Therefore, the declaration of the interface starts with the keyword “iface” followed by the logical name of the network interface, the connection type, and the method used to obtain the IP address. The next example shows this for the network interface /dev/eth0 with the static IPv4 address 192.168.1.5.

iface eth0 inet static
address 192.168.1.5
netmask 255.255.255.0
gateway 192.168.1.1

After the interface declaration you are invited to specify a number of options (option name in brackets). This includes values such as the IP address (address), the netmask (netmask), the broadcast range (broadcast), the routing metric for the default gateway (metric), the default gateway (gateway), the address of the other end point (pointtopoint), the link local address (hwaddress), the packet size (mtu) as well as the address validity scope (scope). The next example shows the configuration for IPv6 for the network interface /dev/enp0s3 [4].

iface enp0s3 inet6 static
address fd4e:a32c:3873:9e59:0004::254
netmask 80
gateway fd4e:a32c:3873:9e59:0004::1

Dynamic interface configuration via DHCP

Connecting to different networks requires flexibility. The Dynamic Host Control Protocol (DHCP) [5] makes this flexibility possible and the network scripts assign the IP address to the network interface that is handed over from the DHCP server. The following line demonstrates this for the wlan interface named /dev/wlan0:

iface wlan0 inet dhcp

#For IPv6 use this line, instead:
iface wlan0 inet6 dhcp

Similar to the static configuration from above a number of options are possible to be set. These options depend on your DHCP setup. Among others the list includes the hostname to be requested (hostname), the metric for added routes (metric), the preferred lease time in hours or seconds (leasehours, leasetime), the client identifier (client), or the hardware address (hwaddress).

Other options

The configuration file /etc/interfaces also allows setups for the Bootstrap Protocol (BOOTP) [6] (bootp), PPP (ppp) as well as IPX [7].

Showing the interface configuration

Up to the release 8 of Debian GNU/Linux use the command “/sbin/ifconfig” to display the interface configuration. See the configuration for the first ethernet interface below.

Interface configuration using ifconfig:

From the release 9 onwards, the command “ifconfig” is no longer preinstalled, and replaced by its predecessor “ip”. Use the command “ip addr show”, instead.

Interface configuration using ip:

Enabling and disabling an interface

As already described above the option “auto” enables an interface on startup, automatically. There are two commands to enable and disable an interface, manually. Up to Debian 8, use “ifconfig eth0 up” or “ifup eth0” to enable the interface. From Debian 9, use “ifup eth0”, only. The counterparts are “ifconfig eth0 down” and “ifdown eth0”. The image below shows the default output when enabling an interface.

Interface activation using ifup:

Adding further options

It is possible to add further action in case an interface is activated or deactivated. These scripts are called if-pre-up and if-post-down scripts and come into play before enabling and after disabling an interface.

The next example demonstrates this in combination with a firewall that is active in case the interface is active, too. In line 3 the script /usr/local/sbin/firewall-enable.sh is called before the interface is activated (hence the tag “pre-up”, and in line 4 the script “/usr/local/sbin/firewall-disable.sh” is called after the interface is deactivated.

1 allow-hotplug eth0
2 iface         eth0 inet dhcp
3               pre-up    /usr/local/sbin/firewall-enable.sh
4               post-down /usr/local/sbin/firewall-disable.sh

Conclusion

The basic configuration of network interfaces in Debian GNU/Linux is comparable easy — a few lines of code, and it is done. For more information regarding additional options you may have a look at the resources given below.

Links and References

[1] Debian Wiki, Network Configuration
[2] IPv4, Wikipedia
[3] IPv6, Wikipedia
[4] Debian Static Ip IPv4 and IPv6
[5] Dynamic Host Control Protocol (DHCP), Wikipedia
[6] Bootstrap Protocol (BOOTP), Wikipedia
[7] Internetwork Packet Exchange (IPX), Wikipedia

Thanks

The author would like to thank Axel Beckert for his help and critical comments while preparing this article.

Debian Package Dependencies

Frank Hofmann — Mon, 10 Dec 2018 12:23:00 +0000

For Linux distributions such as Debian GNU/Linux, there exist more than 60.000 different software packages. All of them have a specific role. In this article we explain how does the package management reliably manage this huge number of software packages during an installation, an update, or a removal in order to keep your system working and entirely stable.

For Debian GNU/Linux, this refers to the tools apt, apt-get, aptitude, apt-cache, apt-depends, apt-rdepends, dpkg-deb and apt-mark.

Availability of software packages

As already said above, a Linux distribution consists of tons of different software packages. As of today software is quite complex, and that’s why it is common to divide software into several single packages. These packages can be categorized by functionality or by role such as binary packages, libraries, documentation, usage examples as well as language-specific collections and provide a selected part of the software, only. There is no fixed rule for it, and the division is made by either the development team of a tool, or the package maintainer who takes care of the software package for your Linux distribution. Using aptitude, Figure 1 lists the packages that contain the translations for the different languages for the webbrowser Mozilla Firefox.

Figure 1: aptitude-firefox.png

This way of working makes it possible that each package can be maintained by a different developer or as an entire team. Furthermore, the division into single components allows other software packages to make use of it for their own purposes too. A required functionality can be applied and does not need to be reinvented.

Package Organization

The package management tools on the Debian GNU/Linux distribution take constantly care that the dependencies of the installed packages are met completely. This is especially the case if a software package is meant to be installed, updated, or deleted on or from your system. Missing packages are added to the system, or installed packages are removed from the system in case they are no longer required. Figure 2 demonstrates this for the removal of the package ‘mc-data’ using ‘apt-get’. The package ‘mc-data’ recommends to automatically remove the package ‘mc’, too, because it does not make sense any more to be installed without ‘mc-data’.

Figure 2: apt-get-remove-mc.png

Package marks and flags

During its work the package management tools respect the package flags and marks that are set. They are either set automatically, or set manually by the system administrator. Especially this behaviour refers to the flag ‘essential package’ that is set for packages that should not be removed. A clear warning is issued before you do that (see Figure 3).

Figure 3: apt-get-remove.png

Also, the three marks ‘automatic’, ‘manual’ and ‘hold’ are taken into account. They mark a package as being automatically installed, manually installed, or must not be updated (hold the current version). A software package is either marked ‘automatic’ or ‘manual’ but not both.

Among others, the command ‘apt-mark’ handles the marks and flags using the following subcommands:

auto: set a package as automatically installed
hold: hold the current version of the package
manual: set a package as manually installed
showauto: show the automatically installed packages
showmanual: show the manually installed packages
showhold: list the packages that are on hold
unhold: remove the hold flag for the given package

In order to list all the manually installed packages issue this command:

$ apt-mark showmanual
abiword
abs-guide
ack-grep
acl
acpi
…
$

In order to hold a package version use the subcommand ‘hold’. The example below shows this for the package ‘mc’.

# apt-mark hold mc
mc set on hold
#

The subcommand ‘showhold’ lists the packages that are on hold (in our case it is the package ‘mc’, only):

# apt-mark showhold
mc
#

Using an alternative method titled ‘apt pinning’, packages are classified by priorities. Apt applies them in order to decide how to handle this software package and the versions that are available from the software repository.

Package description

Every software package comes with its own package description that is standardized. Among other fields this description explicitly specifies which further package(s) it depends on. Distribution-specific tools extract this information from the package description, and compute and visualize the dependencies for you, then. The next example uses the command ‘apt-cache show’ in order to display the package description of the package ‘poppler-utils’ (see Figure 4).

Figure 4: package-description-poppler-utils.png

The package description contains a section called ‘Depends’. This section lists the other software packages plus version number that the current package depends on. In Figure 4 this section is framed in red and shows that ‘poppler-utils’ depends on the packages ‘libpoppler64’, ‘libc6’, ‘libcairo2’, ‘libfreetype6’, ‘liblcms2-2’, ‘libstdc++6’ and ‘zlib1g’.

Show the package dependencies

Reading the package description is the hard way to figure out the package dependencies. Next, we will show you how to simplify this.

There are several ways to show the package dependencies on the command line. For a deb package as a local file use the command ‘dpkg-deb’ with two parameters – the file name of the package, and the keyword ‘Depends’. The example below demonstrates this for the package ‘skypeforlinux-64.deb’:

$ dpkg-deb -f Downloads/skypeforlinux-64.deb Depends
gconf-service, libasound2 (>= 1.0.16), libatk1.0-0 (>= 1.12.4), libc6 (>= 2.17),
libcairo2 (>= 1.2.4), libcups2 (>= 1.4.0), libexpat1 (>= 2.0.1),
libfreetype6 (>= 2.4.2), libgcc1 (>= 1:4.1.1), libgconf-2-4 (>= 3.2.5),
libgdk-pixbuf2.0-0 (>= 2.22.0), libglib2.0-0 (>= 2.31.8), libgtk2.0-0 (>= 2.24.0),
libnspr4 (>= 2:4.9-2~), libnss3 (>= 2:3.13.4-2~), libpango-1.0-0 (>= 1.14.0),
libpangocairo-1.0-0 (>= 1.14.0), libsecret-1-0 (>= 0.7), libv4l-0 (>= 0.5.0),
libx11-6 (>= 2:1.4.99.1), libx11-xcb1, libxcb1 (>= 1.6), libxcomposite1 (>= 1:0.3-1),
libxcursor1 (>> 1.1.2), libxdamage1 (>= 1:1.1), libxext6, libxfixes3,
libxi6 (>= 2:1.2.99.4), libxrandr2 (>= 2:1.2.99.3), libxrender1, libxss1,
libxtst6, apt-transport-https, libfontconfig1 (>= 2.11.0), libdbus-1-3 (>= 1.6.18),
libstdc++6 (>= 4.8.1)
$

In order to do the same for an installed package use ‘apt-cache’. The first example combines the subcommand ‘show’ followed by the name of the package. The output is sent to the ‘grep’ command that filters the line ‘Depends’:

$ apt-cache show xpdf | grep Depends
Depends: libc6 (>= 2.4), libgcc1 (>= 1:4.1.1), libpoppler46 (>= 0.26.2),
libstdc++6 (>= 4.1.1), libx11-6, libxm4 (>= 2.3.4), libxt6
$

The command ‘grep-status -F package -s Depends xpdf’ will report the same information.

More specific, the second example again uses ‘apt-cache’ but with the subcommand ‘depends’, instead. The subcommand is followed by the name of the package:

$ apt-cache depends xpdf
xpdf
Depends: libc6
Depends: libgcc1
Depends: libpoppler46
Depends: libstdc++6
Depends: libx11-6
Depends: libxm4
Depends: libxt6
Recommends: poppler-utils
poppler-utils:i386
Recommends: poppler-data
Recommends: gsfonts-x11
Recommends: cups-bsd
cups-bsd:i386
Collides with:
Collides with:
Collides with:
Collides with:
Replaces:
Replaces:
Replaces:
Replaces:
Collides with: xpdf:i386
$

The list above is quite long, and can be shortened using the switch ‘-i’ (short for ‘–important’):

$ apt-cache depends -i xpdf
xpdf
Depends: libc6
Depends: libgcc1
Depends: libpoppler46
Depends: libstdc++6
Depends: libx11-6
Depends: libxm4
Depends: libxt6
$

The command ‘apt-rdepends’ does the same but with version information if specified in the description:

$ apt-rdepends xpdf
Reading package lists… Done
Building dependency tree
Reading state information… Done
xpdf
Depends: libc6 (>= 2.4)
Depends: libgcc1 (>= 1:4.1.1)
Depends: libpoppler46 (>= 0.26.2)
Depends: libstdc++6 (>= 4.1.1)
Depends: libx11-6
Depends: libxm4 (>= 2.3.4)
Depends: libxt6
libc6
Depends: libgcc1
…
$

The command ‘aptitude’ works with switches, too. For dependencies, use the switch ‘~R’ followed by the name of the package. Figure 5 shows this for the package ‘xpdf’. The letter ‘A’ in the second column of the output of ‘aptitude’ identifies the package as being automatically installed.

Figure 5: aptitude-rdepends.png

Package dependencies can be a bit tricky. It may help to show package dependencies graphically. Use the command ‘debtree’ followed by the name of the package in order to create a graphical representation of the package dependencies. The tool ‘dot’ from the Graphviz package transforms the description into an image as follows:

$ debtree xpdf | dot -Tpng > graph.png

In Figure 6 you see the created PNG image that contains the dependency graph.

Figure 6: dot.png

Show the reverse dependencies

Up to now we displayed we have answered the question which packages are required for a package. There is also the other way round – so-called reverse dependencies. The next examples deal with the package as well as the packages that depend on it. Example number one uses ‘apt-cache’ with the subcommand ‘rdepends’ as follows:

Packages, that depend on other packages are marked with a pipe symbol. These package do not need to be installed on your system but have to be listed in package database.

The next example uses ‘aptitude’ to list the packages that have a hard reference to the package ‘xpdf’ (see Figure 7).

Figure 7: aptitude-search.png

Validate the installation for missing packages

‘Apt-get’ offers the subcommand ‘check’ that allows to validate the installation. If you see the following output no packages are missing:

# apt-get check
Reading package lists… Done
Building dependency tree
Reading state information… Done
#

Conclusion

Finding package dependencies works well with the right tools. Using them properly helps you to understand why packages are installed, and which ones might be missing.

Links and References

Axel Beckert, Frank Hofmann: Das Debian-Paketmanagement-Buch, https://www.dpmb.org/

Understanding vm.swappiness

Frank Hofmann — Tue, 07 Aug 2018 05:01:42 +0000

The Linux kernel is a rather complex piece of software with a long list of components such as modules, interfaces and configuration files [1]. These components can be configured with specific values in order to achieve a desired behaviour or mode of operation of the component [2,3,4]. Subsequently, this setup directly influences both the behaviour and the performance of your Linux system as a whole.

The current values of the Linux kernel and its components are made accessible using a special interface — the /proc directory [5]. This is a virtual file system in which the single files are filled with values in real time. The values represent the actual state the Linux kernel is in. You can access the individual files in the /proc directory using the cat command as follows:

$ cat /proc/sys/net/core/somaxconn
128
$

One of these kernel parameters is called vm.swappiness. It “controls the relative weight given to swapping out of runtime memory, as opposed to dropping memory pages from the system page cache” [6]. Starting with Linux kernel releases 2.6 this value was introduced. It is stored in the file /proc/sys/vm/swappiness .

Using Swap

The use of swap [6] was an essential part of using smaller UNIX machines in the early 1990s. It is still useful (like having a spare tire in your vehicle) when nasty memory leaks interfere with your work. The machine will slow down but in most cases will still be usable to finish its assigned task. Free software developers have been making great strides to reduce and eliminate program errors so before changing kernel parameters consider updating to a newer version of your application and related libraries first.

If you run numerous tasks, then the inactive tasks will be swapped out to disk, making better use of memory with your active tasks. Video editing and other large memory consuming applications often have recommended amounts of memory and disk space. If you have an older machine which cannot have a memory upgrade, then making more swap available might be a good temporary solution for you (see [6] on how to learn more about that).

The swapping can happen on a separate partition or on a swap file. The partition is faster and favored by many database applications. The file approach is more flexible (see the dphys-swapfile package in Debian GNU/Linux [7]). Having more than one physical device for swapping allows the Linux kernel to choose the device that is most rapidly available (lower latency).

vm.swappiness

The default value of vm.swappiness is 60 and represents the percentage of the free memory before activating swap. The lower the value, the less swapping is used and the more memory pages are kept in physical memory.

The value of 60 is a compromise that works well for modern desktop systems. A smaller value is a recommended option for a server system, instead. As the Red Hat Performance Tuning manual points out [8], a smaller swappiness value is recommended for database workloads. For example, for Oracle databases, Red Hat recommends a swappiness value of 10. In contrast, for MariaDB databases, it is recommended to set swappiness to a value of 1 [9].

Changing the value directly influences the performance of the Linux system. These values are defined:

* 0: swap is disable
* 1: minimum amount of swapping without disabling it entirely
* 10: recommended value to improve performance when sufficient memory exists in a system
* 100: aggressive swapping

As shown above the cat command helps to read the value. Also, the sysctl command gives you the same result:

# sysctl vm.swappiness
vm.swappiness = 60
#

Keep in mind that the sysctl command is only available to an administrative user. To set the value temporarily set the value in the /proc file system as follows:

# echo 10 > /proc/sys/vm/swappiness

As an alternative you may use the sysctl command as follows:

# sysctl -w vm.swappiness=10

To set the value permanently, open the file /etc/sysctl.conf as an administrative user and add the following line:

vm.swappiness = 10

Conclusion

More and more linux users are running virtual machines. Each one has its own kernel in addition to the hypervisor that actually controls the hardware. Virtual machines have virtual disks created for them, so changing the setting inside the virtual machine will have indeterminate results. Experiment first with changing the values of the hypervisor kernel, as it actually controls the hardware in your machine.

For older machines which can no longer be upgraded (already have maximum supported memory) you can consider placing a small solid state disk in the machine to use it as an additional swap device. This will obviously become a consumable as memory cells fail from lots of writes, but can extend the life of a machine for a year or more for very low cost. The lower latency and quick reads will give much better performance than swapping to an ordinary disk, giving intermediate results to RAM. This should allow you to use somewhat lower vm.swappiness values for optimal performance. You will have to experiment. SSD devices are changing rapidly.

If you have more than one swap device, consider making it a RAID device to stripe data across the available devices.

You can make changes in swappiness without rebooting the machine, a major advantage over other operating systems.

Try to include only the services you need for your business. This will reduce memory requirements, improve performance and keep everything simpler.

A final note: You will be adding load to your swap devices. You will want to monitor the temperatures of them. An overheated system will lower its CPU frequency and slow down.

Acknowledgements

The author would like to say a special thanks to Gerold Rupprecht and Zoleka Hatitongwe for their critical remarks and comments while preparing this article.

Links and References

* [1] Linux Kernel Tutorial for Beginners, https://linuxhint.com/linux-kernel-tutorial-beginners/

* [2] Derek Molloy: Writing a Linux Kernel Module — Part 1: Introduction, http://derekmolloy.ie/writing-a-linux-kernel-module-part-1-introduction/

* [3] Derek Molloy: Writing a Linux Kernel Module — Part 2: A Character Device, http://derekmolloy.ie/writing-a-linux-kernel-module-part-2-a-character-device/

* [4] Derek Molloy: Writing a Linux Kernel Module — Part 3: Buttons and LEDs, http://derekmolloy.ie/kernel-gpio-programming-buttons-and-leds/

* [5] Frank Hofmann: Commands to Manage Linux Memory, https://linuxhint.com/commands-to-manage-linux-memory/

* [6] Frank Hofmann: Linux Kernel Memory Management: Swap Space, https://linuxhint.com/linux-memory-management-swap-space/

* [7] dphys-swapfile package for Debian GNU/Linux, https://packages.debian.org/stretch/dphys-swapfile

* [8] Red Hat Performance Tuning Guide, https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables

* [9] Configuring MariaDB, https://mariadb.com/kb/en/library/configuring-swappiness/

Introduction to Haroopad

Frank Hofmann — Sun, 08 Apr 2018 05:27:46 +0000

In one of our previous blog articles we have already given you an introduction to Markdown — an easy-to-write, clever and very flexible document description language. Markdown allows you to generate HTML documents as well as to maintain technical documentation, blog articles, and presentations. Furthermore, we talked about writing Markdown documents using the text editors PileMD and EME. In this article we focus on Haroopad which claims to be the next document processor for the Markdown language licensed under GPLv3.As long-term writers we have clearly figured out which tools help us in order to be most productive to create text documents — either working on the command-line, or using a graphical user interface (GUI). As an example, Pandoc, Asciidoc and Asciidoctor are command-line tools to transform Markdown documents into HTML files whereas PileMd, Vim-gtk, Atom and Haroopad follow an approach based on a GUI, instead. Figure 1 shows how Haroopad looks like — a dual-panel approach with the source code of the document on the left, and the document translated into HTML on the right side.

Figure 1

Installation and setup

Haroopad aims to give you the same experiences in editing regardless of the platform you are working on. Developed by the Korean programmer Rhio Kim, Haroopad is available from the project website for Microsoft Windows, Mac OS X, and Linux as binary packages for 32 and 64 bit systems. For this article we have tested the package for Debian GNU/Linux 9 (64 bit) and downloaded the according deb package.

To install the Haroopad package on your machine use the following command (as user root or via sudo command):

$ dpkg -i haroopad-v0.13.1-x64.deb

In our test environment just a single software package was missing — the GNOME configuration library named libgconf-2-4. Use either apt, apt-get or aptitude to install the missing package:

$ apt-get install libgconf-2-4

Haroopad itself is based on NodeJS/webkit and is fully documented online. The Haroopad binary package does not contain a manual page, nor has Haroopad help options available as it is common for UNIX/Linux programs like –help. In order to have a look at both the source code and the documentation you will also have to download the corresponding package from GitHub.

Once you have completed the installation you can either start Haroopad by selecting the entry from the Development section of the software menu on your Linux desktop accordingly or by using the following command in a terminal:

$ haroopad

Similar to figure 1 the Haroopad window opens and allows you to edit a new document right away. As already explained above the left panel contains the edit window (the Markdown source code of the document) and the right panel contains its translation that is synchronised with the source code of the document as soon as you have changed it. Above the panels you will find a menu with common items to open and close files, search for text by pattern, insert specific Markdown elements and adjust the way the Haroopad GUI looks like.

The bottom line of the Haroopad window (see Figure 2) contains several items that range from a help window to statistical information, donation buttons, publishing directly on various social media channels and display options. The spaces button allows you to adjust the tab width of the editor window and the column button switches between the way the text is displayed in the output document — as single, double, or three columns. The wheel at the right end allows you to toggle between a normal and a full-screen display.

Figure 2

Exporting documents

Once you are done with your document Haroopad offers to store it in different formats such as an email, as raw HTML, and HTML combined with CSS. The current version failed to export but the menu entry “File” -> “Save as” worked, and created a HTML/CSS page (see figure 3).

figure 3

Haroopad Experiences

What we like about Haroopad is that its complexity is made available in a very simple user interface combined with the What You See Is What You Get approach (WYSIWYG). Writing Markdown feels easy anyway but Haroopad simplifies it even a bit more. This includes auto-completion of lists as well as pre-defined text modules for inline code, text emphasis, links and blockquotes. Also, there is support for several Markdown dialects that are used in GitHub for example. If desired you can enable keybindings for Vi/Vim. Figure 4 shows the according Insert menu.

Figure 4

Haroopad is very customizable in terms of themes for the GUI as well as the general layout, the font size, text indentations and automated corrections if needed. Figure 5 shows the preferences dialog. You can extend the list of available themes by adding your own CSS-based layouts.

Figure 5

Haroopad can be used for scientific documents as well. Using the JavaScript engine MathJax, mathematical equations can be exported to the browser. Also, LaTeX output is supported.

On the downside of Haroopad is that some parts of the software package need further improvements and that the official documentation is in Korean. So it may be a bit hard for non-Korean speakers to find their way around Haroopad. Step by step the translation into English is done.

Also, the default theme is quite dark and makes it a bit complicated to read the source code of the document. A lighter theme could be an option, and improve the usability.

Conclusion

Haroopad simplifies your life a lot. It is quite stable and it is fun to use. Haroopad is under constant development for the given platforms. It is a powerful competitor of Atom, Remarkable and ReText. We are excited to see it grow. Well done!

Acknowledgements

The author would like to thank Mandy Neumeyer for her support while preparing this article.