Parsing linux command results and lessons learned

Over the last two years of my career, I've spent most of my time developing in-house software for my company's own use. Most of it involves managing our own systems infrastructure: servers all around the world and the hardware attached to them. I developed tools that ssh into servers to collect configuration settings and hardware serial numbers, record logs and other statistics in a DB, and so on. Different pieces of hardware are used for different projects. When a project comes to an end and its hardware is decommissioned, that hardware is usually stored in a cupboard in a data centre until it is needed again. One use of all the information we record is to track our assets and locate the necessary hardware when a new project is commissioned. For this reason, I've been using Linux commands and parsing their output more often than I ever did before.

I never liked the command line (I know, I'm a weird computer scientist). However, I've come to learn and practise many useful Linux commands, such as head, tail, grep and cat, and to pipe their results into one another.

Recently, I had to write a script to parse log files and load their content into our DB, email reports and automatically manage changes in different git repositories. In the process, I learnt a couple of things that I expect to be very useful in the future:

The SED command

The sed command in UNIX is a stream editor used to perform basic text transformations on a file, such as searching, find and replace, and regular expression matching. It can also take its input from a pipeline, modifying the stream without the need for a file. With sed you can edit files without opening them in an editor, and because it makes only one pass over the input, it is generally more efficient than interactive editors.

Example - Replacing or substituting a string: sed is mostly used to replace text in a file. Imagine we have an SQL dump file in which we need to change the DB schema that the tables belong to.

-- ...

INSERT INTO sessions.viewlog (user_id, viewed_user_id, app_id, time) VALUES (113, 65061, 4, '2020-06-04 11:55:44+01');
INSERT INTO sessions.viewlog (user_id, viewed_user_id, app_id, time) VALUES (2599, 60587, 8, '2020-06-04 11:56:29+01');
INSERT INTO sessions.viewlog (user_id, viewed_user_id, app_id, time) VALUES (32605, 65303, 2, '2020-06-04 11:59:21+01');
INSERT INTO sessions.viewlog (user_id, viewed_user_id, app_id, time) VALUES (118, 67628, 1, '2020-07-08 23:10:54+01');
INSERT INTO sessions.viewlog (user_id, viewed_user_id, app_id, time) VALUES (607, 65303, 2, '2020-06-04 11:59:56+01');

-- ...
usersDump.sql file content

The sed command below replaces "sessions" with "users" as the schema name at the start of each INSERT statement in the file.

sed -i -e 's/^INSERT INTO sessions\./INSERT INTO users./' usersDump.sql

This would transform every insert statement to be made in the "users" schema, rather than the "sessions" one.

-- ...

INSERT INTO users.viewlog (user_id, viewed_user_id, app_id, time) VALUES (113, 65061, 4, '2020-06-04 11:55:44+01');
INSERT INTO users.viewlog (user_id, viewed_user_id, app_id, time) VALUES (2599, 60587, 8, '2020-06-04 11:56:29+01');
INSERT INTO users.viewlog (user_id, viewed_user_id, app_id, time) VALUES (32605, 65303, 2, '2020-06-04 11:59:21+01');
INSERT INTO users.viewlog (user_id, viewed_user_id, app_id, time) VALUES (118, 67628, 1, '2020-07-08 23:10:54+01');
INSERT INTO users.viewlog (user_id, viewed_user_id, app_id, time) VALUES (607, 65303, 2, '2020-06-04 11:59:56+01');

-- ...
modified usersDump.sql file content
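Because -i overwrites the file, it can be worth previewing the substitution on a stream first. The following is only a sketch: rewrite_schema is a hypothetical helper name, and the single INSERT statement is a shortened sample rather than a line from the real dump. It relies on sed reading standard input when no file is named.

```shell
# rewrite_schema is a hypothetical helper: it applies the substitution
# from the example above to whatever arrives on standard input.
rewrite_schema() {
    sed -e 's/^INSERT INTO sessions\./INSERT INTO users./'
}

# Stream use: no file is read or modified, the statement comes from the pipeline.
printf '%s\n' "INSERT INTO sessions.viewlog (user_id) VALUES (113);" | rewrite_schema

# Preview use: redirect the dump in and inspect the result before using -i.
# rewrite_schema < usersDump.sql | head
```

Since the pattern is anchored with ^, lines that merely mention "sessions." elsewhere, such as SQL comments, pass through unchanged.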

The GREP command

The grep command is a filter that is used to search for lines matching a specified pattern and print the matching lines to standard output.

Example - Get the author line on the last commit on a git repository

If we want to see the information of the last commit in a git repository, we run the command git show, which prints something like the following:

commit 0240dcad7f389a2f18536a8d98547b70124b87c4 (HEAD -> master, origin/master, origin/HEAD)
Author: John Smith <John.Smith@johnsmith.com>
Date:   Sun May 3 10:39:38 2021 +0100

    Use default dates when not present

[...]

We can pipe that result onto grep so we just get the author line:

git show | grep Author:

so we get:

Author: John Smith <John.Smith@johnsmith.com>

I was already very much used to using this command but with this last project I learnt the following important thing (from the docs):

"the exit status is 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred"

I normally check that the exit status is 0 to make sure there were no errors. In this case, however, not finding the pattern was not an error; it just meant that some non-crucial information was not available. It had not occurred to me that an exit status of 1 could be acceptable, so it took me a while to realise that I also needed to accept the value 1.
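A minimal shell sketch of that check, assuming the input arrives on standard input. The helper name find_lines and the Signed-off-by pattern are made up for illustration, and the sample text stands in for the output of git show.

```shell
# find_lines PATTERN: hypothetical helper that filters stdin with grep.
# Status 0 (match) and 1 (no match) both count as success here;
# anything else means grep itself failed and is passed on to the caller.
find_lines() {
    rc=0
    grep -e "$1" || rc=$?
    if [ "$rc" -le 1 ]; then
        return 0
    fi
    echo "grep failed with exit status $rc" >&2
    return "$rc"
}

# Sample input standing in for the output of `git show`.
commit_info='commit 0240dcad7f389a2f18536a8d98547b70124b87c4
Author: John Smith <John.Smith@johnsmith.com>'

author=$(printf '%s\n' "$commit_info" | find_lines '^Author:')
signer=$(printf '%s\n' "$commit_info" | find_lines '^Signed-off-by:')

# An empty result just means the information was not there.
echo "author line: ${author:-not available}"
echo "signer line: ${signer:-not available}"
```

Capturing the status with `|| rc=$?` also keeps the helper usable in scripts that run with set -e, where a bare grep returning 1 would otherwise abort the script.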

Regular expression tester

I used to use a simple PHP tester to test my regular expressions, but for this project I used Python. I found, however, a very useful tool for testing regular expressions in a number of programming languages: https://regex101.com/.

This tool lets you enter your regular expression and a test string, validate the expression, check it for syntax errors and evaluate the matches online.
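As an illustration of the kind of pattern worth trying in such a tester first, here is a sketch that pulls the email address out of the Author line from the git example. The helper name author_email is hypothetical, and the pattern uses only constructs that behave the same in sed's extended syntax and in the flavours the tester offers.

```shell
# author_email is a hypothetical helper: extract the address between < and >
# from an "Author: Name <email>" line read on stdin.
# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so non-matching lines produce nothing.
author_email() {
    sed -n -E 's/^Author: .* <([^<>]+)>$/\1/p'
}

printf '%s\n' 'Author: John Smith <John.Smith@johnsmith.com>' | author_email
```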

phpseclib Library

As I mentioned earlier, as part of this project I needed to ssh into different servers, run Linux commands and parse their output. For this purpose I've used phpseclib, a library that provides pure-PHP implementations of SSH2 and other network protocols. It is currently at version 3, and its only requirement is PHP 5.6+. Its docs are very clear, and it can be installed via Composer:

composer require phpseclib/phpseclib:~3.0

Apart from SSH2, the library also provides implementations of SFTP, X.509, an arbitrary-precision integer arithmetic library, Ed25519 / Ed448 / Curve25519 / Curve448, ECDSA / ECDH (with support for 66 curves), RSA (PKCS#1 v2.2 compliant), DSA / DH, DES / 3DES / RC4 / Rijndael / AES / Blowfish / Twofish / Salsa20 / ChaCha20, GCM / Poly1305.

As always, when I integrate a sophisticated library into my application, I tend to create a Façade to simplify the way I interface with the bits of functionality I need:

<?php

use phpseclib3\Net\SSH2; // phpseclib 3.x classes live under the phpseclib3 namespace

class Ssh
{
	/**
	 * @var SSH2
	 */
	private $ssh2;

	public function isConnected(): bool
	{
		if ($this->ssh2 instanceof SSH2)
		{
			return $this->ssh2->isConnected();
		}

		return false;
	}

	public function isAuthenticated(): bool
	{
		if ($this->ssh2 instanceof SSH2)
		{
			return $this->ssh2->isAuthenticated();
		}

		return false;
	}

	public function connect($hostname, $username, $password): void
	{
		$this->disconnect();
		$this->ssh2 = new SSH2($hostname);
		$this->ssh2->login($username, $password);
	}

	public function disconnect(): void
	{
		if ($this->ssh2)
		{
			$this->ssh2->disconnect();
		}
	}

	public function setTimeout($timeout)
	{
		$this->ssh2->setTimeout($timeout);
	}

	public function run($command): array
	{
		$this->ssh2->setTimeout(0);
		$output = $this->ssh2->exec($command);
		$exitCode = $this->ssh2->getExitStatus();

		return [
			"output" => $output,
			"exitCode" => $exitCode
		];
	}
}
ssh.php

With this simple class, I have a limited but straightforward interface that provides all the functionality I need from phpseclib, making my code simpler and more readable.

Conclusion

These are some of the main "lessons" I learnt from my most recent project. It is a simple example of how a one-week development project, even a straightforward one, can teach a lot of apparently unrelated concepts, and of how we can stack those minor pieces of knowledge and use them in later projects.
