My goal

I always wanted to have one single server running all the tasks I need, but I also wanted a way to separate the services and make everything as secure as possible. So I decided to go down the route of building unprivileged containers, one specific piece of software per container, and networking them all together. This post describes how I did that on my netcup-hosted Debian 9 machine.
As my hoster only assigned one IPv4 address and doesn't route the whole /64 IPv6 block to my VPS, I also had to do some extra work on the networking side.

My concept

At first, I just jumped right into it: installing LXC, using the built-in network bridge service and creating privileged containers left and right. After some time, I noticed that this wasn't a very good or structured way to do things, so I purged everything and started from scratch by first drawing up a proper concept. It looked something like this:

+-----------------------------------------+
|                                         |
| IPv4 / IPv6 interface                   |
|                                         |
+-----------------------------------------+
|                                         |
| iptables filter nat and routing         |
|                                         |
+-----------------------------------------+
|                                         |
| IPv4 / IPv6 bridge                      |
+-----------------------------------------+
|                                         |
| unprivileged user starting veth         |
| interfaces for every container          |
|                                         |
+-------------+-------------+-------------+
|             |             |             |
| container01 | container02 | container03 |
|             |             |             |
+-------------+-------------+-------------+

Doing it

For containers to make sense in my use case, there needs to be a working virtual network on my host. So, the first step was setting that up.

Step 001: Setting up the basic network

Virtual networking on a Linux host basically works by creating virtual network interfaces to and from which the containers' traffic gets routed.

Setting up the bridge

So, the first thing I did was configure a network bridge. At the time of writing, the built-in lxc-net tool didn't support IPv6, so I decided to take matters into my own hands and do it all manually. There wasn't much to it other than installing bridge-utils and adding a few new lines to the interfaces file, like this:

> /etc/network/interfaces

...
auto lxcbr0
iface lxcbr0 inet static
	hwaddress aa:01:10:19:96:01
	address 10.96.0.1/16
	bridge_ports none
	bridge_fd 5

iface lxcbr0 inet6 static
	address 2a03:4000:24:e3:1096::1/80

After that was done, I restarted the network service and had a brand new network bridge configured, as seen in the ip address output:

> ip address

3: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
	link/ether aa:01:10:19:96:01 brd ff:ff:ff:ff:ff:ff
	inet 10.96.0.1/16 brd 10.96.255.255 scope global lxcbr0
		valid_lft forever preferred_lft forever
	inet6 2a03:4000:24:e3:1096::1/80 scope global
		valid_lft forever preferred_lft forever
	inet6 fe80::30b0:87ff:fe01:7cdd/64 scope link
		valid_lft forever preferred_lft forever

Enabling packet forwarding

The Linux kernel doesn't allow packet forwarding in the network stack by default, so I needed to activate it. For that I ran these two commands and uncommented the following lines in the sysctl configuration to make the changes persistent across reboots.

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1
> /etc/sysctl.conf

...
net.ipv4.ip_forward=1
...
net.ipv6.conf.all.forwarding=1
...

I also restarted the network service afterwards, just for good measure. Now packets could be forwarded via iptables rules.
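
To double-check, sysctl can read the values back; both should report 1:

sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding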

Configuring iptables

To enable any kind of communication from containers connected to this bridge to the outside world, I needed to add some iptables rules. So, I installed iptables-persistent and, after running the following commands, added some entries to the resulting files.

iptables-save > /etc/iptables/rules.v4
ip6tables-save > /etc/iptables/rules.v6
> /etc/iptables/rules.v4

*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 10.96.0.0/16 ! -d 10.96.0.0/16 -j SNAT --to-source <public-ipv4-of-server>
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -m limit --limit 10/sec -j ACCEPT
-A INPUT -p tcp -m tcp --dport 8192 -j ACCEPT
COMMIT

What I did here was first configure a source NAT, so traffic routed from the bridge to the outside world goes through NAT and uses the public IP of the host. The reason I chose SNAT over MASQUERADE is that I have a static IPv4 address, which makes it slightly faster than having the MASQUERADE rule look up the IP address of the NIC for every packet.
The next thing I configured was the filter, where I drop any packets that aren't:

  • coming from the loopback interface
  • related to an existing connection
  • ICMP packets within a rate limit of 10 per second
  • or trying to connect to the SSH port

As I add containers that need to be reachable from the outside, I will add PREROUTING rules to the NAT section of this file. That makes it possible to forward connections on a specific port to the internal address of a container.
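
For example, a rule like this in the *nat section (with a hypothetical web container at 10.96.0.11) would forward incoming HTTP connections to that container:

-A PREROUTING -d <public-ipv4-of-server>/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.96.0.11:80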

> /etc/iptables/rules.v6

*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -m limit --limit 10/sec -j ACCEPT
-A INPUT -d <public-ipv6-of-server>/128 -p tcp -m tcp --dport 22 -j ACCEPT
COMMIT

The IPv6 configuration for iptables is a bit simpler because I didn't need any NAT; I can give the containers addresses out of the /64 block that was assigned to my host. I just needed to do some trickery with it, but I'll cover that later.
I also configured the same filter rules as I did for IPv4.

After adding all necessary lines to those two files and making sure they were completely correct, so I wouldn't get locked out of the system, I ran these two commands to apply the configuration:

iptables-restore < /etc/iptables/rules.v4
ip6tables-restore < /etc/iptables/rules.v6

Then I made sure that the systemd service netfilter-persistent is enabled, so the changes are applied on every boot.
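
In case it wasn't enabled by the package already, this takes care of it:

systemctl enable netfilter-persistent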

Circumventing the hoster's IPv6 restriction

Because my hoster doesn't route the whole /64 address block to my VPS, but instead uses the neighbour discovery protocol to figure out where to send packets, and because the Linux network stack by default doesn't answer neighbour solicitations for addresses behind a bridge, I had to use a tool called ndppd. This neighbour discovery protocol proxy daemon takes a neighbour solicitation message and answers it if a host with the requested address exists on a specific bridge.
This is the config I used for it.

> /etc/ndppd.conf

proxy ens3 {
	rule 2a03:4000:24:e3:1096::/80 {
		iface lxcbr0
	}
} 

What this does is answer neighbour solicitation messages for addresses in the specified network that exist on the specified interface, in this case my bridge. That way, anything connected to the bridge is reachable from the outside and can communicate with the outside world.
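
On Debian the daemon is available as a package, so installing and enabling it should be all that's needed (assuming the package's default service name):

apt install ndppd
systemctl enable --now ndppd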

Router advertisement for the bridge

Another useful thing I did, so I don't have to configure the IPv6 gateway manually on every container, was to install and configure radvd. It provides router advertisements on the configured interfaces.

> /etc/radvd.conf

interface lxcbr0 {
	AdvSendAdvert on;
	AdvManagedFlag off;
	AdvOtherConfigFlag off;

	prefix 2a03:4000:24:e3:1096::/80 {
		AdvOnLink on;
		AdvAutonomous on;
		AdvRouterAddr on;
	};
};

This basically tells the daemon to send router advertisements on the bridge interface and to advertise that interface as the gateway for hosts in the given network.
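
Installing and enabling it works the same way as with ndppd (again assuming the default service name):

apt install radvd
systemctl enable --now radvd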

At this point I had successfully configured the network backbone for my containers to run on and could move on to configuring LXC.

Step 010: Configuring privileged LXC

Here I had a few steps to consider, because I configured not only privileged containers with LXC (containers running as the root user) but also unprivileged ones (containers running as a non-root user). I took this step because it gives me the peace of mind that even if one of my containers gets compromised and the attacker manages to break out of it, my host system is not affected. Such an attacker could still take over my other containers if things went badly, but I'll take that over having my whole host taken over. So, let's get to it!

Installing and configuring LXC

First of all, I installed the lxc package via apt. Then I made sure the service lxc-net was disabled so it wouldn't meddle with my own bridge setup. After that I created the folder structure /srv/lxc/priv and put the following line into the LXC configuration.

> /etc/lxc/lxc.conf

lxc.lxcpath = /srv/lxc/priv

This changes the default folder where all the privileged containers and their configurations are stored. I did that just because it's a little easier for me.
To give new containers a basic setup, which in my case is only network related, I added the following lines to the default configuration:

> /etc/lxc/default.conf

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = aa:01:10:19:96:xx

This configures the network interface the containers get on startup: in my case, what type of interface it is, which bridge it connects to, whether it comes up, and what MAC address it should have. The “x”s tell LXC to fill those positions with random values, so every container gets a unique address with the given prefix.

Creating a test container

At this point I was able to create new containers, and I tested it by making a new Debian-based container:

lxc-create -n test01 -t download -- -a amd64 -d debian -r buster

This downloads the newest available LXC image of Debian Buster and creates a container named “test01”. Then I used this script of mine, https://git.jo-e.de/josef/lxc-staticnet, to set up the container's networking with the addresses I wanted it to have. The script can also copy the host's /etc/resolv.conf into the container and set the network interface to “manual” so the DHCP client doesn't start.
After all that I started the container…

lxc-start -n test01

… and attached to its console so I could test if everything worked.

lxc-attach -n test01
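
Inside the container, a quick check like this (the exact addresses depend on what the script configured) told me whether the bridge, NAT and IPv6 routing were working:

ip address
ping -c 3 10.96.0.1
ping -c 3 deb.debian.org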

Once I was sure everything worked as expected, I stopped the container and destroyed it again.

lxc-stop -n test01 && lxc-destroy -n test01

Step 011: Making things unprivileged

The very first thing to do was to enable unprivileged user namespaces. Namespaces are, at a basic level, a way for the Linux kernel to give different processes completely different views of system resources (e.g. network interfaces, filesystems). This is the foundation Linux containers are built on, and it makes it possible to run multiple systems on top of a host without virtualizing anything.
For this feature to be accessible to unprivileged users, a kernel-level switch needs to be activated, so I flipped it and also created a new file under /etc/sysctl.d to make it persistent across reboots.

sysctl -w kernel.unprivileged_userns_clone=1
> /etc/sysctl.d/00-userns.conf

# Enable user namespaces
kernel.unprivileged_userns_clone=1

Now that this was out of the way, I added a new user which the unprivileged containers will be started as. I placed the home directory of the user under the same path where I store the unprivileged containers, for easier access.

useradd -m -d /srv/lxc/upriv/home upriv

To give the user the ability to create virtual network interfaces as containers get started, I created this file and put the needed configuration inside. In this example the user is allowed to have up to 5 virtual interfaces on the bridge at the same time.

> /etc/lxc/lxc-usernet

upriv veth lxcbr0 5

One weirdness of this whole construct is that I couldn't just su into the user to start and attach to containers, because some parts of the environment were missing and I didn't want to spend the time working out what was wrong. So I simply allowed the new user to connect to the server via SSH, but only from localhost. After that, all I had to do was run ssh upriv@localhost and I could work without restrictions.
Every following configuration step I did while connected to the server as that user.

First I made the same kind of configuration as for the privileged containers, so the unprivileged ones are stored somewhere else as well.

> ~/.config/lxc/lxc.conf

lxc.lxcpath = /srv/lxc/upriv

After that I had to add some more lines to the default configuration for new containers so they actually run under an unprivileged user. For that I first needed to find out the user ID and group ID ranges the user is allowed to use, which I found in the files /etc/subuid and /etc/subgid. I filled those values into the default configuration file for unprivileged containers in the following form.
I also included the default config of the privileged containers so I wouldn't need to change things in multiple places.

> ~/.config/lxc/default.conf

# UID mappings
lxc.include = /etc/lxc/default.conf
lxc.id_map = u 0 165536 65536
lxc.id_map = g 0 165536 65536

“u” stands for the UID mapping and “g” for the GID mapping: container ID 0 (root) gets mapped to host ID 165536, and the following 65536 IDs are mapped one after another.
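
Those numbers come straight from /etc/subuid and /etc/subgid, which on my system contain entries roughly like this (the start of the range can differ depending on existing users):

> /etc/subuid

upriv:165536:65536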

After that I was good to go: I could create my first unprivileged container, set it up with my lxc-staticnet script, start it, attach to it and use it like any other Debian machine. But there were still some things I wanted to configure.
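
For reference, the commands are the same ones used for the privileged test container, just run as upriv (the container name here is only an example):

lxc-create -n web01 -t download -- -a amd64 -d debian -r buster
lxc-start -n web01
lxc-attach -n web01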

Step 100: A bunch of nice-to-haves

Shared folder

For one, I wanted a shared folder between the host and all the containers, to exchange files easily and without needing root privileges to push or pull files directly into a container's root filesystem.
I achieved that by creating a folder to share and adding this line to each container's configuration. (The folder /mnt/share needs to exist inside the container.)

> /srv/lxc/upriv/<container-name>/config

lxc.mount.entry = /srv/lxc/share/ mnt/share none bind 0 0
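
Both directories have to exist before the container starts; the mount point inside the container can be created via lxc-attach as the container's owner (the container name is just a placeholder):

mkdir -p /srv/lxc/share
lxc-attach -n <container-name> -- mkdir -p /mnt/share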

Autostart

Another thing that is pretty much necessary is automatically starting the containers on boot, which surprisingly takes a few steps to set up.
First, every container that should start automatically needs these two lines in its configuration file:

> /srv/lxc/upriv/<container-name>/config

...
# Autostart
lxc.start.auto = 1
lxc.start.order = 10
...

The first of the two enables autostart for the container and the second determines the order in which the containers get started relative to each other. Lower means earlier.

But just configuring those lines is not enough, because they only tell the application lxc-autostart what to do, and that is not run at boot, especially not for the unprivileged user. To get around that I used a systemd feature where a user can have their own services run at login. For this I first added a systemd unit file to the user's systemd configuration with the following content:

> ~/.config/systemd/user/lxc-autostart.service

[Unit]
Description="autostart for unprivileged containers"

[Service]
Type=oneshot
ExecStart=/usr/bin/lxc-autostart
ExecStop=/usr/bin/lxc-autostart -s
RemainAfterExit=1

[Install]
WantedBy=default.target

After saving I quickly enabled the newly added unit.

systemctl --user enable lxc-autostart

Because a user's services usually don't get started until… well… someone logs in as that user, I ran the following command as root to have a session created for that user at startup, which basically counts as a login and in turn starts the user's services at boot.

loginctl enable-linger upriv

This adds another quirk to the system though: if I now tried to stop and then start a container that was started by this systemd service, I ran into an error. I'm not completely sure why that is, but I'm guessing it has to do with the commands running in different sessions. As a workaround I added the following lines to upriv's bash alias file, which gets loaded on login:

> ~/.bash_aliases

alias sys-lxc-stop='systemd-run --user lxc-stop'
alias sys-lxc-start='systemd-run --user lxc-start'

With that I can use the commands sys-lxc-stop and sys-lxc-start, while logged in as upriv, to stop and start containers in the systemd context.
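
For example, restarting container01 from the diagram above looks like this:

sys-lxc-stop -n container01
sys-lxc-start -n container01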

The conclusion

It took me a long time to figure all of this out, but with the steps laid out neatly one after another, it's actually pretty simple and quick to do. I know it might have been faster to just use Docker or another container solution, but I wanted to keep things as basic as possible, so that not too much of the setup process gets lost in abstraction. This helped me understand a lot about Linux containers and the OS in general.
There is still one thing I'd like to do, though, and that is getting a grip on cgroups to manage the physical resources a container is allowed to use, but I haven't spent any time on that yet. There might be a post about it in the future.
For now, I'm happy with how the whole solution turned out, and I've been using it like this for months now.