Category Archives: Software Development

Quirks of DNS traffic with Docker Compose

Recently I had a scenario where I wanted to restrict the network traffic coming out of certain processes I started inside a container, so they could only do the minimum required for them to work and not reach anything else outside the container. To explain what I found, let’s imagine that my process only wants to make an HTTP HEAD request to http://www.google.com (on port 80, not 443).

It will obviously need to send packets with destination port 80, plus packets with destination port 53 so it can make DNS requests to resolve www.google.com. So let’s implement a quick setup with iptables to accomplish this. We’ll use the following Dockerfile, which installs curl, iptables, and dnsutils on top of the default Ubuntu image, so we can test our scenario.

Dockerfile

FROM ubuntu:latest
RUN apt-get update && apt-get install -y curl iptables dnsutils

And the following docker-compose.yml file to help us build and run our container.

docker-compose.yml

version: "3.4"
services:
  my-container:
    build:
      context: .
    image: my-custom-image
    cap_add:
      - NET_ADMIN
    command: tail -f /dev/null

The scenario I want to talk about only happens when starting services with Docker Compose, not when starting containers directly with docker run, so using a docker-compose.yml file is necessary even if it feels a bit overkill. Note we specify the NET_ADMIN capability for the container, which we need so we can use iptables, and a command that will keep the container running, so we can connect to it after Docker Compose starts it.

Now we run docker-compose -p test up -d in the folder that contains both of our files; Docker Compose builds the image and starts a container. We can then connect to that container with docker exec -it test_my-container_1 bash.

Let’s start by verifying that we can make our HEAD request to www.google.com:


HEAD request works
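
For reference, the command used for that check is simply:

curl --head http://www.google.com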

Great. Now let’s set up the iptables rules discussed above and make sure they look right.

# Allow traffic to localhost (see note below)
iptables --append OUTPUT --destination 127.0.0.1 --jump ACCEPT
# Allow outbound HTTP
iptables --append OUTPUT --protocol tcp --dport 80 --jump ACCEPT
# Allow outbound DNS
iptables --append OUTPUT --protocol udp --dport 53 --jump ACCEPT
# Drop everything else
iptables --append OUTPUT --jump DROP
iptables -L -v
Set up iptables rules

We add the rule for localhost just to make sure that we don’t break anything that’s connecting to the machine itself (without it, the rest of this scenario won’t work as expected).

Now we test curl --head www.google.com again to make sure everything’s fine… but it says it cannot resolve the host! Furthermore, nslookup www.google.com times out. And checking the iptables rules we see 5 packets dropped by the last rule, but none accepted by the rule for UDP port 53. How come?

CURL does not resolve host, nslookup times out

Well, it turns out that when Docker Compose creates a service, it creates iptables rules in another table (the NAT table) to reroute certain things through the Docker infrastructure. In particular, it changes the port of DNS requests from 53 to something else. You can see this by running iptables -L -v -t nat:

iptables rules in the NAT table

Here we can see that there’s a rule mapping UDP port 53 to 53789, when the request is going to IP 127.0.0.11 (where Docker hosts its DNS resolver). So if we now add another iptables rule for that port to our setup, we’ll see that our curl command works again!
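
For this particular run, the temporary rule would look something like this (the 53789 port comes from the NAT rule above and changes every time the service is created, which is why this is not a real fix):

iptables --insert OUTPUT 1 --protocol udp --dport 53789 --jump ACCEPT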

CURL works again after adding new iptables rule

However, that port is not static, so the approach I ended up taking was to create a rule that allows any packet with destination IP 127.0.0.11, which is where Docker hosts its DNS server and the only address for which it remaps ports.
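
A minimal sketch of that rule; it needs to sit before the final DROP, hence the --insert:

# Allow anything destined for Docker's embedded DNS resolver, whatever port the request gets rewritten to
iptables --insert OUTPUT 1 --destination 127.0.0.11 --jump ACCEPT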

Conclusion

If you plan to mess with DNS network traffic in your containers and you use Docker Compose to start them, be aware that Docker sets up rules to change the destination port for DNS requests.

Partial csproj files simplify your NuGet dependencies

Recently I learned about a pretty simple feature that is super useful when working with .NET solutions that have several .csproj files, and the same NuGet package dependency in two or more of them.

The simple way to do this (what Visual Studio’s “Manage NuGet Packages” dialog does) is to add or update PackageReference elements in each csproj, like this:


Separate references to the same NuGet package in different projects
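
Roughly, each project ends up carrying its own copy of the reference; the package name and version here are just placeholders:

<!-- Repeated in every .csproj that needs the package -->
<ItemGroup>
  <PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
</ItemGroup>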

But this means that whenever you want to update this dependency, you need to do it separately in each project… and I hope you don’t have too many of them.

The DRY way to do this is with Import elements in the csproj files that reference a shared (partial) csproj file, like this:


Shared partial .csproj file referenced by other .csproj files
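
Roughly, the shared file holds the PackageReference and every project pulls it in with an Import element; the file, package, and version names here are just placeholders:

<!-- Common.csproj.include: the shared, partial project file -->
<Project>
  <ItemGroup>
    <PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
  </ItemGroup>
</Project>

<!-- In each .csproj that needs the package -->
<Import Project="..\Common.csproj.include" />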

Note that I used .csproj.include for the shared csproj file; the extension doesn’t actually matter.

Now whenever you need to update that dependency, you can do it in a single place and all the projects that reference that file will keep their versions in sync.

The only caveat I’ve found with this method is that Visual Studio’s “Manage NuGet Packages” dialog doesn’t play well with it. If you use it in a project to update the package defined in the common file, a new PackageReference will be added to that project file while the Import statement remains. This won’t cause a build error, but depending on the order of the PackageReference and Import elements in your project file, it may end up causing one or the other version of the package to be used. So make sure your whole team understands how these shared package dependencies need to be updated going forward.

Don’t use a variable named TMP in your scripts that call the dotnet CLI

For a long time I had a script where I passed --no-build to a dotnet test command, because otherwise it got stuck in a very weird way. The tests never started running (in fact, the build never finished), and if I hit Ctrl-C to stop it, even though it apparently stopped, something kept running in the background and printing warnings to my console, on top of whatever else I was doing.

build stuck warnings

I googled keywords from the warning and couldn’t find anything relevant. Today I had to deal with this script again and decided to fix it once and for all.

For context, this is a bash script that runs on Windows (with Git Bash) and basically does the following:
– Start a test environment with several containers using docker-compose.
– Figure out which ports were exposed on the host for some of those containers and export them as environment variables (so the project being run with dotnet test sees them).
– Run dotnet test to execute the tests in a project.
– Use docker-compose to remove the environment we spun up earlier.

So I started troubleshooting my dotnet test command.

It ran fine outside the script, and also by itself inside an .sh file. So I started adding all the other pieces of the script little by little, until I found the one that made dotnet test hang. It was a line pretty much identical to this:

TMP=$(docker port ${PROJECT_NAME}_myservice_1 80)

That’s (part of) how I get the port that was exposed for a particular container, but I refused to believe that executing docker port had anything to do with the problem. So I tried renaming TMP to TMP_PORT_INFO… and what do you know, the script didn’t get stuck anymore!

I couldn’t find any official documentation about this, but it seems like dotnet build (which dotnet test runs implicitly) depends on the TMP variable pointing to a temporary storage location for the system. A bit of research made me think that on UNIX the relevant variable is TMPDIR, but on Windows it’s TMP.
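
So the fix was simply to stop using TMP as a scratch variable. The relevant part of the script ended up looking roughly like this (the service name and MYSERVICE_PORT are hypothetical; the only thing that matters is the rename):

# Ask Docker which host port was published for the container's port 80, e.g. "0.0.0.0:32768"
TMP_PORT_INFO=$(docker port ${PROJECT_NAME}_myservice_1 80)
# Keep just the port number and expose it to the test project as an environment variable
export MYSERVICE_PORT=${TMP_PORT_INFO##*:}
dotnet test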

So there you have it. If you want to avoid some painful troubleshooting, just don’t use TMP as a variable name in your scripts.

Tweak WiredTiger cache size if several MongoDB instances run side by side

I have a Linux box running 5 instances of MongoDB, each one for a different environment. The server works fine during the day, most of the time at ~90% memory usage. But recently I started seeing that one (sometimes more) of the mongod instances running there died every night when my backup script ran, courtesy of the OS’s out-of-memory (OOM) killer. Thanks to Azure’s monitoring charts, I could see the pattern very clearly.

After noticing that swap wasn’t enabled for the server, enabling it, and seeing that mongod processes kept dying nightly, I discovered that MongoDB does not use swap because it uses memory-mapped files:

Nevertheless, systems running MongoDB do not need swap for routine operation. Database files are memory-mapped and should constitute most of your MongoDB memory use. Therefore, it is unlikely that mongod will ever use any swap space in normal operation. The operating system will release memory from the memory mapped files without needing swap and MongoDB can write data to the data files without needing the swap system.

So enabling swap didn’t solve my problem of dying instances, but it’s something that the server should have anyway so I left it enabled.

I kept reading MongoDB’s official documentation and ran into this:

The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.

That sounded pretty promising! I started by looking at my instances to see what the current value was. You can do that by running db.serverStatus().wiredTiger.cache in the Mongo shell and looking for the “maximum bytes configured” property in the output document.
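
Or, to pull out just that one number directly in the shell:

db.serverStatus().wiredTiger.cache["maximum bytes configured"]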

Sure enough, the server has 16GB of RAM, and the ~7.8GB each instance reported is more or less what I’d expect from the 0.5 * (RAM - 1 GB) calculation in the docs. The issue is that all five instances use that same value!
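
The knob to change is storage.wiredTiger.engineConfig.cacheSizeGB; here’s a minimal sketch of the change, assuming each instance has its own YAML-format mongod.conf:

# mongod.conf for one of the five instances
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 3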

So off I went and changed that setting to 3GB instead… and voilà! Stable DB server again, even with 5 separate instances of Mongo running in there.

Fixing error code 137 when building a Docker image

A few days ago I was containerizing an Angular web application and ran into an issue that I think is worth documenting for future reference.

Implementing the application itself went without a hitch, and everything looked good when hitting F5 in Visual Studio. But when I ran docker build, I got the following error from the step that runs dotnet publish:

> client-app@0.0.0 build /src/MyApp/ClientApp
> ng build "--prod"

Killed
npm ERR! code ELIFECYCLE
npm ERR! errno 137
npm ERR! client-app@0.0.0 build: `ng build "--prod"`
npm ERR! Exit status 137
npm ERR!
npm ERR! Failed at the client-app@0.0.0 build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2018-11-03T09_21_34_260Z-debug.log

The first thing to know is that the Dockerfile was building and publishing the application in Release mode, not Debug mode (which I had been using to run it in Visual Studio). So first things first, I tried to publish it in Release mode on its own (not by building the Dockerfile) with dotnet publish -c Release MyApp.csproj… and that worked fine. So the issue had to do with the fact that the app was being built/published inside a container.

With a bit of googling I found out that exit code 137 corresponds to SIGKILL (128 + 9), which usually means the process was killed by the Linux kernel because the system was running out of memory.

So I looked at my Docker configuration (right-click the docker icon in the system tray, go to Settings, then Advanced) and saw that the Linux VM was configured with only 2GB of RAM. I’m surprised that isn’t enough, but I bumped it to 4GB to see if it made any difference… and it did! docker build now ran successfully!
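
If you want to double-check how much memory the Docker VM actually ended up with, one way (assuming a reasonably recent Docker CLI) is to ask the daemon directly:

# Total memory available to the Docker VM, in bytes
docker info --format '{{.MemTotal}}'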

At some point I’ll figure out why my application requires so much RAM to build… but at least now I’m able to create the docker image successfully.

Improving the throughput of NLog.Targets.Syslog when using UDP

I’m using Luigi Berrettini’s NLog.Targets.Syslog package in one of my projects to log from a set of containers to a centralized syslog server, and I noticed that when one of the containers in my application had a sudden spike of logged messages, they were taking a very long time to reach the syslog server.

A bit of research brought me to this closed issue in the project’s repo, and to this question that the author posed to Microsoft in relation to it. The first link explains the problem pretty well, but long story short: if you configure the NLog target to use the UDP protocol, the library’s default settings mean messages get dequeued and sent through the UDP socket at a rate of about 2 per second. The rationale behind the code that has this effect was to minimize message loss if the UDP destination wasn’t there, but IMO the performance impact is a bad trade-off. And since the nature of UDP means that packet loss is always a possibility, I’d rather accept that some of my messages might get lost and have better logging throughput out of the box.

The fix to improve this throughput is pretty easy: just add a connectionCheckTimeout attribute to the target/messageSend/udp element in the nlog.config file, with a low value (0 being a possibility).

<target xsi:type="Syslog" name="syslogTarget">
  <sl:layout xsi:type="SimpleLayout" text="${message}" />
  <sl:messageSend>
    <sl:udp server="127.0.0.1" port="514" connectionCheckTimeout="0" />
  </sl:messageSend>
</target>

This value (in microseconds) controls the timeout that the target uses when trying to decide if data can be sent on the socket, a check it does for every message. So decide if you want anything other than 0 here, update your nlog.config, and enjoy your improved throughput!

What are all the sections in the Dockerfile generated by Visual Studio?

If you’ve added Docker support to a project through Visual Studio you know that a Dockerfile is automatically created for you. Some things in this file are not very intuitive and took me a while to figure out, so I decided to document my findings and share them with the community. This is all based on my research and understanding, so if anyone knows better feel free to chime in. Also, I assume you have a basic understanding of what the commands in a Dockerfile do, the main purpose of this post is to explain the whys.

I’ll start by creating a .NET Core Console app called DockerConsoleApp and adding Docker support by right clicking on the project in Solution Explorer and selecting Add -> Docker Support (choose Linux or Windows depending on the kind of containers that your Docker daemon is configured to use).


How to add Docker Support to your project

Your solution should now have a docker-compose project, and your console app should now have a Dockerfile that looks like this (at least as of the time of writing; I’ve seen it change a couple of times in the past couple of months):

FROM microsoft/dotnet:2.0-runtime AS base
WORKDIR /app

FROM microsoft/dotnet:2.0-sdk AS build
WORKDIR /src
COPY DockerConsoleApp.sln ./
COPY DockerConsoleApp/DockerConsoleApp.csproj DockerConsoleApp/
RUN dotnet restore -nowarn:msb3202,nu1503
COPY . .
WORKDIR /src/DockerConsoleApp
RUN dotnet build -c Release -o /app

FROM build AS publish
RUN dotnet publish -c Release -o /app

FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "DockerConsoleApp.dll"]

Obviously, the directory and file names (DockerConsoleApp in the example above) will depend on the name of your project.

Let’s split the analysis into the four stages defined in this file (base, build, publish, and final), though I’ll tackle them out of order, starting with the more straightforward ones. So let’s start with the publish stage.

Publish stage

FROM build AS publish
RUN dotnet publish -c Release -o /app

The first line indicates that this stage depends on the build one, but that doesn’t prevent us from easily explaining what happens here. The one thing that’s worth noting is that the build stage has a copy of our application’s source code, and so RUNning dotnet publish in this stage does exactly what it sounds like: it builds our code (using the Release configuration specified with the -c parameter) and publishes the output to the /app directory in the image (specified with the -o parameter). Not much else to say here, so let’s move on.

Final stage

FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "DockerConsoleApp.dll"]

This is the easiest stage to figure out. First off, it is based on the base stage, which in fact does nothing. “Why does it exist, then?” you might ask. We’ll get to that. What matters now is that the base stage depends on the official microsoft/dotnet:2.0-runtime Docker image from Microsoft, which, as its name implies, contains the runtime bits to run (but not build) .NET Core applications (in particular console applications; ASP.NET Core applications are a slightly different story). This stage produces the final image that we’d publish to a repository, so we want it to be as small as possible, making the 2.0-runtime image the best fit.

Lines 2 to 4 just move to a particular directory in the Docker image, copy the output of the publish stage (which is all that we need to run our app), and define the command to be executed when starting a container based on this image.

Build stage

FROM microsoft/dotnet:2.0-sdk AS build
WORKDIR /src
COPY DockerConsoleApp.sln ./
COPY DockerConsoleApp/DockerConsoleApp.csproj DockerConsoleApp/
RUN dotnet restore -nowarn:msb3202,nu1503
COPY . .
WORKDIR /src/DockerConsoleApp
RUN dotnet build -c Release -o /app

This is the most interesting stage in terms of the lessons it teaches. For starters, we see that this stage is based on the microsoft/dotnet:2.0-sdk image, in contrast to the microsoft/dotnet:2.0-runtime image used by the final stage above. The SDK image is significantly bigger (1.74GB vs. 219MB) because it has everything required to build our code. The size comparison should make it clear why we want our final image to be based on the 2.0-runtime image and not the 2.0-sdk one.

The actual work done in this stage starts with copying the .sln and the .csproj files to the image. In this case it’s only one .csproj, but if your project depends on other projects in the solution, you’d see one COPY line per .csproj file [1]. Then we run dotnet restore [2], and finally copy all of the source code (which, I should note, overwrites the .sln and .csproj files that were copied earlier) before running dotnet build to compile our application.

So, if the last COPY takes care of the .sln and .csproj files, why are we “cherry-picking” them into the image by hand?

The answer is Docker’s build cache. Docker generates a layer each time it runs any command from a Dockerfile, and tries to reuse them as much as possible. Before running any command, it checks if it has run it before with the same current state (i.e. from the same current layer) and if it believes that running it again would result in the exact same result, then it just grabs that resulting layer from its cache; otherwise it executes the command and foregoes using the cache for any additional commands for the rest of that build. For ADD and COPY commands it uses a hash of the contents of the files to determine if it can use the cache, while for all other commands (like RUN) it just looks at the command string itself.

It should be clear that we want to leverage this cache as much as possible so building our Docker image is fast. One key insight towards this goal is that source code files change pretty much all the time, but not all steps of building our application actually need them. Another way of thinking about this is: when you bring your source code files into the image, you’re pretty much guaranteeing that Docker can’t henceforth use its layer cache, so before you do that you should try to perform as many build steps as possible in the hopes that at least those will be able to leverage the layer cache. The more “static” (deterministic) those build steps, the better their chances of actually being able to use the cache.

dotnet restore is a perfect candidate for this because it only depends on the .csproj files, which for the most part change infrequently (especially when compared to source code files). For a particular set of .csproj files, running dotnet restore always results in the same NuGet packages being downloaded. Package versions are explicitly specified so there’s no risk of asking for a package by name and ending up with a newer version if the package owner published an update. Docker itself cannot know for sure that this command is deterministic, but we do and can use this knowledge to invoke that step in a way that it can leverage the cache.

The .sln file is not technically necessary for dotnet restore, but it lets us execute the command once instead of doing it once per project file.

If you build the Dockerfile manually with docker build, you can actually see layer caching at play. The first time it builds, Docker will say that it’s doing work for each and every step. If you then build it again with no changes to project files nor source code, you’ll see that every step says “Using cache” (as the first image below shows). If you then change Program.cs in any way (say, adding a Console.ReadLine();), you’ll see that all steps up to the dotnet restore keep using the cache, and only subsequent commands need to be executed (as the second image below shows).


Logs from Docker build once the project has been built before.


Logs from Docker build after making changes to Program.cs
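
To reproduce this yourself, a command along these lines works; run it from the solution directory so the build context matches the COPY paths, and note that the image tag is arbitrary:

docker build -t dockerconsoleapp -f DockerConsoleApp/Dockerfile .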

So the build stage is split like that in order to maximize usage of Docker’s layer cache, and consequently minimize the time it takes to build the image. This means that Docker will only need to download your NuGet dependencies once [3] instead of on every build.

Base stage

FROM microsoft/dotnet:2.0-runtime AS base
WORKDIR /app

Finally, we come to the base stage. I said above that it does nothing, which is basically accurate (WORKDIR does create the directory, but nothing is copied into it). The reason Visual Studio includes this stage in the Dockerfile it generates is so it can work its magic to let you debug your code inside a running container. If you debug the docker-compose project, you’ll see something like these two messages in the Output window (replace with your directory and project names as necessary):

docker build -f "F:\Sandbox\DockerConsoleApp\DockerConsoleApp\Dockerfile" --target "base" -t "dockerconsoleapp:dev" "F:\Sandbox\DockerConsoleApp"
docker-compose -f "F:\Sandbox\DockerConsoleApp\docker-compose.yml" -f "F:\Sandbox\DockerConsoleApp\docker-compose.override.yml" -f "F:\Sandbox\DockerConsoleApp\obj\Docker\docker-compose.vs.debug.g.yml" -p dockercompose8626016377156038970 --no-ansi up -d --no-build --force-recreate --remove-orphans

The docker build command uses the --target parameter to indicate that Docker should stop processing the Dockerfile once it completes the steps in the base stage (and use that image as the result of the build). Since it is the first stage in the file, it’s the only one that gets built when VS is doing its magic. Visual Studio leaves this image empty because when it uses it to start a container, it will mount the directory in the host where your code lives, into the /app directory in the container. You can see how it does that by looking at the docker-compose.vs.debug.g.yml file referenced in the docker-compose command, which includes some other volumes in addition to the one that loads the source code:

volumes:
- F:\Sandbox\DockerConsoleApp\DockerConsoleApp:/app
- C:\Users\alexv\vsdbg\vs2017u5:/remote_debugger:ro
- C:\Users\alexv\.nuget\packages\:/root/.nuget/packages:ro
- C:\Program Files\dotnet\sdk\NuGetFallbackFolder:/root/.nuget/fallbackpackages:ro

Without the base stage in the Dockerfile, Visual Studio would not have an empty image to start an empty container where it could mount your source code, and would probably not be able to provide a live debugging experience when running your code inside an actual container.

Conclusion

Hopefully now you have a better understanding of why the Dockerfile generated by Visual Studio looks like it does, which should let you decide where you can safely make changes to it if you need to, while keeping it cache-friendly.


  1. If the dependencies between projects were already there when you added Docker support to your project. If you added dependencies afterwards, it’s in your best interest to add the corresponding COPY commands here (in fact building the Docker image might fail if you don’t). 
  2. The -nowarn:msb3202,nu1503 parameters are workarounds for a couple of open issues that have to do with Visual Studio’s support for Docker and a change of behavior in NuGet that turned a warning (which didn’t cause dotnet build to fail) into an error (which does). 
  3. Technically, once every time you make changes to your .csproj or .sln files. 

Visual Studio, Docker Cloud hooks, and UTF-8 with signature

Today I ran into an issue trying to make custom hooks in Docker Cloud work. I first tried the default post-push hook to add another tag to the built image, straight from the documentation:

#!/bin/bash
docker tag $IMAGE_NAME $DOCKER_REPO:$SOURCE_COMMIT
docker push $DOCKER_REPO:$SOURCE_COMMIT

But the logs in Docker Cloud had the following error: Could not execute hook at 'hooks/post_push'. Is it missing a #! line?

Error in Docker Cloud build log

It was not missing that line, and I got the same error whether I used Windows or Unix end-of-line characters.

After a while I remembered an issue I’ve run into with Visual Studio before: it likes to save files with the encoding “UTF-8 with signature”. The “with signature” part is why other consumers of the file (in this case Docker Cloud) sometimes cannot read it correctly.

What “UTF-8 with signature” does is add an invisible sequence of bytes called a BOM (Byte Order Mark) to the beginning of the file.

However, let’s look at this answer on Stack Overflow:

The UTF-8 BOM is a sequence of bytes (EF BB BF) that allows the reader to identify a file as being encoded in UTF-8.

Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.

According to the Unicode standard, the BOM for UTF-8 files is not recommended.

Even if it were recommended, I was almost certain that the BOM was the culprit of my issue. I needed the file to be saved as “UTF-8 without signature” (without that invisible sequence of bytes at the beginning of the file). The way to do that is to open your file in Visual Studio and go to File -> Save As. When the dialog comes up, first make sure Visual Studio didn’t add .* to the end of your filename (it tends to do this for files that have no extension); otherwise nothing will happen when you try the next step. Now click on the arrow on the right side of the “Save” button, select “Save with Encoding…” and set the Encoding option to “Unicode (UTF-8 without signature) – Codepage 65001”.

How to save a file as UTF-8 without signature – Step 1

How to save a file as UTF-8 without signature – Step 2
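
If you’d rather not go through Visual Studio, you can also check for (and strip) the BOM from a shell such as Git Bash; this sketch assumes GNU sed and xxd are available:

# The first three bytes will be "efbbbf" if the file starts with a UTF-8 BOM
head -c 3 hooks/post_push | xxd
# Remove the BOM in place
sed -i '1s/^\xEF\xBB\xBF//' hooks/post_push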

Save your file and retry whatever it is you were doing; it should work this time. At least the Docker Cloud build did :).