Docker is by far the most popular containerization platform in use today. I have another post about Docker that gives a high-level overview of its components and how it works; give it a read and come back. At the heart of Docker's power lies the Dockerfile, a simple yet powerful configuration file that defines a Docker image. In this article, we'll explore Dockerfiles from the ground up: their syntax, basic commands, and best practices.
What is a Dockerfile?
A Dockerfile is a text configuration file that contains a set of instructions for building a Docker image. These instructions define everything needed to create a self-contained environment for running your application (i.e. a Docker image): the base operating system, dependencies, configuration, and runtime commands. Rather than going through everything theoretically, let's look at a Dockerfile and walk through it line by line to understand it at its core.
```dockerfile
# Take the latest nginx image as the base image
FROM nginx:latest
# Set the working directory
WORKDIR /var/www/html
# Update and upgrade packages
RUN apt update && apt upgrade -y
# Copy the site code
COPY source .
# Expose port 80
EXPOSE 80
# Set the entrypoint
ENTRYPOINT ["nginx"]
# Start nginx in the foreground
CMD ["-g", "daemon off;"]
```
This is a simple Dockerfile that serves a static website using Nginx on Docker. We can add custom Nginx configuration files and more for complex application scenarios, but that's not the topic for today.
In a Dockerfile, the instructions are written line by line and executed sequentially, top to bottom, when we build the image. Just a quick note: a Docker image is a template for running a container; the container is the actual running application. So we can have as many containers running as we want from the same image. Let's go through each line in this Dockerfile to understand what each command does.
The `FROM nginx:latest` instruction in the first line fetches and sets the base image for our Docker image. In this case, I'm using the latest version of the Nginx Docker image. The part before the `:` is the name of the image, and the part after it is the tag, or version, of the image. Docker first searches for this specific image locally; if it isn't found locally, Docker pulls it from Docker Hub.
The base image is basically a starting point for our image. It comes with the minimum requirements already in place: the operating system components, the required dependencies, and everything else the image we're building needs. In this case, the base image is capable of running Nginx in a container with all requirements fulfilled, and we're extending it with our own Dockerfile. The base image has its own Dockerfile, which you can find in its official documentation or on Docker Hub.
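One refinement worth noting here: `latest` is a moving target, so builds that use it aren't reproducible over time. Pinning a specific tag avoids surprises; a minimal sketch (the exact tag is illustrative, check Docker Hub for currently available versions):

```dockerfile
# Pin a specific Nginx version instead of the moving "latest" tag
# (the tag shown is illustrative; check Docker Hub for current versions)
FROM nginx:1.27
```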
In the second line, with `WORKDIR /var/www/html`, I'm setting the working directory for the image and the container. Once a working directory is defined, all subsequent instructions in the Dockerfile run relative to that directory.
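A small detail worth knowing: if the directory doesn't exist, `WORKDIR` creates it, and relative paths in later instructions resolve against it. A quick sketch (the `index.html` file here is hypothetical):

```dockerfile
# WORKDIR creates /var/www/html if it does not already exist
WORKDIR /var/www/html
# The relative destination resolves against the working directory,
# so this file lands at /var/www/html/index.html
COPY index.html .
```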
In `RUN apt update && apt upgrade -y`, the key part is `RUN`, the Dockerfile directive for executing a command on top of the base image. You can run any command after `RUN` that you could run on a Linux machine, though exactly what works depends on the base image you've chosen and which Linux variant it uses. In this case, since the Nginx base image is Debian-based, I'm updating the package lists and upgrading the installed packages using the `apt` package manager, the default package manager for Debian-based Linux distributions.
The `COPY` directive copies content from our local machine into the Docker image. In this Dockerfile, it copies everything inside the `source` directory into the current directory of the Docker image, which is `/var/www/html` since we've already defined the working directory. For this to work, the `source` directory must sit alongside the Dockerfile, inside the build context.
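`COPY` also accepts a `--chown` flag to set file ownership at copy time, which pairs nicely with the non-root advice later in this article. A sketch (assuming the `www-data` user, which exists in Debian-based images):

```dockerfile
# Copy the site content and assign ownership to www-data in one step,
# avoiding a separate RUN chown layer
COPY --chown=www-data:www-data source/ /var/www/html/
```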
As an alternative to `COPY`, there is the `ADD` directive, which can do everything `COPY` does plus a few extra features: `ADD` can fetch content from remote URLs, and it can copy a compressed archive from the local machine and unpack it into the image automatically. Docker's own guidelines recommend `COPY` over `ADD` unless you actually need those extras, because `ADD`'s implicit behaviors make builds harder to predict.
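For illustration, here is roughly what those `ADD` extras look like; the URL and archive name below are placeholders, not real resources:

```dockerfile
# Hypothetical: fetch a config file from a remote URL at build time
ADD https://example.com/nginx/default.conf /etc/nginx/conf.d/default.conf
# Hypothetical: a local tar archive is unpacked into the image automatically
ADD site-assets.tar.gz /var/www/html/
```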
The `EXPOSE` directive documents which port the containerized application listens on. On its own it does not open the port to the host: containers run on their own isolated network, and no port is reachable from the host unless it is published at run time with the `-p` (or `-P`) flag. It's still good practice to declare the port, both as documentation and so that `-P` knows which ports to publish. In this case I'm declaring port `80`, where Nginx serves `http` traffic.
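The one place `EXPOSE` does have a runtime effect is with the `-P` flag, which publishes every exposed port to a random high port on the host. Assuming the image is built as `myapp` (as we do below):

```bash
# Publish all EXPOSEd ports to random high ports on the host
docker run -d -P myapp
# Show which host port was mapped to container port 80
docker port <container-id-or-name> 80
```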
Now let's come to `ENTRYPOINT`, which defines the command that runs when a container starts from the image. Another directive, `CMD`, is often mentioned together with `ENTRYPOINT` because the two work together: the command defined in `ENTRYPOINT` is the main executable launched at container start, and the values defined in `CMD` are the default arguments passed to that `ENTRYPOINT` command. So in this case, when the container starts, the command `nginx -g "daemon off;"` runs. In both the `ENTRYPOINT` and `CMD` directives here, the commands are written in the exec form, i.e. as a JSON array of strings.
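Once the image is built (we'll name it `myapp` in a moment), you can verify how these two values are stored with `docker inspect`:

```bash
# Print the ENTRYPOINT and CMD baked into the image
docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' myapp
# Output looks roughly like: [nginx] [-g daemon off;]
```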
In many cases, you may see only one of `ENTRYPOINT` or `CMD` defined in a Dockerfile. That works fine too, but it affects how flexibly you can modify the startup command when running the container. Let's say I build an image from this Dockerfile, from the same directory as the Dockerfile, with `docker build -t myapp .` (note that image names must be lowercase). This command builds an image from the Dockerfile and names it `myapp:latest`. Let's run this image and look at `ENTRYPOINT` and `CMD` under different scenarios.
To create a container from this image and start it, we can use the command `docker run -p 8080:80 myapp`. This creates and runs a container from the `myapp` image and maps port `8080` on the host machine to port `80` in the container. Port mapping is required to make our application reachable from the host network, since Docker runs containers on its own internal network. With this run command, our application should run just fine, and we should be able to access it at port `8080` on localhost.
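Putting the build and run steps together, a quick smoke test could look like this (assuming `curl` is available on the host):

```bash
# Build the image, run it detached, and check that nginx answers
docker build -t myapp .
docker run -d -p 8080:80 myapp
curl -I http://localhost:8080   # expect "HTTP/1.1 200 OK" from nginx
```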
In addition to this run command, let me pass extra arguments: `docker run -p 8080:80 myapp -g "worker_processes 2;"`. Here I've appended `-g "worker_processes 2;"` to the end of the run command (the quotes keep the Nginx directive together as a single argument), which overrides the command I defined in the `CMD` directive of the Dockerfile. So the full command that runs at startup becomes `nginx -g "worker_processes 2;"` (note that because this drops `daemon off;`, Nginx would daemonize and the container would exit immediately; in practice you'd pass both directives).
What if there is no `CMD` directive and I have only `ENTRYPOINT ["nginx", "-g", "daemon off;"]`? In this case, the startup command is `nginx -g "daemon off;"` as defined. But if I pass additional arguments while running the container, they get appended to it, like: `nginx -g "daemon off;" -g "worker_processes 2;"`.
What if there is no `ENTRYPOINT` and I have only `CMD ["nginx", "-g", "daemon off;"]`? In this case too, the startup command is `nginx -g "daemon off;"`. But if I pass additional arguments as in the example above, the whole startup command gets replaced, and the final command becomes just `-g "worker_processes 2;"`, which isn't even an executable, so the container would fail to start.
So, the use of `ENTRYPOINT` and `CMD` depends entirely on the use case, and we should combine them in a way that takes full advantage of both. But what if I've built the image and want to override the `ENTRYPOINT` command at run time: is that possible? Yes, it is possible, but usually only recommended for testing or debugging. You can pass the `--entrypoint` flag followed by an executable while running the container. For example, `docker run -p 8080:80 --entrypoint bash myapp` overrides the defined entrypoint and replaces it with the `bash` executable (which is available in the Debian-based Nginx image; an executable that isn't present in the image, such as `java` here, would simply fail to start). All the other behaviors of `CMD` and `ENTRYPOINT` remain the same.
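As a practical debugging pattern, you can combine an `--entrypoint` override with arguments after the image name, which then become the `CMD` passed to the new entrypoint:

```bash
# Run bash as the entrypoint; everything after the image name becomes
# the CMD, so bash receives -c "nginx -t" and validates the nginx config
docker run --rm --entrypoint bash myapp -c "nginx -t"
```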
Best Practices
While writing a Dockerfile, it is important to follow established guidelines and recommendations so the result is optimized and secure. Here are a few best practices I follow when writing a Dockerfile:
- Keep the image lightweight - Minimize the size of your images to reduce build times and improve runtime performance. You can do this by using lightweight base images, reducing the number of layers (each directive in a Dockerfile creates a new layer), combining multiple `RUN` instructions into one, and using a `.dockerignore` file to keep unnecessary files from being copied into the image (a sample `.dockerignore` appears after this list).
- Use multi-stage builds - Multi-stage builds also reduce image size and improve performance. In a multi-stage build, we define two stages: in the first stage a build artifact is created, and only that artifact is copied into the second stage, where it runs with minimal runtime requirements. This keeps unnecessary build dependencies and code out of the final image (see the sketch after this list).
- Order instructions wisely - The order of instructions means the order in which we define the directives in our Dockerfile. Docker uses caching extensively to improve build performance: it caches image layers and doesn't rebuild a layer unless something in it (or an earlier layer) changes. So it's recommended to move frequently changing instructions toward the end of the Dockerfile to maximize the caching benefits.
- Use official base images and update dependencies - There may be several versions of the base image you're looking for, offered by different maintainers, so it's important to verify the legitimacy of the maintainer and only use images from verified publishers on Docker Hub. It's also important to update the packages in the base image, as they may be outdated and carry security vulnerabilities.
- Do not run as root - Whenever possible, run containers as non-root users to reduce the impact of security vulnerabilities. By default, containers run with root privileges, which poses significant security risks. To run a container as a non-root user, create a new user in your Dockerfile and switch to it using the `USER` directive; you may also need to adjust file permissions.
- Parameterize the Dockerfile - Use `ARG` and `ENV` to parameterize your Dockerfile. The `ARG` directive defines `<key>=<value>` build arguments that users can override during the build process to change its behavior, and the `ENV` directive defines `<key>=<value>` environment variables for the container that our application and other container components can read. Furthermore, make the best use of `CMD` and `ENTRYPOINT` for more flexibility and control, as discussed above.
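To make several of these points concrete, here is a minimal sketch of a multi-stage, non-root, parameterized Dockerfile. It assumes a Node.js app with the usual `npm run build` setup; the file names and versions are illustrative, not prescriptive:

```dockerfile
# Build-time parameter with a default; override with:
#   docker build --build-arg NODE_VERSION=22 .
ARG NODE_VERSION=20

# --- Stage 1: build the artifact with the full toolchain ---
FROM node:${NODE_VERSION} AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# --- Stage 2: minimal runtime image ---
FROM node:${NODE_VERSION}-slim
ENV NODE_ENV=production
WORKDIR /app
# Copy only what is needed to run, leaving build tooling behind
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
# Switch to the unprivileged "node" user that ships with the official image
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
```

And a matching `.dockerignore` keeps the build context, and therefore the `COPY . .` step, lean:

```
node_modules
.git
*.log
Dockerfile
.dockerignore
```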
I've realized that this article has become a little long 😜, so I think I should stop now. But remember that Dockerfiles are a fundamental building block of Docker-based development and deployment workflows. By understanding their structure and following best practices, you can create efficient, reproducible, and secure Docker images for your applications. Experiment with Dockerfiles in your projects and let me know your experience.
Happy Dockerizing!