Passing arguments to Dockerfiles

Introduction

When using Docker as our containerization software of choice to deploy our applications, we sometimes want to build an image that depends on a variable parameter, for example when building images from a script where the deployment folder changes between builds.

Using the ENV keyword

The easiest way is to define an environment variable inside the Dockerfile with the ENV keyword and then reference it from within the file. For instance, when you just need to update the version of a package and perform some operations depending on that version:

FROM image  
ENV PKG_VERSION 1.0.0  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  

Here PKG_VERSION is used as a build-time variable, but it is important to know that containers will also be able to access it at runtime, which may lead to problems.
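
For example, assuming the Dockerfile above is built and tagged as my-image-name, the variable is baked into the image metadata and visible to every container started from it (the printenv call assumes the base image ships it, which most do):

λ docker inspect --format '{{.Config.Env}}' my-image-name   # PKG_VERSION=1.0.0 shows up in the image metadata
λ docker run --rm my-image-name printenv PKG_VERSION
1.0.0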

Using the ARG keyword

The ARG keyword defines a variable that users can set at build time with the --build-arg <variable>=<value> option of docker build, and that can then be referenced inside the Dockerfile. In the previous example, the same result could be achieved by executing:

λ docker build -t my-image-name --build-arg PKG_VERSION=1.0.0 $PWD

Dockerfile:

FROM image  
ARG PKG_VERSION  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  

And in this case, the PKG_VERSION variable only lives during the build process and is unreachable from within the containers. Additionally, it is also possible to give ARG a default value that is used when no --build-arg is specified:

Dockerfile:

FROM image  
ARG PKG_VERSION=1.0.0  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  
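
With a default in place, the --build-arg flag becomes optional: a plain build resolves PKG_VERSION to 1.0.0, while passing the flag overrides it, for example:

λ docker build -t my-image-name $PWD
λ docker build -t my-image-name --build-arg PKG_VERSION=2.0.0 $PWD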

Of course, you could also define an environment variable that would depend on a value passed as an argument:

λ docker build -t my-image-name --build-arg VERSION_ARG=1.0.0 $PWD

Dockerfile:

FROM image  
ARG VERSION_ARG=1.0.0  
ENV PKG_VERSION=$VERSION_ARG  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  

Now the passed argument VERSION_ARG will be available as the PKG_VERSION environment variable from within the container.
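
To double check, still assuming the image is tagged as my-image-name and the base image provides printenv:

λ docker run --rm my-image-name printenv PKG_VERSION
1.0.0
λ docker run --rm my-image-name printenv VERSION_ARG    # prints nothing: the ARG is gone once the build finishes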

Moreover, if you prefer to declare the container's environment variables at runtime, that can easily be done when running it:

λ docker run -e ENV=development -e TIMEOUT=300 -e EXPORT_PATH=/exports ruby
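
Those values are then visible to the process inside the container like any other environment variable; for instance, with the official ruby image from the command above:

λ docker run --rm -e TIMEOUT=300 ruby ruby -e 'puts ENV["TIMEOUT"]'
300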


Have fun!

Automatically loading JSON files into ElasticSearch

Introduction

Right now at work I am working on the Big Data Europe project, a joint effort of several organizations and enterprises across Europe to create a platform that provides a set of software services for implementing big data pipelines, with minimal effort compared to other stacks and in an extremely cost-effective way.

The goal is to give any company or organization that wants to make sense of its data with Big Data an easy way to start playing around with the technologies that make that possible.

One of the pieces I developed is mu-bde-logging, a standalone system that logs HTTP traffic from running Docker containers and posts it to an EL(K) stack in real time for further visualization.

The last requirement was to add the possibility of replaying old traffic backups and posting them into the ElasticSearch instance to visualize them offline in Kibana.

So I wrote a small script that scans a given folder for the transformed .har files (JSON format) and replays them into the ElasticSearch container.
Since the ElasticSearch & Kibana containers are part of a docker-compose.yml project, I didn't care much about being generic and used the names the docker-compose script gives the containers, but it is easy to change & extend.

The Code

This is what I came up with:

#!/usr/bin/env bash

#/ Usage: ./backup_replay.sh [-f|--folder <backups_folder>]
#/ Description: Run ElasticSearch and Kibana standalone and post every enriched .har file in the backups folder to ElasticSearch.
#/ Examples: ./backup_replay.sh --folder backups/
#/ Options:
#/     --help: Display this help message
usage() { grep '^#/' "$0" | cut -c4- ; exit 0 ; }  
expr "$*" : ".*--help" > /dev/null && usage

# Default backups folder; can be overridden with -f/--folder.
BACKUP_DIR="../backups/"

# Convenience logging function.
info()    { echo "[INFO]    $@"  ; }

# Cleanup hook invoked by the EXIT trap below; currently a no-op.
cleanup() {
  true;
}

# Poll the ElasticSearch container: return 0 (keep looping) while it is
# still starting, and 1 once it answers with HTTP 200.
poll() {
  local elasticsearch_ip="$1"
  local result=$(curl -XGET http://${elasticsearch_ip}:9200 -I 2>/dev/null | head -n 1 | awk '{ print $2 }')

  if [[ $result == "200" ]]; then
    return 1 # ElasticSearch is up, stop polling.
  else
    return 0 # Not up yet; the while loop below runs as long as the return code is zero.
  fi
}

# Parse Parameters
while [ "$#" -gt 1 ];  
  do
  key="$1"

  case $key in
      -f|--folder)
      BACKUP_DIR="$2" # EXAMPLE
      shift
      ;;
      --default)
      default=YES
      ;;
    *)
    ;;
  esac
  shift
done


if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then  
    trap cleanup EXIT

    # Start ElasticSearch & Kibana with docker-compose, if it is installed.
    if which docker-compose >/dev/null; then
      docker-compose up -d elasticsearch kibana
    else
      info "Install docker-compose!"
      exit 1
    fi

    # Poll ElasticSearch until it is up and we can post hars to it.
    elasticsearch_ip=$(docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mubdelogging_elasticsearch_1)

    info "ElasticSearch container ip: ${elasticsearch_ip}"

    while poll ${elasticsearch_ip}
    do
      info "ElasticSearch is not up yet."
      sleep 2
    done

    # Find all .trans.har files in the specified backups folder.
    # For each one, POST it to ElasticSearch.
    info "Ready to work!"
    info "POSTing all enriched hars"
    find ${BACKUP_DIR} -name "*.trans.har" | sed 's/^/@/g' | xargs -i /bin/bash -c "sleep 0.5; curl -XPOST 'http://$elasticsearch_ip:9200/hars/har?pretty' --data-binary {}"
fi  
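
Assuming the script is saved as backup_replay.sh next to the docker-compose.yml and made executable, a typical run looks like this:

λ chmod +x backup_replay.sh
λ ./backup_replay.sh --help             # prints the usage header
λ ./backup_replay.sh --folder backups/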

Have fun!