Ember & nginx docker deployment with multi-stage builds

Introduction

At work we use docker as the containerization technology of choice for our project's deployments. We follow a microservices architecture that allows us to do rapid development & testing, quickly trying new ideas and iterating on new functionality in a modular way, choosing the language/framework that best adapts to our needs for each particular use case.

The Problem

Our frontend stack consists of Ember.js happily running in an nginx server inside a docker container. The initial building & deployment process that we had was effective but a little cumbersome. I will use TenForce's webcat repository as an example.

Initially the Ember application is built via the command line (ember build --prod), generating a dist.zip file. The file is then uploaded to the repository releases with a new tag assigned.

Afterwards, when building the nginx docker image, the Dockerfile reads the current frontend version from the package.json file, fetches the matching dist.zip from the github releases and unpacks its contents into the nginx serving directory.

The Dockerfile is self-explanatory:

FROM semtech/mu-nginx-spa-proxy

MAINTAINER Aad Versteden <madnificent@gmail.com>

RUN apt-get update; apt-get upgrade -y; apt-get install -y unzip wget;
COPY package.json /package.json
# Read the frontend version from package.json and fetch the matching release from github.
RUN mkdir /app; cd /app; wget https://github.com/tenforce/webcat/releases/download/v$(cat /package.json | grep version | head -n 1 | awk -F: '{ print $2 }' | sed 's/[ ",]//g')/dist.zip
# Unpack the release into the serving directory and clean up.
RUN cd /app; unzip dist.zip; mv dist/* .
RUN rm /app/dist.zip /package.json

Now this has two problems:

  • We have to manually build the ember application and upload it to the github releases page.
  • Builds are not deterministic, since each person has their own combination of node, npm, bower & ember-cli versions. This has already accounted for some time lost figuring out why seemingly identical builds sometimes failed and sometimes succeeded.

The Solution

The solution came from combining two new approaches:

  1. Using a docker image with node, npm, bower & ember-cli installed, thereby guaranteeing that every build uses the same versions.
  2. Using Docker's multi-stage builds. Simply put, they allow the output of one image to be used as the input of the next, avoiding fat images and simplifying the build process.

The first part is achieved by using the docker-ember image, ensuring fixed versions for the build tools:

FROM ubuntu:16.04
MAINTAINER Aad Versteden <madnificent@gmail.com>

# Install nodejs as per http://askubuntu.com/questions/672994/how-to-install-nodejs-4-on-ubuntu-15-04-64-bit-edition
RUN apt-get -y update; apt-get -y install wget python build-essential git libfontconfig
RUN wget -qO- https://deb.nodesource.com/setup_7.x > node_setup.sh
RUN bash node_setup.sh
RUN apt-get -y install nodejs
# Pin bower and ember-cli versions and let bower run as root inside the container.
RUN npm install -g bower@1.7.9
RUN echo '{ "allow_root": true }' > /root/.bowerrc
RUN npm install -g ember-cli@2.14.0

WORKDIR /app
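
As a side note, the same image can be used on its own to compile the app on any machine without a local node toolchain. A minimal sketch, assuming the image matches the Dockerfile above (no custom ENTRYPOINT) and that your app lives in the current directory:

docker run --rm -v "$PWD":/app madnificent/ember:2.14.0 \
  sh -c "npm install && bower install && ember build"

The compiled dist folder then shows up in the mounted directory on the host.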

The second part is achieved by using a multi-stage build: the ember app is built in a first stage and the resulting dist output folder is copied into nginx's serving directory in the second.

# First stage: build the Ember app with the pinned tooling.
FROM madnificent/ember:2.14.0 as ember
MAINTAINER Esteban Sastre <esteban.sastre@tenforce.com>

COPY . /app
RUN npm install && bower install
RUN ember build

# Second stage: copy only the compiled dist folder into the nginx image.
FROM semtech/mu-nginx-spa-proxy
COPY --from=ember /app/dist /app

This way, the whole build process is reduced to a simple docker build .
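
For example, with a hypothetical image name and assuming the base nginx image listens on port 80:

# Build the image; the Ember app is compiled inside the intermediate ember stage.
docker build -t webcat-frontend .

# Run it; only the final nginx stage, with the compiled dist, ends up in the image.
docker run -d -p 80:80 webcat-frontend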

Have fun!

Understanding HTML terminology

Introduction

I know, I know. "But Esteban, this article would have been useful 20 years ago; now it is a little outdated, to say the least". I cannot disagree with that, but this kind of post serves more as a reminder to me, and perhaps also to satisfy your curiosity. I don't have a comments section, but feel free to reach out to me over twitter, email or linkedin.

History

When the Web was born, it did so as a system of internet servers that allowed documents to be accessed via a Web Browser. From those documents you could access others via links, as well as other formats like graphics, video or audio, forming a big network of interconnected documents called the Web.

Web Browsers (at the time) were simple programs running on users' machines that would fetch a document from a web server, read it, and show it on the user's screen.

HTML was born as a declarative way to structure those documents to tell the web browsers how they should paint and show the documents. HTML stands for HyperText Markup Language. Hypertext describes the ability to link to other documents from the current one, and markup defines the structure of a web page or a web application. An example document would be:

<html>
  <head>
    <title>My Website</title>
  </head>
  <body>
    <section>
      <p>My paragraph</p>
      <p>My other paragraph</p>
    </section>
  </body>
</html>

So the web browser would fetch that document from some server somewhere and start interpreting its content: "Oh hey, this is an HTML document! And.. oh yes, it has a title of My Website, so I will write that into the tab title. I also see you have a document body with a paragraph inside a section, so I will paint that paragraph. But then there is another paragraph, so they must be separated by an empty line, since they are different blocks."

One important characteristic of HTML is that it is not strictly parsed. This means that in the event of receiving wrong code, for instance an unclosed tag, the web browser won't refuse to load and show the page, but will do the best it can to correct the mistake and paint the document anyway.

So the author of the document wrote the "code" seen before, and a user visiting the author's web page would see the rendered result: the title in the browser tab and the two paragraphs painted as separate blocks of text.

It really is as simple as that. Leaving aside CSS to style those documents, giving them colours, shapes, you name it, and javascript to interact with them, websites are just text documents written in a particular way.

SGML

The Standard Generalized Markup Language came before HTML. One could say that HTML was derived from SGML, although they were developed more or less in parallel. HTML focuses more on how the data in the document looks. SGML is more generic: it is a (meta)language for defining other markup languages, while in HTML you have a limited set of tags that define the structure of the document.

With SGML, you would need to specify:

  • The SGML declaration, enumerating the characters and delimiters that may appear in the application. You can find the charset declaration for HTML 4.0 in the HTML 4.0 specification.
  • The Document Type Definition, defining the syntax of the markup constructs, for example:
<!DOCTYPE tvguide [
<!ELEMENT tvguide - - (date,channel+)>
<!ELEMENT date - - (#PCDATA)>
<!ELEMENT channel - - (channel_name,format?,program*)>
<!ATTLIST channel teletext (yes|no) "no">
<!ELEMENT format - - (#PCDATA)>
<!ELEMENT program - - (name,start_time,(end_time|duration))>
<!ATTLIST program
     min_age CDATA #REQUIRED
     lang CDATA "es">
<!ELEMENT name - - (#PCDATA)>
<!ELEMENT start_time - - (#PCDATA)>
<!ELEMENT end_time - - (#PCDATA)>
<!ELEMENT duration - - (#PCDATA)>
]>
  • A specification describing the semantics of the markup.
  • Document instances containing data and markup.

XML

XML was based on SGML and was designed to describe a set of rules for encoding data in a format that is both human-readable and machine-readable. It was conceived to focus primarily on what the data is, rather than on how it is represented.

That is why it is often used as the data exchange format across services over the internet. An example of XML would be (examples taken from https://www.w3schools.com):

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
<note>
  <to>John</to>
  <from>Smith</from>
  <heading>Reminder</heading>
  <body>Who cares right?</body>
</note>

The first line specifies the version and encoding. The rest of the document represents two notes, with information associated with them. A service can receive this XML document, parse it and do something with the data.

XML documents can also have a DTD, just like SGML documents, formalizing the structure of the document and describing a common ground for exchanging data between multiple parties. You can reference a DTD by adding this line to the document: <!DOCTYPE note SYSTEM "Note.dtd">

Note.dtd:

<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

XML has other features like XML Schema, XML namespaces or XPath, to name a few, but they are mechanisms specific to XML documents. More information on w3schools.

XHTML

XHTML is simply HTML but expressed as valid XML. It has the same functionality, but is compliant with the strictest representation of the XML standard. This means that rules that browsers let you get away with breaking in HTML must be strictly followed in XHTML, in line with XML. For example:

  • In HTML you can write <br>, in XHTML it must be <br></br> or <br/> or <br />.
  • In HTML you can write <em><strong>Text</em></strong>, in XHTML it has to be <em><strong>Text</strong></em>, following the correct opening/closing order.

DHTML

This term was introduced by Microsoft when Internet Explorer 4 came out and has no precise meaning. DHTML (Dynamic HTML) encompasses the set of technologies that allow you to create interactive and animated web sites. This means that a site built with HTML, styled with CSS and given additional interactivity with Javascript falls into the DHTML category.

Passing arguments to Dockerfiles

Introduction

When using docker as our containerization software of choice to deploy our applications, we sometimes want to build an image that depends on a variable parameter, for example when building images from a script with a deployment folder that changes between builds.

Using the ENV keyword

The easiest way is to specify an environment variable inside the Dockerfile with the ENV keyword and then reference it from within the file. For instance, when you just need to update the version of a package and do some operations depending on that version:

FROM image
ENV PKG_VERSION 1.0.0
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip

This PKG_VERSION was used as a build-time variable, but it is important to know that containers will also be able to access it at runtime, which may lead to problems.
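
You can see this for yourself with a throwaway container; the image name here is a placeholder and I assume the base image ships printenv:

λ docker build -t my-image-name .
λ docker run --rm my-image-name printenv PKG_VERSION
1.0.0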

Using the ARG keyword

The ARG keyword defines a variable that users can set at build time when constructing the image, using the --build-arg <variable>=<value> option and then referencing it inside the Dockerfile. In the previous example the same result could be achieved by executing:

λ docker build -t my-image-name --build-arg PKG_VERSION=1.0.0 $PWD

Dockerfile:

FROM image
ARG PKG_VERSION
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip

And in this case, the PKG_VERSION variable only lives during the build process and is unreachable from within the containers. Additionally, it is also possible to give ARG a default value in case no --build-arg is specified:

Dockerfile:

FROM image
ARG PKG_VERSION=1.0.0
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip

Of course, you could also define an environment variable that would depend on a value passed as an argument:

λ docker build -t my-image-name --build-arg PKG_VERSION=1.0.0 $PWD

Dockerfile:

FROM image
ARG VERSION_ARG=1.0.0
ENV PKG_VERSION=$VERSION_ARG
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip

Now the passed argument VERSION_ARG will be available as the PKG_VERSION environment variable from within the container.
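
A quick way to check that handoff, with the same placeholder image name as before:

λ docker build -t my-image-name --build-arg VERSION_ARG=2.0.0 $PWD
λ docker run --rm my-image-name printenv PKG_VERSION
2.0.0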

Moreover, if you prefer to declare the container's environment variables at runtime, it can easily be done when running it:

λ docker run -e ENV=development -e TIMEOUT=300 -e EXPORT_PATH=/exports ruby

Have fun!

Automatically loading json files to ElasticSearch

Introduction

Right now at work I am working on the Big Data Europe project, a joint effort of several organizations and enterprises across Europe to create a platform that provides a set of software services for implementing big data pipelines with minimal effort compared to other stacks and in an extremely cost-effective way.

The goal is to give any company or organization that wants to make sense of its data using Big Data an easy starting point to play around with the technologies that make it possible.

One of the pieces I developed is mu-bde-logging, a standalone system that logs HTTP traffic from running docker containers and posts it into an EL(K) stack in real time for further visualization.

The last requirement was to add the possibility of replaying old traffic backups, posting them into the ElasticSearch instance so they can be visualized offline in Kibana.

So I wrote a small script to scan for the transformed .har files (json format) in a given folder and replay them into the ElasticSearch container.
Since the ElasticSearch & Kibana containers are part of a docker-compose.yml project, I didn't care much about being generic and used the names the docker-compose script gives the containers, but it is easy to change & extend.

The Code

This is what I came up with:

#!/usr/bin/env bash

#/ Usage: ./backup_replay.sh [-f|--folder <backups_folder>]
#/ Description: Run ElasticSearch and Kibana standalone and post every enriched .har file in the backups folder to ElasticSearch.
#/ Examples: ./backup_replay.sh -f backups/
#/ Options:
#/     --help: Display this help message
usage() { grep '^#/' "$0" | cut -c4- ; exit 0 ; }
expr "$*" : ".*--help" > /dev/null && usage

BACKUP_DIR="../backups/"

# Convenience logging function.
info()    { echo "[INFO]    $@"  ; }

# Nothing to clean up for now; kept as a hook for the EXIT trap.
cleanup() {
  true;
}

# Poll the ElasticSearch container
poll() {
  local elasticsearch_ip="$1"
  local result=$(curl -XGET http://${elasticsearch_ip}:9200 -I 2>/dev/null | head -n 1 | awk '{ print $2 }')

  if [[ $result == "200" ]]; then
    return 1 # ElasticSearch is up; a non-zero status stops the while loop below.
  else
    return 0 # Not up yet; a zero status keeps the while loop polling.
  fi
}

# Parse Parameters
while [ "$#" -gt 1 ];
  do
  key="$1"

  case $key in
      -f|--folder)
      BACKUP_DIR="$2" # Folder containing the *.trans.har backups.
      shift
      ;;
      --default)
      default=YES
      ;;
    *)
    ;;
  esac
  shift
done


if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then
    trap cleanup EXIT

    # Start Elasticsearch & Kibana with docker Compose
    if command -v docker-compose >/dev/null; then
      docker-compose up -d elasticsearch kibana
    else
      info "Install docker-compose!"
      exit 1
    fi

    # Poll ElasticSearch until it is up and we can post hars to it.
    elasticsearch_ip=$(docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mubdelogging_elasticsearch_1)

    info "ElasticSearch container ip: ${elasticsearch_ip}"

    while poll ${elasticsearch_ip}
    do
      info "ElasticSearch is not up yet."
      sleep 2
    done

    # Find all .trans.har files in the specified backups folder.
    # For each one, POST it to ElasticSearch.
    info "Ready to work!"
    info "POST all enriched hars "
    find ${BACKUP_DIR} -name "*.trans.har" | sed 's/^/@/g' | xargs -i /bin/bash -c "sleep 0.5; curl -XPOST 'http://$elasticsearch_ip:9200/hars/har?pretty' --data-binary {}"
fi
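
With the docker-compose project in the same directory, replaying a folder of backups then boils down to (the folder name is just an example):

λ ./backup_replay.sh -f backups/

The script starts the elasticsearch and kibana services, polls port 9200 until ElasticSearch answers and then POSTs every *.trans.har file it finds under the given folder.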

Have fun!

Sed tricked me!

Introduction

Today I had some free time at work, since I am between projects and waiting for some additional information, and I took advantage of it to help a coworker who was new to Ember.js. For some reason all the calls to the backend (a Virtuoso database) were failing.

Taking a look together, we discovered that the middleware that transforms JSON-API calls into SPARQL queries and vice-versa (the piece talking directly to the frontend) was consistently returning a 500 error, and this happened because the triples that had been introduced into the database by a script were generated incorrectly and shared the same id across every different model.

Let's say that you have a file with this structure:

<url1:concept1> <predicate1> <foo> ;
	<predicate2> <bar> ;
	<predicate3> <baz> .

<url1:concept2> <predicate1> <foo> ;
	<predicate2> <bar> ;
	<predicate3> <baz> .

And you want to detect each "foo" occurrence and add a unique identifier, generated for example with the bash uuidgen utility. At the beginning this was the code:

blog λ cat example.txt | sed "s/foo/$(uuidgen)/g"
<url1:concept1> <predicate1> <66fa7661-889f-4ed5-b74d-540e18b9a83d> ;
        <predicate2> <bar> ;
        <predicate3> <baz> .

<url1:concept2> <predicate1> <66fa7661-889f-4ed5-b74d-540e18b9a83d> ;
        <predicate2> <bar> ;
        <predicate3> <baz> .

But then the uuid was generated only once and substituted in all occurrences. We need to substitute each new "foo" appearance with a different uuid each time! Ah, but sed allows you to execute an external command for each match, so in theory you could do this:

blog λ cat a.txt
foo
foo
b
c
d

blog λ cat a.txt | sed "s/foo/echo $(uuidgen)/ge"
8b8cc7ac-b089-4339-875c-76a5278b594a
8b8cc7ac-b089-4339-875c-76a5278b594a
b
c
d

Damn! The $(uuidgen) is expanded by the shell only once, before sed even runs, so every occurrence still gets the same substitution. But what if we manage to execute a command that depends on an external random source to generate the uuid for each line? I found a little snippet here.

So we try adapting it to work with sed:

blog λ cat a.txt | sed "s^foo^cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-32} | head -n 1^ge"
13YFcSxzshlFocig6AdA7yEbHeSKYq4r
6jhDEL3x3yDUsOf6mqScrea29YNDDURy
b
c
d

Nice, now each occurrence of "foo" is replaced by a random string. So let's try with the original file example.txt:

blog λ cat example.txt | sed "s^foo^cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-32} | head -n 1^ge"
sh: 1: Syntax error: redirection unexpected

        <predicate2> <bar> ;
        <predicate3> <baz> .

sh: 1: Syntax error: redirection unexpected

        <predicate2> <bar> ;
        <predicate3> <baz> .

Argh, why the hell does this happen?.. redirections use the "<" character in the unix shell. Oh wait, could it be that sed is not only taking the exact match but the whole line, or the characters next to it? Let's verify it. Let's say that now this is a.txt:

blog λ cat a.txt
foo < /etc/passwd
foo
b
c
d

blog λ cat a.txt | sed "s/foo/cat/ge"
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
... etc ...
b
c
d

Yes, it substitutes "foo" with cat, but the rest of the line ("< /etc/passwd") is passed along too and interpreted not as text but as part of the command that sed executes.

The Solution

The solution came from using awk. This is the line that did the trick; it adds a triple with a fresh uuid for each url:concept.

blog λ cat example.txt | awk '1;/foo/{command="uuidgen";command | getline uuidgen;close(command); print "\t<http://our.namespace.url> \"" uuidgen "\" ;"}'
<url1:concept1> <predicate1> <foo> ;
        <http://our.namespace.url> "802a44bd-c28f-4856-b275-e24c666308c8" ;
        <predicate2> <bar> ;
        <predicate3> <baz> .

<url1:concept2> <predicate1> <foo> ;
        <http://our.namespace.url> "6b8cbac2-70c4-4769-b065-5aa36af797a4" ;
        <predicate2> <bar> ;
        <predicate3> <baz> .
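
For completeness, the same per-line behaviour can also be sketched in plain bash, calling uuidgen once per matching line. This is just an illustrative alternative, not what we actually used:

#!/usr/bin/env bash
# Print every line of example.txt; after each line mentioning "foo",
# emit an extra triple carrying a freshly generated uuid.
while IFS= read -r line; do
  printf '%s\n' "$line"
  if [[ $line == *foo* ]]; then
    printf '\t<http://our.namespace.url> "%s" ;\n' "$(uuidgen)"
  fi
done < example.txt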

Special thanks to my colleague @wdullaer for asking me to help him out with Ember; we ended up having fun with sed & awk.

Have fun!