Understanding HTML terminology

Introduction

I know, I know. "But Esteban, this article would have been useful 20 years ago; now it is a little outdated, to say the least". I cannot disagree with that, but this kind of post serves more as a reminder to me, and perhaps also to satisfy your curiosity. I don't have a comments section, but feel free to reach out to me over Twitter, email or LinkedIn.

History

When the Web was born, it did so as a system of internet servers that allowed documents to be accessed via a web browser. From those documents you could reach others via links, as well as other formats like graphics, video or audio, forming a big network of interconnected documents called the Web.

Web browsers (at the time) were simple programs running on users' machines that would fetch a document from a web server, read it, and display it on the user's screen.

HTML was born as a declarative way to structure those documents and tell web browsers how to paint and display them. HTML stands for HyperText Markup Language: hypertext refers to the ability to link to other documents from the current one, and markup defines the structure of a web page or web application. An example document would be:

<html>  
  <head>
    <title>My Website</title>
  </head>
  <body>
    <section>
      <p>My paragraph</p>
      <p>My other paragraph</p>
    </section>
  </body>
</html>  

So the web browser would fetch that document from some server somewhere and start interpreting its content: "Oh hey, this is an HTML document! And... oh yes, it has a title of My Website, so I will write that into the tab title. I also see you have a document body with a paragraph inside a section, so I will paint that paragraph. But then there is another paragraph, so they must be separated by an empty line, since they are different blocks."

One important characteristic of HTML is that it is not strictly parsed. This means that when the browser receives wrong code, for instance an unclosed tag, it won't fail to load and show the page; it will do the best it can to correct the mistake and paint the document.

So the author of the document wrote the "code" seen above, and the user who visited the author's web page would see the rendered result: the title in the browser tab and the two paragraphs painted as separate blocks.

It really is as simple as that. Leaving aside CSS to style those documents (giving them colours, shapes, you name it) and JavaScript to interact with them, websites are just text documents written in a particular way.

SGML

The Standard Generalized Markup Language came before HTML. One could say that HTML was derived from SGML, although they were developed more or less in parallel. HTML focuses more on how the data reflected in the document looks. SGML is more generic: it is a (meta)language used to define other markup languages, while HTML gives you a limited set of tags that define the structure of the document.

With SGML, you would need to specify:

  • The SGML declaration, enumerating the characters and delimiters that may appear in the application. You can find the charset declaration for HTML 4.0 here.
  • The Document Type Definition, defining the syntax of the markup constructs, for example:
<!DOCTYPE tvguide [  
<!ELEMENT tvguide - - (date,channel+)>  
<!ELEMENT date - - (#PCDATA)>  
<!ELEMENT channel - - (channel_name,format?,program*)>  
<!ATTLIST channel teletext (yes|no) "no">  
<!ELEMENT format - - (#PCDATA)>  
<!ELEMENT program - - (name,start_time,(end_time|duration))>  
<!ATTLIST program  
     min_age CDATA #REQUIRED
     lang CDATA "es">
<!ELEMENT name - - (#PCDATA)>  
<!ELEMENT start_time - - (#PCDATA)>  
<!ELEMENT end_time - - (#PCDATA)>  
<!ELEMENT duration - - (#PCDATA)>  
]>
  • A specification describing the semantics of the markup.
  • Document instances containing data and markup.

XML

XML is based on SGML and was designed as a set of rules for encoding data in a format that is both human-readable and machine-readable. It was conceived to focus primarily on what the data is, rather than on how it is represented.

That is why it is often used as a data exchange format across services over the internet. An example of XML would be (examples taken from https://www.w3schools.com):

<?xml version="1.0" encoding="UTF-8"?>  
<note>  
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>  
<note>  
  <to>John</to>
  <from>Smith</from>
  <heading>Reminder</heading>
  <body>Who cares right?</body>
</note>  

The first line specifies the XML version and the encoding. The rest of the document represents two notes with the information associated to them (strictly speaking, a well-formed XML document needs a single root element, so in practice the two notes would be wrapped in one). A service can receive this XML document, parse it and do something with the data.

XML documents can also have a DTD, just like SGML documents, formalizing the structure of the document and describing a common ground for multiple parties to exchange data. You can reference a DTD by adding this line to the document: <!DOCTYPE note SYSTEM "Note.dtd">

Note.dtd:

<!ELEMENT note (to,from,heading,body)>  
<!ELEMENT to (#PCDATA)>  
<!ELEMENT from (#PCDATA)>  
<!ELEMENT heading (#PCDATA)>  
<!ELEMENT body (#PCDATA)>
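
By the way, if you have the xmllint tool from libxml2 at hand, you can validate an instance document against such a DTD from the command line. A minimal sketch, assuming a well-formed document holding a single note is saved as note.xml next to Note.dtd (both file names are just for illustration):

# Validate note.xml against the external DTD; --noout suppresses printing the document back.
xmllint --noout --dtdvalid Note.dtd note.xml && echo "note.xml is valid"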

XML has other features like XML Schema, XML namespaces or XPath, to name a few, but those are mechanisms specific to XML documents. More information on w3schools.
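
As a small taste of XPath, xmllint can also evaluate expressions against a document; again assuming the hypothetical note.xml from before:

# Print the text content of the <to> element of every note.
xmllint --xpath '//note/to/text()' note.xml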

XHTML

XHTML is simply HTML expressed as valid XML. It has the same functionality, but it complies with the strictest interpretation of the XML standard. This means that mistakes that browsers would overlook in HTML are no longer tolerated; the document must adhere to the strict set of XML rules. For example:

  • In HTML you can write <br>, in XHTML it must be <br></br> or <br/> or <br />.
  • In HTML you can write <em><strong>Text</em></strong>, in XHTML it has to be <em><strong>Text</strong></em>, following the correct opening/closing order.

DHTML

This term was introduced by Microsoft when Internet Explorer 4 came out and has no precise meaning. DHTML (Dynamic HTML) encompasses the set of technologies that allow creating interactive and animated websites. This means that a site built with HTML, styled with CSS and made interactive with JavaScript would fall into the DHTML category.

Passing arguments to Dockerfiles

Introduction

When using Docker as our container platform of choice to deploy our applications, sometimes we want to build an image that depends on a variable parameter, for example when building images from a script where the deployment folder changes between builds.

Using the ENV keyword

The easiest way is to specify an environment variable inside the Dockerfile with the ENV keyword and then reference it from within the file. For instance, when you just need to update the version of a package and do some operations depending on that version:

FROM image  
ENV PKG_VERSION 1.0.0  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  

Here PKG_VERSION is used as a build-time variable, but it is important to know that containers will also be able to access it at runtime, which may lead to problems.
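
You can see this for yourself; a quick sketch, assuming an image built from a Dockerfile like the one above and tagged my-image-name (the tag is just an example), with a base image that ships the usual env binary:

# List the environment of a fresh container: the build-time value is still there.
docker run --rm my-image-name env | grep PKG_VERSION   # prints PKG_VERSION=1.0.0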

Using the ARG keyword

The ARG keyword defines a variable that users can pass at build time with the --build-arg <variable>=<value> option and then reference inside the Dockerfile. The same result as in the previous example could be achieved by executing:

λ docker build -t my-image-name --build-arg PKG_VERSION=1.0.0 $PWD

Dockerfile:

FROM image  
ARG PKG_VERSION  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  

In this case, the PKG_VERSION variable only lives during the build process and is unreachable from within the containers. Additionally, it is possible to give ARG a default value, used when no --build-arg is specified:

Dockerfile:

FROM image  
ARG PKG_VERSION=1.0.0  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  
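
To double-check that an ARG value does not end up in the final image, the same kind of inspection can be used; again a sketch with a hypothetical image tag:

# Build with an explicit value, then look for it in a container's environment.
docker build -t my-image-name --build-arg PKG_VERSION=2.0.0 $PWD
docker run --rm my-image-name env | grep PKG_VERSION   # prints nothing: ARG is build-time only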

Of course, you could also define an environment variable that would depend on a value passed as an argument:

λ docker build -t my-image-name --build-arg PKG_VERSION=1.0.0 $PWD

Dockerfile:

FROM image  
ARG VERSION_ARG=1.0.0  
ENV PKG_VERSION=$VERSION_ARG  
RUN curl http://my.cdn.com/package-$PKG_VERSION.zip  

Now the passed argument VERSION_ARG will be available as the PKG_VERSION environment variable from within the container.

Moreover, if you prefer to declare the container's environment variables at runtime, it can easily be done when running it:

λ docker run -e ENV=development -e TIMEOUT=300 -e EXPORT_PATH=/exports ruby


Have fun!

Automatically loading JSON files into ElasticSearch

Introduction

Right now at work I am involved in the Big Data Europe project, a joint effort of several organizations and enterprises across Europe to create a platform that provides a set of software services for implementing big data pipelines with minimal effort compared to other stacks, and in an extremely cost-effective way.

The goal is to give any company or organization that wants to make sense of its data an easy starting point to play around with the Big Data technologies that make that possible.

One of the pieces I developed is mu-bde-logging, a standalone system that logs HTTP traffic from running Docker containers and posts it to an EL(K) stack in real time for further visualization.

The last requirement was to add the possibility of replaying old traffic backups and posting them into the ElasticSearch instance to visualize them offline in Kibana.

So I wrote a small script that scans for the transformed .har files (JSON format) in a given folder and replays them into the ElasticSearch container.
Since the ElasticSearch & Kibana containers are part of a docker-compose.yml project, I didn't care much about being generic and used the names docker-compose gives the containers, but it is easy to change and extend.

The Code

This is what I came up with:

#!/usr/bin/env bash

#/ Usage: ./backup_replay.sh <backups_folder>
#/ Description: Run ElasticSearch and Kibana standalone and post every enriched .har file in the backups folder to ElasticSearch.
#/ Examples: ./backup_replay.sh backups/
#/ Options:
#/     --help: Display this help message
usage() { grep '^#/' "$0" | cut -c4- ; exit 0 ; }  
expr "$*" : ".*--help" > /dev/null && usage

BACKUP_DIR="../backups/"

# Convenience logging function.
info()    { echo "[INFO]    $@"  ; }

cleanup() {  
  true;
}

# Poll the ElasticSearch container
poll() {  
  local elasticsearch_ip="$1"
  local result=$(curl -XGET http://${elasticsearch_ip}:9200 -I 2>/dev/null | head -n 1 | awk '{ print $2 }')

  if [[ $result == "200" ]]; then
    return 1 # ElasticSearch is up.
  else
    return 0 # It will execute as long as the return code is zero.
  fi
}

# Parse Parameters
while [ "$#" -gt 1 ];  
  do
  key="$1"

  case $key in
      -f|--folder)
      BACKUP_DIR="$2" # Override the default backups folder.
      shift
      ;;
      --default)
      default=YES
      ;;
    *)
    ;;
  esac
  shift
done


if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then  
    trap cleanup EXIT

    # Start ElasticSearch & Kibana with docker-compose.
    if command -v docker-compose >/dev/null; then
      docker-compose up -d elasticsearch kibana
    else
      info "Install docker-compose!"
      exit 1
    fi

    # Poll ElasticSearch until it is up and we can post hars to it.
    elasticsearch_ip=$(docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mubdelogging_elasticsearch_1)

    info "ElasticSearch container ip: ${elasticsearch_ip}"

    while poll ${elasticsearch_ip}
    do
      info "ElasticSearch is not up yet."
      sleep 2
    done

    # Find all .trans.har files in the specified backups folder.
    # For each one, POST it to ElasticSearch.
    info "Ready to work!"
    info "POST all enriched hars"
    find ${BACKUP_DIR} -name "*.trans.har" | sed 's/^/@/g' | xargs -i /bin/bash -c "sleep 0.5; curl -XPOST 'http://$elasticsearch_ip:9200/hars/har?pretty' --data-binary {}"
fi  
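
Once the script has run, a quick sanity check is to ask ElasticSearch how many documents ended up in the index; a small sketch reusing the index name and container IP from the script above:

# Count the documents indexed under /hars.
curl -XGET "http://${elasticsearch_ip}:9200/hars/_count?pretty"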

Have fun!

Sed tricked me!

Introduction

Today I had some free time at work, since I am between projects and waiting for some additional information, and I took advantage of it to help a coworker who was new to Ember.js. For some reason all the calls to the backend (a Virtuoso database) were failing.

Taking a look together, we discovered that the middleware that transforms JSON-API calls into SPARQL queries and vice versa (the piece that talks directly to the frontend) was consistently returning a 500 error. This happened because the triples that had been loaded into the database with a script were generated wrongly and shared the same id across every different model.

Let's say that you have a file with this structure:

<url1:concept1> <predicate1> <foo> ;  
    <predicate2> <bar> ;
    <predicate3> <baz> .

<url1:concept2> <predicate1> <foo> ;  
    <predicate2> <bar> ;
    <predicate3> <baz> .

And you want to detect each "foo" occurrence and add a unique identifier, generated for example with the uuidgen command-line utility. At the beginning this was the code:

blog λ cat example.txt | sed "s/foo/$(uuidgen)/g"  
<url1:concept1> <predicate1> <66fa7661-889f-4ed5-b74d-540e18b9a83d> ;  
        <predicate2> <bar> ;
        <predicate3> <baz> .

<url1:concept2> <predicate1> <66fa7661-889f-4ed5-b74d-540e18b9a83d> ;  
        <predicate2> <bar> ;
        <predicate3> <baz> .

But the uuid was generated only once (the shell expands $(uuidgen) before sed even runs), and that single value was substituted in all occurrences. We need to substitute each "foo" appearance with a different uuid each time! Ah, but GNU sed's e flag lets you execute the result of the substitution as a command, so in theory you could do this:

blog λ cat a.txt  
foo  
foo  
b  
c  
d

blog λ cat a.txt | sed "s/foo/echo $(uuidgen)/ge"  
8b8cc7ac-b089-4339-875c-76a5278b594a  
8b8cc7ac-b089-4339-875c-76a5278b594a  
b  
c  
d  

Damn! The $(uuidgen) is still expanded only once by the shell, so every matching line ends up echoing the same value. But what if we manage to execute a command that pulls from an external random source each time it runs? I found a little snippet here.

So we try adapting it to work with sed:

blog λ cat a.txt | sed "s^foo^cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-32} | head -n 1^ge"  
13YFcSxzshlFocig6AdA7yEbHeSKYq4r  
6jhDEL3x3yDUsOf6mqScrea29YNDDURy  
b  
c  
d

Nice, now each occurrence of "foo" is replaced by a random string. So let's try with the original file example.txt:

blog λ cat example.txt | sed "s^foo^cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-32} | head -n 1^ge"  
sh: 1: Syntax error: redirection unexpected

        <predicate2> <bar> ;
        <predicate3> <baz> .

sh: 1: Syntax error: redirection unexpected

        <predicate2> <bar> ;
        <predicate3> <baz> .

Argh, why the hell does this happen? Redirections use the "<" character in the Unix shell. Oh wait, could it be that sed is not only executing the exact match but the whole line, including the characters next to it? Let's verify it. Let's say that now this is a.txt:

blog λ cat a.txt  
foo < /etc/passwd  
foo  
b  
c  
d

blog λ cat a.txt | sed "s/foo/cat/ge"  
root:x:0:0:root:/root:/bin/bash  
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin  
... etc ...
b  
c  
d  

Yes, it substitutes "foo" with cat, but the rest of the line, "< /etc/passwd", also ends up in the command, so sed interprets it not as text but as part of the command to execute.

The Solution

The solution came with awk. This is the line that did the trick: it prints every line and, after each line that contains "foo", adds an extra triple with a freshly generated uuid.

blog λ cat example.txt | awk '1;/foo/{command="uuidgen";command | getline uuidgen;close(command); print "\t<http://our.namespace.url> \"" uuidgen "\" ;"}'  
<url1:concept1> <predicate1> <foo> ;  
        <http://our.namespace.url> "802a44bd-c28f-4856-b275-e24c666308c8" ;
        <predicate2> <bar> ;
        <predicate3> <baz> .

<url1:concept2> <predicate1> <foo> ;  
        <http://our.namespace.url> "6b8cbac2-70c4-4769-b065-5aa36af797a4" ;
        <predicate2> <bar> ;
        <predicate3> <baz> .
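
If you actually wanted to replace each "foo" itself with a different uuid (instead of adding an extra triple), a similar per-match trick works in awk. This is just a sketch of an alternative, not what we ended up using:

cat example.txt | awk '{ out=""; rest=$0
  while (match(rest, /foo/)) {
    cmd = "uuidgen"; cmd | getline uuid; close(cmd)   # fresh uuid per match
    out = out substr(rest, 1, RSTART-1) uuid
    rest = substr(rest, RSTART+RLENGTH)
  }
  print out rest }'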

Special thanks to my colleague @wdullaer for asking me to help him out with Ember; we ended up having fun with sed & awk.

Have fun!

My experience in JSCONF Belgium 2017.

This Thursday, 29th June, the Jsconf.be conference was held in the beautiful city of Brugge, and I had the chance to go! I had already taken a look at the speakers and was interested in at least four talks, so it was more than worth going.

The talks started in the afternoon and were structured in two tracks not following any particular topic, so I had to sacrifice some.

Keynote

There was a big delay on the trains, so I couldn't watch this one in its entirety; the speaker, Peter Paul Koch, was already halfway through his talk when I arrived. The remainder of the keynote was extremely interesting.

He talked about the well-known obesity of the web, pinpointing how the indiscriminate use of frameworks, build tools and libraries is contaminating the web development environment, generating heavier and heavier websites that take several seconds to download and render even on high-end devices. This makes these sites almost impossible to access for a big part of the world's population, who have to rely on low-end devices, unstable and sloppy network connections, and nothing like the high-throughput environments we are used to in first-world countries.

Part of this bloat comes from web developers trying to emulate the full functionality of native desktop applications using web technologies.

He also brought to the table the uncertainty that frontend developers have to deal with when developing software in such an aggressive and hostile environment as the web. Maybe the so-called JavaScript fatigue partly originates from us frontend developers trying to be taken seriously by over-creating tooling and patterns that exponentially increase the complexity of our software.

Every idea was delivered with a thoughtful and encouraging attitude. Far from being a rant, it was inspiring and full of hope for the web. I enjoyed it very much, and I am already following the speaker to see what he gets up to next.

Reactive Programming By Example.

A high-level introduction to reactive programming by a couple of speakers: Lander Verhack and his boss. Using a funny and engaging question-and-answer format between them, they gave a very simple overview of what reactive programming is by creating a demo page with a simple search engine that queries the Spotify API and shows a list of songs, artists and albums.

At work I currently use Ember.js primarily, so the concepts explained in this talk closely reminded me of Ember's observers and of computed properties, which are built on top of observers.

The Era of Module Bundlers.

Arun DSouza provided a walkthrough across task runners, build tools and module bundlers, enumerating the most commonly used ones, today and historically: grunt, gulp, browserify and webpack, among others.

At the end he focused on webpack as the most complete tool, incorporating task running, bundling, minifying and several other jobs in a single environment. Despite being interesting, this talk did not give me much that was new, other than some specifics of webpack for doing this or that task.

How AI saved my life.

I really, really enjoyed this talk. Nick Trogh, evangelist at Microsoft BeLux, gave an introduction to the super cool Microsoft Cognitive Services APIs in Azure and, using a website, showed examples of how users can interact with them.

The first example used the Text Analytics API to extract the sentiment of a text, the subjective opinion of its author, as a score where values closer to zero are negative and values closer to one are positive.

Then the Computer Vision API came along. Given an initial small set of example images of a football team, the API detected whether a new person belonged to that team, as well as their age, emotion (Emotion API), gender and a range of other parameters, all with very good precision. It was amazing; there is a lot of potential in opening this kind of service up to third parties.

Enterprise Javascript.

I think one of my tweets pretty much summarizes the general idea: Oracle bundled jQuery+knockout.js+require.js+cordova = boom, OracleJET. We will see how it gets used in the future!

Yet another frontend JavaScript framework (Peter Koch warned us!) to add to our collection. Oracle jumped on the frontend train with an open-source hotchpotch of existing technologies, hoping to generate a community around it.

It has several interesting points: it wants to adopt Web Components, it is responsive by default, it provides internationalization out of the box... Only time will tell where this new tool fits in.

How I hacked my coffee machine with JS.

The speaker's talk was soaked in pure awesomeness from start to finish. Dominik Kundel explained the empowering process of being bored one day and deciding to crack open your flatmate's coffee machine to reverse-engineer its microcontroller and learn how the buttons worked and communicated internally.

Once that was done, he could hook a Tessel microcontroller up to the machine and develop a small service that allows him to start brewing coffee remotely.

Really cool home hacking project!

Conclusions.

The overall level of the talks was introductory, but I enjoyed almost all of them very much. The one explaining the Microsoft Cognitive Services APIs and the coffee machine hacking one were at the top, and the conference also sparked my interest in reactive programming; I will have to look into that too.

Overall, it has been fun and enriching to attend, and I am bringing home several ideas to test. As a fun fact, looking around I didn't find a single Linux user in the whole conference; I think I just saw one guy with Emacs open! :P