Array of extended objects in Python using list comprehensions and lambda functions.

Problem

It's been a while since I last wrote any posts, so I thought that even though the idea might seem a bit silly, writing about a small problem I encountered the other day would help me kick off the habit again.

I was developing a very simple microservice that would receive a GET request with two parameters, issue a SPARQL query to a Virtuoso store and then transform the returned array of objects by extending each object with the same additional meta information. Say:

res = [{ 'title': 'Oh boy' }, { 'title': 'Oh girl'}]

And then add some additional metadata like { 'meta': { 'author': 'Myself'}}

Ending up with

res = [ {
        'title': 'Oh boy',
        'meta':  {
          'author': 'Myself'
          }
        },
        {
          'title': 'Oh girl',
          'meta': {
            'author': 'Myself'
          }
        }]

Solution

I wanted to do something self-contained and as functional as possible, using list comprehensions for example. Unfortunately, Python has no dictionary method that applies an update and returns the updated dictionary. The regular way looks like this:

a = { 'b': 3 }
a.update({'c': 5}) # Dict updated, does not return anything
print(a) # {'c': 5, 'b': 3}
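As an aside (my addition, not from the original post): on Python 3.5+ you can build the merged dict in a single expression with the {**a, **b} unpacking syntax, no lambda required:

```python
res = [{'title': 'Oh boy'}, {'title': 'Oh girl'}]
ext = {'meta': {'author': 'Myself'}}

# {**z, **ext} builds a brand-new dict per element; the originals stay untouched.
result = [{**z, **ext} for z in res]
```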

Ultimately I came up with a small solution:

result = [(lambda x, y=z.copy(): (y.update(x), y))({ 'meta': { 'author': 'Myself' } })[1] for z in res]

Tada! Combining list comprehensions, lambda functions and the built-in dictionary copy() method, we get back a new array with an extended copy of each object.

Each lambda receives the metadata dict as its first parameter, while its second parameter defaults to a copy of the current element in the array (assuming it is a dictionary). The body extends the copy with the metadata and evaluates to a tuple: the None returned by update() and the extended copy. Taking [1] gives us the extended copy.

We could even bake this into a function:

def map_extend(array=[], ext={}):
  return [(lambda x, y=z.copy(): (y.update(x), y))(ext)[1] for z in array]

>>> res
[{'title': 'Oh boy'}, {'title': 'Oh girl'}]
>>> ext = { 'meta': {'author': 'Hola'}}                                                     
>>> map_extend(res, ext)
[{'meta': {'author': 'Hola'}, 'title': 'Oh boy'}, {'meta': {'author': 'Hola'}, 'title': 'Oh girl'}]
>>> map_extend(res, {})                                                                     
[{'title': 'Oh boy'}, {'title': 'Oh girl'}]
>>> map_extend([], {})                                                                      
[]
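One caveat worth flagging (my observation, not in the original post): copy() is a shallow copy, so every extended object ends up sharing the same nested 'meta' dict:

```python
def map_extend(array=[], ext={}):
    return [(lambda x, y=z.copy(): (y.update(x), y))(ext)[1] for z in array]

res = [{'title': 'Oh boy'}, {'title': 'Oh girl'}]
out = map_extend(res, {'meta': {'author': 'Myself'}})

# copy() is shallow, so both elements hold a reference to the *same* 'meta' dict:
out[0]['meta']['author'] = 'Changed'
out[1]['meta']['author']  # also 'Changed' now
```

If each element needs an independent nested dict, copy.deepcopy from the standard copy module is one option.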

Have fun!

Clojure threading macros in ES6

While working on the functionality for getting a docker-compose path from the cursor position, I realized at some point that I was constantly writing code like this:

let first_result = func_call1(val);
let second_result = func_call2(first_result);
let third_result = func_call3(second_result);

...etc...

This is not necessarily bad; it improves code readability and helps to reason about the flow of execution, usually better than calling those functions in a nested way:

func_call3(func_call2(func_call1(val))); // Phew.

But still, introducing an intermediate variable for every step feels somewhat cumbersome. Some time ago I started looking into Clojure for fun and discovered threading macros. The thread-first ('->') and thread-last ('->>') macros pipe a value through a list of forms, passing each result on to the next call. The difference is that thread-first inserts the result as the first argument of the next call, while thread-last inserts it as the last one.

Wouldn't it be nice to have a similar mechanism in JavaScript? Looking around, I found someone had already written a nice article about it. The goal is to have a function that can be called like this:

let result = thread("->", "3", 
                        parseInt,
                        [sum, 3],
                        [diff, 10],
                        str); // "-4"

And what would be going on under the hood is this:

let sameresult = str(diff(sum(parseInt("3"), 3), 10)); // "-4"

So I decided to reimplement the code taking advantage of the new features that came along with ES6 (arrow functions, destructuring, rest parameters...).

const thread = (operator, first, ...args) => {
    let isThreadFirst;
    switch (operator) {
        case '->>':
            isThreadFirst = false;
            break;
        case '->':
            isThreadFirst = true;
            break;
        default:
            throw new Error('Operator not supported');
    }
    return args.reduce((prev, next) => {
        if (Array.isArray(next)) {
            const [head, ...tail] = next;
            return isThreadFirst ? head.apply(this, [prev, ...tail]) : head.apply(this, tail.concat(prev));
        }
        else {
            return next.call(this, prev);
        }
    }, first);
}
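The examples use three small helpers, sum, diff and str, which are not defined in the post; a minimal sketch of what they are assumed to look like:

```javascript
// Hypothetical helpers assumed by the examples; they are not shown in the original post.
const sum = (a, b) => a + b;
const diff = (a, b) => a - b;
const str = (x) => String(x);
```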

So when executing the code using the thread first operator for example:

let result = thread("->", "3", 
                        parseInt,
                        [sum, 3],
                        [diff, 10],
                        str); // "-4"

console.log(result); // -4 

and using the thread last operator:

let result = thread("->>", "3", 
                        parseInt,
                        [sum, 3],
                        [diff, 10],
                        str); // "4"

console.log(result); // 4 

Have fun!

Getting docker-compose path from cursor position

Introduction

The Stack Builder application, part of the Big Data Europe platform, is a system that helps in the process of building docker-compose.yml files. You can drag & drop existing docker-compose files from the project into a textarea to be shown, and also search other repositories in the Big Data Europe GitHub organization, composing a whole new system by taking useful pieces of existing systems and putting them together.

[Figure: Stack Builder user interface]

Additionally, it provides small hinting features, for example a dropdown menu of already existing containers to link to. The idea is to add more intelligence to this hinting process: if we know the context the user is situated in while editing the docker-compose file, we know what kind of information may be suitable for them.

  web:
    image: nginx
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    command: [nginx-debug, '-g', 'daemon off;']

Say that the user has the cursor on the - ./nginx.conf:/etc/nginx/nginx.conf:ro line. If we know that the user is situated in the web.volumes path, we can suggest additional volume mount paths that are commonly used for nginx containers.

The problem is, how do we know where in the docker-compose.yml file is the cursor placed?

Implementation

To see all the code, just check the repository; here I will leave out the pieces that are not needed. The initial scenario is simple: the docker-compose file is loaded into a textarea and parsed into a YAML object:

<div>
  <div class="input-field">
    {{textarea id="textarea-autocomplete" value=value label=label}}
    <label id="textarea-label-{{label}}" >{{label}}</label>
  </div>
</div>

yamlObject: Ember.computed('value', function() {
  try {
    const yaml = this.yamlParser(this.get('value'));
    this.setProperties({
      yamlErrorMessage: '',
      yamlError: false
    });
    return yaml;
  }
  catch (err) {
    this.setProperties({
      yamlErrorMessage: err,
      yamlError: true
    });
    return null;
  }
})

This will return a JavaScript object with the parsed YAML. Now, every time the cursor moves in the textarea, whether by pressing arrow keys or typing, we want to know the path in the YAML object where it is placed, expressed as a dot-separated path (e.g., if the cursor is placed on the first link of the identifier service: services.identifier.links.0).

The first thing we need is a way of getting the line where the cursor is placed (for example, - identifier:identifier inside a links object). Since the whole docker-compose.yml is stored as a string inside the textarea, one way of doing it is to build a "context string": starting from the cursor's position, add characters to both the left and the right until you find "stop characters", namely those that represent a line break or a tabulation in the YAML file.

getCursorYmlPath() {
  const text = this.get('value');
  const cursorPosition = Ember.$('#textarea-autocomplete').prop("selectionStart");
  const stringLeft = this.stringPad('left');
  const stringRight = this.stringPad('right');
  const contextString = `${stringLeft(text, cursorPosition).text.trim()}${stringRight(text, cursorPosition).text.trim()}`;
}

The stringPad function returns the padding characters of the string, starting from the cursor index and continuing until it finds a stop character.

stringPad(direction, write) {
  return function (text, cursor) {
    let stopChars = ['\n', '\t'];
    let i = cursor;
    let predicate = write ? () => stopChars.indexOf(text[i-1]) : () => stopChars.indexOf(text[i]);
    while (predicate() === -1 && i > 0 && i < text.length) {
      if (direction === 'right') {
        i = i + 1;
      }
      else if (direction === 'left') {
        i = i - 1;
      }
      else {
        break;
      }
    }
    if (direction === 'right') {
      return {
        text: text.slice(cursor, i),
        index: i
      };
    }
    else if (direction === 'left') {
      return {
        text: text.slice(i, cursor),
        index: i
      };
    }
    else {
      return { text: "", index: -1 };
    }
  };
}

At the end, printing the contextString gives you the whole line: "- dispatcher:dispatcher".
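The scanning idea can be checked in isolation outside Ember; this standalone sketch (my own, with a hypothetical contextAt helper) mirrors what stringPad does in both directions:

```javascript
// Standalone sketch (not the Ember component code): scan left and right from the
// cursor until a stop character, then trim, to recover the current line.
const stopChars = ['\n', '\t'];

function contextAt(text, cursor) {
  let left = cursor;
  while (left > 0 && stopChars.indexOf(text[left - 1]) === -1) left -= 1;
  let right = cursor;
  while (right < text.length && stopChars.indexOf(text[right]) === -1) right += 1;
  return text.slice(left, right).trim();
}

const yml = 'links:\n  - dispatcher:dispatcher\nimage: nginx';
contextAt(yml, 12); // → '- dispatcher:dispatcher'
```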

The next step is to find where in the docker-compose.yml the contextString is located. Since the same line can appear in several services inside a docker-compose file, I create a list of all the object paths that match the context string:

Array.prototype.flatten = function() {
  let arr = this;
  while (arr.find(el => Array.isArray(el))) { arr = Array.prototype.concat(...arr); }
  return arr;
};
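As a quick sanity check of the flatten helper (restated here so the snippet runs on its own):

```javascript
// Helper from the post: repeatedly concatenates until no nested arrays remain.
Array.prototype.flatten = function() {
  let arr = this;
  while (arr.find(el => Array.isArray(el))) { arr = Array.prototype.concat(...arr); }
  return arr;
};

[1, [2, [3, 4]], 5].flatten(); // → [1, 2, 3, 4, 5]
```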

getCursorYmlPath() {
  (...prev...)
  const pathMatches = this.getYmlPathMatches(contextString, this.get('yamlObject')).flatten();
}


getYmlPathMatches(contextString, yaml, currentPath) {
  if (yaml && yaml !== null) {
    currentPath = currentPath || "root";

    return Object.keys(yaml).map((key) => {
      if (typeof yaml[key] === "object" && yaml[key] !== null) {
        if (contextString.includes(key)) {
          return [`${currentPath}.${key}`].concat(this.getYmlPathMatches(contextString, yaml[key], `${currentPath}.${key}`));
        }
        else {
          return this.getYmlPathMatches(contextString, yaml[key], `${currentPath}.${key}`);
        }
      }
      else {
        // Key is not of numeric type (so we are not inside an array)
        if (isNaN(key)) {
          if (contextString.includes(key) || contextString.includes(yaml[key])) {
            return `${currentPath}.${key}`;
          }
          else return [];
        }
        else {
          if (contextString.includes(yaml[key])) {
            return `${currentPath}.${key}`;
          }
          else return [];
        }
      }
    });
  }
  else return [];
}

Using root as the root object path, the result is a list of object paths like this:

["root.services.identifier.links.0", "root.services.dispatcher"]

Lastly, I retrieve the index in the pathMatches array that corresponds to the match closest to the cursor's position.

getCursorYmlPath() {
  (...prev...)
  // Assume the matches are spread evenly: each covers one segment (tramo) of the text.
  const tramo = text.length / pathMatches.length;
  const probableIndex = Math.floor(cursorPosition / tramo);
  return pathMatches[probableIndex];
}
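The heuristic can be sketched in isolation (a hypothetical probableMatch helper, not part of the component); it assumes the matches are spread roughly evenly through the text:

```javascript
// Standalone sketch of the closest-match heuristic: split the text into as many
// equal segments as there are matches, and pick the segment the cursor falls in.
function probableMatch(pathMatches, textLength, cursorPosition) {
  const segment = textLength / pathMatches.length;
  const probableIndex = Math.floor(cursorPosition / segment);
  return pathMatches[probableIndex];
}

probableMatch(['root.services.identifier.links.0', 'root.services.dispatcher'],
              400, 350); // → 'root.services.dispatcher'
```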

There may be edge cases that I have not taken into account, but so far it is working nicely.

Have fun!

I have added Disqus comments to the blog

Hello! I moved my blog to Ghost, as the previous one was hosted on a WordPress free plan and linked from my personal website; I wanted more control over it and a more uniform user interface, given that my former site's design was a modified free Bootstrap & jQuery template and, frankly, it was ugly.

The Ghost theme I chose for this blog may not be the fanciest on the internet, but it is damn simple and does not draw attention away from the main point of the site, which is writing posts and sharing some information.

However, this solution didn't come with out-of-the-box comments functionality, and I really want users who want to engage with my content to have a way to do it other than pinging me on Twitter or sending me an email.

I chose Disqus because, after taking a look at the alternatives, it provides a stupidly simple way of integrating the widget: just register on Disqus' site, add a sprinkle of HTML and JavaScript, and you have it up & running.

Additionally, relying on an external service relieves me of the hassle of hosting comments myself, and brings multiple advantages for free: security, social buttons, sharing, and synchronization with comments made on other Disqus-powered comment boards.

I see two drawbacks though: on the one hand, it involves an additional HTTP request that the user's browser must make to load the comment board. On the other hand, Disqus does not allow anonymous comments; but perhaps that's a good thing, right? I don't really need XSS attempts and viagra advertisements all over my posts.

Have fun!

Big Data Integrator (BDI) Integrated Development Environment (IDE)

In the Big Data Europe framework, the Big Data Integrator is an application that can be thought of as a "starter kit" for working with and implementing big data pipelines in your process. It is the minimal standalone system with which you can create a project with multiple Docker containers, upload it & make it run using a nice GUI.

Architecture

You can think of the Big Data Integrator as a placeholder: it acts as a "skeleton" application where you can plug & play the different big data services from the Big Data Europe platform, and add and develop your own.

At its core it is a simple web application that renders each service's frontend inside it, so it is easy to navigate between the systems, providing a sense of continuity in your workflow.

The basic application to start from consists of several components:

  • Stack Builder: allows users to create a personalized docker-compose.yml file describing the services to be used in the working environment. It is equipped with hinting & search features to ease the discovery and selection of components.
  • Swarm UI: after the docker-compose.yml has been created in the Stack Builder, it can be uploaded to a GitHub repository; from the SwarmUI, users can clone the repository and launch the containers using Docker Swarm from a nice graphical user interface, where they can be started, stopped, restarted, scaled, etc.
  • HTTP Logger: provides logging of all the HTTP traffic generated by the containers and pushes it into an Elasticsearch instance, to be visualized with Kibana. It is important to note that the containers to be observed must always run with the logging=true label activated.
  • Workflow Builder: helps define a specific set of steps that have to be executed in sequence, as a "workflow". This adds functionality similar to Docker healthchecks, but more fine-grained. To allow the Workflow Builder to enforce a workflow for a given stack (docker-compose.yml), the mu-init-daemon-service needs to be added as part of the stack.

That service will be the "referee" that enforces the steps defined in the Workflow Builder. For more information, check its repository.

The systems are organized following a microservices architecture and run together using a docker-compose script, some of them sharing microservices common to all architectures, like the identifier, dispatcher, or resource. This is a more visual representation of the basic architecture:

[Figure: basic BDI architecture]

Installation & Usage

  • Clone the repository
  • For each one of the subsystems used (Stack Builder, HTTP Logger, etc.), check its repository's README, as there may be some small quirks to take into account before running each piece.
  • Run the edit-hosts.sh script. This assigns URLs to the different services in the integrator.
  • docker-compose up will run the services together.
  • Visit integrator-ui.big-data-europe.aksw.org to access the application's entry point.

How to add new services

  • Add the new service(s) to docker-compose.yml. It is important to expose the VIRTUAL_HOST & VIRTUAL_PORT environment variables for the frontend application of those services, so that they are accessible by the integrator (e.g.):
  new-service-frontend:
    image: bde2020/new-service-frontend:latest
    links:
      - csswrapper
      - identifier:backend
    expose:
      - "80"
    environment:
      VIRTUAL_HOST: "new-service.big-data-europe.aksw.org"
      VIRTUAL_PORT: "80"
  • Add an entry in /etc/hosts to point the URL to localhost (or wherever your service is running) (e.g.):
127.0.0.1 workflow-builder.big-data-europe.aksw.org
127.0.0.1 swarm-ui.big-data-europe.aksw.org
127.0.0.1 kibana.big-data-europe.aksw.org
(..)
127.0.0.1 new-service.big-data-europe.aksw.org
  • Modify the integrator-ui/user-interfaces file to add a link to the new service in the integrator UI.
{
  "data": [
    ...etc .. ,
    {
      "id": 1,
      "type": "user-interfaces",
      "attributes": {
        "label": "My new Service",
        "base-url": "http://new-service.big-data-europe.aksw.org/",
        "append-path": ""
      }
    }
  ]
}

Have fun with it!