Oh no, I forgot my PGP private key's passphrase

Introduction

After being quite disconnected from the technology world for months and pretty much limiting my interactions to my working schedule, I decided to go back into setting up my personal workstation with the minimal memory footprint and using as little privacy-invading technologies as possible.

One step of this process meant setting up again my GPG keys to be used while signing my emails. Thus, I retrieve my private key from an offline medium I keep safe and check whether the corresponding generated public key matches the one I have broadcasted publicly. (like here in https://estebansastre.com/about)

.ssh λ ssh-keygen -y -e -f secret-key.asc
Enter passphrase:

I try every single password combination I can think of and nothing. F*ck. So, what do I do now?

Steps to take

1. Identify where is my public key available.

When a user creates a new key pair, they can choose to publish the public key to what is called a key server, which is one or a group of computers on the internet in charge of making them available for other users to send encrypted messages.

By default, the GPG application uploads them to keys.gnupg.net. This doesn't mean that a key is in a single computer. Servers normally take part in pools and synchronize their databases, say for example the SKS or the PGP Global Directory

.ssh λ gpg --search-keys esteban.s.f0@gmail.com
gpg: searching for "esteban.s.f0@gmail.com" from hkp server keys.gnupg.net
(1)     Esteban <esteban.s.f0@gmail.com>
          4096 bit RSA key 1E117998, created: 2018-05-07, expires: 2023-05-06

Taking a look at the first page of google of available keyservers, I see my key has been replicated into pretty much all of them.

2. Create revocation certificate for the keys.

Since I forgot my private key's passphrase, it is safe to assume it is equivalent to it being compromised and must no longer be taken into account for sending messages with the public key or verify any signature. The next step to take is generating a revocation certificate and upload it to the keyserver, then until it synchronizes and invalidates the other ones:

.ssh λ gpg --output revocation-certificate.asc --gen-revoke 1E117998

sec  4096R/1E117998 2018-05-07 Esteban <esteban.s.f0@gmail.com>

(..etc..)

You need a passphrase to unlock the secret key for
user: "Esteban <esteban.s.f0@gmail.com>"
4096-bit RSA key, ID 1E117998, created 2018-05-07

Enter passphrase: 

F*ck, again. So, I need to know my passphrase in order to create the revocation certificate in order to revoke the private key I had initially forgotten the passphrase to. What are the options I have left?

3. Generate new key pair

.ssh λ gpg2 --full-gen-key

And this time do remember the passphrase!

4. Generate revocation certificate

.ssh λ gpg2 --output revocation-certificate.asc --gen-revoke 38DF1841

5. Export private key

.ssh λ gpg2 --armor --export-secret-keys 38DF1841 > myprivkey_38DF1841.priv

The --armor option allows the private key to be encoded in plain text and thus it can be sent over an email for example.

6. Save both private key & revocation certificate

Now I have created the new key pair, added to my keyring and successfully exported the private key along with the revocation certificate I need to store them somewhere safe and accessible offline.

7. Upload again my new public key to public keyserver

.ssh λ gpg2 --send-keys 38DF1841
gpg: sending key 38DF1841 to hkp://keys.gnupg.net

8. Change my public key my own website: estebansastre.com.


Conclusions

PGP is a very valuable tool to encrypt your communications, but it comes with the never ending hassle of key management. I have now four takeaways for the next time:

  • Every time you generate a new key pair, automatically generate the revocation certificate with it just in case.
  • Always keep your private key & revocation certificate in a safe place.
  • Don't forget your passphrase...
  • Use a password manager once and for all. I've heard great things about pass.

Have fun!

References

Making POST requests work with Django tests

It is the second time at work that I spent some minutes wondering why I was not properly receiving POST arguments in a view when testing it from django.

Let's have a little more context, imagine a very simple Django view where you want to print the value of a POST parameter in the console and return it.

class MyView(views.APIView):
    def post(self, request):
        param = request.POST.get('param', None)
        print(param)
        return param

So I manually test this with curl:

curl -X POST -d 'param=fiesta' 'https://my.local.url/myview/'

It works, so now I just want to write a simple test using django test framework:

def test_my_view(self):
	data = {
		'param': 'fiesta'
	}
	response = self.client.post(reverse('my-view), data)
	assertEqual(response, 'fiesta')	

However, the assert fails and the print(param) line always yields None, while when I was testing it with curl the parameter was always properly received. How come is that?

Explanation

Turns out when you send data in curl without specifying the Content-Type header, by default it sends the data in the application/x-www-form-urlencoded format. If you want to send json data in your request you have to set the -H "Content-Type: application/json" header properly.

This means that the view is accessing the POST object of data only received in the application/x-www-form-urlencoded format, but the django test client, while calling the post in the line:

response = self.client.post(reverse('my-view), data)

Is sending the data by default in json format, and thus the view is not able to access it.

Solution

There are two ways to overcome this:

  1. Changing the content_type parameter in all tests to send application/x-www-form-urlencoded data, like:
response = self.client.post(reverse('my-view), data, content_type='application/x-www-form-urlencoded')
  1. Accessing the raw data or the request instead of the form data of the POST object.
param = request.data.get('param', None)

Have fun!

Writing a simple Inverted Index in Python

Introduction

Nowadays is not uncommon that web applications include full text search features. There are already well known solutions working out-of-the-box that provide the needed functionalities, such as ElasticSearch or Apache Solr.

Having used ElasticSearch at work a couple of times I wondered how it achieved fast searches and what mechanism empowered that, so reading up a little on the topic, the Inverted Index appears as the cornerstone of full text search algorithms.

Thus, what better way to understand how something works than writing my own toy one?

What is an Inverted Index?

The Inverted Index is the data structure used to support full text search over a set of documents. It is constituted by a big table where there is one entry per word in all the documents processed, along with a list of the key pairs: document id, frequency of the term in the document.

How does it work?

Let's imagine we have two documents with different texts:

  1. The big sharks of Belgium drink beer.
  2. Belgium has great beer. They drink beer all the time.

In order to create the Inverted Index, each text is sliced into different units or terms. The rule is to use whitespace as the natural separator between words, although it can be changed.

Additionally, per each term there is a list of pairs (document id, occurrences), showing the document's ID where the term is found, and the number of times the term appears in the text.

Therefore, the Inverted Index after processing the previous two documents would be:

Term Appearances (DocId, Frequency)
The (1, 1)
big (1, 1)
sharks (1, 1)
of (1, 1)
Belgium (1, 1) (2, 1)
drink (1, 1)
beer (1, 1) (2, 2)
has (1, 1)
great (1, 1)
They (1, 1)
all (1, 1)
the (1, 1)
time (1, 1)

As seen, the term Belgium appears once in both documents, while the term beer appears once in the first and twice in the second one. Whenever a search is issued, the index will be looked up and the corresponding documents retrieved automatically.

This in turn makes processing the documents (indexing) and thus creating & updating the index a slow process, since each document needs to be parsed, sliced and analyzed. Conversely, once the index is created search becomes a really cheap operation since it only entails looking up an entry in a table.

As it happens with everything, this mechanism is not a silver bullet and it has it's quirks and drawbacks, being some of them:

  • Case: The words Belgium and belgium are indexed as two different terms in the table, while a user writing "belgium" in lowercase letters most likely would want to see all the occurrences regardless case.
  • Stopwords: keywords considered irrelevant and deprived on any additional sense, like prepositions, articles or conjunctions.
  • Stemming: Reducing derivations of a word to their common root (e.g: shark and sharks to shark).
  • Synonyms: Gathering terms that share a common meaning to a single one, in order to prevent an explosion in Inverted Indexe's size. Terms like car, automobile, plane and truck could be mapped into a single vehicle word in the index.

Example

The Inverted Index can be understood as a simple key/value dictionary where per each term we store a list of appearances of those terms in the documents and their frequency.

Thus, an Appearance class represents a single Appearance of a term in a document:

class Appearance:
    """
    Represents the appearance of a term in a given document, along with the
    frequency of appearances in the same one.
    """
    def __init__(self, docId, frequency):
        self.docId = docId
        self.frequency = frequency

        
    def __repr__(self):
        """
        String representation of the Appearance object
        """
        return str(self.__dict__)

The Database class is a fake in-memory DB used to persist the documents after they have been indexed.

class Database:
    """
    In memory database representing the already indexed documents.
    """
    def __init__(self):
        self.db = dict()


    def __repr__(self):
        """
        String representation of the Database object
        """
        return str(self.__dict__)

    
    def get(self, id):
        return self.db.get(id, None)

    
    def add(self, document):
        """
        Adds a document to the DB.
        """
        return self.db.update({document['id']: document})


    def remove(self, document):
        """
        Removes document from DB.
        """
        return self.db.pop(document['id'], None)

And finally, the InvertedIndex class.

class InvertedIndex:
    """
    Inverted Index class.
    """
    def __init__(self, db):
        self.index = dict()
        self.db = db


    def __repr__(self):
        """
        String representation of the Database object
        """
        return str(self.index)

        
    def index_document(self, document):
        """
        Process a given document, save it to the DB and update the index.
        """
        
        # Remove punctuation from the text.
        clean_text = re.sub(r'[^\w\s]','', document['text'])
        terms = clean_text.split(' ')
        appearances_dict = dict()

        # Dictionary with each term and the frequency it appears in the text.
        for term in terms:
            term_frequency = appearances_dict[term].frequency if term in appearances_dict else 0
            appearances_dict[term] = Appearance(document['id'], term_frequency + 1)
            
        # Update the inverted index
        update_dict = { key: [appearance]
                       if key not in self.index
                       else self.index[key] + [appearance]
                       for (key, appearance) in appearances_dict.items() }

        self.index.update(update_dict)

        # Add the document into the database
        self.db.add(document)

        return document

    
    def lookup_query(self, query):
        """
        Returns the dictionary of terms with their correspondent Appearances. 
        This is a very naive search since it will just split the terms and show
        the documents where they appear.
        """
        return { term: self.index[term] for term in query.split(' ') if term in self.index }

In order to test the execution of the index, I just create a couple documents and perform some searches.

def highlight_term(id, term, text):
    replaced_text = text.replace(term, "\033[1;32;40m {term} \033[0;0m".format(term=term))
    return "--- document {id}: {replaced}".format(id=id, replaced=replaced_text)


def main():
    db = Database()
    index = InvertedIndex(db)

    document1 = {
        'id': '1',
        'text': 'The big sharks of Belgium drink beer.'
    }

    document2 = {
        'id': '2',
        'text': 'Belgium has great beer. They drink beer all the time.'
    }

    index.index_document(document1)
    index.index_document(document2)
    
    
    search_term = raw_input("Enter term(s) to search: ")
    result = index.lookup_query(search_term)

    
    for term in result.keys():
        for appearance in result[term]:
            # Belgium: { docId: 1, frequency: 1}
            document = db.get(appearance.docId)
            print(highlight_term(appearance.docId, term, document['text']))
        print("-----------------------------")    
    
main()

Doing a couple searches we can see the result:

inverted1

and another one.

inverted2

I hope this served as a good introduction on how the Inverted Index works.

Have fun!

References

Sabbatical five months update

Wow time flies! Three months since the last review of my time spent during sabbatical have gone by. I thought it would be useful to take a whole afternoon to reflect on the steps I've taken and also think about the direction I want to take.

Things I've done & learnt so far

  • I got my A2 motorbike permit :D
  • I learnt to make a couple new recipes with toasts.
  • I finally threw in the beach a couple Chinese lanterns with a friend in our hometown holiday, we used to forget every year!
  • I completed the PADI Open Water Diver Course, it is the first begginer's certification in scuba diving. There are some others worldwide recognized as SSI. It was also the first time I did scuba diving and I loved it.
  • I tried Surf for the first time but it was not the most pleasant experience. I have been snowboarding for years and thought I could transfer some knowledge but they turn to be two quite different worlds.
  • I have done Yoga for the first time. I really enjoyed it!
  • I learnt how to say "Rock, Paper, Scissors" in Chinese and I am quite proud of it.
  • I learnt how Singapore went for a fisherman's village to the technological & economical hub it is today.
  • I learnt about the arrival of Islam to Malaysia.
  • I learnt about the history of Bali and it's colonization past with the Dutch.
  • I started a small programming project involving MindMaps, but don't really devote much time to it.

Things I would like to do

I think it is better to keep this to myself, there are so many things that I am afraid that if I share all them and don't do at least the majority, I come out as a failure. Better this way.

Conclusion

This months have given me quite a good perspective about life, myself and traveling. I've come to realize that success means different things to different people and that traveling as a means to achieve some kind of inner peace is often irrevelant, as that peace must come from whithin. I am definitely still working on that.

Have fun!

How to choose your next tech side project

Introduction

Besides the day-by-day work, I like to always have an ongoing side project to keep my skills sharp. These don't have to be necessarily useful or directly linked to my current job, but serve as a distraction to throw some spare hours into.

Normally those projects have a bunch of common traits:

  • They are purposely small.
  • They focus on a single defined task.
  • I try to add at least one element of novelty on them (language, framework, programming paradigm).
  • Although this not always happens, I need to see the point of working on it, otherwise boredom easily arises.

I remember during college I started writing an ELF file parser (much like the readelf binary) and realized later that it was way better to use memory mapping instead of opening the file with file descriptors.

Another example was a simply e-shop webapp using Ember.js & Laravel 5. The reason for choosing Laravel was because I was borrowing a friend's hosting so I was limited to a LAMP environment, and despite having a shell available an thus being able to install node.js or whatever I needed (without root, extra difficulty), the backend was simple enough to stick with PHP.

The point of giving these examples is to state that over the course of the years I've had better or worse ideas (many completely useless), but I've greatly enjoyed my invested time.

Then, the moment of complete despair comes when I run out of them, when I dismiss all those waiting on my Possible Projects backlog and can't find anything worth starting. If it is not useful enough, it is way too complex to invest so many hours; if it is interesting, it might have no direct application, and thus creating anxiety (e.g: ReasonML is very cool, but I should really learn Typescript properly..); if it is real-life "useful", it may be utterly boring (e.g: writing better unit tests in python/selenium).

It is not uncommon to spend a considerable amount of time thinking on which project to embark next. Since this is a recurring problem I have eventually decided to sketch out a mind map describing the process I go through when looking for something new to do.


project-choice

I have divided the potential Idea Triggers in six groups, colored in different shades of green. Once a choice is made, the Possible Actions are in different shades of orange, all but the "Choose Stack" choice, which I'll explain later.

Each Idea Trigger represents a specific area where you want to focus when it comes to find an idea for a project, and each Possible Action is the action you take accordingly. Notably, not all combinations make a lot of sense.

Idea Triggers

Useful for your current job

You want to invest your free time into skills that have a direct impact in the job you are currently pursuing, so the contribution pays off directly. The potential areas I tend to pay attention to are:

  • Identify stack & languages used at work: as a developer we use a cluster of technologies we are may not be very acquainted with (you can't know everything).

    Your backend might be an array of microservices running in Docker containers written in Ruby, Python and Go, each of them running in an Nginx server, using RabbitMQ for message passing, one of them using Redis for some caching and the frontend written in React.js & Typescript. It is unlikely even as a "Fullstack Developer" that you have well versed expertise in all these technologies, hence making each one of the bolded words a possible challenge to tackle.

  • Identify repetitive or annoying tasks that can be automated: in a developer's (or any IT job) daily tasks there's often plenty of room for automating tasks, providing you a good ground for identifying weak spots in the processes (e.g. inefficient build process) that you'd otherwise neglect, given this examples from my own experience:

    • npm link'ing multiple repositories at once in a single project or having nested npm link inside a single repo. You can write a tool for that!

    • Cloning the same docker barebone repository and having to change over and over the code here and there to customize it. You can have a docker template and extend it with custom code!

    • Doing some Excel columns manipulation manually, over and over again. You can write a script for that!

      And I am sure there are plenty of these examples you are already starting to realize..

  • Identify a process related to your job you don't quite yet understand: you might be a frontend developer who writes commits, pushes them and forgets what are the further steps until the feature is shipped to the final product. You know someone writes the Q&A automated testing, and that a thing called TravisCI perform automatic tests & building, but never scratched the time to understand it. This can be a good entry point to learn steps in the software shipping process you are never directly involved.

Useful for a prospective job

You might be looking for a new job that can be either related to your current one, or diverts a little, or tackles an entirely new domain you have never worked with. Despite computer science related jobs normally share a common trunk of knowledge, there are so many specializations that it would become impossible to become proficient in all of them.

However, it is not uncommon for a developer to slide from one to another. Let's take a few examples:

  • You are a frontend developer working in a web application, you realize there are many security holes in the code & architecture and you decide to fix them. You become increasingly interested in multiple ways attackers can take advantage of a web application and therefore you start to know it's correspondent security measures (XSS, SQL injection, RFI, LFI, etc.. vs. CSP, CORS, Character filtering, etc..). After some joyful playing you decide you find more enjoyment and purpose in web security and want to jump to a Security Consultant/Architect position.
  • You are a backend developer and your next task is to find out why the DB queries are so slow. You start to enjoy data modeling and database administration and start to research alternative databases: NoSQL, GraphQL, Triplestores, etc.. and decide you want to move into a DBA role.
  • You are a developer who is bored of waiting so much for the CD/CI cycle and decide to create your own set of scripts and try to tweak Travis, Jenkins or whatever your organization has running. You realize you love scripting and DevOps and want to jump into a role to do that.

The list can go on, those are just some examples that popped into mind to illustrate the leap that might come after curiosity brings us outside our regular role.

Hot Topic

There is a constant stream of new technologies, languages, frameworks, etc.. Most of them fade away, but good ideas tend to stick around and often they originate from old computer science papers that are rescued from their dusty bookshelves (functional programming?).

In your daily job you may just perform mundane development tasks, but some topics in hacker news or lobste.rs appear quite often and start to resonate with you. If they are around for that long, they might be worth giving a look. Topics like (my personal list):

  • Blockchain/Ethereum
  • IPFS
  • WebAssembly
  • Natural Language Processing
  • Machine Learning (applied to whatever)
  • Serverless

I personally find this Idea Trigger one of the hardest to follow, since I need to see an immediate practical application to them and also enjoy playing with. As an example I really like the idea of WebAssembly but I don't see real projects made with it yet and I am concerned with the potential security implications it may bring along (like Flash did).

Specific tech

Stricty related to the previous Hot Topic section, but instead of paying attention to technologies that appear often in the media and thus having a direct job finding correlation, the Specific Tech section encompasses the technologies you feel curious about and have been staying in the back of your head for a long time, despite not having necessarily a direct application, more like a general "usefulness" vibe.

Some items of my personal list are:

  • Lisp
  • ReasonML
  • Elm
  • Exploit development
  • Reactive programming
  • Malware analysis
  • Advanced Shell Scripting (sed, awk, etc)
  • Understand the OS booting process and how programs are loaded into memory

Again, although enjoyment is paramount, sticking to these projects is hard since I often don't see a direct return of investment in time & effort.

Solve a real problem

By far the easiest to stick to. If you are lucky enough to come accross a real problem you realize you could automate or build a software solution for it, you hit a gold mine. You have the motivation to craft something new with the added advantage of being useful for yourself or a group of people.

An example of this could be that you participate in a cultural center in your neighborhood, and you notice they are having some trouble managing the activities, letting people know about the offer and providing a way of proposing new ones. Hence, you roll up your sleeves and decide to create a web application to cover those use cases.

To get inspiration for people who solve real problems you can check Indie Hackers, they have plenty of nice projects to showcase.

Interesting Topic

Instead of looking for a given technology to learn, this method has the inverse approach. I make a list of my different areas of interest besides computers, for example:

  • Privacy
  • Freedom of Speech
  • Motorbikes
  • Learning new spoken languages
  • Different forms of education

And from that list I would look for a project that tackles a given topic or a combination of several.

Possible Actions

Per each of the Idea Triggers there are one or many possible ways to start building knowledge as you can see in the overview picture. Those are the Possible Actions to be taken.

Online Course

A simple and straightforward way of getting up and running with a topic is to follow an online course. This is good if you want filtered, curated and structured content, since other people have taken care of that for you. The downsides of this approach are that you still have to filter whether it is useful and worth following, and not all worthy courses are for free.

Normally you can either opt for the regular MOOCs offered by multiple universities and platforms online or access a more specific courses platform. Some platforms I've used in the past are:

MOOC

Other Platforms

Github Project

Github provides and excellent repository to hunt for ideas, especially if you don't want to start a project from scratch and would be better off by contributing to an already started one. By clicking in "explore" you can find the latest trending repositories in the Open Source community, as well as being able to filter them by the language, framework and topic you're interested in.

Let's say I am initially interested in security and I want to learn Python by applying it to that topic. By the time I am writing the article, I see there are some interesting projects that fall into this category I would be quite thrilled to contribute to:

  • wifiphisher: a security tool that allows to create a rogue access point and force clients to authenticate & phish credentials.
  • AutoSploit: enables automation of massively exploiting remote hosts through metasploit.
  • Maltrail: malicious traffic analysis system.

As seen, Github provides a great "open source supermarket" where you can search through a wide range of different options and hopefully find one you resonate with and are willing to contribute.

Related Jobs

If you are clear about the topic or technology you want to jump to, a great way of finding a pragmatical use for that knowledge would be to look for actual startups & companies that make use of them, or simply focus on the topic you are interested in (e.g: Farming, Privacy, E-Health, you name it).

There is a big offer of job portals out there, so I will just give a small example: imagine you are sure you want to learn the Haskell language, and you are very interested in making a career in health care. A great way of starting is to look in StackOverflow and in AngelList for jobs with haskell or functional programming keywords and if then start to fine tune the search.

Luckily there will be some interesting projects to look at, and might give you ideas for yours.

Code Challenges

Participating in code challenges, code jams and whatnot is a good exercise to get comfortable with the quirks of the language, but they normally don't represent real problems or situations you might encounter in a job. It can be cool to know how to implement a double linked list, or a breadth-first tree lookup, but most of those problems have been solved time ago and are now part of standar libraries.

This is not to undervalue their utility, they have proven to be incredibly good to keep my skills sharp in many times, but I'd rather do projects that try to solve a real problem, even though it is a well-known solution (say a host port-scanner).

There are plenty of them, but some of the code challenge websites I've used to practice are:

However there is a small one I've seen to give this pragmatical approach for code learning and I really like it: Hackattic.

Write a Blog Post

I use to do this often for small ideas I have a general sense of what they are about but never took the time to organize and understand them clearly, at least the basic principles. Good examples or this are undestanding requestAnimationFrame, understanding PGP or understanding routing in javascript.

This principle can be applied to anything you want to understand: if you're able to explain it means you get it.

Misc

There are many other ways to start a side project, such as reading books, but those have simply proven not to be useful for me at the beginning. When I take a book about a topic it has to be one I am fairly familiar with and towards which I already profess genuine interest. Alas, I lose interest quite fast.

Conclusion

I am sure I am leaving a lot of good information there, any remarks and constructive criticism is more than welcome in the comment section.

Have fun!