OCaml Packaging: Down the Rabbit Hole

It probably doesn't come as a surprise that I am a big fan of the OCaml language. I'm using it for most of my scientific research, as can be seen on GitHub. When I started out working with OCaml back in the old days, there was not much more than the OCaml compiler. This required you to write extensive and, as I like to call them, essentially write-only Makefiles like this one here.

Also, there was no good way of handling package dependencies, which at the time forced us to use git submodules of some sort without any real tooling support. That didn't make distribution, dependency management or compilation any easier.

So when Martin Lange and I decided to give PGSolver a major overhaul recently, I decided to see whether time had caught up with OCaml. Boy, was I in for an adventure.

First, I wanted to figure out compilation. Instead of writing these extensive Makefiles, there is a tool called OCamlbuild that can do most of the compilation for you, in particular finding all required source files and compiling and linking them in the right order. Now how do you install OCamlbuild? Looking at its GitHub page, it seems like you need another tool called OPAM in order to install OCamlbuild.

OPAM turns out to be a package manager for OCaml, so it is reasonable to install that.
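
For reference, the whole bootstrap boils down to something roughly like this. This is a sketch rather than the exact commands from back then, and the main.native target assumes your entry point is a file called main.ml:

# OPAM itself comes from your system's package manager or the installer script
opam init
opam install ocamlbuild
# ocamlbuild figures out module dependencies and build order on its own
ocamlbuild main.native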

Just using OCamlbuild seemed good enough for updating PGSolver, as I didn't want to change too many things in one go. Since package managers often want your git repositories to be somehow in sync with the package registry, I didn't wanna mess with the submodules just yet by turning them into packages.

However, there is one repository, Camldiets, that seems like the perfect test case for figuring out how to create and publish a package. It only has a single module and no dependencies of its own. Should be very straightforward to create a package for this, right?

Well, not so fast. There is also the so-called OASIS project that, according to its website, allows you to integrate a configure, build and install system into your OCaml projects. While this is vague enough that I'm not completely sure what it means exactly, it does sound awesome somehow, so what the heck.

So let's set up an OASIS project from scratch, starting out with our Camldiets repository. After installing OASIS (via OPAM, of course), we let it generate a bunch of boilerplate files by calling oasis quickstart and answering a few questions. When it asks which plugins you wanna enable, you have the choice of META, StdFiles and DevFiles. Well, no idea, probably none, so let's continue. It then asks whether, among other things, I wanna create a library or a source repository. Well, no idea either; as long as my package is properly compiled and linked into a depending project, I don't care. Let's try library for now.

This creates an _oasis file that essentially contains all the information you just supplied during quickstart. Next, you're supposed to call oasis setup, which generates a whole slew of files. Hard to say whether they should be gitignored or not, but I guess I'll worry about that later. Then, you're supposed to call ocaml setup.ml -configure, which seems to check all kinds of things. Moving on. Finally, we call ocaml setup.ml -build, which seems to use ocamlbuild to build our library. Whatever that means exactly.
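
To give you an idea, a stripped-down _oasis file for a one-module library looks roughly like this. The synopsis, license, author and path below are placeholders rather than the real Camldiets metadata, so adjust them to your project:

OASISFormat: 0.4
Name:        camldiets
Version:     0.2
Synopsis:    Placeholder one-line description
Authors:     Your Name
License:     BSD-3-clause

Library camldiets
  Path:       src
  Modules:    Camldiets
  BuildTools: ocamlbuild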

By now we have so many generated files in the repository that I’m starting to worry. Some of them must be ignored, I’m sure, particularly the _build folder that is being generated by ocamlbuild. I’m now including the “official” gitignore file for OCaml.

So what did I want to do again? Package everything up for OPAM, right. Does OPAM accept my OASIS-enriched repository? Unfortunately it doesn't; it requires its own set of proprietary metadata. Great. Luckily, there is another tool called oasis2opam that is supposed to convert your freshly created OASIS project into an OPAM-compatible package. So we install oasis2opam via OPAM and run it via oasis2opam --local. It immediately complains that our OASIS configuration leaves much to be desired, so we shamefully try to add some more key-value pairs to it and run it again.
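
The additional key-value pairs it wanted were roughly of this kind; I'm quoting the field names from memory, so double-check them against the OASIS manual:

Homepage:    https://github.com/tcsprojects/camldiets
Description: A longer, free-form description of what the library actually does.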

This results in a syntax error, and, as is common with OCaml tooling, the error message is not particularly useful: it just tells us that there is a syntax error somewhere in the OASIS file. Sigh. Once this is cleared up after some googling, the OPAM files are generated.

Let's see if this at least works locally now. According to the docs, we can do this by calling opam pin add camldiets . -n. Now we want to include the package somewhere else. We change into an empty folder and try to install it via opam install camldiets --verbose, but not so fast: Cannot find file 'src/META' for findlib library Camldiets
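
The local round trip, as a sketch (the directory names here are made up):

cd camldiets
opam pin add camldiets . -n
cd ../some-test-project
opam install camldiets --verbose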

What? But why? Apparently, there is another tool called findlib that needs its own metadata describing the library. Christ. Down the rabbit hole once again. Okay. So how do I now get the META file? After doing some research, it seems like adding META to the Plugins section of the _oasis file does the trick. We go through the chain of generating commands again, try to install the package one more time, and now it seems to work. We can even include it in a local test project. Success!
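
Concretely, that was one more line in the _oasis file (the plugin version numbers may differ for your OASIS version), after which the whole oasis setup / ocaml setup.ml chain from above has to be re-run:

Plugins: META (0.4), DevFiles (0.4)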

Now how do we get this package into OPAM? Obviously, we need another tool for this, opam-publish, which we install via OPAM. How deep can this hole be? Publishing packages via OPAM seems to require a couple of steps for sure.

First, tag the repository on GitHub. Then call opam-publish prepare https://github.com/tcsprojects/camldiets/archive/v0.2.tar.gz, which generates an awkward subfolder called Camldiets.0.2. You're then supposed to submit this folder by calling opam publish submit ./Camldiets.0.2, which in turn asks you for your full GitHub credentials. Yikes, that's giving me the chills! According to most sources I was able to find online, this call usually fails anyway. And sure enough, it failed. Would have been too easy, wouldn't it?
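
As a sketch, the intended flow looks like this (the tag name and archive URL are the ones used above):

git tag v0.2
git push --tags
opam-publish prepare https://github.com/tcsprojects/camldiets/archive/v0.2.tar.gz
opam publish submit ./Camldiets.0.2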

The manual way is to fork the full OPAM registry repository, add the generated folder to it and open a pull request. Jeez. Okay. Once this is done, a variety of continuous integration tests kick in and you're left waiting for them to finish and for somebody from OPAM to have a look at it. Fingers crossed that this can be pulled in as is. I'll update this post once there is news on the pull request.
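
Roughly, the manual route is something like the following; the fork URL is of course your own, and the packages/camldiets/camldiets.0.2 path follows the registry's name.version layout, so double-check the exact naming before opening the pull request:

git clone https://github.com/YOUR_USER/opam-repository
cp -r Camldiets.0.2 opam-repository/packages/camldiets/camldiets.0.2
cd opam-repository
git checkout -b camldiets-0.2
git add packages/camldiets
git commit -m "Add camldiets 0.2"
git push
# then open a pull request against the upstream OPAM repository on GitHub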

Publishing and packaging things with OCaml seems much less straightforward than with most build-and-packaging systems for other languages, but I suppose that's just the way it is. After all, this is definitely an improvement over the fugly Makefiles and submodules we were using a decade ago. Hopefully, updating packages will be less of a fuss than creating a new one from scratch.

Turning multi-factor-authentication into multi-people-authentication using a Slackbot

Once upon a time, I had a blog… oh, wait, I still have one.

I’m never really finding the time to write. Or, as Calvin put it so eloquently:

There is never enough time to do all the nothing you want.

So let’s get right to it.

Imagine you're working with an engineering team, and you need to distribute access for manipulating your deployment environment. For security reasons, you don't want to give full access to everybody who needs to be able to touch it.

But even limiting the access, e.g. by using IAM roles on AWS, doesn't fully do the trick: in order for your team to do anything meaningful, they do need reasonably strong permissions, and hence are able to do reasonable damage.

What if we could at least limit the time they have access to the system? What if they at least need to ask every time they wanna access the system?

Modern authentication systems usually do not support this, but they do support something very similar: two-factor authentication. The idea behind this is that, in order to sign into an account, you need two factors: usually "traditional" account details like username and password, plus a so-called one-time token that is generated by another device such as a smartphone.

Let's quickly look at how these one-time tokens are generated. In most cases, they are time-based one-time passwords. These are based on a unique secret string that is generated when the account is set up, usually presented as a QR code, which is then scanned and saved by, say, an application on a smartphone. The application then computes, given the secret string s and the current time t (truncated to intervals of typically 30 seconds), a deterministic function f(s,t) that usually returns a 6-digit number.

This 6-digit number then has to be provided alongside the traditional account details to sign in. Since the number is only valid for the current, short time interval (plus whatever clock drift the server tolerates), it cannot be reused later. Clever.
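
To make this concrete, here is a sketch of what generating such a token looks like from the command line. It assumes the oathtool utility from the oath-toolkit package, and the secret in the usage line is just a placeholder demo value, not a real one:

#!/bin/sh
# print the current time-based one-time password for a given base32 secret
# usage: ./totp.sh JBSWY3DPEHPK3PXP
oathtool --totp -b "$1"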

So how can we turn this multi-factor authentication into multi-people authentication? Well, instead of storing the secret string on the team member's device, we store it (symmetrically encrypted, if we want to) in a database. Now when a team member wants to sign in, they provide the traditional account details, and when asked for a one-time token, they query one of the administrators, who computes the time-based one-time password for them. Boom.

Now while this sounds cool in theory, how can we make sure that this is not an organizational pain in practice? By turning it into a Slack bot. Since your team is already on Slack, your team members simply ask the Slack bot to generate a one-time token for them, and the Slack bot in turn asks the administrators whether to grant access to the token generation function for that specific account.

Well, here it is, open-sourced on GitHub. Let me know what you think.

And apologies for being “offline” on this blog for so long.

Right time, right phase

There are different activities that we do throughout the day, every day, that need to get done. Some of them we enjoy more, some of them we enjoy less.

The ones we enjoy less can sometimes turn into a vicious cycle of procrastination and guilt, ultimately resulting in spending even more time in a rather unproductive and unsatisfying manner. We have all been there, and there is no way to avoid it.

One of the things that I enjoy the least is going through emails and answering them. The same goes for phone calls. That makes me sound like a sociopath, but that's not it. Emails, at some point, become a pain for everybody.

So how can we improve on our email-reply-to-procrastination ratio? I’m sure there are behavioural change systems that finally redirect your procrastination efforts to cleaning your kitchen in chapter 10, but I have found that for me, the ratio really depends on the time of the day. I am surprisingly efficient in going through my emails in the morning (where morning is defined as shortly after getting up) while I tend to suck at it the later it gets.

And this observation holds true for many other things as well. My takeaway: observe at what times of day you are your best self for the different things you need to do, and schedule your day around that.

Perfect Tools Part 2: Actionable Remembering on How to do Things

Have you ever googled for a quick tutorial on how to scp files from one computer to another? Or searched for the right commands to create and switch to a new git branch? What are the perfect tools to do that job? You might say, it depends on what the objective is – scp is fairly different from git.

And you are right. But there are two more things to it. First, you have to find out how it works in order to perform the task. Google is usually your friend here, and I assume that you know how to phrase successful search queries. So what's the second thing? Performing the task a second or a third time. And forgetting how you did it before. I'm sure that has happened to you countless times.

So what's the perfect tool for the job? It's actually your favourite text editor, and using it for this job is more a matter of habit than of expertise: whenever you find out how to perform a specific task, write it down. Not a long text, just as much as you need to understand as quickly as possible how to do it again. As a bonus, you might want to write a shell script instead that performs the task for you.

The way I'm doing it is twofold. I have one directory with text files describing how to do things. Here are two examples:

  •  Git Branching Workflow
git checkout -b NEW_BRANCH
...
git add .
git commit -a -m "Finished Work"
git checkout master
git merge NEW_BRANCH
git branch -d NEW_BRANCH
git push
  •  Configure Apache on Mac OS X
sudo apachectl start
cd /etc/apache2/other
sudo nano test.conf
-------------------
NameVirtualHost *:80
<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /YOUR_WEBSERVER_ROOT
    <Directory /YOUR_WEBSERVER_ROOT>
        Allow from all
        AllowOverride AuthConfig
        Options FollowSymLinks
    </Directory>
</VirtualHost>
---------------------
sudo apachectl -k restart

But even better than that is making such notes actionable by writing scripts that do the work for you. I have all my utility scripts in one directory that is included in the PATH variable of my terminal. In fact, all scripts sit in a Dropbox folder, so all my machines are always up to date. All scripts are prefixed with "cmd-", so I can easily find and execute them by simply typing "cmd-" in my terminal and then auto-completing the specific task by hitting tab.
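
For reference, the setup behind this is nothing more than a line like the following in your shell profile; the folder name is just an example, so adjust it to wherever your scripts actually live:

# in ~/.bash_profile or ~/.bashrc
export PATH="$HOME/Dropbox/scripts:$PATH"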

Here are a couple of examples:

  • Limit your downstream bandwidth (cmd-bandwidth-limit-downstream.sh)
#!/bin/sh
# $1 = bandwidth limit in Kbit/s, $2 = added delay in ms (ipfw needs root privileges)
ipfw add pipe 1 all from any to any in
ipfw pipe 1 config bw ${1}Kbit/s delay ${2}ms
  • Convert video to mp4 (cmd-ffmpeg-mp4.sh)
#!/bin/sh
# convert the given input file to mp4 (arguments quoted so paths with spaces work)
ffmpeg -i "$1" "$1.mp4"
  • Find file globally (cmd-find-globally.sh)
#!/bin/sh
# search the entire filesystem for files matching the given name
find / -name "$1"
  • Find file recursively (cmd-find-recursively.sh)
#!/bin/sh
# search all files below the current directory for the given text
grep -r "$1" .
  • Overwrite your mac address (cmd-mac-addr-set.sh)
#!/bin/sh
# overwrite the MAC address of the primary interface (en0) with the given one
sudo ifconfig en0 ether "$1"

I have about 30 text files and 50 scripts of this kind now. The additional time that you need to write up these little documents when encountering the task for the first time is nothing compared to the time that you’ll need to find out how it works again for the second or third time. Not to mention your frustration about feeling repetitive.

Perfect Tools Part 1: Selection and Consumption of Posts

If you are like me, then you are always searching for ways to improve the efficiency of your daily workflow. I feel like I'm already closing in on (at least) local optima in some areas, and since it took me some time to find and combine the right tools for the job, I might as well share them here.

I am an avid reader of many different blogs and news sites, but like most people, I only read a tiny fraction of all the articles that are posted each day. So the first task I have to do every day is filtering: deciding which posts are of interest to me. For me, the process of filtering is completely separate from reading the actual articles.

Since I don’t want to browse 20 different websites each day to find new posts, I use an aggregator that pulls in all articles and presents the new ones to me in a uniform fashion that makes it easy to go over the different posts. There are essentially two types of aggregators: intelligent and dumb ones.

Intelligent aggregators try to guess which sources and which posts are of interest to you and only present you with that selection. Dumb aggregators, in turn, show you the sources that you have selected and, within those sources, all the posts. While most people probably decide to use intelligent aggregators, I decided to use dumb ones. For one, I don't want to see a semi-random selection of sources, since I have hand-picked my own, and I always have the feeling that I might be missing out on interesting articles if a machine-learning algorithm is selecting them for me. In particular, I doubt that serendipitous discovery would survive an intelligent aggregator. (If you are looking for intelligent aggregators, I'd recommend checking out Prismatic and Flipboard.)

So which tool am I using for dumb aggregation? It’s called Feedly and allows you to select the sources you’re interested in and presents all new articles as lists. Perfect for filtering. You can also use it for reading, but that’s not what I’m doing.

My workflow starts by going to my Feedly subscriptions and scanning over them. If I'm interested in an item, I open it in a new tab and continue to scan. I will not start to read any of the posts until I'm finished scanning. (Hint: on the Mac, the shortcut for opening a link in a new tab without automatically switching to that tab is Cmd + Click.) I call this first pass the scanning phase.

After scanning all new articles, I close Feedly and go through the tabs. I decide which articles to read now, which to read later and which to discard. When I decide to discard an article, I immediately close the tab. When I decide to read an article now, I immediately read it. These are mostly posts whose relevance is short-lived and that need to be read on that day. The articles that are of interest to me but of no immediate timely relevance are marked to be read later (which I will describe next). More than 50% fall into that category. I call this second pass the triage phase.

How do I save articles for reading them later? I was using Instapaper for a long time but now switched to Pocket. Pocket installs a browser extension that allows you to mark posts with one click to read them later. That’s exactly what I do with interesting articles that are of no immediate timely relevance.

I read most of the Pocket articles on the go using the smartphone or tablet app, which allows you to read all your saved articles even when you're offline. So whenever I'm on the subway, in a cab, on a train, on a plane, in the gym etc., I read the articles from there. I call this third pass the reading phase.

When I find that an article is so interesting that I need to take action based on its content later (e.g. send it to a friend, check out links etc.), I "heart" the article. I check that "heart" category from time to time when I'm at my computer and go through the list. I call this fourth pass the action phase.

But there is still room for improvement. What do you do when you are literally on the go? Reading while walking slows down your walking, and since walking is your primary task, there is no point in that trade-off. Wouldn't it be great if you could listen to the articles you've saved? You can, and the text-to-speech is actually quite nice and auto-detects the right language for each article (in my case, that's English and German). The app that I'm using for this is called Lisgo, and it synchronises with all your saved Pocket articles. That's also the primary reason I've switched from Instapaper to Pocket: there was no such text-to-speech extension for Instapaper.

I am pretty happy with the combination of Feedly, Pocket and Lisgo right now, and don't see much room for improvement. How do you consume your daily news from the web? For me, it's broken up into the scanning, triage, reading and action phases. Which is, to some extent, pretty similar to how I treat my email inbox.

Why do people prefer to archive their life instead of living it?

When young-ish people do activities nowadays, like going to a party, to a concert or whatever, you’ll see many of them perceiving most of the event through the display of their phones, busy taking videos and pictures. It seems like it is more important to many folks to document what they’re doing than actually doing it.

Why would people rather record videos and photos of their activities than just enjoy the moment? If you ask them, most will claim that they want to be able to remember what happened years later, and that archiving it digitally is the best way to not lose track of what was going on. While that is undeniably true, I suspect that most people will never look at most of their old photos and videos ever again. The predominant behaviour, on the other hand, is to immediately post and share the videos and photos, most likely in order to improve the digital depiction of oneself. In other words, it might be more important to people to improve the way others perceive them digitally than to really live what they're pretending to be doing.

What is even more surprising than archiving real life instead of living it is that many people would rather talk (by writing messages) to other people on their phone than to the people they're actually socialising with. Sure, whenever I get an email or a text, I too feel slightly pressured to answer as quickly as possible. But only very few of my incoming messages actually have to be answered within an hour or so. And that won't be much different for other people, so that can't be the only reason.

Is it that we usually meet up with people that are less important than other folks we’re texting with? I don’t think that’s the reason either. So what is then? I think part of the answer might be the decreasing attention span of people. It might just be too hard for people to devote their attention for an extended period to the folks hanging out with them. The other reason might be social efficiency. Since we are already spending time with the people we’re hanging out with (even if we’re texting instead), we can socialise with other people on our phone at the same time. We can, in other words, satisfy the socialising needs of more people at the same time, even if we’re sacrificing the direct interaction with the people next to us. But since most of them are doing the same thing, it can’t be that harmful, right?

Do I do it? Well, certainly not the life-archiving part, but that's mostly because I don't really enjoy taking photos too much. How about communicating with other people on the phone while socialising in real life? I try not to. Call me old school, but I still find it a little insulting. However, I sometimes do it if I really feel that I have to answer certain texts in a timely manner. I also start doing it when the people I'm socialising with are doing it, basically to hold a mirror up to them; needless to say, no one ever gets my subtle behavioural critique.

BetaEdit

Victor Lingenthal and I have just launched a new toy project called BetaEdit that allows you to rapidly test web software online. It is based on our JavaScript framework BetaJS (for both client and server) and serves as a running example for it as well. While the current version of the software really is only a starting point, stay tuned for major updates to come soon. We'd love to hear your feedback at feedback@betaedit.com.

All new announcements will be made on our BetaEdit blog.

JavaScript Video Player

We just added a video player plugin for BetaJS that allows you to stream videos to many different devices. It is a meta player that currently includes JWPlayer 6 and JWPlayer 5. Depending on the given device and the given video sources, the player decides which player and which source match your device best. For instance, JWPlayer 6 does not work with older versions of iOS, etc.

It also includes a couple of useful player views that allow you to have embedded videos, embedded videos showing a video popup, and video popups by direct invocation.

People should not be afraid of their governments. But actually they couldn’t care less.

After all has been said and done over and over, it still took me a fair amount of time to realize what bothered me most about the NSA affair. Were we really surprised by what the NSA is doing? Not so much. Is Edward Snowden a hero? Probably, but hero is too much of a militaristic term for my taste. Should we be okay with being spied on? Of course not. Isn't the data's content much more important than the metadata that the NSA is tracking? Not at all. But those who have nothing to hide have nothing to fear? Quite the contrary.

The idea of spying massively on people is not new. Most classic dystopian stories like 1984 are centered around an omnipresent, all-seeing-eye kind of state. Even in our own recent history we had governments, like that of the German Democratic Republic in East Germany, that massively spied on their people, with severe consequences for large parts of the population. So not much of what we're seeing today is a genuinely new phenomenon.

The graphic novel V for Vendetta also draws the picture of a dystopian state, and the main character, who tries to liberate the oppressed people, delivers at some point one of the most iconic sentences about dystopian states: "People should not be afraid of their governments. Governments should be afraid of their people."

I always liked that sentence, and I still do. Now interestingly, while it should apply to the current situation, it doesn’t. It’s actually the other way around, although the government is spying on the people.

Edward Snowden himself closed his famous interview with the Guardian with the following words, speaking about what he fears will come out of all of this:

The greatest fear that I have regarding the outcome for America of these disclosures is that nothing will change. People will see in the media all of these disclosures. They’ll know the lengths that the government is going to grant themselves powers unilaterally to create greater control over American society and global society. But they won’t be willing to take the risks necessary to stand up and fight to change things to force their representatives to actually take a stand in their interests.

But people are not only unwilling to stand up; they, by and large, couldn't care less. To me, that's the most interesting thing about the whole affair. And even I feel fairly detached about the matter. But haven't I been idealistic once? Haven't I sworn to myself to be one of the first to stand up against any form of state-sanctioned oppression or state-sanctioned undermining of civil rights? And yet, here I am, shrugging my shoulders. Is this what getting old feels like? You give up on your ideals? Maybe. But even if so, sadly enough, this can only be half the truth.

Because I’m not the only one. There are millions of young people that must have had the same thoughts that I had at that age. But apparently they don’t care either. Is it the openness of the Internet that transformed our innermost perception of privacy and of the importance of privacy? Do we first need to see with our own eyes how our governments might use our own data against us?

I am not trying to be apologetic here. I am just surprised by how little people care, including me.