Preface

It is the most beautiful day of the year so far.

― Jozef Colpaert.

When I started studying engineering, I thought data management was in a far more advanced state. I could not imagine that, still in 2007, manual intervention was needed to find the right data source and to integrate it into your own application. Nor could I imagine that it would be illegal to create, in your spare time, a more mobile-friendly webpage for accessing the time schedules of the Belgian railway company. Nonetheless, the iRail project received a cease and desist letter claiming a breach of Intellectual Property Rights (IPR). This sparked my interest in data availability in general, as well as a personal interest in IPR, as I was desperate to understand whether creating such a website was indeed illegal.

iRail did not stop the project. Instead, after the story hit the media, a non-profit organization was set up in 2010 to foster creativity using mobility data, bringing together a community of enthusiasts, of which I was one. The organization released an Application Programming Interface (API) that allows third parties to integrate railway data within their own services. The API is still online as an open-source project, accepting contributions from other transport data enthusiasts. Thanks to the iRail project, I had first-hand access to highly interesting research data. The query logs of this API, for instance, would prove invaluable.

A visualization of the popularity of the route planning API of the iRail project today. Every time a line appears, someone looked up a journey through our data interface. Test this yourself by looking up a journey on iRail.be (the frame will refresh each minute).

In 2011, I wrote my master’s thesis on extending the Open Data publishing framework The DataTank, which I had started earlier that year, with a queryable interface over HTTP. The goal of the thesis was to offer a better experience to developers who wanted to use governmental datasets in their own software. While we did design a query language for small in-memory documents, we did not take into account the scalability of these server interfaces, nor did we study the effects on the information system as a whole. We only tested the overall query execution time, which increased as more datasets were combined. The querying interface on top of The DataTank was later removed from the stable release, yet the ambition to make Open Data more used and useful remained.

Challenged by officials to whom I wanted to prove that there was indeed commercial value in Open Data, I co-founded the start-up FlatTurtle with Yeri Tiete, the founder of iRail, and Christophe Petitjean, the eccentric business owner of rentalvalue who came up with the idea to sell information displays to professional real estate owners. These information displays would show the latest information about, for example, public transport, weather, news, or internal affairs. With FlatTurtle, we were unable to reuse datasets by relying on the basic building blocks of the Web: at the time, for example, servers did not use proper cache headers, legal conditions were unclear, and identifiers conflicted and changed across data updates.

The company did not make me financially rich, yet what I learned about running a business during this period was invaluable. Furthermore, it taught me that, while there are a lot of people, there is still a limit to the number of people you can meet in one day. When trying to change the world for the better, whether with a product you sell, a research proposal, or a general idea such as Open Data, your impact will only be as big as the quality of your pitch explaining the solution. With this in mind, during my PhD I tried my best each time I gave one of the many invited talks explaining Linked Open Transport Data and its importance. Having to explain this subject over and over again heavily influenced the first chapters of this book.

At the same time, the iRail non-profit merged with other initiatives, such as Open Street Map, Creative Commons, and Open Access, into Open Knowledge Belgium. I am still a member of the board of directors of that non-profit organization, trying to create a world where knowledge creates power for the many, not the few.

When starting my PhD in November 2012, I also thought the field of Web Engineering would be in a more advanced state. In the field of Web APIs, for example, new and vaguely defined paradigms still pop up today without clear comparisons of their advantages and disadvantages. This PhD makes a modest contribution to measuring data publishing interfaces for the purpose of public transit route planning. Instead of only measuring response times, I measured the impact of these interfaces on the information system as a whole, such as the cost-efficiency of a server interface and its cacheability for a certain mix of queries, and I described non-measurable benefits such as flexibility for developers and privacy by design.

Overall, I have been happy over the last four years. Being able to come home in the evening with the feeling that you are contributing to a better world is how I would describe my dream job. When confronted with this kind of life question late at night in a bar with friends, I would, like a geek does, explain the Kardashev scale with great pleasure and at great length. This three-level scale is a method of measuring a civilization’s level of technological advancement. Today, humanity is at level zero: it is not able to survive a natural disaster, and it is not independent from its host planet for its energy. Crucial to becoming a type 1 civilization, and it is uncertain whether humanity ever will, is an information system in which each individual can contribute to the civilization’s knowledge and use it to make informed decisions. In the coming years, such information systems will play a more crucial role than ever in, to name just a few, education, science, decision making, and politics.

Belgium in particular has been an interesting country for research into governmental organizations. It is a dense country in which, within a small geographic area, different governmental levels, from local to European, as well as companies of different scales, can be studied. This also makes a decentralized approach to data governance a necessity: the Belgian federal government’s datasets have to be interoperable with the datasets from departments and agencies of the Flemish government, as well as with the databases of all local governments. In our work with these organizations, we had an interdisciplinary team in which I worked together with people from, among others, MICT and SMIT. While it is impossible to mention everyone, I owe a big thank you to the two people with whom I have, without doubt, collaborated the most so far. Nils and Mathias, I have had a blast studying the road to Open Data together. Over the next few years, I look forward to making the region of Flanders a leading example in real-time Open Data publishing, and to growing our interdisciplinary team of Open Data researchers, as there is still a lot of work left to do.

Ruben, it is a true honor to be part of your team. You not only set the bar high for yourself and the team, you also know how to guide the team towards success and impact. I look forward to continuing to work on Knowledge on Web-Scale under your supervision as a postdoctoral researcher. Erik, thanks for always having my back.

To my colleagues, old and new, close and distant, and to the project partners and people I encountered in local, regional, federal, and European government organizations: thank you for being an infinite source of inspiration. You will undoubtedly recognize parts of this dissertation that were the result of a discussion we may have had or a problem you confronted me with.

Mom and dad, you often refer to me as the optimist (I optimistically call it realism). I am happy to be able to live with this trait. As the quote used to introduce this preface suggests, I am sure this trait is not merely caused by genetics, but also by the way I was raised. From my perspective, which is all I can speak for, all went well in this process: thank you!

Finally, Annelies, thank you for reminding me from time to time there is more to life than Open Data.


Pieter