Chapter 4
Raising Interoperability of Governmental Datasets

We want to see a world where data creates power for the many, not the few.

― Open Knowledge International.

Maintaining datasets decentrally is a challenging job that constantly involves weighing trade-offs: do you invest more in keeping the history of your dataset, or in creating specific materialized versions of your data? As researchers within our team at imec, we were able to study different organizations from the inside and examine the barriers to publishing data as Open Data. We discuss three clusters of datasets that we collaborated on to shape the Open Data policy. First, we introduce a proof of concept for Local Council decisions as Linked Data, a project that was further developed in 2017–2018. Then we discuss the datasets of the Department of Transport and Public Works – for whom we assessed the Open Data strategy – and their history. Finally, we discuss the governance of data portals and how we can discover datasets through their metadata. We found that raising the interoperability within these organizations is not easy, yet at certain moments we were able to suggest actions to improve on the state of the art. In each of the three use cases we formulate a list of next actions to take, supported by the organization. The datasets needed for the route planning use case often originate from sources that were not built with route planning in mind. Still, route planners should ideally take council decisions into account in order to update their maps.

As we study datasets governed within Europe, we need to understand the policy background. Three European directives – the Public Sector Information (psi) directive, the Infrastructure for Spatial Information in the European Community (inspire) directive, and the Intelligent Transport Systems (its) directive – regulate how respectively public sector information, geospatial information, and transport data need to be shared. These directives still leave room for implementation.

First, the psi directive states that each policy document created in the context of a public task should be opened up. Thanks to this directive, open became the default: instead of having to find a reason to make a document publicly available, a good reason now needs to be given to keep certain datasets private.

Next, the inspire directive regulates how to maintain and disseminate geospatial data. It defines the implementation rules to set up a Spatial Data Infrastructure (sdi) for the European Community. inspire takes a holistic approach, defining rules for, among others, metadata, querying, domain models, and network services. Although it originated from EU environmental policies and from policies or activities that may have an impact on the environment, the 34 themes in inspire can be used within other domains as well. Take for example the domain of public transport, where entities such as the “railway station of Schaerbeek” or administrative units such as “the city of Schaerbeek” also need to be described using spatial features.

Finally, the its directive focuses on the transport domain itself. In essence, it is not a data sharing directive, but it comes with delegated acts that describe how data must be shared in an information system.

In this chapter, we chronologically run through three periods. The first period was when the psi directive gained traction and the first Open Data Portals were set up. I had just started my research position and was trying to create a framework to study Open Data, its goals, and how to structure the priorities when implementing such a data portal. Next, we discuss transport datasets within the Flemish Department of Transport and Public Works (dtpw), where the overlap between the three directives becomes apparent. Finally, we discuss an opportunity to publish the content of local council decisions as the authoritative source of a variety of facts.

Data portals: making datasets discoverable

The DataTank

The first project I helped shape at imec was The DataTank [1]. It is an open-source project that, in essence, helps to automate data publishing and brings datasets closer to the http protocol and the Web. It contains tools to republish unstructured data in common Web formats. Furthermore, it automatically gets the best out of the http protocol, adding:

  • An access-control-allow-origin: * header, which allows this resource to be used within a script on a different domain.
  • A caching header, which allows clients to know whether the resource was updated or not.
  • Paging headers when the resource would be too large.
  • Content negotiation for different serializations.
In order to allow a quick overview of the data in the resource, the software also provides a best-effort html visualization of the data.
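As an illustrative sketch of what such an http-friendly publishing layer does, the following Python snippet (using Flask; the dataset path and payload are hypothetical placeholders, not The DataTank's actual code) adds the CORS, caching, and content-negotiation behavior described above to a single resource:

```python
# A minimal sketch of DataTank-style HTTP publishing, assuming Flask is installed.
# The dataset name and contents are hypothetical placeholders.
import json
from flask import Flask, Response, request

app = Flask(__name__)
DATASET = [{"id": 1, "name": "example record"}]  # stand-in for a republished dataset

@app.route("/dataset/example")
def dataset():
    # Content negotiation: pick a serialization based on the Accept header.
    if request.accept_mimetypes.best_match(["application/json", "text/html"]) == "text/html":
        body, mime = "<table>...</table>", "text/html"  # best-effort HTML view
    else:
        body, mime = json.dumps(DATASET), "application/json"
    resp = Response(body, mimetype=mime)
    # CORS: allow scripts on other domains to reuse this resource.
    resp.headers["Access-Control-Allow-Origin"] = "*"
    # Caching: let clients revalidate instead of re-downloading.
    resp.headers["Cache-Control"] = "public, max-age=60"
    resp.add_etag()  # Flask computes an ETag over the response body
    return resp.make_conditional(request)  # answers 304 Not Modified when the ETag matches

if __name__ == "__main__":
    app.run()
```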

When creating The DataTank, we built it around the 5 stars of Linked Open Data [2]. For each star, our goal was to provide an http interface that would not overload the back-end systems. Data source interoperability is lifted on the legal level by including a uri to a specific license in the metadata. On the technical level, it publishes a dataset over http and adds headers to each response. Syntactically, it provides each dataset in a set of common serializations. Semantically, The DataTank can read, but does not require, data in rdf serializations as well. Using the tdt/triples package, one can automatically look up the documentation of a certain entity using http uri dereferencing [3]. This requires the dataset itself to use uris and an rdf serialization.
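From a reuser's perspective, uri dereferencing boils down to an http request with the right Accept header. A minimal sketch in Python (the uri is a made-up example):

```python
# Dereferencing an HTTP URI and printing what the server documents about it.
# Assumes the requests and rdflib packages; the URI is a hypothetical example.
import requests
from rdflib import Graph, URIRef

uri = "http://example.org/id/stations/schaerbeek"  # hypothetical entity URI
resp = requests.get(uri, headers={"Accept": "text/turtle"})
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")
# Print every statement the server publishes about this entity.
for predicate, obj in g.predicate_objects(URIRef(uri)):
    print(predicate, obj)
```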

5 stars of Open Data Portals

When making the requirements analysis for the future roadmap of The DataTank back in 2013, we published the 5 stars of data portals [4]. The idealistic goal was that a data portal should allow the data to be maintained as if it were a common good. The five stars of Open Data Portals are to be interpreted cumulatively and were defined as follows:

★ A dataset registry
A list of links towards datasets that are openly licensed
★★ A metadata provider
Make sure the authentic sources inside your organization are adding the right metadata fields (e.g., according to dcat). This list of datasets should in its turn be licensed openly, so that other portals can aggregate or cherry-pick datasets.
★★★ A cocreation platform
Support a conversation about your data.
★★★★ A data provider
Make sure the resources inside your organization are given a unique identifier and that you have interoperable access to the datasets themselves, not only their metadata.
★★★★★ A common datahub
Have a way for third parties to contribute to your dataset.
The DataTank’s functionality focuses on the second star, and development of the further stars was never pursued. Data projects such as Wikidata, OpenCorporates, Wikipedia, or OpenStreetMap come closest to what was envisioned with the fifth star.

Open Data Portal Europe and the interoperability of transport data

The European Data Portal brings together all data that is available on national Open Data Portals in Europe. The portal is available at https://www.europeandataportal.eu/. An analysis of the current 11,003 datasets provides an overview of data interoperability in Europe.

On the legal level, we notice that the majority of datasets use standard licenses, which increases legal interoperability. However, the Belgian public transport datasets are still not on the European data portal, as they do not have an open license. Technically, while 85.5% of the datasets are directly accessible through a url, 1,584 (14.5%) dataset descriptions only link to an html page with instructions on how to access the dataset. On the semantic level, only 295 (2.7%) datasets use an rdf serialization.

Queryable metadata

Data describing other datasets, typically maintained by a data portal, needs to be available for reuse as well. We can thus apply the theory of maximizing the reuse of datasets to dcat data itself. ckan is, at the time of writing, the most popular data portal software. The standard license within The DataTank and within ckan is the Creative Commons Zero waiver, which provides full legal interoperability with any other dataset. The data is technically available over http and typically comes in one or more rdf serializations. However, we still notice some interoperability issues with data portal datasets today.

One of the problems occurs – despite the use of rdf – on the semantic level. For instance, when harvesting the metadata from the European member states’ data portals into the European data portal, new uris are created for each dataset instead of reusing the existing ones.

[Figure: the Linked Data Fragments (ldf) axis applied to metadata, ranging from a dcat data dump (high availability, high bandwidth) to a sparql (query language) endpoint (low bandwidth, high server cost).]

One of the goals behind dcat is to allow metadatasets with similar semantics to be maintained decentrally. However, the querying itself still happens centrally. We proposed an in-between solution with tpf [5]. This way, a European data portal would no longer have to replicate all datasets, but would only be the authoritative registry of national data portals. Thanks to the tpf interface and extra building blocks such as a full-text search exposed on the level of each member state, the same user interface could still be created. Furthermore, the harvesting that takes place today would be automated using http caching infrastructure.
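To sketch what this would look like for a reuser, the snippet below asks a hypothetical tpf interface of a national portal for all dcat:Dataset resources. It assumes the common query template with subject, predicate, and object parameters that tpf servers typically expose; the endpoint url is a placeholder.

```python
# Querying a (hypothetical) Triple Pattern Fragments interface for DCAT datasets.
# Assumes the requests and rdflib packages; the endpoint URL is a placeholder.
import requests
from rdflib import RDF, Graph, URIRef

DCAT_DATASET = URIRef("http://www.w3.org/ns/dcat#Dataset")
fragment_url = "http://data.example.be/fragments"  # hypothetical national TPF endpoint

resp = requests.get(
    fragment_url,
    params={"predicate": str(RDF.type), "object": str(DCAT_DATASET)},
    headers={"Accept": "text/turtle"},
)
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")
for dataset in g.subjects(RDF.type, DCAT_DATASET):
    print(dataset)
# A full client would also follow the hypermedia controls (e.g., the hydra:next
# link) in the response to retrieve the remaining pages of the fragment.
```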

Open Data in Flanders

In 2015, we had the opportunity to study a common vision throughout all suborganizations within the Flemish Department of Transport and Public Works (dtpw). We studied the background of several datasets and, without taking a position ourselves, interviewed all data maintainers and department directors on their experience with “Open and Big Data”. On the basis of this input, we created recommendations for action that would help forward the Open Data policy within the dtpw.

In this section, we first describe the background of different key datasets in Flanders, to then discuss the specific datasets at the dtpw. We then report on the workshops and the challenges that came out of the interviews and explain our recommendations for action.

A tumultuous background

In Flanders, the Road Sign Database (rsd) project started in 2008, following an incident the year before in which a truck got jammed under a bridge because the driver was unaware the truck was too high. The investigation after the incident showed there were no traffic signs notifying drivers of a maximum vehicle height, and the traffic in the city was jammed for several hours. As this was not the first such incident, the rsd was created by ministerial decree, with the aim of building a complete central database of traffic signs in Flanders. However, the rsd has since been the subject of data management research [6], as the database did not live up to its expectations.

The responsibility to implement this database was split: the Agency for Roads and Traffic (art) would maintain the traffic signs on the regional roads, while the Flemish Department of Transport and Public Works (dtpw) would maintain the traffic signs on the local roads. A company took 360° pictures of all roads in Flanders, and using these pictures, all road signs were indexed in a register. Two years later, by August 2010, this inventory was complete for all 308 municipalities, and the dtpw made the rsd of the municipalities available via a software application. In order to keep the database up to date, the municipalities were asked to update it through this application and requested to sign an agreement. Some municipalities refused to sign the agreement, and others, who did sign, complained that this initial application was too complex and untrustworthy. Soon, fewer than 30% of the municipalities kept using the rsd [6].

In 2011, for its own needs, art launched a roads database instead of the rsd, which now includes specific information on the roads. In the same year, Google Street View was launched in Flanders as well, offering an application to view the 360° pictures Google took of all streets, which local councils found easier to use than the rsd application. In March 2013, a new, more stable and user-friendly application of the rsd was launched, yet the adoption of the application remained low. The original cost of the rsd was estimated at €5 million; by August 2010, this estimate had risen to €15 million, and by March 2013, when the new, more user-friendly version launched, to €20 million. In April 2014, yet another database was launched at the Agency for Geographic Information: the General Information Platform for the Public Domain, which collects all road works so as not to block roads that are currently part of a detour. To date, local governments are legally bound to fill out numerous databases of the higher government, yet in practice only few databases remain well maintained.

In August 2014, a new minister of mobility was appointed. While at first the new minister looked into making it legally binding to fill out the database, he later considered limiting the scope of the rsd to speed signs – defeating its initial purpose, yet making sure third parties would show the right speed limit in in-car navigation systems. He also commissioned a study into what other mobility datasets could be shared as Open Data for the purpose of serving route planner developers, which in its turn funded the research reported in this chapter.

This aligns well with the European “Once Only” principle. One way to off-load local governments, currently under investigation, is to make local council decisions available as Linked Data. As part of the psi directive, local administrations need to open up the decisions taken in the local council. The maintainers of the numerous databases local governments have to fill out today would this way be able to fetch the fragments of the local council decisions they are interested in themselves. Today, this approach is being evaluated; perhaps it can be rolled out over all 308 municipalities. We will discuss this project in more detail later in this chapter.

In order to lower the cost of adoption for third parties that integrate the data that will be published, we look into raising the interoperability of these data sources. In the next section, we describe the method used to find the challenges the dtpw still sees to be overcome before all agencies can publish their data for maximized reuse.

Discussing datasets at the Department of Transport and Public Works

Out of 27 interviews with data owners and directors working in the policy domain of mobility, we collected a list of datasets potentially useful for multimodal route planning. The definition of a dataset, however, diverged depending on who was asked. Data maintainers often mentioned a dataset in the context of an internal database, used to fulfil an internal task, or used to store and share data with another team. For directors, a dataset would be a publicly communicated dataset, e.g., a dataset for which metadata can be found publicly, a dataset that would be discussed in politics, or a dataset the press would write stories about. In other cases, a dataset would exist informally as a web page, or as a small file on a civil servant’s hard drive.

A data register of the mobility datasets that are part of the Open Data strategy can now be found at http://opendata.mow.vlaanderen.be/. The list consists of publicly communicated datasets as well as informal data sources published on websites. During the interviews, we were able to gather specific challenges related to specific datasets useful for multimodal route planning, summarized in the following table: for a dataset to be truly interoperable, all boxes need to be ticked.

Selection of studied datasets with their interoperability levels as of October 2016
Dataset                   | legal          | tech     | syntax | semantic | querying
Traffic Events            | open license   | yes      | xml    | no uris  | file
Roads database            | open license   | yes      | xml    | no uris  | no
Validated statistics      | open license   | yes      | csv    | no uris  | no
Information websites      | no             | yes      | html   | no uris  | linked documents
Public Transit timetables | closed license | over FTP | ZIP    | no uris  | dump
Road Signs                | no license     | yes      | none   | no uris  | no
Address database          | open license   | yes      | xml    | PoC      | dump and service
Truck Parkings            | open license   | yes      | xml    | no uris  | file
Metadata catalogue        | open license   | yes      | xml    | yes      | dump

Traffic events on the Flemish highways

This dataset is maintained by the Flemish Traffic Center, has an open license, and is publicly available. It describes traffic events only on the highways, to which the core tasks of the traffic center are limited. The dataset can be downloaded in xml. For the semantics in this xml, two versions in two different specifications (otap and datex2) are available, for which the semantics can be looked up manually. The elements described in the files are however not given global identifiers, making it impossible to refer to a similar object in a different dataset. The dataset is small and is published as a dynamic data dump. As the dataset is small enough to be contained in one file, both the file and its updates can be fetched over http regularly. The http protocol works well for dynamic files, as caching headers can be configured so that the server is not overloaded when many requests happen in a short time. Except for the semantic interoperability, the file thus also serves as a good source for federated route planning queries.
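For instance, a client polling such a dynamic dump can rely on http revalidation, so that an unchanged file is never transferred twice. A minimal sketch, assuming the server emits an ETag header (the url is a placeholder):

```python
# Polling a dynamic data dump with HTTP conditional requests.
# Assumes the requests package; the URL is a hypothetical placeholder.
import time

import requests

url = "http://example.vlaanderen.be/traffic-events.xml"  # hypothetical dump URL
etag = None

while True:
    headers = {"If-None-Match": etag} if etag else {}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 304:
        print("Not modified; the cached copy is still valid.")
    else:
        etag = resp.headers.get("ETag")
        print(f"Fetched {len(resp.content)} bytes of fresh traffic events.")
        # ... parse the XML and update the local replica here ...
    time.sleep(30)  # poll every 30 seconds
```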

Road database for regional roads

The road database for the regional roads is maintained by art. It is a geospatial dataset and already has to comply with the inspire directive. Its geospatial layers are thus already available as web services on the geospatial access point of Flanders: http://geopunt.be. The roadmap in 2016 was to also add an open license and to publish the data as linked files using the tn-its project’s specification (http://tn-its.eu/).

Validated statistics of traffic congestion on the Flemish highways

Today, validated statistics of traffic congestion on the Flemish highways are published under the Flemish Open Data License by the Flemish Traffic Center. A website was developed that allows anyone interested to create charts of the data, as well as to export the selected statistics as xls or csv. The legal, technical, and syntactic interoperability are thus fully resolved. Yet when looking at the semantic interoperability, no global identifiers are used within the dataset. Furthermore, when looking at the querying interoperability, machines are even discouraged from using the files, as a test for whether you are a human (a captcha) prevents machines from discovering and downloading the data automatically. When a csv file is requested, the server generates it on the fly from the database of historic data.

Information Websites

Examples of such datasets are a real-time dataset of whether a bicycle elevator and tunnel is operating (http://fietsersliften.wegenenverkeer.be/), a real-time dataset that shows when a bridge north of the city of Ghent will open again when closed (http://www.zelzatebrug.be/), and a dataset of quality labels of car parks next to highways (http://kwaliteitsparkings.be). The three examples mentioned can be accessed in html. Nevertheless, these too are valuable resources for end-user applications: if the pages were openly licensed and the data annotated with web addresses, the data could be extracted and replicated with standard tools, and questions could be answered over these different data sources. These three examples are only technologically and syntactically interoperable, as they use html to publish the data, yet there are no references to the meaning of the words and terms used. Furthermore, there is no open license on these websites, so reuse of this data is not explicitly allowed. Finally, as the data can easily be crawled by user agents and thus replicated, we reason that, in a limited way, the data could be used in a federated query.

Public transit time tables maintained by De Lijn

Planned timetables, as well as access to a real-time web service, can be requested through a one-on-one contract. This contract results in an overly complex legal interoperability. First, a human needs to request access to the data, which can be denied. Furthermore, the standard contract does not allow a third party to sublicense the data, which makes republishing the data, or a derived product, impossible. The planned timetables can be retrieved in the gtfs specification, which is an open specification, making the dataset syntactically interoperable. The identifiers used within this dataset for, e.g., stops, trips, or routes do not have a persistency strategy; the semantic interoperability can therefore not be guaranteed. As a dump is provided, potential reusers have reliable access to the entire data source. The querying interoperability could be higher if the dataset were split into smaller fragments.
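To make the identifier problem concrete: a gtfs feed is a zip archive of csv files in which, e.g., a stop is identified by a feed-local stop_id rather than a global uri. A minimal sketch for inspecting those identifiers (the file name is a placeholder):

```python
# Listing stop identifiers from a GTFS feed.
# GTFS is a zip archive of CSV files; stop_id is only unique within the feed,
# so two feeds may reuse the same value for different stops, and without a
# persistency strategy the value may change between releases.
import csv
import io
import zipfile

with zipfile.ZipFile("delijn-gtfs.zip") as feed:  # hypothetical file name
    with feed.open("stops.txt") as stops_file:
        reader = csv.DictReader(io.TextIOWrapper(stops_file, encoding="utf-8-sig"))
        for row in reader:
            print(row["stop_id"], row["stop_name"])
```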

Road Sign Database (rsd)

In October 2016, the database is still only available through a restricted application. It is a publicly discussed dataset, as its creation was commissioned by a decree. The rsd is in reality two data stores: one database for the regional road signs, managed by art, and one that collects the local road signs, managed by the department itself. Some municipalities would however also keep a copy of their own road signs locally, leading to many interoperability problems when trying to synchronize. Sharing this data with third parties only happens through the publicly communicated rsd, which is only accessible through the application of the rsd itself.

Address database

A list of addresses is maintained by yet another agency, called Information Flanders. The database has to be updated by the local administrations, just like the rsd. Thanks to the simplicity of the user interface and the fact that updating the database is mandatory when changing, removing, or adding addresses, the database is well adopted by the local governments. It is licensed under an open license, and it is published on the Web in two ways: a data dump that is updated regularly, and a couple of web services that work on top of the latest dataset. Currently, Information Flanders is creating a Proof of Concept (PoC) to expose the database as Linked Data, in which every address will get a uri.

Truck parkings on the highways

This dataset needs to be shared with Europe, which in its turn makes it publicly available at the European Union’s data portal (http://data.europa.eu/euodp/en/data/dataset/etpa). The dataset is available publicly, under an open license, as xml, using the datex2 schema. The file however does not contain persistent identifiers, so semantic interoperability cannot be guaranteed. As with the traffic events, the file allows for querying by downloading it in its entirety.

Open Data portal’s metadata

In order for datasets to be found by, e.g., route planning user agents, they need to be discoverable. The metadata of all datasets in Flanders is available at http://opendata.vlaanderen.be in rdf/xml. The metadata is licensed under the Creative Commons Zero waiver, and a uri is available for each dataset and for each way it can be downloaded (distribution). To describe the datasets, the dcat vocabulary is used, a recommendation by the European Commission for describing data catalogues in an interoperable way. Within inspire, however, another metadata standard was specified for geospatial data sources. Geodcat-ap is, at the time of writing, being created to align inspire and dcat (https://joinup.ec.europa.eu/node/139283). The metadata catalogue is, thus far, the only dataset that complies in an early form with all the interoperability levels introduced in this chapter.
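Because this metadata is itself Linked Data, a user agent can discover datasets with a few lines of code. A sketch, assuming the portal exposes its dcat catalogue as an rdf/xml document at a known url (the exact path below is a placeholder):

```python
# Discovering datasets by querying a DCAT catalogue with rdflib.
# The catalogue URL is a placeholder; the real portal is at opendata.vlaanderen.be.
from rdflib import Graph, Namespace

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.parse("http://opendata.vlaanderen.be/catalog.rdf", format="xml")  # hypothetical path

query = """
SELECT ?dataset ?title ?download WHERE {
    ?dataset a dcat:Dataset ;
             dct:title ?title ;
             dcat:distribution ?dist .
    ?dist dcat:accessURL ?download .
}
"""
for row in g.query(query, initNs={"dcat": DCAT, "dct": DCT}):
    print(row.dataset, row.title, row.download)
```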

Challenges and workshops

We organised two workshops: one to validate the outcomes of the interviews with the different governmental organisations, the other to align the market needs with the governmental Open Data roadmap. In the first workshop, we welcomed a representative of each organisation within the dtpw that we had already met during a one-on-one interview. In the first half, we had an introductory programme in which we summarised the basics of an Open Data policy: the open definition, the implementation of the psi directive in Flanders, and the interoperability layer model. We also gave a short summary of the results of the interviews with the market stakeholders. The key challenges, initially identified by the heads of division of the dtpw, were then listed and discussed. In order to identify these challenges, all interviews were first analysed in search of arguments both for and against an open data policy. In the second half of this workshop, we had three parallel break-out sessions in which we discussed unresolved questions that came out of the interviews. The arguments that returned most often were bundled and summarised into ten key challenges:

Should data publishing be centralised or decentralised within the department and what process should be followed?

This challenge refers to how data should reach the market and the public. A variety of scenarios can be envisaged here, each with benefits and disadvantages. This is not only a very practical challenge, but also one that relates to responsibility, ownership, and the philosophy behind setting up an open data policy. Potential scenarios that may resolve this challenge are also dependent on political decisions and the general vision for data management at the policy level. The workshop showed that a lot of political and related organisational aspects come into play in relation to this challenge. Attention to the balance between what is strategically possible and technically desirable is key in tackling this challenge.

Ensuring reusers interpret the data correctly

This refers to the fact that the context in which data are generated within government needs to be very well understood by potential reusers. Certain types of data require a certain domain expertise to be interpreted in a correct manner. While good and sufficient metadata can partially answer this challenge, in very specific cases a meeting with related data managers from the opening organisation will be required to avoid misinterpretation.

Acquiring the right means and knowledge on how to publish open data within our organisation

Setting up an open data policy also requires internal knowledge, particularly in larger and complex organisations. This means that the right people need to be identified internally, giving them access to training, while also giving them responsibility and clearing them of other tasks. In other cases, there may be a need to attract external knowledge on the topic that can then be internalized. In any event, proper training (and for example a train-the-trainer programme) is key in developing and executing a successful open data strategy.

Knowing what reusers want

If the goal of opening up is to maximize reuse of data, it is important to understand what potential reusers are looking for, not only in terms of content but also in terms of required standards, channels and interactions. Various forms of interaction can be used to gain this insight and the most appropriate one will depend on the organisations involved and their goals. One-to-one meetings are preferably avoided to alleviate concerns of preferential treatment, but co-creation workshops, conferences, and study days, can be a potential solution to this challenge.

Influencing what reusers do with the data

This is a challenge that governments certainly struggle with: providing open data means giving up control over what happens with that data. The question captured here is how governments can guide, nudge, or steer the reuse of data so that the resulting applications, services, or products still support the policies defined by them. While illegal reuse of data is by definition out of the question, the main challenge here – from the government’s perspective – is how to deal with undesired reuse. Again, consultation and dialogue are key: if the market understands the logic behind certain datasets as well as the reasons behind the government opening them up, undesired reuse becomes less of a potential issue. If on the other hand the market has ideas that government had not anticipated, a dialogue can take place on the practical implications of that reuse.

Supporting evidence based policy-making

Open data does not only serve reuse outside of the opening organisation, but can also be put to use within different departments and divisions for example. The challenge defined here is how to make optimal use of data to shape policies, based on real-life evidence. To resolve this, an internal department or cell that follows up all data-related activities, acts as single point of contact and defines data policies could play a role in examining how data can contribute to policy-making.

Creating responsibility

The main question here is where the role of government stops and to which extent it should further enhance or improve datasets beyond its own purposes. Furthermore, a basic minimum quality should be defined and made explicit to potential reusers. Tackling this challenge means clearly defining a priori where the role of government ends. As this is also a political discussion to some extent, having a clearly communicated policy is key.

Raising government’s efficiency

This challenge deals with the potential gains that open data can bring for the Department as an organisation of organisations, but also for the Department as an actor within the policy domain of Transport and Public Works. Internal processes need to be established to ensure efficiency gains at the level of the organisation itself.

Ensuring sustainability once a dataset is published

Next to covering short-term initiatives, long-term processes also need to be set up within the organisation so that the open data policy is sustainable both for the government organisation and for the outside world. This means a smart design of such processes and guidelines, which are also constantly evaluated and tested against practice.

Ensuring the technical availability of datasets

This final challenge questions the basic guarantees that government should provide towards the publication and availability of the datasets. At which point does this become a service that does not necessarily need to be provided by government for free and what is the basic level of support (e.g., is a paid sla provided for 24/7 data availability and tech support)? Again this decision is a political one, but clearly communicating to stakeholders and the market what they can expect is most important. This means having an internal discussion to define these policies.

These challenges were discussed in smaller groups during the workshop in order to formulate solutions. By giving answers or providing “ways out” of these questions, the participants were challenged to think together and develop a solution that is supported by everyone in the organisation.

In the second workshop, we invited several market players reusing Flemish Open Data. As a keynote speaker, we invited CityMapper (http://citymapper.com), which outlined what data they need to create a world-wide multimodal route planner.

Recommendations for action

The three directives (psi, inspire, and its) were often regarded as the reference documents to be implemented. The best practices for psi, as put forward by the “Interoperability solutions for public administrations, businesses and citizens” (isa²) programme, focus on Linked Data standards for semantic interoperability. The inspire directive for geospatial data, however, puts forward a national access portal for geospatial data services, in which datasets are made available through services. A metadata effort, called geodcat-ap, brings the metadata from these two worlds together in one Linked Data specification. The its directive also puts forward its own specifications, such as netex (http://netex-cen.eu/), datex2 (http://www.datex2.eu/), and siri (http://www.siri.org.uk/). These specifications do not require persistent identifiers and do not make use of uris for the data model. We advised the department to first comply with the isa² best practices, as persistent, self-documenting identifiers are the only option today to raise the semantic interoperability on web scale. For datasets that already complied with the inspire or its directive, the department would also make these available as data dumps (e.g., as with the roads database).

The Flemish government has style guidelines for its websites. We advised implementing extra guidelines for the addition of structured data, e.g., with rdfa. Next, a conclusion from the first workshop was to invest in guidelines for the creation of databases, ensuring that each internally and externally communicated dataset is annotated with the right context.

In order to overcome the many organisational challenges, recommendations for action were formulated and accepted by the board of directors:

  • Keeping a private data catalogue for all datasets that are created (open and non-open);
  • All ICT policy documents need to have references to the Open Data principles outlined in the vision document;
  • The dtpw is responsible for following up on these next steps and will report to the board of directors;
  • Opening up datasets will be part of the roadmap of each sub-organisation within dtpw;
  • On fixed moments, there will be meetings with the Agency Information Flanders to discuss the Open Data policy.
Finally, specific recommendations were also given to data owners, as exemplified in the table above.

Local Decisions as Linked Open Data

Probably the most ambitious project I have been part of is the one on Local Decisions as Linked Open Data [7]. A core task of a local government is to make decisions and document them for the years to come. Local governments provide the decisions, or minutes, of these meetings to the Flemish Agency for Domestic Governance as unstructured data. These decisions are the authoritative source for – to name a few – the mandates and roles within the government, the mobility plan, street name changes, and local taxes.

Base registries are trusted authentic information sources controlled by an appointed public administration or an organization appointed by the government. The rsd and the address database – as discussed earlier – are good examples of such base registries. As these examples illustrate, maintaining a base registry comes with extra maintenance costs to create the dataset and keep it up to date. Could we circumvent these extra costs by relying on a decentral way of publishing the authoritative data?

In other countries, we see prototyping happening with the same ideas in mind. OpenRaadsInformatie publishes information from 5 local councils in the Netherlands as Open Data, as does the OParl project for local councils in Germany. Each of these projects uses its own style of json api. The data from the municipalities is collected through apis and by scraping websites, and is transformed to Linked Open Data. According to the Dutch project’s evaluation, the lack of metadata at the source has a direct impact on the cohesion between the different assets, because they cannot be interlinked. Next, the w3c Open Gov community group is discussing and preparing an rdf ontology to describe, among others, people, organizations, events, and proposals. Finally, in Flanders, the interoperability programme of the Flemish Government, “Open Standards for Linked Organizations”, also referred to as oslo², focuses on the semantic level and extends the isa Core Vocabularies to facilitate the integration of the Flemish base registries with one another and their implementation in the business processes of both the public and the private sector.

We interviewed local governments on how they register and publish Local Council Decisions. We then organized three workshops that formulated the input for a proof of concept: two workshops for creating a preliminary domain model, and one for creating wireframes of how Local Council Decisions would be created and searched through in an ideal scenario. The domain concepts were formalized into two Linked Data vocabularies, one for the metadata and one for describing public mandates, published at https://lblod.github.io/vocabulary. The proof of concept consists of four components:

  1. an editor for local decisions,
  2. an html page publishing service responsible for uri dereferencing,
  3. a crawler for local decisions, and
  4. two reuse examples on top of the harvested data.

We introduced a virtual local government called VlaVirGem, for which we can publish local decisions. The editor at lblod.github.io/editor is a proof of concept of such an editor, which reuses existing base registries. You can choose to fill out a template for decisions that occur often, such as the resignation of a local counselor or the installation of a new one. When filling out the necessary fields, the editor helps you: for example, it autocompletes people that are currently in office. You are then still able to edit the official document, which contains more information such as links to the legal background, context and motivation, and metadata. When you click the publish button, the decision is published as a plain html file on a file host. The uris are created as hash uris from the document’s url.
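The hash-uri convention means that every entity introduced in a decision is identified relative to the document that publishes it. A small sketch of how such uris could be minted (the document url and fragment names are made up for illustration):

```python
# Minting hash URIs for entities within a published decision document.
# The document URL and fragment names are hypothetical examples.
doc_url = "https://vlavirgem.example.org/decisions/2017-03-06-council.html"

def mint_uri(fragment: str) -> str:
    """Identify an entity relative to the document that publishes it."""
    return f"{doc_url}#{fragment}"

decision_uri = mint_uri("decision-1")        # the decision itself
mandate_uri = mint_uri("mandate-counselor")  # a mandate described in the decision

# Dereferencing either URI fetches the same HTML document; the fragment
# selects the annotated element that describes the entity.
print(decision_uri, mandate_uri)
```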

A harvester was then set up using The DataTank. By configuring a rich snippets harvester, html files are parsed and links are followed to discover the next document to be parsed. The extracted triples are republished both as raw data and as an overview of the mandates. This data is the start of two reuse demos at http://vlavirgem.pieter.pm: the first generates an automatic list of mandates, and the second shows a list of local decisions.
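The harvesting loop itself can be approximated in a few lines: fetch a page, extract the annotated triples, and queue the links it mentions. A sketch of that loop in Python, assuming the pages carry rdfa annotations (the seed url is hypothetical, and extruct is just one library that can extract rdfa from html; this is not The DataTank's actual harvester):

```python
# A minimal crawler for Linked Data published as annotated HTML pages.
# Assumes the requests and extruct packages; the seed URL is hypothetical.
import requests
import extruct

seed = "https://vlavirgem.example.org/decisions/index.html"
queue, seen = [seed], set()

while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    resp = requests.get(url)
    # Extract the RDFa-annotated statements embedded in the HTML page.
    data = extruct.extract(resp.text, base_url=url, syntaxes=["rdfa"])
    for statements in data["rdfa"]:
        print(statements)  # republish or store the extracted statements here
    # A real harvester would now add the links referenced by the annotations
    # to the queue, so the next documents get discovered and parsed in turn.
```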

Although Local Council Decisions contain high-quality information in the form of non-structured data, the authoritative source for local mandates today does not. In order to reduce the workload of sharing this information (e.g., a newly appointed counselor) with other governments or the private sector, the local decision can be published as a Linked Open Data document at the source.

Conclusion

In the previous chapter, I mentioned three ways to measure interoperability. The first method was to quantify interoperability by measuring similarity based on user input. This method remained conceptual and is still untested in a real-world scenario today. On the basis of the projects that received funding over the following years, I hypothesize that for government data today, more obvious big steps towards raising interoperability can be taken that do not require a quantified semantic interoperability approach.

The second way was to study the effects of a publishing strategy: the more reuse can be noticed in services and applications, the better the balance between the cost of adoption and the benefits, and thus the more interoperable a dataset is. The current reuse cases can be seen at the European data portal: https://www.europeandataportal.eu/en/using-data/use-cases. For the transport datasets published on the European data portal, only a limited amount of reuse can be found, and the high-impact datasets still have to be discovered. Also at the dtpw, our team interviewed reusers, which indicated that reuse of governmental datasets at this moment is limited [8].

The third way was to qualitatively study organizations on the basis of the interoperability layers. The first approach is through desk research: a quick scan through the European or Flemish data portal reveals that the overall interoperability of these datasets is low. A second approach is to study the datasets qualitatively by means of interviews. This is what our team did at the dtpw, which revealed a list of current issues and recommendations for action. This list of current issues could also be a useful means for comparing the results of such qualitative studies.

Finally, we elaborated on a proof of concept built for local council decisions as Linked Data. Council decisions annotated with Linked Data have the benefit of less manual work, and civil servants can search more easily through current legislation. Our team also noticed a potential quality gain in editing thanks to correct legal references (even references to decisions of the municipality itself) and the use of qualitative factual data (e.g., addresses linked to the Central Reference Address Database). Finally, there are also efficiency gains in publication: the decisions are automatically published on the website of the local government and in the codex, and are, without additional effort, suitable for reuse by third parties. Local Decisions as Linked Data is today a typical example of how the Flemish government can stimulate decentralized data governance, yet offer centralized tools to local governments that are not able to follow up on European data standards.

In the city of Ghent in Flanders, for one, it would take more than 2 months to update all route planning systems when changes happen. How long would it take for a local council decision to get into a route planning system? Still today, someone from a local, regional, or national government would need to contact route planning organizations and provide them with the right updates in the format they need. Automating this process of data reuse across the various interoperability levels is an engineering problem, but bringing a better data culture to a large organization is also a policy problem. Today, within the Flemish government, we notice a shift towards “assisted decentralization”, in which data systems are architecturally decentral, but the regional government provides services to the underlying governments.

In these three use cases, I started from the perspective of a specific data publisher and worked my way up in the organization to see what would be needed for a more interoperable dataset. In the next chapter, we dive deeper into the field of transport data specifically.

References

[1]
Colpaert, P., Dimou, A., Vander Sande, M., Breuer, J., Van Compernolle, M., Mannens, E., Mechant, P., Van de Walle, R. (2014). A three-level data publishing portal. In proceedings of the European Data Forum.
[2]
Colpaert, P., Vander Sande, M., Mannens, E., Van de Walle, R. (2011). Follow the stars. In proceedings of Open Government Data Camp 2011.
[3]
Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R. (2014). Painless uri dereferencing using The DataTank. European Semantic Web Conference (pp. 304–309).
[4]
Colpaert, P., Joye, S., Mechant, P., Mannens, E., Van de Walle, R. (2013). The 5 Stars Of Open Data Portals. In proceedings of 7th international conference on methodologies, technologies and tools enabling e-Government (pp. 61–67).
[5]
Heyvaert, P., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R. (2015, June). Merging and Enriching dcat Feeds to Improve Discoverability of Datasets. Proceedings of the 12th Extended Semantic Web Conference: Posters and Demos (pp. 67–71). Springer.
[6]
Van Cauter, L., Bannister, F., Crompvoets, J., Snoeck, M. (2016). When Innovation Stumbles: Applying Sauer’s Failure Model to the Flemish Road Sign Database Project. IGI Global.
[7]
Buyle, R., Colpaert, P., Van Compernolle, M., Mechant, P., Volders, V., Verborgh, R., Mannens, E. (2016, October). Local Council Decisions as Linked Data: a proof of concept. In proceedings of the 15th International Semantic Web Conference.
[8]
Walravens, N., Van Compernolle, M., Colpaert, P., Mechant, P., Ballon, P., Mannens, E. (2016). Open Government Data-based Business Models: a market consultation on the relationship with government in the case of mobility and route-planning applications. In proceedings of the 13th International Joint Conference on e-Business and Telecommunications (pp. 64–71).