Automated Generation of POI Maps from Open Data

de en

The website of the Chaos Computer Club features a map of Germany and its surrounding countries, where hackerspaces who consider themselves as part of the CCC are marked.

Up until now, this map (shown in Figure 1) had been created and updated manually in a time consuming process. Not only did this cause the map to be almost always outdated, but the process was also error-prone, with many hackerspaces being misplaced.

The old, manually created map of hackerspaces.
Figure 1: The old, manually created map of hackerspaces, last updated 2018.

The desire to automate the creation of this map had been voiced multiple times over the last few years, but all previous attempts have never been finished. Since I was one of the last persons asking for the map to be updated, I went ahead and started my own attempt at generating this map automatically.

My plan was to generate the map entirely from Open Data sources (apart from the actual hackerspace locations, where the "single source of truth" is a password-protected MediaWiki server). I also planned to generate the map background, rather than using a preexisting map, in order to avoid dealing with map projection issues.

Country Borders from Wikidata

In order to generate the map's background layer, the borders of Germany's federal states as well as surrounding countries were retrieved from Wikidata. Geographic regions in Wikidata are (or can be) equipped with the «geoshape» property, describing the outline of the region as a GeoJSON file.

For the federal states, the SPARQL query to retrieve these geoshape files is fairly simple:

SELECT DISTINCT ?item ?map WHERE {
  # ?item is instance of "Federal State of Germany"
  ?item wdt:P31 wd:Q1221156.
  # ?item has geoshape ?map
  ?item wdt:P3896 ?map.
}

Unfortunately, the same can't be said for the query to retrieve the geoshapes of all European countries, due to data being somewhat inconsistent:

  • When multiple properties of the same type are defined for a subject, they may carry different ranks. If one of these properties has a preferred rank, Wikidata will consider it «truthy» and not return any other properties of the same type.

    Some countries had multiple geoshape properties, differing in resolution («zoom level»), with the most accurate geoshape being considered preferred. However, in order to properly render a map, all countries borders needed to have the same resolution and come from the same data source, so that the borders of two neighbouring countries would coincide. To achieve this, the query needed to be crafted so that even non-preferred geoshapes were returned.

  • The type (preferred «instance of» property) of some countries was "instance of country", while others were considered "instance of sovereign state". Both requirements needed to be included in the query.

  • While some countries are labeled as "part of Europe", some others are labeled as e.g. "part of Central Europe", which would then again be "part of Europe". The "part of Europe" constraint thus needed to be declared transitively.

  • Some countries were labeled neither "part of Europe" nor "on continent Europe" at all. However, coincidentally, all the countries where these relationships were not defined, were labeled as "part of the European Economic Area".

Finally, taking all these inconsistencies into consideration, the query to retrieve the borders of all European countries looked like this:

SELECT DISTINCT ?item ?map WHERE {
  # ?item is instance of country or sovereign state (see filter below)
  ?item wdt:P31 ?stateclass.
  # ?item is transitively (+) part of Europe (Contintent) or EEA (see filter below)
  ?item wdt:P361+ ?euroclass.
  # ?item has geoshape ?map (including all non-deprecated results)
  ?item p:P3896 ?st.  # ?item has geoshape statement ?st
  ?st ps:P3896 ?map;  # ?st has geoshape value ?map
  MINUS { ?st wikibase:rank wikibase:DeprecatedRank }  # Exclude results where the ?st rank is "deprecated"
  # ?stateclass is "Country" or "Sovereign State"
  FILTER (?stateclass = wd:Q6256 || ?stateclass = wd:Q3624078).
  # ?euroclass is "Europe (Continent)" or "European Economic Area"
  FILTER (?euroclass = wd:Q46 || ?euroclass = wd:Q8932).
}

This query returned all the countries required for the map. It also included some of these countries' overseas territories, but those were then filtered at a later stage by comparing them to the chosen viewport of the generated map.

Hackerspace Names and Addresses from Semantic MediaWiki

There is a CCC-internal MediaWiki instance which makes heavy use of the Semantic MediaWiki (SMW) extension. Among lots of other things, every hackerspace which considers itself part of CCC is listed there, along with their properties (e.g. street addresses).

SMW makes these properties machine-readable and queryable through different interfaces. For example, queries can be formulated directly within MediaWiki pages to automatically generate and update tables or lists based on these properties. Another interface is a collection of HTTP API endpoints, making SMW queryable from the outside.

Among these API endpoints, I ended up using the Semantic Search JSON API to retrieve a list of all hackerspaces along with their addresses, display names (there can be multiple) and website URLs:

GET /index.php?title=Spezial:Semantische_Suche&x=[[Category:Erfa-Kreise]][[Chaostreff-Active::wahr]]/?Chaostreff-City/?Chaostreff-Nickname/?Chaostreff-Physical-Address/...&format=json HTTP/1.1

{
    "printrequests": [...],
    "results": {
        "Chaos Computer Club Basel": {
            "printouts": {
                "Chaostreff-City": [
                    "Basel"
                ],
                "Chaostreff-Physical-Address": [
                    "Birsfelderstrasse"
                ],
                "Chaostreff-Physical-Housenumber": [
                    "6"
                ],
                "Chaostreff-Physical-Postcode": [
                    "4132"
                ],
                "Chaostreff-Physical-City": [
                    "Muttenz"
                ],
                "Chaostreff-Country": [
                    "Schweiz"
                ],
                "Public-Web": [
                    "https://www.ccc-basel.ch/"
                ],
                "Chaostreff-Longname": [
                    "Chaos Computer Club Basel"
                ],
                "Chaostreff-Nickname": [
                    "CCC Basel"
                ],
                "Chaostreff-Realname": [
                    "Chaos Computer Club Basel"
                ]
            },
            "fulltext": "Chaos Computer Club Basel",
            "fullurl": "https://doku.ccc.de/Chaos_Computer_Club_Basel",
            "namespace": 0,
            "exists": "1",
            "displaytitle": ""
        },
        ...
    },
    "serializer": "SMW\\Serializers\\QueryResultSerializer",
    "version": 2,
    "rows": 16
}

Of course the request params would be URL-encoded, in this example they are unencoded for better readability.

Now that we have the full list of hackerspaces, there is one more step required to display them on a map: Only the street addresses are entered in this wiki. In order to display them on a map, the addresses need to be geocoded, i.e. translated into coordinates. One such geocoding service is Nominatim. It uses OpenStreetMap to look up addresses and return WGS84 coordinates.

However, not all the street addresses returned by SMW are resolvable: Some hackerspaces encode additional information (e.g. the building level) in unstandardized, non-machine-readable formats. Others temporarily don't have a street address at all, e.g. because they are moving between places. In order to handle these special cases, less accurate addresses are queried, if a more accurate address fails to resolve. For example, in order to resolve the location of CCC Basel, the following address formats would be considered:

  1. Birsfelderstrasse 6, 4132 Muttenz, Schweiz
  2. 4132 Muttenz, Schweiz
  3. Muttenz, Schweiz

Automatic Label Layouting

Finally, we have all the data to render the map. In a first step, both the country borders and the hackerspace locations are transformed from their polar 3D longitude/latitude representation to a 2D Cartesian projection suitable for display on a map.

However, the map should also display labels for some of the hackerspaces (so-called Erfas), and these labels should neither overlap with each other, nor should they obstruct other hackerspaces marked on the map. On the old map, these labels had been laid out manually to avoid overlap. For the new map, this too should be automated.

For each label to be placed, the text is rendered using the chosen font, and measured to compute its bounding box. From this bounding box, a discrete set of candidates for placement of this label is generated. Figure 2 depicts a set of bounding box candidates for CCC Berlin, placed around the marker of CCC Berlin.

Bounding boxes of all label position candidates around CCC Berlin.
Figure 2: Bounding boxes of all label position candidates around CCC Berlin.

Weights are assigned to each candidate based on multiple factors. The exact weights are magic numbers obtained through trial and error, but it mainly boils down to these factors:

  • Prefer candidates closer to their marker
  • Prefer candidates furthest away from other markers and (already placed) labels
  • Prefer candidates to the left and right over those at the top or bottom
  • Prefer candidates to the right over those to the left (unless close to the right-hand side of the map)
  • Penalize candidates overlapping with other elements with a huge weight

The actual label placement happens in three steps:

  1. If a label has at least one candidate that doesn't overlap with any other label candidate or marker, this candidate is chosen. Since this step eliminates the other candidates of the label, it may render candidates of other labels usable again, if they were were obstructed before.
  2. If there are still some unplaceable labels, they are placed where they cause the least overlap.
  3. All weights are recomputed and already placed labels are optimized based on these weights. This will reduce possible overlap and spread labels further apart, making the map better readable.

This algorithm may not be perfect, but it seems to find a locally-optimized solution which at least to the human eye does not look too bad.

Conclusion

After all the labels are laid out, an SVG image is generated and written to a file. A CSS stylesheet is embedded in the SVG file so it can be used standalone, either for directly displaying it in a web browser, or for further processing. The markers and labels in the SVG image are amended with clickable links, leading the user to each hackerspace's website.

Additionally, a PNG image is generated, alongside with a HTML imagemap file (with the same links as in the SVG) so that the map can be used interactively in web browsers with limited or no SVG support.

The new, fully automatically created map, as it is now also available on ccc.de.
Figure 3: The new, fully automatically created map, as it is now also available on ccc.de.

The new map (as seen in Figure 3) has now replaced the old one on ccc.de. The entire script to generate the map can be found in my Gitea repo. However, in order to actually use this script as-is, you need to know the Basic Auth credentials for the CCC MediaWiki instance. If you just want to check it out, you can use the example data in the repo, as described in the README.