
    OpenStreetmap and Linked Geodata

    October 14th, 2010

    I’ve been travelling overmuch for the last six weeks, but met lots of lovely people. Most recently, during a trip this week to discuss the Open Knowledge Foundation’s part in the LOD2 consortium project, I had a long chat with Jens and Claus, the developers and academics behind Linked Geo Data, the Linked Data version of the OpenStreetmap data.

    linked geodata browser

    The most interesting bit for Unlock is the RESTful interface to search the data; by point, radius, and bounding box, by feature class and by contents of labels assembled from tags. So it looks like Opensearch Geo as much as Unlock’s place search api does.

    Claus made up a mapping between tags and clusters of tags in OpenStreetmap, to a simple linkedgeodata.org ontology. Here’s the mapping file – warning, it is quite large – OSM->linkedgeodata mapping rules. Pointed him at Jochen Topf’s new work on OSM tag analysis and clustering, Taginfo.

    As well as the REST interface, there is a basic GeoSPARQL endpoint using Virtuoso as a Linked Data store – we ran containment queries for polygons returning polygons with reasonable performance. There is a fracturing in the GeoSPARQL world both in proposed standards and in actual implementation.

    So we want to be able to return links to LinkedGeodata.org URLs in the results of our search. Right now Unlock’s place search returns original source identifiers (from geonames, etc) as well as our local identifiers, for place-names and shapes. In fact Unlock could help with the mapping across of Linkedgeodata.org URLs to geonames URLs, which are quite widely used, an entry point into the bigger Linked Data web.

    Another very interesting tool for making links between things on the Linked Data web is SILK, by Chris Bizer, Anja Jentsch and their research group at the Freie Universität Berlin. The latest (or still testing?) release of SILK has some spatial inference capacity as well as structural inference. So we could try it out on, for example, the Chalice data just to see what kind of links can be made between URLs for linkedgeodata things and URLs for historic place-names.

    We’ve been setting up an instance of OpenStreetmap for Unlock and other purposes at EDINA recently. Our plan with this is to start working from Nominatim, which has a point-based gazetteer for place-names down to street address level, and attempt to extract and/or generalise shapes as well as points corresponding to the names. We’re doing this to provide more/richer data search, rather than republishing original datasets in some more/differently interpretable form. So there’s lots of common ground and I hope to find ways to work together in future to make sure we complement and don’t duplicate.

    Search and retrieve bounding boxes and shapes

    August 20th, 2010

    So we have a cool project running called Chalice, text-mining and locating historic placenames to build a historic gazetteer stretching back beyond Domesday, for a few areas of England and Wales. Claire Grover from LTG had some questions about using a shape-based rather than point-based gazetteer during “geographic information retrieval”; I thought it worth posting the answers here, as Unlock Places is able to do a lot more in public since the addition of Ordnance Survey Open Data.

    http://unlock.edina.ac.uk/features/Edinburgh – now by default returns info from OS Open Data sources including Boundary-Line as well as Meridian2 which have bounding boxes and detailed shapes for things like counties, parishes, though note they are all contemporary.

    (The above is just an alias for
    http://unlock.edina.ac.uk/ws/nameSearch?name=Edinburgh )
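
    For scripted use, both request forms can be put together with Python's standard library. This is only a sketch: the host, the /features/ alias, the /ws/nameSearch path and the name parameter come from the URLs on this blog, but the helper function names are made up here, and passing a format parameter to nameSearch is an assumption rather than something documented in this post.

```python
from urllib.parse import quote, urlencode

BASE = "http://unlock.edina.ac.uk"


def features_url(name, fmt=None):
    """Build the short /features/<name> alias URL."""
    # Keep commas unescaped so comma-separated names stay readable.
    url = f"{BASE}/features/{quote(name, safe=',')}"
    if fmt:
        url += "?" + urlencode({"format": fmt})
    return url


def name_search_url(name, fmt=None):
    """Build the longer /ws/nameSearch?name=... URL."""
    params = {"name": name}
    if fmt:
        # Assumption: nameSearch accepts the same format parameter.
        params["format"] = fmt
    return f"{BASE}/ws/nameSearch?" + urlencode(params)


print(features_url("Edinburgh"))
print(name_search_url("Edinburgh"))
```

    Nothing here touches the network; the functions only produce the URL strings, which can then be fetched with whatever HTTP client is to hand.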

    So that’s a way to get bounding boxes and shapes for places that are in geonames, by comparing with other sources. The default search results have bounding boxes attached; one must follow a link to see the detailed geometry.

    Here’s how then to filter the query for place-names to a specific bounding box:

    We have ‘search for names inside the shape which has this ID’ on our todo list but don’t yet have a pressing use case – for many things bounding boxes are enough, one even wants that bit of extra inclusion (e.g. Shropshire’s bounding box will contain a lot more than Shropshire, but as Shropshire’s boundary has changed over time, some approximation about the shape is actually helpful for historic geocoding).
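
    The over-inclusion that comes with bounding boxes is easy to see in a client-side sketch. This is not how Unlock implements its filter; the record layout (name, lon, lat), the (minx, miny, maxx, maxy) tuple order and the rough coordinates below are all assumptions made up for illustration.

```python
def in_bbox(lon, lat, bbox):
    """True if the point falls inside the (minx, miny, maxx, maxy) rectangle."""
    minx, miny, maxx, maxy = bbox
    return minx <= lon <= maxx and miny <= lat <= maxy


def filter_by_bbox(places, bbox):
    """Keep only the places whose point lies inside the rectangle."""
    return [p for p in places if in_bbox(p["lon"], p["lat"], bbox)]


# Very rough, hand-picked values; not taken from Unlock or OS data.
rough_shropshire_bbox = (-3.3, 52.3, -2.2, 53.0)
places = [
    {"name": "Shrewsbury", "lon": -2.75, "lat": 52.71},
    {"name": "Birmingham", "lon": -1.89, "lat": 52.48},
]

print([p["name"] for p in filter_by_bbox(places, rough_shropshire_bbox)])
```

    Anything inside the rectangle passes, including places just over the county boundary, which is exactly the bit of extra inclusion described above.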

    Note that all place-names for UK will have county containment information – we added this for Digimap – one day they may start using it!

    You may also be interested to play around with http://mapit.mysociety.org/ – it has all the same OS Open Data sources and mostly the same set of queries but in places does a little more – it doesn’t have geonames integrated, though.

    Lasma did some work on conflating different mentions of places based on point-polygon relationships (e.g. if a shape and a point have the same name, and the shape contains the point, the name is “the same thing”). However this was an experiment that is not really finished. For example –
    http://unlock.edina.ac.uk/ws/uniqueNameSearch?name=Edinburgh – I see this returns a shape in preference to a point – and wonder if it always will, if a shape is available. However this is not much use when you actively want a set of duplicate names, as you do while geoparsing. It would be good to revisit this, again, with concrete use cases. And of course it would be good to do this for much wider than the UK, with shapes extracted from OpenStreetmap. Investigating…
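
    The point-polygon rule (same name, and the shape contains the point) can be sketched in a few lines. This is an illustration of the idea only, not Lasma's implementation: the record layout is invented, shapes are single rings without holes, and containment uses a plain ray-casting test where a real system would use a spatial database.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: count how many ring edges a ray to the right crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def conflate(points, shapes):
    """Pair each point with any same-named shape that contains it."""
    matches = []
    for pt in points:
        for sh in shapes:
            same_name = pt["name"].lower() == sh["name"].lower()
            if same_name and point_in_polygon(pt["x"], pt["y"], sh["ring"]):
                matches.append((pt["id"], sh["id"]))
    return matches


# Toy data in arbitrary units, made up for the example.
shapes = [{"id": "s1", "name": "Edinburgh", "ring": [(0, 0), (4, 0), (4, 4), (0, 4)]}]
points = [
    {"id": "p1", "name": "Edinburgh", "x": 2, "y": 2},  # inside the shape
    {"id": "p2", "name": "Edinburgh", "x": 9, "y": 9},  # same name, elsewhere
]

print(conflate(points, shapes))
```

    Only p1 is paired with the shape; p2 shares the name but stays a separate record, which is the behaviour you want to keep when duplicates matter, as they do for geoparsing.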

    Your questions answered, @klokancz from Oldmapsonline.org

    July 30th, 2010

    Klokan Petr Pridal, the creator of the wonderful Old Maps Online and MapTiler, has been using Unlock Places in some collaborative project work with the National Library of Scotland. He had some technical questions for us, and some questions about the intended future usage of the service, so I thought it worthwhile republishing the answers here on the Unlock blog.

    First, I like a lot the API… It is well documented, with examples. Easy to use. [Thanks!]

    It is a bit confusing that you use “name” parameter instead of “q” (according the OpenSearch.org), but otherwise it is very nice. I was testing it with the Google Closure UI.AutoComplete, which is using JSONP and the callback function – it is similar to the jQuery module.

    Right, our use of the “name” parameter for query is a legacy thing – it comes from Unlock’s predecessor, GeoCrossWalk. There’s been a lot of development in OpenSearch Geo since then, it would be worth our while to support it. However, I see OpenSearch as mainly for collections of geo-referenced things (datasets or documents) – not for the georeferences themselves – though of course it could be used to do both.

    There’s also the quicklinks API which was a thought experiment. It looks a lot more like the new MapIt API, which we’re also thinking about implementing in front of Unlock.

    It is great that you have bbox for the results and external link for the detailed footprint. The API gives anybody access to your combined geonames database with other source of data like OSM or OS. Geometry or at least bounding box is something I horribly miss at GeoNames API – and you have solved this problem!

    Ordnance Survey have solved this problem for us with Open Data by releasing sources of shapes that can be used outwith academic publications! (We’ve always had this in the academic-use-only version of Unlock, formerly GeoCrossWalk). We’re now looking at adding OpenStreetmap data to derive the same kind of bounding box and optional detailed shape, for Europe rather than just mainland UK.

    In this moment we are especially interested in usage of your Gazetteer via the “nameAndFeatureSearch” for “populated places” database. I am considering to link EDINA Unlock API from our Georeferencer.org service, instead of GeoNames.org API, which was planned originally. We can’t use Google Maps GeoCoding API because of the TOS. I expected that if we use your service and save the coordinates from GeoNames in our database, it is legal, same as if we would use directly GeoNames.org.

    For geonames data, “This work is licensed under a Creative Commons Attribution 3.0 License”. We preserve the attribution in our search results. If you’re republishing the coordinates then you should ideally keep the source data and make the attribution too – that goes for all the different data source attributions we make.

    BTW Georeferencer.org is going to be used also on the National Library of Scotland maps later this year…

    I’m really looking forward to seeing this, and I’m hoping to see more use made of the NLS Maps API in projects here at EDINA.

    I have a couple of questions related to the API:

    – Is utf-8 input supported? I was not able to find records for “Nürnberg” or “Paříž” while query like “Nurnberg” or “Pariz” gives correct results. Is utf-8 encoded query passed automatically (urlencoded) to your service or are there any special parameters necessary?

    I passed this question on to Joe and Lasma; they went into a huddle, and a couple of hours later, Lasma sent this:

    Indeed, it was only doing ascii search. Joe just deployed a fix.
    Now you can do utf8 search.

    So utf8 search should now be behaving correctly as you would have expected it to. Thanks for pointing this out and helping us to improve the search service.
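
    For anyone building query strings by hand, this is what a UTF-8 percent-encoded name looks like using Python's standard library. It is a sketch of the client side only; the helper name is made up, and nothing here says anything about how the fix was made on the server.

```python
from urllib.parse import quote, unquote


def encode_name(name):
    """Percent-encode a place name as UTF-8 bytes for use in a URL."""
    return quote(name, safe="")


for name in ("Nürnberg", "Paříž"):
    encoded = encode_name(name)
    # Decoding gives back the original name, so nothing is lost on the way.
    assert unquote(encoded) == name
    print(name, "->", encoded)

# For contrast: the same name percent-encoded as Latin-1 yields different bytes,
# which a service expecting UTF-8 would not decode to the same name.
print(quote("Nürnberg", encoding="latin-1"))
```

    The encoded forms are N%C3%BCrnberg and Pa%C5%99%C3%AD%C5%BE; the Latin-1 variant comes out as N%FCrnberg.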

    – Is a combined query with country or administrative area possible? Something like “London, USA” or “Leith, Edinburgh”?

    Currently, if you do this sort of query – a comma-separated list of names – you see all Londons, and all USAs – as in this query: http://unlock.edina.ac.uk/features/London,USA?format=json

    The various Londons that are, in fact, in the USA, will be marked with a country element ‘United States’.

    But I think what you’re asking for isn’t this – you’d like the Unlock Places search to pick out the Londons-contained-by-USA and just return those. We could do this, but don’t expose this sort of query via the API. We could change the meaning of comma-separated lists of names to do this, but that might break other people’s worlds. So the best answer I can give you is, we’ll think about how best to implement it and look at the access logs to see if we can reasonably change the meaning of the current API function.
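
    In the meantime the narrowing can be done on the client, using the country element in each result. A sketch under loud assumptions: the flat record layout with "name" and "country" keys is invented for the example and is not the actual shape of Unlock's JSON, and the sample records are made up.

```python
def filter_by_country(results, name, country):
    """Keep results with the wanted name that are marked with the wanted country."""
    return [
        r for r in results
        if r.get("name") == name and r.get("country") == country
    ]


# Hypothetical, hand-written records standing in for parsed search results.
results = [
    {"id": 1, "name": "London", "country": "United Kingdom"},
    {"id": 2, "name": "London", "country": "United States"},
    {"id": 3, "name": "USA", "country": "United States"},
    {"id": 4, "name": "London", "country": "United States"},
]

londons_in_usa = filter_by_country(results, "London", "United States")
print([r["id"] for r in londons_in_usa])
```

    This costs an over-large response, since every London and every USA still comes back from the service, but it leaves the meaning of the comma-separated query untouched.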

    – What are the Terms and Conditions of the online service? Is it completely free for anybody or are there set already some limits on the number of requests, usage from website which are behind password, commercial web services, derived data, etc?

    So there are two versions of the Unlock Places gazetteer search service. One is completely open, built on various open data sources, and can be used by anyone for any purpose. We don’t have throttling or quotas on the API.
    If persistent or demented-looking requests ever become a problem, we’ll think about throttling requests from particular hosts. I like the approach that OpenStreetmap’s Nominatim search service takes here – to say, “if you’re planning really heavy traffic, please talk to us first, we can schedule it at a quiet time or you can install your own instance of Nominatim”.

    In the past I’ve fired off a million requests without any pause, to search through the 1881 census microdata placenames for UKDA, and this happily didn’t affect the performance of the service.

    The second gazetteer search service is limited to UK academic institutions that subscribe to the Digimap Ordnance Survey Collection, and the ways in which the data can be re-used are limited to academic services.
    The Archaeology Data Service, for example, uses Unlock Places in some of its services in this way. They don’t require a login, but they do have terms of use of their service, and don’t expose the Unlocked data directly.

    – Do you plan to release (make available for download) your Gazetteer database? If not, would you be willing to submit (later on?) at least the database with GeoNames.org IDs and the bboxes back the Mark Wick of GeoNames.org, so the great work you did is preserved also in the official free GeoNames database. You have much more to offer then bbox, but at least that would be excellent for the community.
    I feel that release of the database is important for sustainability…

    Right, everything in the open data side of Unlock is built from publicly available sources which are open licensed. One thing we could try is putting together a data package – using Open Knowledge Foundation’s datapkg project, for example – that would automate the process of rebuilding a database that looks like Unlock’s, from these different sources.

    – Are you going to support the service in the future?

    Unlock (Places, and Text) is a service supported by JISC, which manages technology funding for research and innovation in the UK. It’s hosted at the EDINA National Datacentre at the University of Edinburgh, which is also mostly supported directly by JISC.

    So EDINA has a service level agreement with JISC to maintain Unlock with maximum 10 hours of downtime in a year – I think we’re close to that.

    Our current agreement with JISC to support and develop the Unlock service at EDINA runs until July 2011. Its ongoing existence after that depends on whether we, and JISC, can convincingly make the case that Unlock is creating “impact and value” in academia and beyond (museums, libraries and archives nearest by).

    One of the best ways we can make the case is to get more feedback from people like you, Petr – what you like about the service, what you wish it did, what it’s offering to research that commercial or government services cannot reach. Some more thoughts about that are at the bottom of my last post discussing MySociety’s MapIt service.

    Thank you a lot for you online service!

    Thank you a lot for your long email, Petr, and I hope it helps encourage others to write.

    An appreciation of MySociety’s MapIt service

    July 27th, 2010

    Impressed by the new MySociety service for doing interesting things with Ordnance Survey OpenData – MapIt. The API is well thought out and quick and clean, the documentation fits onto one page, the backend is free software.

    I will confess to mild chagrin, because as well as having all these wonderful properties, MapIt does almost everything that Unlock Places does for Boundary-Line and Code-Point. Compare, contrast:

    A simple search for records about a place beginning with a name, returning the results in JSON:

    The detail of the shape describing that place, in GeoJSON (in both cases the ID to be looked up is taken from the JSON results of the previous request):

    MapIt does things that are still on our todo list – such as exposing ST_Touches geometry query over web-based API:

    Matthew Somerville, MapIt’s creator, writes that “MaPit is really just an extension of the service we have always run internally for our own purposes” – MySociety services like Fix My Street, Write To Them and the renowned They Work For You.

    It’s great to see a service that looks so much like Unlock emerge from the internal needs of an organisation with a track record of geospatially aware, simple useful web tools.

    However I pause to think, what are we providing with Unlock Places search through OS Open Data that MapIt isn’t doing at least as well?

    Well, we have a few more data sources, so a more comprehensive gazetteer search; MapIt is directed towards building applications around government data and assumes the client will probably know the “right” names or codes. We could implement a neat “Give me the official names and shapes for this more vernacular name” wrapper, perhaps.

    We have geonames mirrored in Unlock too – only point data, but global coverage – and are working on adding OpenStreetmap (probably just for Europe) to the cross-search. But I wonder, quite hard, how much we would gain from improving and adding to the MapIt codebase instead of persevering with our own gazetteer API code.

    A future focus for Unlock Places (from the New Year on) is adding historic place-names to the gazetteer, so we can do historic place-name text mining with Unlock Text – incorporating the data coming out of the CHALICE project – as this is a common request for researchers, and not something that’s currently being done commercially.

    The Unlock Text service remains a bit more novel. This does text mining across documents (plain text, HTML or XML metadata), extracts likely placenames and uses the gazetteer search to pick the most likely locations. The text miner looks for other entities too – personal and organisational names, references to dates – but we only expose the placename part over our web API.

    What else we’ve been up to lately

    July 22nd, 2010

    The Unlock blog has been quiet for a couple of months; since we added Ordnance Survey Open Data to the gazetteer search the team members have mostly been working on other things.

    Joe Vernon, our lead developer, has been working on the backend software for EDINA’s Addressing History project. This is a collaboration with the National Library of Scotland to create digitised and geocoded versions of historic post office directories. The sneak preview of the API is looking promising – though I agree with the commenter who suggests it should all be Linked Data!

    Lasma Sietinsone, our database engineer, has been working on new data backends for Geology Roam, the new service within Digimap. She’s now finally free to start work on our OpenStreetmap mirror and adding search of OpenStreetmap features to Unlock’s open data gazetteer.

    I’ve been putting together a new project which has just started – CHALICE, short for Connecting Historical Authorities with Links, Contexts and Entities. This is a collaboration with several partners – Language Technology Group, who do the text mining magic behind the Unlock Text service; the Centre for Data Digitisation and Analysis in Belfast; and the Centre for e-Research at KCL. The CHALICE project arose from discussions at the wrap-up workshop on “Embedding GeoCrossWalk” (as Unlock was once known). It will involve text mining to create a historic gazetteer for parts of the UK in Linked Data form.

    I also worked with Yin Chen on a survey of EDINA services with an eye to where use of Linked Data could be interesting and valuable; then took a long holiday.

    So we are overdue for another burst of effort on the Unlock services, and there should be lots more to write about here on the blog over the coming weeks and months.

    Unlock service status and plans

    July 20th, 2010

    Note, this post was originally written in May 2010

    This is an attempt to set out the status of the Unlock services and describe the future roadmap, written in response to much missed EDINA colleague David Medyckyj-Scott who’s now shaking up the New Zealand scene at Landcare Research.

    Right now Unlock provides a gazetteer search service and a placename text-mining service. Unlock Places searches across different sources of data for references to placenames, returning “footprints” where possible and points (e.g. latitude,longitude coordinates) where not.

    Unlock Text uses the gazetteer to locate places extracted from text documents and XML metadata using natural-language processing tools provided by our colleagues at the Language Technology Group, School of Informatics, University of Edinburgh.

    Now we have good open data sources of footprints from the Ordnance Survey. This helps to justify implementing those parts of the Unlock Places API that deal with searching using shapes. For example searching for a name or a type of place within a detailed shape, or drawing a buffer around a shape and searching within that (e.g. “Within five miles of the boundary of the City of Edinburgh”).

    Before too long we plan to add the ability to search through OpenStreetmap data and run an equivalent to OSM’s Nominatim service so researchers can do large volume batch geocoding.

    We plan to create an open source release of the gazetteer and geoparser, beginning summer 2010 if possible.

    Longer term plans:

    • Add more historic placename and footprint data. Leading bid to extract placename records from authoritative source for England and Wales.
    • Separate the *geotagger* from the *georesolver* and re-implement the latter using shapes rather than points where possible.
    • Add a temporal reference parsing service which works similarly to the geoparser but for temporal event references – “Unlock Time”.
    • Separate out the personal name parsing part of the geoparser into a distinct service

    Use cases:

    • Cross-search between different coordinate reference systems – for example, you know a postcode and want to search by latitude, longitude. The Archaeology Data Service uses Unlock Geocodes for this.
    • Use KML output from the gazetteer to project local area statistics onto Google Earth.
    • “GeoPrints” plugin to EPrints extracts locations from documents uploaded to institutional repository
    • Geotagging large reference collections of documents, such as the proceedings of the parliament of Northern Ireland

    Notes from Linking Geodata seminar at CeRch

    July 20th, 2010

    Note, this blog entry was originally published in May 2010.

    While on a bit of a road trip, I had the chance to give a short seminar at the Centre for e-Research at King’s College London. This was informal and I wasn’t expecting much of a showing, so there are no slides; here is a quick summary.

    Introduced by Dr Stuart Dunn, I talked about project ideas we had just been discussing – the attempt to mine the English Place Name Survey for its structure, now called CHALICE – mining archaeological site records and artefact descriptions and attaching them to entities in OpenStreetmap using LinkedGeodata.org – mining key reference terms from documents in archives, attempting to link documents to reference data.

    Linked Geodata seems like a good place to start: pick out a sample entry and walk through the triples. At this point there was a bit of jumping about and graph-drawing on the whiteboard.

    There’s a list of mappings between items in Linked GeoData and in dbpedia.org, and likely thus through to geonames.org and other rich sources of Linked Data. Cf. Linked Geodata Datasets. Via sameas.org geographic links can be traversed to arrive at related media objects, resources, events.
    geonames.org has its 8m+ points and seems to be widely used in the academic geographic information retrieval community, due to its global coverage and open license.

    The text mining process used in the Edinburgh geoparser and elsewhere is two-phase, the first is the extraction purely looking at the text, of entities which seem likely to be placenames; the second phase is looking those names up in a gazetteer, and using relations between them to guess which of the suggested locations is the most likely.
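
    The two phases can be shown with a deliberately tiny toy. This is not the Edinburgh geoparser: the gazetteer is hand-made, phase one is just a regular expression for capitalised words where the real thing uses natural-language processing, and the only "relation" used in phase two is a vote on which country the candidate places have in common.

```python
import re
from collections import Counter

# Hand-made toy gazetteer: name -> list of (identifier, country) candidates.
GAZETTEER = {
    "London": [("london-uk", "United Kingdom"), ("london-ca", "Canada")],
    "Leith": [("leith-uk", "United Kingdom")],
    "Edinburgh": [("edinburgh-uk", "United Kingdom")],
}


def extract_candidates(text):
    """Phase 1: look only at the text and pull out capitalised words."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def resolve(names, gazetteer):
    """Phase 2: look names up, then prefer candidates in the most-mentioned country."""
    candidates = {n: gazetteer[n] for n in names if n in gazetteer}
    votes = Counter(
        country for cands in candidates.values() for _, country in cands
    )
    return {
        name: max(cands, key=lambda c: votes[c[1]])[0]
        for name, cands in candidates.items()
    }


text = "The ship sailed from Leith to London."
names = extract_candidates(text)
print(names)
print(resolve(names, GAZETTEER))
```

    "The" is extracted in phase one but silently dropped in phase two because it is not in the gazetteer, and London resolves to the UK candidate because Leith pulls the vote that way.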

    Point data, cartographic in origin. Polygon geoparsing.
    Machine learning approaches to both phases.

    We looked at UK-postcodes.com and the great work @pezholio has done on the RDF representations of postcodes there, with links across to some of the statistical area namespaces from data.gov.uk – along with the work that Ordnance Survey Research have in hand, there’s lots of new Linked Open Geodata in the UK.

    Historic names and shapes, temporal linking, these are areas where more practical, and open research has yet to be done.

    Unlock Places API — version 2.2

    April 21st, 2010

    The Unlock Places API was recently upgraded to include Ordnance Survey’s Open data. This feature rich data from Code-Point Open, Boundary-Line and the 1:50,000 gazetteer includes placenames and locations (points, boxes and shapes) and is now open for all to use! You can just get started with the API.

    We’ve also added new functionality to the service, including an HTML view for features, more feature attributes, the ability to request results in different coordinate systems as well as the usual speed improvements and bug-fixes.

    The new data and features are available from Tuesday, 20th April 2010. Please visit the example queries page to try out some of the queries.

    We welcome any feedback on the new features – and if there’s anything you’d like to see in future versions of Unlock, please let us know. Alternatively, why not just get in touch to let us know how you’re using the service, we’d love to hear from you!

    Full details of the changes are listed below the fold.


    Linking Placename Authorities

    April 9th, 2010

    Putting together a proposal for JISC call 02/10 based on a suggestion from Paul Ell at CDDA in Belfast. Why post it here? I think there’s value in working on these things in a more public way, and I’d like to know who else would find the work useful.


    Generating a gazetteer of historic UK placenames, linked to documents and authority files in Linked Data form. Both working with existing placename authority files, and generating new authority files by extracting geographic names from text documents. Using the Edinburgh Geoparser to “georesolve” placenames and link them to widely-used geographic entities on the Linked Data web.


    GeoDigRef was a JISC project to extract references to people and places from several very large digitised collections, to make them easier to search. The Edinburgh Geoparser was adapted to extract place references from large collections.

    One roadblock in this and other projects has been the lack of open historic placename gazetteer for the UK.

    Placenames in authority files, and placenames text-mined from documents, can be turned into geographic links that connect items in collections with each other and with the Linked Data web; a historic gazetteer for the UK can be built as a byproduct.


    Firstly, working with placename authority files from existing collections, starting with the existing digitised volumes from the English Place Name Survey as a basis.

    Where place names are found, they can be linked to the corresponding Linked Data entity in geonames.org, the motherlode of place name links on the Linked Data web, using the georesolver component of the Edinburgh Geoparser.

    Secondly, using the geoparser to extract placename references from documents and using those placenames to seed an authority file, which can then be resolved in the same way.

    An open source web-based tool will help users link places to one another, remove false positives found by the geoparser, and publish the results as RDF using an open data license.

    Historic names will be imported back into the Unlock place search service.


    This will leave behind a toolset for others to use, as well as creating new reference data.

    Building on work done at the Open Knowledge Foundation to convert MARC/MADS bibliographic resources to RDF and add geographic links.

    Making re-use of existing digitised resources from CDDA to help make them discoverable, provide a path in to researchers.

    Geonames.org has some historic coverage, but it is hit and miss (E.g. “London” has “Londinium” as an alternate name, but at the contemporary location). The new OS OpenData sources are all contemporary.

    Once a placename is found in a text, it may not be found in a gazetteer. The more places correctly located, the higher the likelihood that other places mentioned in a document will also be correctly located. More historic coverage means better georeferencing for more archival collections.

    Work in progress with OS Open Data

    April 2nd, 2010

    The April 1st release of many Ordnance Survey datasets as open data is great news for us at Unlock. As hoped for, Boundary-Line (administrative boundaries), the 50K gazetteer of placenames and a modified version of Code-Point (postal locations) are now open data.

    Boundary Line of Edinburgh shown on Google earth. Contains Ordnance Survey data © Crown copyright and database right 2010

    We’ll be putting these datasets into the open access part of Unlock Places, our place search service, and opening up Unlock Geocodes based on Code-Point Open. However, this is going to take a week or two, because we’re also adding some new features to Unlock’s search and results.

    Currently, registered academic users are able to:

    • Grab shapes and bounding boxes in KML or GeoJSON – no need for GIS software, re-use in web applications
    • Search by bounding box and feature type as well as place name
    • See properties of shapes (area, perimeter, central point) useful for statistics visualisation

    And soon we’ll be publishing these new features currently in testing:

    • Relationships between places – cities, counties and regions containing found places – in the default results
    • Re-project points and shapes into different coordinate reference systems

    These have been added so we can finally plug the Unlock Places search into EDINA’s Digimap service.

    Having Boundary-Line shapes in our open data gazetteer will mean we can return bounding boxes or polygons through Unlock Text, which extracts placenames from documents and metadata. This will help to open up new research directions for our work with the Language Technology Group at Informatics in Edinburgh.

    There are some organisations we’d love to collaborate with (almost next door, the Map Library at the National Library of Scotland and the Royal Commission on Ancient and Historical Monuments of Scotland) but have been unable to, because Unlock and its predecessor GeoCrossWalk were limited by license to academic use only. I look forward to seeing all the things the OS Open Data release has now made possible.

    I’m also excited to see what re-use we and others could make of the Linked Data published by Ordnance Survey Research, and what their approach will be to connecting shapes to their administrative model.

    MasterMap, the highest-detail OS dataset, wasn’t included in the open release. Academic subscribers to the Digimap Ordnance Survey Collection get access to places extracted from MasterMap, and improvements to other datasets created using MasterMap, with an Unlock Places API key.