
    Exploring the Locator OS OpenData set

    January 21st, 2011

    Fiona Hemsley-Flint had a good look at the OS Locator dataset which is available from the Ordnance Survey Open Data portal. I thought a summary of her findings might be of use to others thinking about how to use this dataset.

    Overview

    OS Locator contains a list of all the road names in the UK, “derived from a number of Ordnance Survey datasets [Meridian2, Road database, Locality dataset, Boundary-Line]. These include the roads database which contains information on road names and road numbers and is the latest generation of Ordnance Survey’s sophisticated and highly detailed geographic data”. OS recommend viewing it on top of mid-scale datasets such as 1:10k and 1:25k Raster and OS Street View (which is freely available via OS OpenData).

    Geometries

    Each feature is geo-referenced by a centre point and a bounding box (although some of the bboxes are actually line features where the road segment of the feature is horizontal or vertical).
    OS Locator names shown on OS map
    Figure 1. Multiple occurrences of Ferry Road, differentiated by their locality.
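Code that assumes every bounding box encloses an area will trip over the flattened boxes mentioned above. A minimal Python sketch of a check for them follows; the `BBox` class, its field names and the coordinates are illustrative assumptions, not part of the OS Locator schema.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    # Hypothetical field names; the OS Locator columns may be named differently.
    minx: float
    miny: float
    maxx: float
    maxy: float

    def kind(self) -> str:
        """Classify the box as 'point', 'line' or 'polygon'."""
        flat_x = self.minx == self.maxx
        flat_y = self.miny == self.maxy
        if flat_x and flat_y:
            return "point"
        if flat_x or flat_y:
            # A perfectly horizontal or vertical road segment collapses to a line.
            return "line"
        return "polygon"


# Made-up coordinates for an east-west road segment.
horizontal_road = BBox(minx=325000, miny=673000, maxx=325400, maxy=673000)
print(horizontal_road.kind())
```

A loader could use a check like this to decide which geometry column a feature's box belongs in.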

    Attribution

    The roads have a name and/or a classification, where the classification represents a road number (e.g. ‘A1’ or ‘B1243’). They also have an associated settlement (town), locality, county/region and local authority. The latter two are derived from Boundary-Line; it is unclear what is used to form the ‘Locality dataset’. Locality and settlement are likely to be the most useful of these attributes when displaying result sets. For roads which cross locality boundaries, a point is assigned for each separate locality, so one road may have more than one point associated with it, distinguished by its locality.

    Storage

    851505 rows of data were added to a development server.
    Multiple geometry columns have been added to take into account the different geometries available.
    A ‘tsvector’ column has also been added to implement Postgres text search functionality. An example query might be:
    select name, classification, locality, settlement from os.locator_nov_10 where search @@ to_tsquery('high & street & edinburgh');

    Which returns the following result set:

    Name	Classification	Locality	Settlement
    CORSTORPHINE HIGH STREET		Se Corstorphine	EDINBURGH
    HIGH STREET		Musselburgh Central	EDINBURGH
    HIGH STREET		Musselburgh North	EDINBURGH
    HIGH STREET		Holyrood	EDINBURGH
    HIGH STREET	A199	Musselburgh North	EDINBURGH
    HIGH STREET	A199	Musselburgh Central	EDINBURGH
    NORTH HIGH STREET		Musselburgh North	EDINBURGH
    NORTH HIGH STREET	A199	Musselburgh West	EDINBURGH
    PORTOBELLO HIGH STREET	B6415	Milton	EDINBURGH
    PORTOBELLO HIGH STREET	B6415	Portobello	EDINBURGH
    NORTH HIGH STREET	A199	Musselburgh North	EDINBURGH
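In an application, the `to_tsquery` argument would normally be built from whatever the user typed. The sketch below shows one way to do that in Python. The helper name is hypothetical, and the `%s` placeholder assumes a DB-API driver such as psycopg2; the table and column names are the ones from the query above.

```python
import re


def to_tsquery_string(user_input: str) -> str:
    """Turn free text into an AND-ed tsquery expression.

    Keeps only runs of word characters, so punctuation that has a special
    meaning in tsquery syntax (&, |, !, parentheses, quotes) is dropped.
    """
    terms = re.findall(r"\w+", user_input.lower())
    return " & ".join(terms)


# Parameterised form of the query shown above.
SQL = (
    "select name, classification, locality, settlement "
    "from os.locator_nov_10 "
    "where search @@ to_tsquery(%s)"
)

print(to_tsquery_string("High Street, Edinburgh"))  # high & street & edinburgh
```

An empty input yields an empty string, so a caller would want to skip the query in that case. PostgreSQL's built-in `plainto_tsquery` does a similar AND-ing of words on the server side and may be the simpler option.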

    Overall, the dataset contains a comprehensive list of the road names within the UK. Decisions will need to be made about how to treat multiple features that actually refer to the same real world road.
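To make the "multiple features" decision concrete, here is a rough Python sketch that groups the result rows above by name, classification and settlement, collecting the localities each group spans. The grouping key is an assumption chosen for illustration, not a recommendation from the dataset documentation.

```python
from collections import defaultdict

# Rows copied from the result set above: (name, classification, locality, settlement).
ROWS = [
    ("CORSTORPHINE HIGH STREET", "", "Se Corstorphine", "EDINBURGH"),
    ("HIGH STREET", "", "Musselburgh Central", "EDINBURGH"),
    ("HIGH STREET", "", "Musselburgh North", "EDINBURGH"),
    ("HIGH STREET", "", "Holyrood", "EDINBURGH"),
    ("HIGH STREET", "A199", "Musselburgh North", "EDINBURGH"),
    ("HIGH STREET", "A199", "Musselburgh Central", "EDINBURGH"),
    ("NORTH HIGH STREET", "", "Musselburgh North", "EDINBURGH"),
    ("NORTH HIGH STREET", "A199", "Musselburgh West", "EDINBURGH"),
    ("PORTOBELLO HIGH STREET", "B6415", "Milton", "EDINBURGH"),
    ("PORTOBELLO HIGH STREET", "B6415", "Portobello", "EDINBURGH"),
    ("NORTH HIGH STREET", "A199", "Musselburgh North", "EDINBURGH"),
]


def group_roads(rows):
    """Group rows by (name, classification, settlement), listing their localities."""
    grouped = defaultdict(list)
    for name, classification, locality, settlement in rows:
        key = (name, classification, settlement)
        if locality not in grouped[key]:
            grouped[key].append(locality)
    return dict(grouped)


for key, localities in group_roads(ROWS).items():
    print(key, localities)
```

With this key the eleven rows collapse into six groups. It also shows why the decision is not trivial: the unclassified HIGH STREET group lumps Holyrood together with the two Musselburgh localities, which are unlikely to be the same physical road, so a real implementation would probably also need a distance or adjacency test.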

    The main limitation of this dataset is that it can only be used to show the user the general location of a road – it can’t be used as a precise address gazetteer since it only provides street names with no knowledge of building numbers.


    Search and retrieve bounding boxes and shapes

    August 20th, 2010

    So we have a cool project running called Chalice, which is text-mining and locating historic placenames to build a historic gazetteer stretching back beyond Domesday, for a few areas of England and Wales. Claire Grover from LTG had some questions about using a shape-based rather than point-based gazetteer during “geographic information retrieval”. I thought it worth posting the answers here, as Unlock Places is able to do a lot more in public since the addition of Ordnance Survey Open Data.

    http://unlock.edina.ac.uk/features/Edinburgh – now by default returns info from OS Open Data sources including Boundary-Line as well as Meridian2, which have bounding boxes and detailed shapes for things like counties and parishes, though note that they are all contemporary.

    (The above is just an alias for
    http://unlock.edina.ac.uk/ws/nameSearch?name=Edinburgh )

    So that’s a way to get bounding boxes and shapes for places that are in geonames, by comparing with other sources. The default search results have bounding boxes attached; one must look up a link to see the detailed geometry.

    Here’s how then to filter the query for place-names to a specific bounding box:
    http://unlock.edina.ac.uk/ws/spatialNameSearch?format=json&name=Stanley&minx=-8&maxx=4&miny=53&maxy=64&operator=within
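For anyone scripting these queries, here is a small Python sketch that assembles the two URLs shown above. It only builds the strings and makes no request. The parameter names come from the example URLs; the function names are my own.

```python
from urllib.parse import urlencode

BASE = "http://unlock.edina.ac.uk/ws/"


def name_search_url(name):
    """URL for a plain place-name search."""
    return BASE + "nameSearch?" + urlencode({"name": name})


def spatial_name_search_url(name, minx, miny, maxx, maxy,
                            operator="within", fmt="json"):
    """URL for a place-name search restricted to a bounding box."""
    params = {
        "format": fmt,
        "name": name,
        "minx": minx,
        "maxx": maxx,
        "miny": miny,
        "maxy": maxy,
        "operator": operator,
    }
    return BASE + "spatialNameSearch?" + urlencode(params)


print(name_search_url("Edinburgh"))
print(spatial_name_search_url("Stanley", minx=-8, miny=53, maxx=4, maxy=64))
```

Using `urlencode` rather than string concatenation means names containing spaces or non-ASCII characters are escaped correctly.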

    We have ‘search for names inside the shape which has this ID’ on our todo list but don’t yet have a pressing use case. For many things bounding boxes are enough, and sometimes one even wants that bit of extra inclusion (e.g. Shropshire’s bounding box will contain a lot more than Shropshire, but as Shropshire’s boundary has changed over time, some approximation of the shape is actually helpful for historic geocoding).
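The "extra inclusion" of a bounding box is easy to see with a toy example. The Python sketch below compares a bounding-box test with a ray-casting point-in-polygon test; the triangle and the test point are made up and stand in for a county boundary and a nearby place.

```python
def bbox_of(polygon):
    """Bounding box (minx, miny, maxx, maxy) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(point, bbox):
    x, y = point
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy


def in_polygon(point, polygon):
    """Ray-casting test: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


TRIANGLE = [(0, 0), (4, 0), (0, 4)]  # made-up "county" shape
NEAR_CORNER = (3, 3)                 # inside the box, outside the shape

print(in_bbox(NEAR_CORNER, bbox_of(TRIANGLE)))  # True
print(in_polygon(NEAR_CORNER, TRIANGLE))        # False
```

For historic geocoding the looser box test can be the more useful of the two, for the reason given above.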

    Note that all place-names for UK will have county containment information – we added this for Digimap – one day they may start using it!

    You may also be interested to play around with http://mapit.mysociety.org/ – it has all the same OS Open Data sources and mostly the same set of queries but in places does a little more – it doesn’t have geonames integrated, though.

    Lasma did some work on conflating different mentions of places based on point-polygon relationships (e.g. if a shape and a point have the same name, and the shape contains the point, the name is “the same thing”). However this was an experiment that is not really finished. For example –
    http://unlock.edina.ac.uk/ws/uniqueNameSearch?name=Edinburgh – I see this returns a shape in preference to a point – and wonder if it always will, if a shape is available. However this is not much use when you actively want a set of duplicate names, as you do while geoparsing. It would be good to revisit this, again, with concrete use cases. And of course it would be good to do this for much wider than the UK, with shapes extracted from OpenStreetMap. Investigating…
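The conflation rule described above (same name, and the shape contains the point) can be sketched in a few lines of Python. This is not Lasma's implementation: the feature dictionaries are a hypothetical structure, the coordinates are rough illustrative values, and a bounding box stands in for the detailed shape to keep the example short.

```python
def contains(bbox, point):
    """True if the (minx, miny, maxx, maxy) box contains the (x, y) point."""
    minx, miny, maxx, maxy = bbox
    x, y = point
    return minx <= x <= maxx and miny <= y <= maxy


def conflate(features):
    """Drop point features that share a name with, and fall inside, a shape feature."""
    shapes = [f for f in features if "bbox" in f]
    kept = []
    for f in features:
        is_duplicate = "point" in f and any(
            s["name"] == f["name"] and contains(s["bbox"], f["point"])
            for s in shapes
        )
        if not is_duplicate:
            kept.append(f)
    return kept


# Illustrative data only: approximate lon/lat values, hypothetical source labels.
FEATURES = [
    {"name": "Edinburgh", "source": "shape source", "bbox": (-3.45, 55.82, -3.08, 56.00)},
    {"name": "Edinburgh", "source": "point source", "point": (-3.19, 55.95)},
    {"name": "Edinburgh", "source": "point source", "point": (-85.97, 39.35)},
]

for feature in conflate(FEATURES):
    print(feature["name"], feature["source"])
```

The point inside the box is treated as "the same thing" as the shape and dropped, while the far-away namesake survives. A geoparser that wants every candidate would simply skip this step.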


    Your questions answered, @klokancz from Oldmapsonline.org

    July 30th, 2010

    Klokan Petr Pridal, the creator of the wonderful Old Maps Online and MapTiler, has been using Unlock Places in some collaborative project work with the National Library of Scotland. He had some technical questions for us, and some questions about the intended usage and future of the service, so I thought it worthwhile republishing the answers here on the Unlock blog.

    First, I like a lot the API… It is well documented, with examples. Easy to use. [Thanks!]

    It is a bit confusing that you use “name” parameter instead of “q” (according the OpenSearch.org), but otherwise it is very nice. I was testing it with the Google Closure UI.AutoComplete, which is using JSONP and the callback function – it is similar to the jQuery module.

    Right, our use of the “name” parameter for query is a legacy thing – it comes from Unlock’s predecessor, GeoCrossWalk. There’s been a lot of development in OpenSearch Geo since then, it would be worth our while to support it. However, I see OpenSearch as mainly for collections of geo-referenced things (datasets or documents) – not for the georeferences themselves – though of course it could be used to do both.

    There’s also the quicklinks API which was a thought experiment. It looks a lot more like the new MapIt API, which we’re also thinking about implementing in front of Unlock.


    It is great that you have bbox for the results and external link for the detailed footprint. The API gives anybody access to your combined geonames database with other source of data like OSM or OS. Geometry or at least bounding box is something I horribly miss at GeoNames API – and you have solved this problem!

    Ordnance Survey have solved this problem for us with Open Data by releasing sources of shapes that can be used outwith academic publications! (We’ve always had this in the academic-use-only version of Unlock, formerly GeoCrossWalk). We’re now looking at adding OpenStreetMap data to derive the same kind of bounding box and optional detailed shape, for Europe rather than just mainland UK.

    In this moment we are especially interested in usage of your Gazetteer via the “nameAndFeatureSearch” for “populated places” database. I am considering to link EDINA Unlock API from our Georeferencer.org service, instead of GeoNames.org API, which was planned originally. We can’t use Google Maps GeoCoding API because of the TOS. I expected that if we use your service and save the coordinates from GeoNames in our database, it is legal, same as if we would use directly GeoNames.org.

    For geonames data, “This work is licensed under a Creative Commons Attribution 3.0 License”. We preserve the attribution in our search results. If you’re republishing the coordinates then you should ideally keep the source data and make the attribution too – that goes for all the different data source attributions we make.


    BTW Georeferencer.org is going to be used also on the National Library of Scotland maps later this year…

    I’m really looking forward to seeing this, and I’m hoping to see more use made of the NLS Maps API in projects here at EDINA.

    I have a couple of questions related to the API:

    – Is utf-8 input supported? I was not able to find records for “Nürnberg” or “Paříž” while query like “Nurnberg” or “Pariz” gives correct results. Is utf-8 encoded query passed automatically (urlencoded) to your service or are there any special parameters necessary?

    I passed this question on to Joe and Lasma; they went into a huddle, and a couple of hours later, Lasma sent this:

    Indeed, it was only doing ascii search. Joe just deployed a fix.
    Now you can do utf8 search.

    So utf8 search should now be behaving correctly as you would have expected it to. Thanks for pointing this out and helping us to improve the search service.
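For client authors, here is a short Python sketch of the two things discussed in this exchange: percent-encoding a UTF-8 name for the URL, and folding a name to plain ASCII as a fallback, which is roughly what the "Nurnberg" and "Pariz" queries did by hand. The function names are hypothetical; the `/features/` URL pattern is the one used elsewhere on this blog.

```python
import unicodedata
from urllib.parse import quote


def feature_url(name):
    """Percent-encode the name as UTF-8 and append it to the features URL."""
    return "http://unlock.edina.ac.uk/features/" + quote(name)


def ascii_fold(name):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(feature_url("Nürnberg"))  # http://unlock.edina.ac.uk/features/N%C3%BCrnberg
print(ascii_fold("Paříž"))      # Pariz
```

`quote` encodes as UTF-8 by default, so no special parameter is needed on the client side.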

    – Is a combined query with country or administrative area possible? Something like “London, USA” or “Leith, Edinburgh”?

    Currently, if you do this sort of query – a comma-separated list of names – you see all Londons, and all USAs – as in this query: http://unlock.edina.ac.uk/features/London,USA?format=json

    The various Londons that are, in fact, in the USA, will be marked with a country element ‘United States’.

    But I think what you’re asking for isn’t this – you’d like the Unlock Places search to pick out the Londons-contained-by-USA and just return those. We could do this, but don’t expose this sort of query via the API. We could change the meaning of comma-separated lists of names to do this, but that might break other people’s worlds. So the best answer I can give you is, we’ll think about how best to implement it and look at the access logs to see if we can reasonably change the meaning of the current API function.
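Until something like that exists server-side, a client can approximate it by filtering the results itself on the country element. A minimal Python sketch follows, assuming the JSON results have been parsed into dictionaries with `name` and `country` keys; that structure and the sample records are assumptions, not the documented response format.

```python
def contained_by_country(features, name, country):
    """Keep only features with the given name that are marked with the given country."""
    return [
        f for f in features
        if f.get("name") == name and f.get("country") == country
    ]


# Hypothetical parsed results for a "London,USA" style query.
RESULTS = [
    {"name": "London", "country": "United Kingdom"},
    {"name": "London", "country": "United States"},
    {"name": "London", "country": "Canada"},
    {"name": "USA", "country": "United States"},
]

for feature in contained_by_country(RESULTS, "London", "United States"):
    print(feature)
```

This costs an oversized response, but leaves the meaning of the existing comma-separated query untouched.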

    – What are the Terms and Conditions of the online service? Is it completely free for anybody or are there set already some limits on the number of requests, usage from website which are behind password, commercial web services, derived data, etc?

    So there are two versions of the Unlock Places gazetteer search service. One is completely open, built on various open data sources, and can be used by anyone for any purpose. We don’t have throttling or quotas on the API.
    If persistent or demented-looking requests ever become a problem, we’ll think about throttling requests from particular hosts. I like the approach that OpenStreetMap’s Nominatim search service takes here – to say, “if you’re planning really heavy traffic, please talk to us first, we can schedule it at a quiet time or you can install your own instance of Nominatim”.

    In the past I’ve fired off a million requests without any pause, to search through the 1881 census microdata placenames for UKDA, and this happily didn’t affect the performance of the service.

    The second gazetteer search service is limited to UK academic institutions that subscribe to the Digimap Ordnance Survey Collection, and the ways in which the data can be re-used are limited to academic services.
    The Archaeology Data Service, for example, uses Unlock Places in some of its services in this way. They don’t require a login, but they do have terms of use of their service, and don’t expose the Unlocked data directly.

    – Do you plan to release (make available for download) your Gazetteer database? If not, would you be willing to submit (later on?) at least the database with GeoNames.org IDs and the bboxes back the Mark Wick of GeoNames.org, so the great work you did is preserved also in the official free GeoNames database. You have much more to offer then bbox, but at least that would be excellent for the community.
    I feel that release of the database is important for sustainability…

    Right, everything in the open data side of Unlock is built from publicly available sources which are open licensed. One thing we could try is putting together a data package – using Open Knowledge Foundation’s datapkg project, for example – that would automate the process of rebuilding a database that looks like Unlock’s, from these different sources.

    – Are you going to support the service in the future?

    Unlock (Places, and Text) is a service supported by JISC, which manages technology funding for research and innovation in the UK. It’s hosted at the EDINA National Datacentre at the University of Edinburgh, which is also mostly supported directly by JISC.

    So EDINA has a service level agreement with JISC to maintain Unlock with maximum 10 hours of downtime in a year – I think we’re close to that.
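As a quick piece of arithmetic, a ceiling of 10 hours of downtime per year can be turned into an availability percentage. The Python sketch below assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_percent(max_downtime_hours):
    """Availability implied by a yearly downtime allowance."""
    return 100 * (1 - max_downtime_hours / HOURS_PER_YEAR)


print(round(availability_percent(10), 3))  # 99.886
```

So the agreement works out at a little under "three nines" of availability.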

    Our current agreement with JISC to support and develop the Unlock service at EDINA runs until July 2011. Its ongoing existence after that depends on whether we, and JISC, can convincingly make the case that Unlock is creating “impact and value” in academia and beyond (museums, libraries and archives nearest by).

    One of the best ways we can make the case is to get more feedback from people like you, Petr – what you like about the service, what you wish it did, what it’s offering to research that commercial or government services cannot reach. Some more thoughts about that are at the bottom of my last post discussing MySociety’s MapIt service.


    Thank you a lot for you online service!

    Thank you a lot for your long email, Petr, and I hope it helps encourage others to write.


    Work in progress with OS Open Data

    April 2nd, 2010

    The April 1st release of many Ordnance Survey datasets as open data is great news for us at Unlock. As hoped for, Boundary-Line (administrative boundaries), the 50K gazetteer of placenames and a modified version of Code-Point (postal locations) are now open data.

    Boundary-Line of Edinburgh shown on Google Earth. Contains Ordnance Survey data © Crown copyright and database right 2010

    We’ll be putting these datasets into the open access part of Unlock Places, our place search service, and opening up Unlock Geocodes based on Code-Point Open. However, this is going to take a week or two, because we’re also adding some new features to Unlock’s search and results.

    Currently, registered academic users are able to:

    • Grab shapes and bounding boxes in KML or GeoJSON – no need for GIS software, re-use in web applications
    • Search by bounding box and feature type as well as place name
    • See properties of shapes (area, perimeter, central point) useful for statistics visualisation
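To give a feel for the shape properties in the last bullet, here is a Python sketch that computes area, perimeter and centroid for a single polygon ring using the shoelace formula. It is a generic illustration, not Unlock's implementation, and it assumes planar coordinates such as projected metres; on raw longitude/latitude degrees the numbers would not be meaningful.

```python
import math


def ring_properties(ring):
    """Area, perimeter and centroid of a simple polygon ring of (x, y) vertices."""
    n = len(ring)
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1  # shoelace term for this edge
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = twice_area / 2  # signed: positive for counter-clockwise rings
    centroid = (cx / (6 * area), cy / (6 * area))
    return {"area": abs(area), "perimeter": perimeter, "centroid": centroid}


# A 4 x 3 rectangle as a stand-in for a boundary ring.
print(ring_properties([(0, 0), (4, 0), (4, 3), (0, 3)]))
```

GeoJSON rings repeat the first vertex at the end; that extra vertex adds a zero-length edge here, so the same function works on them unchanged.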

    And soon we’ll be publishing these new features currently in testing:

    • Relationships between places – cities, counties and regions containing found places – in the default results
    • Re-project points and shapes into different coordinate reference systems
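As an illustration of what re-projection involves, the Python sketch below converts WGS84 longitude/latitude to spherical Web Mercator, one of the simplest projections to write by hand. This post doesn't say which coordinate reference systems Unlock supports, so the choice here is purely an example; converting to British National Grid also needs a datum transformation and is better left to a library such as pyproj.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def wgs84_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Roughly central Edinburgh (approximate coordinates).
print(wgs84_to_web_mercator(-3.19, 55.95))
```

The formula is only valid away from the poles, where the projection is undefined.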

    These have been added so we can finally plug the Unlock Places search into EDINA’s Digimap service.

    Having Boundary-Line shapes in our open data gazetteer will mean we can return bounding boxes or polygons through Unlock Text, which extracts placenames from documents and metadata. This will help to open up new research directions for our work with the Language Technology Group at Informatics in Edinburgh.

    There are some organisations we’d love to collaborate with (almost next door, the Map Library at the National Library of Scotland and the Royal Commission on Ancient and Historical Monuments of Scotland) but have been unable to, because Unlock and its predecessor GeoCrossWalk were limited by license to academic use only. I look forward to seeing all the things the OS Open Data release has now made possible.

    I’m also excited to see what re-use we and others could make of the Linked Data published by Ordnance Survey Research, and what their approach will be to connecting shapes to their administrative model.

    MasterMap, the highest-detail OS dataset, wasn’t included in the open release. Academic subscribers to the Digimap Ordnance Survey Collection get access to places extracted from MasterMap, and improvements to other datasets created using MasterMap, with an Unlock Places API key.


    Places you won't find in any dictionary

    January 12th, 2010

    Tobar an Dualchais is an amazing archive of Gaelic and Scots speech and song samples. Under the hood, each of their records is annotated with places – the names of the village, or island, or parish, where the speaker came from.

    We’ve been trying to Unlock their placename data, so the names can be given map coordinates, and the recordings searched by location. Also, I wanted to see how much difference it would make if the Ordnance Survey 50K gazetteer were open licensed, thus enabling us to use it for this (non-research) project.

    Out of 1628 placenames, we found 851 exact matches in the 50K gazetteer and 1031 in the geonames.org gazetteer. Just 90 placenames were in the 50K but not in geonames. There’s a group of 296 placenames that we couldn’t find in any of our gazetteer data sources. Note that this is an unusual sample, focused on remote and infrequently surveyed places in the Highlands and Islands, but I had hoped for more from the 50K coverage.
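The figures above can be unpacked a little further with some set arithmetic. The Python sketch below uses only the numbers quoted in this post and assumes that the "90 in the 50K but not in geonames" figure refers to exact matches in both cases.

```python
total = 1628
in_50k = 851
in_geonames = 1031
only_50k = 90
in_neither = 296

in_both = in_50k - only_50k                          # 761
only_geonames = in_geonames - in_both                # 270
exact_in_either = in_both + only_50k + only_geonames # 1121
remainder = total - in_neither - exact_in_either     # 211


def pct(n):
    """Share of the whole sample, as a percentage to one decimal place."""
    return round(100 * n / total, 1)


print(pct(in_50k), pct(in_geonames), pct(in_neither))  # 52.3 63.3 18.2
print(in_both, only_geonames, exact_in_either, remainder)
```

On these assumptions, 211 names are neither exact matches in the two gazetteers nor part of the unfound 296. Presumably they were found some other way, for instance through a non-exact match or another gazetteer source, but the figures alone don't say.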

    There are quite a few fun reasons why there are so many placenames that you won’t find in any dictionary:

    • Places that are historic don’t appear in our contemporary OS sources. Many administrative areas in Scotland changed in 1974, and current OS data does not have the old names or boundaries. Geonames has some locations for historic places (e.g. approximate centroids for the old counties) though without time ranges.
    • Typographical errors in data entry. E.g. “Stornooway” and “Stornaway” – using the gazetteer web service at the content creation stage would help with this.
    • Listings for places that are too small to be in a mid-scale gazetteer. For example, TAD data includes placenames for buildings belonging to clubs and societies where Gaelic sound recordings were made. Likely enough, some small settlements have escaped the notice of surveyors for OS and contributors to geonames.
    • Some places exist socially but not administratively. For example, our MasterMap gazetteer has records for a “Clanyard Bay”, “Clanyard House”, “Clanyard Mill” but not Clanyard itself. The Gazetteer for Scotland describes Clanyard as “a locality, made up of settlements” – High, Low and Middle Clanyards.
    • Geonames has local variant spellings as alternative names, and these show up in our gazetteer search, returning the more “authoritative” name.
    • Limitations in automated search for descriptions of names. For example, some placenames look like Terregles (DFS) see also Kirkcudbrightshire. I’m hoping the new work on fulltext search will help to address this – but there will always need to be a human confirmation stage, and fixes to the original records.
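Two of the problems in this list, typos and descriptive suffixes, lend themselves to a simple automated first pass. The Python sketch below splits a record such as "Terregles (DFS) see also Kirkcudbrightshire" into its parts and then suggests the closest gazetteer name using the standard library's `difflib`. The tiny gazetteer list, the 0.8 cutoff and the function names are assumptions for illustration; this is not the matching Unlock actually uses.

```python
import difflib
import re

# Stand-in for a real gazetteer lookup.
GAZETTEER = ["Stornoway", "Terregles", "Kirkcudbrightshire", "Stirling"]

PATTERN = re.compile(
    r"^\s*(?P<name>[^()]+?)\s*"
    r"(?:\((?P<code>[^)]*)\))?\s*"
    r"(?:see also\s+(?P<see_also>.+?))?\s*$"
)


def split_description(raw):
    """Separate the place name from a bracketed code and a 'see also' note."""
    match = PATTERN.match(raw)
    if match is None:
        return {"name": raw.strip(), "code": None, "see_also": None}
    return match.groupdict()


def suggest(raw, cutoff=0.8):
    """Return the closest gazetteer name for a free-text record, or None."""
    name = split_description(raw)["name"]
    matches = difflib.get_close_matches(name, GAZETTEER, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(split_description("Terregles (DFS) see also Kirkcudbrightshire"))
print(suggest("Stornooway"), suggest("Stornaway"))
```

Suggestions like these would still need the human confirmation stage mentioned above, since a close string match is not proof that two names refer to the same place.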

    It’s been invaluable to have a big set of known-to-be-placenames contributed in free-text fields by people who aren’t geographers. I would like to do more of this.

    I saw a beautiful transcript of an Ordnance Survey Object Name Book on a visit to RCAHMS. Apparently many of the English and Welsh ones were destroyed in the war, but the Scottish ones survived. But that is a story for another time.