
    Testing Unlock 3: new API, new features, soon even documentation

    May 18th, 2011

    This week we are publicly testing version 3 of Unlock – a fairly deep rewrite including a new, simpler API and some more geometrical query functions (searching inside shapes, searching using a buffer). There is new data too: a search across Natural Earth Data, returning shapes for countries, regions, etc. worldwide. So at last we can use Natural Earth for search, and link it up to geonames point data for countries. We also have an upgraded version of the Edinburgh Geoparser, so Unlock Text now offers date and event information as well as place-name text mining.

    The new search work is now on our replicated server at Appleton Tower and in a week or two we’ll switch the main unlock.edina.ac.uk over to the new version (keeping the old API supported indefinitely too). Here are notes/links from Joe Vernon. If you do any testing or experimentation with this we’d be very interested to hear how you got on. Note you can add ‘format=json’ to any of these links to get javascript-useful results, ‘format=txt’ to get a csv, etc.

    ‘GENERIC’ SEARCHING

    http://geoxwalk-at.edina.ac.uk/ws/search?name=sheffield

    http://geoxwalk-at.edina.ac.uk/ws/search?name=wales&featureType=european

    http://geoxwalk-at.edina.ac.uk/ws/search?featureType=hotel&name=Marriott&minx=-79&maxx=-78&miny=36&maxy=37&operator=within
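
    The example links above can also be assembled programmatically. Here is a minimal Python sketch, assuming only the parameter names that appear in the links (name, featureType, minx/maxx/miny/maxy, operator, format). The helper name build_search_url is made up for illustration and is not part of Unlock.

```python
from urllib.parse import urlencode, quote

# Test server named in this post; the endpoint path is taken from the example links.
BASE_URL = "http://geoxwalk-at.edina.ac.uk/ws/search"


def build_search_url(base_url=BASE_URL, **params):
    """Return a search URL with the given query parameters percent-encoded.

    Parameters whose value is None are skipped. The helper name and its
    signature are illustrative; only the parameter names come from the post.
    """
    cleaned = {key: value for key, value in params.items() if value is not None}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(cleaned, quote_via=quote)


if __name__ == "__main__":
    # Hotels named Marriott inside a bounding box, returned as JSON.
    print(build_search_url(
        featureType="hotel", name="Marriott",
        minx=-79, maxx=-78, miny=36, maxy=37,
        operator="within", format="json",
    ))
```

    Encoding the values this way also takes care of spaces and commas in parameters such as a feature type of "populated place".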

    NATURAL EARTH GAZETTEER

    http://geoxwalk-at.edina.ac.uk/ws/search?name=lake&gazetteer=naturalearth&country=canada

    DISTANCE BETWEEN TWO FEATURES

    Distance between Edinburgh and Glasgow (by feature ID):

    http://geoxwalk-at.edina.ac.uk/ws/distanceBetween?idA=14131223&idB=11153386
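
    These notes do not say how distanceBetween computes its result. For a rough client-side sanity check, a great-circle (haversine) distance between two points can be sketched as below. The coordinates are approximate city-centre values assumed for illustration, not the footprints the service holds for these feature IDs.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat_a, lon_a, lat_b, lon_b):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi_a, phi_b = radians(lat_a), radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = radians(lon_b - lon_a)
    h = sin(d_phi / 2) ** 2 + cos(phi_a) * cos(phi_b) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


if __name__ == "__main__":
    # Approximate city-centre coordinates (assumed for illustration).
    edinburgh = (55.953, -3.188)
    glasgow = (55.864, -4.252)
    print(round(haversine_km(*edinburgh, *glasgow), 1), "km")
```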

    SEARCHING WITHIN A FEATURE – ‘SPATIAL MASK’

    United Kingdom’s feature ID is: 14127855

    Searching for ‘Washington’s within the United Kingdom…

    http://geoxwalk-at.edina.ac.uk/ws/search?name=Washington&spatialMask=14127855

    Also, note the difference between searching within the bounding box of the UK, and adding the ‘realSpatial’ parameter, which uses the polygon of the feature concerned.

    http://geoxwalk-at.edina.ac.uk/ws/search?name=Washington&spatialMask=14127855&format=txt&maxRows=100&realSpatial=no

    http://geoxwalk-at.edina.ac.uk/ws/search?name=Washington&spatialMask=14127855&format=txt&maxRows=100&realSpatial=yes

    In this case, it picks up entries in Ireland if using the bounding box rather than the UK’s footprint.
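
    The difference between the two tests can be illustrated with a toy example. This is only a sketch of the general idea (bounding-box test versus point-in-polygon test), using a made-up L-shaped footprint; it says nothing about how the server implements realSpatial.

```python
def in_bbox(point, bbox):
    """True if (x, y) lies inside bbox = (minx, miny, maxx, maxy)."""
    x, y = point
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy


def in_polygon(point, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex joins the first.
    Points exactly on an edge are not handled specially.
    """
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox_of(polygon):
    """Bounding box (minx, miny, maxx, maxy) of a list of vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # Toy L-shaped footprint (made up); its bounding box is the full 4x4 square.
    footprint = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
    candidate = (3, 3)  # in the empty corner of the "L"
    print(in_bbox(candidate, bbox_of(footprint)))  # True
    print(in_polygon(candidate, footprint))        # False
```

    The candidate point plays the role of an Irish ‘Washington’: inside the box, outside the footprint.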

    SPATIAL SEARCHING WITH A BUFFER

    8 hotels around the Royal Mile
    http://geoxwalk-at.edina.ac.uk/ws/search?featureType=hotel&minx=-3.2&maxx=-3.19&miny=55.94&maxy=55.95&operator=within

    75 within 2km
    http://geoxwalk-at.edina.ac.uk/ws/search?featureType=hotel&minx=-3.2&maxx=-3.19&miny=55.94&maxy=55.95&operator=within&buffer=2000
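
    How the buffer is applied on the server is not described here. One plausible client-side approximation, assuming the buffer is given in metres (consistent with buffer=2000 being described as 2km) and that a simple metres-to-degrees conversion is accurate enough at city scale, is to grow the bounding box before searching:

```python
from math import cos, radians

METRES_PER_DEGREE_LAT = 111_320.0  # rough average; an approximation


def buffer_bbox(minx, miny, maxx, maxy, buffer_m):
    """Grow a lon/lat bounding box outwards by roughly buffer_m metres."""
    mid_lat = (miny + maxy) / 2
    d_lat = buffer_m / METRES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    d_lon = buffer_m / (METRES_PER_DEGREE_LAT * cos(radians(mid_lat)))
    return (minx - d_lon, miny - d_lat, maxx + d_lon, maxy + d_lat)


if __name__ == "__main__":
    # The Royal Mile box from the example above, grown by 2000 metres.
    print(buffer_bbox(-3.2, 55.94, -3.19, 55.95, 2000))
```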

    FOOTPRINTS & POSTCODES

    …should still be there:
    http://geoxwalk-at.edina.ac.uk/ws/footprintLookup?identifier=14131223
    http://geoxwalk-at.edina.ac.uk/ws/postCodeSearch?postCode=eh91pr

    IMPLICIT COUNTRY SEARCHING

    http://geoxwalk-at.edina.ac.uk/ws/search?format=txt&gazetteer=geonames&featureType=populated place&name=louth
    vs
    http://geoxwalk-at.edina.ac.uk/ws/search?format=txt&gazetteer=geonames&featureType=populated place&name=louth, uk

    TIME BOUNDED SEARCH (still in development)

    http://geoxwalk-at.edina.ac.uk/ws/search?name=edinburgh&startYear=2000&endYear=2009

    http://geoxwalk-at.edina.ac.uk/ws/search?name=edinburgh&startYear=2000&endYear=2010

    Very happy with all this, bringing the Unlock service up to offering something usefully distinctive again, trying to restrain myself from saying (“if X was so easy why don’t we do Y?”)


    Exploring the Locator OS OpenData set

    January 21st, 2011

    Fiona Hemsley-Flint had a good look at the OS Locator dataset which is available from the Ordnance Survey Open Data portal. I thought a summary of her findings might be of use to others thinking about how to use this dataset.

    Overview

    OS Locator contains a list of all the road names in the UK, “derived from a number of Ordnance Survey datasets [Meridian2, Road database, Locality dataset, Boundary-Line]. These include the roads database which contains information on road names and road numbers and is the latest generation of Ordnance Survey’s sophisticated and highly detailed geographic data”. OS recommend viewing it on top of mid-scale datasets such as 1:10k & 1:25k Raster and streetview (which is freely available via OS opendata).

    Geometries

    Each feature is geo-referenced by a centre point and a bounding box (although some of the bboxes are actually line features where the road segment of the feature is horizontal or vertical).
    OS Locator names shown on OS map
    Figure 1. Multiple occurrences of Ferry Road, differentiated by their locality.

    Attribution

    The roads have a name and/or a classification, where the classification represents a road number (e.g. ‘A1’ or ‘B1243’). They also have an associated settlement (town), locality, county/region and local authority; the latter two are derived from Boundary-Line, but it is unclear what is used to form the ‘Locality dataset’. Locality and settlement are likely to be the most useful of these attributes when displaying result sets. For roads which cross locality boundaries, a point is assigned for each separate locality, so one road may have more than one point associated with it, distinguished by its locality.

    Storage

    851505 rows of data were added to a development server.
    Multiple geometry columns have been added to take into account the different geometries available.
    A ‘tsvector’ column has also been added to implement Postgres text search functionality. An example query might be:
    select name, classification, locality, settlement from os.locator_nov_10 where search @@ to_tsquery('high & street & edinburgh');

    Which returns the following result set:

    Name                      Classification  Locality             Settlement
    CORSTORPHINE HIGH STREET                  Se Corstorphine      EDINBURGH
    HIGH STREET                               Musselburgh Central  EDINBURGH
    HIGH STREET                               Musselburgh North    EDINBURGH
    HIGH STREET                               Holyrood             EDINBURGH
    HIGH STREET               A199            Musselburgh North    EDINBURGH
    HIGH STREET               A199            Musselburgh Central  EDINBURGH
    NORTH HIGH STREET                         Musselburgh North    EDINBURGH
    NORTH HIGH STREET         A199            Musselburgh West     EDINBURGH
    PORTOBELLO HIGH STREET    B6415           Milton               EDINBURGH
    PORTOBELLO HIGH STREET    B6415           Portobello           EDINBURGH
    NORTH HIGH STREET         A199            Musselburgh North    EDINBURGH
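
    The string handed to to_tsquery in the example query can be assembled from whatever the user types. A minimal sketch, assuming every word should be AND-ed together as in that query; the function name build_tsquery is made up for illustration.

```python
import re


def build_tsquery(text):
    """Turn free text into an AND-ed to_tsquery string.

    Keeps only runs of ASCII letters and digits, so that user input cannot
    inject tsquery operators; words are lower-cased and joined with ' & '.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return " & ".join(words)


if __name__ == "__main__":
    query = build_tsquery("High Street, Edinburgh")
    print(query)  # high & street & edinburgh
    # With a DB-API driver the string would be passed as a bound parameter,
    # e.g. "... where search @@ to_tsquery(%s)" with (query,) as arguments.
```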

    Overall, the dataset contains a comprehensive list of the road names within the UK. Decisions will need to be made about how to treat multiple features that actually refer to the same real world road.
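
    To make that decision concrete, here is one naive grouping rule sketched in Python: treat rows with the same name and settlement as one road and collect their localities. It uses a few rows from the result set above. Note that this rule is only an assumption for illustration and it over-merges: the Holyrood and Musselburgh High Streets end up in the same group, which is exactly the kind of case that needs a decision.

```python
from collections import defaultdict

# A few rows from the result set: (name, classification, locality, settlement).
ROWS = [
    ("HIGH STREET", "", "Musselburgh Central", "EDINBURGH"),
    ("HIGH STREET", "", "Musselburgh North", "EDINBURGH"),
    ("HIGH STREET", "", "Holyrood", "EDINBURGH"),
    ("HIGH STREET", "A199", "Musselburgh North", "EDINBURGH"),
    ("PORTOBELLO HIGH STREET", "B6415", "Milton", "EDINBURGH"),
    ("PORTOBELLO HIGH STREET", "B6415", "Portobello", "EDINBURGH"),
]


def group_by_road(rows):
    """Group rows by (name, settlement), collecting the distinct localities."""
    grouped = defaultdict(set)
    for name, _classification, locality, settlement in rows:
        grouped[(name, settlement)].add(locality)
    return {key: sorted(localities) for key, localities in grouped.items()}


if __name__ == "__main__":
    for (name, settlement), localities in group_by_road(ROWS).items():
        print(name, settlement, localities)
```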

    The main limitation of this dataset is that it can only be used to show the user the general location of a road – it can’t be used as a precise address gazetteer since it only provides street names with no knowledge of building numbers.


    Using source identifiers to link data

    November 29th, 2010

    In the Chalice project we’ve used Unlock Places to make links across the Linked Data web, using the source identifier which appears in the results of each place search. As this might be useful to others, it’s worth walking through an example.

    This search for “Bosley” shows us results in the UK from geonames and from the Ordnance Survey 50K gazetteer: http://unlock.edina.ac.uk/ws/nameSearch?name=Bosley&country=uk

    Here’s an extract of one of the results, the listing for Bosley in the Ordnance Survey 1:50K gazetteer:

    <identifier>11083412</identifier>
    <sourceIdentifier>28360</sourceIdentifier>
    <name>Bosley</name>
    <country>United Kingdom</country>
    <custodian>Ordnance Survey</custodian>
    <gazetteer>OS Open 1:50 000 Scale Gazetteer</gazetteer>

    The sourceIdentifier shown here is the identifier published by each of the original data sources that Unlock Places is using to cross-search.
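
    Pulling the two values needed for linking out of such a result takes only a few lines. In this sketch the extract is wrapped in a made-up <feature> element so that it parses as one XML document; the element names inside it are the ones shown in the extract, but the real response structure around them may differ.

```python
import xml.etree.ElementTree as ET

# The extract shown above, wrapped in a hypothetical <feature> element.
EXTRACT = """
<feature>
  <identifier>11083412</identifier>
  <sourceIdentifier>28360</sourceIdentifier>
  <name>Bosley</name>
  <country>United Kingdom</country>
  <custodian>Ordnance Survey</custodian>
  <gazetteer>OS Open 1:50 000 Scale Gazetteer</gazetteer>
</feature>
"""


def source_reference(xml_text):
    """Return (gazetteer, sourceIdentifier) from one result element."""
    feature = ET.fromstring(xml_text.strip())
    return feature.findtext("gazetteer"), feature.findtext("sourceIdentifier")


if __name__ == "__main__":
    print(source_reference(EXTRACT))
```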

    Ordnance Survey Research re-uses these identifiers to create its Linked Data namespace. For any place in the 50K gazetteer, we can reconstruct the link that refers to that place by appending the source identifier to this URL, which is the namespace for the 50K gazetteer: http://data.ordnancesurvey.co.uk/id/50kGazetteer/

    So our reference to Bosley can be made by adding the source identifier to the namespace:

    http://data.ordnancesurvey.co.uk/id/50kGazetteer/28360

    The same goes for source identifiers for places found in the geonames.org place-name gazetteer.

    <sourceIdentifier>2655141</sourceIdentifier>
    <name>Bosley</name>
    <gazetteer>GeoNames</gazetteer>

    Geonames uses http://sws.geonames.org/ as a namespace for its Linked Data links for places. So we can reconstruct the link for Bosley using the source identifier like this:

    http://sws.geonames.org/2655141/

    Note that the link needs the forward slash on the end to work correctly. If one looks at either of these links with a web browser, one is redirected to a human-readable page describing that place. To see the machine-readable, RDF version of the link’s contents, look at it with a command-line program such as curl, asking to “Accept” the RDF version:

    curl -L http://data.ordnancesurvey.co.uk/id/50kGazetteer/28360 -H "Accept: application/rdf+xml"
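
    The same steps can be sketched in Python: append the source identifier to the namespace for its gazetteer, then ask for the RDF version. The two namespaces are the ones quoted in this post; the function names and the lookup table keyed on the <gazetteer> value are assumptions made for illustration.

```python
from urllib.request import Request

# Namespaces quoted in this post, keyed by the <gazetteer> value in the results.
NAMESPACES = {
    "OS Open 1:50 000 Scale Gazetteer": "http://data.ordnancesurvey.co.uk/id/50kGazetteer/",
    "GeoNames": "http://sws.geonames.org/",
}


def linked_data_uri(gazetteer, source_identifier):
    """Append the source identifier to the gazetteer's Linked Data namespace."""
    uri = NAMESPACES[gazetteer] + str(source_identifier)
    # GeoNames links need the trailing slash to work correctly.
    if gazetteer == "GeoNames":
        uri += "/"
    return uri


def rdf_request(uri):
    """Build (but do not send) a request asking for the RDF/XML version."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    req = rdf_request(linked_data_uri("GeoNames", 2655141))
    print(req.full_url, req.get_header("Accept"))
    # urllib.request.urlopen(req) would fetch it, following redirects as curl -L does.
```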

    I hope this is useful to others. We could add the links directly into the default search results, but many users may not be that interested in seeing RDF links in place-name search results. Thoughts on how we could offer this as a more useful function would be much appreciated.


    More on the use of Unlock Places by georeferencer.org

    November 19th, 2010

    Some months back, Klokan Petr Pridal, who maintains OldMapsOnline.org and works with libraries and cartographic institutes across Europe, wrote with some questions about the Unlock Places service. We met at FOSS4G where I presented our work on the Chalice project and the Unlock services.
    Petr writes about how Unlock is used in his applications, and what future requirements from the service may be:


    “It was great to meet you at FOSS4G in Barcelona and discuss with you
    the progress related to Unlock and possible cooperation with
    OldMapsOnline.org and usage in Georeferencer.org services.

    As you have mentioned, the most important thing for us would be to
    have in Unlock API/database the bounding boxes (or bounding polygons) for places as direct part of the JSON response.
    We need that mostly for villages, towns and cities and for areas such
    as districts or countries – all over the world. We need something like
    “bounds” as provided by the Google geocoding API.

    The second most important feature is to have the chance to install the service in our servers – especially in case you can’t provide guarantees for it in a future.

    It would be also great to have chance to improve the service for non-English languages, but right now the gazetteers and text processing is not primary target of our research.

    In this moment the Unlock API is in use:

    As a standard gazetteer search service to zoom the base maps to a place people type in the search box in our Georeferencer.org service – a
    collaborative georeferencing online service for scanned historical
    maps. It is in use by National Library of Scotland and a couple of other libraries.

    Here’s an example map (you need to register first).

    The uniqueness of Unlock is in openness of the license (primarily GeoNames.org CC-BY and also OS OpenData) and also so far very good availability of the online service (EDINA hardware and network?). We are missing the bounding box to be able to zoom our base maps to the correct area (determine the appropriate zoom level). Unlock API replaced Google Geocoder, which we can’t use, because we are displaying also non-google maps (such as Ordnance Survey OpenData) and we are potentially deriving data from the gazetteer database (the control points on the old maps), which is against Google TOS.

    In the future we are keen to extend the gazetteer with alternative historical toponyms (which people can identify on georeferenced old maps too), or participate on such work.

    The other usage of Unlock API is:

    As a metadata text analyzer, in a service such as our
    http://geoparser.appspot.com/, where we automatically parse existing
    library textual metadata to identify place names and locate the
    described maps including automatic approximation of their spatial
    coverage (by identifying map scale and physical size in the text and
    doing a simple math on top of it). This service is in a prototype
    phase only, we are using Yahoo Placemaker and I was testing Unlock Text API
    with it too.

    Here the huge advantage of Unlock would be primarily the possibility to add custom gazetteers (with Geonames as the default one), language detection (for example via Google Language API or otherwise) and also possibility to add into the workflow other tools, such as lemmatizator for particular language – the simplest available via hun/a/ispellu database integration or via existing morphological rule-based software such as:

    The problem is that without returning the lemmatization of the text the geoparser is almost unusable in non-English languages – especially Slavic one.

    We are very glad for availability of your results and of the reliable
    online services you provide. We can concentrate on the problems we
    need to solve primarily (georeferencing, clipping, stitching and
    presentation of old maps for later analysis) and use your results of
    research as a component solving a problem we are touching and we have to practically solve somehow.”


    Very glad that Petr wrote at such length about comprehensive use of Unlock, pushing the edges of what we are doing with the service.

    We have some work in the pipeline adding bounding boxes for places worldwide by making Natural Earth Data searchable through Unlock Places. Natural Earth is a generalised dataset intended for use in cartography, but should also have quite a lot of re-use value for map search.


    Connecting archives with linked geodata – Part II

    October 22nd, 2010

    This is part two of a blog starting with a presentation about the Chalice project and our aim to create a 1000-year place-name gazetteer, available as linked data, text-mined from volumes of the English Place Name Survey.

    Something else I’ve been organising is a web service called Unlock; it offers a gazetteer search service that searches with, and returns, shapes rather than just points for place-names. It has its origins in a 2001 project called GeoCrossWalk, extracting shapes from MasterMap and other Ordnance Survey data sources and making them available under a research-only license in the UK, available to subscribers to EDINA’s Digimap service.

    Now that so much open geodata is out there, Unlock contains an open data place search service, indexing and interconnecting the different sources of shapes that match up to names. It has geonames and the OS Open Data sources in it; we are adding search of Natural Earth data in short order, and looking at ways to enhance what others (Nominatim, LinkedGeoData) are already doing with search and re-use of OpenStreetmap data.

    The gazetteer search service sits alongside a placename text mining service. However, the text mining service is tuned to contemporary text (American news sources), and a lot of that also has to do with data availability and sharing of models, sets of training data. The more interesting use cases are in archive mining, of semi-unusual, semi-structured sets of documents and records (parliamentary proceedings, or historical population reports, parish and council records). Anything that is recorded will yield data, *is* data, back to the earliest written records we have.


    Place-names can provide a kind of universal key to interpreting the written record. Social organisation may change completely, but the land remembers, and place-names remain the same. Through the prism of place-names one can glimpse pre-history; not just what remains of those people wealthy enough to create *stuff* that lasted, but of everybody who otherwise vanished without trace.

    The other reason I’m here at FOSS4G is to ask for help. We (the authors of the text mining tools at the Language Technology Group, colleagues at EDINA, smart funders at JISC) want to put together a proper open source distribution of the core components of our work, for others to customise, extend, and work with us on.

    We could use advice – the Software Sustainability Institute is one place we are turning for advice on managing an open source release and, hopefully, community. OSS Watch supported us in structuring an open source business case.

    Transition to a world that is open by default turns out to be more difficult than one would think. It’s hard to get many minds to look in the same direction at the same time. Maybe legacy problems, kludges either technical, or social, or even emotional, arise to mess things up when we try to act in the clear.

    We could use practical advice on managing an open source release of our work to make it as self-sustaining as possible. In the short term; how best to structure a repository for collaboration, for branching and merging; where we should most usefully focus efforts at documentation; how to automate the process of testing to free up effort where it can be more creative; how to find the benefits in moving the process of working, from a closed to an open world.

    The Chalice project has a sourceforge repository where we’ve been putting the code the EDINA team has been working on; this includes an evolution of Unlock’s web service API, and user interface / annotation code from Addressing History. We’re now working on the best way to synchronise work-in-progress with currently published, GPL-licensed components from LTG, more pieces of the pipeline making up the “Edinburgh geoparser” and other things…


    OpenStreetmap and Linked Geodata

    October 14th, 2010

    I’ve been travelling overmuch for the last six weeks, but met lots of lovely people. Most recently, during a trip this week to discuss the Open Knowledge Foundation’s part in the LOD2 consortium project, had a long chat with Jens and Claus, the developers and academics behind Linked Geo Data, the Linked Data version of the OpenStreetmap data.

    linked geodata browser

    The most interesting bit for Unlock is the RESTful interface to search the data; by point, radius, and bounding box, by feature class and by contents of labels assembled from tags. So it looks like Opensearch Geo as much as Unlock’s place search api does.

    Claus made up a mapping between tags and clusters of tags in OpenStreetmap, to a simple linkedgeodata.org ontology. Here’s the mapping file – warning, it is quite large – OSM->linkedgeodata mapping rules. Pointed him at Jochen Topf’s new work on OSM tag analysis and clustering, Taginfo.

    As well as the REST interface, there is a basic GeoSPARQL endpoint using Virtuoso as a Linked Data store – we ran containment queries for polygons returning polygons with reasonable performance. There is a fracturing in the GeoSPARQL world both in proposed standards and in actual implementation.

    So we want to be able to return links to LinkedGeodata.org URLs in the results of our search. Right now Unlock’s place search returns original source identifiers (from geonames, etc) as well as our local identifiers, for place-names and shapes. In fact Unlock could help with the mapping across of Linkedgeodata.org URLs to geonames URLs, which are quite widely used, an entry point into the bigger Linked Data web.

    Another very interesting tool for making links between things on the Linked Data web is SILK, by Chris Bizer, Anja Jentzsch and their research group at the Freie Universität Berlin. The latest (or still testing?) release of SILK has some spatial inference capacity as well as structural inference. So we could try it out on, for example, the Chalice data just to see what kind of links can be made between URLs for linkedgeodata things and URLs for historic place-names.

    We’ve been setting up an instance of OpenStreetmap for Unlock and other purposes at EDINA recently. Our plan with this is to start working from Nominatim, which has a point-based gazetteer for place-names down to street address level, and attempt to extract and/or generalise shapes as well as points corresponding to the names. We’re doing this to provide more/richer data search, rather than republishing original datasets in some more/differently interpretable form. So there’s lots of common ground and I hope to find ways to work together in future to make sure we complement and don’t duplicate.


    Search and retrieve bounding boxes and shapes

    August 20th, 2010

    So we have a cool project running called Chalice, text-mining and locating historic placenames to build a historic gazetteer stretching back beyond Domesday, for a few areas of England and Wales. Claire Grover from LTG had some questions about using a shape-based rather than point-based gazetteer during “geographic information retrieval”. I thought it worth posting the answers here, as Unlock Places is able to do a lot more in public since the addition of Ordnance Survey Open Data.

    http://unlock.edina.ac.uk/features/Edinburgh – now by default returns info from OS Open Data sources including Boundary-Line as well as Meridian2 which have bounding boxes and detailed shapes for things like counties, parishes, though note they are all contemporary.

    (The above is just an alias for
    http://unlock.edina.ac.uk/ws/nameSearch?name=Edinburgh )

    So that’s a way to get bounding boxes and shapes for places that are in geonames, by comparing with other sources. The default search results have bounding boxes attached; to see the detailed geometry, one must follow a link.

    Here’s how then to filter the query for place-names to a specific bounding box:
    http://unlock.edina.ac.uk/ws/spatialNameSearch?format=json&name=Stanley&minx=-8&maxx=4&miny=53&maxy=64&operator=within

    We have ‘search for names inside the shape which has this ID’ on our todo list but don’t yet have a pressing use case – for many things bounding boxes are enough, one even wants that bit of extra inclusion (e.g. Shropshire’s bounding box will contain a lot more than Shropshire, but as Shropshire’s boundary has changed over time, some approximation about the shape is actually helpful for historic geocoding).

    Note that all place-names for UK will have county containment information – we added this for Digimap – one day they may start using it!

    You may also be interested to play around with http://mapit.mysociety.org/ – it has all the same OS Open Data sources and mostly the same set of queries but in places does a little more – it doesn’t have geonames integrated, though.

    Lasma did some work on conflating different mentions of places based on point-polygon relationships (e.g. if a shape and a point have the same name, and the shape contains the point, the name is “the same thing”). However this was an experiment that is not really finished. For example –
    http://unlock.edina.ac.uk/ws/uniqueNameSearch?name=Edinburgh – I see this returns a shape in preference to a point – and wonder if it always will, if a shape is available. However this is not much use when you actively want a set of duplicate names, as you do while geoparsing. It would be good to revisit this, again, with concrete use cases. And of course it would be good to do this for much wider than the UK, with shapes extracted from OpenStreetmap. Investigating…
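
    The conflation rule described above (same name, and the shape contains the point) is simple to state in code. This is only a sketch of the rule, not Lasma’s implementation: it uses a bounding box as a stand-in for the shape, and the record layouts and example values are hypothetical.

```python
def bbox_contains(bbox, point):
    """True if point (x, y) lies inside bbox = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    x, y = point
    return minx <= x <= maxx and miny <= y <= maxy


def same_place(shape, point):
    """Apply the rule: same name, and the shape contains the point.

    shape is {"name": str, "bbox": (minx, miny, maxx, maxy)} and
    point is {"name": str, "xy": (x, y)}; both layouts are hypothetical.
    """
    same_name = shape["name"].casefold() == point["name"].casefold()
    return same_name and bbox_contains(shape["bbox"], point["xy"])


if __name__ == "__main__":
    # Made-up values for illustration only.
    shape = {"name": "Edinburgh", "bbox": (-3.45, 55.82, -3.08, 56.0)}
    point = {"name": "EDINBURGH", "xy": (-3.19, 55.95)}
    print(same_place(shape, point))
```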


    Your questions answered, @klokancz from Oldmapsonline.org

    July 30th, 2010

    Klokan Petr Pridal, the creator of the wonderful Old Maps Online and MapTiler, has been using Unlock Places in some collaborative project work with the National Library of Scotland. He had some technical questions for us, and some questions about the intended usage future of the service, so I thought it worth-while republishing the answers here on the Unlock blog.

    First, I like a lot the API… It is well documented, with examples. Easy to use. [Thanks!]

    It is a bit confusing that you use “name” parameter instead of “q” (according the OpenSearch.org), but otherwise it is very nice. I was testing it with the Google Closure UI.AutoComplete, which is using JSONP and the callback function – it is similar to the jQuery module.

    Right, our use of the “name” parameter for query is a legacy thing – it comes from Unlock’s predecessor, GeoCrossWalk. There’s been a lot of development in OpenSearch Geo since then, it would be worth our while to support it. However, I see OpenSearch as mainly for collections of geo-referenced things (datasets or documents) – not for the georeferences themselves – though of course it could be used to do both.

    There’s also the quicklinks API which was a thought experiment. It looks a lot more like the new MapIt API, which we’re also thinking about implementing in front of Unlock.


    It is great that you have bbox for the results and external link for the detailed footprint. The API gives anybody access to your combined geonames database with other source of data like OSM or OS. Geometry or at least bounding box is something I horribly miss at GeoNames API – and you have solved this problem!

    Ordnance Survey have solved this problem for us with Open Data by releasing sources of shapes that can be used outwith academic publications! (We’ve always had this in the academic-use-only version of Unlock, formerly GeoCrossWalk). We’re now looking at adding OpenStreetmap data to derive the same kind of bounding box and optional detailed shape, for Europe rather than just mainland UK.

    In this moment we are especially interested in usage of your Gazetteer via the “nameAndFeatureSearch” for “populated places” database. I am considering to link EDINA Unlock API from our Georeferencer.org service, instead of GeoNames.org API, which was planned originally. We can’t use Google Maps GeoCoding API because of the TOS. I expected that if we use your service and save the coordinates from GeoNames in our database, it is legal, same as if we would use
    directly GeoNames.org.

    For geonames data, “This work is licensed under a Creative Commons Attribution 3.0 License”. We preserve the attribution in our search results. If you’re republishing the coordinates then you should ideally keep the source data and make the attribution too – that goes for all the different data source attributions we make.


    BTW Georeferencer.org is going to be used also on the National Library of Scotland maps later this year…

    I’m really looking forward to seeing this, and I’m hoping to see more use made of the NLS Maps API in projects here at EDINA.

    I have a couple of questions related to the API:

    – Is utf-8 input supported? I was not able to find records for “Nürnberg” or “Paříž” while query like “Nurnberg” or “Pariz” gives correct results. Is utf-8 encoded query passed automatically (urlencoded) to your service or are there any special parameters necessary?

    I passed this question on to Joe and Lasma; they went into a huddle, and a couple of hours later, Lasma sent this:

    Indeed, it was only doing ascii search. Joe just deployed a fix.
    Now you can do utf8 search.

    So utf8 search should now be behaving correctly as you would have expected it to. Thanks for pointing this out and helping us to improve the search service.
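
    On the client side, “urlencoded utf-8” means percent-encoding the UTF-8 bytes of the name before putting it in the query string. A short sketch using Python’s standard library; the helper name encode_name is made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_name(name):
    """Percent-encode a place name as UTF-8 for use in a query string."""
    return quote(name, safe="")


if __name__ == "__main__":
    for name in ("Nürnberg", "Paříž"):
        encoded = encode_name(name)
        # Round-trip check: decoding gives back the original name.
        assert unquote(encoded) == name
        print(name, "->", encoded)
```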

    – Is a combined query with country or administrative area possible? Something like “London, USA” or “Leith, Edinburgh”?

    Currently, if you do this sort of query – a comma-separated list of names – you see all Londons, and all USAs – as in this query: http://unlock.edina.ac.uk/features/London,USA?format=json

    The various Londons that are, in fact, in the USA, will be marked with a country element ‘United States’.

    But I think what you’re asking for isn’t this – you’d like the Unlock Places search to pick out the Londons-contained-by-USA and just return those. We could do this, but don’t expose this sort of query via the API. We could change the meaning of comma-separated lists of names to do this, but that might break other people’s worlds. So the best answer I can give you is, we’ll think about how best to implement it and look at the access logs to see if we can reasonably change the meaning of the current API function.
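
    In the meantime a client can do the narrowing itself, using the country element that comes back with each result. A minimal sketch, assuming the JSON results have been parsed into a list of dictionaries with a "country" key; that layout and the sample records are hypothetical, not real Unlock output.

```python
def filter_by_country(features, country):
    """Keep only the features whose 'country' value matches, ignoring case."""
    wanted = country.casefold()
    return [f for f in features if f.get("country", "").casefold() == wanted]


if __name__ == "__main__":
    # Hypothetical, hand-written results; not real Unlock output.
    results = [
        {"name": "London", "country": "United Kingdom"},
        {"name": "London", "country": "United States"},
        {"name": "USA"},
    ]
    print(filter_by_country(results, "United States"))
```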

    – What are the Terms and Conditions of the online service? Is it completely free for anybody or are there set already some limits on the number of requests, usage from website which are behind password, commercial web services, derived data, etc?

    So there are two versions of the Unlock Places gazetteer search service. One is completely open, built on various open data sources, and can be used by anyone for any purpose. We don’t have throttling or quotas on the API.
    If persistent or demented-looking requests ever become a problem, we’ll think about throttling requests from particular hosts. I like the approach that OpenStreetmap’s Nominatim search service takes here – to say, “if you’re planning really heavy traffic, please talk to us first, we can schedule it at a quiet time or you can install your own instance of Nominatim”.

    In the past I’ve fired off a million requests without any pause, to search through the 1881 census microdata placenames for UKDA, and this happily didn’t affect the performance of the service.

    The second gazetteer search service is limited to UK academic institutions that subscribe to the Digimap Ordnance Survey Collection, and the ways in which the data can be re-used are limited to academic services.
    The Archaeology Data Service, for example, uses Unlock Places in some of its services in this way. They don’t require a login, but they do have terms of use of their service, and don’t expose the Unlocked data directly.

    – Do you plan to release (make available for download) your Gazetteer database? If not, would you be willing to submit (later on?) at least the database with GeoNames.org IDs and the bboxes back the Mark Wick of GeoNames.org, so the great work you did is preserved also in the official free GeoNames database. You have much more to offer then bbox, but at least that would be excellent for the community.
    I feel that release of the database is important for sustainability…

    Right, everything in the open data side of Unlock is built from publicly available sources which are open licensed. One thing we could try is putting together a data package – using Open Knowledge Foundation’s datapkg project, for example – that would automate the process of rebuilding a database that looks like Unlock’s, from these different sources.

    – Are you going to support the service in the future?

    Unlock (Places, and Text) is a service supported by JISC, which manages technology funding for research and innovation in the UK. It’s hosted at the EDINA National Datacentre at the University of Edinburgh, which is also mostly supported directly by JISC.

    So EDINA has a service level agreement with JISC to maintain Unlock with maximum 10 hours of downtime in a year – I think we’re close to that.

    Our current agreement with JISC to support and develop the Unlock service at EDINA runs until July 2011. Its ongoing existence after that depends on whether we, and JISC, can convincingly make the case that Unlock is creating “impact and value” in academia and beyond (museums, libraries and archives nearest by).

    One of the best ways we can make the case is to get more feedback from people like you, Petr – what you like about the service, what you wish it did, what it’s offering to research that commercial or government services cannot reach. Some more thoughts about that are at the bottom of my last post discussing MySociety’s MapIt service.


    Thank you a lot for your online service!

    Thank you a lot for your long email, Petr, and I hope it helps encourage others to write.


    An appreciation of MySociety’s MapIt service

    July 27th, 2010

    Impressed by the new MySociety service for doing interesting things with Ordnance Survey OpenData – MapIt. The API is well thought out and quick and clean, the documentation fits onto one page, the backend is free software.

    I will confess to mild chagrin, because as well as having all these wonderful properties, MapIt does almost everything that Unlock Places does for Boundary-Line and Code-Point. Compare, contrast:

    A simple search for records about a place beginning with a name, returning the results in JSON:
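    On the Unlock side, a request of this kind could be put together as in the sketch below. It assumes the /ws/search endpoint and the name and format=json parameters shown in the Unlock 3 testing notes; the host is the test server named there, and the API in place when this post was written may have differed. The helper name unlock_search_url is made up for illustration, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode

# Assumption: host and path copied from the Unlock 3 test links;
# the production service may live at a different address.
UNLOCK_SEARCH = "http://geoxwalk-at.edina.ac.uk/ws/search"


def unlock_search_url(name, fmt="json", base=UNLOCK_SEARCH, **extra):
    """Build (but do not send) a gazetteer name-search URL.

    Extra keyword arguments, e.g. featureType="european", are appended
    as additional query parameters.
    """
    params = {"name": name, "format": fmt}
    params.update(extra)
    return base + "?" + urlencode(params)


print(unlock_search_url("sheffield"))
```

    Passing the result to any HTTP client should return the JSON records, from which a feature ID can be read for follow-up requests.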

    The detail of the shape describing that place, in GeoJSON (in both cases the ID to be looked up is taken from the JSON results of the previous request):

    MapIt does things that are still on our todo list – such as exposing an ST_Touches geometry query over a web-based API:
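    For anyone unfamiliar with the predicate: ST_Touches in PostGIS is true when two geometries share at least one boundary point but their interiors do not intersect, which is what makes it useful for "which areas border this one?" questions. The sketch below illustrates the same idea for plain axis-aligned rectangles only; it is not how MapIt or PostGIS implement it, and the function name and tuple layout are assumptions made for the example.

```python
def touches(a, b):
    """Return True if rectangles a and b touch without overlapping.

    Each rectangle is (minx, miny, maxx, maxy) and is assumed to have
    positive area. This mirrors the ST_Touches idea for boxes only.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # The closed rectangles have at least one point in common.
    share_points = ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1
    # The open interiors overlap (strict inequalities on both axes).
    interiors_overlap = ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1
    return share_points and not interiors_overlap


print(touches((0, 0, 1, 1), (1, 0, 2, 1)))  # shared edge
print(touches((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping interiors
```

    Real administrative boundaries are polygons rather than boxes, so a service would hand this test to the spatial database rather than compute it by hand.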

    Matthew Somerville, MapIt’s creator, writes that “MaPit is really just an extension of the service we have always run internally for our own purposes” – MySociety services like Fix My Street, Write To Them and the renowned They Work For You.

    It’s great to see a service that looks so much like Unlock emerge from the internal needs of an organisation with a track record of geospatially aware, simple useful web tools.

    However I pause to think, what are we providing with Unlock Places search through OS Open Data that MapIt isn’t doing at least as well?

    Well, we have a few more data sources, so a more comprehensive gazetteer search; MapIt is directed towards building applications around government data and assumes the client will probably know the “right” names or codes. We could implement a neat “Give me the official names and shapes for this more vernacular name” wrapper, perhaps.

    We have geonames mirrored in Unlock too – only point data, but global coverage – and are working on adding OpenStreetMap (probably just for Europe) to the cross-search. But I wonder, quite hard, how much we would gain from improving and adding to the MapIt codebase instead of persevering with our own gazetteer API code.

    A future focus for Unlock Places (from the New Year on) is adding historic place-names to the gazetteer, so we can do historic place-name text mining with Unlock Text – incorporating the data coming out of the CHALICE project – as this is a common request for researchers, and not something that’s currently being done commercially.

    The Unlock Text service remains a bit more novel. This does text mining across documents (plain text, HTML or XML metadata), extracts likely placenames and uses the gazetteer search to pick the most likely locations. The text miner looks for other entities too – personal and organisational names, references to dates – but we only expose the placename part over our web API.
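    The two steps described here (spot likely placenames, then use a gazetteer to choose among candidate locations) can be shown with a deliberately crude toy. This is not the Edinburgh Geoparser's method: the capitalised-word rule, the tiny in-memory gazetteer and its scores are all invented for illustration, and a real text miner uses far richer linguistic and geographic evidence.

```python
import re

# Hypothetical gazetteer: name -> candidate locations with made-up
# scores (illustration only, not real ranking data).
TOY_GAZETTEER = {
    "Edinburgh": [("Edinburgh, Scotland", 0.9), ("Edinburgh, Indiana", 0.1)],
    "Glasgow": [("Glasgow, Scotland", 0.8), ("Glasgow, Kentucky", 0.2)],
}


def candidate_names(text):
    """Step 1: treat capitalised words as possible placenames (very crude)."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def resolve(text, gazetteer=TOY_GAZETTEER):
    """Step 2: look up each candidate and keep its highest-scoring match."""
    found = {}
    for name in candidate_names(text):
        matches = gazetteer.get(name)
        if matches:  # words like "The" simply have no gazetteer entry
            found[name] = max(matches, key=lambda m: m[1])[0]
    return found


print(resolve("The train from Edinburgh to Glasgow was late."))
```

    Even in this toy the gazetteer does double duty: it filters out capitalised words that are not places, and it settles which of several same-named places is meant.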


    What else we’ve been up to lately

    July 22nd, 2010

    The Unlock blog has been quiet for a couple of months; since we added Ordnance Survey Open Data to the gazetteer search the team members have mostly been working on other things.

    Joe Vernon, our lead developer, has been working on the backend software for EDINA’s Addressing History project. This is a collaboration with the National Library of Scotland to create digitised and geocoded versions of historic post office directories. The sneak preview of the API is looking promising – though I agree with the commenter who suggests it should all be Linked Data!

    Lasma Sietinsone, our database engineer, has been working on new data backends for Geology Roam, the new service within Digimap. She’s now finally free to start work on our OpenStreetMap mirror and adding search of OpenStreetMap features to Unlock’s open data gazetteer.

    I’ve been putting together a new project which has just started – CHALICE, short for Connecting Historical Authorities with Links, Contexts and Entities. This is a collaboration with several partners – Language Technology Group, who do the text mining magic behind the Unlock Text service; the Centre for Data Digitisation and Analysis in Belfast; and the Centre for e-Research at KCL. The CHALICE project arose from discussions at the wrap-up workshop on “Embedding GeoCrossWalk” (as Unlock was once known). It will involve text mining to create a historic gazetteer for parts of the UK in Linked Data form.

    I also worked with Yin Chen on a survey of EDINA services with an eye to where use of Linked Data could be interesting and valuable; then took a long holiday.

    So we are overdue for another burst of effort on the Unlock services, and there should be lots more to write about here on the blog over the coming weeks and months.