
    Talk to us about JISC 06/11

    June 23rd, 2011

    Glad to hear that Unlock has been cited in the JISC 06/11 “eContent Capital” call for proposals.

    The Unlock team would be very happy to help anyone fit a beneficial use of Unlock into their project proposal. This could feature the Unlock Places place-name and feature search; and/or the Unlock Text geoparser service which extracts place-names from text and tries to find their locations.

    One could use Unlock Text to create Linked Data links to geonames.org or Ordnance Survey Open Data. Or use Unlock Places to find the locations of postcodes; or find places within a given county or constituency…

    Please drop an email to jo.walsh@ed.ac.uk, or look up metazool on Skype or Twitter, to chat about how Unlock fits with your proposal for JISC 06/11 …


    Testing Unlock 3: new API, new features, soon even documentation

    May 18th, 2011

    This week we are publicly testing version 3 of Unlock, a fairly deep rewrite that includes a new, simpler API and some more geometrical query functions (searching inside shapes, searching using a buffer). There is new data too: a search across Natural Earth Data, returning shapes for countries, regions, etc. worldwide. So at last we can use Natural Earth for search, and link it up to geonames point data for countries. We also have an upgraded version of the Edinburgh Geoparser, so Unlock Text now offers date and event information as well as place-name text mining.

    The new search work is now on our replicated server at Appleton Tower and in a week or two we’ll switch the main unlock.edina.ac.uk over to the new version (keeping the old API supported indefinitely too). Here are notes/links from Joe Vernon. If you do any testing or experimentation with this we’d be very interested to hear how you got on. Note you can add ‘format=json‘ to any of these links to get javascript-useful results, ‘format=txt‘ to get a csv, etc.
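
    As a rough illustration of the ‘format‘ parameter described above, here is a minimal Python sketch that adds or replaces it on an existing link. The helper name with_format is made up for this example and is not part of Unlock; only the parameter name and the values json and txt come from the notes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(url: str, fmt: str) -> str:
    """Return `url` with its `format` query parameter set to `fmt`.

    Hypothetical helper for illustration; not part of the Unlock service.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except an earlier `format` value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    params.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_format("http://geoxwalk-at.edina.ac.uk/ws/search?name=sheffield", "json"))
```

    Replacing rather than appending avoids sending two conflicting format values when a link already carries one.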

    ‘GENERIC’ SEARCHING

    http://geoxwalk-at.edina.ac.uk/ws/search?name=sheffield

    http://geoxwalk-at.edina.ac.uk/ws/search?name=wales&featureType=european

    http://geoxwalk-at.edina.ac.uk/ws/search?featureType=hotel&name=Marriott&minx=-79&maxx=-78&miny=36&maxy=37&operator=within

    NATURAL EARTH GAZETTEER

    http://geoxwalk-at.edina.ac.uk/ws/search?name=lake&gazetteer=naturalearth&country=canada

    DISTANCE BETWEEN TWO FEATURES

    Distance between Edinburgh and Glasgow (by feature ID):

    http://geoxwalk-at.edina.ac.uk/ws/distanceBetween?idA=14131223&idB=11153386
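
    For a quick sanity check of a result like this, a great-circle distance can be computed locally. This is only a sketch: the notes do not say how the service measures distance (centroid to centroid, edge to edge, or something else), and the city-centre coordinates below are approximate values assumed for the example, not taken from Unlock.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Approximate city-centre coordinates (assumed for illustration).
EDINBURGH = (55.953, -3.188)
GLASGOW = (55.864, -4.252)

distance = haversine_km(*EDINBURGH, *GLASGOW)
print(f"{distance:.1f} km")
```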

    SEARCHING WITHIN A FEATURE – ‘SPATIAL MASK’

    United Kingdom’s feature ID is: 14127855

    Searching for ‘Washington’s within the United Kingdom…

    http://geoxwalk-at.edina.ac.uk/ws/search?name=Washington&spatialMask=14127855

    Also, note the difference between searching within the bounding box of the UK (‘realSpatial=no‘) and setting the ‘realSpatial‘ parameter to yes, which uses the polygon of the feature concerned.

    http://geoxwalk-at.edina.ac.uk/ws/search?name=Washington&spatialMask=14127855&format=txt&maxRows=100&realSpatial=no

    http://geoxwalk-at.edina.ac.uk/ws/search?name=Washington&spatialMask=14127855&format=txt&maxRows=100&realSpatial=yes

    In this case, the bounding-box search picks up entries in Ireland, which are excluded when the UK’s actual footprint is used.
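
    The difference between the two kinds of mask can be sketched with a toy shape. The L-shaped footprint and the test points below are invented for illustration, and the ray-casting test is a generic technique, not necessarily what the Unlock server does internally.

```python
def bounding_box(polygon):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(point, polygon):
    """True if the point falls inside the polygon's bounding box."""
    x, y = point
    minx, miny, maxx, maxy = bounding_box(polygon)
    return minx <= x <= maxx and miny <= y <= maxy


def in_polygon(point, polygon):
    """Ray casting: count edges crossed by a ray running right from the point."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Toy L-shaped footprint (hypothetical, not real UK geometry).
FOOTPRINT = [(0, 0), (4, 0), (4, 4), (2, 4), (2, 2), (0, 2)]

for point in [(3, 3), (1, 3)]:
    print(point, "bbox:", in_bbox(point, FOOTPRINT), "polygon:", in_polygon(point, FOOTPRINT))
```

    The second point sits in the empty corner of the bounding box, which is the same situation as an Irish place-name falling inside the UK’s bounding box but outside its footprint.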

    SPATIAL SEARCHING WITH A BUFFER

    8 hotels around the Royal Mile
    http://geoxwalk-at.edina.ac.uk/ws/search?featureType=hotel&minx=-3.2&maxx=-3.19&miny=55.94&maxy=55.95&operator=within

    75 within 2km
    http://geoxwalk-at.edina.ac.uk/ws/search?featureType=hotel&minx=-3.2&maxx=-3.19&miny=55.94&maxy=55.95&operator=within&buffer=2000

    FOOTPRINTS & POSTCODES

    …should still be there:
    http://geoxwalk-at.edina.ac.uk/ws/footprintLookup?identifier=14131223
    http://geoxwalk-at.edina.ac.uk/ws/postCodeSearch?postCode=eh91pr

    IMPLICIT COUNTRY SEARCHING

    http://geoxwalk-at.edina.ac.uk/ws/search?format=txt&gazetteer=geonames&featureType=populated place&name=louth
    vs
    http://geoxwalk-at.edina.ac.uk/ws/search?format=txt&gazetteer=geonames&featureType=populated place&name=louth, uk
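
    The two links above contain spaces and a comma, which a browser will encode for you but a script usually will not. A minimal Python sketch of building the same query with the values encoded; the helper name search_url is hypothetical, and it assumes the service accepts the usual ‘+’ encoding for spaces in query strings.

```python
from urllib.parse import urlencode

BASE = "http://geoxwalk-at.edina.ac.uk/ws/search"


def search_url(**params):
    """Build a search link with percent-encoded query values (illustrative helper)."""
    return f"{BASE}?{urlencode(params)}"


url = search_url(
    format="txt",
    gazetteer="geonames",
    featureType="populated place",
    name="louth, uk",
)
print(url)
```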

    TIME BOUNDED SEARCH (still in development)

    http://geoxwalk-at.edina.ac.uk/ws/search?name=edinburgh&startYear=2000&endYear=2009

    http://geoxwalk-at.edina.ac.uk/ws/search?name=edinburgh&startYear=2000&endYear=2010

    Very happy with all this, bringing the Unlock service up to offering something usefully distinctive again, trying to restrain myself from saying (“if X was so easy why don’t we do Y?”)


    Unlock in use

    January 28th, 2011

    It would be great to hear from people about how they are using the Unlock place search services. So you’re encouraged to contact us and tell us how you’re making use of Unlock and what you want out of the service.
    [Image: screenshots from Molly and Georeferencer]
    Here are some of the projects and services we’ve heard about that are making interesting use of Unlock in research applications.

    The Molly project based at University of Oxford provides an open source mobile location portal service designed for campuses. Molly uses some Cloudmade services and employs Unlock for postcode searching.

    Georeferencer.org uses Unlock Places to search old maps. The service is used by National Library of Scotland Map Library and other national libraries in Europe.
    More on the use of Unlock Places by georeferencer.org.

    CASOS at CMU has been experimenting with the Unlock Text service to geolocate social network information.

    The Open Fieldwork project has been georeferencing educational resources: “In exploring how we could dynamically position links to fieldwork OER on a map, based on the location where the fieldwork takes place, one approach might be to resolve a position from the resource description or text in the resource. The OF project tried out the EDINA Unlock service – it looks like it could be very useful.”

    We had several interesting entries to 2010’s dev8d developer challenge using Unlock:

    Embedded GIS-lite Reporting Widget:
    Duncan Davidson, Informatics Ventures, University of Edinburgh
    “Adding data tables to content management systems and spreadsheet software packages is a fairly simple process, but statistics are easier to understand when the data is visual. Our widget takes geographic data – in this instance data on Scottish councils – passes it through EDINA’s API and then produces coordinates which are mapped onto Google. The end result is an annotated map which makes the data easier to access.”

    Geoprints, which also works with the Yahoo Placemaker API, by
    Marcus Ramsden at Southampton University.
    “Geoprints is a plugin for EPrints. You can upload a pdf, Word document or Powerpoint file, and it will extract the plain text and send it to the EDINA API. GeoPrints uses the API to pull out the locations from that data and send them to the database. Those locations will then be plotted onto a map, which is a better interface for exploring documents.”

    Point data in mashups: moving away from pushpins in maps:
    Aidan Slingsby, City University London
    “Displaying point data as density estimation services, chi surfaces and ‘tagmaps’. Using British placenames classified by generic form and linguistic origin, accessed through the Unlock Places API.”

    The dev8d programme for 2011 is being finalised at the moment and should be published soon; the event this year runs over two days, and should definitely be worth attending for developers working in, or near, education and research.


    Exploring the Locator OS OpenData set

    January 21st, 2011

    Fiona Hemsley-Flint had a good look at the OS Locator dataset which is available from the Ordnance Survey Open Data portal. I thought a summary of her findings might be of use to others thinking about how to use this dataset.

    Overview

    OS Locator contains a list of all the road names in the UK, “derived from a number of Ordnance Survey datasets [Meridian2, Road database, Locality dataset, Boundary-Line]. These include the roads database which contains information on road names and road numbers and is the latest generation of Ordnance Survey’s sophisticated and highly detailed geographic data”. OS recommend viewing it on top of mid-scale datasets such as 1:10k & 1:25k Raster and streetview (which is freely available via OS opendata).

    Geometries

    Each feature is geo-referenced by a centre point and a bounding box (although some of the bboxes are actually line features where the road segment of the feature is horizontal or vertical).
    [Image: OS Locator names shown on OS map]
    Figure 1. Multiple occurrences of Ferry Road, differentiated by their locality.
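
    Anyone loading the bounding boxes described above may want to spot the degenerate ones before treating them as polygons. A small Python sketch; the function name and the sample coordinates are made up for illustration and are not values from OS Locator.

```python
def bbox_kind(minx, miny, maxx, maxy):
    """Classify a bounding box as a 'point', a 'line' or a true 'box'."""
    width = maxx - minx
    height = maxy - miny
    if width == 0 and height == 0:
        return "point"
    if width == 0 or height == 0:
        # A perfectly horizontal or vertical road segment collapses to a line.
        return "line"
    return "box"


# Hypothetical easting/northing values, for illustration only.
print(bbox_kind(325000, 674000, 325400, 674000))  # horizontal segment
print(bbox_kind(325000, 674000, 325400, 674250))  # ordinary box
```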

    Attribution

    The roads have a name and/or a classification, where the classification represents a road number (e.g. ‘A1’ or ‘B1243’). They also have an associated settlement (town), locality, county/region and local authority; the latter two are derived from Boundary-Line, but it is unclear what is used to form the ‘Locality dataset’. Locality and settlement are likely to be the most useful of these attributes when displaying result sets. For roads which cross locality boundaries, a point is assigned for each separate locality, so one road may have more than one point associated with it, each distinguished by its locality.

    Storage

    851505 rows of data were added to a development server.
    Multiple geometry columns have been added to take into account the different geometries available.
    A ‘tsvector’ column has also been added to implement Postgres text search functionality. An example query might be:
    select name, classification, locality, settlement from os.locator_nov_10 where search @@ to_tsquery('high & street & edinburgh');

    Which returns the following result set:

    Name	Classification	Locality	settlement
    CORSTORPHINE HIGH STREET		Se Corstorphine	EDINBURGH
    HIGH STREET		Musselburgh Central	EDINBURGH
    HIGH STREET		Musselburgh North	EDINBURGH
    HIGH STREET		Holyrood	EDINBURGH
    HIGH STREET	A199	Musselburgh North	EDINBURGH
    HIGH STREET	A199	Musselburgh Central	EDINBURGH
    NORTH HIGH STREET		Musselburgh North	EDINBURGH
    NORTH HIGH STREET	A199	Musselburgh West	EDINBURGH
    PORTOBELLO HIGH STREET	B6415	Milton	EDINBURGH
    PORTOBELLO HIGH STREET	B6415	Portobello	EDINBURGH
    NORTH HIGH STREET	A199	Musselburgh North	EDINBURGH

    Overall, the dataset contains a comprehensive list of the road names within the UK. Decisions will need to be made about how to treat multiple features that actually refer to the same real-world road.
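
    One possible way of collapsing such multiple features is to group rows that share a name, classification and settlement, keeping the list of localities. This is only a sketch of one rule, using a few rows from the result set above; the grouping key is an assumption, not a decision the project has made.

```python
from collections import defaultdict

# (name, classification, locality, settlement), taken from the result set above.
ROWS = [
    ("HIGH STREET", "", "Musselburgh Central", "EDINBURGH"),
    ("HIGH STREET", "", "Musselburgh North", "EDINBURGH"),
    ("HIGH STREET", "", "Holyrood", "EDINBURGH"),
    ("HIGH STREET", "A199", "Musselburgh North", "EDINBURGH"),
    ("HIGH STREET", "A199", "Musselburgh Central", "EDINBURGH"),
    ("PORTOBELLO HIGH STREET", "B6415", "Milton", "EDINBURGH"),
    ("PORTOBELLO HIGH STREET", "B6415", "Portobello", "EDINBURGH"),
]


def group_roads(rows):
    """Group rows by (name, classification, settlement), collecting localities."""
    groups = defaultdict(list)
    for name, classification, locality, settlement in rows:
        key = (name, classification, settlement)
        if locality not in groups[key]:
            groups[key].append(locality)
    return dict(groups)


for key, localities in group_roads(ROWS).items():
    print(key, "->", localities)
```

    Note that this rule puts the Holyrood and Musselburgh High Streets in one group because they share a settlement value, which shows why the choice of key needs care.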

    The main limitation of this dataset is that it can only be used to show the user the general location of a road – it can’t be used as a precise address gazetteer since it only provides street names with no knowledge of building numbers.


    Using source identifiers to link data

    November 29th, 2010

    In the Chalice project we’ve used Unlock Places to make links across the Linked Data web, using the source identifier which appears in the results of each place search. As this might be useful to others, it’s worth walking through an example.

    This search for “Bosley” shows us results in the UK from geonames and from the Ordnance Survey 50K gazetteer: http://unlock.edina.ac.uk/ws/nameSearch?name=Bosley&country=uk

    Here’s an extract of one of the results, the listing for Bosley in the Ordnance Survey 1:50K gazetteer:

    <identifier>11083412</identifier>
    <sourceIdentifier>28360</sourceIdentifier>
    <name>Bosley</name>
    <country>United Kingdom</country>
    <custodian>Ordnance Survey</custodian>
    <gazetteer>OS Open 1:50 000 Scale Gazetteer</gazetteer>

    The sourceIdentifier shown here is the identifier published by each of the original data sources that Unlock Places is using to cross-search.

    Ordnance Survey Research re-uses these identifiers to create its Linked Data namespace. For any place in the 50K gazetteer, we can reconstruct the link that refers to that place by appending the source identifier to this URL, which is the namespace for the 50K gazetteer: http://data.ordnancesurvey.co.uk/id/50kGazetteer/

    So our reference to Bosley can be made by adding the source identifier to the namespace:

    http://data.ordnancesurvey.co.uk/id/50kGazetteer/28360

    The same goes for source identifiers for places found in the geonames.org place-name gazetteer.

    <sourceIdentifier>2655141</sourceIdentifier>
    <name>Bosley</name>
    <gazetteer>GeoNames</gazetteer>

    Geonames uses http://sws.geonames.org/ as a namespace for its Linked Data links for places. So we can reconstruct the link for Bosley using the source identifier like this:

    http://sws.geonames.org/2655141/

    Note that the link needs the forward slash on the end to work correctly. If one looks at either of these links with a web browser, one is redirected to a human-readable page describing that place. To see the machine-readable, RDF version of the link’s contents, look at it with a command-line program such as curl, asking to “Accept” the RDF version:

    curl -L http://data.ordnancesurvey.co.uk/id/50kGazetteer/28360 -H "Accept: application/rdf+xml"
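
    The steps above can be sketched in Python. The two namespaces and the trailing slash come from the walkthrough; the wrapping <feature> element and the function name linked_data_uri are assumptions made so that the extract parses as a complete XML document.

```python
import xml.etree.ElementTree as ET

# URI templates keyed by the <gazetteer> value shown in the extracts above.
NAMESPACES = {
    "OS Open 1:50 000 Scale Gazetteer": "http://data.ordnancesurvey.co.uk/id/50kGazetteer/{id}",
    "GeoNames": "http://sws.geonames.org/{id}/",  # trailing slash is required
}


def linked_data_uri(feature_xml):
    """Rebuild the Linked Data URI for one search result, or None if unknown."""
    feature = ET.fromstring(feature_xml)
    gazetteer = feature.findtext("gazetteer")
    source_id = feature.findtext("sourceIdentifier")
    template = NAMESPACES.get(gazetteer)
    if template is None or source_id is None:
        return None
    return template.format(id=source_id)


# The <feature> wrapper is assumed; the child elements are from the extracts.
OS_RESULT = """<feature>
  <identifier>11083412</identifier>
  <sourceIdentifier>28360</sourceIdentifier>
  <name>Bosley</name>
  <gazetteer>OS Open 1:50 000 Scale Gazetteer</gazetteer>
</feature>"""

GEONAMES_RESULT = """<feature>
  <sourceIdentifier>2655141</sourceIdentifier>
  <name>Bosley</name>
  <gazetteer>GeoNames</gazetteer>
</feature>"""

print(linked_data_uri(OS_RESULT))
print(linked_data_uri(GEONAMES_RESULT))
```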

    I hope this is useful to others. We could add the links directly into the default search results, but many users may not be that interested in seeing RDF links in place-name search results. Thoughts on how we could offer this as a more useful function would be much appreciated.


    More on the use of Unlock Places by georeferencer.org

    November 19th, 2010

    Some months back, Klokan Petr Pridal, who maintains OldMapsOnline.org and works with libraries and cartographic institutes across Europe, wrote with some questions about the Unlock Places service. We met at FOSS4G where I presented our work on the Chalice project and the Unlock services.
    Petr writes about how Unlock is used in his applications, and what future requirements from the service may be:


    “It was great to meet you at FOSS4G in Barcelona and discuss with you the progress related to Unlock and possible cooperation with OldMapsOnline.org and usage in Georeferencer.org services.

    As you have mentioned, the most important thing for us would be to have in Unlock API/database the bounding boxes (or bounding polygons) for places as direct part of the JSON response. We need that mostly for villages, towns and cities and for areas such as districts or countries – all over the world. We need something like “bounds” as provided by the Google geocoding API.

    The second most important feature is to have the chance to install the service in our servers – especially in case you can’t provide guarantees for it in a future.

    It would be also great to have chance to improve the service for non-English languages, but right now the gazetteers and text processing is not primary target of our research.

    In this moment the Unlock API is in use:

    As a standard gazetteer search service to zoom the base maps to a place people type in the search box in our Georeferencer.org service – a collaborative georeferencing online service for scanned historical maps. It is in use by National Library of Scotland and a couple of other libraries.

    Here’s an example map (you need to register first).

    The uniqueness of Unlock is in openness of the license (primarily GeoNames.org CC-BY and also OS OpenData) and also so far very good availability of the online service (EDINA hardware and network?). We are missing the bounding box to be able to zoom our base maps to the correct area (determine the appropriate zoom level). Unlock API replaced Google Geocoder, which we can’t use, because we are displaying also non-google maps (such as Ordnance Survey OpenData) and we are potentially deriving data from the gazetteer database (the control points on the old maps), which is against Google TOS.

    In the future we are keen to extend the gazetteer with alternative historical toponyms (which people can identify on georeferenced old maps too), or participate on such work.

    The other usage of Unlock API is:

    As a metadata text analyzer, in a service such as our http://geoparser.appspot.com/, where we automatically parse existing library textual metadata to identify place names and locate the described maps including automatic approximation of their spatial coverage (by identifying map scale and physical size in the text and doing a simple math on top of it). This service is in a prototype phase only, we are using Yahoo Placemaker and I was testing Unlock Text API with it too.

    Here the huge advantage of Unlock would be primarily the possibility to add custom gazetteers (with Geonames as the default one), language detection (for example via Google Language API or otherwise) and also possibility to add into the workflow other tools, such as lemmatizator for particular language – the simplest available via hun/a/ispellu database integration or via existing morphological rule-based software such as:

    The problem is that without returning the lemmatization of the text the geoparser is almost unusable in non-English languages – especially Slavic one.

    We are very glad for availability of your results and of the reliable online services you provide. We can concentrate on the problems we need to solve primarily (georeferencing, clipping, stitching and presentation of old maps for later analysis) and use your results of research as a component solving a problem we are touching and we have to practically solve somehow.”


    Very glad that Petr wrote at such length about comprehensive use of Unlock, pushing the edges of what we are doing with the service.

    We have some work in the pipeline adding bounding boxes for places worldwide by making Natural Earth Data searchable through Unlock Places. Natural Earth is a generalised dataset intended for use in cartography, but should also have quite a lot of re-use value for map search.


    OpenStreetmap and Linked Geodata

    October 14th, 2010

    I’ve been travelling overmuch for the last six weeks, but met lots of lovely people. Most recently, during a trip this week to discuss the Open Knowledge Foundation’s part in the LOD2 consortium project, I had a long chat with Jens and Claus, the developers and academics behind Linked Geo Data, the Linked Data version of the OpenStreetmap data.

    [Image: Linked Geo Data browser]

    The most interesting bit for Unlock is the RESTful interface to search the data; by point, radius, and bounding box, by feature class and by contents of labels assembled from tags. So it looks like Opensearch Geo as much as Unlock’s place search api does.

    Claus made up a mapping from tags and clusters of tags in OpenStreetmap to a simple linkedgeodata.org ontology. Here’s the mapping file – warning, it is quite large – OSM->linkedgeodata mapping rules. I pointed him at Jochen Topf’s new work on OSM tag analysis and clustering, Taginfo.

    As well as the REST interface, there is a basic GeoSPARQL endpoint using Virtuoso as a Linked Data store – we ran containment queries for polygons returning polygons with reasonable performance. There is a fracturing in the GeoSPARQL world both in proposed standards and in actual implementation.

    So we want to be able to return links to LinkedGeodata.org URLs in the results of our search. Right now Unlock’s place search returns original source identifiers (from geonames, etc) as well as our local identifiers, for place-names and shapes. In fact Unlock could help with the mapping across of Linkedgeodata.org URLs to geonames URLs, which are quite widely used, an entry point into the bigger Linked Data web.

    Another very interesting tool for making links between things on the Linked Data web is SILK, by Chris Bizer, Anja Jentsch and their research group at the Freie Universität Berlin. The latest (or still testing?) release of SILK has some spatial inference capacity as well as structural inference. So we could try it out on, for example, the Chalice data just to see what kind of links can be made between URLs for linkedgeodata things and URLs for historic place-names.

    We’ve been setting up an instance of OpenStreetmap for Unlock and other purposes at EDINA recently. Our plan with this is to start working from Nominatim, which has a point-based gazetteer for place-names down to street address level, and attempt to extract and/or generalise shapes as well as points corresponding to the names. We’re doing this to provide more/richer data search, rather than republishing original datasets in some more/differently interpretable form. So there’s lots of common ground and I hope to find ways to work together in future to make sure we complement and don’t duplicate.


    OpenSearch Geospatial in progress

    March 15th, 2010

    One promising presentation I saw last week at the Jornadas SIG Libre was Oscar Fonts’ work in the Geographic Information Group at the Universitat Jaume I, building on OpenSearch Geospatial interfaces to different services.
    [Image: OpenSearch geo query of OSM]

    The demonstrator shown during the talk was an OpenLayers map display hooked up to various OpenSearch Geo services.

    Some are “native” OpenSearch services, like the GeoCommons data deposit and mapmaking service, and the interfaces published by Terradue as part of the European GENESI-DR earth observation distributed data repository project.

    The UJI demo also includes an API adapter for sensationally popular web services with geographic contents. Through the portal one can search for tweets, geotagged Flickr photos, or individual shapes from OpenStreetmap.

    Oscar’s talk highlighted the problem of seeming incompatibility between the original draft of the OpenSearch Geospatial extensions, and the version making its way through the Open Geospatial Consortium’s Catalog working group as a “part document” included in the next Catalog Services for the Web specification.

    The issues currently breaking backwards-compatibility between the versions are these:

        1. geo:locationString became geo:name in the OGC draft version.
        2. geo:polygon was omitted from the OGC draft version, and replaced with geo:geometry which allows for complex geometries (including multi-polygons) to be passed through using Well Known Text.

    1) looks like syntactic sugar – geo:name is less typing, and reads better. geo:locationString can be deprecated but supported.

    2) geo:geometry was introduced into the spec as a result of work on the GENESI-DR project, which had a strong requirement to support multi-polygons (specifically, passes over the earth of a satellite, which crossed the dateline and thus were made up of two polygons meeting on either side of the dateline).

    geo:polygon has a much simpler syntax, just a list of (latitude, longitude) pairs which join up to make a shape. This also restricts queries to two dimensions.
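
    To show how close the two parameters are for the simple case, here is a minimal Python sketch that turns a geo:polygon value into the Well Known Text that geo:geometry would carry. It assumes the polygon is serialised as a flat comma-separated list of latitude,longitude values and that the WKT uses longitude-latitude axis order; both are assumptions for illustration, and the function name is made up.

```python
def polygon_to_wkt(polygon_param: str) -> str:
    """Convert 'lat,lon,lat,lon,...' into a WKT POLYGON string (lon lat order)."""
    values = [float(v) for v in polygon_param.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of coordinate values")
    pairs = list(zip(values[0::2], values[1::2]))  # (lat, lon) pairs
    if len(pairs) < 3:
        raise ValueError("a polygon needs at least three points")
    if pairs[0] != pairs[-1]:
        pairs.append(pairs[0])  # WKT rings must be explicitly closed
    ring = ", ".join(f"{lon:g} {lat:g}" for lat, lon in pairs)
    return f"POLYGON(({ring}))"


# A small box around Edinburgh (illustrative coordinates).
print(polygon_to_wkt("55.9,-3.3,56.0,-3.3,56.0,-3.1,55.9,-3.1"))
```

    A server that only implements geo:geometry could accept geo:polygon through a shim like this, which is one argument for supporting both.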

    This seems to be the nub of the discussion – should geo:polygon be included in the updated version – risking it being seen as clashing with or superfluous to geo:geometry, leading to end user confusion?

    There is always a balance to be struck between simplicity and complexity. Oscar pointed out in his talk what I have heard in OGC Catalog WG discussions too: as soon as a use case becomes sufficiently complex, CSW is available and likely fitter for the job. geo:geometry is already at the top end of acceptable complexity.

    It’s about a year since I helped turn Andrew Turner’s original draft into an OGC consumable form. Anecdotally it seems like a lot more people are interested in seeing what can be done with OpenSearch Geo now.

    The OGC version is not a fork. The wiki draft was turned into a draft OGC spec after talking with Andrew and Raj Singh about the proposed changes, partly on the OpenSearch Google Group. The geo:relation parameter was added on the basis of feedback from the GeoNetwork and GeoTools communities. There’s been a Draft 2 page, as yet unmodified, on the OpenSearch wiki since that time.

    In order to build the confidence of potential adopters, these backwards-incompatibilities do need to be addressed. My personal view would be to update the wiki draft, deprecating locationString and including both polygon and geometry parameters.

    I was impressed by the work of Oscar and collaborators, though I wonder whether they are going to move into search-engine-style aggregation and indexing of the results, or just use the OpenSearch interface to search fairly fast-moving sources of data in real time. I wish I’d asked this question in the session, now. It all offers reinforcement and inspiration for putting OpenSearch Geo interfaces on services nearby – Go-Geo!, CKAN. The NERC Data Discovery Service could benefit, as could SCRAN. We’ll get to see what happens, which I’m glad of.