
    More on the use of Unlock Places by georeferencer.org

    November 19th, 2010

    Some months back, Klokan Petr Pridal, who maintains OldMapsOnline.org and works with libraries and cartographic institutes across Europe, wrote with some questions about the Unlock Places service. We met at FOSS4G where I presented our work on the Chalice project and the Unlock services.
    Petr writes about how Unlock is used in his applications, and what future requirements from the service may be:


It was great to meet you at FOSS4G in Barcelona and to discuss with you the progress related to Unlock, possible cooperation with OldMapsOnline.org, and usage in the Georeferencer.org services.

As you have mentioned, the most important thing for us would be to have the bounding boxes (or bounding polygons) for places in the Unlock API/database, as a direct part of the JSON response. We need this mostly for villages, towns and cities, and for areas such as districts or countries – all over the world. We need something like the “bounds” provided by the Google geocoding API.
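To picture what is being asked for, here is a small sketch of a gazetteer hit that carries a “bounds” bounding box alongside the point. The JSON shape and field names are illustrative assumptions, not the actual Unlock schema:

```python
import json

# Hypothetical gazetteer response carrying "bounds" alongside the point.
# Field names are made up for illustration, not the real Unlock schema.
sample = """
{
  "name": "Edinburgh",
  "centroid": {"lat": 55.9533, "lon": -3.1883},
  "bounds": {"minx": -3.33, "miny": 55.89, "maxx": -3.08, "maxy": 55.99}
}
"""

def bbox_from_hit(hit):
    """Return (minx, miny, maxx, maxy); fall back to a degenerate
    point box when the hit has no bounds member."""
    b = hit.get("bounds")
    if b:
        return (b["minx"], b["miny"], b["maxx"], b["maxy"])
    c = hit["centroid"]
    return (c["lon"], c["lat"], c["lon"], c["lat"])

hit = json.loads(sample)
print(bbox_from_hit(hit))  # (-3.33, 55.89, -3.08, 55.99)
```

The fallback matters for a client like a map viewer: a centroid-only hit can still be handled, it just cannot drive a sensible zoom level on its own.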

The second most important feature is the chance to install the service on our own servers – especially in case you can’t guarantee its availability in the future.

It would also be great to have the chance to improve the service for non-English languages, but right now gazetteers and text processing are not a primary target of our research.

At the moment the Unlock API is in use:

As a standard gazetteer search service, to zoom the base maps to a place people type in the search box of our Georeferencer.org service – a collaborative online georeferencing service for scanned historical maps. It is in use by the National Library of Scotland and a couple of other libraries.

    Here’s an example map (you need to register first).

The uniqueness of Unlock is in the openness of the license (primarily GeoNames.org CC-BY, and also OS OpenData) and, so far, the very good availability of the online service (EDINA hardware and network?). We are missing the bounding box needed to zoom our base maps to the correct area (to determine the appropriate zoom level). The Unlock API replaced the Google Geocoder, which we can’t use because we also display non-Google maps (such as Ordnance Survey OpenData) and we potentially derive data from the gazetteer database (the control points on the old maps), which is against Google’s TOS.
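The “determine the appropriate zoom level” step can be sketched as below. This is an assumed heuristic, not Georeferencer’s actual code: pick the largest Web Mercator zoom at which the whole bounding box fits a given viewport, treating the latitude span linearly (fine away from the poles):

```python
import math

def zoom_for_bbox(min_lon, min_lat, max_lon, max_lat,
                  viewport_w=800, viewport_h=600, tile=256):
    """Largest Web Mercator zoom level at which the bounding box
    fits the viewport. At zoom z the world is 360 degrees of
    longitude across 2**z tiles of `tile` pixels each."""
    lon_span = max(max_lon - min_lon, 1e-9)
    lat_span = max(max_lat - min_lat, 1e-9)
    zx = math.log2(360.0 * viewport_w / (tile * lon_span))
    zy = math.log2(180.0 * viewport_h / (tile * lat_span))
    return max(0, min(int(zx), int(zy), 19))

# An Edinburgh-sized box zooms to city level; the whole world barely zooms at all.
print(zoom_for_bbox(-3.33, 55.89, -3.08, 55.99))  # 12
print(zoom_for_bbox(-180, -85, 180, 85))          # 1
```

This is exactly why a centroid alone is not enough: without a box there is no span to divide the viewport by, and the client has to guess.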

In the future we are keen to extend the gazetteer with alternative historical toponyms (which people can identify on georeferenced old maps too), or to participate in such work.

The other usage of the Unlock API is:

As a metadata text analyzer, in a service such as our http://geoparser.appspot.com/, where we automatically parse existing library textual metadata to identify place names and locate the described maps, including automatic approximation of their spatial coverage (by identifying the map scale and physical size in the text and doing simple math on top of it). This service is in a prototype phase only; we are using Yahoo Placemaker, and I have been testing the Unlock Text API with it too.
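The “simple math” for approximating spatial coverage might look like the sketch below. It is an assumed reconstruction, not the prototype’s actual code, and the function and parameter names are invented for illustration: scale and sheet size give a ground footprint, centred on a geocoded place using the usual metres-per-degree approximation:

```python
import math

def coverage_from_scale(scale_denominator, width_cm, height_cm,
                        centre_lat, centre_lon):
    """Approximate the ground footprint of a scanned map from its
    scale (e.g. 1:50000 -> 50000) and physical sheet size, centred
    on a geocoded place. Returns (minx, miny, maxx, maxy) degrees."""
    ground_w_m = scale_denominator * width_cm / 100.0   # cm on paper -> m on ground
    ground_h_m = scale_denominator * height_cm / 100.0
    deg_lat = ground_h_m / 111_320.0                    # metres per degree of latitude
    deg_lon = ground_w_m / (111_320.0 * math.cos(math.radians(centre_lat)))
    return (centre_lon - deg_lon / 2, centre_lat - deg_lat / 2,
            centre_lon + deg_lon / 2, centre_lat + deg_lat / 2)

# A 1:50000 sheet of 60 x 40 cm covers roughly 30 x 20 km on the ground.
print(coverage_from_scale(50000, 60, 40, 55.95, -3.19))
```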

Here the huge advantage of Unlock would be primarily the possibility to add custom gazetteers (with GeoNames as the default one), language detection (for example via the Google Language API or otherwise), and also the possibility to add other tools into the workflow, such as a lemmatizer for the particular language – the simplest available via hunspell/aspell/ispell dictionary integration, or via existing rule-based morphological software.

The problem is that without returning the lemmatization of the text, the geoparser is almost unusable for non-English languages – especially Slavic ones.

We are very glad for the availability of your results and of the reliable online services you provide. We can concentrate on the problems we primarily need to solve (georeferencing, clipping, stitching and presentation of old maps for later analysis) and use the results of your research as a component solving a problem we touch on and have to solve somehow in practice.”


Very glad that Petr wrote at such length about his comprehensive use of Unlock, pushing the edges of what we are doing with the service.

We have some work in the pipeline adding bounding boxes for places worldwide by making Natural Earth data searchable through Unlock Places. Natural Earth is a generalised dataset intended for use in cartography, but it should also have quite a lot of re-use value for map search.


    Connecting archives with linked geodata – Part II

    October 22nd, 2010

This is part two of a blog post, starting with a presentation about the Chalice project and our aim to create a 1000-year place-name gazetteer, available as linked data, text-mined from the volumes of the English Place Name Survey.

Something else I’ve been organising is a web service called Unlock; it offers a gazetteer search service that searches with, and returns, shapes rather than just points for place-names. It has its origins in a 2001 project called GeoCrossWalk, which extracted shapes from MasterMap and other Ordnance Survey data sources and made them available under a research-only license in the UK, to subscribers of EDINA’s Digimap service.

Now that so much open geodata is out there, Unlock contains an open data place search service, indexing and interconnecting the different sources of shapes that match up to names. It has GeoNames and the OS Open Data sources in it; we are adding search of Natural Earth data in short order, and looking at ways to enhance what others (Nominatim, LinkedGeoData) are already doing with search and re-use of OpenStreetMap data.

The gazetteer search service sits alongside a place-name text mining service. However, the text mining service is tuned to contemporary text (American news sources), and a lot of that has to do with data availability and the sharing of models and sets of training data. The more interesting use cases are in archive mining, of semi-unusual, semi-structured sets of documents and records (parliamentary proceedings, historical population reports, parish and council records). Anything that is recorded will yield data, *is* data, back to the earliest written records we have.


    Place-names can provide a kind of universal key to interpreting the written record. Social organisation may change completely, but the land remembers, and place-names remain the same. Through the prism of place-names one can glimpse pre-history; not just what remains of those people wealthy enough to create *stuff* that lasted, but of everybody who otherwise vanished without trace.

The other reason I’m here at FOSS4G: to ask for help. We (the authors of the text mining tools at the Language Technology Group, colleagues at EDINA, smart funders at JISC) want to put together a proper open source distribution of the core components of our work, for others to customise, extend, and work with us on.

We could use advice – the Software Sustainability Institute is one place we are turning to for advice on managing an open source release and, hopefully, a community. OSS Watch supported us in structuring an open source business case.

The transition to a world that is open by default turns out to be more difficult than one would think. It’s hard to get many minds to look in the same direction at the same time. Maybe legacy problems, kludges technical, social, or even emotional, arise to mess things up when we try to act in the clear.

We could use practical advice on managing an open source release of our work to make it as self-sustaining as possible. In the short term: how best to structure a repository for collaboration, for branching and merging; where we should most usefully focus efforts at documentation; how to automate the process of testing to free up effort where it can be more creative; how to find the benefits in moving the process of working from a closed to an open world.

    The Chalice project has a sourceforge repository where we’ve been putting the code the EDINA team has been working on; this includes an evolution of Unlock’s web service API, and user interface / annotation code from Addressing History. We’re now working on the best way to synchronise work-in-progress with currently published, GPL-licensed components from LTG, more pieces of the pipeline making up the “Edinburgh geoparser” and other things…


    What else we’ve been up to lately

    July 22nd, 2010

The Unlock blog has been quiet for a couple of months; since we added Ordnance Survey Open Data to the gazetteer search, the team members have mostly been working on other things.

Joe Vernon, our lead developer, has been working on the backend software for EDINA’s Addressing History project. This is a collaboration with the National Library of Scotland to create digitised and geocoded versions of historic post office directories. The sneak preview of the API is looking promising – though I agree with the commenter who suggests it should all be Linked Data!

Lasma Sietinsone, our database engineer, has been working on new data backends for Geology Roam, the new service within Digimap. She’s now finally free to start work on our OpenStreetMap mirror and on adding search of OpenStreetMap features to Unlock’s open data gazetteer.

    I’ve been putting together a new project which has just started – CHALICE, short for Connecting Historical Authorities with Links, Contexts and Entities. This is a collaboration with several partners – Language Technology Group, who do the text mining magic behind the Unlock Text service; the Centre for Data Digitisation and Analysis in Belfast; and the Centre for e-Research at KCL. The CHALICE project arose from discussions at the wrap-up workshop on “Embedding GeoCrossWalk” (as Unlock was once known). It will involve text mining to create a historic gazetteer for parts of the UK in Linked Data form.

    I also worked with Yin Chen on a survey of EDINA services with an eye to where use of Linked Data could be interesting and valuable; then took a long holiday.

    So we are overdue for another burst of effort on the Unlock services, and there should be lots more to write about here on the blog over the coming weeks and months.


    Notes from Linking Geodata seminar at CeRch

    July 20th, 2010

    Note, this blog entry was originally published in May 2010.

While on a bit of a road trip, I had the chance to give a short seminar at the Centre for e-Research at Kings College London. This was informal and we weren’t expecting much of a showing, so there are no slides; here is a quick summary.

Introduced by Dr Stuart Dunn, I talked about project ideas we had just been discussing: the attempt to mine the English Place Name Survey for its structure, now called CHALICE; mining archaeological site records and artefact descriptions and attaching them to entities in OpenStreetMap using LinkedGeoData.org; and mining key reference terms from documents in archives, attempting to link documents to reference data.

Linked Geodata seemed like a good place to start: pick out a sample entry and walk through the triples, with at this point a bit of jumping about and graph-drawing on the whiteboard.

There’s a list of mappings between items in LinkedGeoData and in dbpedia.org, and likely thus through to geonames.org and other rich sources of Linked Data (cf. the Linked Geodata Datasets). Via sameas.org, geographic links can be traversed to arrive at related media objects, resources and events. geonames.org has 8m+ points and seems to be widely used in the academic geographic information retrieval community, due to its global coverage and open license.
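The link-traversal idea can be sketched as a simple graph walk, in the spirit of sameas.org’s co-reference bundles. The URIs and sameAs links below are made-up stand-ins, not live data:

```python
# Made-up owl:sameAs links between datasets; prefixes abbreviate the
# real namespaces (lgd: LinkedGeoData, dbpedia:, geonames:).
SAMEAS_LINKS = {
    "lgd:node_edinburgh": ["dbpedia:Edinburgh"],
    "dbpedia:Edinburgh": ["geonames:2650225", "lgd:node_edinburgh"],
    "geonames:2650225": ["dbpedia:Edinburgh"],
}

def coreference_bundle(start):
    """Transitive closure over sameAs links: every identifier
    reachable from `start`, collected by a plain graph walk."""
    seen, stack = set(), [start]
    while stack:
        uri = stack.pop()
        if uri not in seen:
            seen.add(uri)
            stack.extend(SAMEAS_LINKS.get(uri, []))
    return seen

print(sorted(coreference_bundle("lgd:node_edinburgh")))
# ['dbpedia:Edinburgh', 'geonames:2650225', 'lgd:node_edinburgh']
```

Once the bundle is in hand, any property attached to any member (coordinates from geonames, abstracts from dbpedia) can be pulled in against the same place.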

The text mining process used in the Edinburgh geoparser and elsewhere is two-phase: the first phase is extraction, looking purely at the text for entities which seem likely to be place-names; the second phase is looking those names up in a gazetteer, and using relations between them to guess which of the suggested locations is the most likely.
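As a toy illustration of that two-phase design (not the Edinburgh geoparser’s actual code), the sketch below stands in a crude capitalised-word heuristic for the real named-entity recogniser, and disambiguates against a tiny made-up gazetteer by preferring the combination of senses that lie closest together:

```python
import itertools
import math
import re

# Tiny made-up gazetteer: name -> candidate (lat, lon) senses.
GAZETTEER = {
    "Perth": [(56.40, -3.43), (-31.95, 115.86)],  # Scotland vs Australia
    "Stirling": [(56.12, -3.94)],
}

def extract_candidates(text):
    """Phase one: propose tokens that look like place-names
    (a crude stand-in for a trained named-entity recogniser)."""
    return [w for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w in GAZETTEER]

def disambiguate(names):
    """Phase two: pick one sense per name, minimising the total
    pairwise distance between the chosen locations."""
    best, best_cost = None, float("inf")
    for combo in itertools.product(*(GAZETTEER[n] for n in names)):
        cost = sum(math.dist(a, b) for a, b in itertools.combinations(combo, 2))
        if cost < best_cost:
            best, best_cost = combo, cost
    return dict(zip(names, best))

names = extract_candidates("The road from Perth to Stirling was surveyed.")
print(disambiguate(names))  # Perth resolves to Scotland, not Australia
```

Real systems score far more evidence than raw proximity (population, feature type, containment), but the shape of the computation is the same: generate candidates, then rank joint assignments.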

Other notes: point data, cartographic in origin; polygon geoparsing; machine learning approaches to both phases.

We looked at UK-postcodes.com and the great work @pezholio has done on the RDF representations of postcodes there, with links across to some of the statistical area namespaces from data.gov.uk. Along with the work that Ordnance Survey Research have in hand, there’s lots of new Linked Open Geodata in the UK.

Historic names and shapes, and temporal linking: these are areas where more practical, open research has yet to be done.