
    Unlock Text: new API

    January 25th, 2012

    Apologies for the long hiatus on the Unlock blog; we have been busy developing the service behind the scenes. The Unlock team are now pleased to present a new API to Unlock Text, the geoparsing (place-name text mining and mapping) service.

    Read the Getting Started guide at http://unlock.edina.ac.uk/texts/getstarted, or jump straight to the full Unlock Text API documentation.

    Project Bamboo - research apps, infrastructure
    Meanwhile, we’ve been working with the US-based, Mellon-funded Bamboo project, a research consortium building a “Scholarly services platform” of which geoparsing is a part. We’ve been helping Bamboo to define an API which can be implemented by many different services.

    Bamboo needed to be able to send off requests, get responses asynchronously, and poll to find out whether a geoparsing task is done. The resulting approach feels much more robust than our previous API to the geoparser, which had long wait times on large documents and was fussy about document types. It is also better suited to batches or collections of texts, which is what most people will in practice be working with.
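The submit-then-poll pattern Bamboo asked for can be sketched in a few lines of Python. This is a minimal sketch, not the Unlock Text client: the status values `"complete"` and `"failed"` and the `get_status` hook are hypothetical stand-ins for whatever the API documentation actually specifies.

```python
import time
from typing import Callable

TERMINAL_STATES = ("complete", "failed")  # hypothetical status values


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is a hypothetical hook: in a real client it would wrap an
    HTTP GET on whatever status URL the service hands back on submission.
    sleep is injectable so the loop can be exercised without waiting.
    """
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


if __name__ == "__main__":
    # Simulated service: the job is queued, then running, then complete.
    fake_statuses = iter(["queued", "running", "complete"])
    print(wait_for_job("job-1", lambda _id: next(fake_statuses), sleep=lambda s: None))
```

Because the caller only holds a job ID between polls, a large document or a whole batch no longer ties up a single long-lived request.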

    We hope others find the new Unlock Text API useful. We have a sprint planned for the coming weeks and would be happy to take requests for new features or improvements, so please get in touch with the Unlock team if there’s something you’re really missing.


    Talk to us about JISC 06/11

    June 23rd, 2011

    Glad to hear that Unlock has been cited in the JISC 06/11 “eContent Capital” call for proposals.

    The Unlock team would be very happy to help anyone fit a beneficial use of Unlock into their project proposal. This could feature the Unlock Places place-name and feature search, and/or the Unlock Text geoparser service, which extracts place-names from text and tries to find their locations.

    One could use Unlock Text to create Linked Data links to geonames.org or Ordnance Survey Open Data, or use Unlock Places to find the locations of postcodes, or to find places within a given county or constituency…
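As a rough illustration of the Linked Data idea, the sketch below turns geoparser matches into `owl:sameAs` statements in N-Triples form. It assumes each resolved place comes back with a GeoNames ID; the local URIs and the ID used here are hypothetical, while `http://sws.geonames.org/<id>/` is the GeoNames linked-data URI pattern.

```python
from typing import Iterable, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
GEONAMES_URI = "http://sws.geonames.org/{geonames_id}/"


def same_as_triples(matches: Iterable[Tuple[str, int]]) -> List[str]:
    """Build one N-Triples line per (local_uri, geonames_id) pair.

    The pairs are assumed to come from geoparser output; how they are
    extracted from a real response is not shown here.
    """
    lines = []
    for local_uri, geonames_id in matches:
        target = GEONAMES_URI.format(geonames_id=geonames_id)
        lines.append(f"<{local_uri}> <{OWL_SAME_AS}> <{target}> .")
    return lines


if __name__ == "__main__":
    # Hypothetical local URI and GeoNames ID, for illustration only.
    for line in same_as_triples([("http://example.org/place/1", 1234567)]):
        print(line)
```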

    Please drop an email to jo.walsh@ed.ac.uk, or look up metazool on Skype or Twitter, to chat about how Unlock fits with your proposal for JISC 06/11…


    Unlock in use

    January 28th, 2011

    It would be great to hear how people are using the Unlock place search services, so please contact us and tell us how you’re making use of Unlock and what you want out of the service.
    screenshots from Molly, Georeferencer
    Here are some of the projects and services we’ve heard about that are making interesting use of Unlock in research applications.

    The Molly project, based at the University of Oxford, provides an open source mobile location portal designed for campuses. Molly uses some CloudMade services and employs Unlock for postcode searching.
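Postcode search is easier when user input is tidied first. The helper below is a hypothetical pre-processing step, not part of Molly or the Unlock API: it assumes a simplified UK postcode shape (outward code, then one digit and two letters) and returns the canonical spaced form, or `None` if the input doesn't look like a postcode.

```python
import re
from typing import Optional

# Simplified UK postcode shape: outward code (e.g. "EH8", "SW1A") followed by an
# inward code of one digit and two letters. Special cases such as "GIR 0AA" are
# deliberately not handled.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$")


def normalise_postcode(raw: str) -> Optional[str]:
    """Return the postcode in canonical 'OUTWARD INWARD' form, or None."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not POSTCODE_RE.match(compact):
        return None
    # The inward code is always the last three characters.
    return f"{compact[:-3]} {compact[-3:]}"


if __name__ == "__main__":
    for raw in ["eh89lw", " sw1a  1aa ", "Oxford"]:
        print(repr(raw), "->", normalise_postcode(raw))
```

A regular expression like this only checks shape, not whether the postcode exists; existence is what the gazetteer lookup answers.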

    Georeferencer.org uses Unlock Places to search old maps. The service is used by the National Library of Scotland’s Map Library and by other national libraries in Europe.
    More on the use of Unlock Places by georeferencer.org.

    CASOS at CMU has been experimenting with the Unlock Text service to geolocate social network information.

    The Open Fieldwork project has been georeferencing educational resources: “In exploring how we could dynamically position links to fieldwork OER on a map, based on the location where the fieldwork takes place, one approach might be to resolve a position from the resource description or text in the resource. The OF project tried out the EDINA Unlock service – it looks like it could be very useful.”

    We had several interesting entries to 2010’s dev8d developer challenge using Unlock:

    Embedded GIS-lite Reporting Widget:
    Duncan Davidson, Informatics Ventures, University of Edinburgh
    “Adding data tables to content management systems and spreadsheet software packages is a fairly simple process, but statistics are easier to understand when the data is visual. Our widget takes geographic data – in this instance data on Scottish councils – passes it through EDINA’s API and then produces coordinates which are mapped onto Google. The end result is an annotated map which makes the data easier to access.”

    Geoprints (which also works with the Yahoo Placemaker API):
    Marcus Ramsden, Southampton University
    “Geoprints is a plugin for EPrints. You can upload a pdf, Word document or Powerpoint file, and it will extract the plain text and send it to the EDINA API. GeoPrints uses the API will pull out the locations from that data and send it to the database. Those locations will then be plotted onto a map, which is a better interface for exploring documents.”

    Point data in mashups: moving away from pushpins in maps:
    Aidan Slingsby, City University London
    “Displaying point data as density estimation services, chi surfaces and ‘tagmaps’. Using British placenames classified by generic form and linguistic origin, accessed through the Unlock Places API.”

    The dev8d programme for 2011 is being finalised at the moment and should be published soon; the event this year runs over two days, and should definitely be worth attending for developers working in, or near, education and research.


    More on the use of Unlock Places by georeferencer.org

    November 19th, 2010

    Some months back, Klokan Petr Pridal, who maintains OldMapsOnline.org and works with libraries and cartographic institutes across Europe, wrote with some questions about the Unlock Places service. We met at FOSS4G where I presented our work on the Chalice project and the Unlock services.
    Petr writes about how Unlock is used in his applications, and what future requirements from the service may be:


    “It was great to meet you at FOSS4G in Barcelona and discuss with you the progress related to Unlock and possible cooperation with OldMapsOnline.org and usage in Georeferencer.org services.

    As you have mentioned, the most important thing for us would be to have in Unlock API/database the bounding boxes (or bounding polygons) for places as direct part of the JSON response. We need that mostly for villages, towns and cities and for areas such as districts or countries – all over the world. We need something like “bounds” as provided by the Google geocoding API.

    The second most important feature is to have the chance to install the service in our servers – especially in case you can’t provide guarantees for it in a future.

    It would be also great to have chance to improve the service for non-English languages, but right now the gazetteers and text processing is not primary target of our research.

    In this moment the Unlock API is in use:

    As a standard gazetteer search service to zoom the base maps to a place people type in the search box in our Georeferencer.org service – a collaborative georeferencing online service for scanned historical maps. It is in use by National Library of Scotland and a couple of other libraries.

    Here’s an example map (you need to register first).

    The uniqueness of Unlock is in openness of the license (primarily GeoNames.org CC-BY and also OS OpenData) and also so far very good availability of the online service (EDINA hardware and network?). We are missing the bounding box to be able to zoom our base maps to the correct area (determine the appropriate zoom level). Unlock API replaced Google Geocoder, which we can’t use, because we are displaying also non-google maps (such as Ordnance Survey OpenData) and we are potentially deriving data from the gazetteer database (the control points on the old maps), which is against Google TOS.

    In the future we are keen to extend the gazetteer with alternative historical toponyms (which people can identify on georeferenced old maps too), or participate on such work.

    The other usage of Unlock API is:

    As a metadata text analyzer, in a service such as our http://geoparser.appspot.com/, where we automatically parse existing library textual metadata to identify place names and locate the described maps including automatic approximation of their spatial coverage (by identifying map scale and physical size in the text and doing a simple math on top of it). This service is in a prototype phase only, we are using Yahoo Placemaker and I was testing Unlock Text API with it too.

    Here the huge advantage of Unlock would be primarily the possibility to add custom gazetteers (with Geonames as the default one), language detection (for example via Google Language API or otherwise) and also possibility to add into the workflow other tools, such as lemmatizator for particular language – the simplest available via hun/a/ispellu database integration or via existing morphological rule-based software such as:

    The problem is that without returning the lemmatization of the text the geoparser is almost unusable in non-English languages – especially Slavic one.

    We are very glad for availability of your results and of the reliable online services you provide. We can concentrate on the problems we need to solve primarily (georeferencing, clipping, stitching and presentation of old maps for later analysis) and use your results of research as a component solving a problem we are touching and we have to practically solve somehow.”
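Petr mentions approximating a map’s spatial coverage from its scale and physical size with “a simple math”. The arithmetic might look like the sketch below; it is our assumption, not code from geoparser.appspot.com, and it presumes the scale, the sheet size and a centre point have already been pulled out of the metadata.

```python
import math
from typing import Tuple

CM_PER_KM = 100_000
KM_PER_DEG_LAT = 111.32  # approximate; varies slightly with latitude


def ground_extent_km(scale_denominator: float, width_cm: float,
                     height_cm: float) -> Tuple[float, float]:
    """Ground width and height covered by a sheet at scale 1:scale_denominator."""
    factor = scale_denominator / CM_PER_KM  # km of ground per cm of paper
    return width_cm * factor, height_cm * factor


def approx_bbox(centre_lat: float, centre_lon: float, width_km: float,
                height_km: float) -> Tuple[float, float, float, float]:
    """Rough (south, west, north, east) box centred on a point.

    Uses a spherical approximation: fine for choosing a map extent,
    not for surveying.
    """
    half_lat = (height_km / 2) / KM_PER_DEG_LAT
    half_lon = (width_km / 2) / (KM_PER_DEG_LAT * math.cos(math.radians(centre_lat)))
    return (centre_lat - half_lat, centre_lon - half_lon,
            centre_lat + half_lat, centre_lon + half_lon)


if __name__ == "__main__":
    w, h = ground_extent_km(50_000, 80, 60)
    print(w, h)  # an 80 cm x 60 cm sheet at 1:50,000
    print(approx_bbox(56.0, -3.0, w, h))
```

For instance, an 80 cm by 60 cm sheet at 1:50,000 covers 40 km by 30 km of ground.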


    We’re very glad that Petr wrote at such length about his comprehensive use of Unlock, which pushes the edges of what we are doing with the service.
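Petr’s point about lemmatization is easy to see with Czech, where place-names inflect: “Prahy” and “Brna” are genitive forms of “Praha” and “Brno”, so an exact-match gazetteer lookup misses both. The toy sketch below maps tokens to lemmas before lookup; the small dictionary is a hypothetical stand-in for Hunspell-style morphological data, and none of this is the Unlock pipeline.

```python
from typing import Dict, Iterable, List, Set


def lookup_with_lemmas(tokens: Iterable[str], lemmas: Dict[str, str],
                       gazetteer: Set[str]) -> List[str]:
    """Return gazetteer names found in tokens after lemmatizing each token.

    lemmas maps inflected forms to base forms; unknown tokens pass through
    unchanged, so uninflected names still match.
    """
    found = []
    for token in tokens:
        candidate = lemmas.get(token, token)
        if candidate in gazetteer:
            found.append(candidate)
    return found


if __name__ == "__main__":
    gazetteer = {"Praha", "Brno"}
    lemmas = {"Prahy": "Praha", "Praze": "Praha", "Brna": "Brno"}  # toy data
    text = "Mapa okolí Prahy a Brna"  # "Map of the area around Prague and Brno"
    print(lookup_with_lemmas(text.split(), {}, gazetteer))      # no lemmas
    print(lookup_with_lemmas(text.split(), lemmas, gazetteer))  # with lemmas
```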

    We have some work in the pipeline adding bounding boxes for places worldwide by making Natural Earth Data searchable through Unlock Places. Natural Earth is a generalised dataset intended for use in cartography, but should also have quite a lot of re-use value for map search.
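A bounding box is what lets a client like Georeferencer pick an appropriate zoom level. A minimal sketch, assuming the common spherical Web Mercator tiling with 256-pixel tiles and a box in (south, west, north, east) degrees; the function names are ours, not part of any Unlock response.

```python
import math

TILE_SIZE = 256  # pixels per tile edge, the usual web-map convention


def _mercator_y(lat_deg: float) -> float:
    """Spherical Web Mercator y for a latitude (the world spans -pi..pi)."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))


def fit_zoom(south: float, west: float, north: float, east: float,
             width_px: int, height_px: int, max_zoom: int = 18) -> int:
    """Largest zoom level at which the box fits inside a width_px x height_px map.

    Assumes the box does not cross the antimeridian (east > west).
    """
    lon_fraction = (east - west) / 360.0
    lat_fraction = (_mercator_y(north) - _mercator_y(south)) / (2 * math.pi)
    # At zoom z the whole world is TILE_SIZE * 2**z pixels across, so the box
    # spans fraction * TILE_SIZE * 2**z pixels; solve for the largest z that fits.
    candidates = []
    if lon_fraction > 0:
        candidates.append(math.log2(width_px / (TILE_SIZE * lon_fraction)))
    if lat_fraction > 0:
        candidates.append(math.log2(height_px / (TILE_SIZE * lat_fraction)))
    if not candidates:  # a single point: no extent to fit
        return max_zoom
    return max(0, min(max_zoom, math.floor(min(candidates))))
```

With only a point, the client has to fall back on a fixed zoom, which is exactly the gap Petr describes; with a box, the zoom follows from the larger of the two spans.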


    Connecting archives with linked geodata – Part II

    October 22nd, 2010

    This is part two of a blog post that began with a presentation about the Chalice project and our aim to create a 1000-year place-name gazetteer, available as linked data, text-mined from volumes of the English Place Name Survey.

    Something else I’ve been organising is a web service called Unlock. It offers a gazetteer search service that searches with, and returns, shapes rather than just points for place-names. It has its origins in a 2001 project called GeoCrossWalk, which extracted shapes from MasterMap and other Ordnance Survey data sources and made them available under a research-only license in the UK to subscribers to EDINA’s Digimap service.
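“Searching with shapes” means the query itself is an area, and the answer is every named feature whose shape overlaps it. The sketch below uses bounding boxes as the simplest possible shape over a tiny in-memory list; it is an illustration under that assumption, not Unlock’s implementation, which a real service would back with full geometries and a spatial index.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BBox:
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        """True if the two boxes overlap or touch."""
        return not (other.west > self.east or other.east < self.west
                    or other.south > self.north or other.north < self.south)


@dataclass(frozen=True)
class Feature:
    name: str
    bbox: BBox


def search_by_shape(features: Iterable[Feature], query: BBox) -> List[Feature]:
    """Return every feature whose box overlaps the query box (linear scan)."""
    return [f for f in features if f.bbox.intersects(query)]


if __name__ == "__main__":
    # Hypothetical features with made-up extents.
    features = [Feature("a", BBox(0, 0, 1, 1)), Feature("b", BBox(5, 5, 6, 6))]
    print([f.name for f in search_by_shape(features, BBox(0.5, 0.5, 2, 2))])
```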

    Now that so much open geodata is out there, Unlock also contains an open data place search service, indexing and interconnecting the different sources of shapes that match up to names. It includes GeoNames and the OS Open Data sources and will add search of Natural Earth data shortly, and we are looking at ways to enhance what others (Nominatim, LinkedGeoData) are already doing with search and re-use of OpenStreetMap data.
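One simple way to interconnect sources is to treat two records as the same place when their names match and their locations are close. The sketch below does exactly that with a case-insensitive name comparison and a great-circle distance threshold; the threshold, field names and sample coordinates are illustrative assumptions, not how Unlock actually links its sources.

```python
import math
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    source: str
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def link_records(a: List[Record], b: List[Record],
                 max_km: float = 5.0) -> List[Tuple[Record, Record]]:
    """Pair records whose names match and whose points lie within max_km."""
    pairs = []
    for ra in a:
        for rb in b:
            if (ra.name.casefold() == rb.name.casefold()
                    and haversine_km(ra.lat, ra.lon, rb.lat, rb.lon) <= max_km):
                pairs.append((ra, rb))
    return pairs
```

The distance check matters because common names repeat: two “Newtown” records from different sources should only be linked if they are actually near each other.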

    The gazetteer search service sits alongside a place-name text mining service. However, the text mining service is tuned to contemporary text (American news sources), largely because of what data is available and which models and sets of training data can be shared. The more interesting use cases are in archive mining: semi-unusual, semi-structured sets of documents and records (parliamentary proceedings, historical population reports, parish and council records). Anything that is recorded will yield data, *is* data, back to the earliest written records we have.


    Place-names can provide a kind of universal key to interpreting the written record. Social organisation may change completely, but the land remembers, and place-names remain the same. Through the prism of place-names one can glimpse pre-history; not just what remains of those people wealthy enough to create *stuff* that lasted, but of everybody who otherwise vanished without trace.

    The other reason I’m here at FOSS4G is to ask for help. We (the authors of the text mining tools at the Language Technology Group, colleagues at EDINA, smart funders at JISC) want to put together a proper open source distribution of the core components of our work, for others to customise, extend, and work with us on.

    We could use advice. The Software Sustainability Institute is one place we are turning to for guidance on managing an open source release and, hopefully, a community. OSS Watch supported us in structuring an open source business case.

    The transition to a world that is open by default turns out to be more difficult than one would think. It’s hard to get many minds to look in the same direction at the same time. Legacy problems and kludges, whether technical, social, or even emotional, can arise to mess things up when we try to act in the clear.

    We could use practical advice on managing an open source release of our work to make it as self-sustaining as possible. In the short term: how best to structure a repository for collaboration, branching and merging; where we should most usefully focus documentation effort; how to automate testing to free up effort for more creative work; and how to find the benefits in moving our way of working from a closed to an open world.

    The Chalice project has a SourceForge repository where we’ve been putting the code the EDINA team has been working on; this includes an evolution of Unlock’s web service API, and user interface / annotation code from Addressing History. We’re now working on the best way to synchronise work-in-progress with currently published, GPL-licensed components from LTG, more pieces of the pipeline making up the “Edinburgh geoparser” and other things…


    Notes from Linking Geodata seminar at CeRch

    July 20th, 2010

    Note, this blog entry was originally published in May 2010.

    While on a bit of a road trip, I had the chance to give a short seminar at the Centre for e-Research at King’s College London. It was informal and we weren’t expecting much of a showing, so there are no slides; here is a quick summary.

    After an introduction by Dr Stuart Dunn, I talked about project ideas we had just been discussing: the attempt to mine the English Place Name Survey for its structure (now called CHALICE); mining archaeological site records and artefact descriptions and attaching them to entities in OpenStreetMap using LinkedGeoData.org; and mining key reference terms from documents in archives, attempting to link documents to reference data.

    LinkedGeoData seemed like a good place to start: we picked out a sample entry and walked through its triples, which led to a bit of jumping about and graph-drawing on the whiteboard.

    There’s a list of mappings between items in LinkedGeoData and in dbpedia.org, and thus likely through to geonames.org and other rich sources of Linked Data (cf. Linked Geodata Datasets). Via sameas.org, geographic links can be traversed to arrive at related media objects, resources and events.
    geonames.org has its 8m+ points and seems to be widely used in the academic geographic information retrieval community, due to its global coverage and open license.
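Traversing `owl:sameAs` links amounts to finding everything connected to a starting URI, treating each link as two-way. The breadth-first sketch below works over an in-memory list of (subject, predicate, object) triples; the example.org URIs are hypothetical, and a real traversal would fetch triples from LinkedGeoData, dbpedia.org or sameas.org rather than a list.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_closure(triples: Iterable[Tuple[str, str, str]], start: str) -> Set[str]:
    """All URIs reachable from start through owl:sameAs links (breadth-first)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            graph[s].add(o)
            graph[o].add(s)  # owl:sameAs is symmetric
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Other predicates (labels, geometry, links to media) are ignored here; once the equivalence set is known, those can be gathered from every URI in it.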

    The text mining process used in the Edinburgh geoparser and elsewhere has two phases. The first extracts, purely by looking at the text, the entities that seem likely to be place-names; the second looks those names up in a gazetteer and uses the relations between them to guess which of the suggested locations is most likely.
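The two phases can be illustrated with a deliberately tiny Python sketch. It is a toy under stated assumptions, not the Edinburgh geoparser: phase 1 simply keeps capitalised tokens that appear in the gazetteer (a real system uses a trained recogniser), and phase 2 picks, for each name, the candidate that keeps all chosen locations closest together. The gazetteer and its rounded coordinates are illustrative.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lat, lon)


def extract_candidates(text: str, gazetteer: Dict[str, List[Coord]]) -> List[str]:
    """Phase 1 (toy): capitalised tokens that appear in the gazetteer."""
    names = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")
        if token[:1].isupper() and token in gazetteer:
            names.append(token)
    return names


def _dist(a: Coord, b: Coord) -> float:
    # Planar distance in degrees: crude, but enough to rank candidates.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def resolve(names: List[str], gazetteer: Dict[str, List[Coord]]) -> Dict[str, Coord]:
    """Phase 2 (toy): choose one location per name, minimising total spread.

    Exhaustive over all combinations, so only suitable for a handful of names.
    """
    unique = list(dict.fromkeys(names))
    best, best_cost = None, math.inf
    for combo in product(*(gazetteer[n] for n in unique)):
        cost = sum(_dist(a, b) for i, a in enumerate(combo) for b in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = combo, cost
    return dict(zip(unique, best)) if best is not None else {}


if __name__ == "__main__":
    gaz = {"Perth": [(56.40, -3.43), (-31.95, 115.86)], "Dundee": [(56.46, -2.97)]}
    print(resolve(extract_candidates("From Perth to Dundee by train.", gaz), gaz))
```

Here “Dundee” pulls the ambiguous “Perth” towards the Scottish candidate, which is the kind of relation between names the second phase exploits.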

    Point data, cartographic in origin. Polygon geoparsing.
    Machine learning approaches to both phases.

    We looked at UK-postcodes.com and the great work @pezholio has done on the RDF representations of postcodes there, with links across to some of the statistical area namespaces from data.gov.uk. Along with the work that Ordnance Survey Research have in hand, there’s lots of new Linked Open Geodata in the UK.

    Historic names and shapes, and temporal linking: these are areas where more practical, open research has yet to be done.