Leveraging geodata for enriched records

This post discusses how I’ve been using various geodata tools to enrich our database: principally Yahoo!, but also Flickr shapefiles, Google’s maps and geocoder APIs, Geonames and OS data, and I’m now exploring the Unlock project from Edina to see what it can offer as well. I started writing this post back in May, but as I’ve just spoken at the W3G unconference in Stratford-on-Avon, I thought I’d finish it and get it out. Gary Gale and his helpers produced a very good unconference, at which I met some very interesting people (shame TW Bell couldn’t come!) and saw some good examples of what other people are up to.

My presentation from that conference is embedded here:


Most of us realise the power of maps and I’ve made them a very central cog of the new Scheme website that we soft launched at the end of March 2010. Hopefully this isn’t too long and boring and has some technical stuff that may be of some use to others. As always, the below is CC-NC-SA.

A map showing all finds recorded by the Scheme

At the Scheme, we’ve been collecting data on the provenance of archaeological discoveries made by the public and publishing it online for 13 years now (much longer than I’ve worked for the Scheme!). These data are collated on our database and provide the basis for spatial interrogation of where and when these objects have been discovered. Many researchers are using the database for a variety of geomatics work, for example patterning and cluster analysis. A few of our recent AHRC-funded PhD candidates have been implementing GIS techniques as an integral part of their research, for example:

  1. Tom Brindle, KCL
  2. Philippa Walton, UCL
  3. Ian Leins, Newcastle University
  4. Katie Robbins, Southampton University

They are all, incidentally, alumni of the Scheme (and one has rejoined recently!), and their research has been inspired by working on the data that we collate. I won’t be discussing the philosophical arguments of provenance and its meaning within this article; instead I’ll demonstrate how we’re using third-party tools to enhance find spot data on our site and talk about some of the problems we face in making use of them. Some of the post has code examples, but you can gloss over them if you’re not into that (which will probably be true of the majority of our readers!)

Find spot data

Our Finds Liaison Officers record the majority of the objects that we are shown on our database, and ask the finder to provide us with the most accurate National Grid Reference (OSGB36) that they can produce. Many of our finders are now using GPS units to produce grid references (we’re aware of the degree of accuracy/precision they provide, but as most objects aren’t from secure archaeological contexts, the variance won’t affect work that much). We encourage people to provide these references to 8 figures and above, and the proportion that do so is growing every year. The list below shows the precision of each grid reference length:

  • 0 figure [SP] = 100 kilometre square
  • 2 figure [SP11] = 10 kilometre square
  • 4 figure [SP1212] = 1 kilometre square
  • 6 figure [SP123123] = 100 metre square

Then we get the figures that are actually of some archaeological use:

  • 8 figure [SP12341234] = 10 metre square
  • 10 figure [SP1234512345] = 1 metre square
  • 12 figure [SP123456123456] = 10 centimetre square
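The relationship in the two lists above can be worked out from the reference itself: each extra pair of digits divides the square by ten. A minimal sketch, assuming a well-formed reference of two letters plus an even number of digits; the function name is hypothetical and this is not the code running on our site:

```php
<?php
// Hypothetical helper (illustration only): count the figures in a National
// Grid Reference and work out the size of the square it describes.
function gridRefPrecision(string $ngr): array
{
    // Strip spaces, then split into the two-letter square and the digits.
    $ngr = strtoupper(str_replace(' ', '', $ngr));
    if (!preg_match('/^([A-Z]{2})(\d*)$/', $ngr, $m) || strlen($m[2]) % 2 !== 0) {
        throw new InvalidArgumentException("Not a valid grid reference: $ngr");
    }
    $figures = strlen($m[2]);
    // The bare letters describe a 100 km square; every pair of digits
    // (one for the easting, one for the northing) divides it by ten.
    $metres = 100000 / (10 ** intdiv($figures, 2));
    return ['figures' => $figures, 'metres' => $metres];
}

print_r(gridRefPrecision('SP1212'));
```

So `gridRefPrecision('SP1212')` reports 4 figures and a 1000 metre square, matching the list.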

This find spot data is given to us in confidence by the finders and landowners and we therefore have to protect this confidence. We have an agreement with the representative body of the main providers of our data, the metal detecting community (The National Council for Metal Detecting), that we won’t publish find spots online at a precision higher than a 4 figure National Grid Reference or to parish level. These grid references can be obscured from public view completely by asking the Finds Liaison Officer to enter a “to be known as” alias on the find spot form at the time of recording (or subsequently).

Converting OSGB36 grid references to Latitude and Longitude pairs

Most of the web mapping programs out there make use of Latitude and Longitude pairs for displaying point data on their mapping interfaces. Therefore, we now convert all our NGRs to LatLng on our database, and these are stored as floats in two columns in our find spots table. Whilst converting these grid references to LatLng pairs, I also do some further manipulations to produce the following and insert them into our spatial data table:

  1. Grid reference length
  2. Accuracy of grid reference as shown above
  3. Four figure grid reference
  4. 1:25k map reference
  5. 1:10k map reference
  6. Findspot elevation
  7. Where on Earth ID

The PHP functions to do this are based around some written by the original Oxford ArchDigital team, with some additions by me. There are publicly available code examples by Barry Hunter or Jeff Stott, and other versions out there on the web! My code is used as either a service or a view helper in my Zend Framework project and bundles together a variety of functions. I’m not a developer; I’ve just taught myself bits and pieces to get the PAS website back on the road. If you see errors, do let me know or suggest ways to improve the code.
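As a rough illustration of the first steps (not the Scheme’s own code, and with hypothetical function names), the sketch below turns a grid reference into an OSGB36 easting and northing and truncates it to four figures. Going on from easting/northing to a WGS84 Latitude and Longitude pair needs an inverse Transverse Mercator projection and a datum shift, which the libraries mentioned above provide and which is not shown here.

```php
<?php
// Illustration only; function names are hypothetical.

// Easting/northing (metres) of the south-west corner of the square
// described by a grid reference such as "TL4558".
function gridRefToEastingNorthing(string $ngr): array
{
    $ngr = strtoupper(str_replace(' ', '', $ngr));
    if (!preg_match('/^([HNOST][A-HJ-Z])(\d*)$/', $ngr, $m) || strlen($m[2]) % 2 !== 0) {
        throw new InvalidArgumentException("Not a valid grid reference: $ngr");
    }
    // The letters index 5x5 grids that skip the letter "I".
    $l1 = ord($m[1][0]) - ord('A');
    $l2 = ord($m[1][1]) - ord('A');
    if ($l1 > 7) { $l1--; }
    if ($l2 > 7) { $l2--; }
    // 100 km square indices measured from the grid's false origin.
    $e100 = (($l1 - 2) % 5) * 5 + ($l2 % 5);
    $n100 = (19 - intdiv($l1, 5) * 5) - intdiv($l2, 5);

    $half  = intdiv(strlen($m[2]), 2);
    $scale = 10 ** (5 - $half); // metres represented by one unit of the digits
    $e = $half ? (int) substr($m[2], 0, $half) : 0;
    $n = $half ? (int) substr($m[2], $half) : 0;
    return [
        'easting'  => $e100 * 100000 + $e * $scale,
        'northing' => $n100 * 100000 + $n * $scale,
    ];
}

// Truncate (never round) a longer reference to four figures for publication.
function toFourFigure(string $ngr): string
{
    $ngr    = strtoupper(str_replace(' ', '', $ngr));
    $digits = substr($ngr, 2);
    $half   = intdiv(strlen($digits), 2);
    if ($half < 2) {
        return $ngr; // already coarser than four figures
    }
    return substr($ngr, 0, 2) . substr($digits, 0, 2) . substr($digits, $half, 2);
}

print_r(gridRefToEastingNorthing('SP1212'));
echo toFourFigure('SP1234567890'), "\n";
```

Truncating rather than rounding keeps the published reference inside the same 1 kilometre square as the original.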

Using Yahoo! to geo-enrich our data

Several years ago, the great Tyler Bell (formerly of Oxford ArchDigital and Yahoo!) gave a paper at an archaeological computing conference at UCL’s Institute of Archaeology, where he broke his joke about XML being like high school sex (won’t elaborate on this, ask him) whilst some toe-rags were trying to steal my push bike (they came off badly as I was standing behind them!) Tyler’s paper gave me much food for thought, and it is over the last year or so that this idea has really come to fruition with our data. The advent of Yahoo!’s suite of geoPlanet tools has allowed us to do various things to our data set and present it in different ways. Below, I’ll show you some of the things their powerful suite of tools has allowed us to do.

Putting dots on maps for finds with just a parish name

Prior to 2003, we often received the majority of our finds with a very vague find spot, often just to parish level. As everyone loves maps and would like to see where these finds came from, I wanted to get a map on every page that needed one. Previously, our FLOs would be asked to centre a find on the parish for these find spots; this is now a waste of time when you can use geoPlanet to get a latitude and longitude, postcode, type of settlement, bounding box and a WOEID to enhance the data that we hold. Doing this is pretty simple with the aid of YQL.

I first heard about YQL from Jim O’Donnell (formerly NMM’s web wizard) and then more at the Yahoo! Hack day in London, when I pedalled round London with Andrew Larcombe on the Purple Pedals bikes. Yahoo describe YQL as SELECT * FROM Internet, which is indeed pretty true. Building open data tables to use with their system is pretty easy; I’ll write more about some Museum tables in another post soon. All my geo extraction is performed using YQL and the examples below show how. All of these are done with the public endpoint. If you run a high traffic site, it is definitely worth changing your code to use OAuth and authenticate your YQL calls against the non-public endpoint (better rate limits etc). It is slightly tricky and you do need to work out how to refresh your Yahoo token, but it is worth the effort.

For example, I grew up in Stapleford, Cambridgeshire and you can search for that with the following YQL call:

[HTML]select * from geo.places where text="stapleford,cambridgeshire"[/html]

Which maps to this REST URL of:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22stapleford%2Ccambridgeshire%22
producing an XML or JSON response like below (diagnostics omitted):

[XML toolbar="true"]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2010-05-05T11:12:49Z" yahoo:lang="en-US">
<results>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:URI="http://where.yahooapis.com/v1/place/35984">
<woeid>35984</woeid>
<placeTypeName code="7">Town</placeTypeName>
<name>Stapleford</name>
<country code="GB" type="Country">United Kingdom</country>
<admin1 code="GB-ENG" type="Country">England</admin1>
<admin2 code="GB-CAM" type="County">Cambridgeshire</admin2>
<admin3/>
<locality1 type="Town">Stapleford</locality1>
<locality2/>
<postal type="Postal Code">CB22 5</postal>
<centroid>
<latitude>52.145329</latitude>
<longitude>0.151490</longitude>
</centroid>
<boundingBox>
<southWest>
<latitude>52.127220</latitude>
<longitude>0.133460</longitude>
</southWest>
<northEast>
<latitude>52.164879</latitude>
<longitude>0.176640</longitude>
</northEast>
</boundingBox>
<areaRank>3</areaRank>
<popRank>1</popRank>
</place>
</results>
</query>
[/xml]
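The mapping from a YQL statement to the REST URL shown before this response can be sketched as below. The function name is hypothetical; the endpoint is the public one quoted above, and `format` is the YQL parameter that switches between XML and JSON output. PHP’s RFC 3986 encoding also escapes the `*` as `%2A`, so the result is equivalent to, rather than identical with, the URL printed above.

```php
<?php
// Hypothetical helper: build the public-endpoint URL for a YQL statement.
function yqlUrl(string $yql, string $format = 'xml'): string
{
    $base = 'http://query.yahooapis.com/v1/public/yql';
    // PHP_QUERY_RFC3986 encodes spaces as %20 rather than "+".
    return $base . '?' . http_build_query(
        ['q' => $yql, 'format' => $format],
        '',
        '&',
        PHP_QUERY_RFC3986
    );
}

echo yqlUrl('select * from geo.places where text="stapleford,cambridgeshire"', 'json'), "\n";
```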

By parsing the XML or JSON response (I tend to use the JSON response),  a Latitude and Longitude pair can be retrieved for placing the object onto the map. It isn’t the true find spot, but can at least give a high level overview of the point of origin. Whilst doing this, I also take the postcode, woeid, bounding box etc to reuse again. Parsing data is pretty simple once you have got your response and decoded the JSON, for example:

[PHP]
// $place holds the decoded JSON response (the result of json_decode)
$place = $place->query->results->place;
$placeData = array();
$placeData['woeid'] = (string) $place->woeid;
$placeData['placeTypeName'] = (string) $place->placeTypeName->content;
$placeData['name'] = (string) $place->name;
// Optional elements: only copy the ones that came back with a value
$optional = array('country', 'admin1', 'admin2', 'admin3', 'locality1', 'locality2', 'postal');
foreach ($optional as $field) {
    if (!empty($place->$field)) {
        $placeData[$field] = (string) $place->$field->content;
    }
}
$placeData['latitude'] = $place->centroid->latitude;
$placeData['longitude'] = $place->centroid->longitude;
$placeData['centroid'] = array(
    'lat' => (string) $place->centroid->latitude,
    'lng' => (string) $place->centroid->longitude
);
$placeData['boundingBox'] = array(
    'southWest' => array(
        'lat' => (string) $place->boundingBox->southWest->latitude,
        'lng' => (string) $place->boundingBox->southWest->longitude
    ),
    'northEast' => array(
        'lat' => (string) $place->boundingBox->northEast->latitude,
        'lng' => (string) $place->boundingBox->northEast->longitude
    )
);
return $placeData;
[/php]

The image below shows an autogenerated findspot and a parish boundary (see below for flickr shapefile use) and adjacent places.

An autogenerated findspot

Within our database, I have a certainty field for where the co-ordinates originate from. This table has the following content:

  1. From a map
  2. From finder verbally
  3. GPS from the finder
  4. GPS from the FLO
  5. Centred on the parish via a paper map
  6. Recorded at a rally (so certainty could be dubious)
  7. Produced via webservice

Therefore researchers are apprised of where the findspot comes from and whether we can treat it (if at all) as useful.

Getting elevation (via Geonames)

The woeid or the LatLng can be used to get elevation of the find spot. This can be achieved by a combination of reverse geocoding against Flickr place names (for woeid) and the Geonames API call for ‘Elevation – Aster Global Digital Elevation Model’. So for example, I want to get the elevation for the centre of Stapleford. You can query the geonames API with the following YQL:

[HTML wraplines="true"]select * from json where url="http://ws.geonames.org/astergdemJSON?lat=52.145329&lng=0.151490";[/html]

Which when executed produces this response:

[XML]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="1" yahoo:created="2010-09-30T12:26:00Z" yahoo:lang="en-US">
<results>
<json>
<astergdem>17</astergdem>
<lng>0.15149</lng>
<lat>52.145329</lat>
</json>
</results>
</query>
[/xml]

So I now have the elevation of 17 metres above sea level. Great! I’ve been experimenting a bit with this against some findspots that we know the elevation for. For one high profile object, the Crosby Garrett Helmet, the GPS elevation and the Geonames sourced one were within 1 metre of each other.
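Pulling the figure out of that response takes only a few lines with SimpleXML. This is a minimal sketch using the response shown above as sample data; the function name is hypothetical and a missing value is simply returned as null.

```php
<?php
// Sample response copied from the article (trimmed to the parts used here).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1">
<results>
<json>
<astergdem>17</astergdem>
<lng>0.15149</lng>
<lat>52.145329</lat>
</json>
</results>
</query>
XML;

// Hypothetical helper: elevation in metres, or null if none was returned.
function elevationFromYql(string $xml): ?int
{
    $doc = simplexml_load_string($xml);
    if ($doc === false || !isset($doc->results->json->astergdem)) {
        return null;
    }
    return (int) $doc->results->json->astergdem;
}

var_dump(elevationFromYql($xml));
```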

By providing an elevation for each of our findspots, researchers can then do viewshed analysis; I don’t think anyone has really done this yet for the artefact distributions that we record, but I could be proved wrong!

Reverse geocoding from Latitude and Longitude with Yahoo!

At present, the GeoPlanet suite doesn’t provide this feature, but you can still manage to do this via YQL and using the following query:

[HTML]select * from flickr.places where lon={Longitude} and lat={latitude}[/html]

So for example using Stapleford’s LatLng as the YQL parameters gives you:

[XML wraplines="true"]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2010-05-05T01:27:47Z" yahoo:lang="en-US">
<results>
<places accuracy="16" latitude="52.145329" longitude="0.151490" total="1">
<place latitude="52.145" longitude="0.151" name="Stapleford, England, United Kingdom" place_id="m2G8tyiaBJVjFQ" place_type="locality" place_type_id="7" place_url="/United+Kingdom/England/Stapleford/in-Cambridgeshire" timezone="Europe/London" woeid="35984"/>
</places>
</results>
</query>
[/xml]

You’ll notice a couple of useful things in the Flickr XML returned, for example the place_url, in this case: /united+kingdom/england/stapleford/in-cambridgeshire which, when appended to Flickr’s root URL for places, gives you http://www.flickr.com/places/united+kingdom/england/stapleford/in-cambridgeshire which in turn gives you access to feeds in various flavours from that page.
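A short sketch of reading those attributes and building the Flickr places link follows. The sample XML is cut down from the response above and the function name is hypothetical.

```php
<?php
// Attributes copied from the <place> element in the Flickr response.
$xml = '<places accuracy="16" total="1">'
     . '<place name="Stapleford, England, United Kingdom" '
     . 'place_url="/United+Kingdom/England/Stapleford/in-Cambridgeshire" '
     . 'woeid="35984"/></places>';

// Hypothetical helper: WOEID and Flickr places URL for the first place found.
function flickrPlaceLink(string $xml): ?array
{
    $places = simplexml_load_string($xml);
    if ($places === false || !isset($places->place[0])) {
        return null;
    }
    $place = $places->place[0];
    return [
        'woeid' => (string) $place['woeid'],
        'url'   => 'http://www.flickr.com/places' . (string) $place['place_url'],
    ];
}

print_r(flickrPlaceLink($xml));
```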

One of the other cool things available in Flickr’s API is placeinfo. I’d love a boundary map of how Flickr views the parish of Stapleford. As I previously obtained and gave my findspot a WOEID, I can see if Flickr has this data. So perform this YQL query:

[HTML]select * from flickr.places.info where woe_id='35984'[/html]

And execute it to obtain the following XML:

[XML wraplines="true"]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="1" yahoo:created="2010-09-30T12:34:50Z" yahoo:lang="en-US">
<results>
<place has_shapedata="1" latitude="52.145" longitude="0.151"
name="Stapleford, England, United Kingdom"
place_id="m2G8tyiaBJVjFQ" place_type="locality"
place_type_id="7"
place_url="/United+Kingdom/England/Stapleford/in-Cambridgeshire"
timezone="Europe/London" woeid="35984">
<locality latitude="52.145" longitude="0.151"
place_id="m2G8tyiaBJVjFQ"
place_url="/United+Kingdom/England/Stapleford/in-Cambridgeshire" woeid="35984">Stapleford, England, United Kingdom</locality>
<county latitude="52.373" longitude="0.007"
place_id="pVJUVwKYA5qQZa9wqQ"
place_url="/pVJUVwKYA5qQZa9wqQ" woeid="12602140">Cambridgeshire, England, United Kingdom</county>
<region latitude="52.883" longitude="-1.974"
place_id="pn4MsiGbBZlXeplyXg"
place_url="/United+Kingdom/England" woeid="24554868">England, United Kingdom</region>
<country latitude="54.314" longitude="-2.230"
place_id="DevLebebApj4RVbtaQ"
place_url="/United+Kingdom" woeid="23424975">United Kingdom</country>
<shapedata alpha="0.00015" count_edges="16"
count_points="44" created="1248244568" has_donuthole="0" is_donuthole="0">
<polylines>
<polyline>52.155731201172,0.17115999758244 52.158447265625,0.17576499283314 52.159084320068,0.18161700665951 52.159244537354,0.18208900094032 52.15747833252,0.18410600721836 52.153221130371,0.18645000457764 52.151500701904,0.17897999286652 52.14905166626,0.17045900225639 52.136436462402,0.15260599553585 52.135303497314,0.14247800409794 52.140232086182,0.13955999910831 52.145477294922,0.14135999977589 52.145721435547,0.14150799810886 52.145240783691,0.14707000553608 52.154125213623,0.16043299436569 52.155731201172,0.17115999758244</polyline>
</polylines>
<urls>
<shapefile>http://farm4.static.flickr.com/3483/shapefiles/35984_20090722_6d95b5e27e.tar.gz</shapefile>
</urls>
</shapedata>
</place>
</results>
</query>
[/xml]

Brilliant, the polylines can be used to draw an outline of the parish on the map.
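Each polyline is a space-separated list of latitude,longitude pairs, so turning it into points for a mapping API is straightforward. A minimal sketch with a hypothetical function name; the sample uses the first three points and the closing point of the Stapleford outline above.

```php
<?php
// Hypothetical helper: split a Flickr polyline string into lat/lng pairs.
function parsePolyline(string $polyline): array
{
    $points = [];
    foreach (preg_split('/\s+/', trim($polyline)) as $pair) {
        $parts = explode(',', $pair);
        if (count($parts) !== 2) {
            continue; // skip anything that is not a "lat,lng" pair
        }
        $points[] = ['lat' => (float) $parts[0], 'lng' => (float) $parts[1]];
    }
    return $points;
}

$ring = parsePolyline(
    '52.155731201172,0.17115999758244 52.158447265625,0.17576499283314 '
    . '52.159084320068,0.18161700665951 52.155731201172,0.17115999758244'
);
// A closed outline starts and ends on the same point.
$closed = $ring[0] == $ring[count($ring) - 1];
echo count($ring), ' points, closed: ', var_export($closed, true), "\n";
```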

Extracting place data from find descriptions

Y!Geo tags

Many of our objects are tied by descriptive prose to various places around the world. By using Yahoo’s Placemaker, we can now extract the place entities from the finds data and allow for cross referencing of, for example, all objects that have Avon, England within their description. The image shows where you’ll see the tags displayed on the finds record; as I’m into Classics, you’ll notice I label these with lower case Greek letters for bullets. Probably pretentious! Getting these tags is really very straightforward and uses another pretty simple YQL call. For example, this text is from the famous Moorlands Patera.

[HTML toolbar="true" wraplines="true"]
SELECT * FROM geo.placemaker WHERE documentContent = "Only two other vessels with inscriptions naming forts on Hadrian’s Wall are known; the ‘Rudge Cup’ which was discovered in Wiltshire in 1725 (Horsley 1732; Henig 1995) and the ‘Amiens patera’ found in Amiens in 1949 (Heurgon 1951). Between them they name seven forts, but the Staffordshire patera is the first to include Drumburgh and is the only example to name an individual. All three are likely to be souvenirs of Hadrian’s Wall, although why they include forts on the western end of the Wall only is unclear" AND documentType="text/plain"[/html]

Which then produces this output in XML:

[XML toolbar="true"]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2010-05-05T04:26:57Z" yahoo:lang="en-US">
<results>
<matches>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>575961</woeId>
<type>Town</type>
<name><![CDATA[Amiens, Picardie, FR]]></name>
<centroid>
<latitude>49.8947</latitude>
<longitude>2.29316</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>575961</woeIds>
<start>177</start>
<end>183</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Amiens]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>575961</woeIds>
<start>201</start>
<end>207</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Amiens]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>12602186</woeId>
<type>County</type>
<name><![CDATA[Wiltshire, England, GB]]></name>
<centroid>
<latitude>51.3241</latitude>
<longitude>-1.9257</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>12602186</woeIds>
<start>123</start>
<end>132</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Wiltshire]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>12602189</woeId>
<type>County</type>
<name><![CDATA[Staffordshire, England, GB]]></name>
<centroid>
<latitude>52.8248</latitude>
<longitude>-2.02817</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>12602189</woeIds>
<start>276</start>
<end>289</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Staffordshire]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>23509175</woeId>
<type>LandFeature</type>
<name><![CDATA[Hadrian's Wall, Bardon Mill, England, GB]]></name>
<centroid>
<latitude>54.9522</latitude>
<longitude>-2.32975</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>23509175</woeIds>
<start>418</start>
<end>432</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Hadrian’s Wall]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
</matches>
</results>
</query>
[/xml]

In the above XML response, you can now see the matches that Placemaker has found in the text sent to their service. You can now parse this data and use it for tagging or any other purpose that you want to put the data to.  YQL has the added benefit of caching at the Yahoo! end and you can do multiple queries in one call as demonstrated by Chris Heilmann in his Geoplanet explorer.
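As a sketch of that parsing step, the snippet below collects one tag per matched place together with the number of times it was mentioned. The sample XML is two matches trimmed from the response above, and the function name and the shape of the returned array are my own assumptions rather than the code used on the site. Note that the place and reference elements sit in the Placemaker namespace, so they are read through children().

```php
<?php
// Two matches trimmed from the Placemaker response shown above.
$xml = <<<'XML'
<query><results><matches>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema"><woeId>575961</woeId><type>Town</type>
<name><![CDATA[Amiens, Picardie, FR]]></name></place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema"><text><![CDATA[Amiens]]></text></reference>
<reference xmlns="http://wherein.yahooapis.com/v1/schema"><text><![CDATA[Amiens]]></text></reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema"><woeId>12602186</woeId><type>County</type>
<name><![CDATA[Wiltshire, England, GB]]></name></place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema"><text><![CDATA[Wiltshire]]></text></reference>
</match>
</matches></results></query>
XML;

// Hypothetical helper: one entry per matched place, keyed by WOEID.
function placemakerTags(string $xml): array
{
    $ns   = 'http://wherein.yahooapis.com/v1/schema';
    $tags = [];
    $doc  = simplexml_load_string($xml);
    if ($doc === false) {
        return $tags;
    }
    foreach ($doc->results->matches->match as $match) {
        $kids  = $match->children($ns); // namespaced <place> and <reference>
        $woeid = (string) $kids->place->woeId;
        $tags[$woeid] = [
            'name'     => (string) $kids->place->name,
            'type'     => (string) $kids->place->type,
            'mentions' => count($kids->reference),
        ];
    }
    return $tags;
}

print_r(placemakerTags($xml));
```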

A YQL Multiquery example

For example, I want to combine a placemaker call and also get some spatial information for a find spot where I only have the placename. To do this, I write this YQL query:

[HTML toolbar="true" wraplines="true"]
select * from query.multi where queries='
select * from geo.placemaker where documentContent = "Only two other vessels with inscriptions naming forts on Hadrian’s Wall are known the Rudge Cup which was discovered in Wiltshire in 1725 (Horsley 1732, Henig 1995) and the Amiens patera found in Amiens in 1949 (Heurgon 1951). Between them they name seven
forts, but the Staffordshire patera is the first to include Drumburgh and is the only example to name an individual. All three are likely to be souvenirs of Hadrian’s Wall, although why they include forts on the western end of the Wall only is unclear" and documentType="text/plain" and appid="";
select * from geo.places where text="staffordshire moorlands, staffordshire,uk" '
[/html]

The base URL to call this is the public version – http://query.yahooapis.com/v1/public/yql and as we are using one of the community tables, the call needs to be made with &env=store://datatables.org/alltableswithkeys appended (urlencoded).
This can be run in the console and produces the following XML response – 4 place matches and the geo data for Staffordshire Moorlands.

[XML toolbar="true" wraplines="true"]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="2" yahoo:created="2010-09-30T11:44:08Z" yahoo:lang="en-US">
<results>
<results>
<matches>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>575961</woeId>
<type>Town</type>
<name><![CDATA[Amiens, Picardie, FR]]></name>
<centroid>
<latitude>49.8947</latitude>
<longitude>2.29316</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>575961</woeIds>
<start>196</start>
<end>202</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Amiens]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>12602186</woeId>
<type>County</type>
<name><![CDATA[Wiltshire, England, GB]]></name>
<centroid>
<latitude>51.3241</latitude>
<longitude>-1.9257</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>12602186</woeIds>
<start>120</start>
<end>129</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Wiltshire]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>12602189</woeId>
<type>County</type>
<name><![CDATA[Staffordshire, England, GB]]></name>
<centroid>
<latitude>52.8248</latitude>
<longitude>-2.02817</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>12602189</woeIds>
<start>271</start>
<end>284</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Staffordshire]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
<match>
<place xmlns="http://wherein.yahooapis.com/v1/schema">
<woeId>23509175</woeId>
<type>LandFeature</type>
<name><![CDATA[Hadrian's Wall, Bardon Mill, England, GB]]></name>
<centroid>
<latitude>54.9522</latitude>
<longitude>-2.32975</longitude>
</centroid>
</place>
<reference xmlns="http://wherein.yahooapis.com/v1/schema">
<woeIds>23509175</woeIds>
<start>413</start>
<end>427</end>
<isPlaintextMarker>1</isPlaintextMarker>
<text><![CDATA[Hadrian&rsquo;s Wall]]></text>
<type>plaintext</type>
<xpath><![CDATA[]]></xpath>
</reference>
</match>
</matches>
</results>
<results>
<place xmlns="http://where.yahooapis.com/v1/schema.rng"
xml:lang="en-US" yahoo:URI="http://where.yahooapis.com/v1/place/12696078">
<woeid>12696078</woeid>
<placeTypeName code="10">Local Administrative Area</placeTypeName>
<name>Staffordshire Moorlands District</name>
<country code="GB" type="Country">United Kingdom</country>
<admin1 code="GB-ENG" type="Country">England</admin1>
<admin2 code="GB-STS" type="County">Staffordshire</admin2>
<admin3/>
<locality1/>
<locality2/>
<postal/>
<centroid>
<latitude>53.071468</latitude>
<longitude>-1.993490</longitude>
</centroid>
<boundingBox>
<southWest>
<latitude>52.916691</latitude>
<longitude>-2.211330</longitude>
</southWest>
<northEast>
<latitude>53.226250</latitude>
<longitude>-1.775660</longitude>
</northEast>
</boundingBox>
<areaRank>6</areaRank>
<popRank>0</popRank>
</place>
</results>
</results>
</query>
[/xml]
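Assembling a multi-query call like the one that produced this response can be sketched as follows. The function name is hypothetical; the endpoint and the env value are the ones quoted above. Because the sub-queries are wrapped in single quotes, the sketch refuses any statement that contains a straight single quote.

```php
<?php
// Hypothetical helper: wrap several YQL statements in one query.multi call.
function yqlMultiUrl(array $queries): string
{
    foreach ($queries as $query) {
        // The statements sit inside single quotes, so they must not contain any.
        if (strpos($query, "'") !== false) {
            throw new InvalidArgumentException('Sub-queries must not contain single quotes');
        }
    }
    $yql = "select * from query.multi where queries='" . implode(';', $queries) . "'";
    return 'http://query.yahooapis.com/v1/public/yql?' . http_build_query(
        [
            'q'   => $yql,
            // Needed because a community table is being used.
            'env' => 'store://datatables.org/alltableswithkeys',
        ],
        '',
        '&',
        PHP_QUERY_RFC3986
    );
}

echo yqlMultiUrl([
    'select * from geo.placemaker where documentContent = "found in Amiens" and documentType="text/plain" and appid=""',
    'select * from geo.places where text="staffordshire moorlands, staffordshire,uk"',
]), "\n";
```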

Concordance with other services

One of the other things I am interested in is finding concordance between WOEID and Geonames places. This is quite easy to do using another geo table. For example, to look up Amiens, Picardie by WOEID:

[HTML]select * from geo.concordance where namespace="woeid" and text="575961"[/html]

Produces:

[XML]
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="1" yahoo:created="2010-09-30T12:41:28Z" yahoo:lang="en-US">
<results>
<concordance xml:lang="en-US"
xmlns="http://where.yahooapis.com/v1/schema.rng"
xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:URI="http://where.yahooapis.com/v1/concordance/woeid/575961">
<woeid>575961</woeid>
<geonames>3037854</geonames>
<locode>FRAMI</locode>
</concordance>
</results>
</query>
[/xml]

So you now have a WOEID and a geonames ID. Amiens WOEID = 575961 and Geonames = 3037854. You can use the geonames id that is produced for linked data; for example: http://ws.geonames.org/rdf?geonameId=3037854
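A minimal sketch of going from the concordance response to that linked data URL is below. The sample XML is cut down from the response above and the function name is hypothetical.

```php
<?php
// Concordance element trimmed from the response above.
$xml = <<<'XML'
<concordance xmlns="http://where.yahooapis.com/v1/schema.rng">
<woeid>575961</woeid>
<geonames>3037854</geonames>
<locode>FRAMI</locode>
</concordance>
XML;

// Hypothetical helper: Geonames RDF URL for a concordance record, or null.
function geonamesRdfUrl(string $xml): ?string
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return null;
    }
    $ids = $doc->children('http://where.yahooapis.com/v1/schema.rng');
    $id  = trim((string) $ids->geonames);
    // Only build a URL when a numeric Geonames ID was actually returned.
    return preg_match('/^\d+$/', $id)
        ? 'http://ws.geonames.org/rdf?geonameId=' . $id
        : null;
}

echo geonamesRdfUrl($xml), "\n";
```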

Problems using YQL for geodata

Even though combining YQL with the power of Geoplanet is awesome, I did run into a few problems. None of these were really insurmountable:

  1. Hit the rate limit constantly – Google’s indexing of our site was causing our server to make too many requests to YQL; fixed by changing the caching model and switching to the OAuth endpoint. I also changed my code to ignore responses when the content type header returned was text/html;charset=UTF-8, as the rate limit page thrown up by Yahoo! is HTML and not an XML response.
  2. Some places were pulled out of text when they were irrelevant – Copper Alloy, Tamil Nadu is one example. Fixed by creating a stop list.
  3. The Geonames API sometimes takes a while to respond and made the application hang – changed the cURL settings.
  4. It took quite a long time to parse 400,000 records – can’t do much about that!
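The first three fixes can be sketched roughly as below. All three function names are hypothetical, and the timeout values are placeholders of my own rather than the settings used on the site.

```php
<?php
// Hypothetical helpers illustrating the fixes in the list above.

// 1. Treat an HTML reply as the rate-limit page and ignore it.
function isRateLimitPage(string $contentType): bool
{
    return stripos(trim($contentType), 'text/html') === 0;
}

// 2. Drop extracted place names that appear on a stop list.
function filterPlaces(array $names, array $stopList): array
{
    $stop = array_map('strtolower', $stopList);
    return array_values(array_filter($names, function ($name) use ($stop) {
        return !in_array(strtolower($name), $stop, true);
    }));
}

// 3. cURL options that stop a slow service from hanging the page.
//    The values are assumptions; pass the array to curl_setopt_array().
function timeoutOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 2, // seconds allowed to connect
        CURLOPT_TIMEOUT        => 5, // seconds allowed for the whole request
    ];
}

var_dump(isRateLimitPage('text/html;charset=UTF-8'));
print_r(filterPlaces(
    ['Wiltshire, England, GB', 'Copper Alloy, Tamil Nadu'],
    ['Copper Alloy, Tamil Nadu']
));
```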

However, I’d really recommend using YQL to extract geodata for your application. Hopefully, Yahoo! will maintain YQL and Geo as integral parts of their business model…. In the future, I would love to run the British Museum collections data through these functions and see what cross-referencing I could find….

Six month review of new website performance

The Crosby Garrett Helmet

The Scheme’s new website has been online now for 6 months and I’ve been looking at the performance and costs incurred during this period. We’ve had several large discoveries since the site went live – the Frome Hoard and the Crosby Garrett Roman Helmet for instance. However, they aren’t typical objects, so we don’t get the big spikes in referral from large news aggregators or providers daily. I’m a little disappointed that web traffic hasn’t grown significantly since we went live with the new site, but we’re still getting long visits with a high number of pages viewed per visit. I’ve worked hard on search engine visibility (apart from a blip in July when I blocked all search engines via a typo in my robots.txt file – as the great Homer says, D’oh!) and we’re now seeing a surge in pages being added to Google’s index (nearly 50% of 400,000 publicly accessible pages are now included according to webmaster tools).

Web statistics

All the web statistics are produced via Google Analytics, I haven’t bothered with the old logfile analysis.  The old stats that we used to return for the DCMS and quoted in our annual reports were heavily reliant on ‘hits’, a metric I always hated.  Some simple observations:

  • We get a trend of heavy weekday usage, with noticeable dips at weekends when recording isn’t as prevalent.

    Typical weekly pattern

  • We don’t get a huge audience, our topic is pretty niche, but hopefully it will keep increasing.
  • Overall visitors average 10mins 59 seconds on site and view nearly 14 pages a visit, with a bounce rate of 37.46%.
  • Those visits that are mainly within the confines of the database module average 21 pages per visit and around 17 mins 17 seconds, with a bounce rate of 23.62%
  • We had significant surges in traffic on the days that Frome and Crosby Garrett were announced (8th July and 14th September)
  • 14% of our users who are stuck with IE use version 6. Guess where the majority of these poor people are based… Government sector offices.
  • 149 countries are represented among our visitors; the top two are the UK (76% of total) and the USA (7%). I assume this is mainly because our subject material is centred on England & Wales. It would be great if we could penetrate the archaeological syllabus in other countries as we have such a mass of data to play with.
  • We have now consolidated our domains down to one so our previous webstats definitely gave a false measure of usage of our resources.
  • We have 66 partner organisations and 3 main funders/hosting organisations; MLA, British Museum and DCMS. We get very little referral traffic from any of these as shown here: BM – 2,495 referrals, MLA – 64 referrals, DCMS – 60 referrals. I think it is a shame a flagship project doesn’t get more click through, but then it is hard to position us higher on these sites as there is so much culture to promote. However, MLA’s description of our project is rather out of date.
  • Google accounts for 45.88% of the originating point for traffic to our site, Yahoo for 0.84% and Bing 0.80%
  • In the 6 month period, we have had 130,235 visits; 1,788,580 page views; 65,531 visitors. In the same period last year (which coincidentally ends with the day that the Staffordshire Hoard was announced), finds.org.uk had 95,902 visits; 514,341 page views; 57,990 visitors, and findsdatabase.org.uk had 48,786 visits; 1,185,537 page views; 17,767 visitors. All these figures are devoid of usage of XML/JSON/KML functions and feeds.
  • A more detailed breakdown can be found in this PDF

New functions

Since launch, we’ve  released lots of new features, all based on Zend framework code:

  • More extensive mining of theyworkforyou for Parliamentary data
  • Heavy use of YQL throughout the website
    • Flickr images pulled in
    • Oauth YQL calls to make use of Yahoo! geo functions
  • Created a load of YQL tables for Museum and heritage website API and opensearch modules
  • Integrated Geoplanet’s data into database backend from their data dump
  • Added old OS maps from the National Library of Scotland (these are great and easy to implement) to most of our maps; for example, search for ‘Sompting’ axeheads and click on ‘historical’
  • Integrated the Ordnance Survey 1:50K dataset for antiquities and Roman sites
  • Integrated the English Heritage Scheduled Monuments dataset (only available to higher level users.)
  • Pulled in data from Amazon for our references (prices, book cover art etc) for example ‘Toys, trifles and trinkets’ by Egan and Forsyth
  • Mined the Guardian API for news relating to the Scheme
  • Created functions for the public to record their own objects and find previously recorded ones easily. This has been quite well received, see Garry Crace’s article on how he found it.
  • Used some semantic techniques (FOAF for example – our contacts page uses this in rdfa)
  • Context switched formats for a wide array of pages across the site
  • Got OAI access working
  • Created extensive sitemaps for search indexing

Database statistics

Some raw statistics of progress with the new database can be seen below:

24,601 records have been created, which document the discovery and recording of 94,978 objects (one hoard of coins adds 52,503 objects alone – so remove these and you get 42,475 objects). We also released functions that allow the public to record their own objects, and this has resulted in the addition of 740 records from 32 recorders. We expect this number to increase following the release of an instructional guide produced by our Kent FLA and FLO (Jess Bryan and Jen Jackson).

Users

User accounts created: 855 with no spam accounts created so far.

  • 2 Finds Adviser status
  • 38 Finds Liaison Officer status
  • 13 Historic Environment Officer status
  • 745 ordinary members
  • 57 Research status accounts

In the previous existence of our database over at findsdatabase.org.uk, we had 1135 accounts created in 7 years.

Research

58 new research projects have been added to our research register with the following levels of activity:

962,601 searches have been performed since relaunch. We’ve had 132 reports of incorrect data being published on our database (undoubtedly, there are more errors, people are just shy!) and 250-ish comments on records. These functions are both protected by reCAPTCHA and Akismet, and we’ve had 5 spam submissions in 6 months.

Contributors of data

943 new contributors have offered data for recording or become involved by recording or researching. I’m tidying up the database so that we can do better analysis of what people use our facility for. We now collect primary activity and postcodes, so that we can do some better statistical analysis.

Running costs for following domains:

www.finds.org.uk
www.findsdatabase.org.uk
www.staffordshirehoard.org.uk
www.pastexplorers.org.uk

Server farm hosting fee: £828
Bandwidth cost for excess load: £234
Remote backup space: £900 (350GB images)
Amazon S3 backup space: £0.27 ($0.42) (for 11GB data transfer of MySQL backups)
Flickr licence: £15.30 ($24)
Get satisfaction account: £36.38 ($57) which I cancelled after 3 months due to the fact it was underused.

Development costs: Covered by my salary, not revealing that.

Total IT cost for running: £2013.95 (around 8p per record or, a more meaningless statistic because of the huge hoard, circa 2p per object)
We plan to reduce this further by switching image backups to S3 as well, or by renegotiating with our excellent providers at Dedipower in Reading. Since the demise of Oxford ArchDigital, we’ve already made IT cost savings of c. £15,000 per annum in support fees, and all development work has been taken on in house.
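For anyone who wants to check the sums, here is a minimal sketch of the arithmetic behind the total and the per-record figures. It uses only the numbers quoted above; the variable names are mine, and the amounts are held in pence so that nothing is lost to floating-point rounding.

```javascript
// Line items from the running costs list, in pence.
const costsInPence = {
  hosting: 82800,
  bandwidth: 23400,
  remoteBackup: 90000,
  s3Backup: 27,
  flickr: 1530,
  getSatisfaction: 3638,
};

const totalPence = Object.values(costsInPence).reduce((sum, p) => sum + p, 0);

const records = 24601; // records created since relaunch
const objects = 94978; // objects covered by those records

const totalPounds = (totalPence / 100).toFixed(2);
const pencePerRecord = Math.round(totalPence / records);
const pencePerObject = Math.round(totalPence / objects);

console.log(`Total: £${totalPounds}`);         // Total: £2013.95
console.log(`Per record: ${pencePerRecord}p`); // Per record: 8p
console.log(`Per object: ${pencePerObject}p`); // Per object: 2p
```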

Hopefully people are finding our new site much more useful, we’ve got more stuff to come….

My experience of self-recording on the database

Well the long awaited new PAS database has landed and users out there in the land of archaeology and historical research are busying themselves adding new artefacts and data mining this fantastic and unique historical resource.  Dan the database builder man has received some well deserved plaudits for his new creation; a work of love if ever there was one, created entirely on a shoestring with fewer servers than Wimbledon in the closed season!

Time has elapsed since the beta launch and users have started to settle in to the day to day interaction using established as well as the new features it has to offer.  One of the principal benefits of the new database is the ability Joe Public now has to self-record any qualifying discoveries made.

I belong to a trio of detectorists called the Sussex Pastfinders. Like many, our ethos is to give identity, location and historical context to what we find, such that landowners, farmers, archaeologists, historians, scholars and the public alike can all benefit from the results. As such we undertake a variety of local history projects, with all our qualifying finds being recorded with the PAS and each project illustrated and formally written up. Whilst the majority of what we find we can identify ourselves, often the finer points elude us, and very occasionally we are closer to clueless. In the peak detecting season we would see our heavily worked FLO starting to drown in the flood of recovered objects from across Sussex. Given our reliance on the PAS to help complete our project reports, and with us working to promised landowner deadlines, we would often create pressure on the system and develop a backlog of finds for recording. The new self-recording feature therefore now enables us to give some assistance in getting our finds landed professionally on the database.

As a self-recorder there is a wide spectrum of find data entry for the individual to get involved in. At its most basic, only the object type and broad period have to be completed before the record can be saved. The system automatically generates a unique “PUBLIC” find number. Keeping this number together with the find when handing it over to the FLO will smooth the recording process considerably and improve efficiency. The only kit required to do this is a computer with web access. Subsequently the FLO will complete the required database fields and “promote” the object to be visible to all. At the high-level end of the self-recording spectrum is the full identification, annotation and illustration of the find. If this is done to the necessary PAS standard in the first instance, then with a ‘rubber-stamp’ of approval from the FLO the find is promoted to full view.

PAS record number: PUBLIC-E92C88
Object type: Scraper (tool)
Broadperiod: Neolithic
County of discovery: East Sussex
Stable url: http://www.finds.org.uk/database/artefacts/record/id/391660

Fig 1: An example record I recorded on the Scheme database.

However, this Full-Monty recording does require access to comprehensive reference material, a balance accurate to a hundredth of a gram, a vernier gauge, and a reasonable macro photography set-up.  There are of course all points in between the two extremes.  The database entry is very intuitive, most of the fields are not free-form but simple to use drop down boxes, and if an error is made there is a simple edit option that allows the record to be corrected.  The arrangement I have with Laura our Sussex FLO is that if there are any gaps or clarification points in a record I have created then a few words of explanation are placed in the notes section of the record.  These are spotted on review, the record adjusted and the discussion notes deleted before it is promoted on the database.

Full Monty recording is a bit of an eye opener as to the required intricacies of find identification and recording procedures.  Clearly in a database of this size with many individuals inputting data, accuracy and consistency are paramount.  Researchers and the like are searching against various entry fields and if the recorders are logging objects differently the search function will not work accurately.  Following a day’s training from Laura I came to realise the conveyor belt of find identification moves at a speed required to ensure standards are upheld; moreover the standards are high and as you would expect professional.  If you don’t know the exact Parish you don’t guess you find out. If you can’t remember the exact object reference, then being vague will not do.  There is also a comprehensive guideline document that is issued to ensure overriding principles are followed and a ‘controlled vocabulary’ is published on the database itself as a reference to help ensure consistency of language.  The job done by our FLOs is sometimes unsung, seeing what’s involved makes you realise that what they do, for many if not all, is a vocation rather than a job, done for love not money – well done guys –much respect.

The biggest challenge I have found in self-recording on the database is the proper description of the object.  The guidelines are there to be followed but people express themselves in different ways.  There is obviously some latitude but the aim is to make the description self-contained.  Having produced a masterpiece of language without disappearing up your own dangling participle, the acid test is that given only the finished written description, could the find be fully understood and interpreted by the reader – it’s tougher to achieve than you think, especially when your spectacle buckles sound like bifocals!

With the two ends of the self-recording spectrum and indeed with all points in between, being able to personally contribute to a national database of this standing is unparalleled. For me there is a satisfaction and pride to be taken in helping to see the whole process through from beginning to end, finishing in the knowledge that prior to your actions an historic prospective find was degrading somewhere in a field, and that perhaps in a few years’ time there would have been nothing more than a stain left in the ground to betray its former presence. Along you came, researched the location avoiding any archaeological sensitivity, land designations or schemes, found out who owned the land, gained their precious permission and that of the tenant farmer, spent hours searching and finally made the find. It seems only fitting then to want to participate in completing the job by fully identifying and recording it. So instead of just an anonymous stain in the ground, a historical object is saved from that oblivion, correctly identified, and has the best possible context restored to it for all to see, access, enjoy, and draw conclusions from.

At the time of writing there are over 700 PUBLIC recorded finds on the database, a number which is growing all the time (you can only view the ones that are promoted to public view – for example this search result). If you would like to become a self-recorder at whatever level then do have a chat with your FLO and come to your own mutually agreeable arrangement as to how to record your finds. If training is required they will, I know, be happy to oblige.

Adding old OS maps to findspot maps

Today on Twitter, David Haskiya alerted me to a set of old Ordnance Survey maps that have been scanned by the National Library of Scotland and turned into the ‘NLS Maps API: Historic map of Great Britain for use in mashups’. These old maps are really useful (they cover England and Wales as well as Scotland!) for the work that our Finds Liaison Officers do, or for researchers using our database; low level phenomenological research can be conducted with them. Their instructions are pretty straightforward to follow and I have now added this layer to our findspot mapping (at the moment just for higher level users). The image below gives an example of the embedded Google Map that we can produce from these OS tiles:

Old Map from NLS

Our maps now have the following layers:

  • Satellite
  • Terrain
  • Openstreetmap
  • Google Earth
  • Basic map
  • Hybrid
  • Historical

To implement this layer, all you need to do is the following (I use jQuery as my JavaScript framework). Firstly, add the JavaScript file that runs their tileserver either within your head tags or before the closing body tag of your HTML document.

[crayon-519c880ed1935/]

Then you need to initiate the layer and add the historical map button and copyright layer:

[crayon-519c880ed1945/]

You will then need to add the map type to your mapping script by adding the following javascript:

[crayon-519c880ed1952/]

So, for example, my code for running our map looks like the below (I add this before the closing body tag, using Zend Framework’s inlineScript syntax within my PHP script):

[crayon-519c880ed195e/]

So really simple to integrate and get running on your site.

Another milestone reached

On the 26th July 2010, the Scheme recorded the 400,000th record on the database: another Roman coin, this time a nummus of the House of Constantine. We had an internal challenge, with the Deputy Head down to buy the person who recorded this object a bottle of sparkling wine. The landmark object is shown below and was recorded by Tom Brindle, our acting FLO for Staffordshire and the West Midlands.

PAS record number: WMID-D6D183
Object type: Coin
Broadperiod: Roman
County of discovery: Shropshire
Stable url: http://www.finds.org.uk/database/artefacts/record/id/400298

Several FLOs expressed dismay that the object was a Roman coin and a metal detector find; I think they were hoping for a lithic or something else found by a fieldwalker for a change… However, coins and metal detectorists are the best represented on our database….

Records   Finds recorded   Year of recording
3476      4588             1998
6128      8201             1999
11323     18106            2000
11481     16368            2001
8164      11996            2002
14657     21684            2003
26383     39000            2004
33919     52202            2005
37502     58311            2006
49308     79052            2007
37455     56449            2008
39981     66481            2009
112893    190091           2010

You might wonder why these figures don’t always match the Annual Reports; well, the database is constantly being worked on, errors corrected, duplicate records removed and so on. There are some blips in the figures: 2002, for example, was hit by foot and mouth; in 2003 the Scheme went national and we phased in our new database; and in March 2010 we imported 2 large datasets from IARCW and CCI (and you might have heard about the 52,503 coins found in Somerset, recorded in April, which count as only 1 record). However, the 2010 figures are encouraging when you look at the statistics for recording since we went live with our new database (shown below with a comparison to the same period in 2009).

Statistics for 2009

Records   Objects   Month
3638      4395      1
2694      5410      2
2842      3414      3
3191      6284      4
3768      5229      5
3307      4429      6
3152      3819      7

Statistics for 2010

Records   Objects   Month
4290      12274     1
3509      5526      2
88596     90380     3
4191      57775     4
3957      5255      5
4490      14518     6
3860      4363      7

Using our data to place a google map on your own site (without the api)

This post is just a short overview of how you can get our data onto your website without being uber-geeky and knowing how to play with our Applications Programming Interface (API – more on this over the next month or so.)  The Scheme’s website can now serve up various different flavours of content by means of context switching. You can now get:

  1. RSS
  2. ATOM
  3. XML (finds lists and searches are returned in MIDAS format, other pages just plain XML responses)
  4. JSON
  5. KML
  6. CSV

Finding out which versions of the content you can retrieve for a page is pretty simple. If you scroll towards the foot of any page on our website, look for the text:

This page is available in: {contexts available} representations.

This makes use of the Zend Framework context switch parameter, format. So any URL that has an alternative representation just needs format/{context} appended. So, for example, if you want to view all finds for Essex in ATOM format, you would call this URL:

http://www.finds.org.uk/database/search/results/county/ESSEX/format/atom
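If you want to generate these addresses in a script rather than by hand, the pattern can be sketched as below. This is only an illustration based on the example URLs in this post: the helper name buildSearchUrl is hypothetical, and the upper-casing and "+" for spaces simply mirror how the example URLs look rather than any documented rule.

```javascript
// Base address of the search results page, as it appears in the post.
const SEARCH_BASE = "http://www.finds.org.uk/database/search/results";

// Hypothetical helper: turn search parameters plus a context name into a
// key/value path that ends in format/{context}.
function buildSearchUrl(params, format) {
  const segments = [];
  for (const [key, value] of Object.entries(params)) {
    // Assumption: values are upper-cased and spaces become "+",
    // as in the example URLs.
    segments.push(key, String(value).toUpperCase().replace(/ /g, "+"));
  }
  segments.push("format", format.toLowerCase());
  return `${SEARCH_BASE}/${segments.join("/")}`;
}

console.log(buildSearchUrl({ county: "Essex" }, "atom"));
// http://www.finds.org.uk/database/search/results/county/ESSEX/format/atom
```

Swapping "atom" for "kml", "json" or any of the other contexts listed above gives the matching representation, provided the page offers it.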

You can now use this output within your own site using simple software tools such as widgets, simplepie etc. However, what is probably of more interest to many people is getting a map of objects found locally to them. So, for example, you run a parish council and you want all objects found in the district. Let’s try my home district of South Cambridgeshire. Go to our advanced search facility, scroll to the bottom, choose county as Cambridgeshire and district as South Cambridgeshire, then submit the form and wait a second for the search to complete.

Now that the results are there, look at the page foot for the representations available and you’ll see the letters KML. If you click on this, you can now get data in the format that can be used in many online mapping programmes and Google Earth. So if you want to see this on the map, copy the URL generated; in this case:

http://www.finds.org.uk/database/search/results/county/CAMBRIDGESHIRE/district/SOUTH+CAMBRIDGESHIRE/format/kml

Now head over to http://maps.google.com.

In the search bar, paste the URL that you copied and press search.

Google maps search bar with url pasted in

The map should now change to show pins for degraded findspot locations. These pins are only provided when the ‘to be known as’ field has not been filled in and the actual points are taken from the 1km grid reference (4 figure). So the map should now render like the below image:

Google map generated from the KML

Now you have generated this map, you can either grab the link for the map and send it directly to someone, or grab the HTML code to embed the map into a webpage. Look in the top corner of the map for the control labelled embed and click this; a layer then appears which looks like the image below:

Link box from google

As this post deals with embedding the map on your own webpage, it is assumed that you can enter raw HTML directly. Copy the text which is contained in the box labelled “Paste HTML to embed in website”. This looks like:
[sourcecode]
<iframe width="425" height="350" frameborder="0" scrolling="no" marginheight="0" marginwidth="0"
src="http://maps.google.com/maps?f=q&amp;source=s_q&amp;hl=en&amp;geocode=&amp;q=http:%2F%2Fwww.finds.org.uk%2Fdatabase%2Fsearch%2Fresults%2Fcounty%2FCAMBRIDGESHIRE%2Fdistrict%2FSOUTH%2BCAMBRIDGESHIRE%2Fformat%2Fkml&amp;sll=37.0625,-95.677068&amp;sspn=47.885545,114.169922&amp;IE=UTF8&amp;ll=52.257917,-0.000189&amp;spn=0.72983,0.782087&amp;iwloc=lyrftr:kml:cF4oaez0SXhtHIuPUpXMoJUR9uPk2SiORITteHHHGjS0fvow5su0kSjIVdHy4TwDOfCcxM4bseHHEGTe2fPgy5si2VKsJEzIMAg,gf42aba810981b24d,52.065156,0.171661,0,-32&amp;output=embed"></iframe>
<br /><small>
<a href="http://maps.google.com/maps?f=q&amp;source=embed&amp;hl=en&amp;geocode=&amp;q=http:%2F%2Fwww.finds.org.uk%2Fdatabase%2Fsearch%2Fresults%2Fcounty%2FCAMBRIDGESHIRE%2Fdistrict%2FSOUTH%2BCAMBRIDGESHIRE%2Fformat%2Fkml&amp;sll=37.0625,-95.677068&amp;sspn=47.885545,114.169922&amp;IE=UTF8&amp;ll=52.257917,-0.000189&amp;spn=0.72983,0.782087&amp;iwloc=lyrftr:kml:cF4oaez0SXhtHIuPUpXMoJUR9uPk2SiORITteHHHGjS0fvow5su0kSjIVdHy4TwDOfCcxM4bseHHEGTe2fPgy5si2VKsJEzIMAg,gf42aba810981b24d,52.065156,0.171661,0,-32" style="color:#0000FF;text-align:left">View Larger Map</a>
</small>
[/sourcecode]

Once you have pasted this code into your webpage and saved it (and, if you aren’t using a content management system, uploaded it to your website), the map will be embedded as shown below:

View Larger Map

In the infowindow bubbles that come up when you click on a findspot location, you will see this text:

This findspot has been produced from the 4 figure reference. It is not the precise findspot.

As mentioned above, due to findspot security/landowner privacy, and an agreement we have with the major body that gives us artefact spatial information, we cannot publish co-ordinates publicly at a precision greater than parish or 1km square (4 figure grid reference), and we also hold back from view finds that have had the “to be known as” field completed. Therefore, the map you get from this is not 100% accurate! This is not something we can change.
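To show what degrading to a 4 figure reference means in practice, here is a minimal sketch. It is not the code that runs on our site; the function name toFourFigure is hypothetical and the 10 figure input is an invented example, chosen so that it falls in the TL1295 square.

```javascript
// Hypothetical helper: reduce a National Grid Reference to its 4 figure
// (1km square) form, e.g. "TL 12345 95678" -> "TL1295".
function toFourFigure(gridRef) {
  const compact = gridRef.replace(/\s+/g, "").toUpperCase();
  const match = /^([A-Z]{2})(\d+)$/.exec(compact);
  if (!match || match[2].length % 2 !== 0 || match[2].length < 4) {
    throw new Error(`Not a usable grid reference: ${gridRef}`);
  }
  const [, letters, digits] = match;
  const half = digits.length / 2;
  // Truncate (never round) so the reference stays inside its own square.
  const easting = digits.slice(0, 2);
  const northing = digits.slice(half, half + 2);
  return letters + easting + northing;
}

console.log(toFourFigure("TL 12345 95678")); // TL1295
```

The digits are cut rather than rounded because a grid reference names the south-west corner of a square, so rounding up could move a find into the neighbouring square.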

A couple of weeks ago, we sent a mailshot out to all MPs for England and Wales, detailing how they could get finds for their constituency onto their own webpages. This is done in exactly the same way as the above, and constituency finds feeds can be obtained from the news section of the website (powered by YQL calls to the theyworkforyou API) at:

http://finds.org.uk/news/theyworkforyou/constituencies

Two examples with finds in their constituencies are the coalition leaders (the Roman coin hoard from Frome, announced on the 8th July, had a coalition type coin inside). David Cameron’s constituency of Witney shows this map:

And Nick Clegg’s Sheffield Hallam constituency shows this map:

Once geoRSS is enabled and working properly, you can also do the above using any of the feeds for finds where the context switch called is ATOM. This will be done by the middle of next week, alongside ATOM paging.

Ordnance Survey 1:50 000 Gazetteer import and reuse

OS opendata logo

On April Fool’s day, the Ordnance Survey opened up its data for people to reuse with fewer restrictions applied. Place is at the heart of everything we do, and is perhaps the most important element. The Scheme uses National Grid References and place names that you would find on an OS map. The things that these maps depict often inform where people discover objects; they represent habitation in a past and present form, sometimes concurrently, sometimes from antiquity. The last word in that sentence is the key to why I wanted to use the 1:50 000 data in our web application (our database). Two categories of place are defined within this dataset:

  • Roman Antiquity – of which there are 237 instances
  • Antiquity – of which there are 5252 instances

If you download the 1:50k dataset from the Ordnance Survey or from the mysociety cache (remember they are a charity, so don’t abuse their servers), there is a document that outlines what the fields mean in the dataset. The important one here was the f_code or feature code column. The data is available via SPARQL (see Leigh Dodds’ article on this), but I wanted to keep a local copy of this data on my server so that I could use it and transform it for some other tasks. After downloading and unzipping onto our server (placing it into the /tmp folder will save you getting error 13 codes with mysqlimport later), I then created the following MySQL table:

[crayon-519c880edf83d/]

I then needed to import this data, which I accomplished using the mysqlimport command in my terminal as below (I use PuTTY at work and the OS X terminal at home). Note that I renamed the gazetteer data file to the same name as the MySQL table and retained the txt extension. Fields are delimited by a colon and there is no header row.

[crayon-519c880edf84d/]

The import should run through and insert 259080 rows of data. Even though I am only interested in the antiquity type fields, I have imported the lot in case I want the rest later. Now I have the data installed, I can manipulate it and use it in the way that I want. If you know your grid references, then it is apparent that the data presents at just 1KM square resolution; this is the maximum precision level to which we publicly display our find spots, so it will tie in quite nicely to the public display of information.
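If you only want the antiquity rows and would rather filter the file before it goes anywhere near MySQL, something like the sketch below would do it. It is an illustration, not part of my import: the function name is hypothetical, the position of the feature code column and the code values are passed in as parameters because you should take them from the documentation that ships with the dataset, and the sample rows are invented.

```javascript
// Split colon-delimited rows and keep those whose feature code is wanted.
// fCodeIndex and wantedCodes are assumptions supplied by the caller.
function filterByFeatureCode(text, fCodeIndex, wantedCodes) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "") // the file has no header row to skip
    .map((line) => line.split(":"))
    .filter((fields) => wantedCodes.includes(fields[fCodeIndex]));
}

// Invented four-column rows (id, 1km reference, name, feature code),
// purely for illustration; the real file has more columns.
const sample = [
  "1:TL1295:Example Roman Town:R",
  "2:TL4458:Example Village:O",
].join("\n");

const antiquities = filterByFeatureCode(sample, 3, ["R", "A"]);
console.log(antiquities.map((fields) => fields[2])); // [ 'Example Roman Town' ]
```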

However, I wanted decimal degrees for the latitude and longitude within my table. I therefore added two new columns to the MySQL table, latitude and longitude (DOUBLE), and then used a PHP function to convert the 1km square grid reference into lat/lon values. I’ve also done some further manipulation to get the WOEID for the imprecise, centred 1km grid reference, which gives enhanced geographical data.
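The first half of that conversion, turning the letters and digits of a 4 figure reference into a numeric easting and northing for the centre of the square, can be sketched as follows. This is not my PHP function, just an illustration of the standard letter arithmetic, and the function name is hypothetical. Going on from easting/northing to latitude and longitude needs a Transverse Mercator inverse and a datum shift, which is deliberately not shown here.

```javascript
// Convert a 4 figure grid reference such as "TL1295" to the easting and
// northing (in metres) of the centre of its 1km square.
function gridRefToCentre(gridRef) {
  const ref = gridRef.replace(/\s+/g, "").toUpperCase();
  if (!/^[A-HJ-Z]{2}\d{4}$/.test(ref)) {
    throw new Error(`Expected a 4 figure grid reference, got: ${gridRef}`);
  }
  // Positions in the 25-letter grid alphabet (A-Z with I left out).
  let l1 = ref.charCodeAt(0) - 65;
  let l2 = ref.charCodeAt(1) - 65;
  if (l1 > 7) l1 -= 1;
  if (l2 > 7) l2 -= 1;
  // 100km square indices counted from the false origin (square SV).
  const e100km = ((l1 + 3) % 5) * 5 + (l2 % 5);
  const n100km = 19 - Math.floor(l1 / 5) * 5 - Math.floor(l2 / 5);
  const eKm = Number(ref.slice(2, 4));
  const nKm = Number(ref.slice(4, 6));
  // Add 500m to each axis to move from the corner to the centre.
  return {
    easting: e100km * 100000 + eKm * 1000 + 500,
    northing: n100km * 100000 + nKm * 1000 + 500,
  };
}

console.log(gridRefToCentre("TL1295")); // { easting: 512500, northing: 295500 }
```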

Now the grid reference has been converted into decimal values, I can now plot these quickly onto a Google Map or use mathematical formulae to get distance from a point; for example the Haversine.

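d = R \, \operatorname{haversin}^{-1}(h) = 2 R \arcsin\left(\sqrt{h}\,\right)

where d is the distance between the two points, R is the radius of the sphere (here, the Earth) and h = \operatorname{haversin}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \operatorname{haversin}(\lambda_2 - \lambda_1), for latitudes \varphi_1, \varphi_2 and longitudes \lambda_1, \lambda_2 in radians.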

There’s some very good discussion and code available on these resources, so there’s no point reinventing the wheel:

  • Haversine formula can be obtained from various sources, and this page lists 9 scripting language variants.
  • Vincenty formula can be found on the Movable Type script pages (via @codepo8).
  • More detailed explanation can be found at the Seventh Sense blog
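For readers who just want to see the shape of the calculation, here is a minimal JavaScript sketch of the Haversine distance, using the identity haversin(θ) = sin²(θ/2). It is not the model I use on the database; the function names are mine and the Earth radius of 6371km is the usual mean-radius approximation.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, an approximation

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in kilometres between two lat/lon points
// given in decimal degrees.
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// One degree of latitude is roughly 111km.
console.log(haversineKm(52.0, 0.0, 53.0, 0.0).toFixed(1)); // 111.2
```

At the distances we care about (a few kilometres around a findspot) the Haversine is plenty accurate; Vincenty only earns its extra complexity over much longer distances or where sub-metre precision matters.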

For the Scheme’s database, I wanted to work out whether objects were close to an antiquity or Roman antiquity, or whether an MP’s constituency or a district has them within its bounding box (as found from querying theyworkforyou’s API – more on that later). As I use Zend Framework, I created a model and then used a view helper to render data onto a finds record. If you’re interested in my code for this, you’re welcome to have it… this one uses the Haversine and is just the model that generates a MySQL query against my database table.

[crayon-519c880edf85c/]
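The bounding box test mentioned above is simpler than the distance calculation and can be sketched in a few lines. This is an illustration only: the property names (south, west, north, east) are my own and are not the field names returned by theyworkforyou, and the box in the example is invented.

```javascript
// Is a point inside a bounding box given by its south-west and north-east
// corners in decimal degrees? Property names are illustrative.
function isInsideBoundingBox(point, box) {
  return (
    point.lat >= box.south &&
    point.lat <= box.north &&
    point.lon >= box.west &&
    point.lon <= box.east
  );
}

// Invented box, for illustration only.
const exampleBox = { south: 52.4, west: -0.5, north: 52.7, east: -0.2 };

console.log(isInsideBoundingBox({ lat: 52.56, lon: -0.36 }, exampleBox)); // true
console.log(isInsideBoundingBox({ lat: 51.5, lon: -0.12 }, exampleBox));  // false
```

A bounding box is only a rough filter, since a constituency is rarely a rectangle, but it is a cheap first pass before any more exact check. This simple version also ignores boxes that cross the 180° meridian, which is not a concern for England and Wales.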

Below is a record of an object from the controversial Water Newton rally, which was near the site of the Roman town of Durobrivae, near Chesterton in Cambridgeshire [WOEID: 39263, 1KM NGR - TL1295, Lat: 52.561508 Lon: -0.364190].

Alert for Durobrivae

You’ll see that there are two Scheduled Monument Alerts – there are actually 5 entries in the National Monuments Record for this SAM – and there is a 1:50k OS alert for a Roman antiquity. These SAM alerts aren’t shown to users below ‘research’ level. I can then click through to find all records associated within a certain distance of this OS point, and map them if I have at least ‘Research’ user rights on our database. In future, I will try and link these place names through to other linked data resources. By tying them to a WOEID, I can find archaeological photos on Flickr for example.

I don’t think that this breaches the OS licence and there are probably other ways to accomplish this in php, I just dabble in code, so don’t rip me to shreds….

Access levels and what you can view

Following our Portable Antiquities Advisory Group meeting, I was asked what levels of detail people are privy to on the Scheme’s database. The below outlines what these account levels can do/see and what geo information is displayed.

Public user – not logged in

The public user level is the most basic of all our levels of access. This gives you access to:

  • Finds awaiting validation (denoted by the yellow flag)
  • Finds that have been validated and published by our finds advisers (denoted by green flag)
  • Low level mapping
    • no dots on maps
    • findspot to 1km grid square level and slight obfuscation of findspot by randomised subtraction/addition of 10ths of a degree to the degraded findspot
    • limited zoom level.
  • No access to personal data
  • Can add comments but has to fill in reCaptcha
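The randomised shift applied to the degraded findspot can be pictured with the sketch below. It is not the routine that runs on the site: the function name, the default maximum offset and the injectable random source are all assumptions made so that the idea can be shown and tested.

```javascript
// Shift a coordinate pair by a random amount of up to maxOffset degrees in
// either direction. The random source can be swapped out for testing.
function obfuscate(point, maxOffset = 0.1, random = Math.random) {
  const jitter = () => (random() * 2 - 1) * maxOffset;
  return { lat: point.lat + jitter(), lon: point.lon + jitter() };
}

const shifted = obfuscate({ lat: 52.5, lon: -0.36 });
console.log(shifted); // a different nearby point on every call
```

Because the offset is drawn afresh each time, the published point cannot simply be reversed to recover the 1km square centre it started from.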

Registered user – most basic level of login

  • Finds awaiting validation (denoted by the yellow flag)
  • Finds that have been validated and published by our finds advisers (denoted by green flag)
  • They can create their own records of their objects and get full mapping capabilities for these objects only, which is an enhancement over the low level mapping below.
  • Low level mapping
    • no dots on maps,
    • findspot to 1km grid square level and slight obfuscation of findspot by randomised subtraction/addition of 10ths of a degree to the degraded findspot,
    • limited zoom level.
  • Cannot see maps or retrieve finds by parish for any record with the findspot form’s “to be known as” field completed
  • No access to personal data
  • Can add comments without having to fill in reCaptchas
  • Can add/edit their own records
  • Can save searches

Researchers

  • Finds awaiting validation (denoted by the yellow flag)
  • Finds that have been validated and published by our finds advisers (denoted by green flag)
  • Cannot view finds that are still in progress (quarantine/review)
  • As above, they can create their own records of their objects and get full mapping capabilities for these objects.
  • High level mapping
    • findspot plotted with a dot on the map
    • full precision for findspot
    • Flickr shapefile outline for parishes
    • Access to Scheduled Ancient Monument proximity search
    • Full zoom capabilities
  • No access to personal data
  • Can add/edit their own records
  • Enhanced spreadsheet downloads

Historic Environment Officers

  • Finds awaiting validation (denoted by the yellow flag)
  • Finds that have been validated and published by our finds advisers (denoted by green flag)
  • Cannot view finds that are still in progress (quarantine/review)
  • As above, they can create their own records of their objects and get full mapping capabilities for these objects.
  • High level mapping
    • findspot plotted with a dot on the map
    • full precision for findspot
    • Flickr shapefile outline for parishes
    • Access to Scheduled Ancient Monument proximity search
    • Full zoom capabilities
  • No access to personal data
  • Can add/edit own records
  • Enhanced spreadsheet download
  • Special download of csv for import into exeGesis HBSMR (if you don’t know what that is, don’t worry!)

Treasure & Finds Liaison Officers

  • Finds in quarantine – records that need more data (reminds them to do so!)
  • Finds on review – current working versions
  • Finds awaiting validation (denoted by the yellow flag)
  • Finds that have been validated and published by our finds advisers (denoted by green flag)
  • As above, they can create their own records of their objects and get full mapping capabilities for these objects.
  • High level mapping
    • findspot plotted with a dot on the map
    • full precision for findspot
    • Flickr shapefile outline for parishes
    • Access to Scheduled Ancient Monument proximity search
    • Full zoom capabilities
  • Full access to personal data
  • Can add/edit own records
  • Can edit records made by member, HERO and research users
  • Can edit any records they made when working in other counties
  • Can edit records made by anyone at their institution
  • Enhanced spreadsheet download
  • Special download of csv for import into exeGesis HBSMR

Finds Advisers

  • Finds in quarantine – records that need more data (reminds them to do so!)
  • Finds on review – current working versions
  • Finds awaiting validation (denoted by the yellow flag)
  • Finds that have been validated and published by our finds advisers (denoted by green flag)
  • As above, they can create their own records of their objects and get full mapping capabilities for these objects.
  • High level mapping
    • findspot plotted with a dot on the map
    • full precision for findspot
    • Flickr shapefile outline for parishes
    • Access to Scheduled Ancient Monument proximity search
    • Full zoom capabilities
  • Full access to personal data
  • Can add/edit own records
  • Can edit any records created by any user and can publish finds
  • Enhanced spreadsheet download
  • Special download of csv for import into exeGesis HBSMR
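The levels above boil down to a small lookup table, which can be sketched as below. The role keys and property names are my own shorthand for this illustration and are not the identifiers used in the site’s code; the admin level is left out because its permissions are not described here.

```javascript
// Summary of the access levels described above. Keys are illustrative.
const ACCESS = {
  public:   { mapping: "low",  personalData: false, editOthers: false },
  member:   { mapping: "low",  personalData: false, editOthers: false },
  research: { mapping: "high", personalData: false, editOthers: false },
  hero:     { mapping: "high", personalData: false, editOthers: false },
  flos:     { mapping: "high", personalData: true,  editOthers: true },
  fa:       { mapping: "high", personalData: true,  editOthers: true },
};

// Which mapping level applies to a given role viewing a given record?
function mappingLevel(role, ownsRecord = false) {
  const access = ACCESS[role];
  if (!access) throw new Error(`Unknown role: ${role}`);
  // Logged-in users get full mapping for records they created themselves.
  if (ownsRecord && role !== "public") return "high";
  return access.mapping;
}

console.log(mappingLevel("member"));       // low
console.log(mappingLevel("member", true)); // high
console.log(mappingLevel("research"));     // high
```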

Admin

Now that would be telling.

Adding records to our database as a registered member

The Scheme’s database has changed significantly since it went live in its original format in 1999. It is now possible for all users to add and edit their own finds (descriptive, spatial, numismatic, reference and visual details) and add to this country’s archaeological record of public discovery. Adding your own ‘finds’ to our database is relatively straightforward and this post outlines how to do it. As with a few other features of this site, you need to get a few things in place before it works properly for you!

So how do you record?

  1. Register for a user account on our site (or if you already have an account and haven’t logged in since 21st March 2010, reset your password). We need to have you registered for auditing changes and notifications etc. Your personal details won’t be sold or divulged to evil marketing companies or anyone else who hasn’t signed up to our T&C.
  2. Contact your local Finds Liaison Officer and talk to them about self-recording your objects (we have a strict vocabulary for data entry and there’s some things you might like explained before proceeding). We have to use strict terminology to ensure that things can be found easily and that we can interoperate with other people’s databases.
  3. You can only record your own finds as we can’t divulge other people’s details under the Data Protection Act (sorry!) Once we link your personal details to your account, you can see your own records easily and your name gets appended to records created by you automatically.
  4. If you have a Treasure object, we would rather that this is reported directly to the FLO for recording so that all the steps the law requires are followed and no confusion arises (sorry!)
  5. Once you have spoken to your FLO, you can happily record away! So keep reading.

Adding a find’s basic information

Find form interface screen capture

  1. Once you have logged in, look for the button labeled “Add a new object” (or “artefact” in some places) on either your home screen or on the artefact listers, and click on it.
  2. Now you can fill in the data for your find. Many fields are strictly controlled by a driven vocabulary – for example, object type auto-completes and others are select driven drop-downs. Most are pretty obvious! Just follow the labels to the left of each form control.
  3. Out of all the fields, the only compulsory ones are object type and broadperiod, so you can start records and return to them later. However, we want really complete records with as much information as you can give (you can edit later of course).
  4. Once you have filled in your form, press submit and you will be taken to your record and you can now add extra bits. We haven’t adopted multi page forms as you might not have all details at hand and we’re trying to make it all very simple…
  5. We also use FCKEditor and HtmlPurifier to ensure valid HTML in the data that you enter. We’ll strip out a wide variety of tags generated by Word if you paste from there and also remove curly quotes etc. If you are interested (which you probably aren’t), we store your text in UTF-8.

Adding numismatic data

Screen capture of numismatic interface

  1. You now have a choice of which bits of data you want to add to the record (if you have entered a coin, you can add numismatic data) and for this example we’re assuming you are entering a Roman coin. So to add numismatic data, look for the link entitled “add numismatic data”. Click on this.
  2. This step is driven by logic determined by the denomination type you have. If you choose a denomination, we set in motion a series of cascaded or linked dropdowns.
  3. Once you have chosen a denomination, then choose a ruler from the list that is generated (you can’t enter a ruler that doesn’t exist for a denomination type!)
  4. After a ruler has been chosen, the cascade sets in motion again and configures the mint, moneyer (only available for Republican coins), reverse type (only available for 4th Century coins) and Reece period. Choose the correct option for your coin if you can fill it in. If not, leave it blank.
  5. Enter any information for reverse/obverse inscription/description
  6. Choose die axis measurement and status options
  7. Now save your data and you will return to the record you have created.
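
The cascade described in the steps above can be pictured as a chain of lookups, where each choice narrows the next list. The sketch below is only an illustration in Python and is not the code the site runs; the table contents and the function names (rulers_for, mints_for) are made up for the example, and the real lists live in our database.

```python
# Hypothetical lookup tables, for illustration only.
RULERS_BY_DENOMINATION = {
    "denarius": ["Augustus", "Hadrian"],
    "nummus": ["Constantine I"],
}
MINTS_BY_RULER = {
    "Augustus": ["Rome", "Lugdunum"],
    "Hadrian": ["Rome"],
    "Constantine I": ["London", "Trier"],
}


def rulers_for(denomination):
    """Return the rulers offered once a denomination has been chosen."""
    return RULERS_BY_DENOMINATION.get(denomination, [])


def mints_for(denomination, ruler):
    """Return the mints offered for a ruler, refusing invalid pairings."""
    if ruler not in rulers_for(denomination):
        raise ValueError(f"{ruler!r} is not a valid ruler for {denomination!r}")
    return MINTS_BY_RULER.get(ruler, [])
```

Because the second lookup checks the first, a ruler that doesn’t exist for a denomination can never be combined with it, which is the behaviour step 3 describes.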

Adding spatial data

Findspot data capture form

Provenance is vital for the study of stray archaeological finds. The majority of objects we record will have little or no archaeological context and are found in the plough soil, but their spatial co-ordinates may well tell you more about the area’s archaeology. The higher the precision you give the Scheme for your findspots, the better the research that academics and lay researchers can do with these data. The form for recording the spatial data is again pretty straightforward and you have the option to hide sections from public view (comments, address, postcode, all co-ordinates). The below outlines how to enter the spatial information for a findspot (each find can only have one!)

  1. All objects are attached to a named place. We use the Ordnance Survey’s place name data, so we have an array of data to choose from (Euro-region, County, District, Parish). These place name drop-down lists are also cascaded, so start by choosing your county and then follow the dropdown choices as presented.
  2. To hide the data entered in step 4 from the public, you can enter a pseudonym in the “known as” box (be sensible about it :) )
  3. If you have an address and postcode for the findspot, please fill these in (these never get displayed to the public or research user).
  4. Now we need to get the co-ordinates for the findspot. If you don’t have a provenance for the find, we’d rather it wasn’t recorded as it doesn’t add to our useful archaeological record. If possible, record to a higher precision than a 4 figure grid reference (that is, better than 1km square precision); a 4 figure reference is the maximum level of precision at which we’ll publish data online to the public user. We use the National Grid reference system to place our objects onto a map, and the reference you enter is transformed into the following after saving:
    1. Easting
    2. Northing
    3. 4 figure grid reference
    4. Latitude and longitude pair
    5. Elevation on landscape
    6. 1:25K map
    7. 1:10K map
    8. A Yahoo! Where on Earth ID or WOEID for cross referencing against their database (you can see this in action via the adjacent places displayed on the findspot section) and other services that may use their identifier system.
  5. After filling in this section you can tell us about the landuse types and any comments or descriptions needed about the findspot.
  6. Now save your data!
  7. This returns you to the record, where you will now see a map of your findspot and the data that you have entered. You’ll also see any data we’ve managed to retrieve on that area from Yahoo! – a Flickr shapefile for the parish (if available), adjacent places (from the geoplanet database) and postcode etc. More is planned for this section, news on that later!
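
For those who like to see the workings, the first few transformations in step 4 can be sketched in a few lines of Python. This is a minimal illustration of the standard National Grid letter arithmetic and not the code our site runs; the function names are invented for the example. It only covers easting, northing and the 4 figure reference. Latitude and longitude need a datum transformation on top of this, which isn’t shown.

```python
# Letters used by the National Grid; "I" is never used.
GRID_LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXYZ"


def split_grid_reference(gridref):
    """Split 'TQ 3012 8045' into ('TQ', '30128045'), checking its shape."""
    ref = gridref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or any(c not in GRID_LETTERS for c in letters):
        raise ValueError(f"bad grid letters in {gridref!r}")
    if len(digits) % 2 or len(digits) > 10:
        raise ValueError(f"bad number of digits in {gridref!r}")
    if any(c not in "0123456789" for c in digits):
        raise ValueError(f"bad grid digits in {gridref!r}")
    return letters, digits


def to_easting_northing(gridref):
    """Return (easting, northing) in metres for the square's south-west corner."""
    letters, digits = split_grid_reference(gridref)
    first = GRID_LETTERS.index(letters[0])
    second = GRID_LETTERS.index(letters[1])
    # Each letter indexes a 5 x 5 block; the false origin is the corner of SV.
    east_100km = ((first - 2) % 5) * 5 + second % 5
    north_100km = 19 - (first // 5) * 5 - second // 5
    if not (0 <= east_100km <= 6 and 0 <= north_100km <= 12):
        raise ValueError(f"{gridref!r} is outside the National Grid")
    half = len(digits) // 2
    # Pad on the right so a short reference means the corner of its square.
    easting = east_100km * 100_000 + int(digits[:half].ljust(5, "0"))
    northing = north_100km * 100_000 + int(digits[half:].ljust(5, "0"))
    return easting, northing


def to_four_figure(gridref):
    """Degrade a reference to the 4 figure (1km square) form shown publicly."""
    letters, digits = split_grid_reference(gridref)
    half = len(digits) // 2
    if half < 2:
        raise ValueError(f"{gridref!r} is already coarser than 4 figures")
    return f"{letters}{digits[:2]}{digits[half:half + 2]}"
```

With this sketch, to_easting_northing("TQ 3012 8045") gives (530120, 180450) and to_four_figure("TQ 3012 8045") gives "TQ3080". Note that the degraded reference is produced by truncating the digits rather than rounding them, so the find always stays inside the 1km square it was found in.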

Adding an image or images

A screen capture of the image interface

A visual record of the object is really important for research of the object (for many researchers it is often more important than the findspot!) Adding an image is quite straightforward and we add one at a time to keep it simple. We do suggest naming your files sensibly, avoiding non-alphanumeric characters and removing spaces (replace them with an underscore or hyphen, or camel case the filename).
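
If you have a lot of files to rename, the naming advice above is easy to automate. The snippet below is a small Python sketch of one way to do it (spaces become underscores, anything that isn’t a letter, digit, underscore or hyphen is dropped); the function name tidy_filename is invented for the example and this is not something our site does for you.

```python
import os
import re


def tidy_filename(filename):
    """Return a filename with spaces and awkward characters removed."""
    stem, extension = os.path.splitext(filename)
    # Runs of whitespace become a single underscore.
    stem = re.sub(r"\s+", "_", stem.strip())
    # Drop anything that is not a letter, digit, underscore or hyphen.
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    return stem + extension.lower()
```

For example, tidy_filename("Roman coin (obverse).JPG") returns "Roman_coin_obverse.jpg".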

To add an image do the following:

  1. Look for the add an image link
  2. After clicking on this, you’ll see a form with several fields. Click on the ‘choose’ button to find the image that you want to attach (it must be under 6MB and we would rather that you uploaded a high resolution JPEG or TIFF image). If your filename already exists, we’ll tell you, and likewise if it is an invalid filetype.
  3. When you get to the image label box, refer to the image labeling document produced by our Finds Advisers for the correct methodology that we want to adhere to.
  4. Choose your county, copyright, period and image type (your default copyright can be set from the edit account link under your home area. Set it and then log out and back in so that the session picks up the default.)
  5. Then submit the new image
  6. If everything works okay, you’ll then get redirected back to the record and you can add a new image if needed
  7. This process generates:
    1. Thumbnail
    2. Small derivative
    3. Display derivative
    4. Medium derivative (used for the lightbox overlays)
    5. A Zoomify derivative
    6. Original image
  8. All users can download the original image – we share everything on this site!
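
The checks mentioned in step 2 (size, filetype and duplicate filename) could look something like the Python sketch below. It is a guess at the shape of the logic rather than our actual upload code: the list of allowed extensions, the reading of “6MB” as 6 × 1024 × 1024 bytes and the function name check_upload are all assumptions made for the example.

```python
import os

MAX_BYTES = 6 * 1024 * 1024  # assumption: "6MB" read as 6 MiB
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}  # assumed list


def check_upload(filename, size_in_bytes, existing_filenames):
    """Return a list of problems with an upload; an empty list means it is fine."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("invalid filetype")
    if size_in_bytes >= MAX_BYTES:
        problems.append("file too large")
    if filename in existing_filenames:
        problems.append("filename already exists")
    return problems
```

Returning every problem at once, rather than stopping at the first, means the person uploading can fix everything in one go.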

Taking good quality images

This is really important for the record of the object; you can get some good advice on-line or by purchasing Ian Cartwright’s short guide entitled ‘Photographing detector finds’. If you want good examples of images, have a look at the records of Frank Basford, our FLO for the Isle of Wight.

Basic desires from the Scheme for images are:

  • 300 dpi resolution (so images can be reproduced for academic publication)
  • good lighting
  • a white or black background
  • well focused with good depth of field
  • scale bar – you can get some good ones from here: http://www.vendian.org/mncharity/dir3/paper_rulers/
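
To get a feel for what 300 dpi means in practice, the arithmetic is simply the printed width in inches multiplied by the resolution. The Python snippet below is a small worked example; the function name pixels_needed is made up for illustration.

```python
MM_PER_INCH = 25.4


def pixels_needed(print_width_mm, dpi=300):
    """Return the pixel width needed to print at the given width and resolution."""
    return round(print_width_mm / MM_PER_INCH * dpi)
```

So an image that is to be reproduced 100mm wide at 300 dpi needs to be about 1181 pixels across (pixels_needed(100) returns 1181).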

Images and text on this website are disseminated under a Creative Commons Non-Commercial Share-Alike licence and are used in a variety of media for enriching our knowledge of the past. You can opt out of this by choosing ‘all rights reserved’ under the image copyright dropdown, or by choosing this as default from your profile settings.

We’ll have some more information on photography and scanning of objects and coins in the coming weeks.

How do my records go public?

As we’re trialling public data entry, we’re currently keeping all public records hidden from view in the review stage (you’ll see a quarantine flag, a black biohazard symbol as I didn’t like black flags, next to your record number when you look at your ‘my finds’ list). If this feature gains popularity, we’ll extend the system so that you do the following:

  1. Enter all data
  2. Decide it is ready for checking by our staff and you choose to push your record to review
  3. The object record is peer reviewed by the FLO for the county of origin of the object
  4. They then decide that it can be seen and it then goes to validation
  5. Eventually a finds adviser might check it and it will go green for published.

You can only edit your records in the quarantine (a legacy phrase from our old system) and review stages; if you have more to add at a later date, you can ask any of the Scheme’s staff to return the record to one of those stages.
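
The stages above behave like a simple one-way workflow. As a rough Python sketch of the proposed system (not what is running on the site, and with stage and function names chosen just for this example):

```python
from enum import Enum


class Stage(Enum):
    QUARANTINE = 1
    REVIEW = 2
    VALIDATION = 3
    PUBLISHED = 4


# The recorder can only edit in the first two stages.
EDITABLE_BY_OWNER = {Stage.QUARANTINE, Stage.REVIEW}


def can_owner_edit(stage):
    """Return True if the person who created the record may still edit it."""
    return stage in EDITABLE_BY_OWNER


def next_stage(stage):
    """Return the stage a record moves to when it is pushed forward."""
    if stage is Stage.PUBLISHED:
        raise ValueError("a published record has no further stage")
    return Stage(stage.value + 1)
```

Here can_owner_edit(Stage.REVIEW) is True and can_owner_edit(Stage.VALIDATION) is False, which is why a record has to be returned by staff before you can add to it.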

I’m still struggling with the above!

If you still need help self-recording objects, you can go firstly to your FLO for help and more information on our recording philosophy, or contact us at the Central office at info@finds.org.uk.

First month of beta site webstats

The Scheme’s website has been running in β for just over a month now and has experienced no downtime in that period. Attached to this post is an analytics report for the period 24th March – 24th April, provided just for reference. The stats aren’t that impressive yet in terms of critical mass for visitors (we’ve not publicised the new features yet as we’re catching all glitches), but there are two figures which are quite good: visit length at 12 minutes 41 seconds and pages per visit at 16. The Scheme’s content is possibly classed as niche interest as well, but very academic and lay research driven, so we probably won’t ever get huge visitor figures.

Shortly, the Google Analytics analysis module will go online and you will be able to get our stats at any time for our content. If you want to quote these anywhere, please feel free to do so.