About Me

Web person at the Imperial War Museum, just completed PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, and usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Wednesday, January 25, 2012

Solr to Google Earth

This is a basic how-to you can probably find done better elsewhere, but since I didn't find all the bits in one place myself I thought I may as well put this up.

The task: show query results from Solr on a map or in Google Earth using the latitude/longitude data in there. Make the results update as you move around, because there may be too many to bring back all at once.
The technology: Solr, XSLT, KML, PHP, Apache web server

This is a pretty common scenario, and for basic needs, or for people who aren't already into full-blown mapping solutions, this is probably the better choice. That said, if you are in the latter camp there are plenty of options and you may want to investigate, for instance, OSGeo/OSGeo4W.

I had a small amount of time to evaluate some possibilities for a future project, so I needed something quick and familiar. I'd done some Solr-based mapping a couple of years back so I had some code to nick. So, having turned some OSGB36 data into WGS84 lat/longs (thanks for the help, @portableant!) I got it into a Solr index, which I'm not going to go into here except to say that I used the "tdouble" datatype because a trie field seems like a good idea for efficient range searching, and you need something that can cope with all those floating points. I believe there are proper geo data types but I'm ashamed to say I've not even bothered looking at them. I think with them you could do fancier proximity search and the like, but basic is fine for me. So here are the relevant bits from the schema.xml:

<types>
...
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
...
</types>
<fields>
...
<field name="latitude" type="tdouble" indexed="true" stored="true"/>
<field name="longitude" type="tdouble" indexed="true" stored="true"/>
...
</fields>

With the index done and queries working it was a matter of getting some KML out. Solr will transform XML on the fly so converting its XML output into KML is really not hard. Plus I already had a transform to cannibalise. It goes something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.opengis.net/kml/2.2" version="1.0" xml:space="preserve">
<xsl:output method="xml" media-type="text/xml; charset=UTF-8"/>
<xsl:preserve-space elements="*"/>
<xsl:template match="/">
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Results</name>
<xsl:apply-templates select="//doc[double/@name='longitude']"/>
</Document>
</kml>
</xsl:template>
<xsl:template match="doc">
<Placemark id="{str[@name='id']}">
<description>
<xsl:text disable-output-escaping="yes"><![CDATA[<![CDATA[]]></xsl:text>
<p>
<xsl:value-of select="str[@name='title']"/>
</p>
<xsl:value-of select="']]>'" disable-output-escaping="yes" />
</description>
<name><xsl:value-of select="str[@name='title']"/></name>
<Point>
<coordinates><xsl:value-of select="double[@name='longitude']"/>,<xsl:value-of select="double[@name='latitude']"/>,0</coordinates>
</Point>
</Placemark>
</xsl:template>
</xsl:stylesheet>

Put your own preferred fields in here, of course. Some notes: the CDATA bits are there because any HTML in your "description" element needs to be wrapped in CDATA, but outputting this with XSLT takes a little lateral thinking because the transform needs to see your CDATA declaration as....CDATA. Hence this structure (and the bit for closing the section with "]]>"). Secondly, note that I start by selecting the "doc" elements that Solr returns but filtered for the presence of "longitude", since we only want to show things that have a point. You could do that filtering in the Solr query instead (or in a filter query), which is in fact what we'll get to later on. Final note: you'll want to put something interesting into the "description" element, which is what pops up in balloons on Google Maps and the like.
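To make the CDATA business a bit more concrete, here is a rough PHP sketch of the wrapping the transform is aiming to produce for each placemark. The function name kmlDescription is made up for illustration and isn't part of the setup described here. It also covers one case the stylesheet above doesn't: a literal "]]>" inside the HTML, which would otherwise close the section early.

```php
<?php
// Hypothetical helper, for illustration only: wraps an HTML fragment
// in a CDATA section inside a KML <description> element.
function kmlDescription(string $html): string
{
    // A literal "]]>" in the content would end the CDATA section early,
    // so split it across two adjacent CDATA sections.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $html);

    return '<description><![CDATA[' . $safe . ']]></description>';
}
```

Calling kmlDescription('<p>A title</p>') returns the paragraph untouched inside the CDATA section, so Google Earth or Google Maps renders it as HTML in the balloon rather than choking on the markup.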
Getting KML out like this is just fine for showing this stuff on Google Maps, but this wasn't working for me on Google Earth. The reason is that GE wants the correct content type, whereas GMaps, OpenLayers etc don't care. GE isn't bothered by the file extension AFAIK, but headers? Yes. The other thing I wanted to do was create a network link, which is a means by which a request for KML can be updated to restrict it to a geographical area (a bounding box). With a network link you can specify how the north, east, south and west limits are expressed, which makes it pretty easy to slot those values into a Solr URL. However because of the content type issue this wasn't going to work.
So, here's a PHP file that proxies the Solr query and spits it out with the right content type:

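<?php
/*
Google Earth wants text/plain or application/vnd.google-earth.kml+xml so this script is a proxy that passes the query, paging and bounding box parameters on to Solr and returns the results with the right headers
Set up a bunch of defaults like the number of points you want and the default bounding box, which is basically the whole world here (I think)
*/
$rows=100;
$start=0;

// only accept numeric values, so nothing unexpected ends up in the Solr URL
if(isset($_GET["start"]) && is_numeric($_GET["start"])){
$start=(int)$_GET["start"];
}
if(isset($_GET["rows"]) && is_numeric($_GET["rows"])){
$rows=(int)$_GET["rows"];
}
$lat0=-90;
if(isset($_GET["lat0"]) && is_numeric($_GET["lat0"])){
$lat0=(float)$_GET["lat0"];
}
$lat1=90;
if(isset($_GET["lat1"]) && is_numeric($_GET["lat1"])){
$lat1=(float)$_GET["lat1"];
}
$lon0=-180;
if(isset($_GET["lon0"]) && is_numeric($_GET["lon0"])){
$lon0=(float)$_GET["lon0"];
}
$lon1=180;
if(isset($_GET["lon1"]) && is_numeric($_GET["lon1"])){
$lon1=(float)$_GET["lon1"];
}
// the text clause is optional, so start with an empty string
$text="";
if(isset($_GET["q"]) && $_GET["q"]!==""){
// urlencode so spaces and punctuation in the search term don't break the URL
$text="text:".urlencode($_GET["q"])."+AND+";
}

$solrbaseurl = "http://localhost:8080/solr/myindex/select/?";
$url=$solrbaseurl."q=".$text."longitude:[$lon0+TO+$lon1]+AND+latitude:[$lat0+TO+$lat1]&wt=xslt&tr=kml.xsl&start=".$start."&rows=".$rows;
$s = file_get_contents($url);
if($s===false){
// Solr could not be reached or rejected the query
header("HTTP/1.1 502 Bad Gateway");
exit;
}
header('Content-Type: application/vnd.google-earth.kml+xml');
echo $s;
?>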

[CAUTION: see below for a note on web server configuration for another necessary step.]
This script was written with a network link in mind, so I decided to pass in latitude and longitude start and finish values as separate parameters, but it's up to you how you do it (see the KML docs on network links, though they don't make it clear that you can pick your own format for the bounding box parameters). I take them (lat0, lat1, lon0, lon1) and put them into the Solr query so that you end up with something like:
q=text:church+AND+longitude:[-1.5+TO+1.5]+AND+latitude:[48.5+TO+51.5]
(the "text:church+AND+" part is only put in if a query term was also specified)
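If you'd rather keep the query-building in one place, something along these lines might do. It's only a sketch: the function names boundedFloat and buildSolrQueryString are invented for this example, the defaults and the kml.xsl transform name are taken from the script above, and clamping out-of-range values is my assumption about what you'd want rather than anything Solr requires. Note that it doesn't escape Solr's own query syntax characters in the search term.

```php
<?php
// Hypothetical helper: read a numeric parameter, fall back to a default,
// and clamp it to a valid range.
function boundedFloat(array $params, string $key, float $default, float $min, float $max): float
{
    if (!isset($params[$key]) || !is_numeric($params[$key])) {
        return $default;
    }
    return max($min, min($max, (float) $params[$key]));
}

// Hypothetical helper: build the query string for the Solr select URL
// from request parameters such as $_GET.
function buildSolrQueryString(array $params): string
{
    $lat0 = boundedFloat($params, 'lat0', -90.0, -90.0, 90.0);
    $lat1 = boundedFloat($params, 'lat1', 90.0, -90.0, 90.0);
    $lon0 = boundedFloat($params, 'lon0', -180.0, -180.0, 180.0);
    $lon1 = boundedFloat($params, 'lon1', 180.0, -180.0, 180.0);

    // Range clauses restrict the results to the bounding box.
    $q = sprintf('longitude:[%s TO %s] AND latitude:[%s TO %s]', $lon0, $lon1, $lat0, $lat1);

    // The text clause is only added when a search term was supplied.
    if (isset($params['q']) && trim($params['q']) !== '') {
        $q = 'text:' . trim($params['q']) . ' AND ' . $q;
    }

    $start = isset($params['start']) && is_numeric($params['start']) ? max(0, (int) $params['start']) : 0;
    $rows  = isset($params['rows']) && is_numeric($params['rows']) ? max(0, (int) $params['rows']) : 100;

    // http_build_query takes care of the URL encoding.
    return http_build_query([
        'q'     => $q,
        'wt'    => 'xslt',
        'tr'    => 'kml.xsl',
        'start' => $start,
        'rows'  => $rows,
    ]);
}
```

You would then append the result of buildSolrQueryString($_GET) to the Solr base URL. Letting http_build_query do the encoding means the square brackets and colons arrive percent-encoded, which Solr decodes as usual.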
That gets you a set of results within a bounding box, which becomes pretty useful if you can update it with a new bounding box every time the user's viewport changes. So the next thing is the network link itself. It's another KML file, which again I needed to make on the fly so I could call it from a form, and, of course, I had to put it out with the right headers. So here's another PHP script:

<?php
//this script is a proxy to create a KML file for a network link
header('Content-Type: application/vnd.google-earth.kml+xml');
print("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<NetworkLink>
<name/>
<visibility>0</visibility>
<open>0</open>
<description>A network link to some results</description>
<refreshVisibility>0</refreshVisibility>
<flyToView>0</flyToView>
<Link>
<href>http://localhost/myapp/solrKmlLatlongs.php?q=<?php echo isset($_GET["q"]) ? urlencode($_GET["q"]) : ""; ?></href>
<refreshInterval>2</refreshInterval>
<viewRefreshMode>onStop</viewRefreshMode>
<viewRefreshTime>1</viewRefreshTime>
<viewFormat>lat0=[bboxSouth]&amp;lat1=[bboxNorth]&amp;lon0=[bboxWest]&amp;lon1=[bboxEast]</viewFormat>
</Link>
</NetworkLink>
</kml>

The "Link" element in that KML points at the other file, proxying Solr (in this case located at http://localhost/myapp/solrKmlLatlongs.php)
"viewFormat" is the part where you can specify how the bounding box parameters are to be passed into your KML-emitting script. There's other stuff you can look up for yourself.
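One thing that's easy to trip over is escaping: the ampersands in "viewFormat" have to be written as &amp;amp; for the file to be well-formed XML, and the search term needs URL-encoding before it goes into the "href". Here's a minimal sketch that builds the same network link from a function so both are handled in one place. The names xmlText and networkLinkKml are hypothetical, and I've left out the optional elements for brevity.

```php
<?php
// Hypothetical helper: escape a string for use as XML element text.
function xmlText(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: build a network link KML document pointing at
// the proxy script, passing the view's bounding box as lat0/lat1/lon0/lon1.
function networkLinkKml(string $proxyUrl, string $query): string
{
    // URL-encode the search term before it goes into the link.
    $href = $proxyUrl . '?q=' . urlencode($query);

    // Google Earth substitutes the bracketed names with the current view.
    $viewFormat = 'lat0=[bboxSouth]&lat1=[bboxNorth]&lon0=[bboxWest]&lon1=[bboxEast]';

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<kml xmlns="http://www.opengis.net/kml/2.2">' . "\n"
        . '<NetworkLink>' . "\n"
        . '<description>A network link to some results</description>' . "\n"
        . '<Link>' . "\n"
        . '<href>' . xmlText($href) . '</href>' . "\n"
        . '<viewRefreshMode>onStop</viewRefreshMode>' . "\n"
        . '<viewRefreshTime>1</viewRefreshTime>' . "\n"
        . '<viewFormat>' . xmlText($viewFormat) . '</viewFormat>' . "\n"
        . '</Link>' . "\n"
        . '</NetworkLink>' . "\n"
        . '</kml>' . "\n";
}
```

In the script itself you would send the Content-Type header first and then echo networkLinkKml('http://localhost/myapp/solrKmlLatlongs.php', $q), with $q taken from the form.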
Basically, if you call this script with your solr query it will chuck out the KML with the network link, which you can view in GE (it will be switched off by default). Then whenever you zoom in or move around the query will refresh with a new bounding box. For me, with a pretty big dataset that you can't really load all at once, I may start with a text query that covers the whole of the UK (no bounding box parameters) but, being limited to, say, 100 results, these may be scattered all over the map. As you zoom in and shift round, it will load more and more results according to your current view.
The final thing to note is that it's probably not enough to set the content type in PHP. In my case I needed to add a couple of lines to my Apache config (httpd.conf) so it knew what to do:

AddType application/vnd.google-earth.kml+xml .kml
AddType application/vnd.google-earth.kmz .kmz

I hope this helps