The Doofer Call: April 2009

Thursday, April 30, 2009

NMM, YQL, COBOAT, CODS

Jim O'Donnell organised a talk on Tuesday at the National Maritime Museum from Christian Heilmann of Yahoo! Mia wrote up her notes already and I've not got much to add, but it was a very enjoyable presentation, and when he reached the juicy bit about YQL and BOSS, both of which I'd left for another day's exploration, I learned a lot. Clearly there's a lot of potential there (especially now it's augmented by YQL Execute, announced yesterday), and it looks like it will let you do a bunch of things that Pipes can't do, or that are a pain to do with it (the Pipes GUI is great and yet infuriating). YQL gives a common API meta-interface (I guess that's the word) for loads of other APIs and for things with no API; it also handles all the crap with authentication, tokens etc; and it will act as the gatekeeper for your API so you don't get hammered by unreasonable numbers of requests.

As with similar tools/services (Pipes, Dapper, dbpedia, and various things nearer the surface like GMaps), YQL is clearly a blessing from both ends of the telescope: we get to use it for its intended purpose - to be "select * from Internet" is the grandiose ambition - knitting together data sources from Yahoo! and beyond; and we also get to offer our data in a developer-friendly way to encourage its reuse by creating Open Tables [note that these are purely a machine-friendly description of how to access data: no data is handed over as such]. Jim has already been busy creating Open Tables and experimenting with YQL.
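
To give a flavour of what "select * from Internet" looks like in practice, here is a minimal Python sketch that only builds the URL for a YQL query and makes no network call. The endpoint is the public YQL address as I understand it; the table name and search term are picked purely for illustration, and the helper names build_yql_url and read_back are my own, not part of YQL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public YQL endpoint as I understand it (an assumption; check the YQL docs).
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(statement, fmt="json"):
    """Return the GET URL that would submit a YQL statement (hypothetical helper)."""
    return YQL_ENDPOINT + "?" + urlencode({"q": statement, "format": fmt})


def read_back(url):
    """Decode the query string again, handy for checking what was sent."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# Illustrative statement only: the table and search text are assumptions.
statement = 'select * from flickr.photos.search where text="greenwich"'
url = build_yql_url(statement)
print(url)
```

The point is that the whole request is a SQL-like string in one query parameter, whatever the underlying source happens to be.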

Following the talk we headed for a pint (and one of the most jaw-dropping jokes I've heard, from Chris), and it was good to talk to Tristan from Cogapp. When I stopped raving incoherently about the marvel that is Solr (yes, still in love even as I gradually find out more about it), Tristan cleared up some questions for me about Cogapp's COBOAT app. They recently open-sourced this (as far as possible), in the context of the Museum Data Exchange project with OCLC (see Gunter Waibel's recent post), where it plays the role of connecting various collections management systems to an OAI Gateway-in-a-box, OAICatMuseum (well, it seems like it's only used with TMS in the project, but the point of COBOAT is that it just makes life easier for mapping one data structure to another, and another CollMS would slot in just fine).
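
For anyone wondering what talking to an OAI gateway involves, here is a rough Python sketch of how a harvester might build an OAI-PMH ListRecords request. The verb and parameter names come from the OAI-PMH protocol itself; the base URL and the function name are made up for illustration and say nothing about how OAICatMuseum is actually deployed.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    base_url is whatever address the gateway is published at; the value
    used below is a placeholder, not a real service.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return base_url + "?" + urlencode(params)


# Placeholder gateway address, for illustration only.
print(oai_list_records_url("http://example.org/oai", from_date="2009-04-01"))
```

A real harvester would then fetch that URL and keep following resumption tokens until the gateway runs out of records, which this sketch leaves out.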

For me, both COBOAT and OAICatMuseum are of interest for the role they could play in the revamped Collections Online Delivery System* we'll build this year, resources allowing (in other words, don't hold your breath. Mission critical, yeah, but worth paying for? I await the answer with interest). Integrating and re-mapping data sources, an OAI gateway, and sophisticated and fast search are key requirements, as is a good clean API, and taking these two applications along with Solr I feel like I may have identified candidates for achieving all of these aims. We're a long way from a decision, of course, at least on the architecture as a whole, but I have some tasty stuff to investigate, and I'm already well down the track in my tests of Solr.

Thanks again to Jim for arranging the talk. He's got another great guest coming up; hopefully I can make it to that one too.

*I'm resigned to this thing being called CODS but still hoping for something less, well, shit

Sunday, April 26, 2009

macro-blogging about micro-blogging

am on Twitter at last learning to be brief. Not easy.

Saturday, April 25, 2009

Catching up with Europeana v1.0 [pt.2]

[see part 1 for stuff about what I did before the kick-off meeting]

So April 2nd/3rd were the kick-off meeting for Europeana 1.0, the project to take the prototype that launched last November and develop it into a full service. There may have been glitches at the launch but at the meeting there was a tremendous feeling of optimism, sustained I suppose by the knowledge that those glitches were history, and by the strength of the vision that has matured in people's minds.

The meeting was about getting the various re-shuffled (and trimmed) work-groups organised, with their scope understood by their members and refined in some initial discussions before the proper work begins. There are tight dependencies going in all directions between the work-groups. My problem was, on reflection, a very encouraging one: it was difficult to decide which WG I should work with, since they nearly all now have some mention of APIs in their core tasks. Given that concern over APIs was the reason I got involved with Europeana, it's great to see how central a place they occupy in the plans for v1.0. Not surprising, perhaps, given the attitudes I've discovered since joining, but feeling more real now that they're boosted up the agenda. For those who worry (as I used to) that Europeana was all about a portal this shows that fear is groundless. Jill Cousins (the project's director) distilled the essence of Europeana's purpose as being an aggregator, distributor, catalyst, innovator and facilitator; the portal, whilst necessary, is but a small part of this vision.

In the end I elected to join WG3.3, which will develop the technical specs of the service, including APIs. Jill is also organising a group to work up the user requirements (to feed to WG3.3), which I'll participate in. I guess this will also help to co-ordinate all the other API-related activity, and I'm thrilled to see several great names on the list for that group, not least Fiona Romeo of the National Maritime Museum. Hi Fiona! I hope to see more from the UK museum tech community raising their hand to contribute to a project that's actually going to do something, but for now it's great to have this vote of confidence from the museum that puts many of us to shame for their attitude and their actions.

So we heard about the phasing of developments; about the "Danube" and "Rhine" releases planned for the next two years; about the flotilla of projects like EuropeanaLocal, ApeNet, Judaica, Biodiversity Heritage Library, and especially EuropeanaConnect (a monster of a project supplying some core semantic and multilingual technology, and content too); and about the sandbox environment that will in due course be opened up to developers to test out Europeana, share code and develop new ideas. Though we await more details, this last item is particularly exciting for people like me, who will have the chance to both play with the contents and perhaps contribute to the codebase of Europeana itself, whilst becoming part of a community of like-minded digi-culture heads.

Man, you know, I've got so much stuff in my notes about specific presentations and discussions but you don't want all that so here's the wrap. As you can tell I've come away feeling pretty positive about the shape it's all taking, but there are undoubtedly big challenges, in terms of achieving detailed aims in areas like semantic search and multilinguality, but also in ensuring the long-term viability of the service Europeana hopes to supply; nevertheless the plans are good and, crucially, there are big rewards even if some ambitions aren't realised.

Within the UK there are a number of large museums with great digital teams and programmes that are not yet part of Europeana. There are also, obviously, lots of smaller ones with arguably even more to gain from being in it, but they have more of a practical challenge to participation right now. But why is it that those big fish are not on board yet? Is it just too early for them, or are there major deterrents at work? I know that there are people out there, including friends of mine, who are sceptical of Europeana's chances of success and sometimes of its validity as an idea. The former is still fair enough I suppose, or at least the long-term prospects are hard to predict; the latter, though, still mystifies me. If we want cross-collection, cross-domain search - and other functionality - based on the structured content of large numbers of institutions, there's really no alternative to bringing the metadata (not the content) into one place. Google and the like are not adequate stand-ins, despite their undoubtable power and despite the future potential for enabling more passive means of aggregation by getting, say, Yahoo! to take content off the page with POSH of some sort (which certainly gets my vote, but again relies on agreed standards). Mike Ellis and Dan Zambonini, and I myself separately, have done experiments with this sort of scraping into a centralised index, turning the formal aggregation model around, and there's something in that approach, it's true. Federated search is no panacea given that it requires an API from each content holder and is inferior for a plethora of reasons. Both are good approaches in their own ways and for the right problem - as Mike often reminds us, we can do a lot with relatively little effort and needn't get fixated on delivering the perfect heavyweight system if quick and light is going to get us most of the way sooner and cheaper. 
But I can't help but detect some sort of submerged philosophical or attitudinal* objection to putting content into Europeana - a big, sophisticated, and (perhaps the greatest sin of all) a European service. I sense a paranoia that being part of it could somehow reduce our own control of our content or make us seem less clever by doing things we haven't done, even if we're otherwise agile, clever web teams in big and influential museums. But the fact is that a single museum is by definition incapable of doing this, and if you believe in network effects, in the wisdom of crowds, in the virtues of having many answers to a question available in one place, then you need also to accept that your content and your museum should be part of that crowd, a node in that network, an answer amongst many. If your stuff is good, it will be found. Stay out of the crowd and you don't become more conspicuous, you become less so. Time will doubtless throw up other solutions to this challenge, but right now a platform for building countless cultural heritage applications on top of content from across Europe (and beyond?) looks pretty good to me. It's heavyweight, sure, but that's not innately bad.

If your heritage organisation is inside the EU but isn't part of Europeana, or if it's in it but you aren't part of the discussions that are helping to shape it, then get on board and get some influence!

Flippin' 'eck, I didn't really plan on a rant.

*is this a made up word?

Catching up with Europeana v1.0 [pt.1]

Last November, a prototype Europeana launched. Many (perhaps even both) of you will know that the results were mixed: the index itself was successful, at least given its proof-of-concept status, but personalisation features were not optimised and led rapidly to a crash as the user sessions racked up. It seems that the solution to this was essentially configuration, but politics meant that more had to be seen to be done and so hardware was thrown at the problem. A couple of weeks later the site was back but under the radar and without the personalisation bit ("My Europeana"), and more recently this too has returned - go and have a play here.

Prototyping done, the bid was assembled to develop a full-blown service, "Europeana v1.0". This bid to the European Commission was successful and just before Easter a kick-off meeting was held at the Koninklijke Bibliotheek in The Hague to initiate the project. This is actually but one of a suite of projects under the EDLFoundation umbrella, all working in the same direction, but I guess you could say it's the one responsible for tying them together.

So how is Europeana shaping up now? Having spent three days finding out I can tell you now that I came back feeling good - and not just because I was heading straight off again on holiday. Day 1 was about travel and (obviously) a long and lovely trip to the Mauritshuis, but it ended with an hour in the company of Sjoerd Siebinga, lead developer on the project, and a session with Jill Cousins, Europeana's director. I went to see Sjoerd because I wanted to find out how Europeana's technical solution would fit with our plans at the Museum of London for a root-and-branch overhaul of our collections online delivery system. I knew that they'd be opening the source code up later this year, and I also knew that in essence what Europeana does is a superset of what we want to do, so I figured, find out if there'll be a good fit and whether there are things I could start to use or plan for now. Laughably, I thought that we might actually be able to help out by testing and developing the code further in a different environment - as if they needed me! I'll save this for another post, but in short Sjoerd took me on a tour of what they use as the core of the system (Solr) and blew me away. There are layers that they have built/will build above and below Solr that make Europeana what it is and may also prove helpful to us, but straight out of the box Solr is, quite simply, the bollocks. I've known of it for ages, but until given a tour of it I didn't really grasp how it would work for us. Many, many thanks to Sjoerd for that.

Next I met with Jill for an interview for my research on digital sustainability in museums, where we dug into the roots of Europeana, its vision, key challenges, and of course sustainability (especially in terms of financial and political support). This was fascinating and revealing and added a lot to my understanding of the context of the project's birth and its fit in the historical landscape of EC-funded initiatives in digital/digitised cultural heritage. As a research exercise it was a test of my ability to work as an embedded researcher; one who is not just observing the processes of the project but contributing and arguing and necessarily developing opinions of his own. I really don't know how well I did in this regard - I'm not sure how often my attempts to be probing may in fact be leading, or whether my concerns with the project distort the approach I take in interviewing. Equally I don't know if this matters. A debate to expand upon another time, perhaps.

Days 2 and 3 were the kick-off meeting, and I'll put that in another post.

Thursday, April 16, 2009

Museums and digital sustainability: the other meaning

Well, this is not an angle I'd considered before in my research (the sustainability of digital resources in museums, since you ask), but I guess it's one interpretation of the problem:
Pirate Bay server becomes museum artefact
Whether the Swedish National Museum of Science and Technology will be sustaining the file-sharing service is another matter.
I guess, joking aside, that really does highlight the key difference between (my definitions of) sustaining and preserving: the latter is about keeping stuff in existence, the former about keeping it fulfilling its purpose.
