- Web person at the Imperial War Museum, just completed a PhD on digital sustainability in museums (this blog began as my research diary). Posting occasionally, usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger
Thursday, March 27, 2008
CR has just blogged about this topic again, in the run-up to a JISC workshop that I'll unfortunately miss next week.
Wednesday, March 26, 2008
Thursday, March 20, 2008
Gollito and Paskowski in fullest effect. Up until recently I could imagine more ridiculous moves than were actually being pulled off by guys like this, but now, well, my imagination would be very stretched to exceed this!
Wednesday, March 19, 2008
SearchMe beta, a search engine which shows visual results (images of the web pages) categorised (e.g. as museum, art, shopping, fishing). Quite nice. Silverlight, I think. It's a bit SW (in its results clustering, for example), though how it goes about this I don't know. Other "semantic" search stuff has shown up lately, too: TextWise (small "sw", I guess) has just been reviewed by TechCrunch, which was doubtless part of the point of offering a $1m prize for suggesting uses for its technology. Hakia is another such.
Stuff I've been doing the last week or two:
- the Great Fire of London site for Key Stage 1 kids finally soft-launched.
- working on templates for the Londinium site - the bulk of my time right now
- preparing the digital republication of an out-of-print handbook for identifying Roman pottery fabrics. I've probably mentioned it before: it involved exporting Quark to PDF, exporting the PDF to XML, converting that to TEI-Lite via several XSLT steps plus manual clean-up, and finally modifying some XSLT to display it as an HTML page. Most of this was a while ago; right now I'm getting ready for the images, which will need to be embedded once all the scanned thin sections are ready.
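For the curious, the last step of that pipeline (TEI-Lite to HTML) can be sketched in a few lines. This is not the XSLT used for the handbook: it's a minimal Python stand-in using ElementTree, with a hypothetical mapping covering only a handful of TEI elements (div, head, p, hi, list, item); anything else becomes a span, and the teiHeader is ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few TEI-Lite element names to HTML tags.
TEI_TO_HTML = {"div": "div", "head": "h2", "p": "p",
               "hi": "em", "list": "ul", "item": "li"}


def local_name(tag):
    """Strip any '{namespace}' prefix, so TEI P4 (no namespace) and P5 both work."""
    return tag.rsplit("}", 1)[-1]


def tei_to_html(elem):
    """Recursively copy a TEI element into a new HTML element tree."""
    html = ET.Element(TEI_TO_HTML.get(local_name(elem.tag), "span"))
    html.text = elem.text
    for child in elem:
        converted = tei_to_html(child)
        converted.tail = child.tail  # keep the text that follows the child
        html.append(converted)
    return html


def render(tei_xml):
    """Convert the <body> of a TEI document into an HTML fragment string."""
    root = ET.fromstring(tei_xml)
    body = next(e for e in root.iter() if local_name(e.tag) == "body")
    out = ET.Element("div", {"class": "handbook"})
    for child in body:
        converted = tei_to_html(child)
        converted.tail = child.tail
        out.append(converted)
    return ET.tostring(out, encoding="unicode")
```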
- testing out and integrating Flash interactives with our CMS. Several are pretty much ready for launch, including two from the London Sugar and Slavery exhibition, and two games
- advising as best I can on the development of the replacement map interface for the LSS gallery
- fretting over the re-branding exercise the MoL group is engaged in. How much work is it worth doing right now to fix issues on the sites if we'll be overhauling the whole thing in the autumn?
- testing new search engine SearchMe (see above). Didn't get good results for "roman london" yet, but it's only indexed a billion pages or so....
Monday, March 17, 2008
Those extra points:
There are geographical search (and geo plus time) projects going on in eContent Plus and IST, using co-ordinates, place names, changing boundaries etc. We would hope to incorporate these (possibly post-prototype). Everything in Europeana will be public domain (development-wise), so the software will be there for the taking (I hope I got that right!)
We mooted the possibility of privileged tags, i.e. those produced by certain authorised users, perhaps agreed by certain groups. Tags created by these users (most likely content contributors) would be treated differently, so that we could pull out only the items given a particular tag by them. But rather than giving them some specific "privileged" status, we could probably achieve the same thing just by identifying them by contributor, user group or contributor type.
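A minimal Python sketch of that idea: no special "privileged" flag, just filtering tags by contributor, contributor type or user group. The Tag structure, the function name and the contributor-type values are hypothetical, not anything agreed in the meeting.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    item_id: str
    label: str
    contributor: str
    contributor_type: str  # hypothetical values, e.g. "content_contributor" or "user"
    groups: frozenset = field(default_factory=frozenset)


def items_with_tag(tags, label, contributors=None, contributor_types=None, groups=None):
    """Return item IDs tagged with `label`, optionally keeping only tags made by
    the given contributors, contributor types, or members of the given groups."""
    result = []
    for t in tags:
        if t.label != label:
            continue
        if contributors is not None and t.contributor not in contributors:
            continue
        if contributor_types is not None and t.contributor_type not in contributor_types:
            continue
        if groups is not None and not (t.groups & set(groups)):
            continue
        if t.item_id not in result:  # each item only once, in first-seen order
            result.append(t.item_id)
    return result
```

So "privileged" tags are simply those you get by passing contributor_types={"content_contributor"}.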
Stuff to clarify
- Licensing data model and assumptions
- Core common data
- Where is the boundary between Europeana and the contributor sites? The Maquette seemed to include considerable data, and to display the actual content within the site for some types of asset (e.g. images), while others might be held off-site. What are the rules?
- What needs to be added to the API to work well for libraries and archives?
Friday, March 14, 2008
- Content maintenance. It mimics the look of our CMS pages, but the content isn't integrated with our CMS, so changes to the site structure won't be reflected in the menus, and nor will updated content.
- Visual maintenance. The look of this site will change (we dearly hope) and I can't change their pages
- Google. I don't know how they look upon sites that look like copies of existing sites and point at their pages. I suspect it might look like spamming and I wouldn't want to be blacklisted.
- Site stats. We can't (readily) integrate the job site's stats with ours (if we get them at all). Not a huge deal to me but a factor.
- Cost. I don't know what this will have cost, but five minutes after getting hold of an RSS feed from their site I had integrated it into our own, replicating the most important part of what they'd done. I suspect we could have done it cheaper, in short!
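Turning an RSS feed into a list of links really is a five-minute job. A minimal Python sketch, assuming an RSS 2.0 feed (channel/item elements with title and link) that has already been fetched as a string, e.g. with urllib.request; the function name and the "jobs" CSS class are made up for illustration, and this isn't necessarily how it was done on our site.

```python
import xml.etree.ElementTree as ET
from html import escape


def rss_to_html_list(rss_xml, limit=10):
    """Render the first `limit` items of an RSS 2.0 feed as an HTML list of links."""
    root = ET.fromstring(rss_xml)
    items = root.findall("./channel/item")[:limit]
    lines = ['<ul class="jobs">']
    for item in items:
        title = item.findtext("title", default="(untitled)").strip()
        link = item.findtext("link", default="#").strip()
        # Escape both values, since feed content ends up inside our own page.
        lines.append(f'  <li><a href="{escape(link)}">{escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```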
Thursday, March 13, 2008
 see this too
Tuesday, March 11, 2008
Yesterday I put the EDLNet Paris slideshow onto Slideshare, but since Scribd also lets me put up other stuff, I'm putting that and the notes there too. If you can't see these coz I've screwed up the Scribd embed link or something, go here for the presentation and here for the notes
EDLNet Paris presentation:
And just in case...
The previous post with options for API input parameters is also on Scribd (UPDATED 17/3/2008)
Monday, March 10, 2008
Fingers crossed they might even top that at Southend today. Go Dave White!
Saturday, March 08, 2008
Public API inputs and outputs
We discussed at the Paris meeting the range of parameters that we thought an API might need to handle to perform the sort of (public-facing) tasks we envisaged. We didn't actually talk about output, except in regard to the ability to specify return fields, but I think that this is actually much the simpler part to work out. I've reworked our discussion, added a few bits of my own (including the UGC bit), and split it into sections relating to general parameters, filters for collections queries, and UGC. No doubt lots more clarification and revision are needed and I'm pretty unclear on some bits myself, but it's something!
The “profile” includes various elements defining the operation in terms of function, languages, values and format of returned data etc. Collections data requests will be required for some functions, and consist of various filters. The third table relates to operations on user generated content, including adding, editing and getting (by user or group). We may decide that some operations are only open to specific users or categories of user; for example, accessing UGC of some categories might only be possible for the owner of that UGC (via their associated API key) or the owner of the collections related to that UGC. TBC!
Query profile (data access and data addition/editing functions)
search, compare, translate, add, update
DC-XML, RSS, geoRSS, CDWALite, JSON, CSV. This might instead be implicit in the target URL.
Array of field names, but a default set would perhaps include GUID, title, thumbnail, short description, owner, owner type, media. Might also provide shortcuts to preset field groups. Will vary according to target entities
Formal metadata; all data; expert and user tags; user tags only; “expert” tags only; specific user/expert/group tags.
True, false [use/don’t use thesauri etc.]
EN, FR etc.
As above. If only one is present presume the same.
API user key
For the end user. Required for accessing/modifying data attached to specific users or groups. Presumably we need to authenticate and authorise in some way, too, for some operations.
Perhaps multi-value, specifying rights/licensing parameters. Likely to be more complex than one field!
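To make the query profile a bit more concrete, here is a minimal Python sketch of assembling a request URL from it. The endpoint, the parameter names (function, format, fields, thesauri, lang, response_lang, api_key, user_key) and the lowercase format codes are all hypothetical placeholders, since no names have been agreed; collection filters are simply passed through as extra key/value pairs.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint and parameter names, for illustration only.
BASE_URL = "http://api.example.org/europeana"

FUNCTIONS = {"search", "compare", "translate", "add", "update"}
FORMATS = {"dc-xml", "rss", "georss", "cdwalite", "json", "csv"}
# Default return fields, following the suggested default set.
DEFAULT_FIELDS = ["guid", "title", "thumbnail", "short_description",
                  "owner", "owner_type", "media"]


def build_query(function, api_key, query_lang="en", response_lang=None,
                fmt="json", fields=None, use_thesauri=False,
                user_key=None, filters=None):
    """Assemble a GET URL for a (hypothetical) public API call."""
    if function not in FUNCTIONS:
        raise ValueError(f"unknown function: {function}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    params = {
        "function": function,
        "format": fmt,
        "fields": ",".join(fields or DEFAULT_FIELDS),
        "thesauri": "true" if use_thesauri else "false",
        "lang": query_lang,
        # If only one language is given, presume the same for the response.
        "response_lang": response_lang or query_lang,
        "api_key": api_key,
    }
    # The end-user key is only needed for user- or group-specific data.
    if user_key is not None:
        params["user_key"] = user_key
    # Collection filters (keywords, date, etc.) ride along as extra parameters.
    params.update(filters or {})
    return BASE_URL + "?" + urlencode(params)
```

For example, build_query("search", "ABC123", filters={"keywords": "tricycle"}) gives a JSON search with the default fields. Whether the format belongs in a parameter or in the target URL is still open, as the table says.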
Collection data filters (access only)
Objects, people, places, subjects [if we are enabling anything more than objects]
A unique identifier given to every record in Europeana
ID for a set of entities, which may require the appropriate key, depending upon privacy settings for that set.
Tricycle, ww2, treaty, Anne Briggs, documentary [multi-field search].
Name of object, person or place. If these use different fields, then the right one should be inferred from the target entity.
Examples: photograph; sunflowers; Forlì (or Forli); Max Brod (or M Brod); Rockall
14 July 1792; 19th century. “Older than 1850” might be expressed as: “- 20000000 – 1850”; “Younger than” as: “1850-2050”; uncertainty like “1850 +- 5” as a range: “1845-1855”, though this isn’t perfect
For returning objects, people and places
For returning objects, people and places. See also “geographical” below.
Of object, principally, for documents (if this data is well expressed)
Good structured data, ideally (we may require an ID), but we could permit a string search across the relevant field.
For searching by current location
Museum, library, archive, A/V archive
Including sub-parameters for grid reference, coordinates, place name, and the location of concern (e.g. place of creation, place of publication, location of subject matter)
Keyword occurrence, date precision, location, location relative to user, institution type – perhaps sorting partly inferred from the fields used in search, but if these are mixed e.g. date and place plus keyword, need to sort on one before the other.
text, audio, image, video or more specifically PDF, WAV, MPEG etc.
Map, book, video perhaps. Is this data held in a structured way, and is it distinct from the media metadata?
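The date examples in the table can be normalised mechanically. A minimal Python sketch that handles year-level expressions only (not "14 July 1792" or "19th century"); the exact phrases it accepts are my own assumption, while the bounds and the "+-" convention come from the examples above.

```python
import re

# Bounds taken from the examples: "older than" runs from a very early date,
# "younger than" runs up to 2050.
EARLIEST = -20000000
LATEST = 2050

YEAR = r"(-?\d+)"


def to_range(expr):
    """Normalise a year-level date expression into a (start, end) tuple."""
    text = expr.strip().lower()
    patterns = [
        (rf"older than {YEAR}", lambda m: (EARLIEST, int(m[1]))),
        (rf"younger than {YEAR}", lambda m: (int(m[1]), LATEST)),
        # "1850 +- 5": uncertainty expressed as a symmetric range.
        (rf"{YEAR}\s*\+-\s*(\d+)",
         lambda m: (int(m[1]) - int(m[2]), int(m[1]) + int(m[2]))),
        # An explicit range such as "1850-2050" or "-20000000-1850".
        (rf"{YEAR}\s*-\s*{YEAR}", lambda m: (int(m[1]), int(m[2]))),
        # A single year.
        (YEAR, lambda m: (int(m[1]), int(m[1]))),
    ]
    for pattern, build in patterns:
        m = re.fullmatch(pattern, text)
        if m:
            start, end = build(m)
            if start > end:
                raise ValueError(f"range runs backwards: {expr!r}")
            return start, end
    raise ValueError(f"unrecognised date expression: {expr!r}")


def to_param(expr):
    """Render the range in the 'start-end' form used for the date filter."""
    start, end = to_range(expr)
    return f"{start}-{end}"
```

As the table notes, turning "1850 +- 5" into a hard range isn't perfect: it loses the idea that 1850 is the most likely date.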
UGC operations (add, edit, view)
These operations will need user ID (or group ID) plus authentication and authorisation for certain operations (but not for viewing public data).
For modifying tag i.e. deleting, or viewing associated items
For modifying or viewing
Perhaps multiple values, including groups, so we can look for stuff with a given tag but only when tagged by a certain set of UGC contributors
Content contributor vs. other user
Friday, March 07, 2008
It was a successful meeting, I would say, and it was a pleasure to see a couple of faces I already knew and to meet others for the first time. On Monday (with me still reeling from a 4am start) we were taken through the results of the user testing. These were overwhelmingly positive, though that needs to be taken with caution given the guided nature of the demo (especially with the online questionnaire, but perhaps also with the expert users and the focus groups). All the same, there were criticisms that gave us something to get our teeth into, particularly around the home page and the purpose of the "who and what" tab. Search result ordering was an issue, a particularly thorny one in fact, which we tackled on Tuesday as best we could. Clearly a lot of users don't really understand tagging, though they thought they liked it. Other plusses were the timeline and map.
There was a good session with representatives of a French organisation for the blind and visually disabled after lunch (a bloomin' good lunch, in fact. Good wine, too. I love France!). Aside from HTML accessibility they talked extensively about Daisy, and it would be marvellous if some of the text content that may end up there could be daisified. No-one had heard of TEI (or DocBook) but it struck me that these formats are pretty close to what Daisy sounds like, and that there may be TEI material amongst the content we'll be aggregating, so translations to Daisy could be relatively straightforward. Anyone know?
Personalisation took us to the end of the day and we distinguished between activities done for private purposes (though perhaps with public benefits) like bookmarking with tagging, tailoring search preferences, setting up alerts, or saving searches; and explicitly public activities like enriching content, suggesting and tagging (when not bookmarking). The question of downloads (what? how? assets or data?) and the related issue of licensing came up. I think we worked out that possibly four levels of privacy would be useful, extending the way Flickr and other sites work, with private, public, friends/family, and "share with institutions". The latter is really about saying: I'll let myself, my mates and the organisation whose objects I'm tagging/annotating look at the data, but not everyone. I think it's important and should be encouraged, as it lets those institutions do interesting stuff with the resulting UGC for everyone's benefit. We ran over into the next day to deal with communities (still plenty to think about there, I would say) and results display, a practical and useful discussion that touched on the fields that might be searched across and how they would be used in ranking.
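The four privacy levels could be modelled along the lines of this minimal Python sketch. The names, and the assumption that "share with institutions" also includes friends/family (as in "me and my mates and the organisation"), are mine, not anything agreed at the meeting.

```python
from enum import Enum


class Privacy(Enum):
    PRIVATE = "private"
    FRIENDS = "friends/family"
    INSTITUTIONS = "share with institutions"
    PUBLIC = "public"


def can_view(level, viewer, owner, friends,
             viewer_institution=None, object_institution=None):
    """Decide whether `viewer` may see a piece of UGC owned by `owner`.

    `object_institution` is the organisation whose object was tagged or
    annotated; `viewer_institution` is the viewer's organisation, if any.
    """
    if level is Privacy.PUBLIC or viewer == owner:
        return True
    if level is Privacy.PRIVATE:
        return False
    if viewer in friends:
        return True  # both FRIENDS and INSTITUTIONS include friends/family
    # Only INSTITUTIONS additionally admits the object's own organisation.
    return (level is Privacy.INSTITUTIONS
            and viewer_institution is not None
            and viewer_institution == object_institution)
```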
Finally my bit came up. Although Fleur had suggested that I talk for maybe 15-20 minutes to kick off discussions on the API, I, feeling unsure of my ground, prepared pretty thoroughly, with the result that I had material that kept me talking for an hour or more, I think, albeit with some digressions for debating what I was saying. On the whole it went down quite well, I think, but I learned a bit about what I should have added (proper, simple explanations of APIs, and more examples of how they're used) and what I should have left out (a section where for the sake of completeness I referred to the management of collection data, which is not part of the public API anyway and is outside the scope of our WP. This led to a digression that was I think still useful, but not to the topic of that moment). And then, seeding the discussion with a use case related to VLEs, we tried to figure out in more detail what functions and parameters would be needed in an API call, and what would be returned. And that, my friends, I will write up shortly; for now I want my dinner. Home calls.
Thursday, March 06, 2008
Also from MS, Seadragon (see also Photosynth) in Silverlight 2. More than a bit useful for us cultural heritage types, and I should add that the video on that TechCrunch page is of a cultural heritage application: Hard Rock Cafe's memorabilia application, which was the demo shown at MIX08. They talk about the role of imaging for authentication and for bringing objects to life, and though it's obviously a business, their business is really not so far from ours (albeit for profit).
Sunday, March 02, 2008