Tag Archives: RDF

Better Recommendations Through (Linked) Data

Recommendations. Everyone’s talking about them, to paraphrase the old Eastenders slogan. I’m currently working on a pilot project looking at ways to expose the BBC’s archive content, help people find programmes they might be interested in, and clearly show when the programme was made/broadcast. Part of this work includes examining the ways we can improve episode to episode recommendations. I’ve been doing lots of thinking around this, and here’s the latest.

When it comes to recommendations, there seem to be four approaches. Each has its advantages and disadvantages, but I would argue that, until now, only three of the options have been tried in earnest.

Firstly, there’s the traditional method of hand-picked, manual ‘editorial’ recommendations. This means that staff consider each programme they’re responsible for, look around at what else is on offer, and pick out other programmes that could sensibly be recommended. The advantage of this method is that it’s often highly targeted, and good quality, basically because it’s been sense checked. The disadvantage is that it doesn’t scale well. It requires a great deal of human effort, and equally, a potentially vast knowledge of the programming output of a broadcaster in order to reap the maximum benefit. However, until recently, it’s been the safest, if not the only option on the cards.

The next three approaches are more to do with the reasons for recommendation. They’re often the reasons behind the manual recommendations, but as we turn to data-driven systems more and more, these reasons can inform automatic recommendations.

Production-based information – By this, I mean using production data, such as programme structure, categorisation, classification and cast/crew details, to power recommendations. In its simplest form, this can be seen on bbc.co.uk/programmes for almost any episode, where you can see the previous and next episodes in a series. Essentially, this is a recommendation as to what episodes it would make sense to consume before & after the one you’re looking at. Similarly, the genres, format and channel aggregations offer recommendations based on traditional broadcast classification structures. On the plus side, these are (relatively) easily sourced from the existing programme making workflow. They can also provide pretty useful recommendations. However, they tend to be very general. For instance, just because something is on the same channel, or in the same genre, or indeed, has the same actor in it, doesn’t automatically make it a relevant recommendation. I would even argue that just showing other episodes in the same series or brand, as is done on things like iPlayer, isn’t really the best form of recommendation, and probably shouldn’t be sold as such.

Social-based information – Here, I’m talking about probably the most prevalent form of recommendation at the moment – or at least the one that everyone seems to be advocating. Here, we would collect data on a person’s viewing/listening habits, and use this data to provide other programmes that they might want to see, based on a combination of the frequency/range of their consumption, and the already established production-based recommendations. In addition, this can then popularly be combined with social networking information, so that recommendations can be provided based on what other people you are linked to have been consuming. Again, the advantages are that you can build up a fairly accurate picture of the type of audience you have, based on what they’re consuming, and this can then be used to influence both what you provide to them, and what you commission. However, there are major downsides to this, as well. Firstly, speaking personally, although I accept that recommendations from friends can be helpful, I don’t believe it’s the correct primary source for recommendations. Certainly, I’m not really interested in just knowing that other people have watched a particular programme – just because they watched it, doesn’t mean they would recommend it. Indeed, just because they liked it, also doesn’t mean they would recommend it. A recommendation, in this form, at least, has to be pro-active. That is, I’d much rather a friend actively recommended a programme to me, rather than a computer spying on their habits and then telling me. Which brings us to the second problem – the slightly dubious ethical/moral question of whether it’s right for companies to collect detailed information about audience habits. A really thorny question, which I’m not going to delve into now.

Which brings us on to the final form of recommendation, the one I believe gives the greatest benefit. And surprise, surprise, yes, it’s Content-based recommendations. Here, I mean something deeper than ‘this episode is in the same brand’, something more specific than ‘this programme has something to do with the same topic’, and something less, well, creepy than ‘twelve of your friends watched the Inbetweeners, so you must too!’. I’m also not talking about just tagging content. Tagging is probably the simplest and crudest way of doing this – it’s a start, but it really isn’t the end game. I mean that it’s necessary to, as far as possible, represent the actual content of the programme as data, and then link to other programmes which utilise the same data. This provides the most accurate recommendations, because we know that the exact same things (or at least things with meaningful links between them) are being recommended. The downside, unfortunately, for the time being, is that it would have to be a fairly manual process. In this way, yes, it’s similar to the hand-picked, curated recommendations I mentioned earlier. The difference here, though, is twofold. Firstly, we’re capturing the reasons behind the recommendation as data itself, which leads to automatic re-use rather than constantly having to manually pick things (there would, of course, probably still need to be some form of editorial oversight to at least pick out highlights from the potential mass of auto-generated recommendations). Secondly, it can be folded into the production workflow from the very beginning, by engaging with writers & production staff, so that a separate team is not required, and the recommendations can be captured and compiled at the very source, rather than after the fact. Commonly, the people who will know the content (and therefore the links) the best, will be the people who made the content in the first place.
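
As a rough illustration of the difference between this and plain tagging, here is a minimal Python sketch. It assumes each programme has been described by the things it is actually about, and it keeps the shared things as the reason for each recommendation. The programme names and identifiers are invented for the example; they are not real BBC URIs.

```python
from collections import defaultdict

# Hypothetical data: each programme is described by the things it is about.
# The identifiers are made up for illustration only.
CONTENT = {
    "blink": {"character/the-doctor", "concept/weeping-angels", "theme/time-travel"},
    "turn-left": {"character/the-doctor", "character/donna-noble", "motif/bad-wolf"},
    "the-parting-of-the-ways": {"character/the-doctor", "character/rose-tyler", "motif/bad-wolf"},
    "gardeners-world": {"topic/gardening"},
}


def recommend(programme, content=CONTENT):
    """Return (other_programme, shared_things) pairs, strongest link first."""
    index = defaultdict(set)  # thing -> programmes that feature it
    for prog, things in content.items():
        for thing in things:
            index[thing].add(prog)

    reasons = defaultdict(set)  # other programme -> things it shares with ours
    for thing in content[programme]:
        for other in index[thing] - {programme}:
            reasons[other].add(thing)

    # Most shared things first, then alphabetical so the order is stable.
    return sorted(reasons.items(), key=lambda item: (-len(item[1]), item[0]))


for other, shared in recommend("turn-left"):
    print(other, "because of", sorted(shared))
```

Because the reasons are stored alongside each link, an editor could still review or highlight the auto-generated recommendations rather than picking every one by hand.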

This really shouldn’t be news to anyone, and yet it seems that this approach, until now, hasn’t been tried, in the main. I really can’t understand why, although the problems with, and reluctance towards, even providing enough accurate data to power the production-based recommendations perhaps provide a clue. But I don’t think I’m alone in advocating this. In the oft-quoted (but perhaps not often enough!) words of Nicholas Negroponte, in 1995’s Being Digital:

“We need those bits that describe the narrative with key words, data about the content, and forward and backward references….The bits about the bits change broadcasting totally. They give you a handle by which to grab what interests you and provide the network with a means to ship them into any nook or cranny that wants them. The networks will finally learn what networking is about.”

So that’s not just tags, but data to actually represent the content.

With all this in mind, I’ve begun to compile a mixture of production-based and content-based recommendations for traversing through the BBC’s archive. The next post will provide some examples of this, and lead you through the format and choices I’ve made in representing these links in the n3 format of RDF.

Narratives and the Semantic Web

Super Bowl Sunday Crystal Ball, by Circulating, from Flickr, Creative Commons license

“People assume that time is a strict progression of cause to effect – but actually, from a non-linear, non-subjective viewpoint, it’s more like a big ball of wibbly-wobbly…timey-wimey…stuff.”

The websites that we create around the narratives we tell currently focus on the objects within those narratives, like the playing pieces in a set of toys. More often than not, these are hand-crafted, static pages about certain editorially defined objects. Although we can record the links between objects implicitly as things that the audience can travel along, we tend not to expose them as things that the audience can explore and see in context. This, however, is what we are really interested in when describing narratives or telling stories. We define the objects within the world of the narrative, and then describe the interactions and changes between the objects. The intriguing thing is not necessarily the objects themselves, but the ways in which they change, or otherwise. A truly engaging website would therefore allow the audience to explore the world of the narrative not only by navigating between the objects, but by exposing and analysing the links between them, in order to derive more satisfaction.

Outside of the web, when we focus on an object, our minds give it context, and naturally establish the links between related objects. For instance, when on a train journey, if I look out the window, I can see that branch of that tree which is placed there. We are instantly aware of both the object and its context, the thing and its links. Do the same thing with a computer, and it could identify and create a URL for a branch of a tree, but this would exist in a vacuum. It is up to us to give it the context. Using the principles and technologies underlying the Semantic Web, however, we can start to embed the context, the links, the meaning, so that, when using the web, we do not have to define these things every time. Instead, we can concentrate on uncovering and analysing those links, so that we can derive greater understanding and enjoyment from them.

Currently, websites such as www.bbc.co.uk/programmes define the objects, their contexts and links in a semantic web fashion, so that we can uniquely identify a particular object. Essentially, it provides the building blocks upon which we can establish the type of website I have described above. Unfortunately, as far as I am aware, these building blocks are the limits of what we can currently, reliably, achieve. Emerging technologies such as SPARQL and RDF/graph visualisations will help us to build upon these blocks, but I do not think we currently have an established, reliable ‘toolkit’ or process that we can use to do this. However, this does not mean it cannot be done – it needs further experimentation. In the meantime, we can set about ensuring that the websites we build now will allow us to achieve the ideas mentioned above.

In the context of the BBC, there are two areas in which I can imagine the benefits of such an approach. The first, I will only give an overview of, as I have only thought briefly about the possibilities. The other, regarding fictional narrative, has been the focus of my previous blog posts, and I will continue the discussion here.

The first area is sport, particularly football. The BBC Football website contains a wealth of information, covering what is, in effect, the (almost) closed-off world of football. Fans essentially are following a narrative which spans matches, clubs, leagues, seasons, cup competitions etc. There is, obviously, some organisation taking place on the website – organising the clubs into their leagues, for instance. However, the links between these things – and here I mean not just the clubs, but the players, the action – are rarely revealed. We know that a team is relegated from a division because on one day their page exists within the ‘Premier League’ section, whereas the next, they are in the ‘Championship’ section. Their history may be recorded on the team’s page, or preserved in the numbers of a league table for a particular season, but there is no way of effectively (and, most importantly, engagingly) charting their fortunes. Of course, we can present these things in the numbers and bar charts and graphs, but they do not take advantage of the existence of the narrative behind them – which is really what people are interested in. Similarly with players. When two players go in for a tackle, we know that they have a history of confrontation, or perhaps an embarrassing own goal incident – what if we could provide the context around that tackle as and when, and after, it happens – filling in the back story, and getting the audience excited and engaged?

Similarly, by identifying and putting objects and events in context, we can give the audience something to latch on to. Take, for instance, a penalty incident. Say that the match was being covered on 5 Live with a commentary, it was shown and discussed on Match of the Day by pundits, and then also talked about on forums and 606 by fans. If we had an identifiable ‘hook’ for the incident, then potentially we could build a page which brought together all these different interpretations and discussions of the same event. That way, the audience would have an effective overview of the incident, with informed (and perhaps ill-informed!) opinions – their understanding and enjoyment would be enhanced, and of course, they could make their own contribution.
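
To make the idea of a ‘hook’ a little more concrete, here is a minimal Python sketch. It assumes every piece of coverage carries the identifier of the incident it discusses, so that a page for the incident can be assembled by grouping on that identifier. The identifier scheme, the class name and the sample data are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Coverage:
    incident: str  # the shared 'hook': one identifier per real-world incident
    source: str    # where the coverage came from
    summary: str


# Invented example data; the identifier scheme is an assumption.
ITEMS = [
    Coverage("match/123/penalty-67min", "5 Live commentary", "Penalty given for handball"),
    Coverage("match/123/penalty-67min", "Match of the Day", "Pundits think it was harsh"),
    Coverage("match/123/goal-12min", "5 Live commentary", "Header from a corner"),
    Coverage("match/123/penalty-67min", "606", "Fans argue about the decision"),
]


def pages_by_incident(items):
    """Group coverage by incident identifier, preserving the input order."""
    pages = defaultdict(list)
    for item in items:
        pages[item.incident].append(item)
    return dict(pages)


pages = pages_by_incident(ITEMS)
for item in pages["match/123/penalty-67min"]:
    print(f"{item.source}: {item.summary}")
```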

Back to the fiction – in my last post, I linked to a couple of images in which I tried to explain what I aim to achieve, and where the benefits could be found. The first diagram establishes the episodes as a whole, regardless of series – and then drills down to a particular series, and a particular episode. A website that deals with a fictional narrative needs to remember that the episodes are merely a window onto the universe for the audience. If we intend to allow the audience to fully explore the universe, then apart from pointers leading them from/to episodes, as a form of ‘way-in’ (which, incidentally, should probably be through /programmes) the episodes themselves should (probably) not be included – all that exists are the objects (the places, the times, the characters) and the events.

The first diagram, once an episode has been specified, identifies the characters and events within the episode that are crucial to the narrative. For this, I limited myself to a handful of events and characters, which meant that I did not fully get the richness of the narrative across. However, potentially, we could identify as many events etc. as we require. Below the timeline of events (as presented to the viewer) there are coloured blobs, representing the characters in the events. This view shows us how the characters come and go throughout the episode (for instance, the Doctor only really appearing at certain points in the beginning, middle and end).

The second diagram gets closer to the value of this kind of site. Here, we see that the way in which each character experiences the events of the episode is quite different. This is crucial both to the plot and to the audience’s understanding and enjoyment of the episode. If, for instance, you wondered exactly how things tied together, then exploring this kind of site would allow you to piece together the parts of the puzzle. Perhaps on each character’s page, we would show their timeline, and how things happened to them. From the Doctor’s perspective, for instance, the event at the end of the episode is the first thing that happens to him – and the last from Sally’s point of view. Also, by showing these different timelines in the context of each other, we see the intricate way in which Steven Moffat (the writer) is able to weave the story together – giving the audience a greater appreciation of the story as a whole.

Obviously, Blink (so far) is an atypical episode of Doctor Who. By and large, the stories do not tend to concern themselves with the ‘timey-wimey’ stuff. However, over the course of a series, or indeed several series, characters, events etc re-appear – for instance the ‘Bad Wolf’ motif – the reason that the cliffhanger to ‘Turn Left’ works so well, is because it draws together elements of continuity established throughout several series. The audience gets maximum enjoyment out of such a moment because they are aware of the links and the context.

So what of the original series, whereby both ‘timey-wimey’ stuff and ‘story-arcs’ were at a minimum? Well, there are still instances of recurring themes, but overall, stories are self contained. That’s fine – they could be slotted into this kind of website just like everything else, because it essentially gives us a great pool of narrative to draw upon – if and when needed. Crucially, though, they represent a pool of ideas that future writers can draw upon if they wish. Continuity should not restrict the writing of future stories – the previous stories merely open out the fictional universe, creating more richness for authors. As such, when feeding the ‘classic’ stories into the website, the site becomes a form of ‘official’ wiki. Users can and should be encouraged to contribute, as a form of writing their own stories, but a distinction can be drawn between the events depicted on screen (it is, after all, and should not be forgotten, a television show..) and those where people ‘fill in the gaps’. The series itself has touched upon this, with the idea of certain events being ‘fixed points’ and others being ‘in flux’. As long as the narrative is not disrupted (i.e. does not break down so that it no longer makes sense to the audience) and does not become too insular (i.e. relying too heavily on continuity, so that new audiences are driven away), then continuity can enhance the fictional narrative universe as a whole.

Finally, a new diagram which, on a very basic level, tries to illustrate the idea that the website could be explored and presented through the model of, as quoted above, “a big ball of wibbly-wobbly…timey-wimey…stuff.” The diagram is quite obviously incomplete, but the idea is that the objects and the links between them are visualised, and the audience can then choose to look at a particular object, and see how it ties in to everything else – seeing both the object and its changing context and perspectives at the same time.

Phew. That’s enough for now. Till next time…

Canon Flux

Blue Box – by Brainless Angel, Creative Commons, via Flickr

Well, that’s been an interesting couple of weeks. I’ve made progress on both the RDF/Ontology and the Ruby/Rails front – although one much more significantly than the other. We’ll deal with both in a moment, but first a few encouraging signs.

Thanks, as ever, to those who have commented on my posts so far – some interesting questions have been posed, which I’ll get onto later, and ideas have been discussed. One thing that has encouraged me a great deal has been this article by Tom Scott and Michael Smethurst on coherence at bbc.co.uk – most of which I’m familiar with from work, but the references to non-linear narratives, the BBC as a story-telling organisation, and how to adapt that for the web, are of particular interest and encouragement, because narrative and story-telling is at the heart of what I’m trying to achieve. Michael also wrote this article over at the Radio Labs blog, which is gathering plenty of praise (not least from Tim Berners-Lee himself!) – from my perspective, it’s given me a useful focus on the steps I’d need to take to move these ideas from blog posts to a working prototype, and hopefully beyond.

So, how have I been getting on with a) developing a front-end website for exploration and administration of a fictional universe (by administration, I mean the creation of new elements in the ‘toy-box’ and the links between them), and b) an ontology (and accompanying RDF examples) to describe the narrative content of episodes?

Ruby/Rails – A Web Front End

One half of this project has always been focused on providing a front end application/web site. One in which users (I won’t limit or define *who* they might be at the moment) could navigate and explore a fictional universe, in a wider, more open format than the current focus strictly on the episodes. A suggested approach to the project as a whole has been to create the web app first, use that to produce and store the data in a MySQL (or similar) database, and then expose the data as RDF etc. Unfortunately, my lack of technical expertise has severely hindered my progress on this front. Over a week and a half has been spent on just getting Ruby/Rails up and running properly, and learning the basics (for which, thanks must go to Anthony Green and Craig Webster in particular for being patient and offering help whenever possible). Although it’s still a strand which I think is important, and would like to develop, I’ve been worried that concentrating solely on the Ruby/Rails side is taking me away from the semantic web/linked data roots of the idea, which I’d prefer to get sorted out first. I’ve also realised that before I can really begin to develop the front end properly, I need to know the scope and domain model inside out. Both of these I had a fair idea of, but the domain model in particular was very much a work in progress – and so I felt there was less value in developing the application until I had it sorted out. Note my use of the past tense to describe the domain model – which leads us nicely on to…

OntoMedia – An Ontology for Describing the Narrative Content of Media

This is a story of serendipity (which reminds me, of course, of the Jon Pertwee story ‘The Green Death‘, aka ‘The One with the Giant Maggots’, in which the concept of serendipity plays an important role – anyway, where was I?). I’ve already described in detail my frustrations with the tantalising prospect of the SUDS ontology – something which several people have helpfully mentioned as a good starting place, but for which an actual ontology specification has been lacking. I’m still pursuing the SUDS material, thanks to Kim in the comments, but I’ve managed to get my hands on an ontology which might just be what I need – OntoMedia. A chance meeting with Mike Jewell at the last OpenSoho (see, networking can be useful) led to a discussion of this project – and it turns out that whilst at the University of Southampton, Mike and Faith Lawrence (amongst others) developed an ontology called OntoMedia for doing just as described in the heading. It has its roots in an exploration of online fan fiction, and is extremely detailed and flexible. The fan fiction roots also mean that it has been designed with geeky subjects like Doctor Who in mind, which is a bonus. However, being so detailed and tailored to the fan fiction roots means that, speaking personally, it sometimes focuses a little too much on fantasy genre elements (detailed descriptions of clans, bonds, blood oaths, woods and coppices etc), whilst seemingly lacking a couple of minor basics (although I’m still getting to grips with it, so it’s possible that I’m just missing the obvious bits…!). But that’s not to knock it at all – it’s a highly accomplished piece of work, and allows all kinds of narratives to be described. Since our initial meeting, I’ve been discussing the possibility of developing and improving the ontology – I truly believe that with a little more work, it brings me a huge step closer to my goal, and could end up being widely used throughout the BBC. To be honest, I’m just surprised that no one else had picked up on its potential yet.

I think I’ll leave a detailed description of how to go about implementing stuff in OntoMedia for another blog post, but what I can do is give you a flavour of the basic principles. Essentially you establish the existence of (at least) two universes – reality, and the fictional universe. Within the fictional universe you establish a timeline, your characters, locations etc, and link your characters to defined actors in the ‘real world’. Here, we can deal with characters and elements which are of dubious or multiple origins – we can essentially define concepts that are shared between media, and their provenance as part of a universe (or context). I’ve also then defined episodes as being things existing in the real world, with their own timelines – the episodes are then linked into the bbc.co.uk/programmes equivalents. Finally, you establish events which can occur in multiple timelines (and in different orders within those timelines). That’s the principle, at least. For me, it all harks back quite nicely to that ‘toy-box’ analogy. You set the scene, choose your characters, then tell the story. It’s also important to bear in mind that we’re not trying to restrict creativity and lay down the law for what happened and when – to use the analogy from within Doctor Who, some things are fixed points in time (i.e. the stuff shown on screen), others are in flux.
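
The principle of events occurring in several timelines, in a different order in each, can be sketched in a few lines of Python. This is not OntoMedia itself, only an illustration: the `Timeline` class, the event labels and the heavily simplified selection of events from ‘Blink’ are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    name: str
    events: list = field(default_factory=list)  # event ids, in this timeline's order

    def position(self, event):
        return self.events.index(event)


# A simplified, hand-picked handful of events; the ids are invented labels.
EPISODE = Timeline("episode as broadcast", [
    "message-on-wall-found", "billy-arrives-in-1969",
    "dvd-conversation", "folder-handed-over",
])
SALLY = Timeline("Sally", [
    "message-on-wall-found", "dvd-conversation", "folder-handed-over",
])
DOCTOR = Timeline("the Doctor", [
    "folder-handed-over", "billy-arrives-in-1969", "dvd-conversation",
])


def reordered_pairs(a, b):
    """Pairs of events shared by both timelines that occur in a different order in b."""
    shared = [e for e in a.events if e in b.events]
    pairs = []
    for i, first in enumerate(shared):
        for second in shared[i + 1:]:
            if b.position(first) > b.position(second):
                pairs.append((first, second))
    return pairs


print(reordered_pairs(SALLY, DOCTOR))
print(reordered_pairs(EPISODE, SALLY))
```

The same event identifier appears in each timeline, so the event itself is described once and only its position differs from one point of view to another.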

As for my progress so far, I’ve been helped by Yves Raimond in particular, who reminded me of the benefits of writing n3 triples, and by Patrick Sinclair and Nicholas Humphrey, who offered other guidance. I’ve been working to two case studies. The first is to eventually show the benefits of linking characters and events across several episodes – for this, I’ve defined the scope as the 2005 series of Doctor Who (including The Christmas Invasion), with the intention to show the Bad Wolf arc (I can then extend this to cover the second, third and fourth series). The results of which can be seen here and here. (You’ll need an RDF extension like Tabulator for Firefox to navigate the links properly).

The second case study is designed to highlight the benefits of exploring events in the fictional universe and comparing them with the order in which they occur within a given episode – so that the skill with which the writer has constructed the story can be fully appreciated, and the enjoyment of the story can be increased. For this case study, I’ve chosen to concentrate on the award winning story from the 2007 series, ‘Blink‘ – famous for its use of multiple, interconnected timelines – very ‘timey-wimey’, as they say. Results so far, which just set up the timeline, the episode, the characters, actors and locations, can be found here.

Events and occurrences are, by their nature, a little more complex, and I’m currently trying to get my head around how best to represent them – the OntoMedia ontology allows extremely detailed representations of the data, but I’m trying to stick to simple representations for the moment – the achievement of which is my current challenge.

The ontology allows, essentially, the description of any narrative. Which leads me to a potential further case study. Obviously for the moment I’ve been concentrating on fictional universes – but this could easily apply to the real world. Could this be a way to describe events and blend the semantic web into other areas of the BBC’s output in an easier and more subtle way? For instance, coverage of a football match – again, define the teams, the players, the timeline of the match and the various events. Then, again, we would have permanent, stable URIs for each team, player, event – I think the possibilities and potential are huge.

Finally, in terms of my overall approach – my current thinking is to continue with writing the RDF, then load it into a triple store. An application would then be written to allow the querying of data in the triple store, and its representation in a well designed, user facing front end. If there are standard patterns in the RDF for creating characters, events etc using OntoMedia, then ideally the application would take these recipes and allow the user to input the data without having to interact directly with writing RDF.
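
A minimal Python sketch of that pipeline follows, under loud assumptions: the `TripleStore` class is a toy in-memory stand-in for a real triple store, `character_recipe` is a hypothetical example of a ‘recipe’, and the `ex:` predicate names are placeholders rather than real OntoMedia terms.

```python
class TripleStore:
    """A toy in-memory stand-in for a real triple store."""

    def __init__(self):
        self.triples = set()

    def add(self, triples):
        self.triples.update(triples)

    def match(self, s=None, p=None, o=None):
        """Return the triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def character_recipe(slug, name, actor_slug, universe="universe/doctor-who"):
    """Expand a few form fields into the triples that describe a character.

    The predicates are placeholders, not real OntoMedia terms.
    """
    char = f"character/{slug}"
    return [
        (char, "rdf:type", "ex:Character"),
        (char, "ex:name", name),
        (char, "ex:inUniverse", universe),
        (char, "ex:portrayedBy", f"actor/{actor_slug}"),
    ]


store = TripleStore()
store.add(character_recipe("sally-sparrow", "Sally Sparrow", "carey-mulligan"))
store.add(character_recipe("the-doctor", "The Doctor", "david-tennant"))

# The front end would run queries like this instead of exposing raw RDF.
for subject, _, _ in store.match(p="rdf:type", o="ex:Character"):
    print(subject)
```

The point of the recipe function is that a user fills in a name and an actor, and the application writes the repetitive triples on their behalf.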

So there we are – a great deal of progress – not all the way there, but a huge step forward – although the phrase ‘Standing on the Shoulders of Giants’ does come to mind… Once I’ve worked out how to represent events and occurrences, the triple store will be next, then the Ruby/Rails application, and then some design magic. Wish me luck!

Tuning Fork

Tuning Fork, by Toby Esterhase, via Flickr – Creative Commons

Part three of my investigation into fictional content modelling. See the previous two posts for the background to the project. Thanks to those who’ve been discussing the ideas – I think it’s coming along nicely. I’ve been playing around with writing some RDF, trying to link up various ontologies, and explaining what I’m trying to do as I go along. Here’s a plain text file of quasi-RDF within comments – see what you think…(UPDATE: Now here in beautiful RDF format :-) )

One thing that has come up in the discussions, though, is that there are perhaps two elements to what I’m trying to achieve. The first is to link existing ontologies and, if needed, build a new one, to help describe the narrative content of ‘stories’ within the context of television and radio programmes. The second is to experiment (and for me to learn) with existing ontologies, again, linking them up, to build dynamic and interesting webpages that work on linked data principles.

So I’m interested in the ontology *and* what kind of cool stuff we could build on top of it (which includes ideas around remixing narrative, and audience story-telling). I haven’t got any definite plans on top of that at the moment, but I think the key is to see where it takes us. Well, I have an image in my mind of the types of things we could do, but again, it will be easier to describe them by prototypes. Something that might help is if I was to link to this diagram, from the aforementioned Tristan Ferne’s Radio Labs blog, describing similar things to do with the Archers – except linking that up with linked data/ontology work…

Which would lead to something like the diagram below. Again, it isn’t a complete set of what I want to do, but it shows the types of objects we’re talking about, the relationships between them, and where they link to ontologies:

Contextual Data Model

Actors – Using FOAF, with possible extensions, this would be a URL for each actor who appears in a BBC show. This page could pull in a biography from Wikipedia, for instance, but mainly it will show the audience all the programmes that the actor has appeared in. Linking Actors to Characters, all the way through to Episodes, would allow us to auto-generate the cast lists for the /programmes episode pages. However, one problem in an early implementation might be that if we only record ‘significant’ events within an episode, the cast lists won’t represent everyone – but over time, this could be improved (the rest of the cast could possibly be listed manually against the episode, greyed-out, until they have their own URL).

Portrayal – This would allow an Actor to play many Characters, and a Character to be played by many Actors. Here I’m thinking more of ‘flashback’ scenes where you see a character as a child, but as Tom pointed out in the comments, this could be used to handle the different actors playing the Doctor. But how then would you deal with the different ‘characterisations’ of the same character?

This is where the recursive relationship around ‘Character’ comes in – I haven’t worked out exactly what to call this yet, but it would allow both the foaf:knows relationship, and potentially use the owl:sameAs to link different Doctors? (Perhaps not – but something along those lines).

Again, a many-to-many resolver is needed between Characters and Events, which I’ve called ‘Action’ – I’m not sure whether these many-to-many objects would need to be made explicit and have their own URLs, but the main objects certainly would, as they could have useful pages for the audience to explore.

Events would be pages that would describe a significant event in the episode, something that would be worth describing, for instance an event which is part of a wider story arc – we would then need a URL to link these together, so you could say that ‘Someone points out that Donna has something on her back’ is part of the ‘Donna/Time-Beetle’ story arc (apologies for the random example!). This is, though, where the main value of the project would be for the audience. By giving an event a URL, the user could trace storylines throughout the episodes, outside of the confines of the episode structure – making the fictional universe more cohesive, rather than restricting our view to the episodes, which are like ‘windows’ onto the fictional universe.

Similarly, if a user then wanted to write a story featuring some of the characters, they could refer to the character’s URL (which would then allow us to have something on the character’s page to say ‘others have written stories using this character’ – linking out onto the web, and promoting new writers and stories). The users could equally refer to events, perhaps building events into their own stories, taking them as cues for new stories etc. Again, it all fits in with the idea of giving our audience the tools to be creative, whilst using the advantages of the BBC website’s exposure to promote audience creativity.

There’s one many-to-many resolver which I’m not sure about at the moment – between Events and Episodes – what if the same event was shown, or even just referred to, in more than one episode? We would need some way of defining this – but I’m not sure of the correct term for it yet, hence the ‘???’ object.
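One way to picture that unnamed Events-to-Episodes resolver, together with the story-arc idea, is as a simple table of rows. The sketch below is only that, a sketch: the identifiers are placeholders rather than real URLs, the episodes are deliberately anonymous, the second event is invented purely to give the arc more than one member, and the ‘shown’/‘referenced’ distinction is my own assumption about what the ‘???’ object might need to record.

```python
from collections import defaultdict

# Hypothetical resolver rows: (event, episode, kind).
# kind says whether the event is "shown" in the episode or only "referenced".
appearances = [
    ("event/something-on-donnas-back", "episode/a", "shown"),
    ("event/something-on-donnas-back", "episode/c", "referenced"),
    ("event/later-arc-event", "episode/c", "shown"),  # placeholder event
]

# Which story arc each event belongs to (placeholder identifiers again).
arc_membership = {
    "event/something-on-donnas-back": "arc/donna-time-beetle",
    "event/later-arc-event": "arc/donna-time-beetle",
}


def episodes_for_arc(arc):
    """Group an arc's events by the episodes they are shown or referenced in."""
    by_episode = defaultdict(list)
    for event, episode, kind in appearances:
        if arc_membership.get(event) == arc:
            by_episode[episode].append((event, kind))
    return dict(by_episode)


def episodes_for_event(event):
    """Every episode that shows or mentions a single event."""
    return sorted(ep for ev, ep, _ in appearances if ev == event)


print(episodes_for_arc("arc/donna-time-beetle"))
print(episodes_for_event("event/something-on-donnas-back"))
```

Because the arc is reached through the events rather than through the episodes, a page for the arc can be assembled without caring where the episode boundaries fall.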

So – events could be described using the Event Ontology. Actors and Characters would use the FOAF ontology. Episodes would use the Programmes Ontology. We therefore just need a way of tying them together, and then once we have some examples, it would be good to start thinking about what new things we might need from a new ontology.

On the subject raised in the comments about expressing a person in FOAF as fictional or real – I’d side with Tom in saying that it would be better to label the individual people as fictional, so that it was explicit which FOAF people were characters or not – and then you’d also have the issue of characters being used to represent, for instance, historical figures such as Charles Dickens…

Anyway, that’s enough for this entry. I hope I’ve got a little further in both clarifying the two strands of the idea, and exploring the breadth and potential of it. Comments, discussion, etc. encouraged! I’m hoping to present the idea in a meeting this coming Tuesday as a possible 10% time project, so I will keep you posted…

Baby Steps

Photo by strollerdos, via Flickr, Creative Commons

This is the second post in a series covering my exploration, experimentation and musings in the area of fictional modelling. In short, can we use the recent developments in semantic web technologies to represent elements of fictional content, and what does this allow us to do? For my introduction to the topic, see my previous post here. In this entry, I’ll talk about my first practical steps, and their implications. Thanks also go to Tom Scott, Dan Brickley and Anthony Green, amongst others, who responded to the first post with helpful comments.

Before I go any further, as pointed out by Chris Sizemore, it’s worth noting that work has been done in a similar area before. Previous IAs at the BBC, including Celia Romaniuk, worked on an ontology to describe the content of soap operas, known as SUDS. From what I have seen, it was an extension to FOAF in order to describe further relationships between people, the nature of people ‘playing’ characters, and various events that could take place between the characters in a show. This was done to tie in with an Eastenders website relaunch. I won’t go into much more detail here, but if you’re interested in seeing the original work, there’s a short article here and a great presentation here. Unfortunately, apart from a few example XML fragments, I have so far been unable to find a document that defines the SUDS ontology. This is a shame, because it would have been an extremely useful starting point for my experiments. One option might be to gather the examples together and try to reverse-engineer a schema, but for the moment, and partly as a way for me to learn as much as possible, I’ve decided to start from scratch. Hopefully at some point we can find the SUDS ontology and see how it compares to what I come up with.

So, where to start? Well, as the title suggests, I’m going to start small. Sort of. Readers of the blog, and others who know me, will probably have guessed that I’m a bit of a, shall we say, ‘fan’ of the BBC’s Doctor Who (currently in the news for apparently appointing a 12-year-old as the Eleventh Doctor). So much so, that in my sad little way, most things that I’m presented with in the course of my BBC IA work make me think “How could/would this apply to Doctor Who?”. As a programme that originally ran for 26 years, and has been enjoying an overdue renaissance, its rich history, and sheer refusal to ever completely conform to most IA domain models, make it both a source of frustration and inspiration. So when I read Tristan Ferne’s blog post over at BBC Radio Labs, shortly before joining the Beeb, I began to wonder. Have a read, it’s a good example of a similar idea.

Tristan’s article concerns fictional modelling for another hugely successful BBC show, The Archers. He talks about being able to break an episode down into scenes, characters, plots etc. and, for instance, potentially being able to build pages that allow the user to follow a story through multiple episodes, rather than being tied to the traditional episode format. Of course, to paraphrase Jack Bauer, events within The Archers occur in linear time. If we were able to build dynamic and interesting websites from a show like that, centred around a small English village, how about a show that goes forward, back and sideways in time and space? Harking back to my ‘toy box’ analogy from last time, with the imagination of the writers of a show like Doctor Who, and the imagination of our audiences, the potential to create some fantastic websites would be huge.

Sorry, where was I? Oh yes, starting small. So, yes, obviously I couldn’t hope to cover the whole scope of the show in one go. However, to show the potential of the semantic web and linked data approach, I’d want to start off by experimenting not only with characters who are linked together, but with a plot that is threaded through several episodes. I still haven’t quite decided what I’m going to choose for this, but I’m thinking that the story arc from either the first or fourth series of the current show would be good to try. But before all that, I had to learn how to create some linked data.

So I went even smaller, even simpler. I chose the first ever episode of the show, from 1963. This featured four main characters, and thanks to the workshop from Yves and the others, I had an inkling of an understanding of how to create FOAF profiles. The results can be seen here (best viewed if you use a Firefox plugin like Tabulator). So far so good. I then linked each character to the other, using the simple ‘knows’ relationship. Finally, to get my linked open data brownie points, I linked each character to its DBpedia equivalent, using the OWL ‘same as’ relationship. And that’s basically it. Except…
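For anyone who wants to try something similar, the shape of that experiment can be sketched in a few lines of Python that write out Turtle. This is not the file I actually published: the `example.org` namespace is a stand-in, the DBpedia resource name in the usage example is a guess that should be checked against DBpedia itself, and the string handling is naive (names containing quotes would need escaping).

```python
BASE = "http://example.org/characters/"   # hypothetical namespace
DBPEDIA = "http://dbpedia.org/resource/"  # real prefix; resource names unchecked

PREFIXES = (
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> ."
)

# The four main characters of the 1963 opening episode.
CHARACTERS = {
    "the-doctor": "The Doctor",
    "susan-foreman": "Susan Foreman",
    "ian-chesterton": "Ian Chesterton",
    "barbara-wright": "Barbara Wright",
}


def to_turtle(characters, same_as=None):
    """Emit one foaf:Person per character, each knowing all the others.

    same_as optionally maps a character slug to an external URI, which is
    written as an owl:sameAs link; leave it out to skip the external links.
    """
    same_as = same_as or {}
    blocks = [PREFIXES]
    for slug, name in characters.items():
        props = ["a foaf:Person", f'foaf:name "{name}"']
        others = [f"<{BASE}{other}>" for other in characters if other != slug]
        if others:
            props.append("foaf:knows " + ", ".join(others))
        if slug in same_as:
            props.append(f"owl:sameAs <{same_as[slug]}>")
        blocks.append(f"<{BASE}{slug}> " + " ;\n    ".join(props) + " .")
    return "\n\n".join(blocks) + "\n"


print(to_turtle(CHARACTERS, {"ian-chesterton": DBPEDIA + "Ian_Chesterton"}))
```

Making the owl:sameAs links optional means the same data can be written out with or without the external links, which is handy when the extra lookups slow a browser like Tabulator down.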

Except even this small experiment (which I eventually got working after help from Yves!) raises some interesting points. Firstly, the pernickety part of my brain is saying that we’re mixing two distinct things here. We’re using FOAF, which, I guess, and am happy to be corrected, is primarily intended to represent real people, to model fictional things. Crucially, nowhere, at the moment, are we explicitly stating that these resources are fictional characters. So I’m wondering whether FOAF is the correct ontology to use. Of course, like SUDS, the ontology that results from these experiments will probably be an extension of FOAF, as it is true to say that we’re still modelling the same sort of ‘thing’, the relationship between ‘people’. But the point still stands – that somehow we need some way of indicating the ‘fictional’ nature of the FOAF person, if applicable.
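One possible way of flagging the ‘fictional’ nature, sketched very roughly, is to give each character a second type alongside foaf:Person. The `FictionalCharacter` class below is entirely hypothetical (it lives in a made-up `example.org` namespace and is not part of FOAF or any published ontology), and the triples are plain Python tuples rather than a real RDF store.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
# Hypothetical class, invented for this sketch only.
FICTIONAL = "http://example.org/fiction#FictionalCharacter"


def mark_fictional(triples, subject):
    """Return a new set of triples with `subject` also typed as fictional."""
    return triples | {(subject, RDF_TYPE, FICTIONAL)}


def people(triples, fictional):
    """List foaf:Person resources that are (or are not) flagged as fictional."""
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF_PERSON}
    flagged = {s for s, p, o in triples if p == RDF_TYPE and o == FICTIONAL}
    return sorted(persons & flagged if fictional else persons - flagged)


# Placeholder URIs: one character and the actor who played him.
ian = "http://example.org/characters/ian-chesterton"
russell = "http://example.org/actors/william-russell"

graph = {
    (ian, RDF_TYPE, FOAF_PERSON),
    (russell, RDF_TYPE, FOAF_PERSON),
}
graph = mark_fictional(graph, ian)

print(people(graph, fictional=True))
print(people(graph, fictional=False))
```

The character stays a foaf:Person, so existing FOAF tools still work, but anything that cares about the difference has an explicit statement to look for.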

Secondly, and perhaps more importantly, as Anthony Green pointed out, and as I discovered when I linked the characters to their DBpedia equivalents, there’s a lot of detailed information out there already. When I linked each character to DBpedia, I got back information which was extremely detailed and fairly well structured. Which, to be honest, depressed me a little bit. Was it worth me continuing? It’s clear that others had done a lot of similar work already, and I knew that ultimately it would be silly to reinvent the wheel.

However, then I remembered what data I was trying to link. Of course I should still link to the DBpedia equivalents, but the linked data I am thinking of is more to do with linking between characters, plots etc within my own domain. I’m still slightly uneasy with this, because I know that obviously the main thrust of the whole linked data movement is to link external sources together, and that creating silos of data is not good. However, I’m still definitely in favour of linking to DBpedia – if we were to make our ‘internal’ linked data semantically rich, and then link to external sources, then everyone would benefit, and in a way, we would be regarded as the ‘master’ source in the same way that, in my small experiment, I used DBpedia as my ‘master’ source.

So that’s it. A long, rambling blog post, and a small, simple experiment. Baby steps. Apologies for the rambling, and I’m not sure that I *quite* explained myself properly in that last part – but there are definitely some interesting issues coming up already, and I’m hoping that the advantages of my position will be borne out in future experiments. Finally, I’ve adapted the RDF file that I used to create the FOAF profiles to temporarily remove the OWL ‘same as’ relationship – just to ease the page loading time, and to, for the moment, give me a cleaner space to work in. The adapted version is here, the original version here. Linking back *in* to DBpedia will be a task for later…

Again, comments, queries and advice are more than welcome – comment, twitter or email me.