
Better Recommendations Through (Linked) Data

Recommendations. Everyone’s talking about them, to paraphrase the old Eastenders slogan. I’m currently working on a pilot project looking at ways to expose the BBC’s archive content, help people find programmes they might be interested in, and clearly show when the programme was made/broadcast. Part of this work includes examining the ways we can improve episode to episode recommendations. I’ve been doing lots of thinking around this, and here’s the latest.

When it comes to recommendations, there seem to be four approaches. Each has its advantages and disadvantages, but I would argue that, until now, only three of the options have been tried in earnest.

Firstly, there’s the traditional method of hand-picked, manual ‘editorial’ recommendations. This means that staff consider each programme they’re responsible for, look around at what else is on offer, and pick out other programmes that could sensibly be recommended. The advantage of this method is that it’s often highly targeted and of good quality, because it’s been sense-checked by a person. The disadvantage is that it doesn’t scale well. It requires a great deal of human effort and, equally, a potentially vast knowledge of a broadcaster’s programming output in order to reap the maximum benefit. However, until recently, it’s been the safest, if not the only, option on the cards.

The next three approaches are more to do with the reasons for recommendation. They’re often the reasons behind the manual recommendations, but as we turn to data-driven systems more and more, these reasons can inform automatic recommendations.

Production-based information – By this, I mean using production data, such as programme structure, categorisation, classification and cast/crew details, to power recommendations. In its simplest form, this can be seen on bbc.co.uk/programmes for almost any episode, where you can see the previous and next episodes in a series. Essentially, this is a recommendation as to what episodes it would make sense to consume before & after the one you’re looking at. Similarly, the genres, format and channel aggregations offer recommendations based on traditional broadcast classification structures. On the plus side, these are (relatively) easily sourced from the existing programme making workflow. They can also provide pretty useful recommendations. However, they tend to be very general. For instance, just because something is on the same channel, or in the same genre, or indeed, has the same actor in it, doesn’t automatically make it a relevant recommendation. I would even argue that just showing other episodes in the same series or brand, as is done on things like iPlayer, isn’t really the best form of recommendation, and probably shouldn’t be sold as such.

Social-based information – Here, I’m talking about probably the most prevalent form of recommendation at the moment – or at least the one that everyone seems to be advocating. Here, we would collect data on a person’s viewing/listening habits, and use this data to provide other programmes that they might want to see, based on a combination of the frequency/range of their consumption, and the already established production-based recommendations. In addition, this can then popularly be combined with social networking information, so that recommendations can be provided based on what other people you are linked to have been consuming. Again, the advantages are that you can build up a fairly accurate picture of the type of audience you have, based on what they’re consuming, and this can then be used to influence both what you provide to them, and what you commission. However, there are major downsides to this, as well. Firstly, speaking personally, although I accept that recommendations from friends can be helpful, I don’t believe it’s the correct primary source for recommendations. Certainly, I’m not really interested in just knowing that other people have watched a particular programme – just because they watched it, doesn’t mean they would recommend it. Indeed, just because they liked it, also doesn’t mean they would recommend it. A recommendation, in this form, at least, has to be pro-active. That is, I’d much rather a friend actively recommended a programme to me, rather than a computer spying on their habits and then telling me. Which brings us to the second problem – the slightly dubious ethical/moral question of whether it’s right for companies to collect detailed information about audience habits. A really thorny question, which I’m not going to delve into now.

Which brings us on to the final form of recommendation, the one I believe gives the greatest benefit. And surprise, surprise, yes, it’s Content-based recommendations. Here, I mean something deeper than ‘this episode is in the same brand’, something more specific than ‘this programme has something to do with the same topic’, and something less, well, creepy than ‘twelve of your friends watched the Inbetweeners, so you must too!’. I’m also not talking about just tagging content. Tagging is probably the simplest and crudest way of doing this – it’s a start, but it really isn’t the end game. I mean that it’s necessary to, as far as possible, represent the actual content of the programme as data, and then link to other programmes which utilise the same data. This provides the most accurate recommendations, because we know that the exact same things (or at least things with meaningful links between them) are being recommended. The downside, unfortunately, for the time being, is that it would have to be a fairly manual process. In this way, yes, it’s similar to the hand-picked, curated recommendations I mentioned earlier. The difference here, though, is twofold. Firstly, we’re capturing the reasons behind the recommendation as data itself, which leads to automatic re-use rather than constantly having to manually pick things (there would, of course, probably still need to be some form of editorial oversight to at least pick out highlights from the potential mass of auto-generated recommendations). Secondly, it can be folded into the production workflow from the very beginning, by engaging with writers & production staff, so that a separate team is not required, and the recommendations can be captured and compiled at the very source, rather than after the fact. Commonly, the people who will know the content (and therefore the links) the best will be the people who made the content in the first place.
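To make the idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the episode names, the ‘things’ attached to them, and the simple approach of counting shared things are all invented, and none of it describes an existing BBC system. What it tries to show is that, once the content is represented as data, each recommendation can carry its own reasons with it.

```python
# Hypothetical data: each programme is described by the things that
# actually appear in it (people, places, events), not by loose tags.
CONTENT = {
    "episode_a": {"character:the_doctor", "place:london", "event:moon_landing"},
    "episode_b": {"character:the_doctor", "event:moon_landing"},
    "episode_c": {"place:london"},
    "episode_d": {"place:cardiff"},
}


def recommend(programme, content=CONTENT):
    """Return (other programme, shared things) pairs, most shared first."""
    things = content[programme]
    matches = []
    for other, other_things in content.items():
        if other == programme:
            continue
        shared = things & other_things
        if shared:
            # Keep the shared things: they are the reason for the link.
            matches.append((other, sorted(shared)))
    # Most shared things first, then by name so the order is stable.
    matches.sort(key=lambda m: (-len(m[1]), m[0]))
    return matches


for other, reasons in recommend("episode_a"):
    print(other, "because of", ", ".join(reasons))
```

Because the reasons are stored alongside each link, an editor could still review or highlight the results, but would not have to hand-pick every one.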

This really shouldn’t be news to anyone, and yet it seems that this approach, until now, hasn’t been tried, in the main. I really can’t understand why, although the problems with, and reluctance about, even providing enough accurate data to power the production-based recommendations perhaps provide a clue. But I don’t think I’m alone in advocating this. In the oft-quoted (but perhaps not often enough!) words of Nicholas Negroponte, in 1995’s Being Digital:

“We need those bits that describe the narrative with key words, data about the content, and forward and backward references….The bits about the bits change broadcasting totally. They give you a handle by which to grab what interests you and provide the network with a means to ship them into any nook or cranny that wants them. The networks will finally learn what networking is about.”

So that’s not just tags, but data to actually represent the content.

With all this in mind, I’ve begun to compile a mixture of production-based and content-based recommendations for traversing through the BBC’s archive. The next post will provide some examples of this, and lead you through the format and choices I’ve made in representing these links in the N3 format of RDF.

Narratives and the Semantic Web

Super Bowl Sunday Crystal Ball, by Circulating, from Flickr, Creative Commons license

“People assume that time is a strict progression of cause to effect – but actually, from a non-linear, non-subjective viewpoint, it’s more like a big ball of wibbly-wobbly…timey-wimey…stuff.”

The websites that we create around the narratives we tell currently focus on the objects within those narratives, like the playing pieces in a set of toys. More often than not, these are hand-crafted, static pages about certain editorially defined objects. Although we can record the links between objects implicitly as things that the audience can travel along, we tend not to expose them as things that the audience can explore and see in context. This, however, is what we are really interested in when describing narratives or telling stories. We define the objects within the world of the narrative, and then describe the interactions and changes between the objects. The intriguing thing is not necessarily the objects themselves, but the ways in which they change, or otherwise. A truly engaging website would therefore allow the audience to explore the world of the narrative not only by navigating between the objects, but by exposing and analysing the links between them, in order to derive more satisfaction.

Outside of the web, when we focus on an object, our minds give it context, and naturally establish the links between related objects. For instance, when on a train journey, if I look out the window, I can see that branch of that tree which is placed there. We are instantly aware of both the object and its context, the thing and its links. Do the same thing with a computer, and it could identify and create a URL for a branch of a tree, but this would exist in a vacuum. It is up to us to give it the context. Using the principles and technologies underlying the Semantic Web, however, we can start to embed the context, the links, the meaning, so that, when using the web, we do not have to define these things every time. Instead, we can concentrate on uncovering and analysing those links, so that we can derive greater understanding and enjoyment from them.

Currently, websites such as www.bbc.co.uk/programmes define the objects, their contexts and links in a semantic web fashion, so that we can uniquely identify a particular object. Essentially, it provides the building blocks upon which we can establish the type of website I have described above. Unfortunately, as far as I am aware, these building blocks are the limits of what we can currently, reliably, achieve. Emerging technologies such as SPARQL and RDF/graph visualisations will help us to build upon these blocks, but I do not think we currently have an established, reliable ‘toolkit’ or process that we can use to do this. However, this does not mean it cannot be done – it needs further experimentation. In the meantime, we can set about ensuring that the websites we build now will allow us to achieve the ideas mentioned above.

In the context of the BBC, there are two areas in which I can imagine the benefits of such an approach. The first, I will only give an overview of, as I have only thought briefly about the possibilities. The other, regarding fictional narrative, has been the focus of my previous blog posts, and I will continue the discussion here.

The first area is sport, particularly football. The BBC Football website contains a wealth of information, covering what is, in effect, the (almost) closed-off world of football. Fans essentially are following a narrative which spans matches, clubs, leagues, seasons, cup competitions etc. There is, obviously, some organisation taking place on the website – organising the clubs into their leagues, for instance. However, the links between these things – and here I mean not just the clubs, but the players, the action – are rarely revealed. We know that a team is relegated from a division because on one day their page exists within the ‘Premier League’ section, whereas the next, they are in the ‘Championship’ section. Their history may be recorded on the team’s page, or preserved in the numbers of a league table for a particular season, but there is no way of effectively (and, most importantly, engagingly) charting their fortunes. Of course, we can present these things in the numbers and bar charts and graphs, but they do not take advantage of the existence of the narrative behind them – which is really what people are interested in. Similarly with players. When two players go in for a tackle, we may know that they have a history of confrontation, or perhaps an embarrassing own goal incident – what if we could provide the context around that tackle as, when and after it happens, filling in the back story, and getting the audience excited and engaged?

Similarly, by identifying and putting objects and events in context, we can give the audience something to latch on to. Take, for instance, a penalty incident. Say that the match was being covered on 5 Live with a commentary, it was shown and discussed on Match of the Day by pundits, and then also talked about on forums and 606 by fans. If we had an identifiable ‘hook’ for the incident, then potentially we could build a page which brought together all these different interpretations and discussions of the same event. That way, the audience would have an effective overview of the incident, with informed (and perhaps ill-informed!) opinions – their understanding and enjoyment would be enhanced, and of course, they could make their own contribution.
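A minimal Python sketch of that ‘hook’ might look like the following. The identifier scheme and the coverage records are invented for the example (they are assumptions, not real BBC identifiers or data); the point is only that, once every piece of coverage points at the same identifier, building the combined page is a simple grouping exercise.

```python
from collections import defaultdict

# Hypothetical identifier for one incident in one match.
PENALTY = "match-123/event/penalty-1"

# Invented examples of coverage, each pointing at the incident it discusses.
COVERAGE = [
    {"source": "5 Live commentary", "about": PENALTY},
    {"source": "Match of the Day analysis", "about": PENALTY},
    {"source": "606 forum thread", "about": PENALTY},
    {"source": "Match report", "about": "match-123/event/goal-2"},
]


def group_by_event(items):
    """Collect the sources discussing each event, keyed by event identifier."""
    pages = defaultdict(list)
    for item in items:
        pages[item["about"]].append(item["source"])
    return dict(pages)


pages = group_by_event(COVERAGE)
print(pages[PENALTY])
```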

Back to the fiction – in my last post, I linked to a couple of images in which I tried to explain what I aim to achieve, and where the benefits could be found. The first diagram establishes the episodes as a whole, regardless of series – and then drills down to a particular series, and a particular episode. A website that deals with a fictional narrative needs to remember that the episodes are merely a window onto the universe for the audience. If we intend to allow the audience to fully explore the universe, then apart from pointers leading them from/to episodes, as a form of ‘way-in’ (which, incidentally, should probably be through /programmes) the episodes themselves should (probably) not be included – all that exists are the objects (the places, the times, the characters) and the events.

The first diagram, once an episode has been specified, identifies the characters and events within the episode that are crucial to the narrative. For this, I limited myself to a handful of events and characters, which meant that I did not fully get the richness of the narrative across. However, potentially, we could identify as many events etc. as we require. Below the timeline of events (as presented to the viewer) there are coloured blobs, representing the characters in the events. This view shows us how the characters come and go throughout the episode (for instance, the Doctor only really appearing at certain points in the beginning, middle and end).

The second diagram gets closer to the value of this kind of site. Here, we see that the way in which each character experiences the events of the episode is quite different. This is crucial both to the plot and to the audience’s understanding and enjoyment of the episode. If, for instance, you wondered exactly how things tied together, then exploring this kind of site would allow you to piece together the parts of the puzzle. Perhaps on each character’s page, we would show their timeline, and how things happened to them. From the Doctor’s perspective, for instance, the event at the end of the episode is the first thing that happens to him – and the last from Sally’s point of view. Also, by showing these different timelines in the context of each other, we see the intricate way in which Steven Moffat (the writer) is able to weave the story together – giving the audience a greater appreciation of the story as a whole.
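The per-character timelines could be sketched in Python along these lines. The event names and numbers are placeholders I have made up rather than a transcription of the diagram; the only detail taken from the episode is that the final event comes last for Sally and first for the Doctor. Each event records its position as shown to the viewer, plus its position in each participating character’s own experience.

```python
# Each event records where it falls in the episode as broadcast, and where it
# falls in each participating character's own experience. All values invented.
EVENTS = [
    {"name": "opening scene", "shown": 1, "order": {"sally": 1}},
    {"name": "message is found", "shown": 2, "order": {"sally": 2, "doctor": 2}},
    {"name": "final meeting", "shown": 3, "order": {"sally": 3, "doctor": 1}},
]


def timeline(character, events=EVENTS):
    """Names of the events a character takes part in, in the order they live them."""
    own = [e for e in events if character in e["order"]]
    own.sort(key=lambda e: e["order"][character])
    return [e["name"] for e in own]


print(timeline("sally"))
print(timeline("doctor"))
```

Sorting by the "shown" value instead would give the timeline as presented to the viewer, so the same data can drive both the episode view and each character’s page.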

Obviously, Blink (so far) is an atypical episode of Doctor Who. By and large, the stories do not tend to concern themselves with the ‘timey-wimey’ stuff. However, over the course of a series, or indeed several series, characters, events etc re-appear – for instance the ‘Bad Wolf’ motif – the reason that the cliffhanger to ‘Turn Left’ works so well, is because it draws together elements of continuity established throughout several series. The audience gets maximum enjoyment out of such a moment because they are aware of the links and the context.

So what of the original series, whereby both ‘timey-wimey’ stuff and ‘story-arcs’ were at a minimum? Well, there are still instances of recurring themes, but overall, stories are self contained. That’s fine – they could be slotted into this kind of website just like everything else, because it essentially gives us a great pool of narrative to draw upon – if and when needed. Crucially, though, they represent a pool of ideas that future writers can draw upon if they wish. Continuity should not restrict the writing of future stories – the previous stories merely open out the fictional universe, creating more richness for authors. As such, when feeding the ‘classic’ stories into the website, the site becomes a form of ‘official’ wiki. Users can and should be encouraged to contribute, as a form of writing their own stories, but a distinction can be drawn between the events depicted on screen (it is, after all, and should not be forgotten, a television show..) and those where people ‘fill in the gaps’. The series itself has touched upon this, with the idea of certain events being ‘fixed points’ and others being ‘in flux’. As long as the narrative is not disrupted (i.e. does not break down so that it no longer makes sense to the audience) and does not become too insular (i.e. relying too heavily on continuity, so that new audiences are driven away), then continuity can enhance the fictional narrative universe as a whole.

Finally, a new diagram which, on a very basic level, tries to illustrate the idea that the website could be explored and presented through the model of, as quoted above, “a big ball of wibbly-wobbly…timey-wimey…stuff.” The diagram is quite obviously incomplete, but the idea is that the objects and the links between them are visualised, and the audience can then choose to look at a particular object, and see how it ties in to everything else – seeing both the object and its changing context and perspectives at the same time.

Phew. That’s enough for now. Till next time…