myCMS and the Web of Data - IKS Community Workshop review

Recently GOSS attended the latest IKS Project workshop in Paris.


The IKS Project is an open source community, focused on building an open and flexible technology platform for Semantically Enhanced Content Management Systems. It's currently funded by the European Union until the end of 2012. More information about the workshop can be found on the project wiki.

We were there to demonstrate our (prototype) integration of the IKS FISE/Stanbol Entity Extraction and Enhancement engines into GOSS iCM, to learn about the future of the IKS Project and to contribute to the 'Semantic Web' conversation. I blogged about how the prototype worked last month.

But what is the 'Semantic Web' and what are the benefits for website managers? The basic idea is very simple. Most of the content on a site is very hard for a machine, such as a search engine, to understand. HTML provides information about how the content should be displayed but very little about what that content means; working that out requires a detailed understanding of the language (English, Welsh, etc.) in which the content was written.

Adding extra information to pages to help search engines is not a new idea; META tags have been around for a long time. The concepts behind the semantic web take this a step further by adding information to a page's content (article text, in GOSS iCM terms) to further identify and classify it. Done correctly, this additional mark-up can link to external information about parts of the content. For example, mentioning a prominent person's name may generate a link to their page on Wikipedia.

Essentially this is embedding data in the web page. This data can aid search engines like Google and Microsoft Bing. The use of the word data is deliberate; there is a close relationship between the semantic mark-up of web content and the publishing of Open Linked Data. This relationship exists through common technology and shared ambitions.
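
To make the idea of linking a mention to external information concrete, here is a minimal Python sketch. It is purely illustrative and is not how GOSS iCM or the IKS engines work: the `ENTITY_LINKS` table and the `annotate` function are names invented for this example, and a real system would discover the entities rather than read them from a hand-written table.

```python
import html
import re

# Hypothetical lookup table (an assumption for this sketch):
# entity name -> page holding external information about it.
ENTITY_LINKS = {
    "R.E.M.": "https://en.wikipedia.org/wiki/R.E.M.",
}


def annotate(text, links):
    """Return HTML in which each known entity mention links to its page."""
    if not links:
        return html.escape(text)

    # Try longer names first so a short name never pre-empts a longer one.
    names = sorted(links, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    parts = []
    position = 0
    # Single pass over the original text, so inserted mark-up is never rescanned.
    for match in pattern.finditer(text):
        name = match.group(0)
        parts.append(html.escape(text[position:match.start()]))
        parts.append('<a href="{}">{}</a>'.format(
            html.escape(links[name], quote=True), html.escape(name)))
        position = match.end()
    parts.append(html.escape(text[position:]))
    return "".join(parts)


print(annotate("We saw R.E.M. live & loud", ENTITY_LINKS))
```

The point is only that the link turns a plain string into something a machine can follow to find out what the text is about.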

Most of the conference focused on the technology being created as part of the IKS Project to make it easier to semantically enhance content, and on how that technology is being used. There were also a number of thought-provoking presentations about how and why semantically enhanced content should be used.

On the technology side, most of the planned infrastructure now exists and the emphasis is on making it perform better. Our own demonstration deliberately used article text that involved globally significant people and organisations, in the form of the rock band R.E.M. We'd exhausted our supply of British bands during testing! Our presentation ended by highlighting the fact that the existing technology only works well with globally significant data and struggles with more domain-specific article text. An analysis of our results can be found on the IKS site. Thankfully, solutions to this problem will be appearing in some of the enhancements presented in the 'IKS in the Lab' session, particularly the Taxonomy Engine and the Entity Disambiguation and Document Categorisation features. It was also clear that more pragmatic solutions can sometimes be used. For example, within a borough council there are a relatively small number of councillors. This simplifies identification of references to these people, even allowing for variations in how they are addressed.
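
The councillor case can be sketched in a few lines of Python. This is an assumption-laden illustration rather than anything shown at the workshop: the councillor names are invented, and `build_pattern` and `find_councillors` are hypothetical helpers. The idea is simply that a short, known list of people plus a handful of title and name variations goes a long way.

```python
import re

# Invented councillors, for illustration only: (first name, surname).
COUNCILLORS = [("Jane", "Smith"), ("Rajesh", "Patel")]

# Accepted forms of address: "Councillor", "Cllr" or "Cllr."
TITLES = r"(?:Councillor|Cllr\.?)"


def build_pattern(first, last):
    """Match a title, then an optional first name or initial, then the surname."""
    first_part = r"(?:{}|{}\.?)".format(re.escape(first), re.escape(first[0]))
    return re.compile(
        r"\b{}\s+(?:{}\s+)?{}\b".format(TITLES, first_part, re.escape(last)),
        re.IGNORECASE,
    )


def find_councillors(text):
    """Return (offset, matched text, canonical name) tuples in text order."""
    found = []
    for first, last in COUNCILLORS:
        canonical = "{} {}".format(first, last)
        for match in build_pattern(first, last).finditer(text):
            found.append((match.start(), match.group(0), canonical))
    return sorted(found)


sample = ("Cllr Smith opened the meeting, and Councillor R. Patel replied. "
          "Later Cllr. Jane Smith spoke.")
for offset, mention, canonical in find_councillors(sample):
    print(offset, mention, "->", canonical)
```

Requiring a title keeps the sketch from treating every "Smith" in the text as a councillor, at the cost of missing untitled mentions. Where to draw that line would depend on the content.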

However, there is an issue behind all the talk about the semantic web, and that issue is the technology itself. A lot of it is very complex or, more importantly, is perceived as very complex by mainstream developers and managers. This complexity, combined with the lack of a compelling reason to adopt semantic mark-up for the majority of web sites, is holding it back. A number of presentations considered this.

Lynda Moulton's presentation, based on many years' experience with semantic technology, highlighted the problems of delivering usable solutions and a tendency to over-promise and under-deliver. To some extent this was based on experience of very large systems dealing with huge amounts of content. It was interesting to compare this with Mark Greaves' presentation, which showed how semantically enhanced wikis have been used successfully within fairly narrow domains on smaller sites, where the identified semantic information is used to drive a traditional faceted search interface. This relates to the presentation from Seth Grimes, in which he quoted statistics showing that 76% of visitors rated 'makes it easy for me to find what I want' as the most important factor in the design of a site. Search and navigation improvements are an area where semantic technology can help, and this is likely to become the initial reason for adopting it.

The final presentation, from Janus Boye, highlighted his view of the current challenges facing CMS users: Social Media, Mobile, Engagement and Analytics. Social Media and Analytics are integrated into our CMS, our templates are designed to work on mobile devices out-of-the-box and we specialise in online engagement. Good news for users of GOSS iCM v9.1!

There was an Elephant in the Room that only occasionally made its presence felt: Schema.org. This is a proposal from various companies, but predominantly Google and Microsoft, for a lighter-weight standard for semantic mark-up. It is quite deliberately aimed at improving the information available to search engines. It is also being described in ways that make it much more approachable to the average developer. Technically speaking, it does have limitations that are making experienced semantic web developers and standards organisations uncomfortable, as discussed in this blog. It also doesn't integrate well with technologies used to publish Open Linked Data. But it does have the potential to get the message across about the benefits of semantic mark-up. If users believe their search ranking will improve, they will demand that their CMS vendor implements it.
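
For a feel of how lightweight this mark-up is, here is a small Python sketch that emits Schema.org microdata for a person. The `itemscope`, `itemtype` and `itemprop` attributes and the `Person` type with its `name` and `jobTitle` properties come from Schema.org; the `person_microdata` function and the sample values are invented for this example and do not represent anything in GOSS iCM.

```python
import html


def person_microdata(name, job_title):
    """Return an HTML fragment describing a person using Schema.org microdata."""
    return (
        '<span itemscope itemtype="http://schema.org/Person">'
        '<span itemprop="name">{}</span>, '
        '<span itemprop="jobTitle">{}</span>'
        '</span>'
    ).format(html.escape(name), html.escape(job_title))


# Invented sample values, for illustration only.
print(person_microdata("Jane Smith", "Councillor"))
```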

All in all, a very interesting couple of days. We were pleased with our GOSS iCM demonstration and the feedback it received, and we are already planning the next steps towards a practical solution that helps with adding useful semantic mark-up and facilitates Open Linked Data publishing. We are sure to be talking about this at the next GOSS User Group.

Posted by Gary Ratcliffe, 18th July 2011