Saturday, June 2, 2007

Semantic Desktop and KDE 4 - State and Plans of Nepomuk-KDE

Nepomuk-KDE is the basis for the semantic technologies we will see in KDE 4. Sebastian Trüg, the main developer behind Nepomuk-KDE, provided me with some up-to-date information about its current state and future plans.

The Semantic Desktop describes the idea that users will be able to search not only existing information, but also the meaning of and the relations between pieces of this information. The Nepomuk project creates open standards and APIs around this idea.
Nepomuk-KDE is the implementation of these standards for KDE.

Nepomuk-KDE: Basics

Technically, Nepomuk-KDE mainly uses RDF/S for storing the aggregated data. RDF/S is the standard for storing meta data for the Semantic Web and is therefore also used as the standard in the Semantic Desktop.
The current implementation of Nepomuk-KDE contains an RDF repository which stores all the data. According to Sebastian, the data can be accessed via D-Bus, which is (of course) the default way to communicate in Nepomuk-KDE. But there are also other ways which might be more convenient for KDE developers:

For a KDE developer, however, it is much simpler to use the knepomuk library which provides convenience wrapper classes to access the repository.
Additionally there is the KMetaData library which is yet another wrapper library which provides easy access to the metadata by a resource-centric view. This is what is supposed to be used for implementing stuff like tagging or rating in applications.
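To make the idea of a "resource-centric view" a bit more concrete, here is a minimal Python sketch. It is not the KMetaData API (which is C++), and it is not how Nepomuk-KDE is implemented: the classes `TripleStore` and `Resource`, the `ex:` property names and the file path are all invented for illustration. It only shows how a thin wrapper can hide raw RDF-style statements behind per-file properties like tags and ratings.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Hypothetical in-memory stand-in for an RDF repository."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def remove_all(self, subject: str, predicate: str) -> None:
        # Drop every statement made about this subject/predicate pair.
        self._triples = {
            t for t in self._triples
            if not (t[0] == subject and t[1] == predicate)
        }

    def objects(self, subject: str, predicate: str) -> List[str]:
        # All values stored for one property of one resource, sorted.
        return sorted(
            o for s, p, o in self._triples
            if s == subject and p == predicate
        )


class Resource:
    """Hypothetical resource-centric wrapper: one object per file."""

    def __init__(self, store: TripleStore, uri: str) -> None:
        self._store = store
        self.uri = uri

    def add_tag(self, tag: str) -> None:
        self._store.add(self.uri, "ex:hasTag", tag)

    def tags(self) -> List[str]:
        return self._store.objects(self.uri, "ex:hasTag")

    def set_rating(self, rating: int) -> None:
        # A file has a single rating, so replace any earlier value.
        self._store.remove_all(self.uri, "ex:rating")
        self._store.add(self.uri, "ex:rating", str(rating))

    def rating(self) -> Optional[int]:
        values = self._store.objects(self.uri, "ex:rating")
        return int(values[0]) if values else None


store = TripleStore()
song = Resource(store, "file:///home/user/song.ogg")  # made-up path
song.add_tag("KDE4")
song.set_rating(4)
print(song.tags(), song.rating())
```

The application code at the bottom never touches a triple directly, which is the convenience such a wrapper layer is meant to offer.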

The definitions (aka ontologies) of how data like tags, comments and so on should be stored in the repository can be found in kmetadata/ontologies in KDE’s svn, or in the directory $KDEDIR/share/apps/knepomuk/ontologies if you install kmetadata on your hard disk.

Last but not least, if you integrate Nepomuk-KDE with an application, KMetaData can help by generating code that hides all “nasty meta data type and property named and type conversion”. See the tutorial KMetaData First Steps at techbase. Also, have a look at the KMetaData apidox.

Besides these entry links and the homepage itself, the best way to get into the development is, of course, to subscribe to the Nepomuk-KDE e-mail list. Btw., one of the current topics is a possible new, more catchy name for Nepomuk-KDE :)

Nepomuk-KDE: Current State

So much for the basics and development stuff - now to the sparkling bits:
In the current implementation, Nepomuk-KDE enables the user to store additional meta data in the form of tags, comments or ratings. See Nepomuk-KDE in action within Dolphin:

Nepomuk-KDE - Dolphin integration with rating

The music file is rated, has a comment and also a tag below the comment field. And this can be done not only with music files but with all kinds of files:

Nepomuk-KDE - Dolphin integration with txt file

These comments and tags can be searched of course:

Nepomuk-KDE - search for “environment” in comments

Nepomuk-KDE - search for “KDE4” in tags

In the first image, everything is searched for the term “environment” - and a file with a comment containing this string is listed. In the second example, the string “KDE4” is searched for, and the result shows files which are tagged “KDE4”.
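The two searches described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual Nepomuk-KDE search: the `FileMeta` record, the `search` function and the sample files are made up, and I am simply assuming that comments are matched by substring and tags by exact name, both ignoring case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileMeta:
    """Hypothetical record of the metadata attached to one file."""
    path: str
    comment: str = ""
    tags: List[str] = field(default_factory=list)


def search(files: List[FileMeta], term: str) -> List[str]:
    """Return paths whose comment contains the term or whose tags match it."""
    needle = term.lower()
    hits = []
    for meta in files:
        in_comment = needle in meta.comment.lower()
        in_tags = any(needle == tag.lower() for tag in meta.tags)
        if in_comment or in_tags:
            hits.append(meta.path)
    return hits


# Made-up sample data, loosely modelled on the two searches in the text.
files = [
    FileMeta("notes.txt", comment="Thoughts on the desktop environment"),
    FileMeta("plan.txt", tags=["KDE4"]),
    FileMeta("song.ogg", comment="Nice track", tags=["music"]),
]

print(search(files, "environment"))
print(search(files, "KDE4"))
```

With this sample data the first call finds only the file whose comment mentions the term, and the second finds only the file carrying the tag.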

So, finally, tagging has also reached KDE 4 - I must admit that I’m very pleased with this result because I’ve waited for tag support in KDE for much too long.

Nepomuk-KDE: Future

At the moment Sebastian is working on a backend for Strigi. The final goal is to share the data backend so that Strigi uses RDF as well. This would create one single data pool for all meta data on your machine.

The next steps are to integrate Nepomuk-KDE further with the applications of the desktop: there is a Google Summer of Code project to replace digiKam’s rating and tagging system with the one from Nepomuk-KDE. Amarok already has exactly the features Nepomuk-KDE supports for all files (rating, tagging, comments), so it only makes sense to merge the data as well. And of course all PIM applications contain a huge amount of data which should be analyzed semantically: think of displaying not only an e-mail from a contact but also all related e-mails from other contacts and all related files which were sent and received.
You can extend this list of applications with any program which needs or wants to store any kind of additional information about the files it uses or modifies.

Besides integrating Nepomuk-KDE into other applications, there is also ongoing work to bring other meta data to Nepomuk-KDE: while at the moment tags, ratings and comments are supported, there will be many more types in the future. As already mentioned above with the PIM example, data can be grouped around discussions (which e-mail is a forward of or reply to which) but also around origin (where does this file come from and where has it been before).
Also, think of an image viewer which does not only display the given image but also all images taken at similar times or showing the same people (digiKam supports this already with person tags) or even taken at similar places (geo tagging or identifying names like barcelona07.jpg).
And if you really dare to have a look at a possible but still far away future: IBM, a Nepomuk partner, has text analysis tools which could be used to analyze the content of, for example, mails to get a better understanding of what an e-mail is really about.
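To illustrate what "grouping around discussions" could mean in practice, here is a small Python sketch. It is purely my own illustration and says nothing about how Nepomuk-KDE will model this: the function names and message ids are invented, and I assume each e-mail records at most one parent (the message it replies to or forwards).

```python
from collections import defaultdict
from typing import Dict, List, Optional


def thread_root(message_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow reply/forward links upwards until a message with no parent."""
    seen = set()
    current = message_id
    # The 'seen' set guards against accidental cycles in the data.
    while parent_of.get(current) is not None and current not in seen:
        seen.add(current)
        current = parent_of[current]
    return current


def group_by_discussion(
    parent_of: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Map each discussion's root message to all messages in that discussion."""
    threads = defaultdict(list)
    for message_id in parent_of:
        threads[thread_root(message_id, parent_of)].append(message_id)
    return {root: sorted(members) for root, members in threads.items()}


# Made-up message ids: each maps to the message it replies to or forwards.
parent_of = {
    "mail-1": None,
    "mail-2": "mail-1",
    "mail-3": "mail-2",
    "mail-4": None,
}
print(group_by_discussion(parent_of))
```

Once relations like "is a reply to" are stored as meta data, showing the whole discussion around a single e-mail is just a matter of following those links.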

And there are other things which have to be done as well: Data export, cooperation with other desktop environments, etc. For example, the Nepomuk project itself plans to create a P2P based solution for sharing files together with their meta data, and at some point in the future Nepomuk-KDE should implement this part of the standard as well.

As you can see, there are lots of things to do, and there is room for almost every kind of participation. Just send an e-mail to the developers list. You can also simply leave a comment on this post if you are interested; the developers will keep an eye on the comments.
