Jeffrey Yasskin’s blog

12/17/2004

RDFPath

Filed under: Semantic Web — Jeffrey Yasskin @ 5:05 pm

Given a set of RDF triples, there must be a way to present the data in them. Of course, one can write a program to extract the data, but a domain-specific language (DSL) is likely to be more efficient. A very good DSL for transforming and presenting XML already exists: XSLT. Its output format is XML, which is what I want. Its input is represented as trees, which are very similar to RDF’s graphs. Thus, I think the RDF presentation language should be built as a set of extensions to XSLT. In particular, I want to design a set of extensions to XPath that will massage an RDF graph into a tree-ish structure that XSLT can understand. I call these extensions RDFPath.

Design goals

  1. A node must be easy to get from a URI and should be easy to get from a qname. A node should include a reference to its model so simple traversals never have to reference the model again.
  2. To the child axis, the tree should appear to be infinite. That is, there should never be a need to call a function like RDF Twig’s rt:twig() or rt:branch().
  3. Nodes should be the fundamental addressable unit in RDFPath (i.e. no striping). This produces some tension. I want to be able to say $me/foaf:mbox and get back <mailto:jyasskin@mail.utexas.edu>. But the node I get back has nothing to do with the property foaf:mbox, whereas in XPath the node you get back has the qname of the last path step. I think my behavior is what people expect for RDF, but it’s still theoretically tricky. Another consequence is that it becomes harder, though not impossible, to select the union of two arcs out of a given node.
  4. Even though nodes are the primary entity, arcs must be addressable too. Ideally, I’d like new axes in-arc and out-arc, but if that’s not possible, two functions that return node-sets of the relevant arcs would be acceptable.
  5. It must be possible to perform all operations with either a qname explicitly in the XSLT, or with a URI or resource assigned to a variable. Including them in the location paths would be ideal, but if that’s not possible, functions are acceptable.
  6. The URI of a node (or node-id of a blank node) must be easily accessible. An @rdf:about attribute would be ideal.
  7. The datatype and language of a literal must be accessible but must not interfere with getting the string value.
  8. It must be possible to give options to the RDF store in a store-independent and backward- and forward-compatible manner. Probably this will mean passing a result-tree-fragment stored in an XSLT variable to the load-model() function.
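
As a rough illustration of goals 1, 2, 6 and 7, here is a minimal Python sketch rather than an RDFPath implementation. The names Model, Node, Literal and child() are hypothetical, the store is just an in-memory set of triples, and the example.org data is made up.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

FOAF = "http://xmlns.com/foaf/0.1/"


@dataclass(frozen=True)
class Literal:
    """A literal whose string value stays separate from its datatype and language."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def __str__(self) -> str:
        # Goal 7: the string value is just the value, with no datatype appended.
        return self.value


class Model:
    """Hypothetical in-memory store: a set of (subject, predicate, object) triples.

    Resources are plain URI strings; literals are Literal instances.
    """

    def __init__(self, triples):
        self.triples = set(triples)

    def node(self, uri: str) -> "Node":
        # Goal 1: getting a node from a URI is a single call.
        return Node(self, uri)


@dataclass(frozen=True)
class Node:
    """A resource paired with its model, so traversal never needs the model again."""
    model: Model
    about: str  # goal 6: plays the role of an @rdf:about attribute

    def child(self, prop: str) -> Iterator[object]:
        # Goal 2: objects are looked up only when the step is taken, so the
        # tree looks infinite to the caller even when the graph has cycles.
        for s, p, o in self.model.triples:
            if s == self.about and p == prop:
                yield o if isinstance(o, Literal) else Node(self.model, o)


# Made-up sample data with a foaf:knows cycle between two resources.
model = Model({
    ("http://example.org/me", FOAF + "mbox", "mailto:me@example.org"),
    ("http://example.org/me", FOAF + "name", Literal("Me", lang="en")),
    ("http://example.org/me", FOAF + "knows", "http://example.org/you"),
    ("http://example.org/you", FOAF + "knows", "http://example.org/me"),
})
me = model.node("http://example.org/me")
mailboxes = [n.about for n in me.child(FOAF + "mbox")]
```

Note that the node returned for the mbox step carries no trace of foaf:mbox, which is exactly the tension described in goal 3.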

It is explicitly not a design goal to deal with RDF inferencing. I believe that’s the store’s responsibility, not the presentation language’s. If this inferencing needs parameters, they can be passed as options during model creation.

More speculative features

  1. A way to filter nodes by type more easily, something like arc[rp:has-type(foaf:Person)]. Without such a function, my first thought is: arc[rdf:type[string(.) = 'http://xmlns.com/foaf/0.1/Person']].
  2. Streamlined access to RDF lists as node sets.
  3. A descendant-through axis. You could specify a set of properties, and this axis would return a subgraph that only contained these properties. It would probably work best as a function that returned a model to be assigned to a variable.
  4. A qname expansion function, i.e. rp:qname('foaf:Person') = 'http://xmlns.com/foaf/0.1/Person'.
  5. For debugging, it would be nice if xsl:copy-of worked, probably like rt:branch(). It doesn’t have to be pretty.
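
Features 1, 2 and 4 above are simple enough to sketch in Python. This is only an illustration under assumptions: expand_qname(), has_type() and list_items() are hypothetical names, the prefix table is hard-coded rather than read from a stylesheet's namespace declarations, and the triples are made-up sample data.

```python
# Prefix table; in XSLT this would come from the stylesheet's namespace
# declarations instead of a hard-coded dictionary.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_qname(qname: str, prefixes=PREFIXES) -> str:
    """Expand 'prefix:local' to a full URI, like the proposed rp:qname()."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand qname: {qname!r}")
    return prefixes[prefix] + local


def has_type(triples, resource: str, type_qname: str) -> bool:
    """True if the resource has an rdf:type arc to the given class."""
    return (resource, expand_qname("rdf:type"), expand_qname(type_qname)) in triples


def list_items(triples, head: str):
    """Flatten an rdf:first/rdf:rest chain into an ordinary Python list."""
    first = expand_qname("rdf:first")
    rest = expand_qname("rdf:rest")
    nil = expand_qname("rdf:nil")
    items, seen = [], set()
    # The 'seen' set guards against malformed, cyclic lists.
    while head != nil and head not in seen:
        seen.add(head)
        items.extend(o for s, p, o in triples if s == head and p == first)
        tails = [o for s, p, o in triples if s == head and p == rest]
        if not tails:
            break
        head = tails[0]
    return items


# Made-up sample data: one typed resource and a two-element RDF list.
triples = {
    ("http://example.org/me", expand_qname("rdf:type"), expand_qname("foaf:Person")),
    ("_:l1", expand_qname("rdf:first"), "http://example.org/a"),
    ("_:l1", expand_qname("rdf:rest"), "_:l2"),
    ("_:l2", expand_qname("rdf:first"), "http://example.org/b"),
    ("_:l2", expand_qname("rdf:rest"), expand_qname("rdf:nil")),
}
```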

Tentative axis assignments

child
Defined on resources. Traverses out-arcs: the context node is the subject; the specified qname is the property; the object is returned.
parent
Defined on resources. Traverses in-arcs: the context node is the object; the specified qname is the property; the subject is returned. The ../ shorthand from XPath is not allowed since the parent node isn’t unique.
descendant
Defined on models and sub-models. Finds all objects of a given property. Could be defined on resources, but cycles make it tricky.
ancestor
Defined on models and sub-models. Finds all subjects of a given property. Could be defined on resources, but cycles make it tricky.
attribute
Defined on resources and (if possible) on literals. Provides access to out-of-band RDF information. So far @rdf:about is the URI of a resource, and @rdf:datatype and @xml:lang are as defined in RDF/XML.

Other axes are undefined and should return errors.
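
The four traversal axes can be mimicked over a plain set of triples. The Python sketch below is only an assumption about how they might behave: step() is a hypothetical helper, the ex: identifiers are made up, and the attribute axis is left out.

```python
KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Made-up sample graph: two people know the same third person.
triples = {
    ("ex:alice", KNOWS, "ex:carol"),
    ("ex:bob", KNOWS, "ex:carol"),
}


def step(triples, axis, prop, context=None):
    """Evaluate one location step for the given axis and property."""
    if axis in ("child", "parent") and context is None:
        raise ValueError(f"the {axis} axis needs a context resource")
    if axis == "child":
        # Out-arcs: the context is the subject, the object is returned.
        return {o for s, p, o in triples if s == context and p == prop}
    if axis == "parent":
        # In-arcs: the context is the object, the subject is returned.
        return {s for s, p, o in triples if o == context and p == prop}
    if axis == "descendant":
        # Defined on the model: every object of the property.
        return {o for s, p, o in triples if p == prop}
    if axis == "ancestor":
        # Defined on the model: every subject of the property.
        return {s for s, p, o in triples if p == prop}
    # Any other axis is undefined and reported as an error.
    raise ValueError(f"axis {axis!r} is undefined")


parents_of_carol = step(triples, "parent", KNOWS, "ex:carol")
```

Here the parent step from ex:carol yields both ex:alice and ex:bob, which shows why a single-parent shorthand cannot carry over from XPath.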

Other attempts

RDF Twig
Pretty good, but requires a lot of extra function calls. It’s a pain to have to keep extending the current subtree with rt:branch() or run out of memory because the branch you already made was too big. It makes an annoying distinction between @rdf:about and @rdf:resource. Datatypes are appended to the literal’s value (value + ':' + datatype), so it’s inconvenient to just use the value. It’s also not quite complete: certain function calls are missing (rt:branch(node-set)), and it doesn’t take advantage of all of the RDF store’s (Jena’s) features. Nonetheless, it works, so I’m using it in my current XSLT scripts.
RDF Template Language
Completely replaces XSLT. Uses a striped path syntax. Requires RDFS inference, but doesn’t mention any other inference types. There is an implementation in PHP.
RxPath
Defined in terms of a transformation from RDF to a striped, infinite XML tree. Uses types as the node element names. I think this is bad because a node may have zero, one, or many types. A resource node’s string value is its URI.
TreeHugger
Like RxPath, it’s defined in terms of a transformation to an infinite striped tree. However, only subjects are children of the root node, not all resources. Because it treats the graph as a root node with a whole lot of children, it doesn’t need to make my distinction between models and nodes. Uses three different model-creation functions to determine what kind of inference to do. Uses an inv: namespace to do what I intend the parent axis to do. This looks interesting. If/when I start implementing my solution, I may start from TreeHugger. [spotter=Dan Brickley]
Pondering RDF Path
Uses a striped syntax. Doesn’t try to integrate with XSLT, just uses the model of XPath. Doesn’t explicitly deal with backward traversal of arcs, although it is expressible. Distinguishes between subject and object resources.
Versa
Not XPath compatible. Uses ASCII arrows to express triples: subj-pred->obj. Each part of the triple is a boolean predicate which filters the applicable resources. You can do backward traversal by pointing the arrow in the other direction. Has a traverse() function to do what my descendant-through axis would do.
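
For comparison, the descendant-through idea can be sketched as a function that returns a sub-model. This Python sketch is an illustration only and does not reproduce Versa’s traverse(). The descendant_through() name, the explicit start resource, and the ex: sample data are all assumptions.

```python
from collections import deque


def descendant_through(triples, start, props):
    """Return the subgraph reachable from start using only arcs in props."""
    props = set(props)
    sub = set()
    seen = {start}          # visited resources, so cycles terminate
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in props:
                sub.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return sub


# Made-up sample graph with a knows-cycle and one arc that must be excluded.
graph = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:a"),
    ("ex:b", "ex:mbox", "mailto:b@example.org"),
    ("ex:c", "ex:knows", "ex:a"),
}
sub_model = descendant_through(graph, "ex:a", {"ex:knows"})
```

The result is itself a set of triples, so it could be assigned to a variable and traversed like any other model.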

2 Comments »

  1. Another exercise in this vein (including an implementation) is Damian Steer’s “TreeHugger”, http://rdfweb.org/people/damian/treehugger/

    Comment by Dan Brickley — 12/18/2004 @ 6:33 am UTC

  2. Thanks. I’ve added a section on it.

    Comment by Jeffrey Yasskin — 12/18/2004 @ 5:46 pm UTC
