Woozle Wuzzle
BDBXML and document modifications

There are a number of problems with updating XML documents in an XML store. The primary problem is that XQuery 1.0 does not have a facility for updating documents. The XQuery Update Facility Requirements document exists, but it clearly states "the WG does not intend to produce a Recommendation from this Working Draft", which leaves me with a big fat question mark over my head. As a result, each XML DB vendor has provided its own update mechanism, which leads to the topic of this post:

BDBXML is a decent XML DB but it's somewhat rough around the edges. When updating documents it is common to get the following error:

Cannot perform a modification on an XmlValue that isn't either Node or Document type, errcode = INVALID_VALUE

To make a long story short, you cannot specify a doc('...') in the XmlQueryExpression that you pass to any of the XmlModify modification methods. The documentation implies this but does not drive the point home. Given that all non-modification uses of XmlQueryExpression require some sort of "navigation function" (collection('...'), doc('...'), etc.), it feels odd not to specify one for modifications.

More on XSD, PSVI and non-native attributes

If you have been following along with the trials and tribulations of XSD, PSVI and non-native attributes, then you have been left wondering about the case where you have a non-native attribute and no <xsd:annotation>. For example:

<xsd:element name="something" myNS:name="Something" ...>
  <xsd:simpleType>
    ..
  </xsd:simpleType>
</xsd:element>

Since there is no <xsd:annotation>, one might expect that there is no XSAnnotation object, and that is exactly what happens. So then, how does one access the non-native attribute? After a brief session of spelunking through Xerces I stumbled across the notion of a "synthetic annotation". A quick hop over to Google turns up the "generate synthetic annotations" feature, which purports to "[generate a synthetic annotation] when a schema component has non-schema attributes but no child annotation".

So we're done, right? Well, no. The page that describes the "generate synthetic annotations" feature is actually for the SAX parser, which is not what we use. A quick search of the intersection between XSModel, XSLoader (which is used to parse the XSD into the XSModel) and "feature" reveals nothing. Broadening the search to include synonyms of "feature" (such as "parameter", "attribute" and "property") finally turns up a hit: XSLoader exposes a DOMConfiguration which allows one to set "parameters". Listing all parameter names (via getParameterNames()) shows the sought-after "generate-synthetic-annotations". Whew!
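The parameter listing itself is plain DOM Level 3: DOMConfiguration.getParameterNames() returns a DOMStringList. Below is a minimal sketch of a helper that copies those names into a List. To stay runnable without Xerces-J on the classpath it inspects the configuration of the JDK's own LSParser; the class and method names are illustrative, and with Xerces-J available one would pass schemaLoader.getConfig() to the same helper instead.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMStringList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

class ListDomParameters {
    // Copies every parameter name a DOMConfiguration recognizes into a List.
    static List<String> parameterNames(DOMConfiguration config) {
        DOMStringList names = config.getParameterNames();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++) {
            result.add(names.item(i));
        }
        return result;
    }

    // Stand-in configuration: the JDK's LSParser, so no extra jars are needed.
    // With Xerces-J, schemaLoader.getConfig() could be passed to parameterNames instead.
    static DOMConfiguration lsParserConfig() throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        return parser.getDomConfig();
    }

    public static void main(String[] args) throws Exception {
        for (String name : parameterNames(lsParserConfig())) {
            System.out.println(name);
        }
    }
}
```

Printing the names is the quickest way to discover what a given DOMConfiguration accepts; canSetParameter(name, value) can then confirm whether a particular value is allowed before calling setParameter.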

To round this out, the synthetic annotation appears as:

<xsd:annotation myNS:name="Something" ...>
  <xsd:documentation>SYNTHETIC_ANNOTATION</xsd:documentation>
</xsd:annotation>

and the setup code to get an XSModel from an XSD is:

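import java.net.URI;
import org.apache.xerces.xs.XSImplementation;
import org.apache.xerces.xs.XSLoader;
import org.apache.xerces.xs.XSModel;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// xsdURI: location of the schema to load (java.net.URI assumed here)
public static XSModel loadXSModel(final URI xsdURI) throws Exception {
    System.setProperty(DOMImplementationRegistry.PROPERTY, 
                       "org.apache.xerces.dom.DOMXSImplementationSourceImpl");
    final DOMImplementationRegistry registry = 
            DOMImplementationRegistry.newInstance();
    final XSImplementation xsImpl = 
            (XSImplementation)registry.getDOMImplementation("XS-Loader");
    final XSLoader schemaLoader = 
            xsImpl.createXSLoader(null/*all XML Schema Versions*/);

    // NOTE:  synthetic annotation nodes MUST be created for non-native 
    //        attributes to be parsed and added to the XSAnnotation object
    //        (for cases where there is no <xsd:annotation>)
    final DOMConfiguration config = schemaLoader.getConfig();
    config.setParameter("http://apache.org/xml/features/generate-synthetic-annotations", 
                        true);

    return schemaLoader.loadURI(xsdURI.toString());
}
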
The problem with XML Schema

Let's just clear the air up front: I like XML Schema. It's convenient. It solves about 90% of all XML validation concerns. It's concise.

I have been doing work lately that leverages the validation provided by XSD, supplementing it with JavaScript and Java. The problem with XSD is that it's not just simple XML. Sure, it's written in XML, but that's not the point. You can't just rip through an XSD with XPath and extract the information that you want. This becomes obvious when you think about the structure that XSD represents: there's inheritance and references and all kinds of things that go "Bump" in the night. So the primary way to get at the guts of what XSD provides is via the Post-Schema-Validation Infoset (PSVI). But there's a problem: there appears to be no way to access non-native (i.e. non-xsd) attributes. In other words, if you have extended XSD with attributes of your own, there's no way to get that information back out. Why allow it to be specified if one cannot access it?
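To make the XPath point concrete, here is a small sketch that uses only the JDK's XPath on a made-up schema in which `<item>` refers to a global `<price>` declaration. The particle inside `<item>` carries only `@ref`, so asking it for its type yields nothing, and the type has to be found by chasing the reference by hand. The sketch handles only an unqualified, same-document `ref`; imports, includes, named types and derivation would each need more of this manual resolution, which is exactly the work the PSVI does for you.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RawXsdXPath {
    // Hypothetical schema: <item> pulls in the global <price> declaration by reference.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='price' type='xsd:decimal'/>"
        + "<xsd:element name='item'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='price'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Type as written on the particle inside <item>: empty, since it only has @ref.
    static String typeOnParticle(Document xsd, XPath xpath) throws Exception {
        return xpath.evaluate("//*[local-name()='element'][@ref='price']/@type", xsd);
    }

    // Type found by manually following the first @ref to a top-level declaration.
    static String typeViaRef(Document xsd, XPath xpath) throws Exception {
        String ref = xpath.evaluate("(//*[local-name()='element'][@ref])[1]/@ref", xsd);
        return xpath.evaluate("/*/*[local-name()='element'][@name='" + ref + "']/@type", xsd);
    }

    public static void main(String[] args) throws Exception {
        Document xsd = parse(XSD);
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println("on particle: '" + typeOnParticle(xsd, xpath) + "'");
        System.out.println("via ref:     '" + typeViaRef(xsd, xpath) + "'");
    }
}
```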

Problem solved

I have been banging my head looking for a way to access non-native attributes of an XSD via PSVI. The problem I kept hitting was that the XSD API defines a seemingly limited XSAnnotation. There is only a getAnnotationString() method for retrieving the annotation. Taken literally (which is what I was doing), this will return the information within the <xsd:annotation> and that's it.

I started looking for alternative techniques and found some interesting information regarding XML Beans, SchemaAnnotation and non-native attributes. That got me thinking. I did some code spelunking in Xerces and found that the member that backs getAnnotationString() looks like:

// the content of the annotation node, including all children, along
// with any non-schema attributes from its parent
private String fData = null;

Well, that certainly does not match the PSVI description of "A text representation of the annotation". The key is "along with any non-schema attributes from its parent". If your XSD looks like:

<xsd:element name="something" myNS:name="Something">
  <xsd:annotation>
    <xsd:appinfo>Stuff</xsd:appinfo>
  </xsd:annotation>
  ...
</xsd:element>

then getAnnotationString() will return:

<xsd:annotation myNS:name="Something" ... >
  <xsd:appinfo>Stuff</xsd:appinfo>
</xsd:annotation>

No, really.

Now those of you who are careful readers are likely sitting there wondering how this can possibly work consistently. What happens when both the element and its annotation carry the same non-native attribute, as in:

<xsd:element name="something" myNS:name="Something">
  <xsd:annotation myNS:name="Something else">
    <xsd:appinfo>Stuff</xsd:appinfo>
  </xsd:annotation>
  ...
</xsd:element>

Well, I'm sure you've already guessed the answer:

<xsd:annotation myNS:name="Something else" ... >
  <xsd:appinfo>Stuff</xsd:appinfo>
</xsd:annotation>

Yup, the attribute from the <xsd:element> is "overwritten" and lost. And people wonder why I get bitter about these things. Always follow the rule of thumb: anytime that you do something "crafty" (such as updating the annotation to include the "annotations" (non-native attributes) from the parent element) you're shooting yourself in the foot.

Why XSObject doesn't have an XSObjectList getNonNativeAttributes() method (returning, say, XSNonNativeAttribute objects) I don't know, but it would certainly have saved me a few hours.
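Lacking such a method, one workaround is to parse the string returned by getAnnotationString() and pick out the attributes on its root that are neither unqualified, in the XSD namespace, nor namespace declarations. The sketch below assumes the string is well-formed on its own, i.e. that the namespace declarations elided as `...` in the output above are actually present (if they are not, the parse fails). fromAnnotationString is a hypothetical helper, not a Xerces API, and it cannot bring back a parent attribute that has already been overwritten as described above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class NonNativeAttributes {
    static final String XSD_NS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    // Hypothetical helper: maps "{namespaceURI}localName" -> value for each non-native
    // attribute on the root of an annotation string (assumed to be well-formed XML).
    static Map<String, String> fromAnnotationString(String annotation) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so getNamespaceURI() is populated
        Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(annotation)))
                .getDocumentElement();
        Map<String, String> result = new LinkedHashMap<>();
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String ns = a.getNamespaceURI();
            if (ns == null || XSD_NS.equals(ns)
                    || XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)) {
                continue; // unqualified, schema-namespace, or xmlns:* attribute
            }
            result.put("{" + ns + "}" + a.getLocalName(), a.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up annotation string; "urn:example:my" stands in for myNS.
        String annotation =
              "<xsd:annotation xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
            + " xmlns:myNS='urn:example:my' myNS:name='Something'>"
            + "<xsd:appinfo>Stuff</xsd:appinfo></xsd:annotation>";
        System.out.println(fromAnnotationString(annotation));
    }
}
```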

This is continued in More on XSD, PSVI and non-native attributes.

NamespaceContext and XPath

I am using Xerces to parse an XML Schema annotation. I stumbled across two interesting situations when dealing with Xerces PSVI:

  1. The contents of an annotation can only be retrieved as a string. It would have been nice to have access to a Node object instead.
  2. When using XPath, one must provide their own NamespaceContext object when using namespaces. Why a trivial implementation that was backed with a Map of strings was not provided I cannot guess. (This isn't specific to PSVI but this is the first time that I'm using Xerces rather than dom4j which provides SimpleNamespaceContext via jaxen.) Stefan Podkowinski has felt my pain. The O'Reilly Network has code for an implementation.

As a side note to the NamespaceContext: if you have a default namespace in the XML that you are parsing then you must have a blank namespace (not null but "") registered with the NamespaceContext.
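A Map-backed NamespaceContext really is only a few lines. The sketch below is a hypothetical MapNamespaceContext, not a JAXP class; it follows the interface contract of returning XMLConstants.NULL_NS_URI ("") rather than null for unbound prefixes. Note that in XPath 1.0 an unprefixed name in the expression always means "no namespace", so the demo binds an explicit prefix to the document's default namespace and uses that prefix in the path.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Minimal Map-backed NamespaceContext (illustrative; not part of the JDK).
class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix is null");
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) return XMLConstants.XML_NS_URI;
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        // Unbound prefixes (including "" unless bound) map to "" as the interface requires.
        return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) throw new IllegalArgumentException("namespaceURI is null");
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) prefixes.add(e.getKey());
        }
        return prefixes.iterator();
    }

    // Selects an element that lives in the document's default namespace.
    static String demo() throws Exception {
        String xml = "<order xmlns=\"urn:example:orders\"><id>42</id></order>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XPath namespace matching needs a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new MapNamespaceContext().bind("o", "urn:example:orders"));
        // Unprefixed names in XPath 1.0 mean "no namespace", so the "o:" prefix is required.
        return xpath.evaluate("/o:order/o:id", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```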

Unless otherwise expressly stated, all original material of whatever nature created by Rob Grzywinski and included in this weblog and any related pages, including the weblog's archives, is licensed under a Creative Commons License.