Chris's Rants

Thursday, May 06, 2004

Sorry, I still don't get it

Mark writes about the motivation for the Representation Header.

This is one of the rare occasions when I have to disagree with him. I still don't understand why he feels it necessary to duplicate MIME in SOAP.

Note: the usual caveats apply here as well... these are my opinions and not necessarily those of my employer, or for that matter my esteemed IBM colleagues on the XML Protocol WG.

I would agree that many of the implementations of SOAP Messages with Attachments are a mess. The reason for this, I believe, is the very one that Mark cites:
People started to build a lot of software using SwA, and as a result had to model applications as an XML message + attachments.
Of course, I don't believe for a moment that they had to model their applications in this manner; but indeed many did, which is why we are where we are today. I don't consider this a failing of SOAP Messages with Attachments; I consider it a failure of those who simply refuse to model their applications as being on the Web.

I hear the arguments echoing in my head to this day: "we don't want to have to force the developer to resolve a URI to get the image...". Why not? Is it really that hard? No, that's not the reason. No, I think it's because many refuse to treat "attachments" as separate resources in the first place.

Let's consider for a moment some classical motivations for using "attachments". A common use case is: "I have this really huge MPEG encoded MRI scan that I need to transfer... I don't want to base64 encode the MPEG encoded data and in-line it in the SOAP message because a) the base64 encoding bloats the size of the MPEG encoded data by an additional 1/3 and b) it will impede performance of processing the SOAP message since the XML Parser will need to parse over the bloated base64 encoded data."
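To put a number on the "additional 1/3" in point a), here is a quick sketch in Python. The 3,000,000-byte payload is just a stand-in I picked for illustration, not a real MRI scan, and the function name is my own.

```python
import base64


def base64_overhead(raw_size: int) -> float:
    """Return the fractional size increase from base64 encoding raw_size bytes."""
    raw = bytes(raw_size)            # placeholder payload of zero bytes
    encoded = base64.b64encode(raw)  # every 3 input bytes become 4 output characters
    return len(encoded) / len(raw) - 1


# Hypothetical 3,000,000-byte payload standing in for the MPEG data.
overhead = base64_overhead(3_000_000)
print(f"base64 adds about {overhead:.1%} to the payload")
```

The content of the payload doesn't matter; the ratio comes purely from base64 mapping each group of 3 bytes to 4 characters.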

Consider the case where you have a web page that displays an image. The HTML would look something like this:

<html>
<head>
<title>Foo</title>
</head>
<body>
<img src="http://example.org/images/img.jpeg"/>
</body>
</html>


As you can see, the image is a separate resource, with its own URI. The browser can choose to resolve that URI and inline the image (or not) as it sees fit, or as configured by the user. The browser has no problems dealing with this abstraction. Sure, there's an extra HTTP GET to resolve the image resource... a small price to pay really. In this case, the image is modeled as a separate resource.

Using RFC 2557, the RFC upon which SwA is based, you could package up the HTML and the image in a multipart/related MIME package. The packaging software simply resolves each img/@src URI and stores the retrieved representation as a separate body part in the multipart message, each identified by its corresponding img/@src URI. The rendering software that processes this message can be the same software used in a browser context. The only difference is that the URI resolver is made aware of the multipart/related MIME package: before it looks on the Web for the URI, it checks the package for a body part whose Content-Location has the same URI value and, if it finds one, returns that body part instead of retrieving it over the Web. It doesn't get much simpler than that.
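To make that concrete, here is a minimal sketch of the package-aware resolver using Python's standard email package. The names build_package, resolve, and fetch_from_web are my own inventions for illustration; none of this comes from RFC 2557 or from any SwA implementation, and a real packager would fetch the image bytes over HTTP rather than take them as an argument.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_package(html, images):
    """Bundle an HTML page and its images into one multipart/related message.

    images maps each img/@src URI to the bytes already retrieved for it.
    """
    package = MIMEMultipart("related", type="text/html")
    package.attach(MIMEText(html, "html"))
    for uri, data in images.items():
        # MIMEImage base64-encodes the part by default; that is a convenience
        # of this sketch, not something the packaging approach requires.
        part = MIMEImage(data, _subtype="jpeg")
        part["Content-Location"] = uri  # identify the part by its original URI
        package.attach(part)
    return package


def resolve(uri, package, fetch_from_web):
    """Return the representation for uri, checking the package before the Web."""
    for part in package.walk():
        if part.get("Content-Location") == uri:
            return part.get_payload(decode=True)
    # No matching body part: fall back to an ordinary retrieval.
    return fetch_from_web(uri)
```

A renderer would call resolve("http://example.org/images/img.jpeg", package, fetch_from_web) exactly as it would call its normal resolver; only the lookup order changes, and the rest of the rendering code never needs to know the image came from a body part.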

