Proposal for a RESTful BIRN XML document registration service

Introduction

The proposed XML registration service is a simple storage service for well-defined XML documents, accessible via a REST interface.
Documents can be uploaded, downloaded, queried using XPath expressions, and deleted. Collections of entries can be downloaded as plain XML, in Atom Syndication Format (ASF), or in Really Simple Syndication (RSS) format.
Certain operations of the storage service require authentication and authorization.
Why REST? Existing tools such as curl can serve as clients, so no separate client has to be provided, maintained, or installed by users.

Operations

Overview

{Base} Base URL of the service (currently running at https://info.nbirn.org:8443/cxfiis/reg)
{Name} Name of the registry. Different registries manage XML documents of different types, while all entries within one registry are of the same type. The supported names are 'CapabilityRegistry' and 'CapabilityDefinitionRegistry', which are the root elements of the currently supported XML types.
{Id} Unique identifier for a document in a registry

URL                                                           GET  PUT  POST  DELETE
{Base}/{Name}                                                 X    -    X     -
{Base}/{Name}/schema                                          X    -    -     -
{Base}/{Name}/atom                                            X    -    -     -
{Base}/{Name}/rss                                             X    -    -     -
{Base}/{Name}/xpath?expression=<XPATH_EXPRESSION>             X    -    -     -
{Base}/{Name}/entry/{Id}                                      X    -    -     X
{Base}/{Name}/entry/{Id}/html                                 X    -    -     -
{Base}/{Name}/entry/{Id}/xpath?expression=<XPATH_EXPRESSION>  X    -    -     -

Details

{Base}/{Name}

GET Get a collection of all entries currently stored in the registry. The format is plain XML. No authentication or authorization required.
POST Store the serialized XML entity from the request body in the registry. Authentication and authorization required. We currently do not distinguish between adding a new resource to a registry and updating an existing one; POST covers both use cases.

{Base}/{Name}/schema

GET Get the schema of the registry in plain XML format. No authentication or authorization required.

{Base}/{Name}/atom

GET Get a collection of all entries currently stored in the registry in Atom syndication format. No authentication or authorization required.

{Base}/{Name}/rss

GET Get a collection of all entries currently stored in the registry in RSS format. No authentication or authorization required.

{Base}/{Name}/xpath?expression=<XPATH_EXPRESSION>

GET Evaluate XPATH_EXPRESSION on all entries currently stored in the registry. No authentication or authorization required.

{Base}/{Name}/entry/{Id}

GET Get a serialized XML string representation of the entry with id {Id}. The format is plain XML. No authentication or authorization required.
DELETE Delete entry {Id} from the registry. Authentication and authorization required.

{Base}/{Name}/entry/{Id}/html

GET Get an HTML representation of the entry with id {Id}. No authentication or authorization required.

{Base}/{Name}/entry/{Id}/xpath?expression=<XPATH_EXPRESSION>

GET Evaluate XPATH_EXPRESSION on entry {Id} in the registry. No authentication or authorization required.
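
The URL patterns above can be assembled programmatically. The following is a minimal sketch, not part of the service itself: the helper names registry_url and xpath_url are hypothetical, and the base URL is the one given in the overview. It only builds URLs and does not contact the server.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the overview; adjust it for other deployments.
BASE = "https://info.nbirn.org:8443/cxfiis/reg"


def registry_url(name, *segments, base=BASE):
    """Build {Base}/{Name}[/segment...] with each path segment percent-encoded."""
    parts = [base.rstrip("/"), quote(name, safe="")]
    parts.extend(quote(str(s), safe="") for s in segments)
    return "/".join(parts)


def xpath_url(name, expression, entry_id=None, base=BASE):
    """Build the XPath query URL for a whole registry or for a single entry."""
    if entry_id is None:
        prefix = registry_url(name, "xpath", base=base)
    else:
        prefix = registry_url(name, "entry", entry_id, "xpath", base=base)
    # urlencode percent-encodes characters such as '/' and '@' in the expression.
    return prefix + "?" + urlencode({"expression": expression})
```

For the operations that need no authentication, the resulting URL can be passed directly to urllib.request.urlopen, or to curl on the command line.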

Data Types and Validation

For each type of registry there exists an XML schema. Elements to be added to the registry are validated against the schema. Only schema-compliant documents are accepted.

Data Storage

Currently there are two ways to store the data:

Resource Lifetime

Current concept

Each registry can be configured to allow a limited lifetime for its documents. The XML schema of a document must specify an element/attribute for lifetime handling.
Admins can set a per-registry max lifetime. If a client specifies an invalid expiration date in a request, i.e. expiration date is in the past or too far in the future, the document is rejected.
Clients don't need to specify an expiration date. If no expiration date is specified, the configured max lifetime for the registry is applied to the document.
Periodic checks for expiration are performed on each resource in a registry.
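
The acceptance rules of the current concept can be sketched as a small function. This is an illustration under assumptions, not the service's code: the name resolve_expiration is hypothetical, and treating an expiration date equal to the current time as already expired is a choice the text above does not specify.

```python
from datetime import datetime, timedelta, timezone


def resolve_expiration(requested, max_lifetime, now=None):
    """Return the expiration to store, or raise ValueError to reject the document.

    requested    -- expiration date sent by the client, or None if omitted
    max_lifetime -- per-registry maximum lifetime (a timedelta set by the admin)
    """
    now = now or datetime.now(timezone.utc)
    latest = now + max_lifetime
    if requested is None:
        # No date given: the configured max lifetime applies.
        return latest
    if requested <= now:
        raise ValueError("expiration date is in the past")
    if requested > latest:
        raise ValueError("expiration date exceeds the registry's max lifetime")
    return requested
```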

Better concept

Lifetime properties should not be required to be part of the XML resources of a registry, i.e. we should not require XML schemas of registries to define an element/attribute for lifetime handling. The lifetime of a resource of a registry should be an external configuration parameter. Lifetime is a per-registry parameter and not a per-resource parameter, because the resources of a registry are static, and not dynamic like jobs or transfers which may require individual lifetime depending on the duration of the task.
Therefore, a client should not be able to specify individual lifetime for resources in a registry. The registry defines a default lifetime in seconds which is applied automatically to all of its resources.

A client must have ways to

Removal of expired resources can be scheduled using Quartz. Quartz can persist its scheduled jobs, so that jobs survive a container/server restart. The advantage of scheduling and persistence is that no periodic per-resource expiration checks are needed, which could otherwise be a performance issue with large registries. When a new resource is POSTed to a registry, a removal job is scheduled for this resource.

When a client requests the remaining lifetime of an existing resource, the Quartz scheduler is queried for the removal job scheduled for that resource.
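
The scheduling idea can be illustrated independently of Quartz. The sketch below is a plain in-memory stand-in written in Python, not Quartz and not persistent; the class name RemovalSchedule and its methods are hypothetical. It shows one removal time per resource, a query for the remaining lifetime, and removal without scanning every resource. That re-posting a resource restarts its lifetime is an assumption.

```python
import heapq


class RemovalSchedule:
    """In-memory stand-in for a persistent job scheduler such as Quartz."""

    def __init__(self, default_lifetime):
        self.default_lifetime = default_lifetime  # seconds, one value per registry
        self._due = {}    # resource id -> scheduled removal time
        self._heap = []   # (removal time, resource id), earliest first

    def schedule(self, resource_id, now):
        """Called when a resource is POSTed (assumption: re-posting restarts the lifetime)."""
        due = now + self.default_lifetime
        self._due[resource_id] = due
        heapq.heappush(self._heap, (due, resource_id))

    def remaining(self, resource_id, now):
        """Remaining lifetime in seconds, or None if no removal is scheduled."""
        due = self._due.get(resource_id)
        return None if due is None else max(0, due - now)

    def pop_expired(self, now):
        """Return ids whose removal is due, skipping superseded heap entries."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            due, resource_id = heapq.heappop(self._heap)
            if self._due.get(resource_id) == due:
                del self._due[resource_id]
                expired.append(resource_id)
        return expired
```

Only the entries at the front of the heap are inspected, so the cost of a removal pass depends on the number of expired resources rather than on the size of the registry.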

Lifetime handling should be an optional capability of a registry, i.e. a registry admin should be able to configure a registry that does not apply lifetime to its resources.

Security

User credentials

Authentication and authorization are based on the credentials supplied via HTTP Basic Authentication. Because Basic Authentication transmits the credentials only base64-encoded, not encrypted, it is recommended to use HTTP over SSL to protect them.
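
For clients that do not use curl, the following sketch shows how a request carrying HTTP Basic Authentication credentials could be prepared with the Python standard library. The function names are hypothetical, and the Content-Type value application/xml for POST bodies is an assumption that this proposal does not specify. The request object is only constructed here, not sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authenticated_request(url, method, username, password, body=None):
    """Prepare a POST or DELETE request that carries the user's credentials."""
    headers = {"Authorization": basic_auth_header(username, password)}
    if body is not None:
        # Assumption: the service accepts XML bodies with this content type.
        headers["Content-Type"] = "application/xml"
    return urllib.request.Request(url, data=body, method=method, headers=headers)
```

A prepared request can be sent with urllib.request.urlopen(request); the URL should use https so that the credentials are protected in transit.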

Authentication

On the server side, users are authenticated against a MyProxy server. Using the username and password from the HTTP Basic Authentication header, the server connects to the BIRN-wide MyProxy server. If the connection succeeds, the user is authenticated; otherwise the user is rejected.

Authorization

Authorization decisions are based on the XML data that is being stored in the registry, i.e. the content of a certain element of an XML document is being used to decide whether a certain user is authorized to POST or to DELETE an entry.
For each registry type different XML elements can be used for authorization, because each XML type can of course contain different information.
The documents of the currently supported XML types contain an element that uniquely distinguishes a document from other documents of the same type.
For each registry, an admin maintains a file that maps usernames to lists of regular expressions. When a request must be authorized, the service checks whether one of the regular expressions defined for the requesting user matches the element of the XML document that is used for authorization decisions. If at least one of the regular expressions matches, the user is authorized; otherwise the user is rejected.
Authorization decisions can also be based on a collection of elements of a document. For each type an individual AuthorizationHandler can be defined which is used for authorization decisions.
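
The regular-expression check described above can be sketched as follows. This is an illustration only: the usernames and patterns are invented, the in-memory dictionary stands in for the admin-maintained file whose format is not specified here, and requiring a pattern to match the whole element value (rather than a substring) is an assumption.

```python
import re

# Hypothetical in-memory form of the per-registry mapping file.
USER_PATTERNS = {
    "alice": [r"siteA-.*", r"test-.*"],
    "bob": [r"siteB-.*"],
}


def is_authorized(username, auth_value, user_patterns=USER_PATTERNS):
    """True if any pattern configured for the user matches the whole element value.

    auth_value is the content of the XML element used for authorization decisions.
    Users without an entry in the mapping are never authorized.
    """
    patterns = user_patterns.get(username, [])
    return any(re.fullmatch(p, auth_value) for p in patterns)
```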

HTTP Status Codes

Success

200 - OK Returned for all successful requests. We currently don't return '201 - Created' upon a successful POST request.

Errors

401 - Unauthorized The user cannot be authenticated or is not authorized for the request to the given resource
404 - Not Found The requested resource does not exist
500 - Internal Server Error All other errors, including server errors

Error handling

In addition to the HTTP status codes the HTTP response body contains an error element including a human readable message that indicates the root of the problem.
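
A client could combine the status code and the error element roughly as sketched below. The exact layout of the error element is not defined in this proposal, so the element name error and the handling of nested text are assumptions; the function name error_message is hypothetical, and namespaced elements are not handled.

```python
import xml.etree.ElementTree as ET


def error_message(status, body):
    """Return None on success, else the message from the body or a generic fallback."""
    if status == 200:
        return None
    fallback = f"HTTP {status}"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Body is empty or not XML: report the status code only.
        return fallback
    # Assumption: the error element is named 'error' and is the root or a descendant.
    elem = root if root.tag == "error" else root.find(".//error")
    if elem is None:
        return fallback
    text = "".join(elem.itertext()).strip()
    return text or fallback
```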

ToDo