Guile-RDF
============

Guile RDF is an implementation of the RDF format defined by the W3C for
GNU Guile.  RDF stands for Resource Description Framework and it is a way
to represent information.  RDF structures include triples (facts with a
subject, a predicate and an object), graphs which are sets of triples, and
datasets, which are collections of graphs.

Each node in the graph represents a "thing", a concept or concrete object,
and edges represent relations between them.  Each node and relation is either
an IRI, a blank node or an RDF literal. An RDF literal itself has a type,
represented by an IRI.

RDF specifications include the specification of concrete syntaxes and of
operations on graphs.  This library is not yet complete, but it already has some
basic functionality: an internal representation of RDF datasets, some
predicates and an initial parser for Turtle files.

Installing
----------

In order to install, your best option is to use the [Guix](https://guix.gnu.org)
package manager.  It can run on any existing Linux distribution, and is guaranteed
to not interact with its host distribution.  Installing Guix is as simple
as running the
[installation script](https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh).
Once installed, you can run:

```bash
guix install guile guile-rdf
```

Otherwise, your package manager might have guile-rdf available.  You can also
build it from source, like so:

```bash
git clone https://framagit.org/tyreunom/guile-rdf
cd guile-rdf
autoreconf -fiv
./configure
make
sudo make install
```

You will need guile and guile-json for it to work.  Again, the best way to obtain
the dependencies is to use Guix from this repository:

```bash
guix environment -l guix.scm
```

`guix.scm` is a file that is provided with this repository. You can use it to
set up a development environment, as shown above, or to build the package,
possibly from a different source, like this:

```bash
guix build --with-source=guile-rdf=$PWD -f guix.scm
```

Testing
-------

The tests include running the official
[test suite](https://w3c.github.io/json-ld-api/tests/).  It requires network
access.  To run it, use:

```bash
make check
```

Please [report](https://framagit.org/tyreunom/guile-rdf/issues) any failure!

Documentation
-------------

This section documents the RDF library.  It is mostly based on the different
recommendations from the W3C.

### RDF Structures

The RDF Structure is defined in module `(rdf rdf)`.

#### **Scheme Datatype**: `rdf-dataset`

An RDF dataset is a set of graphs, with one default graph and a set of named
graphs.  This type has the following fields:

* **default-graph**: the default graph, an RDF graph
* **named-graphs**: the list of named graphs, an alist where keys are names and
  values are graphs.

#### **Scheme Datatype**: `rdf-triple`

An RDF triple is a truth assertion that a subject is linked to an object by
a certain predicate.  This type has the following fields:

* **subject**: The subject, which can be a blank node, an IRI, a datatype or
  a literal
* **predicate**: The predicate, which can have the same type of values
* **object**: The object, which can have the same type of values

Note that the recommendation restricts the possible values for predicate
further (it should not be a blank node for instance), but also introduces the
notion of *generalized RDF*, which corresponds to our definition of a triple.
This is useful for entailment.  A valid RDF triple can still be represented with
this datatype.

#### **Scheme Datatype**: `rdf-literal`

An RDF literal is the value of a node.  This type has the following fields:

* **lexical-form**: The lexical form of the literal, a unicode string.
* **type**: The type of the literal.  This can be either an IRI or an RDF
  datatype (described later).
* **langtag**: An optional language tag.  Note that when `langtag` is defined,
  the type is necessarily rdf:langString.

Note that the `langtag` restriction only applies semantically.  Operations on
RDF graphs and datasets as implemented in this library do not check that it is
well-formed.  Parsers and producers will fail to execute when the type is not
as expected though.

#### **Scheme Procedure**: `blank-node? node`

Returns whether a node is a blank node or not.  Blank node representation is
internal and should not be relied upon as it might change without prior
notice.  Two blank nodes can be compared for equality (or inequality) with
`equal?`.  Other procedures are not guaranteed to work on blank nodes.

#### **Scheme Procedure**: `rdf-graph? graph`

Returns whether a scheme value is an RDF graph.  This does not check the
consistency or validity of the graph, but merely that it is syntactically
correct.

### RDF Datatypes

Datatypes are used to add semantics to literals.  The `(rdf rdf)` module further
defines them, as well as some base datatypes.

#### **Scheme Datatype**: `rdf-datatype`

This type has the following fields:

* **iris**: A list of IRIs that represent this type
* **description**: A string describing that datatype, usually taken from
  documentation or recommendations
* **literal?**: A procedure to check whether a string is a literal of that type
* **value?**: A procedure to check whether a value is of that type
* **lexical->value**: A procedure to transform a valid literal into a value
* **value->lexical**: A procedure to transform a valid value into a valid literal

Note that there might be more than one valid value or literal to transform into.
The last two procedures will choose one canonical representation.

The documentation does not refer to `value->lexical`.  It is an addition of this
implementation.
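
To make the roles of these fields concrete, here is a minimal sketch of a
boolean datatype.  The `toy-datatype` record and `toy-boolean` are hypothetical
stand-ins written for this example, not the definitions used by `(rdf rdf)`;
only the field names come from the list above.  The lexical space follows XSD,
where `true`, `false`, `1` and `0` are all valid lexical forms.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical record mirroring the documented fields of `rdf-datatype'.
(define-record-type toy-datatype
  (make-toy-datatype iris description literal? value?
                     lexical->value value->lexical)
  toy-datatype?
  (iris toy-datatype-iris)
  (description toy-datatype-description)
  (literal? toy-datatype-literal?)
  (value? toy-datatype-value?)
  (lexical->value toy-datatype-lexical->value)
  (value->lexical toy-datatype-value->lexical))

(define toy-boolean
  (make-toy-datatype
   '("http://www.w3.org/2001/XMLSchema#boolean")
   "The two truth values (toy version)"
   ;; literal?: is the string in the lexical space?
   (lambda (str) (and (member str '("true" "false" "1" "0")) #t))
   ;; value?: is the Scheme value in the value space?
   boolean?
   ;; lexical->value: assumes a valid lexical form.
   (lambda (str) (and (member str '("true" "1")) #t))
   ;; value->lexical: always picks the same (canonical) lexical form.
   (lambda (value) (if value "true" "false"))))

;; Round trip: lexical form -> value -> canonical lexical form.
(define canonical
  ((toy-datatype-value->lexical toy-boolean)
   ((toy-datatype-lexical->value toy-boolean) "1")))
```

Here `"1"` and `"true"` denote the same value, and the round trip normalizes
`"1"` to the single lexical form chosen by `value->lexical`.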

#### **Scheme Datatype**: `rdf-vocabulary`

A vocabulary is a collection of datatypes.  This implementation also equips a
vocabulary with utility functions.  This type has the following fields:

* **datatypes**: A list of RDF datatypes
* **order**: A procedure that returns whether the first datatype's value space is
  included in the value space of the second (i.e. whether it is smaller).
* **compatible?**: A procedure that returns whether the two datatypes passed as
  parameters are compatible, i.e. their value spaces are not disjoint.

Compatibility is assumed to be total (it always answers for any pair of recognized
datatypes in the vocabulary).  One of the consistency conditions of a graph is
that when a node has multiple types, they must have at least one value in
common (for instance, a node can be both an integer and a decimal, because
integer values are both integers and decimals, but it cannot be a boolean and an
integer).

The type consistency of a node is mathematically expressed as the non-emptiness
of the intersection of the value spaces of all the types of the node.  It is assumed
in this implementation that, when all the types are pairwise compatible, that
intersection is not empty.  This is not true in general, but works at least for
the base vocabulary included in guile-rdf.
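
The gap between pairwise compatibility and a non-empty common intersection can
be illustrated with a small sketch.  It assumes toy value spaces given as
finite lists, which the library cannot do for real datatypes (their value
spaces are usually infinite); the procedure names are made up for this example.

```scheme
(use-modules (srfi srfi-1))

;; Toy value spaces: finite lists of values.
(define (compatible? space-a space-b)
  ;; Two spaces are compatible when they share at least one value.
  (not (null? (lset-intersection equal? space-a space-b))))

(define (pairwise-compatible? spaces)
  ;; #t when every unordered pair of SPACES is compatible.
  (or (null? spaces)
      (and (every (lambda (other) (compatible? (car spaces) other))
                  (cdr spaces))
           (pairwise-compatible? (cdr spaces)))))

(define (common-values spaces)
  ;; Values that belong to every space of the non-empty list SPACES.
  (reduce (lambda (space acc) (lset-intersection equal? acc space))
          '()
          spaces))

;; Every pair below shares a value, yet no value belongs to all three.
(define tricky '((1 2) (2 3) (1 3)))
(define pairwise-ok? (pairwise-compatible? tricky))
(define shared (common-values tricky))
```

With these three spaces the pairwise check succeeds while `shared` is the empty
list, which is exactly the situation the assumption above rules out.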

**Help wanted**: if you can come up with a better algorithm, please share!

#### Available Datatypes in `(rdf rdf)`

* rdf:langString
* rdf:XMLLiteral

#### Available Datatypes in `(rdf xsd)`

When you import this module with `#:prefix xsd:`, you can easily use these
literals with that prefix, in the same way you would write it in a concrete
RDF document.  For instance, the following is a valid triple:

```scheme
(make-rdf-triple
  "http://example.org/a"
  "http://example.org/prop"
  (make-rdf-literal "10" xsd:integer #f))
```

Representing (in turtle syntax):

```
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://example.org/a> <http://example.org/prop> "10"^^xsd:integer .
```

Available datatypes are:

* xsd:boolean
* xsd:string
* xsd:decimal
* xsd:integer
* xsd:int

### Graph Operations

The `(rdf rdf)` module also defines some graph operations.  They are presented
below.

#### **Scheme Procedure**: `merge-graphs g1 g2`

Merges two graphs.  As graphs are collections of RDF triples, this is very
similar to appending the two sets.  However, we must ensure that we don't
accidentally merge blank node identifiers that should not be merged, as two
distinct blank nodes can have the same internal representation in both graphs.
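
The renaming step can be sketched as follows.  This is not the code of
`merge-graphs`: it assumes a toy representation where a triple is a
three-element list, an IRI is a string and a blank node is a pair
`(blank . LABEL)`, whereas the library keeps its blank node representation
internal.

```scheme
(use-modules (srfi srfi-1))

(define (toy-blank? node)
  (and (pair? node) (eq? (car node) 'blank)))

(define (blank-labels graph)
  ;; Distinct labels of the blank nodes appearing in GRAPH.
  (delete-duplicates
   (map cdr (filter toy-blank? (concatenate graph)))))

(define (rename-apart graph taken)
  ;; Rename the blank nodes of GRAPH whose label is in TAKEN to labels
  ;; that appear neither in TAKEN nor in GRAPH.
  (let* ((own (blank-labels graph))
         (fresh (lambda (label)
                  (let loop ((n 0))
                    (let ((candidate
                           (string-append label "-" (number->string n))))
                      (if (or (member candidate taken)
                              (member candidate own))
                          (loop (+ n 1))
                          candidate)))))
         (mapping (map (lambda (label)
                         (cons label
                               (if (member label taken) (fresh label) label)))
                       own)))
    (map (lambda (triple)
           (map (lambda (node)
                  (if (toy-blank? node)
                      (cons 'blank (assoc-ref mapping (cdr node)))
                      node))
                triple))
         graph)))

(define (toy-merge-graphs g1 g2)
  ;; Append the graphs after renaming clashing blank nodes of G2, then
  ;; drop duplicate triples since a graph is a set.
  (delete-duplicates
   (append g1 (rename-apart g2 (blank-labels g1)))))
```

Merging two graphs that both use the label `a` then yields two distinct blank
nodes, while a plain `append` would have silently identified them.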

#### **Scheme Procedure**: `rdf-isomorphic? g1 g2`

Returns whether two graphs are the same.  Two graphs can have a different
representation because of order and because of differing blank node
representations.  For instance the following graphs (in turtle format) are
isomorphic, even though their representation is different:

```
_:a1 <http://example.org> "10"^^xsd:integer .
```

and

```
_:bn <http://example.org> "10"^^xsd:integer .
```

However, the following is *not* isomorphic with any of the previous graphs:

```
_:a1 <http://example.org> "010"^^xsd:integer .
```

because the lexical form of the literal differs (`"010"` instead of `"10"`).

#### **Scheme Procedure**: `rdf-dataset-isomorphic? d1 d2`

Returns whether two datasets are the same.  Two datasets can have a different
representation because of order and because of differing blank node
representations.  They are isomorphic when there is a one-to-one mapping
between them, such that blank nodes from one map to blank nodes of the other,
and vice-versa.  Note that the mapping can map differently named blank nodes
to one another even when the blank node is the name of a named graph.

#### **Scheme Procedure**: `recognize graph vocabulary`

Transforms a graph to replace every instance of recognized IRIs in the
vocabulary by an RDF datatype.

### RDF Semantics

RDF gives a semantics to graphs.  It defines four entailment regimes where
the concepts of a *valid graph* and *entailment* are defined.  An entailment
is a similar concept to an implication, when we interpret graphs as statements
about the world.  A graph G entails E, if in any world where G is "true", E is
also "true".

In order to prove an entailment, we need to check the validity of the claims of
every triple of E, with regard to G.  There is only one rule common to every
entailment regime: any triple is valid with regard to G if G is not valid.

#### The Simple Entailment Regime

The first entailment regime is the *simple entailment regime*, defined in
`(rdf entailment simple)`.  In this regime, any graph is valid, so we cannot
derive False.  Since E can contain blank nodes, we need to create a mapping
from blank nodes in E to nodes (or blank nodes) in G.  G entails E if and
only if such a mapping exists and is valid, i.e. every mapped triple of E is
a triple of G.
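
The search for such a mapping can be sketched as a small backtracking
procedure.  This is only an illustration under a toy representation (triples
as three-element lists, IRIs as strings, blank nodes as `(blank . LABEL)`
pairs); `toy-entails?` is a made-up name and is not the library's `entails?`.
The search is exponential in the worst case.

```scheme
(use-modules (srfi srfi-1))

(define (toy-blank? node)
  (and (pair? node) (eq? (car node) 'blank)))

(define (match-node e-node g-node mapping)
  ;; Return MAPPING, possibly extended, if E-NODE can stand for G-NODE;
  ;; return #f otherwise.  MAPPING is an alist from labels to nodes of G.
  (cond ((not (toy-blank? e-node))
         (and (equal? e-node g-node) mapping))
        ((assoc (cdr e-node) mapping)
         => (lambda (binding)
              (and (equal? (cdr binding) g-node) mapping)))
        (else (cons (cons (cdr e-node) g-node) mapping))))

(define (match-triple e-triple g-triple mapping)
  ;; Match subject, predicate and object in turn, threading the mapping.
  (let loop ((e-nodes e-triple) (g-nodes g-triple) (mapping mapping))
    (cond ((not mapping) #f)
          ((null? e-nodes) mapping)
          (else (loop (cdr e-nodes) (cdr g-nodes)
                      (match-node (car e-nodes) (car g-nodes) mapping))))))

(define (toy-entails? g e)
  ;; Depth-first search for a mapping under which every triple of E is
  ;; a triple of G.
  (let search ((remaining e) (mapping '()))
    (or (null? remaining)
        (any (lambda (g-triple)
               (let ((extended
                      (match-triple (car remaining) g-triple mapping)))
                 (and extended (search (cdr remaining) extended))))
             g))))
```

For instance, a graph G stating `a p b` and `b p c` entails the graph
`_:x p _:y . _:y p c` through the mapping `x -> a`, `y -> b`, but it does not
entail `_:x p _:x`.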

The following procedures are available:

**Scheme Procedure**: `consistent-graph? graph`

Returns whether a graph is consistent in the simple entailment regime.

**Scheme Procedure**: `entails? G E`

Returns whether a graph G entails another graph E.

#### The D Entailment Regime

The second entailment regime is the *D entailment regime*, defined in
`(rdf entailment d)`.  This regime is parameterized by a vocabulary D (defined
datatypes).  A graph is valid if and only if all its recognized literals
(whose type is in D) have their lexical form in the lexical space of their type.

For instance the following is not a valid graph:

```
_:a1 <http://example.org/prop> "ten"^^xsd:integer .
```

because the lexical space of `xsd:integer` does not include `"ten"`.

Entailments work in a similar fashion to the simple entailment regime, but,
for literals of a recognized datatype, it is sufficient to have the same value
(the simple entailment regime restricts literals to having the same lexical
form).  For instance, the following two triples are equivalent in the D entailment regime:

```
_:a1 <http://example.org/prop> "010"^^xsd:integer .
_:a1 <http://example.org/prop> "10"^^xsd:integer .
```

because their objects both have the same value `10` (but a different lexical
form).
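
The difference between the two regimes on literals can be sketched like this.
The example assumes a toy literal represented as a pair
`(LEXICAL-FORM . TYPE)` and a vocabulary that only recognizes a simplified
`xsd:integer`; none of these procedures are part of guile-rdf.

```scheme
(define (integer-lexical? str)
  ;; An optional sign followed by at least one ASCII decimal digit.
  (let ((digits (if (and (> (string-length str) 0)
                         (memv (string-ref str 0) '(#\+ #\-)))
                    (substring str 1)
                    str)))
    (and (> (string-length digits) 0)
         (string-every (lambda (c) (and (char>=? c #\0) (char<=? c #\9)))
                       digits))))

(define (recognized? literal)
  (eq? (cdr literal) 'xsd:integer))

(define (valid-literal? literal)
  ;; Unrecognized types are never invalid; recognized ones must have a
  ;; lexical form that belongs to their lexical space.
  (or (not (recognized? literal))
      (integer-lexical? (car literal))))

(define (same-literal-simple? a b)
  ;; Simple entailment: same type and same lexical form.
  (equal? a b))

(define (same-literal-d? a b)
  ;; D entailment: valid recognized literals are compared by value.
  (if (and (recognized? a) (recognized? b)
           (valid-literal? a) (valid-literal? b))
      (= (string->number (car a)) (string->number (car b)))
      (equal? a b)))
```

With these definitions `"010"` and `"10"` typed as `xsd:integer` differ for
`same-literal-simple?` but are identified by `same-literal-d?`, and `"ten"` is
rejected by `valid-literal?`.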

The following procedures are available:

**Scheme Procedure**: `consistent-graph? graph vocabulary`

Returns whether a graph is D-consistent, with regards to the vocabulary, an
`rdf-vocabulary` object.

**Scheme Procedure**: `entails? G E vocabulary`

Returns whether a graph G D-entails another graph E, with regards to the
vocabulary, an `rdf-vocabulary` object.

#### The RDF Entailment Regime

The third entailment regime is the *RDF entailment regime*, defined in
`(rdf entailment rdf)`.  This regime is parameterized by a vocabulary.  A graph
is valid if it is D-valid and if the types of every node are compatible.

In RDF, a node can have zero, one or more types.  When it has more than one type,
it is only valid if its types are compatible, meaning that there is at least
one value (in the value space, not the lexical space) that is in the value
space of all its types.  For instance, a node can be both an integer and a
decimal because `10` is in the value space of both types.  A node cannot be
a decimal and a boolean because no value is in both spaces at the same time.

Entailment in this regime is more complex and we will not describe it here.
Suffice it to say that some derivation rules are added, and we can implement them
by first extending the graph G with new facts about the world that can
be derived from it.  Once we have exhausted all possible extensions of G, we can
apply the D entailment regime.

The following procedures are available:

**Scheme Procedure**: `consistent-graph? graph vocabulary`

Returns whether a graph is RDF-consistent, with regards to the vocabulary, an
`rdf-vocabulary` object.

**Scheme Procedure**: `entails? G E vocabulary`

Returns whether a graph G RDF-entails another graph E, with regards to the
vocabulary, an `rdf-vocabulary` object.

#### The RDFS Entailment Regime

The last entailment regime is the *RDFS entailment regime*, defined in
`(rdf entailment rdfs)`.  This regime is parameterized by a vocabulary.  A graph
is valid if it is RDF-valid and if the subclasses are compatible.

In RDFS, nodes can have a class, and a class system exists that orders classes
in terms of subclasses.  The class system is valid if and only if, for any type
B which is a subclass of A, its value space is included in that of A.  For instance,
xsd:int is a subclass of xsd:integer (because its value space, a finite interval,
is a subset of the value space of xsd:integer, which is infinite), but
xsd:int is not a subclass of xsd:string.

As with RDF, the RDFS entailment regime adds more deduction rules and we use them
to extend the graph G.  When the graph is fully extended, we use the D-entailment
regime to check whether the extended G entails E.
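
Extending a graph until nothing new can be derived is a fixed-point
computation.  The sketch below applies only two of the RDFS rules (rdfs9 and
rdfs11 in the RDF 1.1 Semantics recommendation) to a toy graph whose triples
are three-element lists of strings; `derive` and `saturate` are hypothetical
names, and the library's own extension step covers more rules.

```scheme
(use-modules (srfi srfi-1))

(define rdf:type "http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
(define rdfs:subClassOf "http://www.w3.org/2000/01/rdf-schema#subClassOf")

(define (derive graph)
  ;; One round of two rules.  For T1 = (x P a) and T2 = (a subClassOf b):
  ;; - rdfs9,  P = rdf:type:        derive (x rdf:type b)
  ;; - rdfs11, P = rdfs:subClassOf: derive (x rdfs:subClassOf b)
  (append-map
   (lambda (t1)
     (append-map
      (lambda (t2)
        (if (and (equal? (second t2) rdfs:subClassOf)
                 (equal? (third t1) (first t2))
                 (member (second t1) (list rdf:type rdfs:subClassOf)))
            (list (list (first t1) (second t1) (third t2)))
            '()))
      graph))
   graph))

(define (saturate graph)
  ;; Add derived triples until a round produces nothing new.  This stops
  ;; because the rules never introduce new nodes.
  (let ((extended (lset-union equal? graph (derive graph))))
    (if (= (length extended) (length graph))
        graph
        (saturate extended))))
```

Starting from `x rdf:type int`, `int rdfs:subClassOf integer` and
`integer rdfs:subClassOf decimal`, saturation adds that `x` is also an
`integer` and a `decimal`, and that `int` is a subclass of `decimal`.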

The following procedures are available:

**Scheme Procedure**: `consistent-graph? graph vocabulary`

Returns whether a graph is RDFS-consistent, with regards to the vocabulary, an
`rdf-vocabulary` object.

**Scheme Procedure**: `entails? G E vocabulary`

Returns whether a graph G RDFS-entails another graph E, with regards to the
vocabulary, an `rdf-vocabulary` object.

### Turtle Format

Turtle is a textual format to represent RDF graphs.  We include a parser and
a generator in guile-rdf.  The `(turtle tordf)` module defines a parser:

#### **Scheme Procedure**: `turtle->rdf str-or-file base`

Generates an RDF graph from the file or string passed as first argument
(we first check whether the string is a file on the filesystem, then we
parse it as a string).  The `base` is the document base or `#f` if there is
none.  When a document is downloaded from the internet, the base is typically
the URL of that document, or the value of a base header.

#### **Scheme Procedure**: `rdf->turtle graph`

Generates a string representing a Turtle document for the `graph`.  This is more
accurately an N-Triples representation of the graph, but that format is a subset
of Turtle.
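
To give an idea of what an N-Triples serialization looks like, here is a
minimal sketch of a serializer.  It does not use the library's data structures:
it assumes a toy representation where an IRI is a string, a blank node is a
pair `(blank . LABEL)` and a literal is a list `(literal LEXICAL-FORM TYPE-IRI)`.
Language tags and non-ASCII escapes are left out.

```scheme
(define (escape-string str)
  ;; Escape the characters that cannot appear verbatim inside an
  ;; N-Triples quoted string.
  (string-concatenate
   (map (lambda (c)
          (case c
            ((#\\) "\\\\")
            ((#\") "\\\"")
            ((#\newline) "\\n")
            ((#\return) "\\r")
            (else (string c))))
        (string->list str))))

(define (node->ntriples node)
  (cond ((string? node)                 ; IRI
         (string-append "<" node ">"))
        ((eq? (car node) 'blank)        ; blank node
         (string-append "_:" (cdr node)))
        (else                           ; (literal LEXICAL-FORM TYPE-IRI)
         (string-append "\"" (escape-string (cadr node)) "\"^^<"
                        (caddr node) ">"))))

(define (graph->ntriples graph)
  ;; One line per triple, each terminated by " ." and a newline.
  (string-concatenate
   (map (lambda (triple)
          (string-append (string-join (map node->ntriples triple) " ")
                         " .\n"))
        graph)))
```

Every term is written out in full, without prefixes or abbreviations, so each
line of the output is also a valid Turtle statement.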

### N-Quads Format

N-Quads is a textual format to represent RDF datasets.  We include a parser
in guile-rdf.  The `(nquads tordf)` module defines a parser:

#### **Scheme Procedure**: `nquads->rdf str-or-file`

Generates an RDF dataset from the file or string passed as first argument
(we first check whether the string is a file on the filesystem, then we
parse it as a string).