The main objective of the semantic web is to make data on the web “comprehensible” to machines, which can be seen as a form of artificial intelligence.

This is made possible by annotating information and creating links between structured data. Search engines and other applications can therefore provide their users with more relevant results. Companies are now very interested in localizing this technology, since it gives customers better access to their goods or services. But how is this technology even possible? We will see how the semantic web is built and why it is difficult to localize it for a multilingual web.

What is Needed to Build a Semantic Web?

In order to structure resources (text, image, audio, video…), humans have to add information describing them. This information is called metadata, and the action of adding it to a resource is called ‘annotation’. There are different types of annotation: one type gives information about the resource itself (author, date, title…), while the others give information about the properties of its content. Content metadata for a text can describe the morphology, syntax and semantic meaning of the words.

Semantic annotations need to be treated in a specific way. They cannot be based solely on one person's perception; they have to follow conventions that everyone uses. Moreover, semantic annotations need to be formalized, because natural language tends to be too ambiguous for machines to use directly.

According to Laublet, Reynaud, & Charlet (2002), these annotations are therefore based on ontologies, so that they can be shared and well understood by every system. According to TheFreeDictionary, an ontology is a rigorous and exhaustive organization of a particular knowledge domain that is usually hierarchical and contains all the relevant entities and their relations.

For example, some ontologies are structured as subject/verb/object triples, such as “the baby of the cow is a calf”. Together, these triples help to conceptualize our language and form a set of references. Metadata has to be unambiguous so that software tools can exploit it easily.
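To make the triple idea concrete, here is a minimal sketch in Python. It is not how any particular ontology language stores its data: the `Triple` type, the predicate names (`is_offspring_of`, `is_a`) and the sample facts are all hypothetical and chosen only to mirror the cow/calf example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject/verb/object statement (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical mini knowledge base; the predicate names are assumptions.
TRIPLES = [
    Triple("calf", "is_offspring_of", "cow"),
    Triple("cow", "is_a", "mammal"),
    Triple("puppy", "is_offspring_of", "dog"),
]


def objects_of(subject: str, predicate: str, triples: List[Triple] = TRIPLES) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def subjects_of(predicate: str, obj: str, triples: List[Triple] = TRIPLES) -> List[str]:
    """Return every subject linked to `obj` through `predicate`."""
    return [t.subject for t in triples if t.predicate == predicate and t.obj == obj]
```

With these facts, `subjects_of("is_offspring_of", "cow")` answers the question “what is the baby of the cow?” without any natural-language parsing, which is the point of formalizing the annotation.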

Localization is essential for those who want to reach more users with their web content. In the same way, ontologies must be localized so that the semantic web can work for as many users as possible. A monolingual ontology certainly represents a significant amount of work, but it is simpler than a multilingual ontology. Let’s have a closer look at the complexity of the latter.

Ontology Localization

According to an article by Caracciolo et al. (2007), the creation of a multilingual ontology must fulfill several requirements.

Some of them are achievable: for instance, representing lexical items that refer to the same object in multiple languages (e.g. ‘dog’ (English), ‘chien’ (French), ‘நாய்’ (Tamil), and ‘cane’ (Italian) refer to the same animal).
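One common way to picture this requirement is a language-independent concept identifier that carries one label per language. The sketch below assumes a hypothetical identifier `concept:dog`, ISO 639-1 language codes as keys, and an English fallback; none of these choices come from the cited article.

```python
from typing import Dict, Optional

# Hypothetical concept identifier mapped to one label per language code.
LABELS: Dict[str, Dict[str, str]] = {
    "concept:dog": {"en": "dog", "fr": "chien", "ta": "நாய்", "it": "cane"},
}


def label_for(concept_id: str, lang: str, fallback: str = "en") -> str:
    """Return the label in `lang`, or the fallback label if none exists."""
    labels = LABELS[concept_id]
    return labels.get(lang, labels[fallback])


def concept_for(term: str, lang: str) -> Optional[str]:
    """Find the concept whose label in `lang` is exactly `term`."""
    for concept_id, labels in LABELS.items():
        if labels.get(lang) == term:
            return concept_id
    return None
```

Note that the lookup is keyed by language: `concept_for("cane", "it")` finds the dog concept, while `concept_for("cane", "en")` finds nothing, because the same string can mean different things in different languages.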

However, some requirements still pose a real challenge. Notably, representing relationships between lexical items (synonyms, acronyms, spelling variants, different types of names) within, and especially across, languages is quite a difficult task. In the same paper, Caterina Caracciolo et al. affirm that “we are not able to set relationships between terms in different languages that may be used to represent the same object” (2007).

As an ontology develops, it would be very useful to be able to add new languages, and this is where the challenge lies. If we start to build an ontology with English, French and German, for example, it has to be sufficiently language-independent to allow any language to be added to the original work. Caracciolo et al. illustrate this with the following example: “the animal ‘scorpion’ can be seen as related to food in some African countries or in China, but not in other parts of the world (therefore not in other languages)” (ibid.). There are many issues of this type.
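The scorpion example can be sketched as a relation that holds only in some contexts. This is one possible modelling choice rather than the approach of the cited authors: the `Relation` type, the `contexts` field, and the use of the language code `zh` to stand for the Chinese context are assumptions made purely for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Relation(NamedTuple):
    """A relation that may be limited to certain contexts (hypothetical)."""
    subject: str
    predicate: str
    obj: str
    contexts: Optional[FrozenSet[str]] = None  # None means it holds everywhere


RELATIONS = [
    Relation("scorpion", "is_a", "animal"),
    # Assumption: "zh" stands in for the context where scorpion relates to food.
    Relation("scorpion", "related_to", "food", frozenset({"zh"})),
]


def relations_in(context: str, relations: List[Relation] = RELATIONS) -> List[Relation]:
    """Return the relations that hold in the given context."""
    return [r for r in relations if r.contexts is None or context in r.contexts]
```

Under these assumptions, `relations_in("zh")` returns both relations while `relations_in("fr")` returns only the universal one. Adding a language later then means deciding, relation by relation, whether it applies, which hints at why a fully language-independent core is hard to design.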

In the coming years, translators will have to face many issues with the semantic web. However, as with the multilingual web, research has already begun (see some relevant Google results), and researchers are working to overcome these challenges soon.

Illustration: Visual representation of the Conceptual Layer of the Friend of a Friend (FOAF) vocabulary