How to manage XLIFF Files?

How to manage XLIFF Files?

XLIFF stands for XML Localization Interchange File Format. An OASIS open standard for the exchange of Localization and Translation content. XLIFF is considered the standard native file format for almost all Translation CAT Tools as well as for translators.

The concept behind the XLIFF format.

The idea here is to standardize the format of the content that is going to be translated. Almost all types of content may be subject to translation. Based on this, the CAT tool and translators must be aware of all types of files which is not possible! The original content can be created natively in any format like HTML, XML, InDesign, XML, Properties, and many other formats. To save time and cost we need to have only one format that can be recognized and manipulated by all in the localization industry and from here came the XLIFF.

The Structure of the XLIFF file

An XLIFF file has ONE standard structure regardless of the native file type. It always has the following pattern.

Managing XLIFF Files

In an XLIFF file, each segment of the content is saved within a Translation-Unit. Each Translation-Unit has 2 sub-elements SOURCE and TARGET. Before translation, both elements will have the same text. The translator will read the content in the source element and overwrite the content in the target element with the equivalent translation.

Converting Content to XLIFF format

Content is originally created in different formats and needs to be converted into XLIFF before translation. What is the process? Nowadays, the new versions of CAT Tools facilitate the process of converting different types of files into XLIFF. But in some cases, it is not that easy. For example, website content saved from Content Management Systems. Here we use an Add-On to Export the content and use it again to import the translation back. This add-on needs to identify first what content needs to be translated by checking the related settings of the property that holds the content. This is important to avoid exporting non-relevant content like JavaScript. Then, the tool will export the content in the XLIFF format. After translation, the tool must be able to import the translation back to the CMS.

It is well-known that each CMS has its own specifications in handling content. To speed up the process of developing add-ons, that deal with different Content Management Systems, you can develop a standalone module that includes all the actions and properties of creating XLIFF files regardless of the CMS characteristics. So you can reuse it each time you are creating a connector for a CMS. Like this, you don’t need to reinvent the wheel each time.

How to prepare XLIFF files?

Now we converted the content into XLIFF. How to prepare it for translation? In most cases, the engineering preparation will be minimum. But again, for some Web or Software Content, there is more to do. Check the below piece of content in an XLIFF file:

All the content in red is not for translation and need to be excluded. The CAT tool is not able to detect and protect it. The Localization Engineer will have to do additional steps. Before the evolution of the CAT tools, we were developing and using custom scripts using C# to detect such content and protect it, then convert the file to the bilingual format which was TTX format for Trados. Now using the new versions of the CAT Tools like Trados Studio, I am using the Regular Expressions and XPATH in order to create rules to detect and protect such types of non-relevant content.

Why should we do additional steps during the preparation?

Ignoring and not protecting the non-relevant text for translation will cause the following:

  • The Translation Memory will not be accurate.
  • The total word count will be higher without reason due to this additional amount of non-translatable content.
  • If any of this content (code) was changed by accident during the translation, the structure of the web page or the mobile App will be corrupted and the application or the web page will not work properly.

To conclude, XLIFF is the standard format for translation. It is recognized worldwide by all CAT tools and translators. Content must be converted from its original format to XLIFF before translation and the process depends here on the medium placeholder: Web, App, etc…

Nowadays, CAT Tools facilitates the process of localization engineering. In some cases, additional effort need to be done to ensure the accuracy of the final localized product.

Feel free to leave a comment or to Contact Me for an open discussion!

Leave a Reply

Your email address will not be published.