Dealing with multiple languages in Sitecore Search with XM Cloud

Unless you have been replaced by AI, chances are you will be the one adding multiple languages to your Sitecore website. If you want to integrate this with Sitecore Search, there are a few things you need to know. Let's see how to introduce additional languages into your search results.

Languages in XM Cloud

Adding languages in XM Cloud is not much different from any previous Sitecore solution. You are most likely already an expert in this, so I will not go into detail here. If it has been a while since you added a language, have a look at the official documentation.

Don't forget to also add the configuration to the Next.js application as described here. For our example, we will introduce two languages:

  • en-us
  • de-ch

To configure locales in Sitecore Search, your account needs the Tech Admin role. If your Sitecore Search instance is connected to the Cloud Portal, you get this role either as a Cloud Portal admin or as a Cloud Portal user with the Tech Admin role.

With the required permissions, navigate to the Administration section and open the Domain Settings. On the General Settings tab, you will find an entry called Locale. Click Edit to configure the locales.

Add the two locales and enable the locale settings with the toggle at the top. Don't skip this last step: the toggle is disabled by default, and forgetting it will make you scratch your head (totally not speaking about myself here 👀). Save and publish your changes.

Update the sources

The sources in Sitecore Search define where data comes from and how it is added to the search index. For every source you have set up, you now need to configure the available locales in the source settings. Add the locales, save the configuration and publish it.

Add a locale extractor

As soon as you define multiple locales for a source, a new entry called Locale Extractors will become available on the left side. This will define how the crawler extracts the locale information from a content item. A locale extractor can work based on the URL, a header or JavaScript.
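To illustrate the idea behind URL-based extraction, here is a hypothetical helper (`localeFromUrl` is not a Sitecore Search API) that derives a Sitecore Search locale from the first path segment of a page URL. It assumes localized pages are served under a prefix such as `/de-ch/`, and it is plain JavaScript showing the logic, not configuration you can paste into the locale extractor settings.

```javascript
// Hypothetical helper, not part of Sitecore Search: derives a Sitecore Search
// locale from the first path segment of a page URL.
// Assumption: localized pages are served under a prefix such as /de-ch/...
function localeFromUrl(pageUrl) {
  // WHATWG URL is available as a global in Node.js and browsers
  const segments = new URL(pageUrl).pathname.split('/').filter(Boolean);
  const firstSegment = segments[0] ?? '';
  // Only language-region codes (e.g. de-ch) can be converted directly,
  // because Sitecore Search locales always include a country code
  const match = /^([a-z]{2})-([a-z]{2})$/i.exec(firstSegment);
  if (!match) {
    return undefined;
  }
  // Sitecore Search uses an underscore between language and country
  return `${match[1]}_${match[2]}`.toLowerCase();
}

console.log(localeFromUrl('https://www.example.com/de-ch/about')); // de_ch
console.log(localeFromUrl('https://www.example.com/en/about'));    // undefined
```

Pages without a prefix (for example the Next.js default locale) or with a region-less code such as `en` yield `undefined`, since there is no country code in the URL to map them to.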

For our example, we extract the language from a meta tag in the HTML, using a locale extractor of type JS:

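function extract(request, response) {
    // response.body exposes the crawled HTML through a jQuery-like selector function
    var $ = response.body;
    // Read the locale from <meta name="search_locale" content="...">, e.g. "en_us"
    var searchLang = $('meta[name="search_locale"]')?.attr('content')?.toLowerCase();
    // Return the locale string (undefined if the page has no such meta tag)
    return searchLang;
}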

Update the document extractors

The document extractor also needs to be told to index content per locale. To do this, edit the Document Extractor and enable the Localized switch in the tagger.

Save and publish this change, then rescan the source to update the index. To verify that everything worked, filter the content in the Content Collection by locale and check the results.

XM Cloud lets you add languages with or without a country code. This means we can have language definitions like en-us or de-ch, but also more general ones like en or de.

☝️ Languages in Sitecore Search always need a country code.

If your website supports both variants (e.g. de-ch as well as de), you can choose a country in Sitecore Search to represent the general language.

In this example, we use Trinidad and Tobago (country code tt), which leaves us with the following mapping:

  • en (XM Cloud) ➡️ en_tt (Sitecore Search)
  • en-us (XM Cloud) ➡️ en_us (Sitecore Search)
  • de (XM Cloud) ➡️ de_tt (Sitecore Search)
  • de-ch (XM Cloud) ➡️ de_ch (Sitecore Search)

Because the XM Cloud language codes differ from the Sitecore Search locales, we chose to write the Sitecore Search locale to a meta tag and extract it from there instead of from the URL.
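The mapping from XM Cloud language codes to Sitecore Search locales, and the meta tag the JS locale extractor reads, can be sketched as follows. `toSearchLocale` and `searchLocaleMetaTag` are hypothetical helper names, and the hard-coded table only covers the four example languages. In a Next.js page you would render the same tag through `next/head` or your layout rather than as a raw string.

```javascript
// Hypothetical mapping from XM Cloud language codes to Sitecore Search locales.
// Region-less languages use tt (Trinidad and Tobago) as a stand-in country code.
const SEARCH_LOCALES = new Map([
  ['en', 'en_tt'],
  ['en-us', 'en_us'],
  ['de', 'de_tt'],
  ['de-ch', 'de_ch'],
]);

// Returns the Sitecore Search locale for an XM Cloud language code (case-insensitive).
function toSearchLocale(xmLanguage) {
  const locale = SEARCH_LOCALES.get(xmLanguage.toLowerCase());
  if (locale === undefined) {
    throw new Error(`No Sitecore Search locale mapped for "${xmLanguage}"`);
  }
  return locale;
}

// Builds the <meta name="search_locale"> tag that the JS locale extractor reads.
function searchLocaleMetaTag(xmLanguage) {
  return `<meta name="search_locale" content="${toSearchLocale(xmLanguage)}">`;
}

console.log(searchLocaleMetaTag('de-CH')); // <meta name="search_locale" content="de_ch">
```

Throwing on an unmapped language makes a missing entry visible at build or render time instead of silently producing pages the crawler cannot assign to a locale.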

Summary

Adding languages to your composable website is not very complex. You only need to update the configuration in a few places in Sitecore Search. Since every source can have its own method of extracting the locale information, integrating systems with different approaches is not an issue either.