Friday, May 21, 2021

Migrate Data from a Cosmos DB Azure Table API

If you need to migrate data from or into Azure Cosmos DB, you can use Microsoft's data migration tool to do this. The tool is versatile, but the documentation doesn't provide all the answers. Specifically, in a scenario where you need to migrate data from a Cosmos DB instance configured with the Azure Table API to a JSON file, you can use the tool, but the settings you need to provide are not obvious. Here are the settings which worked for me for its two main tabs, Source Information and Target Information (you will see them when you run the GUI tool, dtui.exe):

Source Information

  1. Prepare data for assembling a connection string:

    1. Grab the value of Azure Table Endpoint from the Overview page, for example: https://name-of-your-cosmos-db-account.table.cosmos.azure.com:443/

    2. Modify this URL, replacing table.cosmos.azure.com with documents.azure.com

    3. Grab the value of PRIMARY KEY from the Connection String page

    4. Grab the name of the root node on the Data Explorer page; this will be your database name.

2. Assemble the connection string as follows:

AccountEndpoint=https://name-of-your-cosmos-db-account.documents.azure.com:443/;AccountKey=primary-key-goes-here;Database=Your-DB-Name

3. Expand the root node (Your-DB-Name) and take note of the table you want to export, for example My-Table-Name

4. Fill in the form on the Source Information tab:

Import from: Azure Cosmos DB

Connection String: use the connection string created in step 2. Click the Verify button; it should work.

Collection: My-Table-Name

Other fields: leave them at their defaults, or optionally specify a query to limit the export

5. Click the Next button to configure Target Information

Target Information

  1. Export To: JSON file

  2. Choose the Local File radio button, specify the path, and optionally select Prettify JSON.

  3. Click Next to complete the wizard and run through the export.

The JSON file is saved to the directory you specified or, if you didn't specify one, to the folder you started the data migration tool from.
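For reference, the endpoint rewrite and connection-string assembly from the Source Information steps can be expressed as a small TypeScript sketch. The helper below is mine, not part of the migration tool, and the argument values are placeholders:

// Builds the Source Information connection string from the Table endpoint,
// the PRIMARY KEY, and the database (root node) name.
function buildSourceConnectionString(
    tableEndpoint: string,  // e.g. https://name-of-your-cosmos-db-account.table.cosmos.azure.com:443/
    primaryKey: string,     // PRIMARY KEY from the Connection String page
    databaseName: string    // root node name from the Data Explorer page
): string {
    const documentsEndpoint = tableEndpoint.replace('table.cosmos.azure.com', 'documents.azure.com');
    return `AccountEndpoint=${documentsEndpoint};AccountKey=${primaryKey};Database=${databaseName}`;
}

// Example:
// buildSourceConnectionString(
//     'https://name-of-your-cosmos-db-account.table.cosmos.azure.com:443/',
//     'primary-key-goes-here',
//     'Your-DB-Name');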

UPDATE: importing from JSON into the Azure Table API in Cosmos DB also works. The same connection-string manipulations described above for the export scenario are needed. In addition, depending on your situation you may want to set extra parameters, such as whether or not to regenerate IDs. The tutorial on using the data migration tool covers these well.

Monday, January 4, 2021

Command-Line Utility to Validate LUDOWN Files

Microsoft released Bot Framework Composer in May 2020, and the tool has been under active development since. It lets you rapidly create rich conversational bots that leverage adaptive dialogs, language generation, skills, and more.

While working with the Composer I found that one of the practical challenges was "teaching" LUIS to recognize intents and entities: the labelling was quite verbose, the number of training examples ran into dozens per intent (at least), and on top of that the documentation of the new LUDOWN format could be improved, even though the format is far more convenient than JSON.

Since the Composer is released as open source, I did some digging and found the library it uses for parsing and validating .lu files: @microsoft/bf-lu.

I thought it would be easier to validate the .lu files in CLI mode, so I ended up writing my own CLI to help with quick, verbose validation. As a "bonus" it can also create a temporary LUIS app from an .lu file, since the @microsoft/bf-lu validation may miss certain errors that LUIS only reports when you attempt to create an app.
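For illustration, a rough TypeScript sketch of this kind of validation with @microsoft/bf-lu might look like the following. This assumes the package's V2.LuisBuilder API; the error fields and file handling are my assumptions, so check the package README for the exact surface:

import { promises as fs } from 'fs';

// @microsoft/bf-lu does not ship TypeScript typings, hence the require().
const { LuisBuilder } = require('@microsoft/bf-lu').V2;

async function validateLuFile(path: string): Promise<void> {
    const luContent = await fs.readFile(path, 'utf8');
    try {
        // Parsing performs the syntactic validation; problems surface as exceptions.
        const luisObject = await LuisBuilder.fromLUAsync(luContent);
        console.log(`OK: ${path} parsed with ${luisObject.intents.length} intent(s)`);
    } catch (err) {
        // The parser's exception is expected to carry the diagnostic text.
        console.error(`Validation failed for ${path}:`, err.text || err.message);
    }
}

validateLuFile(process.argv[2] || './example.lu');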

Here is the utility: https://www.npmjs.com/package/@softforte/lu  

While the new Bot Framework CLI offers similar and comprehensive features, I still find this utility handy for day-to-day LUIS development and hope that it will save you some troubleshooting time. Check it out!


Sunday, November 24, 2019

Single Sign-On for Two Angular Apps with Local Accounts in Azure B2C Tenant

In this day and age Single Sign-On (SSO) is thought of as a commodity, a "flag" an admin turns on somewhere, which makes logging into multiple related applications automatic for the end user. Indeed, mainstream identity providers have supported SSO within and across many protocols for several years now.

That's the mindset I had when approaching SSO configuration in an Azure B2C tenant. It ended up being a much more cumbersome task than I expected, hence this post. While in a way it is a rehash of information already available on the subject, I hope that the description of my "SSO journey" below will help reduce the research and experimentation time otherwise needed to get SSO working in Azure B2C.

Applications and SSO objective

I have two Angular 8 SPA applications hosted independently on two different domains, app1.mydomain.com and app2.mydomain.com. I needed SSO between them, so that when a user signs into one and then browses to the other, either in the same browser tab or in a new one, the user is not prompted for credentials again.
Both applications are registered in the same Azure B2C tenant and use the same policy. Importantly, they use only local accounts for authentication; this was my constraint. I use the MSAL library for authentication/authorization, and the applications redirect users to the B2C policy's sign-in page.

What I wish had worked but didn't...

I started with the built-in Sign up and sign in user flow, and also tried the Sign up and sign in v2 flow with the same results. If you go to the properties of your flow in the B2C web UI, there is a Single sign-on configuration setting under Session behavior. I set it to Policy, since my two applications share the same policy, and saved the user flow.


It was when there was still no single sign-on that I realized I was in for a longer ride.

What worked, but was the wrong path

The MSAL documentation describes the library's support for SSO. There are two ways to indicate SSO intent to the MSAL library: a login hint or a session identifier (SID). Obviously MSAL supports this because the underlying identity provider (IdP) does; otherwise it would be pointless.
So the idea is to log in to the first application with the user's credentials, pass the SID or login hint to the second application, and have B2C authenticate the user to the second application without displaying any prompts.

Cannot obtain SID from Azure B2C

I tried hard, but could not find a way to get a SID value from the Azure B2C IdP. I would expect it to be a claim emitted by the IdP in response to a successful sign-on, which appears to be the case for the Azure AD IdP, but I had no such luck with Azure B2C.

Extra call to obtain login hint value

The other option, the login hint, I could work with. Just get the login claim from the identity or access JWT token returned by B2C and use it as a hint, right? Well, to my surprise the login claim was not present in the JWT tokens returned by a B2C IdP configured with a built-in Sign up and sign in policy.
That's OK, we can make an MS Graph profile API call and get the login that way, paying a few hundred milliseconds of page load time for it. Hmmm.....

MSAL Hurdles

It is logical to start with MSAL-Angular if you are in an Angular application... Unfortunately the library lags behind MSAL core, and when it comes to SSO, and specifically passing on the login hint, it just does not work.
While MSAL-Angular appends the login hint as a login_hint extra query parameter to the IdP call, the MSAL core library expects the hint as a property of the AuthenticationParameters object. As a result, the ServerRequestParameters.isSSOParam() call returns false, the core library does not recognize the login hint, and no SSO attempt is made.
I had to stop relying on MSAL-Angular and interact directly with the MSAL core library. This got things working but, as we will see later, MSAL-Angular "will be back" on the scene.
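For illustration, here is a minimal sketch of the login-hint approach using the MSAL core library directly (msal 1.x). The clientId, authority, and hint values are placeholders, and the exact option names may vary with the MSAL version:

import { UserAgentApplication, AuthenticationParameters } from 'msal';

const msalInstance = new UserAgentApplication({
    auth: {
        clientId: '22222222-2222-2222-2222-222222222222',
        authority: 'https://yourtenant.b2clogin.com/yourtenant.onmicrosoft.com/B2C_1_signupsignin',
        validateAuthority: false,
        redirectUri: 'https://app2.mydomain.com'
    }
});

// The login hint obtained after sign-in to the first application.
const request: AuthenticationParameters = {
    scopes: ['openid', 'profile'],
    loginHint: 'user@contoso.com'
};

// Try to get a token silently using the hint; fall back to an interactive redirect.
msalInstance.acquireTokenSilent(request)
    .then(response => console.log('Silent SSO succeeded for', response.account?.userName))
    .catch(() => msalInstance.loginRedirect(request));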

Sharing the Login Hint between Apps

OK, if I hardcode the login name as the login hint for the second application, it works: I get single sign-on as advertised (or almost!). Now the challenge is to take the username obtained via the Graph API call upon successful logon to the first application, and share it with the second application before the user authenticates to it.
Since the apps are on separate domains they do not see each other's state, even in localStorage. Probably the simplest way around this is to use the messaging API (postMessage) to communicate between the current window of the first app and a hidden iframe pointing to the second app, making the latter store the username in its localStorage in response to the received message, to use later as a login hint.
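Here is a rough sketch of this technique; the origins, message shape, and storage key are illustrative assumptions:

// In the first app (app1.mydomain.com), after the username is known:
function shareLoginHint(username: string): void {
    const iframe = document.createElement('iframe');
    iframe.style.display = 'none';
    iframe.src = 'https://app2.mydomain.com/sso-receiver.html';
    iframe.onload = () => {
        // Send the hint only to the trusted target origin.
        iframe.contentWindow?.postMessage({ type: 'loginHint', value: username }, 'https://app2.mydomain.com');
    };
    document.body.appendChild(iframe);
}

// In the second app (the page served at sso-receiver.html on app2.mydomain.com):
window.addEventListener('message', (event: MessageEvent) => {
    // Accept messages only from the trusted sibling application.
    if (event.origin !== 'https://app1.mydomain.com') { return; }
    if (event.data && event.data.type === 'loginHint') {
        localStorage.setItem('loginHint', event.data.value);
    }
});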
At this point the whole process felt too fragile and complex for what it does: too many obstacles, as if Microsoft were implicitly warning me against this path, "hinting" that there was a better way.

PII and Sign Out Concerns

And even if I persevered, got over the MSAL-Angular incompatibility and the login-hint sharing complexity, and accepted the extra time taken by the profile Graph call, I would still face the following issue: the login hint shared between the applications is classified as Personally Identifiable Information (PII). This immediately becomes a concern from a compliance perspective.
Last but not least, there is a sign-out complexity here: since in the above approach I store the login hint in localStorage, I need to make sure to clear it when a user signs out or closes her browser tabs.
Under the pressure of the above considerations, which would have turned a seemingly simple identity solution into a needlessly complex subsystem with potential vulnerabilities, I had to look for an alternative.

Custom Identity Experience Framework Policies to the Rescue

Once I understood that I had exhausted the options available in the built-in policies (or user flows, as they are also called), I had to turn to custom Identity Experience Framework (IEF) policies.
First things first: to take advantage of custom policies, one needs to follow this Azure B2C preparation guidance word for word to get the environment ready for creating custom policies.
Next, make sure to configure Azure Application Insights for monitoring B2C custom policies, as otherwise it will be quite hard to troubleshoot them.

Get signInName Claim in Access Token

I was looking for a way to avoid having to make the MS Graph call. I came across this great Stack Overflow thread, which shows how to emit the signInName claim as part of the access and ID tokens for local Azure B2C accounts.
The detailed instructions in the thread allow adding a signInName claim to the tokens, which is quite helpful. And if, like me, you happen to hit the following error in the process of getting it to work:

 Orchestration step '1' of in policy 'B2C_1A_signup_signin of tenant 'xxxxxxxxxx.onmicrosoft.com' specifies more than one enabled validation claims exchange

then the following thread contains the remedy.

Single Sign-On "Just Works"

Yes, it just works, as a much-welcomed side effect. It was not obvious to me, as the thread was solving a different issue, namely the lack of a username in the claims. I did have to modify one line in the SelfAsserted-LocalAccountSignin-Username technical profile in TrustFrameworkExtensions.xml (see the highlighted line below):


This is all that I had to do. Now:
  • There is no need to share login hints and deal with associated compliance risks
  • There is no need to make MS Graph API calls and deal with latency
  • MSAL-Angular library "is back in the picture" and can be used again.
Life is good!

Tuesday, November 12, 2019

MSAL acquireTokenSilent() and Azure B2C Permission Scopes

One thing that was not obvious to me when securing an Angular app with an Azure B2C tenant had to do with using permission scopes.

Let's say that you have authenticated through loginRedirect(), but need to call the acquireTokenSilent() MSAL API from within your SPA. Perhaps you are writing your own route guard or something similar... You need to pass an array of scopes to the method call. There are two ways to get this to work:

1. When you register your app in Azure B2C, it creates a scope for it named user_impersonation. You can take its value (https://yourdomain.onmicrosoft.com/your-app-name/user_impersonation) and pass it to the acquireTokenSilent() method as a single-item array. Or you can create your own scope instead...

You may get an error back from B2C when you call acquireTokenSilent() with this scope: AADB2C90205: This application does not have sufficient permissions against this web resource to perform the operation. To fix it you need to grant admin consent to the scope through the B2C tenant.

2. There is another way. Check out how the MsalGuard class is implemented. It calls acquireTokenSilent() with a single-item array consisting of the app's clientId, which we got through the app registration. That works without any additional consent.
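Here is a rough sketch of the two options from an Angular component; the scope URI and clientId are placeholders, and the acquireTokenSilent() signature shown (a plain string array) matches older msal-angular versions, while newer versions take a request object instead:

import { Component } from '@angular/core';
import { MsalService } from '@azure/msal-angular';

@Component({ selector: 'app-token-demo', template: '' })
export class TokenDemoComponent {
    private readonly appScope = 'https://yourdomain.onmicrosoft.com/your-app-name/user_impersonation';
    private readonly clientId = '22222222-2222-2222-2222-222222222222';

    constructor(private msalService: MsalService) { }

    // Option 1: the app's own permission scope (requires admin consent in the B2C tenant).
    getTokenWithAppScope(): void {
        this.msalService.acquireTokenSilent([this.appScope])
            .then(token => console.log('access token for the app scope', token));
    }

    // Option 2: the clientId itself as the only scope, the way MsalGuard does it.
    getTokenWithClientId(): void {
        this.msalService.acquireTokenSilent([this.clientId])
            .then(token => console.log('token resolved via openid/profile', token));
    }
}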

So both ways work, but there are important differences between them:

In the former case, we make a call to the https://yourdomain.b2clogin.com/yourdomain.onmicrosoft.com/yourpolicy/oauth2/v2.0/authorize endpoint and pass three space-separated values in the scope query string parameter: https://yourdomain.onmicrosoft.com/your-app-name/user_impersonation openid profile.

In the latter case, the call to the endpoint is not made at all (in my case). MSAL "knows" it is authorized because it already has the access token from the preceding call to loginRedirect(). Let's take a look at what Fiddler shows when we call loginRedirect(); specifically, I am interested in which scopes it passes along:

  • In the former case it is https://yourdomain.onmicrosoft.com/your-app-name/user_impersonation openid profile
  • In the latter case, it is only openid profile

Here is a good description of the meaning of these scopes.

With that, here is my takeaway: MSAL converts the clientId scope we pass to its loginRedirect(), acquireTokenSilent(), etc. calls into the openid and profile scopes known to the Microsoft identity platform. It is also smart enough to resolve requests for an access token locally for as long as the token is valid.

We can also present our SPA app as an API to the identity platform, create a permission for it, consent to it, and then acquire a token for accessing it. But in a basic authentication scenario such as "is the user logged in or not?", there is no benefit in doing so. It may be useful if we have complex permissions in our application and want to dynamically request different permission scopes defined for various parts of the application.

Wednesday, June 26, 2019

Extract and Inspect All SharePoint Solutions with PowerShell

Migrations or upgrades of SharePoint content databases commonly involve provisioning WSP solutions. At times you may need to search for a particular feature GUID buried somewhere inside one of the dozens of solution files you have extracted from the farm in question.

If you are on Windows Server 2012 or higher, you can leverage the expand.exe command to extract CAB files (WSP files are CAB files). Here is a one-liner PowerShell command to extract the contents of your WSP solutions into respective folders:

dir *.wsp | % { New-Item -Type directory -Path ".\$($_.Name.Remove($_.Name.Length - 4))"; expand.exe $_.Name -F:* $_.Name.Remove($_.Name.Length - 4)}

How to use: place your solutions in a folder, cd into it, then run the above command; it creates a folder per solution and dumps the solution's extracted contents there.

Now you can quickly tell whether the feature ID you are after is among the ones extracted. For example, the following one-liner lists all feature IDs and titles, as well as paths to the Feature.xml files, in table format:

dir Feature.xml -Recurse | % { $path=[system.io.path]::combine($_.Directory, $_.Name); [xml]$doc = Get-Content -Path $path; $obj = New-Object PSObject -Property @{Path=$path; Id=$doc.Feature.Id; Title=$doc.Feature.Title;}; $obj} | select Id, Title, Path

Oh, and I almost forgot that this may also be handy: you can use the following line to dump all farm solution files into your current directory, as long as you run it inside an elevated SharePoint PowerShell session:

(Get-SPFarm).Solutions | ForEach-Object{$var = (Get-Location).Path + "\" + $_.Name; $_.SolutionFile.SaveAs($var)}

Happy migrating!

Sunday, June 16, 2019

Azure AD Authentication and Graph API Access in Angular and ASP.NET Core

Wow, it's been quiet here... Enough with the intro ;) and onto the subject, which I find interesting and worthy of writing about...

Consider this scenario, which I think makes a lot of practical sense: a web single-page application (SPA) authenticates users against Azure AD using OpenID Connect implicit grant flow. Then some of the SPA's client-side components make queries to Graph API, while others hit its own server-side Web API.

What follows are highlights from my experience implementing this scenario. These are the packages I was using:
Client-side components obtain access tokens from Azure AD and pass them along with calls to the MS Graph API or to the ASP.NET Web API. The former case is standard and well explained, while the latter is less so, and therefore more interesting. ASP.NET is configured to use bearer token authentication and creates a user identity, which the rest of the server-side logic can then use for its reasoning.

When validating tokens coming down from the client components of the application, I used code similar to what is shown below, inside the ConfigureServices method:

// Example of using Azure AD OpenID Connect bearer authentication.
services.AddAuthentication(sharedOptions =>
{
    sharedOptions.DefaultScheme = JwtBearerDefaults.AuthenticationScheme;
}).AddJwtBearer(options => 
{
    options.Authority = "https://login.microsoftonline.com/11111111-1111-1111-1111-111111111111";
    options.TokenValidationParameters = new TokenValidationParameters
    {
        ValidIssuer = "https://login.microsoftonline.com/11111111-1111-1111-1111-111111111111/v2.0",
        ValidAudiences = new[] { "22222222-2222-2222-2222-222222222222" }
    };
});

where 11111111-1111-1111-1111-111111111111 is the tenant ID, and 22222222-2222-2222-2222-222222222222 is the client ID of the application registration.

One of the motivations for this post was an issue I kept hitting with this authentication logic. I originally also had an extra property set on the TokenValidationParameters object:

IssuerSigningKey = new X509SecurityKey(cert)

The above line assigns a public key, encoded as an X.509 certificate chain, to be used later to verify the signature applied to the token by Azure AD. Check out the always-excellent insights from Andrew Connell, where he explains the need for key-based signature checks when validating tokens and a mechanism for obtaining the public key (the "cert" in the line of code above).

My logic, however, was failing with the error IDX10511 "Signature validation failed. Keys tried...", and my research into the nuances of the RSA implementation in ASP.NET and JSON Web Token encoding was fruitless until I found this thread on GitHub, and this related thread.

It turned out that my signature validation setup was fine, although the line above was not needed: the library I rely on for token validation, Microsoft.IdentityModel.Tokens, takes care of it automatically by calling the Azure JSON Web Key Set endpoint and deserializing the response into .NET public keys used for signature checking.

The actual problem had to do with my usage of access tokens: an access token obtained for the Microsoft Graph API resource fails signature validation when used against a different resource (my custom ASP.NET Web API in this case). This fact, and the point that the ASP.NET error message here could be improved, are covered in detail in the GitHub threads above.

What I had originally, which I refer to as the "naive" configuration, is shown in the figure below.


In this image, I have an Azure app registration for my web application, requesting some Graph permission scopes. During execution I acquire a token on the client (1) and use it when sending requests to the Graph API (2), but fail to do the same against my ASP.NET Web API (3), which results in the IDX10511 error.

What is interesting here, is that:
  1. This setup kind of makes sense: I have an app, it is registered, and it wants to use the access token it gets from Azure to let its own API "know" that a user has logged in.
  2. The problem can be fixed by sending an ID token instead of the access token in step (3). The OpenID Connect protocol issues an ID token upon login, which signifies the authentication event, while the access token signifies authorization. The ID token's signature validates without errors, and ASP.NET creates a claims identity for the signed-in user.
What is not good about this design is that the ID token is not meant to be used this way. While one can choose to deviate from the protocol's intent, it is not wise to do so without a compelling reason, since tooling and third-party libraries won't follow suit.

Specifically, here are the problems I could identify with the above design:
  1. OpenID Connect, and OAuth 2.0 by extension, use different grant flows depending on the type of client. For a web browser it is the implicit grant; for a server-side client it is one of the other flows, depending on the scenario. We are in essence trying to use a token issued for one audience when calling another audience. In my example the Angular SPA and the Web API are on the same domain; if they were hosted on different domains, this issue would be more obvious.
  2. Microsoft offers an OAuth 2.0 extension, the on-behalf-of flow (aka OBO flow), which becomes useful when we decide to enhance our ASP.NET Web API by having it also access the Graph API or another Microsoft cloud API. The current setup will not work with the OBO flow.

The figure below shows an improved design:


This time we treat the server-side Web API as a separate application as far as Azure AD is concerned. We do have to make our SPA acquire an access token twice, as shown in calls (1) and (3), once for each audience: first for Graph, and then for our own API. Both the calls to Graph (2) and to our own API (4) then succeed.

Also, this design fits well with the OAuth paradigm. In fact, by the time we decide to augment our Web API and start making on-behalf-of calls from within it, we will have already implemented the flow's "first leg".

Lastly, a couple of notes about the MSAL Angular configuration. Here is mine:


    MsalModule.forRoot({
      clientID: environment.azureRegistration.clientId,
      authority: environment.azureRegistration.authority,
      validateAuthority: true,
      redirectUri: environment.azureRegistration.redirectUrl,
      cacheLocation: 'localStorage',
      postLogoutRedirectUri: environment.azureRegistration.postLogoutRedirectUrl,
      navigateToLoginRequestUrl: true,
      popUp: false,
//      consentScopes: GRAPH_SCOPES,
      unprotectedResources: ['https://www.microsoft.com/en-us/'],
      protectedResourceMap: PROTECTED_RESOURCE_MAP,
      logger: loggerCallback,
      correlationId: '1234',
      level: LogLevel.Verbose,
      piiLoggingEnabled: true
    }),

MSAL automatically acquires an access token right after the ID token is acquired when MsalService.loginPopup() is called with no scopes passed in. Commenting out or removing the consentScopes config option results in MSAL defaulting to the app's client ID as the audience and returning a somewhat useless access token with no scopes in it.

I did this because I wanted to explicitly request separate access tokens for Graph and for my Web API. The way to do it is to pass the scopes corresponding to an application in a call to MsalService.acquireTokenSilent(scopes). I am now thinking of changing it to pass the Graph API scopes initially, so that my first access token is useful. For the second one I have no choice but to call MsalService.acquireTokenSilent(myWebApi_AppScopes) again.
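As a rough sketch of what this looks like (the scope values are placeholders, and the string-array signature of acquireTokenSilent() matches the older msal-angular versions; newer versions take a request object):

import { MsalService } from '@azure/msal-angular';

// Placeholder scope sets, one per audience.
const GRAPH_SCOPES = ['https://graph.microsoft.com/User.Read'];
const MY_WEBAPI_SCOPES = ['api://22222222-2222-2222-2222-222222222222/access_as_user'];

async function getAccessTokens(msalService: MsalService): Promise<void> {
    // Token for calls to Microsoft Graph.
    const graphToken = await msalService.acquireTokenSilent(GRAPH_SCOPES);
    // A separate token for calls to our own ASP.NET Core Web API.
    const apiToken = await msalService.acquireTokenSilent(MY_WEBAPI_SCOPES);
    console.log({ graphToken, apiToken });
}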

Tuesday, June 25, 2013

Thoughts about Building Multilingual Publishing Site on SharePoint 2013 - Part 3 of 3

This is the third part of a series describing my experience planning and building a multilingual WCM site on SharePoint 2013.

Part 1 discusses basic requirements and architecture of the authoring site.
Part 2  focuses on managed navigation for the authoring site.
This part discusses Cross-Site Publishing (XSP) and publishing sites.

A Quick Introduction to Cross-Site Publishing

Microsoft has released a blog series describing in detail how to set up a fictitious Contoso site leveraging XSP. Contoso is a product-centric web site that uses the concepts of category pages and catalog item pages to illustrate the applicability of XSP. A product sold at the Contoso electronics store can be categorized by a hierarchy of categories. For example, a specific product, the "Datum SLR Camera X142", is categorized as Cameras >> Digital Cameras. There are only two kinds of pages that we need here: a product catalog item page showing product information, and a category page showing products matching the category. So, if you pick the "Cameras" category from the navigation menu, you will see products qualifying as cameras; if you pick "Digital Cameras", you will see a narrower set of products qualifying as digital cameras. Regardless of which category in the hierarchy you pick, the principle is the same: we need to figure out the current category and render the matching products for it. Category pages do exactly that. Next, you click on a specific product listed by the current view of the category page, and the product details are rendered by the catalog item page, which accepts a unique product identifier. And so you can surface the entire product database on the SharePoint site using just two pages: a category page and a catalog item page.

There are two "magic ingredients" here. First, there is the ability to publish and consume lists as catalogs. Doing so creates a search result source in the consuming site and optionally pins the terms from the term set used to categorize the catalog as navigation terms into the navigation term set of the consuming site. Also behind the scenes, the Microsoft.SharePoint.Publishing.Navigation.TaxonomyNavigation class, invoked from the HttpPublishingModule, "becomes aware" of how to detect requested URLs constructed from categories and unique identifiers (aka "friendly URLs") on this site, and routes them to the appropriate category or catalog item pages. Second, the pages "can be made aware" of their current context and use this knowledge when issuing search queries for catalog items. This ability comes from a set of well-known search query variables available to developers or information workers placing search web parts on pages.

Cross-Site Publishing as a Web Content Management Approach

Things started to look a bit confusing when I tried to apply the XSP concept to the site publication scenario described in Part 1. Here are the hurdles I faced:

1. A web site page is hard to fit into the product-centric model described above, because a page can often be both a category and a catalog item at the same time. Sites commonly use section landing pages, the ones the top navigation menus point at, which contain content and at the same time act as logical parents to child pages down the site map hierarchy. Let's say we make a landing page an XSP catalog category page. Then, according to the product-centric model, we access its child pages as we would access catalog items: by clicking on a list of those pages rendered by our landing page. Well, this is usually not how information architects expect users to navigate their web sites, unless they are online product stores. What we often need instead is navigation menus: the familiar top navigation and current navigation controls that let us access all pages on the site.

2. Now think for a second about the managed navigation discussed in Part 2 and the requirement we had about maintaining fidelity between the authoring and publishing sites. Every page on the authoring site has a corresponding term used for navigation and for forming a friendly URL. Because we use managed navigation on the authoring site, we get our top and current navigation menus populated by these terms. We want the same behavior on the publishing site. This brings up the question: "Which pages should be the category pages, and which ones should be the catalog item pages?" If we were to follow the classical product-centric approach and designate the pages corresponding to the top-level nodes as category pages and the leaf pages as item pages, we would lose the drop-down menus on the publishing site:



When I say "designate" I mean that we "tag" the pages with the terms or, in other words, assign term values to the corresponding pages by setting their managed metadata fields.

3. At the other extreme, if we tag each and every page on the authoring site with a term, then when we consume the Pages library catalog, all our pages logically become category pages. How do we then render the catalog item information?

We resolved the confusion by tagging all the pages on the authoring site, effectively making all of them category pages on the publishing site, and modifying their page layouts so that they simultaneously act as catalog item pages. This looked like a promising strategy: by making all of our pages into category pages, we would get exactly the same top and current navigation elements as on the authoring site, for free. The only thing left to do was to make the category pages render catalog item information, which should be doable on a search-driven site.

If you examine an automatically created catalog item page, you will see that it is based on a page layout, which in turn leverages Catalog Item Reuse (CIR) web parts. Each CIR web part renders the contents of a specific managed search property specified in its SelectedPropertiesJSON property. Provided that the managed properties correspond to the columns on the authoring site, this results in the same content being rendered on the publishing page as on the authoring page. Below is an example of a CIR web part rendering the value of a managed property corresponding to the Page Content column.


Note that the UseSharedDataProvider property should be set to True on all CIR web parts on the page except one, which serves as the data provider for the rest. All CIR web parts on a page form what's called a query group, with one CIR web part acting as the data provider and the rest as data consumers. The data provider CIR web part, in addition to the SelectedPropertiesJSON property, has the DataProviderJSON property set as shown in the following example.



The value of the DataProviderJSON property sets properties on an object of the DataProviderScriptWebPart type. The key properties here are:
  • QueryTemplate – defines keyword search filtering criterion using a query variable such as {URLTOKEN.1} in the above example.
  • SourceID – a unique ID of the search result source corresponding to the catalog being consumed. The search result source is created automatically when the catalog is connected to.
  • Scope – a URL of the catalog source list.
An easy way to get started with the CIR web parts is to let SharePoint auto-generate catalog Category and Item pages and page layouts when connecting to a catalog, then harvest the web part markup from the auto-generated page layouts.

So conceptually the problem is now solved: we can create our own page layout and a category page based on it, and configure CIR web parts to select the managed properties of interest from the catalog. If the markup of the page layout and master page is the same as on the authoring site, the CIR web parts essentially replace the content fields, the publishing page appears visually identical to its authoring counterpart, and the navigation works, even though only a single page exists on the publishing site. Pretty cool.

Now let's consider some practical aspects of getting the XSP-based publishing site up and running.

XSP Navigation Term Set and Vanity Names

To properly publish the Pages library as a catalog we need to designate a field that uniquely identifies each page, and a managed metadata field used for page categorization. We came up with two fields for this purpose and defined them at the site collection level to make sure the required managed properties get created automatically when we run a full crawl:
  • Vanity Name - a text field where we enter a unique friendly name for each page.
  • XSP Category - a managed metadata field using a new term set named XSP Navigation.
On the authoring site we create and manage a master term set, which is applied to the source variation; its terms are then re-used and translated on the target variations. We certainly want to avoid duplicating the effort required to manage the terms when we tag our pages, but we cannot reuse the existing master term set, because SharePoint complains that it is already in use when we consume the catalog. We need a new term set, so we simply pin the top-level terms with children from the master Site Navigation term set. Another important step is to create a Root term in the new XSP Navigation term set so that we can hook up to it when we consume the catalog. This is illustrated in the figure below:


The convention we used was that the value of the Vanity Name field must match the value of the XSP Category node for any given page on the site. This is important because it allows us to structure the query issued from the data provider CIR web part as follows:

VanityNameOWSTEXT:{URLTOKEN.1}

This query means "select items where the value of the Vanity Name managed property contains the last segment of the current URL". So for the URL http://www.softforte.com/softforte-corporate/about-us, {URLTOKEN.1} == "about-us", and by convention the Vanity Name == "about-us" as well. The page is a category page in XSP terms, and has a managed term, created when we consumed the catalog, which points at the URL /softforte-corporate/about-us. This is exactly how we hacked the category pages, giving them the ability to act as catalog item pages at the same time.


Navigation Translation on Publishing Sites

The terms used on the publishing sites are already translated, by virtue of setting their labels for each culture and customizing the term-driven page settings down the re-use chain originating from the authoring source variation's navigation term set. The challenge, however, lies in selecting the proper translation for the corresponding publishing site.

In order to localize the content and navigation of a variations-based site you do not need to install a language pack; a locale such as fr-CA is selected when a variation label is created. This is different for a publishing site, which does not rely on variations. The only way I found to get terms to translate to French was to install the French language pack and then create a site collection using the French target language template. In order to still manage the site in English, the Alternate Language can be set under Site Settings >> Language Settings. This makes SharePoint read the language preferences the browser sends to it. So if my preferred language is English, I see the managed navigation in English; if it is French, SharePoint automatically uses the French term translation if one is available. The limitation here is the language packs: we can only publish in the languages supported by language packs, at least to the best of my knowledge.

Do I Really Need a Single Page on My Publishing Site? 

Since most real-life sites use multiple page layouts, the same page layouts need to be mirrored over to the publishing sites, with CIR web parts configured there instead of content fields. If all the content on the authoring site is static, i.e. there are no web parts, just content fields, then we can simply create catalog category pages, one per page layout, and that is all we need to do.

Real web sites use web parts to render dynamic content. Since web parts are not stored in the search index, cross-site publishing will not render them. This means the web parts need to be re-created on the publishing site. Needless to say, since publishing and authoring sites have different security requirements, the web parts may need to be configured differently on each.

Wherever possible, we should therefore use Content Search (CS) web parts, which get their content from the SharePoint index just as the CIR web parts do. We were able to meet 100% of the requirements for the dynamic elements on the site using CS web parts, thanks to their great flexibility. When a CS web part is used on an authoring site page, it runs in a different context than when it is used on a publishing site. It therefore needs to be adapted to the publishing site, and simply copying it over won't work. Most of the time the adaptation is quite simple, however, as the same managed properties are available on both sites, and therefore the search queries are similar.

How big of an issue is the fact that the dynamic elements on content pages need to be recreated on a publishing site? We found it to be quite manageable in our case. The level of work duplication was minor for us, although it depends on how many dynamic web parts you are planning for. A general corollary is that you need to plan your pages thoroughly in advance to get the most out of XSP.

So while you could use just a single category page to render all pages based on a specific page layout, once you add dynamic web parts to the mix you need to create a page in the publishing site for each corresponding authoring page that carries a web part. If the same web part is used on multiple pages, it makes sense to embed it directly into the publishing page layout to reduce the number of publishing pages you need to create.

Here is an example of how an authoring page corresponds to a publishing page layout and a publishing page:


In the above illustration, the legend is as follows:

  • Purple color - content fields;
  • Gray color - items inherited from a master page or page layout;
  • Green color - web parts;
  • Blue color - navigation control.

The Inconvenient Result Source ID Value

In the code snippet above, the DataProviderJSON property included a SourceID parameter, which is the ID of a search result source that is automatically configured when a catalog is consumed on the site. This property controls the scope of the search queries issued by the web parts in the query group by adding a filter that selects items from the list with a specific ID and source web site URL. If you let SharePoint create catalog item and category pages automatically when connecting to a catalog, this Source ID ends up embedded in the page layout. Pretty inconvenient, especially as the Source ID changes each time you re-connect, and you cannot control it by exporting and importing the site search configuration.

After researching alternatives, we ended up using a provisioning script, which would first provision the publishing page layouts, then determine the result source ID, then check out the layouts, replace the ID in them, and check them back in... just like in the old days...

The Big Picture

Getting back to the business requirements described in Part 1, using the XSP approach we created three publishing sites, one for each variation. The publishing site corresponding to the source variation is not accessible to Internet users, but information workers can preview the English-label content on it exactly as it would appear to anonymous visitors on the live site. This architecture required us to turn the Pages library on each variation label into a catalog and consume the catalogs from the corresponding publishing sites, meaning we had to maintain three different setups of publishing page layouts, pages, and catalogs. This may seem like a lot, yet since the pages on the publishing sites are not created by information workers but are instead part of the application provisioning exercise, and since they are almost identical across the three publishing sites, it was quite a reasonable decision.

Conclusion

XSP is new, and it can be difficult to depart from the product-centric model used in Microsoft demos. Practically, it all comes down to retrieving content from a search index, something we've been doing with SharePoint for a while. XSP takes the concept to the next level of manageability and implementation convenience, and no longer requires us to write compiled code to take advantage of it. And of course SharePoint sticks to its traditions: planning is key to properly setting up the publishing sites and minimizing duplication of effort. As a result we get modern-looking, dynamic publishing web sites and an enterprise-class content management process.