Tuesday, December 7, 2021

Royalty-Free PDF Conversion and Manipulation Web Service

PDF generation, anyone? There are tons of libraries that do all sorts of things with PDF. But time and again I witness the pain that PDF handling causes organizations, even though one would think it should long since have become totally mundane...

Why is this so? You have to pay for a good PDF tool, sometimes a lot. And the many free alternatives either focus on narrow tasks, or do not produce the best quality, or are hard to use, or are not available for the organization's platform of choice. The biggest obstacle is of course the licensing. If you cannot get a completely free license, there is often a lot of friction when trying to get a paid-for variant... Here is one example of how a conversation could evolve:

- "We already have a license for PDF converter X!"

- "But it does not do what we need, or it is too hard to use!" etc., etc.

Needless to say, this sort of friction typically arises in exactly the kind of organizations that do a lot of document management and depend on PDF generation.

Having been through this recently myself, I was inspired to try to help my customer, Transport Canada, and other teams that may find themselves in a similar situation, by sharing a way to reliably generate high-quality PDF documents and do some basic manipulations on them, absolutely free of charge.

Indeed, several useful PDF utilities have long been available on Linux, and there is the prominent LibreOffice suite, which can generate great-quality PDF documents free of charge. So I thought: why not give it a try, put these applications inside a Docker container, and write a service that accepts HTTP requests and launches them?

Such a service would allow tapping into the richness of available open source tools, many of which have been out there for decades. And if you are a developer, it would let you easily adjust which specific tools you want to run and how you want to scale the service. Pretty flexible.

The service is now being adopted by the Marine department at Transport Canada, and I hope it will evolve and serve them well. And since Transport Canada has a great policy of sharing the source code for some of their applications, I am happy to share a link to its public GitHub repository, which also has a detailed description of how it works and how to use it. I will just list its "no frills" but much sought-after basic capabilities here:

  • Conversion of popular office and image formats to PDF (thanks to LibreOffice!)
  • Merging of office documents and images into a single PDF document (thanks to Ghostscript!)
  • Populating and "flattening" of fillable PDF forms (thanks to pdftk-java and PDF Toolkit!)
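To give a feel for what such a service launches under the hood, here is a minimal sketch of the three kinds of command lines involved. It is not the code from the repository: the function names and file names are made up for illustration, and it assumes that the `soffice`, `gs` and `pdftk` executables are on the PATH inside the container.

```python
import subprocess


def libreoffice_convert_cmd(src: str, out_dir: str) -> list[str]:
    # LibreOffice in headless mode converts src and writes the PDF into out_dir.
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, src]


def ghostscript_merge_cmd(pdfs: list[str], merged: str) -> list[str]:
    # Ghostscript's pdfwrite device concatenates the input PDFs into one file.
    return ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
            f"-sOutputFile={merged}", *pdfs]


def pdftk_fill_cmd(form: str, fdf: str, out: str) -> list[str]:
    # pdftk fills the form fields from an FDF file and flattens the result,
    # so the fields are no longer editable.
    return ["pdftk", form, "fill_form", fdf, "output", out, "flatten"]


def run(cmd: list[str]) -> None:
    # Hypothetical helper: raises CalledProcessError if the tool exits non-zero.
    subprocess.run(cmd, check=True)
```

For example, `run(libreoffice_convert_cmd("report.docx", "out"))` would be expected to produce `out/report.pdf`, provided LibreOffice is installed. An HTTP handler would wrap calls like these and return the resulting file.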

Have fun converting to PDF for free!


 

Friday, May 21, 2021

Migrate Data from a Cosmos DB Azure Table API

If you need to migrate data from or into Azure Cosmos DB, you can use Microsoft’s data migration tool. The tool is versatile, but the documentation does not provide all the answers. Specifically, when you need to migrate data from a Cosmos DB instance configured with the Azure Table API to a JSON file, you can use the tool, but the settings you need to provide are not obvious. Here are the settings that worked for me on its two main tabs, Source Information and Target Information (you will see them when you run the tool, dtui.exe):

Source Information

  1. Prepare the data for assembling a connection string:

    1. Copy the value of Azure Table Endpoint from the Overview page, for example: https://name-of-your-cosmos-db-account.table.cosmos.azure.com:443/

    2. Modify this URL, replacing table.cosmos.azure.com with documents.azure.com

    3. Copy the value of PRIMARY KEY from the Connection String page

    4. Note the name of the root node on the Data Explorer page; this will be your database name.

  2. Assemble the connection string as follows:

    AccountEndpoint=https://name-of-your-cosmos-db-account.documents.azure.com:443/;AccountKey=primary-key-goes-here;Database=Your-DB-Name

  3. Expand the root node (Your-DB-Name) and note the name of the table you want to export, for example My-Table-Name

  4. Fill in the form on the Source Information tab:

    Import from: Azure Cosmos DB

    Connection String: use the connection string created in step 2. Click the Verify button; it should succeed.

    Collection: My-Table-Name

    Other fields: leave the defaults, or optionally specify a query to limit the export

  5. Click the Next button to configure Target Information
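If you do this often, the connection-string assembly from steps 1 and 2 can be scripted. The following is a small sketch; the function names are mine and are not part of the migration tool, and it assumes the endpoint has the table.cosmos.azure.com form shown above.

```python
from urllib.parse import urlsplit

TABLE_SUFFIX = ".table.cosmos.azure.com"


def to_documents_endpoint(table_endpoint: str) -> str:
    # Swap the Table API host suffix for documents.azure.com, keeping the port.
    parts = urlsplit(table_endpoint)
    host = parts.hostname or ""
    if not host.endswith(TABLE_SUFFIX):
        raise ValueError(f"not an Azure Table endpoint: {table_endpoint}")
    account = host[: -len(TABLE_SUFFIX)]
    port = parts.port or 443
    return f"https://{account}.documents.azure.com:{port}/"


def build_connection_string(table_endpoint: str, primary_key: str, database: str) -> str:
    # Produces the AccountEndpoint/AccountKey/Database string from step 2.
    endpoint = to_documents_endpoint(table_endpoint)
    return f"AccountEndpoint={endpoint};AccountKey={primary_key};Database={database}"


print(build_connection_string(
    "https://name-of-your-cosmos-db-account.table.cosmos.azure.com:443/",
    "primary-key-goes-here",
    "Your-DB-Name",
))
```

The printed value is what goes into the Connection String field in step 4.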

Target Information

  1. Export To: JSON file

  2. Choose the Local File radio button, specify a path, and optionally select Prettify JSON.

  3. Click Next to complete the wizard and run the export.

The JSON file should be saved in the directory you specified or, if you did not specify one, in the folder you started the data migration tool from.

UPDATE: importing from JSON into Azure Table storage in Cosmos DB also works. The connection string needs the same adjustments as described above for the export scenario. In addition, depending on your situation, you may want to fill out extra parameters, such as whether to regenerate Ids. The tutorial on using the data migration tool covers these well.

Monday, January 4, 2021

Command-Line Utility to Validate LUDOWN Files

Microsoft released Bot Framework Composer in May 2020, and the tool has been under active development ever since. It lets you rapidly create rich conversational bots that leverage adaptive dialogs, language generation, skills and more.

In working with Composer, I found that one of the practical challenges was "teaching" LUIS to recognize intents and entities: the labelling was quite verbose, each intent needed dozens of training examples at the very least, and on top of that the documentation of the new LUDOWN format could be improved, even though the format is far more convenient than JSON.
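For readers who have not seen the format: an intent in an .lu file is a "# IntentName" heading followed by "- utterance" lines. As a rough illustration of how lightweight that is, here is a small sketch that counts training utterances per intent. It is my own simplification and not how @microsoft/bf-lu parses files; the sample content is made up, and the script ignores most of the format (entity definitions, QnA pairs, imports and so on).

```python
from collections import Counter


def count_utterances(lu_text: str) -> Counter:
    # Count "- utterance" lines under each "# Intent" heading.
    counts: Counter = Counter()
    current = None
    for raw in lu_text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not line.startswith("# ?"):
            current = line[2:].strip()
            counts[current] += 0  # register the intent even if it has no utterances
        elif line.startswith(("#", "@", "$")):
            # QnA headings and entity definitions end the current intent section.
            current = None
        elif line.startswith("- ") and current is not None:
            counts[current] += 1
    return counts


SAMPLE = """
> lines starting with ">" are comments
# Greeting
- hi
- hello there

# BookFlight
- book a flight to {city=Paris}
"""

for intent, n in count_utterances(SAMPLE).items():
    print(f"{intent}: {n} utterance(s)")
```

A check like this makes it easy to spot intents that are still far short of the dozens of examples they need.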

I've done some digging thanks to Composer being released as open source, and found a library that the Composer uses for parsing and validating the .lu files: @microsoft/bf-lu.

I thought it would be easier to validate the .lu files from the command line, so I ended up writing my own CLI to help with quick and verbose validation. As a "bonus", it can also create a temporary LUIS app from an .lu file, since the @microsoft/bf-lu validation may miss certain errors that LUIS would complain about when attempting to create an app.

Here is the utility: https://www.npmjs.com/package/@softforte/lu  

While the new Bot Framework CLI offers similar and comprehensive features, I still find this utility handy for day-to-day LUIS development and hope that it will save you some troubleshooting time. Check it out!