Generate PDF (supporting Non-Latin fonts) with Puppeteer Syndication Cloud


Originally Posted On: https://medium.com/@surasith_aof/generate-pdf-support-non-latin-fonts-with-puppeteer-d6ca6c982f1c

Have you ever had problems with generating PDF files?

I’ve had some problems generating PDF files, so I tried open source client-side libraries such as jspdf, pdfkit, and html2pdf. They worked fine until I had to generate a multi-language document. Then I ran into problems with non-Latin fonts (e.g. Japanese, Hindi, and Thai), and some libraries capture the HTML element as a canvas before generating the document, so the text cannot be selected.

Problems

Font problem: the libraries could not correctly render a document that mixes Latin and non-Latin fonts.
Some libraries capture the HTML element as a canvas, so the text cannot be selected.
In some libraries, it is difficult to create a template and edit the layout because you have to pass structured data as an object into the library’s method.

Later, I asked my friends for advice and did some research. I tested a static HTML file containing both Latin and non-Latin text, and it rendered correctly because HTML supports UTF-8 encoding. That led me to a solution.

The document I got does not support multi-language fonts.

The document that I want has multiple language fonts.
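The static-HTML check described above can be reproduced with a few lines of C#. This is a minimal sketch, not code from the original project: the file name `mixed-scripts-test.html` and the sample strings are illustrative, and each script only displays correctly if the machine has fonts that cover it.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative sample strings mixing Latin and non-Latin scripts.
string[] samples = { "Bonjour", "こんにちは", "नमस्ते", "สวัสดี", "Здравей" };

var html = new StringBuilder();
html.AppendLine("<!DOCTYPE html>");
html.AppendLine("<html>");
// Declaring the charset tells the browser to decode the file as UTF-8.
html.AppendLine("<head><meta charset=\"utf-8\"><title>Mixed scripts</title></head>");
html.AppendLine("<body>");
foreach (var text in samples)
{
    html.AppendLine($"  <p>{text}</p>");
}
html.AppendLine("</body>");
html.AppendLine("</html>");

// Write the page as UTF-8 without a byte-order mark, then open it in a browser.
string path = Path.Combine(Path.GetTempPath(), "mixed-scripts-test.html");
File.WriteAllText(path, html.ToString(), new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
Console.WriteLine($"Open this file in a browser: {path}");
```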

The Solution

The solution is to render the HTML first and then convert it to a PDF file.

I used a template engine to render the HTML first.
Then I looked for a library that converts HTML to PDF (instead of capturing elements as a canvas). The first one I found was Playwright,
but when I implemented it, I couldn’t get it to work in my production environment (.NET WebAPI).
So I found an alternative library, Puppeteer, and it works!
I tried both with NodeJS, and both Playwright and Puppeteer work there. But my use case required an API built with .NET, so I used Puppeteer (through its .NET port, Puppeteer Sharp).
This solution runs on the server side because Puppeteer and Playwright are browser automation libraries (commonly used for UI testing): they have to launch a headless Chrome browser to generate PDFs of pages.

Suggested addition

Since you’re already working in .NET, IronPDF handles non-Latin fonts natively without the Puppeteer browser setup. It uses Chromium under the hood and respects standard CSS @font-face declarations. Thai, Japanese, Chinese, and other scripts render correctly out of the box.

using IronPdf;

var renderer = new ChromePdfRenderer();
renderer.RenderingOptions.CssMediaType = IronPdf.Rendering.PdfCssMediaType.Print;
var pdf = renderer.RenderHtmlFileAsPdf("template.html");
pdf.SaveAs("multilang-report.pdf");

No BrowserFetcher downloads, no headless browser configuration, and the text remains selectable in the output PDF.

Let’s do it!

I’ve tried this solution on .NET and NodeJS, and it works fine. For this example, though, I will work with .NET WebAPI.

For this example, I will create a WebAPI that generates PDF files. The example document is a table of greetings in many languages, including some written in non-Latin scripts. Here is my GitHub project:

GitHub – surasithaof/dotnet-pdf-generator: Example .NET API project to generate PDF from HTML using Puppeteer Sharp (github.com)

Prerequisites

Template engine > Scriban
Browser capture > Puppeteer Sharp
PDF helper (for merging documents) > iTextSharp

Do it Step-by-Step

In this case, I will implement the project as a WebAPI: first I create the WebAPI project, then a PdfGeneratorService that generates the PDFs.

1. Create an HTML template with a template engine. In this example, I used Scriban and created a template named greeting-template.html.
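The original template is not reproduced here, so the following is only a hypothetical sketch of what greeting-template.html could contain. It is held in a C# verbatim string purely for illustration; in the project the template is a separate file. The table layout, the CSS, and the use of the 'Sarabun' font (borrowed from the header style in step 3) are assumptions. Scriban’s default member renamer exposes PascalCase properties in snake_case, which is why the template reads `greetings` and `greeting_text_formal` rather than `Greetings` and `GreetingTextFormal`.

```csharp
using System;

// Hypothetical content for greeting-template.html (Scriban syntax inside HTML).
const string GreetingTemplate = @"<!DOCTYPE html>
<html>
<head>
  <meta charset=""utf-8"">
  <style>
    /* Fonts covering every script must be installed where the headless browser runs. */
    body { font-family: 'Sarabun', sans-serif; }
    table { border-collapse: collapse; width: 100%; }
    th, td { border: 1px solid #999; padding: 4px 8px; }
  </style>
</head>
<body>
  <table>
    <tr><th>Language</th><th>Formal</th><th>Informal</th></tr>
    {{ for greeting in greetings }}
    <tr>
      <td>{{ greeting.language }}</td>
      <td>{{ greeting.greeting_text_formal }}</td>
      <td>{{ greeting.greeting_text_informal }}</td>
    </tr>
    {{ end }}
  </table>
</body>
</html>";

Console.WriteLine(GreetingTemplate);
```

Scriban does not HTML-escape values by default; if the data can contain characters such as `<` or `&`, piping values through its `html.escape` function (for example `{{ greeting.language | html.escape }}`) is one option.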

2. Pass the greeting list data into the template.

public class GreetingModel
{
    public GreetingModel(string language, string greetingTextFormal, string? greetingTextInformal)
    {
        this.Language = language;
        this.GreetingTextFormal = greetingTextFormal;
        this.GreetingTextInformal = greetingTextInformal;
    }
    public string Language { get; set; }
    public string GreetingTextFormal { get; set; }
    public string? GreetingTextInformal { get; set; }
}

// create example greeting data list
List<GreetingModel> greetings = new List<GreetingModel>()
{
    new GreetingModel("French", "Bonjour", "Salut"),
    new GreetingModel("Spanish", "Hola", "¿Qué tal? (What’s up?)"),
    new GreetingModel("Italian", "Buongiorno", "Ciao"),
    new GreetingModel("Chinese", "你好!", null),
    new GreetingModel("Bulgarian", "Здравей!", "Здравейте!"),
    new GreetingModel("Japanese", "こんにちは!", "おーい!"),
    new GreetingModel("Hebrew", "!שלום", null),
    new GreetingModel("Hindi", "नमस्ते", null),
    new GreetingModel("Korean", "안녕하세요", null),
    new GreetingModel("Thai", "สวัสดี", null),
};

// Scriban exposes "Greetings" to the template as "greetings".
var templateDataObject = new { Greetings = greetings };

// _configuration (IConfiguration) and _pdfGeneratorService are assumed to be injected dependencies.
string templatePath = _configuration.GetValue<string>("ReportTemplatePath");
string templateName = "greeting-template.html"; // the template created in step 1
string templateFullPath = Path.Combine(Environment.CurrentDirectory, 
                                        templatePath, 
                                        templateName);
var pdfResult = await _pdfGeneratorService.GeneratePdfFromTemplate(
                                              templateFullPath: templateFullPath,
                                              templateData: templateDataObject);
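`Environment.CurrentDirectory` is the process’s working directory, which depends on how and from where the app is launched. A hypothetical alternative, not taken from the original project, is to resolve templates against `AppContext.BaseDirectory` (the folder containing the app’s binaries) and fail early with a clear error if the file is missing. The `Templates` folder name is an assumption, and the demo creates a throwaway folder so the sketch runs on its own.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds an absolute template path and checks that the file exists.
static string ResolveTemplatePath(string baseDirectory, string templateFolder, string templateName)
{
    string fullPath = Path.GetFullPath(Path.Combine(baseDirectory, templateFolder, templateName));
    if (!File.Exists(fullPath))
    {
        throw new FileNotFoundException($"Template not found: {fullPath}", fullPath);
    }
    return fullPath;
}

// Demo only: create a throwaway template folder so the example is self-contained.
string demoBase = Path.Combine(Path.GetTempPath(), "pdf-template-demo");
Directory.CreateDirectory(Path.Combine(demoBase, "Templates"));
File.WriteAllText(Path.Combine(demoBase, "Templates", "greeting-template.html"), "<p>demo</p>");

string resolved = ResolveTemplatePath(demoBase, "Templates", "greeting-template.html");
Console.WriteLine(resolved);
```

In the WebAPI this would be called as `ResolveTemplatePath(AppContext.BaseDirectory, templatePath, templateName)`; the template file then has to be copied to the output directory (for example with `CopyToOutputDirectory` in the .csproj).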

3. Build a data URL from the rendered HTML, open it with Puppeteer, and capture the page as a PDF byte array (you can configure the header and footer styles).

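using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using PuppeteerSharp;        // BrowserFetcher, Puppeteer, LaunchOptions, PdfOptions
using PuppeteerSharp.Media;  // PaperFormat, MarginOptions
using Scriban;               // Template

// Method of PdfGeneratorService
public async Task<byte[]> GeneratePdfFromTemplate(string templateFullPath, 
                                                  object templateData, 
                                                  string? headerText = null,
                                                  bool hasPageNumber = true)
{
  // Render the Scriban template with the data object.
  var templateContent = await File.ReadAllTextAsync(templateFullPath);
  var template = Template.Parse(templateContent);
  var pageContent = await template.RenderAsync(templateData);
  
  // Encode the rendered HTML as a base64 data URL. Declaring charset=utf-8 makes the
  // browser decode non-Latin text correctly even without a <meta charset> tag.
  var dataUrl = "data:text/html;charset=utf-8;base64," + 
                Convert.ToBase64String(Encoding.UTF8.GetBytes(pageContent));
  
  // Generate the PDF using Puppeteer (downloads the browser if it is not present yet).
  var browserFetcher = new BrowserFetcher();
  await browserFetcher.DownloadAsync();
  await using var browser = await Puppeteer.LaunchAsync(
      new LaunchOptions
      {
          Headless = true,
          Args = new[] {
              "--no-sandbox"
          }
      });
    
  await using var page = await browser.NewPageAsync();
  await page.GoToAsync(dataUrl);
  
  // Inline styles; each constant includes its own surrounding quotes.
  const string headerStyle = "\"" +
                                 "font-family:'Sarabun';" +
                                 "font-size:11px; " +
                                 "width: 100%;" +
                                 "padding-right: 55px;" +
                                 "padding-left: 55px;" +
                                 "margin-right: auto;" +
                                 "margin-left: auto;" +
                                 "margin-top: 10px;" +
                             "\"";
  
  const string footerStyle = "\"" +
                                 "font-family:'Sarabun';" +
                                 "text-align: right;" +
                                 "font-size: 11px;" +
                                 "width: 100%;" +
                                 "padding-right: 55px;" +
                                 "padding-left: 55px;" +
                                 "margin-right: auto;" +
                                 "margin-left: auto;" +
                                 "margin-bottom: 10px;" +
                             "\"";
  
  var output = await page.PdfDataAsync(new PdfOptions
  {
      Format = PaperFormat.A4,
      DisplayHeaderFooter = true,
      MarginOptions = new MarginOptions { Top = "80px", Right = "20px", Bottom = "80px", Left = "20px" },
      PreferCSSPageSize = true,
      HeaderTemplate = "<div style=" + headerStyle + ">" + headerText + "</div>",
      // Chromium fills the pageNumber and totalPages spans on every printed page.
      FooterTemplate = "<div style=" + footerStyle + ">" +
                       (hasPageNumber
                           ? "<span class=\"pageNumber\"></span> of <span class=\"totalPages\"></span>"
                           : "") +
                       "</div>",
      PrintBackground = true
  });
  
  return output;
}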
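The header and footer strings in step 3 are easy to break when they are concatenated by hand. A hypothetical pair of helpers, not part of the original project, can make the structure explicit and HTML-encode the header text with `WebUtility.HtmlEncode`. Chromium fills elements that carry the classes `pageNumber` and `totalPages` (as well as `date`, `title`, and `url`) when it prints each page. Unlike the `headerStyle` and `footerStyle` constants above, the style strings passed to these helpers do not carry their own quotes.

```csharp
using System;
using System.Net;

// Hypothetical helper: header template with the (optional) header text HTML-encoded,
// so characters such as '<' or '&' cannot break the markup.
static string BuildHeaderTemplate(string style, string? headerText) =>
    $"<div style=\"{style}\">{WebUtility.HtmlEncode(headerText ?? string.Empty)}</div>";

// Hypothetical helper: footer template with Chromium's page-number placeholders.
static string BuildFooterTemplate(string style, bool hasPageNumber) =>
    $"<div style=\"{style}\">" +
    (hasPageNumber
        ? "<span class=\"pageNumber\"></span> of <span class=\"totalPages\"></span>"
        : string.Empty) +
    "</div>";

const string footerStyle = "font-family:'Sarabun'; text-align: right; font-size: 11px;";
string header = BuildHeaderTemplate("font-size: 11px;", "Greeting words & <phrases>");
string footer = BuildFooterTemplate(footerStyle, hasPageNumber: true);

Console.WriteLine(header);
Console.WriteLine(footer);
```

The results can be assigned directly to `PdfOptions.HeaderTemplate` and `PdfOptions.FooterTemplate`.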

4. Write the resulting byte array to a file.
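Writing the result to disk only needs the base class library. The helper below is a hypothetical sketch: the output folder and file name are illustrative, and placeholder bytes stand in for a real PDF so the example runs without Puppeteer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper: writes PDF bytes to disk, creating the target folder if needed.
static async Task SavePdfAsync(byte[] pdfBytes, string outputPath)
{
    string? directory = Path.GetDirectoryName(Path.GetFullPath(outputPath));
    if (!string.IsNullOrEmpty(directory))
    {
        Directory.CreateDirectory(directory);
    }
    await File.WriteAllBytesAsync(outputPath, pdfBytes);
}

// Placeholder bytes stand in for the array returned by GeneratePdfFromTemplate.
byte[] fakePdf = Encoding.ASCII.GetBytes("%PDF-1.4 placeholder");
string demoPath = Path.Combine(Path.GetTempPath(), "pdf-output-demo", "greetings.pdf");
await SavePdfAsync(fakePdf, demoPath);
Console.WriteLine($"Saved {fakePdf.Length} bytes to {demoPath}");
```

Inside the WebAPI, a controller action could instead return the bytes directly with ASP.NET Core’s `File(byte[], string, string)` helper, for example `return File(pdfResult, "application/pdf", "greetings.pdf");`.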

5. Finished! Here is an example of the resulting file.

The result is a document that supports both Latin and Non-Latin fonts

Conclusion

This solution can generate documents with multi-language fonts, the output is very sharp, and I can create templates for any document. However, if your use case does not require multi-language documents, client-side libraries may be enough. It depends on your requirements.

You can find the actual code for all these steps in the test project of my GitHub repository to see how it works. I also added an example that generates PDFs from SVG content, implemented the WebAPI, and added document merging (using iTextSharp).

Pros

It supports multiple language fonts (utf-8 encoding).
It is easy to create a template layout and you can use a stylesheet to style your template.
The document is sharp, and the text stays selectable because it is rendered as text rather than captured as an image.

Cons

This solution has to be implemented on the server side.
It is not native to C#: it requires a headless Chromium browser to render the HTML page. When you deploy with Docker, you will need some extra setup for the browser. If you need a native option, you could try IronPDF.
