Streaming Data to Excel and Uploading to Google Cloud Storage

Anurag Dash / January 15, 2025

Introduction

Handling large datasets in modern applications often requires efficient streaming solutions. In this blog, we will explore how to:

  1. Stream data from a PostgreSQL database.

  2. Write the streamed data to an Excel file using ExcelJS.

  3. Upload the generated file to Google Cloud Storage (GCS).

This approach ensures memory-efficient handling of large datasets while integrating database and cloud storage operations seamlessly. It also supports creating an Excel file with multiple sheets at the same time.

Prerequisites

Ensure you have the following:

  • Node.js installed.

  • A PostgreSQL database with data to query.

  • A Google Cloud Platform (GCP) project with a storage bucket configured.

  • The following npm packages:

npm install exceljs @google-cloud/storage pg pg-query-stream

Code

Step 1: Dependencies and Imports

Import the required libraries for PostgreSQL, Excel, and Google Cloud Storage, along with project-specific utilities:

import ExcelJS from 'exceljs'
import { Storage } from '@google-cloud/storage'
import pgQueryStream from 'pg-query-stream'
import { Pool } from 'pg'

import { logger } from '@/plugins/logger'
import { errorCodes, errorHandler } from '@/utilities/errorHandler'
import CONFIG from '@/config'

This sets the stage for handling database queries, file creation, and cloud storage.
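
The later steps assume a connected pg client. The Pool import above is how that client would typically be obtained; here is a minimal sketch (the CONFIG.DB key and connection settings are assumptions, not part of the original code):

const pool = new Pool({
  connectionString: CONFIG.DB.CONNECTION_STRING // assumed config key, e.g. postgres://user:pass@host:5432/db
})
// pg-query-stream is used with a client checked out from the pool,
// so the query stream stays bound to a single connection
const client = await pool.connect()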

Step 2: Configure Google Cloud Storage

Set up the GCS client and point it at the destination file in your bucket:

const storage = new Storage()
const {
  UPLOAD_HOST: hostName,
  UPLOAD_FOLDER: folderName,
  UPLOAD_BUCKET: bucketName
} = CONFIG.GCP
const uploadedFileName = 'report.xlsx' // example name; use whatever your export should be called
const file = storage
  .bucket(bucketName)
  .file(`${folderName}/${uploadedFileName}`)
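
CONFIG.GCP only needs the three keys destructured above. An illustrative shape (all values are placeholders):

CONFIG.GCP = {
  UPLOAD_HOST: 'https://storage.googleapis.com',
  UPLOAD_FOLDER: 'exports',
  UPLOAD_BUCKET: 'my-app-bucket'
}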

Step 3: Initialize Excel Workbook

Create an Excel workbook:

const workbook = new ExcelJS.Workbook()

Step 4: Helper Function - Configure Excel Sheet

Define how to set up an Excel worksheet:

const configureSheet = (sheet, config) => {
  sheet.columns = Object.entries(config.COLUMNS).map(([key, value]) => ({
    header: value,
    key: key,
    width: config.WIDTH[key]
  }))
}

This function abstracts the logic for setting up columns.
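
For example, with the sample CONFIG.SHEET.SHEET1 shown at the end of this post, the function assigns the equivalent of:

sheet.columns = [
  { header: 'ID', key: 'id', width: 10 },
  { header: 'Name', key: 'name', width: 30 }
]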

Step 5: Helper Function - Create Worksheet from Data Stream

A utility function to query the database and populate an Excel sheet using a data stream:

const createWorksheet = (client, config, query, transformFunction) => {
  const sheet = workbook.addWorksheet(config.NAME)
  configureSheet(sheet, config)
  // Wrap the SQL text in a query stream and submit it on the client;
  // rows are then emitted one at a time as 'data' events
  const stream = new pgQueryStream(query)
  client.query(stream)
  stream.on('data', row => {
    // Transform each row before appending it to the worksheet
    sheet.addRow(transformFunction(row))
  })
  return new Promise((resolve, reject) => {
    stream.on('end', () =>
      resolve(`successfully fetched all records for sheet ${config.NAME}`)
    )
    stream.on('error', error =>
      reject(errorHandler(errorCodes.DATABASE_QUERY_ERROR, error))
    )
  })
}

Step 6: Process Sheets in Parallel

Create a promise for each sheet and await them together (queries issued on the same pg client are queued by node-postgres, so the two streams run back to back on that connection):

const sheet1Promise = createWorksheet(
  client,
  CONFIG.SHEET.SHEET1,
  query1(/* params */),
  transformationFn1
)
const sheet2Promise = createWorksheet(
  client,
  CONFIG.SHEET.SHEET2,
  query2(/* params */),
  transformationFn2
)
await Promise.all([sheet1Promise, sheet2Promise])
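
query1, query2, and the transformation functions are project-specific. A hypothetical query builder compatible with the createWorksheet signature above (table and column names are invented for illustration) simply returns the SQL text, and the transformation function can be as simple as the transformRow utility shown at the end of this post:

const query1 = () =>
  'SELECT id, name FROM users ORDER BY id'

If you need parameters, pg-query-stream also accepts a values array (new pgQueryStream(text, values)), which createWorksheet could be extended to pass through.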

Step 7: Upload to GCS

Write the workbook directly to GCS:

return new Promise((resolve, reject) => {
  const writeStream = file.createWriteStream()
  // 'finish' fires once the object has been fully written to GCS
  writeStream.on('finish', () =>
    resolve(`${hostName}/${bucketName}/${folderName}/${uploadedFileName}`)
  )
  writeStream.on('error', error =>
    reject(errorHandler(errorCodes.GCS_UPLOAD_ERROR, error))
  )
  // Serialize the workbook straight into the upload stream
  workbook.xlsx.write(writeStream)
})
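
If you followed the pool/client sketch from Step 1, release the client once the export settles; the wrapper name below is hypothetical:

try {
  // generateAndUploadReport is a hypothetical wrapper around Steps 3-7
  const fileUrl = await generateAndUploadReport()
  logger.info(`Report available at ${fileUrl}`)
} finally {
  // Return the pooled client once the export succeeds or fails
  client.release()
}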

Example transformation utility and CONFIG used:

const transformRow = row => {
  return { ...row, transformedField: row.originalField.toUpperCase() }
}
CONFIG.SHEET = {
  SHEET1: {
    NAME: 'Sheet1',
    COLUMNS: { id: 'ID', name: 'Name' },
    WIDTH: { id: 10, name: 30 }
  }
}
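
Step 6 also references CONFIG.SHEET.SHEET2, which follows exactly the same shape (the column names here are placeholders):

CONFIG.SHEET.SHEET2 = {
  NAME: 'Sheet2',
  COLUMNS: { id: 'ID', total: 'Total' },
  WIDTH: { id: 10, total: 15 }
}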

Conclusion

By combining pg-query-stream, ExcelJS, and GCS, this solution efficiently handles large datasets without memory bloat, making it ideal for modern data-intensive applications. Happy coding!