Introduction
Handling large datasets in modern applications often requires efficient streaming solutions. In this blog, we will explore how to:
- Stream data from a PostgreSQL database.
- Write the streamed data to an Excel file using ExcelJS.
- Upload the generated file to Google Cloud Storage (GCS).
This approach ensures memory-efficient handling of large datasets while integrating database and cloud storage operations seamlessly. It also lets you build a single Excel file with multiple sheets at the same time.
Prerequisites
Ensure you have the following:
- Node.js installed.
- A PostgreSQL database with data to query.
- A Google Cloud Platform (GCP) project with a storage bucket configured.
- The following npm packages:
npm install exceljs @google-cloud/storage pg pg-query-stream
Code
Step 1: Dependencies and Imports
Import the required libraries for PostgreSQL, Excel, and Google Cloud Storage, along with project-specific utilities:
import ExcelJS from 'exceljs'
import { Storage } from '@google-cloud/storage'
import pgQueryStream from 'pg-query-stream'
import { Pool } from 'pg'
import { logger } from '@/plugins/logger'
import { errorCodes, errorHandler } from '@/utilities/errorHandler'
import CONFIG from '@/config'
This sets the stage for handling database queries, file creation, and cloud storage.
Step 2: Configure Google Cloud Storage
Set up the GCS client and the destination file in your bucket:
const storage = new Storage()
const {
  UPLOAD_HOST: hostName,
  UPLOAD_FOLDER: folderName,
  UPLOAD_BUCKET: bucketName
} = CONFIG.GCP
// Example name for illustration; in practice, generate this per export
const uploadedFileName = 'report.xlsx'
const file = storage
  .bucket(bucketName)
  .file(`${folderName}/${uploadedFileName}`)
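For reference, CONFIG.GCP only needs the three keys destructured above. A minimal placeholder shape (the values here are assumptions for illustration, not real settings):

CONFIG.GCP = {
  UPLOAD_HOST: 'https://storage.googleapis.com',
  UPLOAD_FOLDER: 'exports',
  UPLOAD_BUCKET: 'my-app-bucket'
}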
Step 3: Initialize Excel Workbook
Create an Excel workbook:
const workbook = new ExcelJS.Workbook()
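Note that a standard ExcelJS.Workbook buffers all added rows in memory until the final write, so the memory savings in this post come from streaming the database reads. If even the buffered rows are too large, ExcelJS also provides a streaming workbook writer; a minimal sketch (not used in the rest of this post):

const streamingWorkbook = new ExcelJS.stream.xlsx.WorkbookWriter({
  stream: file.createWriteStream() // rows stream out as they are committed
})
const sheet = streamingWorkbook.addWorksheet('Sheet1')
sheet.columns = [
  { header: 'ID', key: 'id' },
  { header: 'Name', key: 'name' }
]
sheet.addRow({ id: 1, name: 'example' }).commit() // flush the row immediately
await streamingWorkbook.commit() // finalize the file and end the stream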
Step 4: Helper Function - Configure Excel Sheet
Define how to set up an Excel worksheet:
const configureSheet = (sheet, config) => {
  sheet.columns = Object.entries(config.COLUMNS).map(([key, value]) => ({
    header: value,
    key: key,
    width: config.WIDTH[key]
  }))
}
This function abstracts the logic for setting up columns.
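For example, with the CONFIG.SHEET.SHEET1 config shown at the end of this post:

configureSheet(sheet, CONFIG.SHEET.SHEET1)
// sheet.columns is now:
// [
//   { header: 'ID', key: 'id', width: 10 },
//   { header: 'Name', key: 'name', width: 30 }
// ]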
Step 5: Helper Function - Create Worksheet from Data Stream
A utility function to query the database and populate an Excel sheet using a data stream:
const createWorksheet = (client, config, query, transformFunction) => {
  const sheet = workbook.addWorksheet(config.NAME)
  configureSheet(sheet, config)
  const stream = new pgQueryStream(query)
  client.query(stream)
  stream.on('data', row => {
    sheet.addRow(transformFunction(row))
  })
  return new Promise((resolve, reject) => {
    stream.on('end', () =>
      resolve(`successfully fetched all records for sheet ${config.NAME}`)
    )
    stream.on('error', error =>
      reject(errorHandler(errorCodes.DATABASE_QUERY_ERROR, error))
    )
  })
}
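The client argument is a connection checked out from the pg Pool imported earlier. A minimal sketch of acquiring and releasing it (new Pool() reads connection settings from the standard PG* environment variables):

const pool = new Pool()
const client = await pool.connect()
try {
  // ... build the worksheets (Step 6) and upload (Step 7) ...
} finally {
  client.release() // always return the connection to the pool
}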
Step 6: Process Sheets in Parallel
Kick off both worksheet builders and wait for them together. One caveat: a single pg client queues its queries on one connection, so the two streams are actually read back to back; for genuinely parallel queries, give each sheet its own pooled client (see the sketch after this code):
const sheet1Promise = createWorksheet(
  client,
  CONFIG.SHEET.SHEET1,
  query1(/* params */),
  transformationFn1
)
const sheet2Promise = createWorksheet(
  client,
  CONFIG.SHEET.SHEET2,
  query2(/* params */),
  transformationFn2
)
await Promise.all([sheet1Promise, sheet2Promise])
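If both queries are heavy and you want them to run in parallel at the database as well, give each sheet its own pooled client. A sketch, using a hypothetical buildSheet helper:

const buildSheet = async (config, query, transformFn) => {
  const sheetClient = await pool.connect()
  try {
    return await createWorksheet(sheetClient, config, query, transformFn)
  } finally {
    sheetClient.release() // free the connection once the stream ends
  }
}
await Promise.all([
  buildSheet(CONFIG.SHEET.SHEET1, query1(/* params */), transformationFn1),
  buildSheet(CONFIG.SHEET.SHEET2, query2(/* params */), transformationFn2)
])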
Step 7: Upload to GCS
Write the workbook directly to GCS:
return new Promise((resolve, reject) => {
  const writeStream = file.createWriteStream()
  writeStream.on('finish', () =>
    resolve(`${hostName}/${bucketName}/${folderName}/${uploadedFileName}`)
  )
  writeStream.on('error', error =>
    reject(errorHandler(errorCodes.GCS_UPLOAD_ERROR, error))
  )
  // write() returns a promise; surface workbook serialization errors too
  workbook.xlsx.write(writeStream).catch(error =>
    reject(errorHandler(errorCodes.GCS_UPLOAD_ERROR, error))
  )
})
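createWriteStream also accepts options. For an Excel file you may want to set the content type explicitly and, for modest file sizes, skip the resumable upload protocol:

const writeStream = file.createWriteStream({
  resumable: false, // single-request upload; simpler for smaller files
  metadata: {
    contentType:
      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
  }
})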
Example transform function and CONFIG used:
const transformRow = row => {
  return { ...row, transformedField: row.originalField.toUpperCase() }
}

CONFIG.SHEET = {
  SHEET1: {
    NAME: 'Sheet1',
    COLUMNS: { id: 'ID', name: 'Name' },
    WIDTH: { id: 10, name: 30 }
  }
}
Conclusion
By combining pg-query-stream, ExcelJS, and GCS, this solution efficiently handles large datasets without memory bloat, making it ideal for modern data-intensive applications. Happy coding!