Wednesday, July 21, 2021

Read all sheets of an excel file in NodeJs using XLSX package

Scenario
We have an excel file in our application in a temp folder and we need to process data of each sheet of the excel file. I am giving here the screenshots of folder path of our excel file "candidate.xlsx" with its content.


Here, We need to process data from each sheets of excel file candidate.xlsx present under folder "temp_file".  Below are the screenshots of its two sheets named as "candidates" and "address".



Note: In this sheet, the main headers and data like Name, Email, Phone, Date of birth are starting from 3rd row considering first row as 0. So, header start index is 2. This is going to be used later in code where you can understand it.


Note: In this sheet, the main headers and data like Country, State, City are starting from 4rth row considering first row as 0. So, header start index is 3. This is going to be used later in code where you can understand it.

Solution
There are so many npm packages available to process excel file in Javascript but I found XLSX as more reliable and robust. I will write another blog about how to select the best npm package among several available packages.

The task of processing data of both sheets of above excel file is being performed in following steps:

Step 1:
Installed XLSX package by using command like npm install xlsx

{
  "name""node_excel_read",
  "version""1.0.0",
  "description""",
  "main""index.js",
  "scripts": {
    "test""echo \"Error: no test specified\" && exit 1"
  },
  "author""Jitendra Kumar Singh",
  "license""ISC",
  "dependencies": {
    "xlsx""^0.17.0"
  }
}

Step 2:
Get the list of names of all the excel files present in folder "temp_file". Here, I am not recursively traversing in "temp_file" and I am finding the list of file names present directly inside "temp_file".

/**
 * Creating the list of names of files present directly in temp_file dir
 * @returns 
 */
const getFiles = () => {
  try {
    let files = Array();
    fs.readdirSync(sourceDirPath).forEach(file => {
      if (!fs.lstatSync(path.resolve(sourceDirPathfile)).isDirectory()) {
        files.push(file);
      }
    });
    return files;
  } catch (error) {
    throw error;
  }  
}

Step 3:
Here, I am going to process only one excel file present inside "temp_file" folder. I will write another blog for recursively traversing "temp_file" folder to get list of all file names and processing all files. Now, I am going to read the excel file by using "xlsx" package.

/**
 * Getting the list of name of files present directly in temp_file folder
 * Iterating this list of name of files and then processing each sheet in each file
 * Each sheet is being processed by corresponding function
 * @returns 
 */
const readTestFile = async () => {
  try {
    const files = getFiles();
    if(files && files.length === 0) {
      console.log("No file present");
      return true;
    }
    const filePath = sourceDirPath + "/" + files[0];
    //Reading the excel file candidate.xlsx
    const file = xlsx.readFile(filePath);
    if(file && file.SheetNames && file.SheetNames.length > 0) {
      for(let sheetName of file.SheetNames) {
        if(sheetName === 'candidates') {
          processCandidateSheet(file.Sheets[sheetName]);
        } else if(sheetName === 'address') {
          processAddressSheet(file.Sheets[sheetName]);
        }
      }
    }
  } catch (error) {
    throw error;
  }  
}

Here, I am reading the "candidate.xlsx" file by 
const file = xlsx.readFile(filePath);

Here, the variable "file" is a list of sheets present in this excel file. I am iterating this "file" variable and processing both the sheets "candidates" and "address" by functions "processCandidateSheet()" and "processAddressSheet()"

Step 4:
Here, I am processing "candidates" sheet.

/**
 * Processing the data of Candidate sheet
 * @param {*} candidateSheet 
 */
const processCandidateSheet = (candidateSheet=> {
  try {
    const headerStartIndexInCandidateSheet = 2;//based on position of header in the sheet
    let candidateJsonData = convertSheetToJson(candidateSheetheaderStartIndexInCandidateSheet);
    for(let data of candidateJsonData) {
      console.log(data);
    }
  } catch (error) {
    throw error;
  }
}

If you go through this code block, here I am calling one function convertSheetToJson(). This function actually converts the data of the sheet in JSON format so that it becomes very easy to play with the resultant JSON data. 

/**
 * Converting sheet data in JSON format
 * @param {} sheetName 
 * @param {*} headerIndex 
 * @returns 
 */
const convertSheetToJson = (sheetNameheaderIndex=> {
  try {
    let config = {
      header: "A"
    }
    let sheetData = xlsx.utils.sheet_to_json(sheetNameconfig);
    if (sheetData && sheetData.length > 0) {
      const header = sheetData[headerIndex];
      let loopIndex = headerIndex + 1;
      let jsonRows = Array();
      for (loopIndexloopIndex < sheetData.lengthloopIndex++) {
        let tempObj = {};
        for (const property in sheetData[loopIndex]) {
          tempObj[header[property]] = sheetData[loopIndex][property];
        }
        jsonRows.push(tempObj);
      }
      return jsonRows;
    }
  } catch (error) {
    throw error;
  }
}

Step 5:
Here, I am processing the "address" sheet.

/**
 * Processing the data of Address sheet
 * @param {*} addressSheet 
 */
 const processAddressSheet = (addressSheet=> {
  try {
    const headerStartIndexInAddressSheet = 3;//based on position of header in the sheet
    let addressJsonData = convertSheetToJson(addressSheetheaderStartIndexInAddressSheet);
    for(let data of addressJsonData) {
      console.log(data);
    }
  } catch (error) {
    throw error;
  }
}

Step 6:
Now, I am writing here the function that I will trigger from NodeJs terminal to read file.

const initiate = async () => {
  await readTestFile();
}

setTimeout(() => {
  initiate();
}, 2000);

Step 7:
All the above code I have written in a file named as "readTestFile.js". Here, I have imported following dependencies.

const fs = require('fs');
const path = require('path');
const xlsx = require('xlsx');
const sourceDirPath = path.join(__dirname'../../temp_file');

To run this file I am using command like "node readTestFile"

Output:
Once I run the command like "node readTestFile", it triggers the function "readTestFile()" and starts reading both the sheets of the excel file and printing the data of both the sheets in JSON format.

D:\jitendra\project\poc\node_excel_read\src\scriptsnode readTestFile

Data from candidates sheet

{
  Name'AAA AA',       
  Email'aaa@aa.com',  
  Phone1111111111,    
  'Date of birth'29504
}
{
  Name'BBB BB',
  Email'bbb@bb.com',
  Phone2222222222,
  'Date of birth'44145
}
{
  Email'ccc@cc.com',
  Phone3333333333,
  'Date of birth'40157
}

Data from address sheet

Country'India'State'Karnataka'City'Bangalore' }
Country'India'State'Delhi'City'New Delhi' }
Country'London'State'AAAA'City'BBBB' }



No comments:

Post a Comment

Please provide your precious comments and suggestion