Bulk Import a CSV File Into MongoDB Using Mongoose With Node.js

Final product image
What You’ll Be Creating

This topic is a really enjoyable one for me. It’s quite common in many web applications to accept user input and save a single record to your database. But what about when your users (or you) want to perform multiple inserts in a single command? 

Enter this article, which will demonstrate how to create a CSV template and a form to upload the CSV file, and how to parse the CSV into a Mongoose Model that will be saved to a MongoDB database.

This article assumes that you have a basic understanding of Mongoose and how it interacts with MongoDB. If you don’t, I would suggest reading my Introduction to Mongoose for MongoDB and Node.js article first. This article describes how Mongoose interacts with MongoDB by creating strongly-typed Schemas which a Model is created from. If you already have a good understanding of Mongoose, then let’s proceed.

Getting Started

To begin, let’s instantiate a new Node.js application. In a command prompt, navigate to where you want to host your Node.js applications and perform the following commands:

mkdir csvimport
cd csvimport
npm init

I’ve left all the defaults in place, so my application will start with index.js. Before creating and parsing CSV files, some initial setup needs to be done first. I want to make this a web application; to do that, I’m going to use the Express package to handle all of the nitty-gritty server setup. In your command prompt, install Express by running the following command:

npm install express --save

Since this web application will accept files via a web form, I’m also going to use the Express sub-package Express File Upload. Let’s install that now as well:

npm install express-fileupload --save

I now have done enough initial configuration to set up my web application and create a basic web page that will create my file upload form.

Here is my index.js file that sets up my web server:

var app = require('express')();
var fileUpload = require('express-fileupload');
var server = require('http').Server(app);

app.use(fileUpload());

server.listen(80);

app.get('/', function (req, res) {
  res.sendFile(__dirname + '/index.html');
});

This example imports Express and the Express File Upload libraries, configures my web application to use the File Upload, and listens on port 80. This example has also created a route using Express at “/” which will be the default landing page for my web application. This route returns an index.html file that contains the web form that will allow a user to upload a CSV file. In my case, I am running on my local computer, so when I visit http://localhost I will see the form that I create in the next example.

Here is my index.html page that creates my form for uploading a CSV file:

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Upload Authors</title>
</head>
<body>
	<p>Use the form below to upload a list of authors.
        Click <a href="/template">here</a> for an example template.</p>
	<form action="/" method="POST" encType="multipart/form-data">
		<input type="file" name="file" accept="*.csv" /><br/><br/>
		<input type="submit" value="Upload Authors" />
	</form>
</body>
</html>

This HTML file contains two important things:

  1. A link to “/template” that, when clicked, will download a CSV template that can be populated with the information to be imported.
  2. A form with the encType set as multipart/form-data and an input field with a type of file that accepts files with a “csv” extension.

As you may have noticed, the HTML makes reference to an Author template. If you read my Introduction to Mongoose article, I created an Author Schema. In this article, I am going to recreate this Schema and allow the user to bulk import a collection of authors into my MongoDB database. Let’s take a look at the Author Schema. Before we do that, though, you probably guessed it—we need to install the Mongoose package:

npm install mongoose --save

Creating the Schema and Model

With Mongoose installed, let’s create a new author.js file that will define the Author Schema and Model:

var mongoose = require('mongoose');

var authorSchema = mongoose.Schema({
    _id: mongoose.Schema.Types.ObjectId,
    name: {
		firstName: {
			type: String,
			required: true
		},
		lastName: String
	},
	biography: String,
	twitter: {
		type: String,
		validate: {
			validator: function(text) {
				if (text !== null && text.length > 0)
					return text.indexOf('https://twitter.com/') === 0;
				
				return true;
			},
			message: 'Twitter handle must start with https://twitter.com/'
		}
	},
	facebook: {
		type: String,
		validate: {
			validator: function(text) {
				if (text !== null && text.length > 0)
					return text.indexOf('https://www.facebook.com/') === 0;
				
				return true;
			},
			message: 'Facebook Page must start with https://www.facebook.com/'
		}
	},
	linkedin: {
		type: String,
		validate: {
			validator: function(text) {
				if (text !== null && text.length > 0)
					return text.indexOf('https://www.linkedin.com/') === 0;
				
				return true;
			},
			message: 'LinkedIn must start with https://www.linkedin.com/'
		}
	},
	profilePicture: Buffer,
	created: { 
		type: Date,
		default: Date.now
	}
});

var Author = mongoose.model('Author', authorSchema);

module.exports = Author;

With the Author Schema and Model created, let’s switch gears and focus on creating the CSV template that can be downloaded by clicking on the template link. To aid with the CSV template generation, I’m going to use the JSON to CSV package. Let’s install that now:

npm install json2csv --save

I’m now going to update my previously created index.js file to include a new route for “/template”:

var template = require('./template.js');
app.get('/template', template.get);

I’ve only included the new code for the template route that is appended to the previous index.js file.

The first thing this code does is include a new template.js file (to be created next) and create a route for “/template”. This route will call a get function in the template.js file.

With the Express server updated to include the new route, let’s create the new template.js file:

var json2csv = require('json2csv');

exports.get = function(req, res) {

	var fields = [
		'name.firstName',
		'name.lastName',
		'biography',
		'twitter',
		'facebook',
		'linkedin'
	];

	var csv = json2csv({ data: '', fields: fields });

	res.set("Content-Disposition", "attachment;filename=authors.csv");
	res.set("Content-Type", "application/octet-stream");

	res.send(csv);

};

This file first includes the previously installed json2csv package. I then create and export a get function. This function accepts the request and response objects from the Express server.

Inside the function, I’ve created an array of the fields that I want to include in my CSV template. This can be done one of two ways. The first way (that is done in this example) is to create a static list of the fields to be included in the template. The second way is to dynamically create the list of fields by extracting the properties from the Author Schema.

The second way could be done with the following code:

var fields = Object.keys(Author.schema.obj);

I would have liked to use this dynamic method, but it becomes a bit complicated when I don’t want to include multiple properties from the Schema to my CSV template. In this case, my template doesn’t include the _id and created properties because these will be populated via code. However, if you don’t have fields you wish to exclude, the dynamic method will work as well.

Creating the CSV Template

With the array of fields defined, I use the json2csv package to create my CSV template from my JavaScript object. This csv object will be the results of this route.

And finally, using the res property from the Express server, I set two header properties that will force the download of an authors.csv file.

At this point, if you were to run your Node application and navigate to http://localhost in your web browser, the web form would be displayed with a link to download the template. Clicking the link to download the template will allow you to download the authors.csv file to be populated before it is uploaded.

Here is an example of a populated CSV file:

name.firstName,name.lastName,biography,twitter,facebook,linkedin
Jamie,Munro,Jamie is a web developer and author,,,
Mike,Wilson,Mike is a web developer and Node.js author,,,

This example, when uploaded, will create two authors: myself and a friend who wrote a book on Node.js a few years back. You may notice that at the end of each line are three commas “,,,”. This is done to abbreviate the example. I have not populated the social network properties (twitter, facebook, and linkedin).

The puzzle pieces are starting to come together and form a picture. Let’s get to the meat and potatoes of this example and parse that CSV file. The index.js file requires some updating to connect to MongoDB and create a new POST route that will accept the file upload:

var app = require('express')();
var fileUpload = require('express-fileupload');
var mongoose = require('mongoose');

var server = require('http').Server(app);

app.use(fileUpload());

server.listen(80);

mongoose.connect('mongodb://localhost/csvimport');

app.get('/', function (req, res) {
  res.sendFile(__dirname + '/index.html');
});

var template = require('./template.js');
app.get('/template', template.get);

var upload = require('./upload.js');
app.post('/', upload.post);

With a database connection and a new POST route configured, it’s time to parse the CSV file. Luckily there are several great libraries that assist with this job. I have chosen to use the fast-csv package that can be installed with the following command:

npm install fast-csv --save

The POST route was created similarly to the template route which invokes a post function from the upload.js file. It is not necessary to place these functions in separate files; however, I like to create separate files for these routes as it helps keep the code nice and organized.

Submitting Data

And finally, let’s create the upload.js file that contains the post function that is called when the previously created form is submitted:

var csv = require('fast-csv');
var mongoose = require('mongoose');
var Author = require('./author');

exports.post = function (req, res) {
    if (!req.files)
        return res.status(400).send('No files were uploaded.');
	
	var authorFile = req.files.file;

	var authors = [];
		
	csv
	 .fromString(authorFile.data.toString(), {
		 headers: true,
		 ignoreEmpty: true
	 })
	 .on("data", function(data){
		 data['_id'] = new mongoose.Types.ObjectId();
		 
		 authors.push(data);
	 })
	 .on("end", function(){
		 Author.create(authors, function(err, documents) {
			if (err) throw err;
		 });
		 
		 res.send(authors.length + ' authors have been successfully uploaded.');
	 });
};

Quite a bit is happening in this file. The first three lines include the necessary packages that will be required to parse and save the CSV data.

Next, the post function is defined and exported for use by the index.js file. Inside this function is where the magic takes place.

The function first checks that there is a file contained in the request body. If there is not, an error is returned indicating that a file must be uploaded.

When a file has been uploaded, a reference to the file is saved to a variable called authorFile. This is done by accessing the files array and the file property in the array. The file property matches the name of my file input name that I first defined in the index.html example.

I’ve also created an authors array that will be populated as the CSV file is parsed. This array will be used to save the data to the database.

The fast-csv library is now called by leveraging the fromString function. This function accepts the CSV file as a string. I’ve extracted the string from the authorFile.data property. The data property contains the contents of my uploaded CSV file.

I’ve included two options to the fast-csv function: headers and ignoreEmpty. These are both set to true. This tells the library that the first line of the CSV file will contain the headers and that empty rows should be ignored.

With the options configured, I have set up two listener functions that are called when the data event and the end event are triggered. The data event is called once for every row of the CSV file. This event contains a JavaScript object of the parsed data.

I update this object to include the _id property of the Author Schema with a new ObjectId. This object is then added to the authors array.

When the CSV file has been fully parsed, the end event is triggered. Inside the event callback function, I call the create function on the Author model, passing the array of authors to it.

If an error occurs trying to save the array, an exception is thrown; otherwise, a success message is displayed to the user indicating how many authors have been uploaded and saved to the database.

If you would like to see the full source code, I’ve created a GitHub repository with the code.

Conclusion

In my example, I’ve only uploaded a couple of records. If your use case requires the ability to upload thousands of records, it might be a good idea to save the records in smaller chunks.

This can be done several ways. If I were to implement it, I would suggest updating the data callback function to check the length of the authors array. When the array exceeds your defined length, e.g. 100, call the Author.create function on the array, and then reset the array to empty. This will then save the records in chunks of 100. Be sure to leave the final create call in the end callback function to save the final records.

Enjoy!

Leave a Reply

Your email address will not be published. Required fields are marked *