Node.js: learning MongoDB text search and async/await by analysing statistical data

Monitoring visits to our Node-powered websites is a good way to learn how MongoDB text search works in practice. The tricky part is analysing the collected data.

Suppose we're collecting data in a dedicated MongoDB collection, saving the URL of each requested page along with the time of the hit.
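As a minimal sketch, a helper that turns an incoming request into such a document might look like the one below. The name makeVisit is hypothetical, and dropping the query string is an assumption on our part, made so that the same page is always stored under the same URL.

```javascript
'use strict';

// Hypothetical helper: turns an incoming request into the document we store.
// Only req.url is read, so a plain object can stand in for http.IncomingMessage.
const makeVisit = (req, now = new Date()) => {
    // req.url is the path plus any query string. Dropping the query means
    // '/css/grid?page=2' and '/css/grid' are recorded as the same page.
    // The base URL is only a placeholder needed to parse a relative path.
    const { pathname } = new URL(req.url, 'http://localhost');
    return { url: pathname, date: now };
};

const visit = makeVisit({ url: '/articles/nodejs-async-await?ref=home' }, new Date(0));
console.log(visit.url); // prints /articles/nodejs-async-await
```

In a real application this would run inside a request handler, just before the document is saved; that wiring is left out here.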

First, we create a text index on the url field of our collection (from the MongoDB shell):

db.visits.createIndex({ url: "text" })
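A text index does not store raw strings: MongoDB splits each URL into terms at punctuation such as / and -, and $text then compares whole (stemmed) terms rather than substrings. The function below is a deliberately rough approximation, not MongoDB's real tokenizer (it skips stemming and stop words and simplifies the delimiter rules), but it shows one practical consequence for our keywords: searching for node will not match a URL that contains nodejs.

```javascript
'use strict';

// Rough approximation of how a URL is split into searchable terms.
// Assumption: real MongoDB tokenization also stems words, removes
// language-specific stop words and uses Unicode delimiter classes.
const roughTerms = url =>
    url
        .toLowerCase()
        .split(/[^a-z0-9]+/) // treat every non-alphanumeric run as a delimiter
        .filter(term => term.length > 0);

// Whole-term comparison, as $text does, rather than substring matching
const roughMatch = (url, keyword) => roughTerms(url).includes(keyword.toLowerCase());

console.log(roughTerms('/articles/nodejs-mongodb-text-search'));
console.log(roughMatch('/articles/nodejs-mongodb-text-search', 'nodejs')); // true
console.log(roughMatch('/articles/nodejs-mongodb-text-search', 'node'));   // false
```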

We are interested in specific keywords that may occur within a page URL, so we need a function that performs a text search on our collection.

We also need to calculate the percentage of matching documents with respect to the total number of documents.

'use strict';

const mongoose = require('mongoose');

// Options such as useNewUrlParser are no longer needed in Mongoose 6+
mongoose.connect('mongodb://localhost:27017/data');

const visitSchema = new mongoose.Schema({
    url: String,
    date: Date
});

// Model 'Visit' maps to the 'visits' collection (Mongoose lowercases and pluralizes the name)
const Visit = mongoose.model('Visit', visitSchema);

// Integer percentage of x out of y; returns 0 instead of NaN when y is 0
const percentage = (x, y) => (y > 0 ? Math.floor((x / y) * 100) : 0);

// Count all visits and those whose URL matches the keyword through the text index.
// Errors are not caught here, so Promise.all() and the caller can handle them.
const getTopicsStats = async keyword => {
    const total = await Visit.countDocuments();
    const found = await Visit.countDocuments({ $text: { $search: keyword } });
    return {
        topic: keyword,
        found,
        percentage: percentage(found, total)
    };
};

Keywords or topics are usually part of our URLs, so here we count how many visits match each keyword with a single text search over the entire collection.
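To see what getTopicsStats computes without a running database, here is an in-memory sketch. A plain array of made-up visits stands in for the collection, and a simple whole-term check (no stemming) stands in for the $text query; the helper names are hypothetical.

```javascript
'use strict';

// Made-up sample hits standing in for the visits collection
const sampleVisits = [
    { url: '/articles/css-grid-layout' },
    { url: '/articles/javascript-closures' },
    { url: '/articles/nodejs-streams' },
    { url: '/articles/nodejs-async-await' },
    { url: '/articles/css-flexbox' },
    { url: '/about' }
];

// Stand-in for { $text: { $search: keyword } }: whole-term match, no stemming
const matches = (url, keyword) =>
    url.toLowerCase().split(/[^a-z0-9]+/).includes(keyword.toLowerCase());

const percentage = (x, y) => (y > 0 ? Math.floor((x / y) * 100) : 0);

// Same result shape as getTopicsStats, computed over the array
const topicStatsInMemory = (visits, keyword) => {
    const total = visits.length;
    const found = visits.filter(v => matches(v.url, keyword)).length;
    return { topic: keyword, found, percentage: percentage(found, total) };
};

console.log(topicStatsInMemory(sampleVisits, 'nodejs'));
// prints { topic: 'nodejs', found: 2, percentage: 33 }
```

Note the effect of Math.floor: 2 hits out of 6 visits give 33, not 33.33 or 34.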

Our function uses async/await because Mongoose queries are thenables: they are not real Promises, but they expose a then() method, so they can be awaited.
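The mechanism can be shown in isolation: await accepts any object with a then(onFulfilled, onRejected) method. The FakeQuery class below is a hypothetical stand-in, not Mongoose's actual Query class.

```javascript
'use strict';

// Hypothetical stand-in for a Mongoose Query: it is not a Promise,
// but it has a then() method, which makes it a "thenable".
class FakeQuery {
    constructor(result) {
        this.result = result;
    }

    // await calls then() with its own resolve/reject callbacks
    then(onFulfilled, onRejected) {
        return Promise.resolve(this.result).then(onFulfilled, onRejected);
    }
}

const main = async () => {
    const query = new FakeQuery(42);
    const count = await query; // allowed because query is a thenable
    console.log(count, query instanceof Promise); // prints 42 false
    return count;
};

main();
```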

Since an async function always returns a Promise, we can start all the searches at once and wait for them together with Promise.all() to perform batch searches.

const getAllTopicsStats = () => {
    let keywords = ['css', 'javascript', 'jquery', 'php', 'wordpress', 'nodejs'];
    let stats = [];
    for(let i = 0; i < keywords.length; i++) {
        stats.push(getTopicsStats(keywords[i]));
    }
    return Promise.all(stats);
};
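Two properties of Promise.all() matter here: it resolves with the results in the same order as its input, whichever search finishes first, and it rejects as soon as any one promise rejects. The sketch below uses a hypothetical fakeSearch with artificial delays in place of getTopicsStats, and also shows Promise.allSettled() (available since Node 12.9) as an alternative when one failing keyword should not discard the other results.

```javascript
'use strict';

// Hypothetical async search: resolves after ms milliseconds, or rejects for 'fail'
const fakeSearch = (keyword, ms) =>
    new Promise((resolve, reject) => {
        setTimeout(() => {
            if (keyword === 'fail') {
                reject(new Error(`search failed for ${keyword}`));
            } else {
                resolve({ topic: keyword, found: keyword.length });
            }
        }, ms);
    });

const runBatch = async () => {
    // 'css' finishes last, yet it stays first in the results array
    const results = await Promise.all([
        fakeSearch('css', 30),
        fakeSearch('php', 10),
        fakeSearch('nodejs', 20)
    ]);
    console.log(results.map(r => r.topic)); // prints [ 'css', 'php', 'nodejs' ]

    // allSettled waits for every promise and reports each outcome
    const settled = await Promise.allSettled([
        fakeSearch('jquery', 5),
        fakeSearch('fail', 5)
    ]);
    console.log(settled.map(s => s.status)); // prints [ 'fulfilled', 'rejected' ]
    return { results, settled };
};

runBatch();
```

Because each call starts its search immediately, the whole batch takes roughly as long as the slowest search rather than the sum of all of them.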

Finally, we can sort the results and display them with a single routine.

// Sort by number of hits, highest first; copy the array so the input is not mutated
const sortStats = stats => [...stats].sort((a, b) => b.found - a.found);

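const displayTopicsStats = async () => {
    try {
        const stats = await getAllTopicsStats();
        console.log(sortStats(stats));
    } catch (err) {
        console.error(err);
    } finally {
        // Close the connection so that the Node process can exit
        await mongoose.disconnect();
    }
};

displayTopicsStats();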
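Array.prototype.sort() sorts in place and mutates its input, so copying the array first is a safe habit. When two topics have the same number of hits, a tie-breaker keeps the output predictable. The alphabetical tie-breaker and console.table() below are illustrative choices, applied to made-up sample stats in the shape returned by getTopicsStats.

```javascript
'use strict';

// Made-up results (out of 120 visits) in the shape returned by getTopicsStats
const stats = [
    { topic: 'php', found: 12, percentage: 10 },
    { topic: 'css', found: 30, percentage: 25 },
    { topic: 'jquery', found: 12, percentage: 10 },
    { topic: 'nodejs', found: 45, percentage: 37 }
];

// Highest hit count first; ties broken alphabetically by topic.
// Spreading into a new array leaves stats untouched.
const sortStatsWithTieBreak = list =>
    [...list].sort((a, b) => b.found - a.found || a.topic.localeCompare(b.topic));

const sorted = sortStatsWithTieBreak(stats);
console.table(sorted); // tabular output in the terminal
console.log(stats[0].topic); // prints php (the original order is preserved)
```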

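If you want to extend these features a little further, you can also try collecting the User-Agent string from the corresponding HTTP header and saving it in a new field (say, userAgent). Keep in mind that a MongoDB collection can have at most one text index: rather than adding a second one, drop the existing index (db.visits.dropIndex("url_text"), using the default name MongoDB gives it) and create a compound text index that covers both fields, for example db.visits.createIndex({ url: "text", userAgent: "text" }). A $text search then looks at both fields, which lets you gather information about the various browsers used by your visitors.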
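A minimal sketch of this extension, assuming a hypothetical userAgent field filled from Node's request headers (Node lowercases incoming header names, hence 'user-agent'). The browser labels come from a deliberately naive pattern check on a made-up sample string; real User-Agent strings are messy, and a dedicated parser is a safer choice in production.

```javascript
'use strict';

// Hypothetical helper: builds the extended visit document
const makeVisitWithAgent = (req, now = new Date()) => ({
    url: req.url,
    date: now,
    userAgent: req.headers['user-agent'] || 'unknown'
});

// Naive browser family detection; order matters because Edge UAs also
// mention Chrome and Safari, and Chrome UAs also mention Safari.
const browserFamily = ua => {
    if (/Edg\//.test(ua)) return 'Edge';
    if (/Firefox\//.test(ua)) return 'Firefox';
    if (/Chrome\//.test(ua)) return 'Chrome';
    if (/Safari\//.test(ua)) return 'Safari';
    return 'Other';
};

// Made-up request object standing in for http.IncomingMessage
const req = {
    url: '/articles/nodejs-streams',
    headers: {
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    }
};

const visit = makeVisitWithAgent(req, new Date(0));
console.log(browserFamily(visit.userAgent)); // prints Chrome
```

The ordering problem also affects the text-index approach: a $text search for chrome would count Edge visits as well, because Edge user agents contain the Chrome token too.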
