How to export data from WordPress to MongoDB

Exporting data from WordPress to MongoDB is a relatively simple task.

WordPress comes with a consistent, well-structured MySQL database. Its tables are described on the "Database Description" page of the WordPress documentation.

Our target database is MongoDB, a NoSQL database that stores data as JSON-like documents (internally in a binary format called BSON). The quickest way to export data from WordPress is to run custom Loops with the WP_Query class and serialize the results as JSON.

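First, however, we need to create our MongoDB database. In the mongo shell, switch to it. MongoDB creates the database as soon as data is first stored in it: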

use myblog;

Then create a user with the appropriate permissions:

db.createUser(
  {
    user: "username",
    pwd: "password",
    roles: [ "readWrite", "dbAdmin" ]
  }
);

And create our first collection:

db.createCollection('posts');

I recommend enabling authentication when your site goes live. During the migration, however, you can leave it off so you don't have to pass credentials to the mongoimport tool on every run.

Next we define a document structure for our posts. MongoDB does not enforce a schema, but a consistent structure could be:

{
  "title": String,
  "date": Date,
  "content": String,
  "excerpt": String,
  "slug": String,
  "thumb": String,
  "category": Array,
  "tag": Array
}

As you can see, the structure maps directly onto WordPress data. The post title comes from get_the_title(), the content from get_the_content(), and the category list from get_the_category(), and so on. In MongoDB the category list is stored as an array of objects.
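To make the mapping concrete, here is a small Node.js sketch of the same transformation the PHP template performs. It takes a plain object standing in for a WordPress post and produces a document with the shape above. For each term it keeps only the name, slug and description. The input field names (`title`, `categories`, `tags`, and so on) are hypothetical and are not a WordPress API.

```javascript
// Keep only the term fields we store in MongoDB. WordPress term objects
// carry more, such as term_id and count.
function toTermSubdoc(term) {
  return { name: term.name, slug: term.slug, description: term.description };
}

// Build a document matching the posts structure from a hypothetical
// post-like object. `categories` and `tags` are arrays of term objects.
function toPostDocument(post) {
  return {
    title: post.title,
    date: post.date,
    content: post.content,
    excerpt: post.excerpt,
    slug: post.slug,
    thumb: post.thumb || "",
    category: (post.categories || []).map(toTermSubdoc),
    tag: (post.tags || []).map(toTermSubdoc),
  };
}

const doc = toPostDocument({
  title: "Hello world",
  date: "2015-03-04T12:05:00+00:00",
  content: "<p>Welcome.</p>",
  excerpt: "Welcome.",
  slug: "hello-world",
  categories: [{ term_id: 1, name: "News", slug: "news", description: "", count: 3 }],
});
console.log(JSON.stringify(doc));
```

Extra term properties such as `term_id` and `count` are dropped, so each category ends up as a small embedded object.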

On our WordPress site we need a hidden page that runs the export code. To set it up:

- add the template below to your current theme;
- create a page that uses the template;
- load that page once.

If you have a lot of posts, I recommend exporting them in batches using posts_per_page and offset when creating the Loop. This is the page template:

<?php /* Template Name: Export */
get_header();

// Only administrators can trigger the export.
if ( current_user_can( 'manage_options' ) ) {
    $export_file = 'posts.json';
    $upload_dir = wp_upload_dir();
    $base_upload_url = $upload_dir['baseurl'] . '/';
    $loop = new WP_Query( array( 'post_type' => 'post', 'posts_per_page' => -1 ) );
    $data = array();

    while ( $loop->have_posts() ) :
        $loop->the_post();
        $id = get_the_ID();

        // Featured image path relative to the uploads directory ('' if none).
        $thumb = wp_get_attachment_image_src( get_post_thumbnail_id( $id ), 'full' );
        $image = $thumb ? str_replace( $base_upload_url, '/uploads/', $thumb[0] ) : '';

        // Keep only name, slug and description for each category.
        $categories = array();
        $cats = get_the_category( $id );
        if ( ! empty( $cats ) ) {
            foreach ( $cats as $cat ) {
                $categories[] = array(
                    'name'        => $cat->name,
                    'slug'        => $cat->slug,
                    'description' => $cat->description
                );
            }
        }

        // get_the_tags() returns false (or a WP_Error) when there is nothing to list.
        $post_tags = array();
        $tags = get_the_tags( $id );
        if ( $tags && ! is_wp_error( $tags ) ) {
            foreach ( $tags as $tag ) {
                $post_tags[] = array(
                    'name'        => $tag->name,
                    'slug'        => $tag->slug,
                    'description' => $tag->description
                );
            }
        }

        $data[] = array(
            'title'    => get_the_title(),
            // ISO 8601 in UTC, e.g. 2015-03-04T12:05:00+00:00
            'date'     => get_post_time( 'c', true, $id ),
            'content'  => get_the_content(),
            'excerpt'  => strip_tags( get_the_excerpt() ),
            'slug'     => get_post_field( 'post_name', $id ),
            'thumb'    => $image,
            'category' => $categories,
            'tag'      => $post_tags
        );
    endwhile;
    wp_reset_postdata();

    // Write the file to the active theme's directory (it must be writable).
    file_put_contents( get_stylesheet_directory() . '/' . $export_file, json_encode( $data ) );
}

get_footer();
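The template above fetches every post in one query. On a large site you can instead run the export several times, each run with a different posts_per_page/offset pair. The Node.js sketch below uses a hypothetical helper, not part of WordPress, that only computes those pairs for a given post count. You could take that count from something like `wp_count_posts()->publish`.

```javascript
// Hypothetical helper: split `total` posts into batches of `perPage` and
// return the WP_Query arguments (posts_per_page, offset) for each run.
function exportBatches(total, perPage) {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError("total must be a non-negative integer");
  }
  if (!Number.isInteger(perPage) || perPage <= 0) {
    throw new RangeError("perPage must be a positive integer");
  }
  const batches = [];
  for (let offset = 0; offset < total; offset += perPage) {
    batches.push({ posts_per_page: perPage, offset: offset });
  }
  return batches;
}

console.log(exportBatches(250, 100));
```

If you batch the export, keep two things in mind:

- Give the query a stable order, such as `'orderby' => 'ID', 'order' => 'ASC'`, so batches neither overlap nor skip posts.
- Write each run to its own file (names like posts-0.json and posts-100.json are only illustrative). Then import each file with mongoimport.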

If everything works, loading that page writes a posts.json file to your current theme's directory.

Now you can import your posts into MongoDB:

mongoimport -d myblog -c posts --file posts.json --jsonArray

mongoimport stores the date field as a plain string, so we need to convert it to a proper Date:

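use myblog
// save() is deprecated; update only the field we change, and skip
// documents whose date is already a Date.
db.posts.find({ date: { $type: "string" } }).forEach(function (post) {
  db.posts.updateOne({ _id: post._id }, { $set: { date: ISODate(post.date) } });
});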

The procedure outlined above is almost identical for custom post types and pages. The difference is that tags and categories may be replaced by custom taxonomies, or dropped altogether for static pages, which have neither by default.
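For custom taxonomies, one option is to store each taxonomy as its own array field on the document. This is an assumption on my part; WordPress doesn't prescribe it. In PHP you could collect the terms with `wp_get_post_terms( $id, array( 'genre', 'format' ) )`, where `genre` and `format` are hypothetical taxonomy names. The Node.js sketch below shows only the grouping step. Its input is hypothetical plain objects that carry each term's taxonomy, name, slug and description.

```javascript
// Group a post's terms into one array field per taxonomy, keeping the
// same { name, slug, description } shape used for categories and tags.
// The input objects ({ taxonomy, name, slug, description }) are hypothetical.
function termsByTaxonomy(terms) {
  const fields = {};
  for (const term of terms) {
    if (!Object.prototype.hasOwnProperty.call(fields, term.taxonomy)) {
      fields[term.taxonomy] = [];
    }
    fields[term.taxonomy].push({
      name: term.name,
      slug: term.slug,
      description: term.description,
    });
  }
  return fields;
}

const fields = termsByTaxonomy([
  { taxonomy: "genre", name: "Jazz", slug: "jazz", description: "" },
  { taxonomy: "format", name: "Vinyl", slug: "vinyl", description: "" },
  { taxonomy: "genre", name: "Blues", slug: "blues", description: "" },
]);
console.log(JSON.stringify(fields));
```

The resulting fields would take the place of category and tag in the exported document.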

Consider creating separate collections for content types other than standard posts.

Attachments are a good example. WordPress stores images, documents, videos and other files under the /wp-content/uploads/ directory. By default it places them in subfolders named after the year and month of the upload (e.g. /wp-content/uploads/2015/03/). That's why the export code replaces the base uploads URL in the featured image URL with /uploads/, keeping only the year/month path.
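Here is a Node.js sketch of that URL rewrite. It also reads the year and month folders when they are present. Unlike PHP's str_replace(), which replaces the base URL anywhere in the string, this variant only strips it as a prefix. The function name and return shape are my own assumptions.

```javascript
// Turn an absolute upload URL into a site-relative /uploads/ path and read
// the year/month folders WordPress adds when date-based folders are enabled.
// URLs outside the uploads directory are returned unchanged.
function toUploadsPath(url, baseUploadUrl) {
  const base = baseUploadUrl.endsWith("/") ? baseUploadUrl : baseUploadUrl + "/";
  if (!url.startsWith(base)) {
    return { path: url, year: null, month: null };
  }
  const relative = url.slice(base.length);
  const match = /^(\d{4})\/(\d{2})\//.exec(relative);
  return {
    path: "/uploads/" + relative,
    year: match ? Number(match[1]) : null,
    month: match ? Number(match[2]) : null,
  };
}

console.log(toUploadsPath(
  "https://example.com/wp-content/uploads/2015/03/photo.jpg",
  "https://example.com/wp-content/uploads"
));
```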

We can create the following collection:

db.createCollection('media');

With the following basic document structure:

{
  "date": Date,
  "type": String,
  "url": String
}

Then we replace the export code in our template with the following:

$export_file = 'media.json';
$upload_dir = wp_upload_dir();
$base_upload_url = $upload_dir['baseurl'] . '/';
$attachments = get_posts( array(
    'post_type'      => 'attachment',
    'posts_per_page' => -1,
    'post_status'    => 'any',  // attachments use the 'inherit' status
    'post_parent'    => null    // include files not attached to any post
) );
$data = array();

foreach ( $attachments as $attachment ) {
    // Read the ID from the attachment object itself: get_the_ID() would
    // return the ID of the export page here, not the attachment's.
    $id = $attachment->ID;
    $att_url = wp_get_attachment_url( $id );

    $data[] = array(
        'date' => get_post_time( 'c', true, $id ),  // ISO 8601, UTC
        'type' => $attachment->post_mime_type,
        'url'  => str_replace( $base_upload_url, '/uploads/', $att_url )
    );
}

file_put_contents( get_stylesheet_directory() . '/' . $export_file, json_encode( $data ) );

We can now import our data as shown earlier:

mongoimport -d myblog -c media --file media.json --jsonArray

Again, mongoimport stores the date field as a string, so we convert it:

use myblog
db.media.find({ date: { $type: "string" } }).forEach(function (m) {
  db.media.updateOne({ _id: m._id }, { $set: { date: ISODate(m.date) } });
});

Once you understand the overall export process, none of the individual steps is difficult.
