After much sweat and frustration, I've finally figured out how to tweak my tried-and-true Drupal import script to work with Drupal 6.

I often find myself creating Drupal sites for groups that have either an existing web site (in another content management system) or a library of files that they want to import into their new site.

Drupal's bootstrap function and node API make it really easy to write a script that can be run from the command line to handle the import.

Here are the key components for making it work with Drupal 6.

The import script has to be located in the web directory. Since I don't want people to run it from a web session, accidentally or on purpose, I include the following check:

    // prevent this from running under apache:
    if (array_key_exists('REQUEST_METHOD', $_SERVER)) {
        echo 'nope.  not executing except from the command line.';
        exit(1);
    }
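
As a variation on the check above, here is a minimal sketch that also asks PHP which interface it is running under. The function name `is_command_line` is my own invention for illustration and is not part of Drupal; `php_sapi_name()` is a standard PHP function.

```php
<?php
// Sketch: decide whether the script was started from the command line.
// is_command_line() is a hypothetical helper, not a Drupal function.
function is_command_line(array $server) {
    // A web server sets REQUEST_METHOD for every request it handles.
    if (array_key_exists('REQUEST_METHOD', $server)) {
        return false;
    }
    // php_sapi_name() returns 'cli' for the command-line interpreter.
    return php_sapi_name() === 'cli';
}

if (!is_command_line($_SERVER)) {
    echo "nope. not executing except from the command line.\n";
    exit(1);
}
```

Checking both conditions is a belt-and-braces choice; either one alone is probably enough for most setups.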

Unfortunately, Drupal's bootstrap function will complain if it doesn't detect the HTTP_HOST variable, so I set it here. On a single-site install it doesn't matter what the value is (on a multi-site install, Drupal uses it to pick the directory under sites/):

    // set HTTP_HOST or drupal will refuse to bootstrap
    $_SERVER['HTTP_HOST'] = 'example.org';

Next comes the bootstrap call, which brings in all the Drupal libraries:

    include_once 'includes/bootstrap.inc';
    drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

Although not strictly necessary, some modules require a $user variable, so it makes sense to create one. Setting the user ID to 1 guarantees that you'll have access to do what you need to do. Depending on what you're doing, you might want to use a user with fewer privileges so that a bug can't destroy all your data.

    global $user;
    $user = user_load(1);

Here is the basic node creation:

    $node = new stdClass();
    $node->type = 'story';
    $node->status = 1;
    $node->uid = 1;
    $node->title = 'My Title';
    $node->body = 'My body';
    $node->created = time();
    $node->changed = $node->created;
    $node->promote = 1;
    $node->sticky = 0;
    $node->format = 1;
    $node->language = 'en';

Here's an example of a CCK field:

    $node->field_date = array(
        0 => array(
         'value' => '2009-02-09T00:00:00',
        ),
    );
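
If the date comes from a Unix timestamp (a file's modification time, say), a small helper can build that array. This is only a sketch: `cck_date_value` is a hypothetical name of mine, and it assumes the field stores ISO-style strings with the time zeroed out, as in the example above.

```php
<?php
// Hypothetical helper: build the value array for a CCK date field
// from a Unix timestamp, keeping only the calendar day.
function cck_date_value($timestamp) {
    return array(
        0 => array(
            // \T escapes the letter T so date() prints it literally
            'value' => date('Y-m-d\T00:00:00', $timestamp),
        ),
    );
}

// Example: use this script's own modification time as the field value.
$field_date = cck_date_value(filemtime(__FILE__));
echo $field_date[0]['value'], "\n";
```

Note that `date()` uses PHP's configured time zone, so a file modified close to midnight can land on a different day depending on that setting.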

And at last, here's the elusive file attachment code:

    $file = '/path/to/your/file.odt';

    // Get the file size
    $details = stat($file);
    $filesize = $details['size'];

    // Get the path to your Drupal site's files directory
    $dest = file_directory_path();

    // Copy the file to the Drupal files directory
    if (!file_copy($file, $dest)) {
        echo "Failed to copy file: $file.\n";
        return;
    } else {
        // file_copy might rename the file; on success it updates $file
        // (passed by reference) to the path of the copy
        $name = basename($file);
    }

    // Build the file object
    $file_obj = new stdClass();
    $file_obj->filename = $name;
    $file_obj->filepath = $file;
    $file_obj->filemime =  file_get_mimetype($name);
    $file_obj->filesize = $filesize;
    $file_obj->filesource = $name;
    // You can change this to the UID you want
    $file_obj->uid = 1;
    $file_obj->status = FILE_STATUS_TEMPORARY;
    $file_obj->timestamp = time();
    $file_obj->list = 1;
    $file_obj->new = true;

    // Save file to files table
    drupal_write_record('files', $file_obj);

    // Change file status to permanent
    file_set_status($file_obj, FILE_STATUS_PERMANENT);

    // Attach the file object to your node
    $node->files[$file_obj->fid] = $file_obj;
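
One detail in the code above is easy to miss: Drupal 6's `file_copy()` takes its source argument by reference and, on success, overwrites it with the path of the copy. That is why `basename($file)` after the copy gives the name the file actually ended up with. The sketch below imitates that pattern in plain PHP so you can try it outside Drupal. `copy_with_rename` is a hypothetical stand-in of mine, not Drupal's implementation, and its renaming scheme (`_0`, `_1`, ...) is an assumption.

```php
<?php
// Hypothetical stand-in for illustration only; this is NOT Drupal's file_copy().
// It copies $source into $dest_dir, picks a free name if the name is taken,
// and overwrites $source (passed by reference) with the path of the copy.
function copy_with_rename(&$source, $dest_dir) {
    $info = pathinfo($source);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    $candidate = $dest_dir . '/' . $info['basename'];
    $counter = 0;
    // Keep trying name_0.ext, name_1.ext, ... until one is free.
    while (file_exists($candidate)) {
        $candidate = $dest_dir . '/' . $info['filename'] . '_' . $counter++ . $ext;
    }
    if (!copy($source, $candidate)) {
        return false;
    }
    // The caller's variable now points at the copy.
    $source = $candidate;
    return true;
}
```

After a successful call, the variable you passed in holds the new path, so anything you derive from it afterwards (the file name, the path stored in the file object) reflects the copy rather than the original.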

Lastly, save the node:

    node_save($node);
    echo "Saved node: $node->nid\n";

That's it. Below I've copied a real-life working version that takes all the files in a given directory and creates a node for each one. The node's title is the name of the file, the file's modification date is entered as a CCK date field, and the body of the node is a text version of the document (if it's a pdf, doc, or wpd).

    <?php
    /* 
     * This script is used to manually import files 
     *
     */

    // edit the following two lines
    // set the path where the files you want to import exist.
    $target = '../import-files-from-mbox/files';

    // what user id should the files be imported as?
    $uid = 1;

    // prevent this from running under apache:
    if (array_key_exists('REQUEST_METHOD', $_SERVER)) {
        echo 'nope.  not executing except from the command line.';
        exit(1);
    }

    // set HTTP_HOST or drupal will refuse to bootstrap
    $_SERVER['HTTP_HOST'] = 'example.org';
    include_once 'includes/bootstrap.inc';
    drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

    // load the user that will be the author of the files/nodes
    // created

    global $user;
    $user = user_load($uid);

    // iterate over directory
    $d = dir($target);
    while (false !== ($name = $d->read())) {
        if($name != '.' && $name != '..') {
            // file_validate_name_length() expects a file object rather than
            // a string, so check the length directly (the limit is 255)
            if(strlen($name) > 255) {
                echo "Invalid name length, skipping: $name\n";
                continue;
            }

            // set some defaults for the file we will be importing
            $file = "$target/$name";
            $details = stat($file);
            $filesize = $details['size'];
            $mtime = $details['mtime'];
            $date_value = date('Y-m-d\T00:00:00',$mtime);

            // create the node object
            $node = new stdClass();
            $node->type = 'lib_item';
            $node->status = 1;
            $node->uid = $uid;
            $node->title = $name;
            $node->body = extract_body($file);
            $node->created = time();
            $node->changed = $node->created;
            $node->promote = 1;
            $node->sticky = 0;
            $node->format = 1;
            $node->language = 'en';

            // custom node fields
            $node->field_date = array(
                0 => array(
                 'value' => $date_value,
                ),
            );  

            // handle the file upload
            $dest = file_directory_path();
            // copy the file to the files directory
            if(!file_copy($file,$dest)) {
                echo "Failed to copy file: $file\n";
                continue;
            } else {
                // file_copy might rename the file; on success it updates
                // $file (passed by reference) to the path of the copy
                $name = basename($file);
            }

            // build file object
            $file_obj = new stdClass();
            $file_obj->filename = $name;
            $file_obj->filepath = $file;
            $file_obj->filemime =  file_get_mimetype($name);
            $file_obj->filesize = $filesize;
            $file_obj->filesource = $name;
            $file_obj->uid = $uid;
            $file_obj->status = FILE_STATUS_TEMPORARY;
            $file_obj->timestamp = time();
            $file_obj->list = 1;
            $file_obj->new = true;

            // save file to database
            drupal_write_record('files', $file_obj);

            // change file status to permanent (default is temporary)
            file_set_status($file_obj, FILE_STATUS_PERMANENT);

            $node->files[$file_obj->fid] = $file_obj;
            node_save($node);
            echo "Saved node: $node->nid\n";
        }
    }

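    // return a plain text version of the file, or '' if the file type
    // is not supported or the conversion fails
    function extract_body($name) {
        $pos = strrpos($name,'.');
        $ext = strtolower(substr($name,$pos+1));
        $cmd = '';
        // extra arguments placed after the file name
        $args = '';
        if($ext == 'doc') {
            $cmd = 'antiword';
        } elseif($ext == 'pdf') {
            $cmd = 'pdftotext';
            // without '-' pdftotext writes to a .txt file, not to stdout
            $args = ' -';
        } elseif($ext == 'wpd') {
            $cmd = 'wpd2text';
        } else {
            return '';
        }
        exec(escapeshellcmd($cmd) . ' ' . escapeshellarg($name) . $args,$ret,$error);
        if($error != 0) return '';
        return implode("\n",$ret);
    }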
    ?>
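
If you need to support more file types, the if/elseif chain in `extract_body()` can get long. One possible alternative, sketched here under my own naming (`converter_for` is hypothetical, not part of the script above), is a lookup table keyed by extension:

```php
<?php
// Hypothetical helper: map a file's extension to the command-line
// converter used for it, or null if the type is not supported.
function converter_for($path) {
    $converters = array(
        'doc' => 'antiword',
        'pdf' => 'pdftotext',
        'wpd' => 'wpd2text',
    );
    // pathinfo() returns '' when the path has no extension
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return isset($converters[$ext]) ? $converters[$ext] : null;
}
```

Adding a format is then a one-line change to the array, though any converter that needs extra arguments would still have to be handled separately.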