Measuring SQLite insertion performance with C++ code

I did a quick Google search on “SQLite performance” and found the following:

Then to benchmark SQLite performance by myself I used the following C++ code that inserts 1000 batches of 1000 000 rows to a single table with an integer primary key:

#include <iostream>
#include "sqlite3.h"
#include "Awl/Exception.h"
#include "Awl/StopWatch.h"

static int callback(void *NotUsed, int argc, char **argv, char **azColName)
{
    return 0;
}

int main()
{
    sqlite3 *db;
    int rc;

    rc = sqlite3_open("familyGuy.db", &db);

    if (rc)
    {
        std::cout << "Can't open database: " << sqlite3_errmsg(db) << "\n";
        return 1;
    }
    else
    {
        std::cout << "Open database successfully\n\n";
    }

    auto exec = [&db](const char * query)
    {
        char *zErrMsg = nullptr;
        
        int rc = sqlite3_exec(db, query, callback, 0, &zErrMsg);
        
        if (rc != SQLITE_OK)
        {
            std::cout << "SQL error: " << sqlite3_errmsg(db) << "\n";
            awl::GeneralException e(zErrMsg);
            sqlite3_free(zErrMsg);
            throw e;
        }
    };
    
    exec("create table myTable (id INTEGER PRIMARY KEY AUTOINCREMENT, FirstName varchar(30), LastName varchar(30), Age smallint, Hometown varchar(30), Job varchar(30))");

    auto insertBatch = [&db, &exec]()
    {
        exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Peter', 'Griffin', 41, 'Quahog', 'Brewery')");
        exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Lois', 'Griffin', 40, 'Newport', 'Piano Teacher')");
        exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Joseph', 'Swanson', 39, 'Quahog', 'Police Officer')");
        exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Glenn', 'Quagmire', 41, 'Quahog', 'Pilot')");
        exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Vasya', 'Nastroykegulyal', 11, 'Partizansk', 'Gopnik')");
    };

    awl::StopWatch sw;
    
    for (size_t j = 0; j < 1000; ++j)
    {
        exec("BEGIN;");
        
        for (size_t i = 0; i < 200 * 1000; ++i)
        {
            insertBatch();
        }

        exec("COMMIT;");

        std::cout << j << ": " << sw.GetElapsedSeconds<int>() << "s, ";

        sw.Reset();
    }

    sqlite3_close(db);

    return 0;
}

With that code the performance did not degraded, the batch insertion time was always 4 seconds and the console output was:

0: 5s, 1: 4s, 2: 4s, 3: 4s, ... 651: 4s, 652: 4s, ... 997: 4s, 998: 4s, 999: 4s

at row #700 000 000 (batch #700) the database size was 30GB and at row #1000 000 000 (batch #1000) the database size was 43GB.

When the table was filled with 1000 000 000 rows

select * from myTable

took a very long time (probably 10 minutes or more), so looks like we should use LIMIT in real-life applications. Deletion all the rows from the table also took significant time.

To do more tests I complicated the insert queries a bit with the following code:

std::uniform_int_distribution<size_t> first_name_dist(0, 1000);
std::uniform_int_distribution<size_t> last_name_dist(0, 100000);

......................

    auto insert = [&db, &exec](const char * f, const char * l, int age, const char * home_town, const char * job)
    {
        std::string query(awl::format() << "insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('" << 
            f << first_name_dist(awl::random()) << "', '" << 
            l << last_name_dist(awl::random()) << "', " <<
            age << ", '" << 
            home_town << "', '" << 
            job << "');");

        exec(query.c_str());
    };
    
    auto insertBatch = [&db, &exec, &insert]()
    {
        insert("Peter", "Griffin", 41, "Newport", "Brewery");
        insert("Lois", "Griffin", 40, "Quahog", "Piano Teacher");
        insert("Joseph", "Swanson", 39, "Quahog", "Police Officer");
        insert("Glenn", "Quagmire", 41, "Quahog", "Pyaniy Pilot");
        insert("Vasya", "Nastroykegulyal", 11, "Partizansk", "Gopnik");
    };

but the string formatting in this code added about 5 second to the batch insertion time (more than doubled the insertion time), and I got the following output:

0: 10s, 1: 9s, 2: 9s, 3: 9s, 4: 10s, 5: 9s ...

so we need some kind of optimization for building the queries and use parameters, but for testing this code is fine.

I was really surprised when I added indices for FirstName and LastName columns with the following code:

exec("CREATE INDEX i_fn ON myTable(FirstName)");
exec("CREATE INDEX i_ln ON myTable(LastName)");

after that the performance dropped more than ten times and started to degrade from the batch to batch:

0: 61s, 1: 79s, 2: 86s, 3: 93s, 4: 96s, 5: 97s, 6: 101s, 7: 108s, ...

with “PRAGMA synchronous = OFF;”:

0: 52s, 1: 78s, 2: 83s, 3: 87s, 4: 85s, 5: 89s, 6: 99s, 7: 101s, 8: 95s, 9: 101s, 10: 111s, ... 15: 141s, 16: 153s ...

so, I afraid, the indices are the bottleneck that makes SQLite unusable for our application.

Below I provided the full source code for testing the indices:

#include <iostream>
#include "sqlite3.h"
#include "Awl/Exception.h"
#include "Awl/StopWatch.h"
#include "Awl/Random.h"
#include "Awl/StringFormat.h"

static int callback(void *NotUsed, int argc, char **argv, char **azColName)
{
    return 0;
}

std::uniform_int_distribution<size_t> first_name_dist(0, 1000);
std::uniform_int_distribution<size_t> last_name_dist(0, 100000);

int main()
{
    sqlite3 *db;
    int rc;

    rc = sqlite3_open("familyGuy.db", &db);

    if (rc)
    {
        std::cout << "Can't open database: " << sqlite3_errmsg(db) << "\n";
        return 1;
    }
    else
    {
        std::cout << "Open database successfully\n\n";
    }

    exec("PRAGMA synchronous = OFF;");

    auto exec = [&db](const char * query)
    {
        char *zErrMsg = nullptr;
        
        int rc = sqlite3_exec(db, query, callback, 0, &zErrMsg);
        
        if (rc != SQLITE_OK)
        {
            std::cout << "SQL error: " << sqlite3_errmsg(db) << "\n";
            awl::GeneralException e(zErrMsg);
            sqlite3_free(zErrMsg);
            throw e;
        }
    };
    
    exec("create table myTable (id INTEGER PRIMARY KEY AUTOINCREMENT, FirstName varchar(30), LastName varchar(30), Age smallint, Hometown varchar(30), Job varchar(30))");

    exec("CREATE INDEX i_fn ON myTable(FirstName)");
    exec("CREATE INDEX i_ln ON myTable(LastName)");

    auto insert = [&db, &exec](const char * f, const char * l, int age, const char * home_town, const char * job)
    {
        std::string query(awl::format() << "insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('" << 
            f << first_name_dist(awl::random()) << "', '" << 
            l << last_name_dist(awl::random()) << "', " <<
            age << ", '" << 
            home_town << "', '" << 
            job << "');");

        exec(query.c_str());
    };
    
    auto insertBatch = [&db, &exec, &insert]()
    {
        insert("Peter", "Griffin", 41, "Newport", "Brewery");
        insert("Lois", "Griffin", 40, "Quahog", "Piano Teacher");
        insert("Joseph", "Swanson", 39, "Quahog", "Police Officer");
        insert("Glenn", "Quagmire", 41, "Quahog", "Pyaniy Pilot");
        insert("Vasya", "Nastroykegulyal", 11, "Partizansk", "Gopnik");
    };

    awl::StopWatch sw;
    
    for (size_t j = 0; j < 1000; ++j)
    {
        exec("BEGIN;");
        
        for (size_t i = 0; i < 200 * 1000; ++i)
        {
            insertBatch();
        }

        exec("COMMIT;");

        std::cout << j << ": " << sw.GetElapsedSeconds<int>() << "s, ";

        sw.Reset();
    }

    exec("select * from myTable");

    exec("delete from myTable");

    exec("drop table myTable");

    sqlite3_close(db);

    return 0;
}

With an integer index

exec("CREATE INDEX i_ln ON myTable(Age)");

the performance is close to the results with the string indices provided above:

0: 24s, 1: 32s, 2: 34s, 3: 36s, 4: 36s, 5: 39s, 6: 41s, 7: 43s, 8: 42s, 9: 45s, 10: 46s, 11: 48s, 12: 50s, 13: 50s, 14: 49s, 15: 50s, 16: 49s, 17: 56s, 18: 70s, 19: 65s, 20: 65s, 21: 71s, 22: 76s, 23: 78s, 24: 78s, 25: 85s, 26: 85s, 27: 89s, 28: 86s, 29: 88s, 30: 85s, 31: 87s, 32: 89s, 33: 106s, 34: 98s, 35: 97s, 36: 100s, 37: 97s, 38: 105s, 39: 120s, 40: 103s, 41: 110s, 42: 109s, 43: 108s, 44: 129s, 45: 129s, 46: 130s, 47: 118s, 48: 114s, 49: 118s, 50: 117s, 51: 113s, 52: 108s, 53: 117s, 54: 123s, 55: 119s, 56: 122s, 57: 123s, 58: 128s, 59: 133s, 60: 122s, 61: 124s, 62: 135s, 63: 142s, 64: 175s, 65: 181s, 66: 162s, 67: 132s, 68: 126s, 69: 124s, 70: 122s, 71: 136s, 72: 135s, 73: 148s, 74: 179s, 75: 177s, 76: 202s, 77: 179s, 78: 142s, 79: 146s, 80: 127s, 81: 122s, 82: 122s, 83: 121s, 84: 135s, 85: 134s, 86: 139s, 87: 158s, 88: 129s, 89: 130s, 90: 130s, 91: 131s, 92: 181s, 93: 216s, 94: 150s, 95: 168s, 96: 168s, 97: 162s, 98: 135s, 99: 128s, 100: 129s, 101: 126s, 102: 117s, 103: 116s, 104: 116s, 105: 118s, 106: 108s, 107: 112s, 108: 114s, 109: 107s, 110: 105s, 111: 102s, 112: 103s, 113: 120s, 114: 122s, 115: 121s, 116: 127s, 117: 129s, 118: 163s, 119: 154s, 120: 167s, 121: 148s, 122: 130s, 123: 144s, 124: 158s, 125: 169s, 126: 186s, 127: 146s, 128: 149s, 129: 139s, 130: 120s, 131: 116s, 132: 124s, 133: 115s, 134: 115s, 135: 114s, 136: 112s, 137: 114s, 138: 130s, 139: 121s, 140: 111s, 141: 114s, 142: 128s, 143: 120s, 144: 114s, 145: 120s, 146: 120s, 147: 121s, 148: 125s, 149: 123s, 150: 119s, 151: 113s, 152: 112s, 153: 111s, 154: 109s, 155: 125s, 156: 115s, 157: 114s, 158: 114s, 159: 111s, 160: 113s, 161: 114s, 162: 118s, ...

The links on how to optimize SQLite:

I compiled C++ code with SQLite sources version 3.29.0 (sqlite-amalgamation-3290000.zip) and MS Visual Studio 2017.

Leave a Reply

Your email address will not be published. Required fields are marked *