I did a quick Google search on “SQLite performance” and found the following:
- What are the performance characteristics of sqlite with very large database files? (The performance degrades significantly when the database size riches 7GB).
- Some Russian article on habr.com. (To boost the insertion we insert by batches in a separate transactions, set synchronization mode to OFF or NORMAL and probably disable indices).
- Some old article on Database Speed Comparison.
- Appropriate Uses For SQLite (An SQLite database is limited in size to 140 terabytes)
- Faster bulk inserts in sqlite3? (You can also try tweaking a few parameters to get extra speed out of it. Specifically you probably want PRAGMA synchronous = OFF;)
- A trivial C++ example.
Then to benchmark SQLite performance by myself I used the following C++ code that inserts 1000 batches of 1000 000 rows to a single table with an integer primary key:
#include <iostream>
#include "sqlite3.h"
#include "Awl/Exception.h"
#include "Awl/StopWatch.h"
static int callback(void *NotUsed, int argc, char **argv, char **azColName)
{
return 0;
}
int main()
{
sqlite3 *db;
int rc;
rc = sqlite3_open("familyGuy.db", &db);
if (rc)
{
std::cout << "Can't open database: " << sqlite3_errmsg(db) << "\n";
return 1;
}
else
{
std::cout << "Open database successfully\n\n";
}
auto exec = [&db](const char * query)
{
char *zErrMsg = nullptr;
int rc = sqlite3_exec(db, query, callback, 0, &zErrMsg);
if (rc != SQLITE_OK)
{
std::cout << "SQL error: " << sqlite3_errmsg(db) << "\n";
awl::GeneralException e(zErrMsg);
sqlite3_free(zErrMsg);
throw e;
}
};
exec("create table myTable (id INTEGER PRIMARY KEY AUTOINCREMENT, FirstName varchar(30), LastName varchar(30), Age smallint, Hometown varchar(30), Job varchar(30))");
auto insertBatch = [&db, &exec]()
{
exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Peter', 'Griffin', 41, 'Quahog', 'Brewery')");
exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Lois', 'Griffin', 40, 'Newport', 'Piano Teacher')");
exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Joseph', 'Swanson', 39, 'Quahog', 'Police Officer')");
exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Glenn', 'Quagmire', 41, 'Quahog', 'Pilot')");
exec("insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('Vasya', 'Nastroykegulyal', 11, 'Partizansk', 'Gopnik')");
};
awl::StopWatch sw;
for (size_t j = 0; j < 1000; ++j)
{
exec("BEGIN;");
for (size_t i = 0; i < 200 * 1000; ++i)
{
insertBatch();
}
exec("COMMIT;");
std::cout << j << ": " << sw.GetElapsedSeconds<int>() << "s, ";
sw.Reset();
}
sqlite3_close(db);
return 0;
}
With that code the performance did not degraded, the batch insertion time was always 4 seconds and the console output was:
0: 5s, 1: 4s, 2: 4s, 3: 4s, ... 651: 4s, 652: 4s, ... 997: 4s, 998: 4s, 999: 4s
at row #700 000 000 (batch #700) the database size was 30GB and at row #1000 000 000 (batch #1000) the database size was 43GB.
When the table was filled with 1000 000 000 rows
select * from myTable
took a very long time (probably 10 minutes or more), so looks like we should use LIMIT in real-life applications. Deletion all the rows from the table also took significant time.
To do more tests I complicated the insert queries a bit with the following code:
std::uniform_int_distribution<size_t> first_name_dist(0, 1000);
std::uniform_int_distribution<size_t> last_name_dist(0, 100000);
......................
auto insert = [&db, &exec](const char * f, const char * l, int age, const char * home_town, const char * job)
{
std::string query(awl::format() << "insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('" <<
f << first_name_dist(awl::random()) << "', '" <<
l << last_name_dist(awl::random()) << "', " <<
age << ", '" <<
home_town << "', '" <<
job << "');");
exec(query.c_str());
};
auto insertBatch = [&db, &exec, &insert]()
{
insert("Peter", "Griffin", 41, "Newport", "Brewery");
insert("Lois", "Griffin", 40, "Quahog", "Piano Teacher");
insert("Joseph", "Swanson", 39, "Quahog", "Police Officer");
insert("Glenn", "Quagmire", 41, "Quahog", "Pyaniy Pilot");
insert("Vasya", "Nastroykegulyal", 11, "Partizansk", "Gopnik");
};
but the string formatting in this code added about 5 second to the batch insertion time (more than doubled the insertion time), and I got the following output:
0: 10s, 1: 9s, 2: 9s, 3: 9s, 4: 10s, 5: 9s ...
so we need some kind of optimization for building the queries and use parameters, but for testing this code is fine.
I was really surprised when I added indices for FirstName and LastName columns with the following code:
exec("CREATE INDEX i_fn ON myTable(FirstName)");
exec("CREATE INDEX i_ln ON myTable(LastName)");
after that the performance dropped more than ten times and started to degrade from the batch to batch:
0: 61s, 1: 79s, 2: 86s, 3: 93s, 4: 96s, 5: 97s, 6: 101s, 7: 108s, ...
with “PRAGMA synchronous = OFF;”:
0: 52s, 1: 78s, 2: 83s, 3: 87s, 4: 85s, 5: 89s, 6: 99s, 7: 101s, 8: 95s, 9: 101s, 10: 111s, ... 15: 141s, 16: 153s ...
so, I afraid, the indices are the bottleneck that makes SQLite unusable for our application.
Below I provided the full source code for testing the indices:
#include <iostream>
#include "sqlite3.h"
#include "Awl/Exception.h"
#include "Awl/StopWatch.h"
#include "Awl/Random.h"
#include "Awl/StringFormat.h"
static int callback(void *NotUsed, int argc, char **argv, char **azColName)
{
return 0;
}
std::uniform_int_distribution<size_t> first_name_dist(0, 1000);
std::uniform_int_distribution<size_t> last_name_dist(0, 100000);
int main()
{
sqlite3 *db;
int rc;
rc = sqlite3_open("familyGuy.db", &db);
if (rc)
{
std::cout << "Can't open database: " << sqlite3_errmsg(db) << "\n";
return 1;
}
else
{
std::cout << "Open database successfully\n\n";
}
exec("PRAGMA synchronous = OFF;");
auto exec = [&db](const char * query)
{
char *zErrMsg = nullptr;
int rc = sqlite3_exec(db, query, callback, 0, &zErrMsg);
if (rc != SQLITE_OK)
{
std::cout << "SQL error: " << sqlite3_errmsg(db) << "\n";
awl::GeneralException e(zErrMsg);
sqlite3_free(zErrMsg);
throw e;
}
};
exec("create table myTable (id INTEGER PRIMARY KEY AUTOINCREMENT, FirstName varchar(30), LastName varchar(30), Age smallint, Hometown varchar(30), Job varchar(30))");
exec("CREATE INDEX i_fn ON myTable(FirstName)");
exec("CREATE INDEX i_ln ON myTable(LastName)");
auto insert = [&db, &exec](const char * f, const char * l, int age, const char * home_town, const char * job)
{
std::string query(awl::format() << "insert into myTable (FirstName, LastName, Age, Hometown, Job) values ('" <<
f << first_name_dist(awl::random()) << "', '" <<
l << last_name_dist(awl::random()) << "', " <<
age << ", '" <<
home_town << "', '" <<
job << "');");
exec(query.c_str());
};
auto insertBatch = [&db, &exec, &insert]()
{
insert("Peter", "Griffin", 41, "Newport", "Brewery");
insert("Lois", "Griffin", 40, "Quahog", "Piano Teacher");
insert("Joseph", "Swanson", 39, "Quahog", "Police Officer");
insert("Glenn", "Quagmire", 41, "Quahog", "Pyaniy Pilot");
insert("Vasya", "Nastroykegulyal", 11, "Partizansk", "Gopnik");
};
awl::StopWatch sw;
for (size_t j = 0; j < 1000; ++j)
{
exec("BEGIN;");
for (size_t i = 0; i < 200 * 1000; ++i)
{
insertBatch();
}
exec("COMMIT;");
std::cout << j << ": " << sw.GetElapsedSeconds<int>() << "s, ";
sw.Reset();
}
exec("select * from myTable");
exec("delete from myTable");
exec("drop table myTable");
sqlite3_close(db);
return 0;
}
With an integer index
exec("CREATE INDEX i_ln ON myTable(Age)");
the performance is close to the results with the string indices provided above:
0: 24s, 1: 32s, 2: 34s, 3: 36s, 4: 36s, 5: 39s, 6: 41s, 7: 43s, 8: 42s, 9: 45s, 10: 46s, 11: 48s, 12: 50s, 13: 50s, 14: 49s, 15: 50s, 16: 49s, 17: 56s, 18: 70s, 19: 65s, 20: 65s, 21: 71s, 22: 76s, 23: 78s, 24: 78s, 25: 85s, 26: 85s, 27: 89s, 28: 86s, 29: 88s, 30: 85s, 31: 87s, 32: 89s, 33: 106s, 34: 98s, 35: 97s, 36: 100s, 37: 97s, 38: 105s, 39: 120s, 40: 103s, 41: 110s, 42: 109s, 43: 108s, 44: 129s, 45: 129s, 46: 130s, 47: 118s, 48: 114s, 49: 118s, 50: 117s, 51: 113s, 52: 108s, 53: 117s, 54: 123s, 55: 119s, 56: 122s, 57: 123s, 58: 128s, 59: 133s, 60: 122s, 61: 124s, 62: 135s, 63: 142s, 64: 175s, 65: 181s, 66: 162s, 67: 132s, 68: 126s, 69: 124s, 70: 122s, 71: 136s, 72: 135s, 73: 148s, 74: 179s, 75: 177s, 76: 202s, 77: 179s, 78: 142s, 79: 146s, 80: 127s, 81: 122s, 82: 122s, 83: 121s, 84: 135s, 85: 134s, 86: 139s, 87: 158s, 88: 129s, 89: 130s, 90: 130s, 91: 131s, 92: 181s, 93: 216s, 94: 150s, 95: 168s, 96: 168s, 97: 162s, 98: 135s, 99: 128s, 100: 129s, 101: 126s, 102: 117s, 103: 116s, 104: 116s, 105: 118s, 106: 108s, 107: 112s, 108: 114s, 109: 107s, 110: 105s, 111: 102s, 112: 103s, 113: 120s, 114: 122s, 115: 121s, 116: 127s, 117: 129s, 118: 163s, 119: 154s, 120: 167s, 121: 148s, 122: 130s, 123: 144s, 124: 158s, 125: 169s, 126: 186s, 127: 146s, 128: 149s, 129: 139s, 130: 120s, 131: 116s, 132: 124s, 133: 115s, 134: 115s, 135: 114s, 136: 112s, 137: 114s, 138: 130s, 139: 121s, 140: 111s, 141: 114s, 142: 128s, 143: 120s, 144: 114s, 145: 120s, 146: 120s, 147: 121s, 148: 125s, 149: 123s, 150: 119s, 151: 113s, 152: 112s, 153: 111s, 154: 109s, 155: 125s, 156: 115s, 157: 114s, 158: 114s, 159: 111s, 160: 113s, 161: 114s, 162: 118s, ...
The links on how to optimize SQLite:
- SQLite – Here’s How To Optimize It (With Code Example)
- Increasing SQLite Performance
- The SQLite Query Optimizer Overview
- Sqlite3: Disabling primary key index while inserting?
- Improve INSERT-per-second performance of SQLite
I compiled C++ code with SQLite sources version 3.29.0 (sqlite-amalgamation-3290000.zip) and MS Visual Studio 2017.