This article shows the importance of indexing in SQLite databases, particularly when a database is expected to be large.
By understanding and implementing primary keys and indexes, you can significantly enhance the performance of queries, ensuring applications run smoothly and efficiently. Proper indexing practices can make a world of difference in how quickly and effectively you can access your data.
Creating a SQLite Database
Create a SQLite database and a table with a primary key. We’ll use Python’s built-in sqlite3
module for this purpose. (The sqlite3
module is included in Python’s standard library, so there’s no need to install it separately). Create a file named database_setup.py
in your project directory:
import sqlite3
def create_connection(db_file):
"""Create a database connection to the SQLite database specified by db_file."""
conn = None
try:
conn = sqlite3.connect(db_file)
print(f"Connected to {db_file}")
except sqlite3.Error as e:
print(e)
return conn
def create_table(conn, create_table_sql):
"""Create a table from the create_table_sql statement."""
try:
c = conn.cursor()
c.execute(create_table_sql)
except sqlite3.Error as e:
print(e)
def main():
database = "my_database.db"
sql_create_projects_table = """
CREATE TABLE IF NOT EXISTS projects (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
start_date TEXT,
end_date TEXT
);
"""
conn = create_connection(database)
if conn is not None:
create_table(conn, sql_create_projects_table)
else:
print("Error! Cannot create the database connection.")
if __name__ == '__main__':
main()
This script creates a SQLite database named my_database.db
and a table called projects
with a primary key on the id
column.
Inserting Data
Now, let’s insert some data into our table. Create a new file named insert_data.py
:
import sqlite3
def create_connection(db_file):
"""Create a database connection to the SQLite database specified by db_file."""
conn = None
try:
conn = sqlite3.connect(db_file)
return conn
except sqlite3.Error as e:
print(e)
return conn
def insert_project(conn, project):
"""Insert a new project into the projects table."""
sql = ''' INSERT INTO projects(name, start_date, end_date)
VALUES(?,?,?) '''
cur = conn.cursor()
cur.execute(sql, project)
conn.commit()
return cur.lastrowid
def main():
database = "my_database.db"
conn = create_connection(database)
if conn is not None:
project = ('Cool Project', '2024-01-01', '2024-12-31')
project_id = insert_project(conn, project)
print(f"Project inserted with id: {project_id}")
else:
print("Error! Cannot create the database connection.")
if __name__ == '__main__':
main()
This script connects to the my_database.db
database and inserts a new project into the projects
table.
Indexing Practices
Indexes are crucial for improving the performance of database queries. Let’s create an index on the name
column of the projects
table. Create a new file named create_index.py
:
import sqlite3
def create_connection(db_file):
"""Create a database connection to the SQLite database specified by db_file."""
conn = None
try:
conn = sqlite3.connect(db_file)
return conn
except sqlite3.Error as e:
print(e)
return conn
def create_index(conn):
"""Create an index on the name column of the projects table."""
sql = ''' CREATE INDEX idx_name ON projects(name); '''
try:
c = conn.cursor()
c.execute(sql)
conn.commit()
print("Index created successfully")
except sqlite3.Error as e:
print(e)
def main():
database = "my_database.db"
conn = create_connection(database)
if conn is not None:
create_index(conn)
else:
print("Error! Cannot create the database connection.")
if __name__ == '__main__':
main()
This script creates an index named idx_name
on the name
column of the projects
table.
Querying Data
In our previous steps, we created an index on the name
column of the projects
table using the code in the create_index.py
script. Once this index is created, any query that uses the name
column will benefit from the index, allowing the database to quickly locate and access the matching rows.
By using the index, the select_all_projects
(see script below) function can efficiently retrieve rows from the projects
table, improving the performance of the query.
We only have (1) record and scaling out to the number of records needed to see performance improvement isn’t possible in this short article. But imagine, if you have a table with thousands of rows, a full table scan might take several seconds or even minutes. With an index, the same query could be completed in milliseconds.
Let’s query the data using indexing. Create a new file named query_data.py
:
import sqlite3
def create_connection(db_file):
"""Create a database connection to the SQLite database specified by db_file."""
conn = None
try:
conn = sqlite3.connect(db_file)
return conn
except sqlite3.Error as e:
print(e)
return conn
def select_all_projects(conn):
"""Query all rows in the projects table."""
cur = conn.cursor()
cur.execute("SELECT * FROM projects")
rows = cur.fetchall()
for row in rows:
print(row)
def main():
database = "my_database.db"
conn = create_connection(database)
if conn is not None:
select_all_projects(conn)
else:
print("Error! Cannot create the database connection.")
if __name__ == '__main__':
main()
Querying Without an Index
When you query a table without an index, the database engine performs a full table scan. This means it examines each row in the table to find the matching records. For small tables, this might not be a big deal, but as the table grows, full table scans can become very slow and resource-intensive.
Querying With an Index
When you query a table with an index, the database engine uses the index to quickly locate the matching records. An index is like a sorted list that allows the database to find rows much faster than scanning the entire table. This is especially beneficial for large tables.
Example
Let’s consider the projects
table from our previous example. Suppose we want to find all projects with the name “Cool Project”.
Without an Index:
SELECT * FROM projects WHERE name = 'Cool Project'
In this case, the database engine will scan every row in the projects
table to find matches. If the table has thousands or millions of rows, this can be very slow.
With an Index:
CREATE INDEX idx_name ON projects(name);
SELECT * FROM projects WHERE name = 'Cool Project'
Here, the database engine uses the idx_name
index to quickly locate rows where the name
column matches “Cool Project”. This is much faster because the index allows the engine to jump directly to the relevant rows without scanning the entire table.
Performance Impact
The performance improvement from using an index can be significant. For example, if you have a table with a million rows, a full table scan might take several seconds or even minutes. With an index, the same query could be completed in milliseconds.
Trade-offs
While indexes improve query performance, they come with some trade-offs:
- Storage: Indexes require additional storage space.
- Insert/Update Performance: Maintaining indexes can slow down insert and update operations because the index needs to be updated whenever the indexed column changes.
Types of Indexes
Unique Indexes: These ensure that all values in the indexed column(s) are unique. They are automatically created for primary key columns.
CREATE UNIQUE INDEX idx_unique_name ON projects(name);
Composite Indexes: These are indexes on multiple columns. They are useful when queries often filter by more than one column.
CREATE INDEX idx_composite ON projects(name, start_date);
When to Use Indexes
- Frequent Searches: Use indexes on columns that are frequently searched or filtered.
- Joins: Index columns that are used in join conditions to speed up the join operations.
- Sorting: Index columns that are often used in
ORDER BY
clauses to improve sorting performance.
When Not to Use Indexes
- Small Tables: Indexes may not provide significant performance benefits for small tables.
- Frequent Updates: Avoid indexing columns that are frequently updated, as maintaining the index can slow down insert and update operations.
- Low Selectivity: Avoid indexing columns with low selectivity (e.g., columns with many duplicate values), as the index may not provide much benefit.
Index Maintenance
- Rebuilding Indexes: Over time, indexes can become fragmented. Rebuilding indexes can help maintain performance.
REINDEX idx_name;
Dropping Unused Indexes: If an index is no longer needed, it can be dropped to save space and reduce maintenance overhead.
DROP INDEX
Analyzing Index Usage
EXPLAIN Query Plan: Use the EXPLAIN QUERY PLAN
statement to understand how SQLite uses indexes in your queries.
EXPLAIN QUERY PLAN SELECT * FROM projects WHERE name = 'Cool Project';
The output (3, 0, 0, 'SEARCH projects USING INDEX idx_name (name=?)')
from the EXPLAIN QUERY PLAN
statement provides detailed information about how SQLite plans to execute your query. Let’s break it down:
Explanation of Each Part
- 3: This is the
selectid
, which represents the identifier for the query or subquery. In this case, it indicates the main query. - 0: This is the
order
, which represents the order of operations within the query plan. It shows the sequence in which the operations will be executed. - 0: This is the
from
value, which indicates the table or subquery that the operation is being performed on. In this case, it refers to theprojects
table. - ‘SEARCH projects USING INDEX idx_name (name=?)’: This is the detailed description of the operation. It indicates that SQLite will search the
projects
table using theidx_name
index on thename
column. The(name=?)
part shows that the query is using a parameterized value for thename
column.
Thank you for reading this article. I hope you found it helpful and informative. If you have any questions, or if you would like to suggest new Python code examples or topics for future tutorials, please feel free to reach out. Your feedback and suggestions are always welcome!
Happy coding!
C. C. Python Programming
You also find this article at Medium.com