Indexing Practices for SQLite Databases with Python

This article shows the importance of indexing in SQLite databases, particularly when a database is expected to be large.

By understanding and implementing primary keys and indexes, you can significantly enhance the performance of queries, ensuring applications run smoothly and efficiently. Proper indexing practices can make a world of difference in how quickly and effectively you can access your data.

Creating a SQLite Database

Create a SQLite database and a table with a primary key. We’ll use Python’s built-in sqlite3 module for this purpose. (The sqlite3 module is included in Python’s standard library, so there’s no need to install it separately). Create a file named database_setup.py in your project directory:

import sqlite3

def create_connection(db_file):
    """Create a database connection to the SQLite database specified by db_file."""
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        print(f"Connected to {db_file}")
    except sqlite3.Error as e:
        print(e)
    return conn

def create_table(conn, create_table_sql):
    """Create a table from the create_table_sql statement."""
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except sqlite3.Error as e:
        print(e)

def main():
    database = "my_database.db"

    sql_create_projects_table = """
    CREATE TABLE IF NOT EXISTS projects (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        start_date TEXT,
        end_date TEXT
    );
    """

    conn = create_connection(database)

    if conn is not None:
        create_table(conn, sql_create_projects_table)
    else:
        print("Error! Cannot create the database connection.")

if __name__ == '__main__':
    main()

This script creates a SQLite database named my_database.db and a table called projects with a primary key on the id column.

Inserting Data

Now, let’s insert some data into our table. Create a new file named insert_data.py:

import sqlite3

def create_connection(db_file):
    """Create a database connection to the SQLite database specified by db_file."""
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except sqlite3.Error as e:
        print(e)
    return conn

def insert_project(conn, project):
    """Insert a new project into the projects table."""
    sql = ''' INSERT INTO projects(name, start_date, end_date)
              VALUES(?,?,?) '''
    cur = conn.cursor()
    cur.execute(sql, project)
    conn.commit()
    return cur.lastrowid

def main():
    database = "my_database.db"

    conn = create_connection(database)

    if conn is not None:
        project = ('Cool Project', '2024-01-01', '2024-12-31')
        project_id = insert_project(conn, project)
        print(f"Project inserted with id: {project_id}")
    else:
        print("Error! Cannot create the database connection.")

if __name__ == '__main__':
    main()

This script connects to the my_database.db database and inserts a new project into the projects table.

Indexing Practices

Indexes are crucial for improving the performance of database queries. Let’s create an index on the name column of the projects table. Create a new file named create_index.py:

import sqlite3

def create_connection(db_file):
    """Create a database connection to the SQLite database specified by db_file."""
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except sqlite3.Error as e:
        print(e)
    return conn

def create_index(conn):
    """Create an index on the name column of the projects table."""
    sql = ''' CREATE INDEX idx_name ON projects(name); '''
    try:
        c = conn.cursor()
        c.execute(sql)
        conn.commit()
        print("Index created successfully")
    except sqlite3.Error as e:
        print(e)

def main():
    database = "my_database.db"

    conn = create_connection(database)

    if conn is not None:
        create_index(conn)
    else:
        print("Error! Cannot create the database connection.")

if __name__ == '__main__':
    main()

This script creates an index named idx_name on the name column of the projects table.

Querying Data

In our previous steps, we created an index on the name column of the projects table using the code in the create_index.py script. Once this index is created, any query that uses the name column will benefit from the index, allowing the database to quickly locate and access the matching rows.

By using the index, the select_all_projects (see script below) function can efficiently retrieve rows from the projects table, improving the performance of the query.

We only have (1) record and scaling out to the number of records needed to see performance improvement isn’t possible in this short article. But imagine, if you have a table with thousands of rows, a full table scan might take several seconds or even minutes. With an index, the same query could be completed in milliseconds.

Let’s query the data using indexing. Create a new file named query_data.py:

import sqlite3

def create_connection(db_file):
    """Create a database connection to the SQLite database specified by db_file."""
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except sqlite3.Error as e:
        print(e)
    return conn

def select_all_projects(conn):
    """Query all rows in the projects table."""
    cur = conn.cursor()
    cur.execute("SELECT * FROM projects")

    rows = cur.fetchall()

    for row in rows:
        print(row)

def main():
    database = "my_database.db"

    conn = create_connection(database)

    if conn is not None:
        select_all_projects(conn)
    else:
        print("Error! Cannot create the database connection.")

if __name__ == '__main__':
    main()

Querying Without an Index

When you query a table without an index, the database engine performs a full table scan. This means it examines each row in the table to find the matching records. For small tables, this might not be a big deal, but as the table grows, full table scans can become very slow and resource-intensive.

Querying With an Index

When you query a table with an index, the database engine uses the index to quickly locate the matching records. An index is like a sorted list that allows the database to find rows much faster than scanning the entire table. This is especially beneficial for large tables.

Example

Let’s consider the projects table from our previous example. Suppose we want to find all projects with the name “Cool Project”.

Without an Index:

SELECT * FROM projects WHERE name = 'Cool Project'

In this case, the database engine will scan every row in the projects table to find matches. If the table has thousands or millions of rows, this can be very slow.

With an Index:

CREATE INDEX idx_name ON projects(name); 
SELECT * FROM projects WHERE name = 'Cool Project'

Here, the database engine uses the idx_name index to quickly locate rows where the name column matches “Cool Project”. This is much faster because the index allows the engine to jump directly to the relevant rows without scanning the entire table.

Performance Impact

The performance improvement from using an index can be significant. For example, if you have a table with a million rows, a full table scan might take several seconds or even minutes. With an index, the same query could be completed in milliseconds.

Trade-offs

While indexes improve query performance, they come with some trade-offs:

  • Storage: Indexes require additional storage space.
  • Insert/Update Performance: Maintaining indexes can slow down insert and update operations because the index needs to be updated whenever the indexed column changes.

Types of Indexes

Unique Indexes: These ensure that all values in the indexed column(s) are unique. They are automatically created for primary key columns.

CREATE UNIQUE INDEX idx_unique_name ON projects(name);

Composite Indexes: These are indexes on multiple columns. They are useful when queries often filter by more than one column.

CREATE INDEX idx_composite ON projects(name, start_date);

When to Use Indexes

  • Frequent Searches: Use indexes on columns that are frequently searched or filtered.
  • Joins: Index columns that are used in join conditions to speed up the join operations.
  • Sorting: Index columns that are often used in ORDER BY clauses to improve sorting performance.

When Not to Use Indexes

  • Small Tables: Indexes may not provide significant performance benefits for small tables.
  • Frequent Updates: Avoid indexing columns that are frequently updated, as maintaining the index can slow down insert and update operations.
  • Low Selectivity: Avoid indexing columns with low selectivity (e.g., columns with many duplicate values), as the index may not provide much benefit.

Index Maintenance

  • Rebuilding Indexes: Over time, indexes can become fragmented. Rebuilding indexes can help maintain performance.
REINDEX idx_name;

Dropping Unused Indexes: If an index is no longer needed, it can be dropped to save space and reduce maintenance overhead.

DROP INDEX

Analyzing Index Usage

EXPLAIN Query Plan: Use the EXPLAIN QUERY PLAN statement to understand how SQLite uses indexes in your queries.

EXPLAIN QUERY PLAN SELECT * FROM projects WHERE name = 'Cool Project';

The output (3, 0, 0, 'SEARCH projects USING INDEX idx_name (name=?)') from the EXPLAIN QUERY PLAN statement provides detailed information about how SQLite plans to execute your query. Let’s break it down:

Explanation of Each Part

  1. 3: This is the selectid, which represents the identifier for the query or subquery. In this case, it indicates the main query.
  2. 0: This is the order, which represents the order of operations within the query plan. It shows the sequence in which the operations will be executed.
  3. 0: This is the from value, which indicates the table or subquery that the operation is being performed on. In this case, it refers to the projects table.
  4. ‘SEARCH projects USING INDEX idx_name (name=?)’: This is the detailed description of the operation. It indicates that SQLite will search the projects table using the idx_name index on the name column. The (name=?) part shows that the query is using a parameterized value for the name column.

Thank you for reading this article. I hope you found it helpful and informative. If you have any questions, or if you would like to suggest new Python code examples or topics for future tutorials, please feel free to reach out. Your feedback and suggestions are always welcome!

Happy coding!
C. C. Python Programming

You also find this article at Medium.com

Leave a Reply