1. Introduction
Large table reads can cause our application to run out of memory. They also add extra load to the database and require more bandwidth to execute. The recommended approach when reading a large table is to use paginated queries: we read a subset (page) of data, process it, and then move on to the next page.
In this article, we’ll discuss and implement different strategies for pagination with JDBC.
2. Setup
First, we need to add the appropriate JDBC dependency to the pom.xml file, based on which database we’re using, so that we can connect to it. For example, if our database is PostgreSQL, we need to add the PostgreSQL dependency:
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.6.0</version>
</dependency>
Second, we’ll need a large dataset to make a paginated query. Let’s create an employees table and insert one million records into it:
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (first_name, last_name, salary)
SELECT
    'FirstName' || series_number,
    'LastName' || series_number,
    (random() * 100000)::DECIMAL(10, 2) -- Adjust the range as needed
FROM generate_series(1, 1000000) AS series_number;
Lastly, we’ll add a connect() method to our sample app that creates a connection object using our database connection details:
Connection connect() throws SQLException {
    Connection connection = DriverManager.getConnection(url, user, password);
    if (connection != null) {
        System.out.println("Connected to database");
    }
    return connection;
}
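Since Connection is AutoCloseable, callers can wrap it in a try-with-resources block so that it’s released even if a query fails. Here’s a minimal usage sketch, assuming the url, user, and password fields are configured elsewhere:
try (Connection connection = connect()) {
    // run paginated queries here; the connection closes automatically
} catch (SQLException e) {
    // handle or rethrow in whatever way suits the application
    throw new RuntimeException(e);
}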
3. Pagination With JDBC
Our dataset contains about 1M records, and querying it all at once puts pressure not only on the database but also on the network, since more data has to be transferred in a single round trip. It also puts pressure on the application’s memory, since all of that data needs to fit in RAM. When reading large datasets, it’s always recommended to read and process them in pages or batches.
JDBC doesn’t provide out-of-the-box methods to read in pages, but there are approaches that we can implement by ourselves. We’ll be discussing and implementing two such approaches.
3.1. Using LIMIT And OFFSET
We can add LIMIT and OFFSET to our SELECT query to return a defined portion of the results. The LIMIT clause caps the number of rows returned, while the OFFSET clause skips the defined number of rows before the query starts returning them. We can then paginate our query by controlling the OFFSET position.
In the logic below, we use pageSize as the LIMIT and offset as the position from which to start reading records. We also order the rows by the id column so that the page boundaries are deterministic:
ResultSet readPageWithLimitAndOffset(Connection connection, int offset, int pageSize) throws SQLException {
    String sql = """
        SELECT * FROM employees
        ORDER BY id
        LIMIT ? OFFSET ?
        """;
    PreparedStatement preparedStatement = connection.prepareStatement(sql);
    preparedStatement.setInt(1, pageSize);
    preparedStatement.setInt(2, offset);
    return preparedStatement.executeQuery();
}
The query result is a single page of data. To read the entire table page by page, we iterate: read a page, process its records, and then move on to the next one.
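For example, a minimal driver loop could look like the following sketch; the actual row processing is left as a comment, and we assume the connection and the helper method above are in scope:
int offset = 0;
int pageSize = 100_000;
boolean hasMore = true;
while (hasMore) {
    ResultSet page = readPageWithLimitAndOffset(connection, offset, pageSize);
    hasMore = false;
    while (page.next()) {
        hasMore = true;
        // process the current row, e.g. page.getString("first_name")
    }
    offset += pageSize;
}
The hasMore flag stays false whenever a page comes back empty, which ends the loop once the whole table has been read.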
3.2. Using a Sorted Key With LIMIT
We can also take advantage of a sorted key together with LIMIT to read results in batches. For example, our employees table has an id column that is auto-incremented and indexed. We’ll use this id column to set a lower bound for our page, order the rows by it, and let LIMIT set the upper bound for the page:
ResultSet readPageWithSortedKeys(Connection connection, int lastFetchedId, int pageSize) throws SQLException {
    String sql = """
        SELECT * FROM employees
        WHERE id > ?
        ORDER BY id
        LIMIT ?
        """;
    PreparedStatement preparedStatement = connection.prepareStatement(sql);
    preparedStatement.setInt(1, lastFetchedId);
    preparedStatement.setInt(2, pageSize);
    return preparedStatement.executeQuery();
}
As we can see in the above logic, we pass lastFetchedId as the lower bound for the page, while pageSize sets the upper bound through LIMIT.
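A minimal driver loop for this approach could look like the sketch below, assuming the id values start at 1, as they do with our SERIAL column; the last id seen on each page becomes the lower bound for the next one:
int lastFetchedId = 0; // ids generated by SERIAL start at 1
int pageSize = 100_000;
boolean hasMore = true;
while (hasMore) {
    ResultSet page = readPageWithSortedKeys(connection, lastFetchedId, pageSize);
    hasMore = false;
    while (page.next()) {
        hasMore = true;
        lastFetchedId = page.getInt("id"); // remember where this page ended
        // process the current row
    }
}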
4. Testing
Let’s test our logic by writing simple unit tests. For testing, we’ll set up a database and insert 1M records into the table. The setup() and tearDown() methods run once per test class to create the test data and clean it up afterward:
@BeforeAll
public static void setup() throws Exception {
    connection = connect(JDBC_URL, USERNAME, PASSWORD);
    populateDB();
}

@AfterAll
public static void tearDown() throws SQLException {
    destroyDB();
}
The populateDB() method first creates an employees table and inserts sample records for 1M employees:
private static void populateDB() throws SQLException {
    String createTable = """
        CREATE TABLE EMPLOYEES (
            id SERIAL PRIMARY KEY,
            first_name VARCHAR(50),
            last_name VARCHAR(50),
            salary DECIMAL(10, 2)
        );
        """;
    PreparedStatement preparedStatement = connection.prepareStatement(createTable);
    preparedStatement.execute();

    String load = """
        INSERT INTO EMPLOYEES (first_name, last_name, salary)
        VALUES (?, ?, ?)
        """;
    IntStream.rangeClosed(1, 1_000_000).forEach(i -> {
        try {
            PreparedStatement insertStatement = connection.prepareStatement(load);
            insertStatement.setString(1, "firstname" + i);
            insertStatement.setString(2, "lastname" + i);
            // random salary between 100,000 and 1,000,000
            insertStatement.setDouble(3, 100_000 + (1_000_000 - 100_000) * Math.random());
            insertStatement.execute();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    });
}
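As a side note, inserting one million rows one statement at a time is slow. If the setup becomes a bottleneck, JDBC batching via addBatch() and executeBatch() usually speeds it up considerably; a possible variant of the same load, reusing a single prepared statement and flushing every 10,000 rows, might look like this sketch:
try (PreparedStatement ps = connection.prepareStatement(load)) {
    for (int i = 1; i <= 1_000_000; i++) {
        ps.setString(1, "firstname" + i);
        ps.setString(2, "lastname" + i);
        ps.setDouble(3, 100_000 + (1_000_000 - 100_000) * Math.random());
        ps.addBatch();
        if (i % 10_000 == 0) {
            ps.executeBatch(); // send the accumulated rows in one round trip
        }
    }
    ps.executeBatch(); // flush any remaining rows
}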
Our tearDown() method destroys the employees table:
private static void destroyDB() throws SQLException {
    String destroy = """
        DROP TABLE EMPLOYEES;
        """;
    connection.prepareStatement(destroy)
      .execute();
}
Once we’ve set up the test data, we can write a simple unit test for the LIMIT and OFFSET approach to verify the page size:
@Test
void givenDBPopulated_WhenReadPageWithLimitAndOffset_ThenReturnsPaginatedResult() throws SQLException {
    int offset = 0;
    int pageSize = 100_000;
    int totalPages = 0;
    while (true) {
        ResultSet resultSet = PaginationLogic.readPageWithLimitAndOffset(connection, offset, pageSize);
        if (!resultSet.next()) {
            break;
        }

        List<String> resultPage = new ArrayList<>();
        do {
            resultPage.add(resultSet.getString("first_name"));
        } while (resultSet.next());

        assertEquals("firstname" + (resultPage.size() * (totalPages + 1)), resultPage.get(resultPage.size() - 1));
        offset += pageSize;
        totalPages++;
    }
    assertEquals(10, totalPages);
}
As we can see above, we keep looping until we’ve read all the database records in pages, and for each page, we verify the last record that was read.
Similarly, we can write another test for pagination with sorted keys using the ID column:
@Test
void givenDBPopulated_WhenReadPageWithSortedKeys_ThenReturnsPaginatedResult() throws SQLException {
    PreparedStatement preparedStatement = connection.prepareStatement("SELECT min(id) as min_id, max(id) as max_id FROM employees");
    ResultSet resultSet = preparedStatement.executeQuery();
    resultSet.next();
    int minId = resultSet.getInt("min_id");
    int maxId = resultSet.getInt("max_id");

    int lastFetchedId = minId - 1; // start just below the smallest id so the first page includes it
    int pageSize = 100_000;
    int totalPages = 0;
    while ((lastFetchedId + pageSize) <= maxId) {
        resultSet = PaginationLogic.readPageWithSortedKeys(connection, lastFetchedId, pageSize);
        if (!resultSet.next()) {
            break;
        }

        List<String> resultPage = new ArrayList<>();
        do {
            resultPage.add(resultSet.getString("first_name"));
            lastFetchedId = resultSet.getInt("id");
        } while (resultSet.next());

        assertEquals("firstname" + (resultPage.size() * (totalPages + 1)), resultPage.get(resultPage.size() - 1));
        totalPages++;
    }
    assertEquals(10, totalPages);
}
As we can see above, we loop over the entire table and read the data one page at a time. We first find minId and maxId, which define the iteration window for the loop. Then, for each page, we assert the last record that was read, and finally we assert the total page count.
5. Conclusion
In this article, we discussed reading large datasets in pages or batches instead of fetching everything in a single query. We discussed and implemented two approaches, along with unit tests that verify how they work.
The LIMIT and OFFSET approach can become inefficient for large datasets because the database still reads, and then skips, all of the rows up to the OFFSET position. The sorted-key approach is more efficient since it only queries the relevant data using a sorted key that is also indexed.
As always, the example code is available over on GitHub.