1. Overview
In this short tutorial, we’ll learn how to efficiently find the number of contiguous subarrays of an array with a given arithmetic mean.
We’ll start with a naive approach that solves the task in quadratic time, O(n^2). Then, we’ll analyze why it’s inefficient and optimize it to a linear O(n) solution using prefix sums and a frequency map.
2. Problem Statement
Let’s first understand the problem we’re trying to solve.
Suppose we have an array of integers and a target mean, which is also an integer. We want to count the contiguous subarrays of the input array whose arithmetic mean equals the target. For example, for the array [5, 3, 6, 2] and the target mean of 4, the output should be 3. This is because the subarrays [5, 3], [6, 2], and [5, 3, 6, 2] all have a mean of 4, and no other contiguous subarray does.
Additionally, we want to impose the following constraints:
- The array can contain up to 100,000 elements.
- Each number in the array is in the range of [-1,000,000,000, +1,000,000,000].
- The target mean is in the same range.
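These limits matter for the choice of data types. As a quick sanity check, the sketch below computes the largest absolute sum a subarray can reach under the constraints, which is 100,000 × 1,000,000,000 = 10^14. The class and constant names are illustrative only and aren’t part of the solutions that follow:

```java
class ConstraintCheck {
    static final int MAX_LENGTH = 100_000;          // from the constraints above
    static final int MAX_ABS_VALUE = 1_000_000_000; // from the constraints above

    // Largest absolute sum any subarray can reach under the constraints.
    static long worstCaseSum() {
        // Cast before multiplying so the product is computed as a long.
        return (long) MAX_LENGTH * MAX_ABS_VALUE;
    }

    public static void main(String[] args) {
        long worst = worstCaseSum();
        System.out.println(worst > Integer.MAX_VALUE); // true: too large for an int
        System.out.println(worst < Long.MAX_VALUE);    // true: fits in a long
    }
}
```

This is why sums are best accumulated in long variables rather than int.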
Let’s start by solving this problem with a brute-force approach.
3. Brute Force Solution
The most straightforward approach uses two nested loops. The outer loop fixes the start index of the subarray, and the inner loop extends the end index, maintaining a running sum and checking whether the mean of the current subarray equals the target mean:
static int countSubarraysWithMean(int[] inputArray, int mean) {
    int count = 0;
    for (int i = 0; i < inputArray.length; i++) {
        long sum = 0;
        for (int j = i; j < inputArray.length; j++) {
            sum += inputArray[j];
            if (sum * 1.0 / (j - i + 1) == mean) {
                count++;
            }
        }
    }
    return count;
}
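Here, we calculate the sum and length for each subarray to find the mean. If the mean equals the target mean, we increment the count. Multiplying the sum by 1.0 forces floating-point division. Without it, integer division would truncate the quotient, and a subarray such as [3, 6], whose mean is 4.5, would be wrongly counted for a target mean of 4.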
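The floating-point division can also be avoided altogether. Since the length is always positive, the mean equals the target exactly when the sum equals the target multiplied by the length. The following is a minimal sketch of that variant, not the article’s original code: the class name and method name are hypothetical, and it returns a long on the assumption that the count may exceed the int range for large inputs:

```java
class BruteForceExact {
    // Counts subarrays whose mean equals the target, using integer arithmetic only.
    static long countSubarraysWithMeanExact(int[] inputArray, int mean) {
        long count = 0;
        for (int i = 0; i < inputArray.length; i++) {
            long sum = 0;
            for (int j = i; j < inputArray.length; j++) {
                sum += inputArray[j];
                int length = j - i + 1;
                // sum / length == mean  <=>  sum == mean * length (length > 0)
                if (sum == (long) mean * length) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

The time complexity is unchanged. The only difference is that the comparison no longer depends on floating-point rounding.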
However simple this solution might seem, it doesn’t scale. It has a time complexity of O(n^2), because an array of n elements has n(n+1)/2 contiguous subarrays and we inspect each of them. For our input constraint of 100,000 elements, that’s about 5 billion iterations of the inner loop, which is too slow.
Let’s find an alternative, more efficient solution.
4. Linear Solution
Before moving forward, we need to understand two key ideas to optimize the solution to a linear time complexity.
4.1. Understanding Prefix Sums and Frequency Maps
First, the idea of prefix sums allows us to calculate the sum of any subarray in O(1) time. Second, the concept of a frequency map helps us count subarrays by keeping track of the frequency of certain values.
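To make the first idea concrete, here’s a small sketch of a prefix sum array and an O(1) range-sum query built on it. The class and method names are illustrative assumptions, and the indices are 0-based as in Java arrays:

```java
class PrefixSumDemo {
    // prefix[k] holds the sum of the first k elements, so prefix[0] == 0.
    static long[] buildPrefixSums(int[] values) {
        long[] prefix = new long[values.length + 1];
        for (int k = 0; k < values.length; k++) {
            prefix[k + 1] = prefix[k] + values[k];
        }
        return prefix;
    }

    // Sum of values[from..to], both ends inclusive and 0-based, in O(1).
    static long rangeSum(long[] prefix, int from, int to) {
        return prefix[to + 1] - prefix[from];
    }
}
```

Building the array takes O(n) once. After that, every subarray sum is a single subtraction.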
Now, suppose we have an input array X and a target mean S. To keep the formulas simple, let’s number the elements from 1, so the array is X[1], X[2], ..., X[n]. Let’s define two arrays to represent the prefix sum and the adjusted prefix sum of the input array:
P[0] = 0, P[i] = X[1] + X[2] + ... + X[i] (prefix sum array)
Q[i] = P[i] - S * i (adjusted prefix sum array)
Then, for any subarray [i, j], the difference between two entries of Q is:
Q[j] - Q[i-1] = (P[j] - S * j) - (P[i-1] - S * (i-1))
= (P[j] - P[i-1]) - S * (j - (i-1))
= (sum of subarray [i, j]) - (length of subarray [i, j]) * S
In other words, Q[j] - Q[i-1] is the sum of the subarray [i, j] minus the sum the subarray would have if its mean were exactly S. This difference is 0 precisely when the subarray has the mean S. So, we can count the subarrays with mean S by counting the pairs of indices (i-1, j), with i-1 < j, where Q[j] = Q[i-1].
The key insight here is that if we find two indices in Q with the same value, the subarray between these indices (not including the earlier index) has the mean S. Indeed, the subarray from i to j has exactly the mean S when:
(P[j] - P[i-1]) - S * (j - (i-1)) = 0
This equation is equivalent to this one:
P[j] - S * j = P[i-1] - S * (i-1)
The left side is Q[j], and the right side is Q[i-1].
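As a worked example, we can compute the adjusted prefix sums for the array [5, 3, 6, 2] and the target mean 4 from the problem statement. The sketch below uses hypothetical class and method names and stores the value for the empty prefix at index 0:

```java
import java.util.Arrays;

class AdjustedPrefixDemo {
    // q[k] = (sum of the first k elements) - mean * k, with q[0] == 0.
    static long[] adjustedPrefixSums(int[] values, int mean) {
        long[] q = new long[values.length + 1];
        long running = 0;
        for (int k = 0; k < values.length; k++) {
            running += values[k];
            q[k + 1] = running - (long) mean * (k + 1);
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(adjustedPrefixSums(new int[] {5, 3, 6, 2}, 4)));
        // prints [0, 1, 0, 2, 0]
    }
}
```

The value 0 appears at indices 0, 2, and 4, which gives three pairs of equal entries: (0, 2), (2, 4), and (0, 4). They correspond to the subarrays [5, 3], [6, 2], and [5, 3, 6, 2], matching the expected answer of 3.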
Let’s implement this.
4.2. Java Implementation
Let’s start by creating two arrays:
static int countSubarraysWithMean(int[] inputArray, int mean) {
    int n = inputArray.length;
    long[] prefixSums = new long[n+1];
    long[] adjustedPrefixes = new long[n+1];
    // More code
}
Here, we use long values instead of int to avoid overflow when summing large values, and we allocate n + 1 slots so that index 0 represents the empty prefix with a sum of 0. Then, we calculate the prefix sum array P and the adjusted prefix sum array Q:
for (int i = 0; i < n; i++) {
    prefixSums[i+1] = prefixSums[i] + inputArray[i];
    adjustedPrefixes[i+1] = prefixSums[i+1] - (long) mean * (i+1);
}
Next, we create a frequency map to count the number of subarrays and return the total count:
Map<Long, Integer> count = new HashMap<>();
int total = 0;
for (long adjustedPrefix : adjustedPrefixes) {
    total += count.getOrDefault(adjustedPrefix, 0);
    count.put(adjustedPrefix, count.getOrDefault(adjustedPrefix, 0) + 1);
}
return total;
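For each value in adjustedPrefixes, we add to total the number of times the same value has already appeared, since each earlier occurrence marks the start of a subarray with the target mean. If the value hasn’t been seen yet, getOrDefault() returns 0. We then record the current value in the map. Because adjustedPrefixes[0] is 0 (the empty prefix), subarrays starting at the first element are counted as well.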
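This solution has a time complexity of O(n), assuming HashMap operations take constant time on average. We run two sequential loops: the first calculates the prefix sums and the adjusted prefix sums, and the second counts the subarrays. The space complexity is also O(n), as we use two arrays of size n + 1 and a frequency map that can hold up to n + 1 entries. Note that for the largest inputs, the result can exceed the range of int: an array of 100,000 elements that all equal the target mean has about 5 billion matching subarrays, so a version meant for the full constraints should use long for total and for the return type.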
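The two loops can also be folded into one, because each adjusted prefix sum is just the running total of (element - mean). The following is a sketch of that variation under stated assumptions, not the article’s original implementation: the class name is hypothetical, the arrays are replaced by a single running value, and the method returns a long so the count can’t overflow:

```java
import java.util.HashMap;
import java.util.Map;

class SubarrayMeanCounter {
    // Single-pass count of subarrays whose mean equals the target.
    static long countSubarraysWithMean(int[] inputArray, int mean) {
        Map<Long, Integer> seen = new HashMap<>();
        seen.put(0L, 1); // the empty prefix has an adjusted sum of 0

        long adjustedPrefix = 0;
        long total = 0;
        for (int value : inputArray) {
            // Running sum of (value - mean) equals prefix sum minus mean * length.
            adjustedPrefix += (long) value - mean;
            int earlier = seen.getOrDefault(adjustedPrefix, 0);
            total += earlier; // each earlier equal value starts a matching subarray
            seen.put(adjustedPrefix, earlier + 1);
        }
        return total;
    }
}
```

The asymptotic complexity stays the same, but the extra arrays are no longer needed, and only the frequency map grows with the input.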
5. Conclusion
In this article, we solved the problem of counting subarrays with a given arithmetic mean. First, we implemented a brute force solution with an O(n^2) time complexity. Then, we optimized it to a linear time complexity O(n) using prefix sums and a frequency map.
As always, the complete code is available over on GitHub.