There are many ways to approach writing test cases, one of them being writing test cases that cover discovered bugs. In this post, I’ll walk through my train of thoughts into writing one test case for a bug in the Linux kernel’s AMDGPU Display Mode Library.
Background story
The file of interest here is located inside the Linux kernel repository at
drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
. To try and understand
the approach into building the test case, take a look at the patch series that
fixes the
issue.
Before commit 8861c27a
, the function dcn21_update_bw_bounding_box
worked
fine, but its 2192-byte frame size triggered a warning for exceeding the stack
size that could be used. An attempt to fix this problem is made by commit
8861c27a
, which removes a very large variable from the function. Although this
solution does fix the stack size problem, the new version of
dcn21_update_bw_bounding_box
is not functionally equivalent to the old one.
Then comes commit 9ad5d02c
tackling the two previous issues: the stack size no
longer exceeds the specified limit, and dcn21_update_bw_bounding_box
now
presents the same behavior as it did before 8861c27a
.
Setup and playing a bit
To get a grasp of what was accomplished, it could be desirable to clone the repository and play around a bit.
git clone --branch blog-post-03 https://github.com/magalilemes/linux.git
Then navigate into the directory:
cd linux
Run the command below to run all KUnit’s AMDGPU tests, they are all expected to run successfully:
./tools/testing/kunit/kunit.py run --arch=x86_64 --kunitconfig=drivers/gpu/drm/amd/display/tests
Roll back drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
to the commit
version that introduced the issue:
git checkout 8861c27a -- drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
When running the KUnit tests again, it is natural and expected that the test will fail:
./tools/testing/kunit/kunit.py run --arch=x86_64 --kunitconfig=drivers/gpu/drm/amd/display/tests
Roll back drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
to a point
before the problematic commit was introduced:
git checkout 8861c27a^ -- drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
Finally run the KUnit tests again, and notice that they all pass:
./tools/testing/kunit/kunit.py run --arch=x86_64 --kunitconfig=drivers/gpu/drm/amd/display/tests
Writing a test case for dcn21_update_bw_bounding_box
The task at hand is to write a test case that fails when the file of the
function to be tested is at version 8861c27a
. Run the command below to have
only dcn20_fpu.c
at that version.
git checkout 8861c27a -- drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
If everything worked correctly, the function below is the same as the one in
dcn20_fpu.c
. Now, the next step is getting a gist of the function.
void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params)
{
struct dcn21_resource_pool *pool = TO_DCN21_RES_POOL(dc->res_pool);
struct clk_limit_table *clk_table = &bw_params->clk_table;
unsigned int i, closest_clk_lvl = 0, k = 0;
int j;
dc_assert_fp_enabled();
dcn2_1_ip.max_num_otg = pool->base.res_cap->num_timing_generator;
dcn2_1_ip.max_num_dpp = pool->base.pipe_count;
dcn2_1_soc.num_chans = bw_params->num_channels;
ASSERT(clk_table->num_entries);
/* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over null states later */
for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
}
for (i = 0; i < clk_table->num_entries; i++) {
/* loop backwards*/
for (closest_clk_lvl = 0, j = dcn2_1_soc.num_states - 1; j >= 0; j--) {
if ((unsigned int) dcn2_1_soc.clock_limits[j].dcfclk_mhz <= clk_table->entries[i].dcfclk_mhz) {
closest_clk_lvl = j;
break;
}
}
/* clk_table[1] is reserved for min DF PState. skip here to fill in later. */
if (i == 1)
k++;
dcn2_1_soc.clock_limits[k].state = k;
dcn2_1_soc.clock_limits[k].dcfclk_mhz = clk_table->entries[i].dcfclk_mhz;
dcn2_1_soc.clock_limits[k].fabricclk_mhz = clk_table->entries[i].fclk_mhz;
dcn2_1_soc.clock_limits[k].socclk_mhz = clk_table->entries[i].socclk_mhz;
dcn2_1_soc.clock_limits[k].dram_speed_mts = clk_table->entries[i].memclk_mhz * 2;
dcn2_1_soc.clock_limits[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
dcn2_1_soc.clock_limits[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
dcn2_1_soc.clock_limits[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
dcn2_1_soc.clock_limits[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
dcn2_1_soc.clock_limits[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
k++;
}
if (clk_table->num_entries) {
dcn2_1_soc.num_states = clk_table->num_entries + 1;
/* fill in min DF PState */
dcn2_1_soc.clock_limits[1] = construct_low_pstate_lvl(clk_table, closest_clk_lvl);
/* duplicate last level */
dcn2_1_soc.clock_limits[dcn2_1_soc.num_states] = dcn2_1_soc.clock_limits[dcn2_1_soc.num_states - 1];
dcn2_1_soc.clock_limits[dcn2_1_soc.num_states].state = dcn2_1_soc.num_states;
}
dml_init_instance(&dc->dml, &dcn2_1_soc, &dcn2_1_ip, DML_PROJECT_DCN21);
}
This function has only two parameters:
struct dc *
: a pointer to a Display Core1 control structure;struct clk_bw_params *
: a pointer to a structure for clocks and bandwidth.
After the declaration and initialization of variables, dc_assert_fp_enabled
is
called to ensure that the function is being executed under FPU protection, as it
is about to perform some floating point operations. Then some values from the
global variable dcn2_1_ip
are updated based on the provided parameters.
Finally, before the first loop begins, it is assured that the clock table from
struct clk_bw_params
has at least one entry.
In the first loop, an odd thing happens:
/* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over null states later */
for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
}
It seems like nothing meaninful is happening, since the elements of the clock
limits array are being set to themselves. At this point, it is useful to go back
and see how the function behaved before 8861c27a
:
/* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over null states later */
for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
clock_limits[i] = dcn2_1_soc.clock_limits[i];
}
That makes more sense, and is coherent with the comment above it. It’s important
to notice that commit 8861c27a
replaced occurences of clock_limits
wit
dcn2_1_soc.clock_limits
, so that is why the first for loop is like that.
Finding the problem
Moving on, there’s another loop. Notice its last lines:
for (i = 0; i < clk_table->num_entries; i++) {
[...]
dcn2_1_soc.clock_limits[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
dcn2_1_soc.clock_limits[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
dcn2_1_soc.clock_limits[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
dcn2_1_soc.clock_limits[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
dcn2_1_soc.clock_limits[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
k++;
}
Compare with the previous version, which used clock_limits
to store the clock
limits values before assigning them to dcn2_1_soc.clock_limits
directly:
for (i = 0; i < clk_table->num_entries; i++) {
[...]
clock_limits[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
clock_limits[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
clock_limits[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
clock_limits[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
clock_limits[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
clock_limits[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
clock_limits[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
k++;
}
for (i = 0; i < clk_table->num_entries + 1; i++)
dcn2_1_soc.clock_limits[i] = clock_limits[i];
Now to make it easier to visualize, suppose that before
dcn21_update_bw_bounding_box
is called, some values from
dcn2_1_soc.clock_limits
are as follows:
dcn2_1_soc.clock_limits[2].dispclk = 757.89;
dcn2_1_soc.clock_limits[3].dispclk = 847.06;
dcn2_1_soc.clock_limits[4].dispclk = 900.00;
Now after invoking dcn21_update_bw_bounding_box
, what happens with the
values of dcn2_1_soc.clock_limits[k].dispclk
for k = 3, closest_clk_lvl = 2;
and for k = 4, closest_clk_lvl = 3?
Previous version (8861c27a^): *
clock_limits[3].dispclk_mhz
is set todcn2_1_soc.clock_limits[2].dispclk_mhz
; * thenclock_limits[4].dispclk_mhz
is set todcn2_1_soc.clock_limits[3].dispclk_mhz
.After the end of the loop,
clock_limits
is copied ontodcn2_1_soc.clock_limits
, resulting in:
dcn2_1_soc.clock_limits[3].dispclk = 757.89;
dcn2_1_soc.clock_limits[4].dispclk = 847.06;
Problematic version (8861c27a): *
dcn2_1_soc.clock_limits[3].dispclk_mhz
is set todcn2_1_soc.clock_limits[2].dispclk_mhz
; * then in the next iteration,dcn2_1_soc.clock_limits[4].dispclk_mhz
is set todcn2_1_soc.clock_limits[3].dispclk_mhz
.After the end of the loop, this is the result:
dcn2_1_soc.clock_limits[3].dispclk = 757.89;
dcn2_1_soc.clock_limits[4].dispclk = 757.89;
The part to pay attention here is that by dropping the intermediate variable,
dcn2_1_soc.clock_limits
ends up being overwritten by itself, and some values
that it stored may be lost in the way.
Building the test case
After understanding where the problem lays, it is time to write a test case that covers it.
The function parameters will be as simple as possible, only the necessary struct members will have their values set:
.dc = {
.res_pool = &(struct resource_pool) {
.pipe_count = 2,
.res_cap = &(struct resource_caps) {
.num_timing_generator = 3
},
},
}
bw_params.clk_table.entries[]
values will be set in a way so that a situation
where (k = 3, closest_clk_lvl = 2) and (k = 4, closest_clk_lvl = 3) will be
created, closely following what was discussed above.
.bw_params {
.num_channels = 1,
.clk_table = {
.entries = {
{
.voltage = 0,
.dcfclk_mhz = 200,
.fclk_mhz = 400,
.memclk_mhz = 800,
.socclk_mhz = 0,
},
{
.voltage = 0,
.dcfclk_mhz = 201,
.fclk_mhz = 800,
.memclk_mhz = 1600,
.socclk_mhz = 0,
},
{
.voltage = 0,
.dcfclk_mhz = 553,
.fclk_mhz = 1067,
.memclk_mhz = 1067,
.socclk_mhz = 0,
},
{
.voltage = 0,
.dcfclk_mhz = 602,
.fclk_mhz = 1333,
.memclk_mhz = 1600,
.socclk_mhz = 0,
},
{
.voltage = 0,
.dispclk_mhz = 1372,
.dppclk_mhz = 1372,
.phyclk_mhz = 810,
.phyclk_d18_mhz = 667,
.dtbclk_mhz = 600,
},
},
.num_entries = 5,
}
}
dcn2_1_soc
is a global variable, and its original values will be used for this
test case, so no need to set it differently.
After having the function input, the next step is building the expected values.
For this test case, dcn2_1_soc
is the one expected to change. One easy way (is
it the right one, though?) to obtain the expected values is running the function
when it is in its correct state, and seeing what was yielded. Finally, one test
case is ready: it is now time to think about other ways to write others.