This query was inspired by the essay at
http://www.artima.com/cppsource/how_to_go_slow.html.
If you look at the
section on Thrashing, about halfway down the page, you'll see
the following:
-- Start extract
Consider some code that sums a square array:
for (row = 0; row < N; ++row)
    for (col = 0; col < N; ++col)
        sum += A[row][col];
Or you can do it the other way round:
for (col = 0; col < N; ++col)
    for (row = 0; row < N; ++row)
        sum += A[row][col];
-- End of extract
From what I understand, the array int A[3][8] is stored as follows:
|0|1|2|3|4|5|6|7| - row 0
|0|1|2|3|4|5|6|7| - row 1
|0|1|2|3|4|5|6|7| - row 2
Later in the article, the author says: "On my machine summing a
billion bytes row-wise takes 9 seconds, whereas summing them column-wise
takes 51 seconds. The less sequential your data access, the slower your
program will run."
I'm a little confused by the author's remark about sequential access.
Does he mean that the fastest, most sequential way to access each element would be:
A[1][0];
A[2][0];
A[3][0];
A[1][1];
A[2][1];
A[3][1];
A[1][2];
A[2][2];
A[3][2];
A[1][3];
... and so on?
I am aware that this is not the best approach to optimization and that I
would get better returns from a different algorithm, etc. This is
simply a C newbie's query about one aspect of how the language is
implemented. Please be gentle.
..p