Sorting benchmarks - Professional blog

Input for Both Parts
An input file of 1,000,000 randomly generated names (one per line) is posted in this module. There are a few
different versions of the file and it will serve as the input to both parts of the assignment. Choose the one that
best suits you:
Windows users: names_windows.txtPreview the document
Mac or Linux users: names.txtPreview the document (this has the control-M’s (extra carriage return) removed)
If you use the Cloud9 IDE, there is an upload limit which may prevent you from uploading large files. Use this
compressed version, which you will need to unzip after uploading: names.zip
Part 1, Perform A Simple Timing Study

In this lab you will apply the “4-cycle timing code” posted to Canvas in this module here:
timingtests.cpp.Preview the document
The purpose is to prepare you for making “big oh” determinations and confirming them with timing studies.
How to Use timingtests.cpp
Th code in this file reads a predetermined number of lines from the data file into a vector. This can be changed
by changing the constant NLINES at the top of the file. You probably won’t need to change NLINES but keep in
mind that the data file itself contains only one million lines so NLINES cannot be larger than that.
For each part of this lab, use the file like this:
Add your code to the timed section. The clock() calls should come before and after your code, to record the
timing.
Set the string variable ‘bigOh’ to your guess as to the “big O” of your code (see the file for the options).
Set the variable ‘n’ to the input size. Start with the default size, then increase if you see 0’s or numbers that
look strange.
Run your program.
The output should show growth of the time according to what you guessed:
The first entry is a baseline.
The program will double the input every time through the loop.
Each subsequent line is the timing of your code with the doubling of the input. For example, if you selected “n”
and the first output timing was 2 seconds, the next one would be 4, then 8, 16, etc.
Re-adjust ‘n’ and/or ‘bigOh’ as needed and run again, if the numbers do not match your expectations.
Your Code for Part 1
Write a C++ console app, applying the timing test code to an operation that requires scanning the entire vector
of names read from the file.
You’re looking for some operation that’s close to O(n) here. Here are a few examples of loops that would work:
Finds the shortest name in the array
Finds the longest name in the array
Counts how many times a particular first name occurs, like “Victoria”
To perform the timing tests, you will set the starting size with the variable ‘n’ at the top of main(), which controls
how many lines your algorithm will use for input. Also set ‘bigOh’, which represents your “guess” about your
algorithm.
Start with n = 50,000 for the first cycle, and set bigOh to O(n) — that is, we expect that the time it takes to read
the file is directly proportional to the number of lines read from the file.
It is possible that caching may throw off the timing for the first cycle, so run your program more than once to
see it work correctly. Your timings will not be an exact match for the ‘expected’ amounts each time, but should
be close.
Example Output
0.0002 (expected O(n)) for n=50000
0.0004 (expected 0.0004) for n=100000
0.0008 (expected 0.0007) for n=200000 
2/28/2021 Order 337539340
https://admin.writerbay.com/orders_available?subcom=detailed&id=337539340 3/4
0.0015 (expected 0.0015) for n=400000
As n increases each time through the loop, you should see the timing increase proportionally, i.e. doubling the
input should double the processing time.
If you’re seeing numbers that don’t make sense, your computer may be too fast! Try increasing NLINES and
the initial ‘n’ to a larger number. Remember that ‘n’ is doubled every time through the loop, so make sure your
last doubling doesn’t exceed the available data (1M lines).
Part 2, Benchmark a Nested Loop Sort
In the second part you will make your own determination for the big oh of nested for-loop sorting of an array,
and confirm your determination.
The nested for()-loop sort should use to this algorithm (with n as the number of strings):
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++)
if (a[j] < a[i])
swap(a[i], a[j]);
Your Code for Part 2
Write a C++ console app, applying the same timing test code from the module, to sort the names in your vector
in ascending order. You decide what you expect the “big oh” to be, and what n to use for the first cycle.
Write your app to do the following:
Start the timer, perform the nested for()-loop sort, and stop the timer.
Write a few lines of test code to verify that each string in the array is greater or equal to the string before it,
starting with the 2nd string. Use assert, like this:
for (int i = 1; i < n; i++)
assert (a[i – 1] <= a[i]);
To use assert() you may need to include the header file . These “assertions” are used to abort the program if
some unknown condition occurs, and are mainly enabled in debug/developer builds of the code, where the
programmer wants to catch abnormal conditions as soon as possible. In “production” code (deployed into the
real world), assertions are usually removed

The post Sorting benchmarks first appeared on COMPLIANT PAPERS.

Share this:

Reference no: EM132069492