ECE-1021
HOMEWORK #7
(Last Mod: 27 November 2010 21:38:39 )
Download the program randgen.c from the Example Code page and run it. This program will ask for the name of a text file, how many data values you want written to the file, and the minimum and maximum values you want. It will then write that many data values to the file.
You are to write a program that then asks the user for a file name, opens that file, and prints out the following information about the file:
DATA FILE: mydata.dat
NUMBER OF PTS: 2000
MINIMUM VALUE: 1.283944
MAXIMUM VALUE: 77.387448
MEAN: 23.127845 STD DEV: 5.35867

sigmas   lower    upper    combined
0.5      17.78%   13.29%   30.97%
1.0      32.80%   29.12%   61.92%
The table at the end of the printout above lists the percentage of the total data that lies within the indicated number of standard deviations (sigmas) of the mean. In other words, it says that 17.78% of the values were between (mean - 0.5*sigma) and the mean, while 13.29% were between the mean and (mean + 0.5*sigma). The combined value is simply the percentage of the values that lay within that many standard deviations of the mean regardless of which side they were on, and hence is the sum of the lower and upper values. Your table should print data for 0.5 to 3.0 sigmas.
For simplicity's sake, you may assume that the data file will contain values less than 1000 and no smaller than 0.001 - of course you have control over that when you generate your data file. Plan your output accordingly and assume that the grader will use data sets at both ends of that range. It is acceptable to simply print all data values to six decimal places. Percentages should be reported to two decimal places.
After you have printed out the above information, generate a histogram from the data. The number of bins used and the bin size should be reasonable for the data set. There should be an odd number of bins and the mean should be located in the middle of the center bin. Since you need to use an array to store the contents of each bin, you may place a reasonable cap of 100 on the maximum number of bins.
Your histogram should start with the first bin that has data and continue until the last bin that has data is displayed. Bins outside of this range should not be displayed at all. Empty bins within this range must be displayed.
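One possible way to satisfy the binning rules above is sketched here. The function names and the width formula are assumptions, not part of the assignment: the width is chosen so that the data value farthest from the mean lands at the center of an end bin, which keeps the mean in the middle of the center bin for any odd `nbins`.

```c
#include <stddef.h>

#define MAX_BINS 100   /* cap on the number of bins permitted by the assignment */

/* Fill counts[0..nbins-1] and return the bin width used.
   Assumes nbins is odd and 1 <= nbins <= MAX_BINS. */
double fill_histogram(const double *x, size_t n, double mean,
                      double min, double max, int nbins, size_t *counts)
{
    /* Distance from the mean to the farthest data value. */
    double half = (mean - min > max - mean) ? mean - min : max - mean;
    /* Fall back to a width of 1.0 if all values are equal or nbins is 1. */
    double width = (nbins > 1 && half > 0.0) ? 2.0 * half / (nbins - 1) : 1.0;
    /* Lower edge of bin 0; the center bin is then centered on the mean. */
    double low_edge = mean - 0.5 * nbins * width;

    for (int b = 0; b < nbins; b++)
        counts[b] = 0;

    for (size_t i = 0; i < n; i++) {
        int b = (int)((x[i] - low_edge) / width);
        if (b < 0) b = 0;                   /* guard against rounding error */
        if (b >= nbins) b = nbins - 1;
        counts[b]++;
    }
    return width;
}

/* Index of the first non-empty bin, or -1 if every bin is empty. */
int first_used_bin(const size_t *counts, int nbins)
{
    for (int b = 0; b < nbins; b++)
        if (counts[b] > 0)
            return b;
    return -1;
}

/* Index of the last non-empty bin, or -1 if every bin is empty. */
int last_used_bin(const size_t *counts, int nbins)
{
    for (int b = nbins - 1; b >= 0; b--)
        if (counts[b] > 0)
            return b;
    return -1;
}
```

Printing only the bins from `first_used_bin` through `last_used_bin` drops the empty bins at either end while still showing empty bins in between.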
The following is an example format - note that the values used are random junk - it's the format that is important.
HISTOGRAM
BINS: 15
WIDTH: 0.456532
%/*: 0.48%
MAX %: 24.23%
======================================================================
9999.020435 to 9999.020675 |****
9999.020435 to 9999.020675 |*******
9999.020435 to 9999.020675 |***********
9999.020435 to 9999.020675 |*************
9999.020435 to 9999.020675 |*********************
9999.020435 to 9999.020675 |********************
9999.020435 to 9999.020675 |*****************************************
9999.020435 to 9999.020675 |*********************************
9999.020435 to 9999.020675 |*******************************
9999.020435 to 9999.020675 |************************************
9999.020435 to 9999.020675 |**********************
9999.020435 to 9999.020675 |******************
9999.020435 to 9999.020675 |************
9999.020435 to 9999.020675 |***
9999.020435 to 9999.020675 |****
======================================================================
The %/* value is how many percentage points each asterisk on a line represents. This number should be chosen so that the largest bin uses close to the maximum number of asterisks that can be printed without causing the line to wrap. The above format would permit approximately fifty asterisks to be used.
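The scaling described above might be worked out as in the following sketch. The constant `MAX_STARS` (set to 50 to match the example format) and the function names are assumptions, and rounding each bar to the nearest whole asterisk is just one reasonable choice.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_STARS 50   /* assumed line budget, per the example format */

/* Percentage points per asterisk, chosen so the fullest bin uses MAX_STARS.
   Assumes total > 0. */
double percent_per_star(const size_t *counts, int nbins, size_t total)
{
    size_t biggest = 0;
    for (int b = 0; b < nbins; b++)
        if (counts[b] > biggest)
            biggest = counts[b];
    return 100.0 * (double)biggest / (double)total / MAX_STARS;
}

/* Number of asterisks for one bin, rounded to the nearest whole asterisk. */
int stars_for_bin(size_t count, size_t total, double pct_per_star)
{
    if (pct_per_star <= 0.0)
        return 0;
    double pct = 100.0 * (double)count / (double)total;
    int stars = (int)(pct / pct_per_star + 0.5);
    return stars > MAX_STARS ? MAX_STARS : stars;
}

/* Print one histogram row in the "low to high |****" layout. */
void print_bar(double low, double high, int stars)
{
    printf("%.6f to %.6f |", low, high);
    for (int i = 0; i < stars; i++)
        putchar('*');
    putchar('\n');
}
```

For instance, if the fullest bin holds 24.23% of the data, `percent_per_star` returns about 0.48, which is consistent with the %/* and MAX % values in the example header.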
After that, your program should sort the data from minimum to maximum. You should give the user a choice of which sorting algorithm to use and then print out the method used and how long the sort took. After that, generate a table of percentile break points such as the following:
SORT METHOD: Bubble Sort
SORT TIME: 32.345 seconds.
PERCENTILE GROUPS
MAX 8916.981726
99 8723.324293
95 8237.982745
90 7923.239423
85 7392.029384
80 6829.293948
75 6072.283841
70 5582.398384
...
10 2313.349858
5 2109.928374
0 1928.384885
The 0th percentile is, by definition, the value that is greater than zero percent of the values in the data set. Hence, it is simply the lowest value in the data set. There is no such thing as the 100th percentile, as this would require a value that is larger than all values in the data set, including itself, which is impossible. But it is still useful to know what the absolute largest value is, so that value is simply called "MAX" and is printed above the 99th percentile score.
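The sort timing and the percentile lookup could be sketched as follows. The function names are hypothetical, `clock()` measures processor time rather than wall-clock time, and the index rule `p * n / 100` is one common reading of the definition above (the value at that index is greater than roughly p percent of the data); other conventions exist.

```c
#include <stddef.h>
#include <time.h>

/* Bubble sort, ascending; stops early once a pass makes no swaps. */
void bubble_sort(double *x, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; pass++) {
        int swapped = 0;
        for (size_t i = 0; i + 1 < n - pass; i++) {
            if (x[i] > x[i + 1]) {
                double tmp = x[i];
                x[i] = x[i + 1];
                x[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped)
            break;
    }
}

/* Run the chosen sort and return the processor time it used, in seconds. */
double timed_sort(void (*sort)(double *, size_t), double *x, size_t n)
{
    clock_t start = clock();
    sort(x, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Value at the p-th percentile of sorted[0..n-1], for p in 0..99.
   Assumes n > 0 and that the data are already sorted ascending. */
double percentile_value(const double *sorted, size_t n, unsigned p)
{
    size_t idx = (size_t)p * n / 100;
    if (idx >= n)
        idx = n - 1;
    return sorted[idx];
}
```

Passing the sort as a function pointer lets the same timing code serve whichever algorithm the user picks. The "MAX" row is simply `sorted[n - 1]`.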
Finally, your program should ask the User for a lower and an upper limit and then print out to the screen all values that are between those limits. Values that exactly match the limits should be included. Prior to the printed list, you should indicate how many values were found and the percentile range of the data subset.
LIMIT MIN: 3400.000000
LIMIT MAX: 5800.000000
RANGE MIN: 3412.928384 (17 %ile)
RANGE MAX: 5766.601849 (72 %ile)
DATA SIZE: 763 points
3412.928384
....
5766.601849
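Because the data are already sorted at this point, the subset between the limits is a contiguous run of the array. The sketch below finds it with two binary searches. All names here are hypothetical, and reporting the percentile of an element as the percentage of values that precede it in sorted order is an assumption about how the "%ile" figures are meant to be computed.

```c
#include <stddef.h>

/* Hypothetical result type: where the matching run starts and how long it is. */
typedef struct {
    size_t first;   /* index of the first value >= lower limit */
    size_t count;   /* number of values within [lower, upper] */
} RangeResult;

/* Index of the first element >= key, or n if there is none. */
static size_t first_at_least(const double *s, size_t n, double key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Index of the first element > key, or n if there is none. */
static size_t first_greater(const double *s, size_t n, double key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Values equal to either limit are included, as the assignment requires. */
RangeResult find_range(const double *sorted, size_t n, double lower, double upper)
{
    size_t first = first_at_least(sorted, n, lower);
    size_t end = first_greater(sorted, n, upper);
    RangeResult r = { first, end > first ? end - first : 0 };
    return r;
}

/* Percentage of the data that precedes index idx; assumes n > 0. */
unsigned percentile_of_index(size_t idx, size_t n)
{
    return (unsigned)(100 * idx / n);
}
```

With the result `r`, the RANGE MIN is `sorted[r.first]`, the RANGE MAX is `sorted[r.first + r.count - 1]` (when `r.count` is nonzero), and the values to print are the `r.count` elements starting at `r.first`.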