

"I'd like the teetertotter..."  Billfred [more] 



Thread Tools  Rate Thread  Display Modes 
#1




Replacing duplicates in a randomly generated array
Hello all! I've been working on a small C program to help me study. The basic idea is that I can randomly generate a problem set from a given range of questions in a textbook to a size I choose, and have the program spit out that set in numerical order (yes, I do realize that there are much easier ways to do this. I chose to use this as an opportunity to practice some coding as well ). The only issue I've had is that the rand function does spit out duplicate numbers when I run it, so I'm wondering if anyone has any suggestions on how to remove and replace duplicate numbers within an array.
If it helps I've also attached my code thus far. It's nothing too complex, but then I'm also majoring in EE, not Comp Sci, so I haven't had much occasion to learn anything more advanced Code:
/* Generate random problem set for a given range input minimum number input max number input set size generate problems of number set size within range output in numerical order */ #include <stdio.h> #include <stdlib.h> #include <time.h> int main(void) { int setSize, max, min; //inputs int set[30] = {0}; //starting array int range; //range for random generation int gen, i, j, k; //indexes int temp; //bubble sort temporary storage //collect inputs printf("Minimum problem number: "); scanf("%d", &min); printf("Maximum problem number: "); scanf("%d", &max); printf("Size of problem set: "); scanf("%d", &setSize); //calculate range for generation range = max  min; //seed random number generator srand(time(NULL)); //generate problem set to given size for (gen = 0; gen < setSize; gen++) { set[gen] = ((rand() % range) + min); //rand generates 0 to range, adding min compensates for bottom edge of problem range } //bubble sort from least to greatest for (i = 29; i > 0; i) { for (j = 1; j <= i; j++) { if (set[j1] > set[j]) { temp = set[j1]; set[j1] = set[j]; set[j] = temp; } } } //display problem set, excluding 0s for (k = 0; k <= 30; k++) { if (set[k] > 0) { printf("%d\n", set[k]); } } } 
#2




Re: Replacing duplicates in a randomly generated array
This is a good example of time complexity.
The obvious easy way to "eliminate duplicates" is to just check each number you've picked so far, and if it is equivalent to your current number, skip it! However, if this was for very large numbers it would not be good because it would have O(n^2) ("BigO of n squared") time complexity. It would have to do a set of instructions for all n elements fo the array, but for each element, it is checking at least some part of the (n1) other elements, creating a polynomial. We take the largest term in the polynomial and simplify it to say that "The number of instructions processed in the worst case will be on the order of n^2", or O(n^2). Thankfully, in your case, I doubt you will be processing hundreds, thousands, or millions of questions, so we don't have to think any harder about better ways to do it. I don't have any off the top of my head right now, so I don't have one! EDIT: I thought of one. You could set up a binary tree containing the questions you've selected so far, and finding whether or not a number has been picked would be O(log n) making the entire algorithm O(n log n). This is way overcomplicated for your efforts I just love algorithms! Last edited by BigJ : 07302012 at 08:27 PM. 
#3




Re: Replacing duplicates in a randomly generated array
One way to represent a set is by using a bit array:
Last edited by Kristian Calhoun : 07302012 at 09:06 PM. 
#4




Re: Replacing duplicates in a randomly generated array
I assume you have a known total number of questions.
How about you fill an array with all available question numbers. Then traverse through the array, switching the value of each position with a random position. This will randomize the array. (See a card deck shuffling program if you don't understand.) Then pick the top N of them. Sort those N if desired. 
#5




Re: Replacing duplicates in a randomly generated array
Since you're already sorting the problems into numerical order, you could just eliminate duplicates as you print out the list: "If the number I'm about to print is the same as the one I printed last time through the loop, just skip it".
There are other ways to generate your list that won't create duplicates. How about this, create an array of all of the possible questions, then do 'n' random swaps inside this array, then read however many questions you want from the start of the list. (essentially "shuffle" the list and then take the top 'n') Or, create said array, and randomly pull questions out of the array (using rand to create an index in to the array). As you take a question, overwrite that array entry with 1 and whenever you try to pull an entry that has 1 in it, reroll another random index. (this probably isn't a great approach...) If you really need to take a set of data and efficiently eliminate duplicates, you can use std::set but I'd recommend doing without that until you learn more of the basics. 
#6




Re: Replacing duplicates in a randomly generated array
Without looking through your solution, I'll make a few suggestions. This is also related to an interview question we give to SW interns.
One approach is to keep track of the numbers you've already used by searching and rejecting duplicates. This is actually a pretty bad approach as the execution time of your solution is now largely dependent on the behavior of rand(). In fact, with a sufficiently bad rand() or a large enough set of questions, your program may never finish. A different approach is to build an index array from 1 to N where N is the number of problems to choose from and you want to randomly select M Problems from it. The first random number you generate is in the range of 1 to N inclusive. You remove that element from the index array and add it to your homework array. The index array is now N1 in size, so you generate a random number from 1 to N1, extract, and add it to homework array, etc. You continue to shrink the range of the random number and the index array at the same rate until you have extracted as many indices as you like  M of them. The index array only ever contains unclaimed problems, the execution time is constant, no duplicates, etc. Another approach is to make the index array and perform an inplace perfect shuffle on it. You may want to google perfect shuffle algorithm for exposure to some of them. Then take the first M. Note that this is pretty closely related to the previous approach. Greg McKaskle Last edited by Greg McKaskle : 07302012 at 08:38 PM. Reason: typo 
#7




Re: Replacing duplicates in a randomly generated array
Another possible technique, and probably the fastest executing, would be to generate an array of all possible questions (say 1100), randomize it, and select the top however many items (say, 10). This would give you no duplicates and would not have time complexity issues. A logical way to think of this is randomly generating a number, where the range of numbers possibly generated is only those which have not yet been generated.
This is how a trivia game on our website works. 
#8




Re: Replacing duplicates in a randomly generated array
Simple (and very fast) TP7 implementation: Code:
{ Choose 20 numbers from the set [1..10000] with no duplicates } CONST size=10000; choose=20; VAR a: array[1..size] of word; Procedure Initialize; var i: word; begin for i:=1 to size do a[i]:=i; end; Procedure Shuffle; var i,j,temp: word; begin for i:=1 to choose do begin j:=random(size)+1; temp:=a[i]; a[i]:=a[j]; a[j]:=temp; end; end; Procedure Select; var i: word; begin for i:=1 to choose do writeln(a[i]:8); end; CONST cr=#13#10; BEGIN write('Initializing...'); Initialize; write(cr,'Seed the RNG...'); Randomize; write(cr,'Shuffle...'); Shuffle; write(cr,'Select...',cr); Select; write('done. press ENTER '); readln; END. Last edited by Ether : 07312012 at 05:38 PM. 
#9




Re: Replacing duplicates in a randomly generated array
Add the following header:
{$APPTYPE CONSOLE} ... and change "word" to "dword" (4 places), and you've got a 32bit Delphi console app which will select 100 nonduplicate random values from a pool of one million in less than 0.004 seconds on an 8yearold desktop PC. 
#10




Re: Replacing duplicates in a randomly generated array
Quote:

#11




Re: Replacing duplicates in a randomly generated array
@Zholl:
The algorithm I posted works very fast for very large sample spaces. It does not require the overhead of removing samples from the array, nor does it require the overhead of shuffling the entire sample space. The header is to allow 16bit TP7 code to be compiled by Delphi into a 32bit app that can run natively under Windows. You can code the algorithm in your language of choice. 
#12




Re: Replacing duplicates in a randomly generated array
I'll note that I've used Greg McKaskle's last solutions in several simulation and test applications in Java by using something similar to this:
Code:
LinkedList<Object> someList = getList(); Collections.rotate(someList, (int)(Math.random()*(someList.size()  1))); Object randomObject = someList.remove();

#13




Re: Replacing duplicates in a randomly generated array
What is TP7? Your code looks like standard Pascal (with unusual style choices for capitalization).

#14




Re: Replacing duplicates in a randomly generated array
You may use an obscure programming language when it isn't mentioned for the first 6 pages of google results...

#15




Re: Replacing duplicates in a randomly generated array
I'll just call it pseudocode from now on :) I just reran it with a larger sample space. On an 8yearold desktop PC running XP Pro, it took a little over 1 millisecond to select 10 thousand nondup random samples from a 100 million sample space (neglecting the time required to populate the sample space). 
Thread Tools  
Display Modes  Rate This Thread 

