![]() |
Replacing duplicates in a randomly generated array
Hello all! I've been working on a small C program to help me study. The basic idea is that I can randomly generate a problem set from a given range of questions in a textbook to a size I choose, and have the program spit out that set in numerical order (yes, I do realize that there are much easier ways to do this. I chose to use this as an opportunity to practice some coding as well :D ). The only issue I've had is that the rand function does spit out duplicate numbers when I run it, so I'm wondering if anyone has any suggestions on how to remove and replace duplicate numbers within an array.
If it helps I've also attached my code thus far. It's nothing too complex, but then I'm also majoring in EE, not Comp Sci, so I haven't had much occasion to learn anything more advanced Code:
/* Generate random problem set for a given range |
Re: Replacing duplicates in a randomly generated array
This is a good example of time complexity.
The obvious easy way to "eliminate duplicates" is to just check each number you've picked so far, and if it is equivalent to your current number, skip it! However, if this was for very large numbers it would not be good because it would have O(n^2) ("Big-O of n squared") time complexity. It would have to do a set of instructions for all n elements fo the array, but for each element, it is checking at least some part of the (n-1) other elements, creating a polynomial. We take the largest term in the polynomial and simplify it to say that "The number of instructions processed in the worst case will be on the order of n^2", or O(n^2). Thankfully, in your case, I doubt you will be processing hundreds, thousands, or millions of questions, so we don't have to think any harder about better ways to do it. I don't have any off the top of my head right now, so I don't have one! EDIT: I thought of one. You could set up a binary tree containing the questions you've selected so far, and finding whether or not a number has been picked would be O(log n) making the entire algorithm O(n log n). This is way overcomplicated for your efforts :) I just love algorithms! |
Re: Replacing duplicates in a randomly generated array
One way to represent a set is by using a bit array:
|
Re: Replacing duplicates in a randomly generated array
I assume you have a known total number of questions.
How about you fill an array with all available question numbers. Then traverse through the array, switching the value of each position with a random position. This will randomize the array. (See a card deck shuffling program if you don't understand.) Then pick the top N of them. Sort those N if desired. |
Re: Replacing duplicates in a randomly generated array
Since you're already sorting the problems into numerical order, you could just eliminate duplicates as you print out the list: "If the number I'm about to print is the same as the one I printed last time through the loop, just skip it".
There are other ways to generate your list that won't create duplicates. How about this, create an array of all of the possible questions, then do 'n' random swaps inside this array, then read however many questions you want from the start of the list. (essentially "shuffle" the list and then take the top 'n') Or, create said array, and randomly pull questions out of the array (using rand to create an index in to the array). As you take a question, over-write that array entry with -1 and whenever you try to pull an entry that has -1 in it, re-roll another random index. (this probably isn't a great approach...) If you really need to take a set of data and efficiently eliminate duplicates, you can use std::set but I'd recommend doing without that until you learn more of the basics. |
Re: Replacing duplicates in a randomly generated array
Without looking through your solution, I'll make a few suggestions. This is also related to an interview question we give to SW interns.
One approach is to keep track of the numbers you've already used by searching and rejecting duplicates. This is actually a pretty bad approach as the execution time of your solution is now largely dependent on the behavior of rand(). In fact, with a sufficiently bad rand() or a large enough set of questions, your program may never finish. A different approach is to build an index array from 1 to N where N is the number of problems to choose from and you want to randomly select M Problems from it. The first random number you generate is in the range of 1 to N inclusive. You remove that element from the index array and add it to your homework array. The index array is now N-1 in size, so you generate a random number from 1 to N-1, extract, and add it to homework array, etc. You continue to shrink the range of the random number and the index array at the same rate until you have extracted as many indices as you like -- M of them. The index array only ever contains unclaimed problems, the execution time is constant, no duplicates, etc. Another approach is to make the index array and perform an inplace perfect shuffle on it. You may want to google perfect shuffle algorithm for exposure to some of them. Then take the first M. Note that this is pretty closely related to the previous approach. Greg McKaskle |
Re: Replacing duplicates in a randomly generated array
Another possible technique, and probably the fastest executing, would be to generate an array of all possible questions (say 1-100), randomize it, and select the top however many items (say, 10). This would give you no duplicates and would not have time complexity issues. A logical way to think of this is randomly generating a number, where the range of numbers possibly generated is only those which have not yet been generated.
This is how a trivia game on our website works. |
Re: Replacing duplicates in a randomly generated array
Simple (and very fast) TP7 implementation: Code:
{ Choose 20 numbers from the set [1..10000] with no duplicates } |
Re: Replacing duplicates in a randomly generated array
Quote:
{$APPTYPE CONSOLE} ... and change "word" to "dword" (4 places), and you've got a 32-bit Delphi console app which will select 100 non-duplicate random values from a pool of one million in less than 0.004 seconds on an 8-year-old desktop PC. |
Re: Replacing duplicates in a randomly generated array
Quote:
|
Re: Replacing duplicates in a randomly generated array
@Zholl:
The algorithm I posted works very fast for very large sample spaces. It does not require the overhead of removing samples from the array, nor does it require the overhead of shuffling the entire sample space. The header is to allow 16-bit TP7 code to be compiled by Delphi into a 32-bit app that can run natively under Windows. You can code the algorithm in your language of choice. |
Re: Replacing duplicates in a randomly generated array
I'll note that I've used Greg McKaskle's last solutions in several simulation and test applications in Java by using something similar to this:
Code:
LinkedList<Object> someList = getList();
|
Re: Replacing duplicates in a randomly generated array
Quote:
|
Re: Replacing duplicates in a randomly generated array
You may use an obscure programming language when it isn't mentioned for the first 6 pages of google results...
|
Re: Replacing duplicates in a randomly generated array
I'll just call it pseudo-code from now on :-) I just re-ran it with a larger sample space. On an 8-year-old desktop PC running XP Pro, it took a little over 1 millisecond to select 10 thousand non-dup random samples from a 100 million sample space (neglecting the time required to populate the sample space). |
| All times are GMT -5. The time now is 22:50. |
Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi