Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   Programming (http://www.chiefdelphi.com/forums/forumdisplay.php?f=51)
-   -   Replacing duplicates in a randomly generated array (http://www.chiefdelphi.com/forums/showthread.php?t=107574)

Zholl 30-07-2012 19:50

Replacing duplicates in a randomly generated array
 
Hello all! I've been working on a small C program to help me study. The basic idea is that I can randomly generate a problem set from a given range of questions in a textbook to a size I choose, and have the program spit out that set in numerical order (yes, I do realize that there are much easier ways to do this. I chose to use this as an opportunity to practice some coding as well :D ). The only issue I've had is that the rand function does spit out duplicate numbers when I run it, so I'm wondering if anyone has any suggestions on how to remove and replace duplicate numbers within an array.

If it helps I've also attached my code thus far. It's nothing too complex, but then I'm also majoring in EE, not Comp Sci, so I haven't had much occasion to learn anything more advanced

Code:

/* Generate random problem set for a given range

input minimum number
input max number
input set size
generate problems of number set size within range
output in numerical order */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
        int setSize, max, min; //inputs
        int set[30] = {0}; //starting array
        int range; //range for random generation
        int gen, i, j, k; //indexes
        int temp; //bubble sort temporary storage

        //collect inputs
        printf("Minimum problem number: ");
        scanf("%d", &min);
        printf("Maximum problem number: ");
        scanf("%d", &max);
        printf("Size of problem set: ");
        scanf("%d", &setSize);

        //calculate range for generation
        range = max - min;
        //seed random number generator
        srand(time(NULL));

        //generate problem set to given size
        for (gen = 0; gen < setSize; gen++) {
                set[gen] = ((rand() % range) + min); //rand generates 0 to range, adding min compensates for bottom edge of problem range
        }

        //bubble sort from least to greatest
        for (i = 29; i > 0; i--) {
                for (j = 1; j <= i; j++) {
                        if (set[j-1] > set[j]) {
                                temp = set[j-1];
                                set[j-1] = set[j];
                                set[j] = temp;
                        }
                }
        }

        //display problem set, excluding 0s
        for (k = 0; k <= 30; k++) {
                if (set[k] > 0) {
                        printf("%d\n", set[k]);
                }
        }
}


BigJ 30-07-2012 20:21

Re: Replacing duplicates in a randomly generated array
 
This is a good example of time complexity.

The obvious easy way to "eliminate duplicates" is to just check each number you've picked so far, and if it is equivalent to your current number, skip it!

However, if this was for very large numbers it would not be good because it would have O(n^2) ("Big-O of n squared") time complexity. It would have to do a set of instructions for all n elements fo the array, but for each element, it is checking at least some part of the (n-1) other elements, creating a polynomial. We take the largest term in the polynomial and simplify it to say that "The number of instructions processed in the worst case will be on the order of n^2", or O(n^2).

Thankfully, in your case, I doubt you will be processing hundreds, thousands, or millions of questions, so we don't have to think any harder about better ways to do it. I don't have any off the top of my head right now, so I don't have one!

EDIT: I thought of one. You could set up a binary tree containing the questions you've selected so far, and finding whether or not a number has been picked would be O(log n) making the entire algorithm O(n log n). This is way overcomplicated for your efforts :) I just love algorithms!

Kristian Calhoun 30-07-2012 20:30

Re: Replacing duplicates in a randomly generated array
 
One way to represent a set is by using a bit array:
  1. Create an array of size (max - min + 1).
  2. Initialize each element of this bit array to 0.
  3. When you generate a random number, x, check the value of the element in the bit array at index [x - min].
    • If the bit array value is 0, set the value at [x-min] in the bit array to 1.
    • If the bit array value is 1, then x already exists in the set, so generate another random number.
  4. Repeat the process until you've added setSize number of elements to the set.

GaryVoshol 30-07-2012 20:30

Re: Replacing duplicates in a randomly generated array
 
I assume you have a known total number of questions.

How about you fill an array with all available question numbers. Then traverse through the array, switching the value of each position with a random position. This will randomize the array. (See a card deck shuffling program if you don't understand.) Then pick the top N of them. Sort those N if desired.

Hjelstrom 30-07-2012 20:34

Re: Replacing duplicates in a randomly generated array
 
Since you're already sorting the problems into numerical order, you could just eliminate duplicates as you print out the list: "If the number I'm about to print is the same as the one I printed last time through the loop, just skip it".

There are other ways to generate your list that won't create duplicates. How about this, create an array of all of the possible questions, then do 'n' random swaps inside this array, then read however many questions you want from the start of the list. (essentially "shuffle" the list and then take the top 'n')

Or, create said array, and randomly pull questions out of the array (using rand to create an index in to the array). As you take a question, over-write that array entry with -1 and whenever you try to pull an entry that has -1 in it, re-roll another random index. (this probably isn't a great approach...)

If you really need to take a set of data and efficiently eliminate duplicates, you can use std::set but I'd recommend doing without that until you learn more of the basics.

Greg McKaskle 30-07-2012 20:35

Re: Replacing duplicates in a randomly generated array
 
Without looking through your solution, I'll make a few suggestions. This is also related to an interview question we give to SW interns.

One approach is to keep track of the numbers you've already used by searching and rejecting duplicates. This is actually a pretty bad approach as the execution time of your solution is now largely dependent on the behavior of rand(). In fact, with a sufficiently bad rand() or a large enough set of questions, your program may never finish.

A different approach is to build an index array from 1 to N where N is the number of problems to choose from and you want to randomly select M Problems from it. The first random number you generate is in the range of 1 to N inclusive. You remove that element from the index array and add it to your homework array. The index array is now N-1 in size, so you generate a random number from 1 to N-1, extract, and add it to homework array, etc. You continue to shrink the range of the random number and the index array at the same rate until you have extracted as many indices as you like -- M of them. The index array only ever contains unclaimed problems, the execution time is constant, no duplicates, etc.

Another approach is to make the index array and perform an inplace perfect shuffle on it. You may want to google perfect shuffle algorithm for exposure to some of them. Then take the first M. Note that this is pretty closely related to the previous approach.

Greg McKaskle

quinxorin 31-07-2012 16:39

Re: Replacing duplicates in a randomly generated array
 
Another possible technique, and probably the fastest executing, would be to generate an array of all possible questions (say 1-100), randomize it, and select the top however many items (say, 10). This would give you no duplicates and would not have time complexity issues. A logical way to think of this is randomly generating a number, where the range of numbers possibly generated is only those which have not yet been generated.

This is how a trivia game on our website works.

Ether 31-07-2012 17:29

Re: Replacing duplicates in a randomly generated array
 

Simple (and very fast) TP7 implementation:

Code:

{ Choose 20 numbers from the set [1..10000] with no duplicates }

CONST
size=10000;
choose=20;

VAR
a: array[1..size] of word;

Procedure Initialize;
var i: word;
begin
for i:=1 to size do a[i]:=i;
end;

Procedure Shuffle;
var i,j,temp: word;
begin
for i:=1 to choose do
  begin
  j:=random(size)+1;
  temp:=a[i]; a[i]:=a[j]; a[j]:=temp;
  end;
end;

Procedure Select;
var i: word;
begin
for i:=1 to choose do writeln(a[i]:8);
end;

CONST
cr=#13#10;

BEGIN

write('Initializing...');    Initialize;
write(cr,'Seed the RNG...');  Randomize;
write(cr,'Shuffle...');      Shuffle;
write(cr,'Select...',cr);    Select;
write('done. press ENTER ');  readln;

END.


Ether 31-07-2012 23:35

Re: Replacing duplicates in a randomly generated array
 
Quote:

Originally Posted by Ether (Post 1180033)
Simple (and very fast) TP7 implementation

Add the following header:

{$APPTYPE CONSOLE}
USES Windows;

... and change "word" to "dword" (4 places), and you've got a 32-bit Delphi console app which will select 100 non-duplicate random values from a pool of one million in less than 0.004 seconds on an 8-year-old desktop PC.



Zholl 01-08-2012 03:00

Re: Replacing duplicates in a randomly generated array
 
Quote:

Originally Posted by Ether (Post 1180065)
Add the following header:

{$APPTYPE CONSOLE}
USES Windows;

... and change "word" to "dword" (4 places), and you've got a 32-bit Delphi console app which will select 100 non-duplicate random values from a pool of one million in less than 0.004 seconds on an 8-year-old desktop PC.



I'm afraid I don't understand what you're getting at. I can follow your code, but I don't recognize the language, nor do I see the purpose of your header

Ether 01-08-2012 10:40

Re: Replacing duplicates in a randomly generated array
 
@Zholl:

The algorithm I posted works very fast for very large sample spaces. It does not require the overhead of removing samples from the array, nor does it require the overhead of shuffling the entire sample space.

The header is to allow 16-bit TP7 code to be compiled by Delphi into a 32-bit app that can run natively under Windows. You can code the algorithm in your language of choice.



JesseK 01-08-2012 11:11

Re: Replacing duplicates in a randomly generated array
 
I'll note that I've used Greg McKaskle's last solutions in several simulation and test applications in Java by using something similar to this:
Code:

LinkedList<Object> someList = getList();
Collections.rotate(someList, (int)(Math.random()*(someList.size() - 1)));
Object randomObject = someList.remove();

  • Rotating is usually much faster than shuffling
  • Removing the element from the queue ensures lack of duplicates
If you want a good lesson in C, you could try implementing a basic generic rotate function and a basic generic LinkedList object. This isn't the most efficient way in Java since the Collections class doesn't know about the Queue implementation of the LinkedList (thus it could save time by manipulating indices rather than doing a rotation), but implementing your own solution would give you that flexibility.

Alan Anderson 01-08-2012 11:35

Re: Replacing duplicates in a randomly generated array
 
Quote:

Originally Posted by Ether (Post 1180033)

Simple (and very fast) TP7 implementation:

What is TP7? Your code looks like standard Pascal (with unusual style choices for capitalization).

EricVanWyk 01-08-2012 11:35

Re: Replacing duplicates in a randomly generated array
 
You may use an obscure programming language when it isn't mentioned for the first 6 pages of google results...

Ether 01-08-2012 11:52

Re: Replacing duplicates in a randomly generated array
 

I'll just call it pseudo-code from now on :-)

I just re-ran it with a larger sample space.

On an 8-year-old desktop PC running XP Pro, it took a little over 1 millisecond to select 10 thousand non-dup random samples from a 100 million sample space (neglecting the time required to populate the sample space).




All times are GMT -5. The time now is 22:50.

Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi