I was experimenting with the output speed and jitter, because I was considering using it as a simple tone generator.
I was having a lot of trouble with jitter, so I finally tried a test that just modulated a digital output as fast as possible. (Before that, I had been regulating it with RT wait functions, in microseconds.)
I found that it had a limit of about 280 microseconds to switch between the ON and OFF states.
Looking at the NI 9403 Operating Instructions, I see the module has a maximum propagation delay of 330 nanoseconds.
So why might I be seeing a limit 1000 times slower than that?
Is there some huge overhead that is slowing the processor down?
I’ve been deploying it as a startup application, so I know it’s not because of any communications on the network.
You are looking for the update rate, not the propagation delay. That is 7 us, which means the FPGA can put out a 70 kHz square wave.
But that is still faster than what you are seeing. The difference is the time it takes the CPU to tell the FPGA what to do. If you want it to go faster, you will have to use more advanced API calls; I’m not sure how to do that in an FRC setup.
Basically, there are just a lot of layers in the FRC control setup. LabView is an interpreted language running on firmware on the cRIO itself. The firmware (an interpreter) reads the compiled LabView code and figures out what it wants the cRIO to do. That interpretation takes time. Also, if you’re using the FRC digital output VIs, they have other things in them that run every time you call them and slow things down, like port addressing and checks to make sure things are still working. These checks are quite nice in a competition setting, with high schoolers forgetting to plug things in correctly or something getting damaged in a match, but they take time.
Probably the fastest way to switch the output pin is a direct register call. There’s a similar situation on the older PIC-based FRC controllers: WPILib gives you a nice function, SetDigitalOutput(PIN, VALUE);, which does it for you and saves you from writing things like PORTA = 0xFE; yourself. However, the raw register write is still much faster, because you’re working with the register values directly.
I don’t know how to do this in LabView, or even if it’s possible in this setup, because they’ve shielded us from a lot of the really low-level stuff in LabView. In FRC especially, to keep people from tampering with the disable code, they’ve locked us out of the FPGA for now. You may be able to do something lower level in C++, but I don’t know.
The updates to that module happen at around 7 microseconds. Other digital modules that don’t have individually addressable and directional pins update much faster, but that particular module is less expensive and more flexible, and a bit slower. 7 us is the speed you can achieve if the FPGA updates the values with FPGA logic, or if the CPU updates a register on the FPGA map fast enough. I suspect the layers of function calls and some value checking are what’s adding the overhead you’re seeing. You clearly know your way around LV, so feel free to drill down into the digital subVIs. Time each layer as you go and you’ll get a feel for where the overhead is. Please ask questions if something doesn’t make sense or if you make some good discoveries.
You may also consider using the PWM I/O functions. I believe those will produce the tones you want more easily, because the FPGA itself does the high-speed switching.
I wanted to comment on one of the other posts.
Some of the other things in the reply were essentially the same as what Eric and I are saying: it is SW layers adding overhead. But just to nip the interpreted thing in the bud, LV is a compiled language and has been for over twenty years. The LV diagram is turned into PPC instructions that are executed directly by the CPU. I believe the firmware you are referring to is the VxWorks OS, and it is not interpreted either; it is native code built from C/C++. In a LV app, there is no interpreting unless you write an interpreter yourself.
You may have been thinking of the NXT execution system. I’ll be happy to go into more details or answer any questions you may have.
Well, that’s nice to know. I was just going off of the fact that it runs interpreted on our computers and on the NXT bricks, so it’s good to know that they’ve got it running directly on their own hardware. I thought the compilation process was just to put it in a lower-level interpreted language, so I’m glad it’s actually compiling it into machine language.
Glad to help clear things up. But I’m curious why you say it runs interpreted on your computers. Do you mean the PC? For that, LV generates native x86 instructions. Basically, when LV targets a platform, it generates native code. The exceptions are the NXT, where it runs on a VM similar to Java’s, and the embedded tools, where we generate C code and run it through vendor tools to produce native code. I suppose the FPGA counts as an exception as well, since we generate VHDL and that trudges through the native tools to become an optimized fabric for the FPGA.
Okay, I got something about 10 times as fast.
It was locked, else I might have dug deeper. (though I suspect this may be as far as the processor goes)
I have some questions about why it is done this way.
It looks like it actually updates all 14 GPIO lines at the same time. Is this how the module is normally accessed? I ran the channel-mapping VI on my computer and found that the bit placement corresponds directly to the channel number on the DIO module.
What separates these updates from the PWM and relay updates? Is this just an efficient way of passing the data to the Digital Module?
Would it be faster sending a string of data through I2C?
First of all, nice job on speeding it up; it’s not always easy. This is how it’s usually done on microcontrollers, though I’m not sure whether it works the same way on the FPGA. At least in micros, values are updated an entire register at a time, and that doesn’t hurt speed: the microcontroller is designed so that it presents ALL of the port register’s values in parallel, and all 8 pins on the register update almost simultaneously.
This is mainly for parallel communication, where you need to update all the bits of the byte you’re sending at the same time, so you can tell the receiving end of the line when to look for the new data. It’s largely a precedent, and they’ve probably carried it into the update protocol of the digital module itself (the cRIO talks to the digital module over a serial connection, that VGA-sized port at the bottom, so all the data extraction is done inside the module).
These are all just inferences from working with other microcontrollers, and I know the cRIO is a completely different animal, but the precedents probably still stand. My guess is that the GPIO bank is made up of two registers coming off the chip inside the digital module, acting as a kind of middleman. If anyone here knows the actual construction of the cRIO modules, I’d love to hear about it.
It’s not just for parallel communication. It’s a side effect of the fact that figuring out what to do is often harder than doing it. In designing a microprocessor, it is just as easy to have it update N pins as it is to update only one, so you might as well take advantage.
For serial devices like the cRIO module, it is often quicker to update them all so you don’t have to do any addressing. For example, the relay pins on the DSC do just this: we send a reset signal and then clock every single pin out, every time. The whole process takes on the order of N+4 clock cycles. If we did individual pin addressing, we’d have to clock out the address and then the pin value, on the order of log2(N)+5 clocks per pin. That would be quicker for one or two pins, but it loses in a hurry if we are updating many pins.
Note that CAN is taking the opposite strategy: Since there are so many more registers to talk to, the overhead of addressing what you are talking to is worthwhile.
If N is equal to or less than the width of the processor, it can update every pin in a single write anyway.