Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   Programming (http://www.chiefdelphi.com/forums/forumdisplay.php?f=51)
-   -   Methods for deploying/(cross-)compiling vision code for NVidia Jetson (http://www.chiefdelphi.com/forums/showthread.php?t=149208)

Wasabi Fan 27-06-2016 23:22

Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
My team has been working with the Jetson TK1 recently for on-board vision processing. The vision code that we (or, in reality, I -- I am the vision team) write has been primarily using OpenCV functions for the heavy lifting, but we also have written custom CUDA kernels to perform some more specialized processing when OpenCV isn't doing what we want.

Up until now, I've been using NVidia's Nsight Eclipse Edition tooling, which supports built-in remote compilation and debugging of both plain C++ and CUDA code. However, after spending more time troubleshooting sync errors than writing code this past season, I am looking for alternatives to the system I'm currently using.

So, I'd like to hear from others that are using the Jetson to see what they are using for development tools and deployment. I've seen some posted code that indicates that they just copy their code to the remote device and then compile manually, but given the caliber of some of the vision systems I've seen, I doubt that is the state of the art. Additionally, I have yet to see anyone that implemented their own CUDA kernels in addition to the OpenCV ones; having to compile CUDA code manually would get to be a real pain. Are others having success with the project sync facilities in Nsight? Or are there home-grown solutions that teams are using instead? I'd really like to hear what others are having success with!

marshall 28-06-2016 07:55

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by Wasabi Fan (Post 1594644)
My team has been working with the Jetson TK1 recently for on-board vision processing. The vision code that we (or, in reality, I -- I am the vision team) write has been primarily using OpenCV functions for the heavy lifting, but we also have written custom CUDA kernels to perform some more specialized processing when OpenCV isn't doing what we want.

Up until now, I've been using NVidia's Nsight Eclipse Edition tooling, which supports built-in remote compilation and debugging of both plain C++ and CUDA code. However, after spending more time troubleshooting sync errors than writing code this past season, I am looking for alternatives to the system I'm currently using.

So, I'd like to hear from others that are using the Jetson to see what they are using for development tools and deployment. I've seen some posted code that indicates that they just copy their code to the remote device and then compile manually, but given the caliber of some of the vision systems I've seen, I doubt that is the state of the art. Additionally, I have yet to see anyone that implemented their own CUDA kernels in addition to the OpenCV ones; having to compile CUDA code manually would get to be a real pain. Are others having success with the project sync facilities in Nsight? Or are there home-grown solutions that teams are using instead? I'd really like to hear what others are having success with!

We compile on our Jetsons for the most part. It might not be "state of the art", but the Jetsons are cheap(ish) and we have enough of them to give them to individual (or pairs of) students to use. We don't transfer the code; it is written on them as well. Basically, we use them as cheap desktops that the students can experiment with.

We aren't doing anything with custom CUDA kernels that I'm aware of... I'm not sure why you would, but perhaps you can share your use case. We are using systems others have built to take advantage of CUDA already, both with OpenCV and with the neural network programming we have been doing.

Wasabi Fan 28-06-2016 13:38

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Oh, I get it; I completely forgot about the fact that one could develop directly on the Jetson with an external monitor and keyboard. After doing that for my initial tests, I got caught up in other tooling and moved to remote development; maybe what I need to do is work with editors on-board. I'm not sure how I might be able to compile CUDA code, but hopefully OpenCV will have what's needed anyway. Primarily, I've turned to custom algorithms when I want to do things like custom adaptive thresholding or other functions that OpenCV doesn't implement, where doing it using existing CPU functions would be pretty computationally expensive and wasteful. Writing custom kernels makes maintenance difficult, so I try to avoid them whenever I can.

Separately, I think your team's vision code was one of the codebases I was looking at yesterday to figure out how others did it; I have to say, I was really impressed with some of the results that you had posted! After reading through that, I'm thinking that I am probably over-complicating a fair amount of my vision code.

marshall 28-06-2016 14:34

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by Wasabi Fan (Post 1594712)
Oh, I get it; I completely forgot about the fact that one could develop directly on the Jetson with an external monitor and keyboard. After doing that for my initial tests, I got caught up in other tooling and moved to remote development; maybe what I need to do is work with editors on-board. I'm not sure how I might be able to compile CUDA code, but hopefully OpenCV will have what's needed anyway. Primarily, I've turned to custom algorithms when I want to do things like custom adaptive thresholding or other functions that OpenCV doesn't implement, where doing it using existing CPU functions would be pretty computationally expensive and wasteful. Writing custom kernels makes maintenance difficult, so I try to avoid them whenever I can.

Yep, keyboard/mouse and monitor. It's saved us from having to buy real developer systems. We can put that money into other things instead.

Quote:

Originally Posted by Wasabi Fan (Post 1594712)
Separately, I think your team's vision code was one of the codebases I was looking at yesterday to figure out how others did it; I have to say, I was really impressed with some of the results that you had posted! After reading through that, I'm thinking that I am probably over-complicating a fair amount of my vision code.

Thanks! We try, though sometimes we get a bit ambitious. If you have specific questions, then I can try to direct you to the right people to answer them. We have a decent team of programmers working around the clock on developing new and cool stuff... though mostly they are playing video games, I think.

Turing'sEgo 28-06-2016 20:02

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
A downside to developing directly on the Jetson is that it is ARM-based and only has one USB port (but a hub fixes that issue, no problem).

Just remember NOT to use Java if you can avoid it (and based on what you said, you are). JVMs are memory hogs.

The most common method of compiling/deploying is to cross-compile. This is the safest way to ensure you don't accidentally break your stable (yet fragile) Jetson configuration.

Look into gcc-arm-linux-gnueabi

Are you writing your own routines in cuda-c? (If so, that is pretty metal)

Slightly off topic: look into Otsu thresholding for adaptive thresholding. OpenCV already has it, and it has a GPU version.
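
For reference, a minimal sketch of the CPU-side Otsu call (OpenCV 2.4-era API; whether the GPU module supports the Otsu flag depends on the OpenCV build, so treat that part as something to verify):

Code:

// Otsu's method picks the threshold level automatically from the image histogram.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("sample_frame.png", CV_LOAD_IMAGE_GRAYSCALE); // placeholder image
    cv::Mat binary;
    double chosen = cv::threshold(gray, binary, 0, 255,
                                  cv::THRESH_BINARY | cv::THRESH_OTSU);
    // 'chosen' now holds the threshold value Otsu selected for this frame.
    cv::imwrite("binary.png", binary);
    return 0;
}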

Wasabi Fan 28-06-2016 20:16

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by Turing'sEgo (Post 1594790)
The most common method of compiling/deploying is to cross-compile. This is the safest way to ensure you don't accidentally break your stable (yet fragile) Jetson configuration.

Look into gcc-arm-linux-gnueabi

So, you'd recommend rolling my own cross-compilation scripts (using existing toolchains, of course)? Would something like Eclipse be willing to do remote debugging on a binary cross-compiled for ARM?

Quote:

Originally Posted by Turing'sEgo (Post 1594790)
Are you writing your own routines in cuda-c? (If so, that is pretty metal)

Yep, I've been using OpenCV data structures and most of their functions, but occasionally writing my own CUDA C code (which I learned through trial and error) when I'm unhappy with what OpenCV has to offer. Unfortunately, I don't think I've gotten the memory management right -- I keep getting disappointingly low frame rates when using my own code. And definitely no Java here! Although, we do write our main robot program in Java.
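
One common cause of disappointing frame rates with hand-written kernels is allocating and freeing device memory inside the per-frame loop. Below is a hypothetical sketch of the buffer-reuse pattern (made-up kernel and names, not the code discussed above): device buffers are allocated once and reused for every frame.

Code:

#include <cuda_runtime.h>

// Trivial per-pixel threshold kernel, a stand-in for a real custom kernel.
__global__ void thresholdKernel(const unsigned char* in, unsigned char* out,
                                int n, unsigned char level)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (in[i] > level) ? 255 : 0;
}

struct FrameBuffers
{
    unsigned char* d_in;
    unsigned char* d_out;
    int pixels;
};

// Allocate device memory once at startup instead of on every frame.
void initBuffers(FrameBuffers& b, int width, int height)
{
    b.pixels = width * height;
    cudaMalloc((void**)&b.d_in,  b.pixels);
    cudaMalloc((void**)&b.d_out, b.pixels);
}

// Per-frame work: copy in, run the kernel, copy out. No allocations here.
void processFrame(FrameBuffers& b, const unsigned char* hostIn, unsigned char* hostOut)
{
    cudaMemcpy(b.d_in, hostIn, b.pixels, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks  = (b.pixels + threads - 1) / threads;
    thresholdKernel<<<blocks, threads>>>(b.d_in, b.d_out, b.pixels, 128);
    cudaMemcpy(hostOut, b.d_out, b.pixels, cudaMemcpyDeviceToHost); // synchronous copy waits for the kernel
}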

Turing'sEgo 29-06-2016 03:14

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
I know of two robotics companies (both in Boston, actually) that use the method I described.

A rudimentary Google search about remote debugging for ARM devices seemed to give helpful results:
http://janaxelson.com/eclipse1.htm
http://www.hertaville.com/remote-debugging.html

Nvidia is kind of trying to get into FIRST (they sponsor a couple of teams and give discounts). It wouldn't hurt to send a message to their rep on here so they could get you connected to people who know CUDA C and could help you out. Sadly, I don't know of any groups/forums you could ask for help on regarding this.

Wasabi Fan 29-06-2016 03:28

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by Turing'sEgo (Post 1594839)
A rudimentary Google search about remote debugging for ARM devices seemed to give helpful results:

I can take a hint ;) That's code for "LMGTFY".

Quote:

Originally Posted by Turing'sEgo (Post 1594839)
It wouldn't hurt to send a message to their rep on here so they could get you connected to people who know CUDA C and could help you out.

Yeah, I think I'll see if I can get in touch with someone from Nvidia and see if they can direct me. Thanks for the tip!

marshall 29-06-2016 07:10

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by Wasabi Fan (Post 1594840)
Yeah, I think I'll see if I can get in touch with someone from Nvidia and see if they can direct me. Thanks for the tip!

Your best bet is the Nvidia Developer Forums, which is where the rep (currently Dusty Franklin) is likely to point you. He checks CD infrequently, but he's a super nice dude and loves the FIRST community. The entire group working on Tegra marketing is awesome (and I'm not just saying that because they sponsor 900).

Wasabi Fan 29-06-2016 13:51

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Haha, yeah... I've had a thread open on their dev forums for a few days now, but so far all it's gotten is 7 views and a reply from someone who advised me to configure git. That's actually why I decided to post here: their devtools forum was so lethargic that I doubted I'd actually get helpful responses.

KJaget 30-06-2016 09:54

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by Wasabi Fan (Post 1594712)
Oh, I get it; I completely forgot about the fact that one could develop directly on the Jetson with an external monitor and keyboard.

Two other things we do, each at opposite ends of the spectrum:

1. Build and test on x86 Linux laptops and desktops. With a little bit of extra effort our code is portable, which adds a lot to our ability to work outside lab hours or when the Jetson is being used on the robot to debug other problems (i.e. our drive team). Included in that is the ability to test using recorded videos or still images, so we don't need the entire robot to test changes (see the sketch after this list). You can't always test everything that way, but you can fix a lot of stuff before moving it over to the Jetson.
2. Export the Jetson X display to a laptop via an ssh tunnel (ssh -Y ubuntu@10.x.y.z to connect; then anything which uses X exports its display back to the Linux system you connected from). This is great for headless debugging when the Jetson is actually on the robot.
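
A minimal sketch of the recorded-video idea from item 1 (hypothetical code, not this team's): the pipeline reads from whichever source it is given, so a saved match video can stand in for the live camera during desktop testing.

Code:

#include <opencv2/opencv.hpp>

int main(int argc, char** argv)
{
    cv::VideoCapture cap;
    if (argc > 1)
        cap.open(argv[1]);   // a recorded match video for off-robot testing
    else
        cap.open(0);         // the live camera when running on the Jetson

    cv::Mat frame;
    while (cap.read(frame))
    {
        // ... run the same vision pipeline on either source ...
    }
    return 0;
}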

Quote:

I'm not sure how I might be able to compile CUDA code

We use CMake to build, and that has support for building CUDA code.

Quote:

Primarily, I've turned to custom algorithms when I want to do things like custom adaptive thresholding or other functions that OpenCV doesn't implement, where doing it using existing CPU functions would be pretty computationally expensive and wasteful. Writing custom kernels makes maintenance difficult, so I try to avoid them whenever I can.

This is one of those cases where you need to be sure you're actually speeding things up. That includes making sure that what you're working on is actually slow (i.e. profile it rather than assuming) and making sure that the speed-up will actually matter (e.g. going from 80 to 100 FPS is useless if your camera runs at 30 FPS).

Quote:

After reading through that, I'm thinking that I am probably over-complicating a fair amount of my vision code.

"This code is long and complicated because I didn't have enough time to make it shorter" is a common problem, especially when time crunches are involved.

Wasabi Fan 30-06-2016 16:27

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Quote:

Originally Posted by KJaget (Post 1595025)
Two other things we do, each at opposite ends of the spectrum:

1. Build and test on x86 Linux laptops and desktops. With a little bit of extra effort our code is portable, which adds a lot to our ability to work outside lab hours or when the Jetson is being used on the robot to debug other problems (i.e. our drive team). Included in that is the ability to test using recorded videos or still images, so we don't need the entire robot to test changes. You can't always test everything that way, but you can fix a lot of stuff before moving it over to the Jetson.
2. Export the Jetson X display to a laptop via an ssh tunnel (ssh -Y ubuntu@10.x.y.z to connect; then anything which uses X exports its display back to the Linux system you connected from). This is great for headless debugging when the Jetson is actually on the robot.

This past season, we developed our algorithms on desktop PCs (running Windows and Visual Studio) and then switched to developing and deploying directly to the Jetson from Ubuntu machines later in the season, when it was on the robot. We kept all our code compatible with both platforms, however, and with a compile-time switch we could enable and disable GUI windows with sliders, previews, and such. Once we got to the point where we needed to deploy to the Jetson, we began running into serious issues with the Nvidia IDE tooling that we were using, which essentially left us dead in the water -- that's the major pain point I'm trying to remedy. There were definitely a lot of things we could've done better in that workflow had we been a bit smarter!
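
For what it's worth, the compile-time GUI switch described above could look something like the sketch below (hypothetical macro and variable names, not the team's actual code): desktop builds define VISION_GUI to get preview windows and tuning sliders, while the headless Jetson build leaves it out.

Code:

#include <opencv2/opencv.hpp>

static int hueMin = 40;   // example tuning values exposed as sliders
static int hueMax = 80;

void showDebugOutput(const cv::Mat& binaryOutput)
{
#ifdef VISION_GUI
    cv::namedWindow("threshold preview");
    cv::createTrackbar("hue min", "threshold preview", &hueMin, 179);
    cv::createTrackbar("hue max", "threshold preview", &hueMax, 179);
    cv::imshow("threshold preview", binaryOutput);
    cv::waitKey(1);
#else
    (void)binaryOutput;   // headless build: no GUI work at all
#endif
}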

Quote:

Originally Posted by KJaget (Post 1595025)
We use CMake to build, and that has support for building CUDA code.

I think that is looking like the best option for me. I'm interested to see how CUDA integration would work; I guess I'll be investigating that next. Thanks for all the tips! Seeing what others have done successfully makes it a lot easier to figure this stuff out.

Quote:

Originally Posted by KJaget (Post 1595025)
This is one of those cases where you need to be sure you're actually speeding things up. That includes making sure that what you're working on is actually slow (i.e. profile it rather than assuming) and making sure that the speed-up will actually matter (e.g. going from 80 to 100 FPS is useless if your camera runs at 30 FPS).

IIRC our code was running at something around 10-20 FPS by the end of last season, and sometimes it dropped to more like 5... so I didn't hit the capacity of the camera ;) (although we did hit frame-bandwidth issues with some cameras that would only give us 10 FPS). The algorithm we were using was fundamentally flawed, so I have been able to dramatically improve it recently as a proof of concept.

For profiling, I was actually using a tool that Nvidia provides, packaged with their development kit. I think it is primarily targeted at profiling GPU code (it automatically keeps track of CUDA core utilization, memory transfers, etc.), but by adding calls in your code that label certain events, you can get a very nice analysis of the time taken for each step when processing frames. That clearly showed me where the slowdown was occurring.
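
The tool isn't named above, but the "label certain events" workflow sounds like NVTX ranges viewed in NVIDIA's Visual Profiler; assuming that's the mechanism, the labeling calls look roughly like this (stage names made up for illustration):

Code:

#include <nvToolsExt.h>   // link with -lnvToolsExt

void processOneFrame()
{
    nvtxRangePushA("capture");      // each push/pop pair shows up as a named
    // ... grab a frame from the camera ...
    nvtxRangePop();                 // range on the profiler timeline

    nvtxRangePushA("threshold");
    // ... thresholding (OpenCV call or custom kernel) ...
    nvtxRangePop();

    nvtxRangePushA("contours");
    // ... contour extraction and target scoring ...
    nvtxRangePop();
}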

dusty_nv 08-07-2016 12:04

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Hey there -- if you're looking to integrate the CUDA toolchain with the CMake build system, I use CMake and you can find some examples on my GitHub, like this one: https://github.com/dusty-nv/turbo2/b...CMakeLists.txt

Basically you just find_package(CUDA) and then use cuda_add_executable(), cuda_add_library(), and similar (you can see the functions/variables available in FindCUDA.cmake, which comes with CMake, I believe). I also set some NVCC CUDA compiler settings for the Jetson, which you can see in the CMakeLists linked above.

Like marshall says, I should hang around ChiefDelphi more frequently :D

Not sure which Dev Forum you posted in earlier, but let me know your username,
and post here for TX1: https://devtalk.nvidia.com/default/b...64/jetson-tx1/
and here for TK1: https://devtalk.nvidia.com/default/b...62/jetson-tk1/

dusty_nv 08-07-2016 12:13

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Also one other thing, when you are building your list of source files, you will want to include *.cu files in the GLOB, like so:

Code:

file(GLOB my_sources *.cpp *.cu)

Then the functions like cuda_add_executable() and cuda_add_library() will automatically compile the CUDA .cu source files containing your CUDA kernels.

If you find yourself having to write a big algorithm in CUDA that isn't included in OpenCV, also check out the VisionWorks and NPP (NVIDIA Performance Primitives) libraries.

Wasabi Fan 08-07-2016 19:00

Re: Methods for deploying/(cross-)compiling vision code for NVidia Jetson
 
Thanks for the info on compiling CUDA code! I'll try it out and see how it goes. Am I correct that I should be giving both .cpp and .cu files to the CUDA compiler, and then it will invoke the normal C++ compiler for the .cpp files? If so, will my C++11 code still work even though the supported version of NVCC doesn't support C++11?

Quote:

Not sure which Dev Forum you posted in earlier, but let me know your username,
and post here for TX1: https://devtalk.nvidia.com/default/b...64/jetson-tx1/
and here for TK1: https://devtalk.nvidia.com/default/b...62/jetson-tk1/

My original question, which I posted on the dev forums, was more focused on Nsight Eclipse Edition, so I posted it in the forum specific to that toolset; after not getting any replies there, I decided to post here to see what other FRC teams have had success with. Depending on what I get from my experimentation with CMake tools, I might go over to the general Jetson TK1 forums and ask a question there.

