FRC code search: a great project for someone to start
virtuald
10-10-2016, 21:20
I personally don't have the time for this -- but I think it would be awesome if someone else did it. :)
It would be really cool if someone could create a simple search engine for FRC code. Think like 'grepcode', but obviously with less functionality -- really, you only need basic full-text search for this to be interesting.
There are a variety of pieces/challenges I can think of at the moment:
Locating team code (FIRSTwiki has a good list to start out with (https://firstwiki.github.io/wiki/robot-code-directory), and it's already in JSON format)
Ingesting the code and storing it in a database (maybe even as simple as filtering by filetype and just using full-text search)
Finding an appropriate database provider (firebase like TBA uses? giant json dump in github pages? AWS/heroku/azure/etc free tier?)
Providing a nice search page that allows you to do basic things -- filter by year, language, team
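To make the "filter by filetype and just use full-text search" idea concrete, here is a minimal sketch in Python. It is not anyone's actual implementation: the extension list, the `(team, year, path)` key layout, and the function names are all assumptions made for illustration, and a real service would more likely lean on a database's full-text index than an in-memory dict.

```python
import re
from collections import defaultdict

# Hypothetical list of extensions worth indexing; adjust to taste.
SOURCE_EXTENSIONS = (".java", ".py", ".cpp", ".h")


def is_source_file(path):
    """Return True if the path looks like robot source code."""
    return path.lower().endswith(SOURCE_EXTENSIONS)


def build_index(files):
    """Build a token -> set of (team, year, path) inverted index.

    `files` maps (team, year, path) keys to the file's text.
    """
    index = defaultdict(set)
    for key, text in files.items():
        path = key[2]
        if not is_source_file(path):
            continue  # skip READMEs, binaries, etc.
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
            index[token.lower()].add(key)
    return index


def search(index, term, team=None, year=None):
    """Look up one identifier, optionally filtered by team and/or year."""
    hits = index.get(term.lower(), set())
    return sorted(
        key for key in hits
        if (team is None or key[0] == team)
        and (year is None or key[1] == year)
    )
```

Filtering by language could be added the same way, for example by deriving the language from the file extension when the index is built.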
Anyone up for the challenge?
AustinShalit
10-10-2016, 21:22
This sounds like a great project!
euhlmann
11-10-2016, 18:55
Here's a basic first pass at it in node.js with mongodb: https://github.com/erikuhlmann/grepfrc
Unfortunately I haven't had the patience to let it finish indexing all the code repos, and I'm not sure how big the database would end up being.
Since most people use git to manage robot code, there may be an easier way to go about this.
1) Allow people to submit their code. Take the appropriate deets (team #, year, primary lang) if applicable (remember, non-teams can have FRC code too ;) ).
2) Store this on a database, along with a link to the git repo
3) Index it by downloading the latest commit (--depth=1) of this repo somewhere known. Keep a watch on all repos, updating if needed maybe once per day. (You could do a deeper depth but HDD use would skyrocket if you plan on scaling at all)
4) Just pass in your query as a git grep.
If anyone is willing to do this project and wants some extra information about the implementation of the above, let me know.
GLHF
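Steps 3 and 4 above could be sketched roughly like this in Python. This only illustrates the shallow-clone-then-`git grep` idea; the function names are hypothetical, and the output parser assumes file paths contain no colons.

```python
import subprocess


def clone_command(repo_url, dest):
    """argv for fetching only the latest commit of a repo (step 3)."""
    return ["git", "clone", "--depth=1", repo_url, str(dest)]


def grep_command(query):
    """argv for searching a checkout (step 4).

    -n prints line numbers, -I skips binary files,
    -F treats the query as a fixed string rather than a regex.
    """
    return ["git", "grep", "-n", "-I", "-F", "-e", query]


def parse_grep_output(output):
    """Turn 'path:line:text' lines into (path, line_number, text) tuples."""
    hits = []
    for line in output.splitlines():
        path, lineno, text = line.split(":", 2)
        hits.append((path, int(lineno), text))
    return hits


def search_checkout(checkout_dir, query):
    """Run git grep inside an existing checkout and return parsed hits."""
    result = subprocess.run(
        grep_command(query), cwd=checkout_dir,
        capture_output=True, text=True,
    )
    # git grep exits with status 1 when nothing matched; that is not an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return parse_grep_output(result.stdout)
```

A daily refresh job would then rerun the clone (or a fetch) for each stored repo link before serving queries through `search_checkout`.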
marshall
11-10-2016, 19:54
More serious suggestion would be to enable teams to link to GitHub projects for specific robots on TBA. That should provide 80% functionality for 20% of the effort and I'm pretty sure it would work the same way the existing social media extensions work for TBA.
plnyyanks
11-10-2016, 20:01
More serious suggestion would be to enable teams to link to GitHub projects for specific robots on TBA. That should provide 80% functionality for 20% of the effort and I'm pretty sure it would work the same way the existing social media extensions work for TBA.
TBA can already link GitHub profiles to teams. Do you think that is good enough, or would it be better to build an association between (team + year) and GitHub repo(s)?
More serious suggestion would be to enable teams to link to GitHub projects for specific robots on TBA. That should provide 80% functionality for 20% of the effort and I'm pretty sure it would work the same way the existing social media extensions work for TBA.
I believe the goal is to grep code usages. For example, "show me examples of the usage of ControllerPower" would just be a grep of "ControllerPower", not necessarily "Show me ####'s code for their 2016 Robot". I believe you can do this with GitHub's search, but only one repo at a time.
euhlmann
11-10-2016, 21:52
4) Just pass in your query as a git grep.
Or use an actual database engine's full text search, which should be far more efficient.
Btw, when I tried searching for usages of WPILib classes with my mongo/node implementation, I found more than a few top results being the source for that WPILib class, since it seems many teams like to keep entire copies of WPILib in their repos. I wonder if there's a way to filter those out.
virtuald
11-10-2016, 22:30
TBA can already link GitHub profiles to teams. Do you think that is good enough, or would it better to build an association between (team + year) and github repo(s)?
The latter is good (and you could pull much of that data from what I've already got on FIRSTwiki), but wouldn't meet the goal of 'grepcode for FRC code'.
virtuald
11-10-2016, 22:31
More serious suggestion would be to enable teams to link to GitHub projects for specific robots on TBA. That should provide 80% functionality for 20% of the effort and I'm pretty sure it would work the same way the existing social media extensions work for TBA.
I already have that (except not on TBA, but that's less important to me). I wouldn't have suggested the project if I already had it. :)
However, it would be useful to have that information on TBA too.
virtuald
11-10-2016, 22:39
Or use an actual database engine's full text search, which should be far more efficient.
Btw, when I tried searching for usages of WPILib classes with my mongo/node implementation, I found more than a few top results being the source for that WPILib class, since it seems many teams like to keep entire copies of WPILib in their repos. I wonder if there's a way to filter those out.
You could probably hash every file, and store them indexed by hash. Then each team's code is just a list of paths mapped to hashes. When you do a search, it could show each of the places where the file is used.
If you wanted to be super fancy, you could split by subroutine, and hash those (excluding whitespace).
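A rough Python sketch of the hash-indexed storage described above: each unique file body is stored once, each repo is a path-to-hash manifest, and a search hit reports every place the file appears. SHA-256 and all names here are assumptions for illustration. The whitespace-insensitive variant is applied to whole text, not to individual subroutines.

```python
import hashlib
from collections import defaultdict


def content_hash(text):
    """Hash the exact file contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def whitespace_insensitive_hash(text):
    """Hash with all whitespace removed, so reformatted copies collide."""
    return content_hash("".join(text.split()))


def build_store(repos, hasher=content_hash):
    """Deduplicate files across repos.

    `repos` maps a repo name to a {path: text} dict. Returns:
      blobs     - hash -> text, each unique file stored once
      manifests - repo -> {path: hash}
      locations - hash -> list of (repo, path) where that file appears
    """
    blobs = {}
    manifests = {}
    locations = defaultdict(list)
    for repo, files in repos.items():
        manifests[repo] = {}
        for path, text in files.items():
            digest = hasher(text)
            blobs.setdefault(digest, text)
            manifests[repo][path] = digest
            locations[digest].append((repo, path))
    return blobs, manifests, locations


def search(blobs, locations, term):
    """Search each unique file once; report every place it is used."""
    return {
        digest: locations[digest]
        for digest, text in blobs.items()
        if term in text
    }
```

With this layout a vendored copy of a WPILib class shows up as a single result with many locations, which makes it easy to collapse or rank lower.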
Or use an actual database engine's full text search, which should be far more efficient.
Btw, when I tried searching for usages of WPILib classes with my mongo/node implementation, I found more than a few top results being the source for that WPILib class, since it seems many teams like to keep entire copies of WPILib in their repos. I wonder if there's a way to filter those out.
Git grep is actually quite fast, and way more space efficient than most database engines. Most database engines don't have paging sizes large enough for an entire source file, so you're left with subpar lookup speeds compared to git grep, which can look directly through the changes to files. Obviously this needs some testing, but this is the theory behind it anyway.
Git grep also has the advantage that you can call it with `git rev-list --all` to search for occurrences of a string over all commits, ever. Likewise, you can grep subdirectories, discounting any directories that resemble that of the wpilib source.
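As a sketch of how those two ideas might be combined, the Python below builds a `git grep` command that searches a list of revisions while skipping paths that look like vendored WPILib source. The `*wpilib*` glob is only a guess at what such directories are called, and the function names are hypothetical. The exclusion relies on git's `:(exclude)` pathspec magic, with `.` given as the positive pathspec to exclude from.

```python
import subprocess


def list_all_revisions(repo_dir):
    """Return every commit hash reachable from any ref (git rev-list --all)."""
    result = subprocess.run(
        ["git", "rev-list", "--all"], cwd=repo_dir,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def grep_all_commits_command(query, revisions, excluded_globs=("*wpilib*",)):
    """Build argv for grepping the given revisions, minus excluded paths."""
    argv = ["git", "grep", "-n", "-I", "-F", "-e", query]
    argv += list(revisions)   # revisions (trees) to search
    argv += ["--", "."]       # positive pathspec: everything under cwd
    # Exclude matching paths, case-insensitively (assumed naming).
    argv += [f":(exclude,icase){glob}" for glob in excluded_globs]
    return argv
```

For a repo with a long history the revision list can get large enough to hit command-line length limits, so a real implementation would probably search in batches.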
This sounds like a great project!
You know it's a good idea when you get Austin Shalit to respond on Chief.