Automatic RuleBot

Continuing the discussion from Introducing FRCBot for Slack: A new way to subscribe to events:

I figured I would start a separate thread for this instead of overloading @cjdenio’s thread. If a mod can move the relevant posts here, I’d appreciate it.

 
I made an HTML scraper that finds rules in the HTML manual. You can try it at: http://arimb.pythonanywhere.com/rule/R20

It uses @bobbysq’s method, but then keeps going until it finds the next rule or some other break (such as a heading). That lets it capture (most of) the blue boxes and lists that follow rules, but it breaks down when a rule is followed directly by an ordinary text paragraph with no heading in between, because that paragraph gets pulled into the rule. Some of the rules are also missing their hyperlinks, which makes them invisible to the scraper. Basically, the HTML is a mess and doesn't lend itself to easy scraping.
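
A rough sketch of the "keep going until the next rule" idea, assuming the manual has already been flattened into a list of (tag, text) blocks in document order and that a rule block's text starts with its number (e.g. "R20"). The block format and the `RULE_START` pattern are assumptions, not the manual's actual structure or @bobbysq's code. It also reproduces the failure mode described above: a plain paragraph right after a rule gets attached to that rule.

```python
import re

# Assumed: rule text begins with a one-letter prefix and a number, e.g. "R20" or "G3".
RULE_START = re.compile(r"^([A-Z]\d+)\b")
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def group_rules(blocks):
    """Group (tag, text) blocks into {rule_id: [text, ...]}.

    A rule starts at a block whose text begins with a rule number and
    continues until the next rule or the next heading. Any plain block
    in between (blue boxes, lists, or unrelated text) is attached to the
    current rule.
    """
    rules = {}
    current = None
    for tag, text in blocks:
        match = RULE_START.match(text)
        if match:
            current = match.group(1)
            rules[current] = [text]
        elif tag in HEADINGS:
            current = None
        elif current is not None:
            rules[current].append(text)
    return rules


blocks = [
    ("h2", "Robot Rules"),
    ("p", "R20 Example rule text."),
    ("p", "Blue box explaining R20."),
    ("p", "R21 Another rule."),
    ("p", "Unrelated section text with no heading above it."),
    ("h2", "Next Section"),
]
print(group_rules(blocks))
```

Here the blue box is correctly kept with R20, but the unrelated paragraph is wrongly swallowed into R21, which is exactly the ambiguity the HTML doesn't resolve.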

I’m starting to warm to @UnofficialForth’s idea to just take a bunch of screenshots and manage a database of them. It would be a bit of work right after Kickoff, but once the initial set is done, FIRST doesn't usually update many rules each week, so keeping it current should be feasible.

Posting actual text is more valuable than posting images of text. We could crowdsource parsing the rules into a Google Spreadsheet.
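
If the rules end up in a shared spreadsheet, a bot could read a CSV export of it. A minimal sketch using Python's csv module; the column headers "Rule" and "Text" are hypothetical, and downloading the export is left out.

```python
import csv
import io


def load_rules(csv_text):
    """Build {rule_id: text} from a CSV export of the shared sheet.

    Assumes hypothetical column headers "Rule" and "Text". Rows with an
    empty rule cell are skipped, and rule IDs are upper-cased so that
    "r15" and "R15" map to the same entry.
    """
    rules = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rule_id = row["Rule"].strip().upper()
        if rule_id:
            rules[rule_id] = row["Text"].strip()
    return rules


sample = 'Rule,Text\nR15,"Example text for R15, with a comma."\n,\ng3,Example text for G3.\n'
print(load_rules(sample))
```

Using the csv module rather than splitting on commas matters here, since rule text will often contain commas of its own.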

The more I think about it, I’m not a huge fan of the image database.

Sure, it works for checking a specific rule such as R15. But who has the number R15 memorized without remembering the rule itself? What if I'm looking for a specific rule but don't know its number? Can I find it easily?

Looking at what end users need, I don’t think my initial idea of an image database works. It’s too simple and too limited for what teams would want to use it for. If we want to make something that truly benefits the community, we would have to go beyond that. (Images such as diagrams might still be nice.)

That said, I rarely remember the exact rule number I’m trying to quote or look up. Maybe a keyword lookup feature would help.
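
A keyword lookup can start out very simple: return every rule whose text contains all of the search words. A sketch over a plain {rule_id: text} dict; the rule texts below are placeholders, not real manual text, and there is no ranking or stemming.

```python
def search_rules(rules, query):
    """Return rule IDs whose text contains every word in the query.

    Plain case-insensitive substring matching; an empty query matches
    every rule.
    """
    words = query.lower().split()
    return [
        rule_id
        for rule_id, text in rules.items()
        if all(word in text.lower() for word in words)
    ]


# Placeholder rule text for illustration only.
rules = {
    "R15": "Example rule about bumper height.",
    "R16": "Example rule about frame perimeter.",
    "G20": "Example rule about bumper contact during the match.",
}
print(search_rules(rules, "bumper"))         # ['R15', 'G20']
print(search_rules(rules, "Bumper height"))  # ['R15']
```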

This sounds like a fantastic idea and one that could be accomplished quite quickly after the reveal. Many hands make light work.

I imagine this would also allow keyword-style searching, but images and diagrams might be harder to work with.

I think this is a great idea! I’ll do some tinkering once I get over this cold. I was thinking of using Python and bs4 (BeautifulSoup), but I don't know how to write a Slack bot.
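
For the Slack side, here is a minimal sketch of just the request/response part of a slash command, using only the standard library. It assumes a command named /rulesbot (the name suggested in this thread) and a placeholder `RULES` dict; the web server, request signature verification, and app setup that a real Slack bot needs are left out.

```python
import json
from urllib.parse import parse_qs

# Placeholder rule text; a real bot would load this from the scraper or sheet.
RULES = {"R15": "Example text for R15."}


def handle_slash_command(body):
    """Turn a slash-command POST body into a JSON reply string.

    Slack sends slash commands as form-encoded fields such as "command"
    and "text". The reply's "text" is shown to the user, and a
    "response_type" of "in_channel" makes it visible to the whole channel.
    """
    fields = parse_qs(body)
    query = fields.get("text", [""])[0].strip().upper()
    if not query:
        reply = "Usage: /rulesbot R15"
    elif query in RULES:
        reply = f"*{query}*: {RULES[query]}"
    else:
        reply = f"Couldn't find a rule called {query}."
    return json.dumps({"response_type": "in_channel", "text": reply})


print(handle_slash_command("command=%2Frulesbot&text=r15"))
```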

I thought EricH was already a bot?

Nifty. Make one for the rules next. I want to type /rulesbot r15 and have it spit out the rule.
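
Since people will type things like r15, R 15, or R-15, a bot probably wants to normalize the argument before looking it up. A small sketch, assuming rule IDs are a single letter followed by a number (which may not hold for every year's manual); leading zeros are dropped, so R015 becomes R15.

```python
import re

# Assumed rule ID shape: one letter prefix, optional space or hyphen, then digits.
RULE_ID = re.compile(r"\b([A-Za-z])\s*-?\s*(\d+)\b")


def normalize_rule_id(text):
    """Pull a rule ID like "R15" out of loosely typed input, or return None."""
    match = RULE_ID.search(text)
    if match is None:
        return None
    letter, number = match.groups()
    return f"{letter.upper()}{int(number)}"


for raw in ["r15", "R 15", "rule g-3", "hello"]:
    print(repr(raw), "->", normalize_rule_id(raw))
```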

I’ve been thinking of how to do something like that for a while, but with the rules changing every week and only being released in PDF form, it seems like it’d be rather difficult.

Could there be a picture database that is kept updated, so you only have to swap out an image when a rule changes?

They’re not released only in PDF format; they’re also released as a hyperlinkable HTML version. Truthfully, I wish it were the default.

They tried that one year and it did not go well. I’m very happy they continue to release PDFs.

That’s an awesome idea, though we’d need a lot of people to manually enter every rule.

Hmm, maybe someone could develop a crawler that extracts the rules from that document. Ideally FIRST will keep most of the HTML markup the same from year to year.

True, but at the same time, the link above went straight to R15 in one click. That has come in very handy in conversations compared to “open this PDF, go to page X, find R15 on it.” At least we have both options. I definitely wouldn’t want to get rid of the PDF any time soon, because in its current form the HTML version is not a proper replacement.
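
That one-click link works because a URL fragment can point at an anchor inside the HTML manual. A sketch of building such a link, assuming each rule has an anchor named after its number (the `<a>` tags with rule numbers mentioned in this thread suggest this, but it isn't guaranteed); the manual URL below is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def rule_link(manual_url, rule_id):
    """Return manual_url with its #fragment set to rule_id.

    Assumes the HTML manual has an anchor whose name or id is the rule
    number itself. Any existing fragment on the URL is replaced.
    """
    parts = urlsplit(manual_url)
    return urlunsplit(parts._replace(fragment=rule_id))


# Placeholder URL, not the real manual location.
print(rule_link("https://example.com/manual.html#G1", "R15"))
# https://example.com/manual.html#R15
```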

Sounds like a fun project for a spreadsheet bot. No promises, but I’ll work on this.

I implemented this into a Discord bot that I never really finished. Anyone willing to implement my code into a bot or program of their own is more than welcome to, just please credit me in a comment or something.

@bobbysq, does this code look up the rule in the HTML rulebook? I’d be interested in learning how this works (I’m not great at reading Python, though :grin:)

Any reason Ctrl+F isn't an option? I usually have the PDF open in Chrome and it works just fine.

@bobbysq’s code is searching the HTML for <a> tags with rule numbers. Someone else digging into this can also look for class="RobotRule". I’m exploring better options right now, though.
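
Here is a sketch of both approaches using Python's built-in html.parser rather than bs4, so it runs without extra packages. It is not @bobbysq's code: the markup it expects (an `<a>` whose name or id is the rule number, and a `<p>` with class="RobotRule" holding the rule text) is an assumption based on this post, and the real manual may differ.

```python
import re
from html.parser import HTMLParser

# Assumed: rule anchors are named by the rule number itself, e.g. "R15".
RULE_NAME = re.compile(r"^[A-Z]\d+$")


class RuleFinder(HTMLParser):
    """Collect rule anchors and the text of class="RobotRule" paragraphs."""

    def __init__(self):
        super().__init__()
        self.anchors = []    # rule numbers found on <a name=...> or <a id=...>
        self.rule_text = []  # text of each RobotRule <p>, in document order
        self._in_rule = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            name = attrs.get("name") or attrs.get("id") or ""
            if RULE_NAME.match(name):
                self.anchors.append(name)
        elif tag == "p" and "RobotRule" in (attrs.get("class") or "").split():
            self._in_rule = True
            self.rule_text.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_rule = False

    def handle_data(self, data):
        # Text inside nested tags like <b> is concatenated into the rule.
        if self._in_rule:
            self.rule_text[-1] += data


html = (
    '<p class="RobotRule"><a name="R15"></a>R15 Example rule text.</p>'
    "<p>Ordinary paragraph.</p>"
    '<p class="MsoNormal RobotRule"><a id="R16"></a>R16 More <b>bold</b> text.</p>'
)
finder = RuleFinder()
finder.feed(html)
finder.close()
print(finder.anchors)
print(finder.rule_text)
```

Note that a rule missing its anchor would still show up in `rule_text` if it carries the RobotRule class, which is one reason to check both.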
