Others have been chatting about this, but I hope to summarize and bring some clarity:
Disable threshold triggers are observably tighter in the '24 DS vs. the '23 DS. Change is inevitable, and assuming positive intent, there may be good reasons for this: the old threshold may not have functionally existed, may have been tuned for slower and less energetic robots, or may have been a magic number with no real meaning that the '24 DS update tries to give logic and meaning to. I don’t know the answer, but updating it to address any of those reasons sounds like a good thing to me off the top of my head!
There is a difference in '23-'24 DS behavior that everyone acknowledges. But is it a problem? What is the problem?
“Customers” (FRC teams) note that some robot/DS combinations trigger the disable threshold in the '24 DS. They try the '23 DS and everything works, so the perception is that everything was working and now it isn’t: the update is the problem.
But what if it’s simply showing you your setup has a problem? An invisible risk you were previously blissfully unaware of?
The flip side is: the new threshold may indeed be too tight.
So I think that is the “threshold” question. I’m interpreting @Thad_House 's comment earlier in the thread (although it isn’t explicitly stated) to mean that the '24 logic needs to functionally service a 200 ms heartbeat, or a disable will occur on the DS end. A DS resource utilization spike can potentially trigger this, especially on older machines with less computing power.
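To make that concrete, here’s a rough Python sketch of the kind of watchdog logic I’m picturing. This is my own guess at the shape of it, not the actual DS code (which is closed source), and the 200 ms figure is just the number from this thread:

```python
# Hypothetical sketch of a DS-side heartbeat watchdog -- NOT the real NI code.
import time

HEARTBEAT_TIMEOUT_S = 0.2  # assumed threshold, per the discussion above

last_packet_time = time.monotonic()
robot_enabled = True

def on_status_packet():
    """Call this whenever a status packet arrives from the robot."""
    global last_packet_time
    last_packet_time = time.monotonic()

def watchdog_tick():
    """Run periodically on the DS; force a disable if the heartbeat goes stale."""
    global robot_enabled
    if time.monotonic() - last_packet_time > HEARTBEAT_TIMEOUT_S:
        robot_enabled = False  # the disable teams are seeing
```

If whatever services this stalls past the threshold, you get a disable even when the network itself is fine.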
So to the teams experiencing issues: collect some data! Are your DSs disabling much earlier than that threshold would explain, indicating a false flag? Dig deep and rebut the presumption that the new logical threshold is located in the right spot.
Simply saying '23 worked and now '24 doesn’t does not get to the root of the issue, though; that’s only a superficial observation of the behavior change everyone sees.
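If you want a concrete starting point for that data collection, here’s a rough robot-side sketch using RobotPy (my own illustration; translate to Java/C++ as you like). It just timestamps every DS-attached and enabled transition so you can measure how long and how often the drops actually are:

```python
# Hypothetical robot-side instrumentation sketch (RobotPy) that logs
# DS attach/enable transitions with FPGA timestamps.
import wpilib


class DiagnosticsRobot(wpilib.TimedRobot):
    def robotInit(self):
        self.lastAttached = wpilib.DriverStation.isDSAttached()
        self.lastEnabled = wpilib.DriverStation.isEnabled()

    def robotPeriodic(self):
        now = wpilib.Timer.getFPGATimestamp()
        attached = wpilib.DriverStation.isDSAttached()
        enabled = wpilib.DriverStation.isEnabled()

        if attached != self.lastAttached:
            print(f"{now:.3f}s DS attached: {self.lastAttached} -> {attached}")
            self.lastAttached = attached
        if enabled != self.lastEnabled:
            print(f"{now:.3f}s Enabled: {self.lastEnabled} -> {enabled}")
            self.lastEnabled = enabled


if __name__ == "__main__":
    wpilib.run(DiagnosticsRobot)
```

Pairing that console output with the DS log viewer timestamps should show whether the disables line up with genuine comm drops or are firing much earlier than the stated threshold.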
I think that’s a good summary. I’d add one more thing:
“Sciencing” the problem would be a whole lot easier if the DS were open-source. I could just read the code to see what it’s doing! I could adjust the timeout and see what happens! I could add instrumentation to see the problem better!
“Sciencing” the problem would also feel a lot better if the DS were open-source. I’m happy to contribute my labor to open-source, and encourage students to contribute their labor too! It’s the whole point of open-source. But for closed, proprietary systems, that feels pretty bad, like paying twice.
I hope the 2027 control system does not include any of these closed-source proprietary components.
FIRST is a mostly-volunteer organization; maybe we could find a way to contribute volunteer labor that addresses the (valid) safety concern without brushing aside the (equally valid) concern about opacity?
Do you know who over there might be able to engage on that topic?
It seems like FIRST likes the “task force” approach lately (I do too). I hope there is a task force or something similar around the DriverStation. It is such an important piece of the experience and MANY people have a wide range of very valid concerns with the current system.
I am not going to volunteer anyone. Sending an email is the most direct way to engage them but I wouldn’t be surprised if they have already thought about this longer/harder than any of us have.
An update: NI (specifically @Greg_McKaskle) got back to me to say that they do indeed intend to adjust the code and values during the 2025 beta, which I think is great news.
At the excellent MA rookie event today, we struggled with disconnects. The laptop is logged at about 5% utilization during these events, so I don’t think it’s a laptop busyness issue.
The failure pattern involved disconnection for about a minute, often followed by the behavior shown here: alternating connected and not-connected, with about a 0.5 second period.
Other almost-identical laptops had no problems.
Does this seem familiar to anyone?
Oh, also, there was this weirdness: 25% packet loss in the prematch period.
Also, there’s the “timing” chart, which I don’t understand at all.
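In case it helps anyone reproduce this, here’s a quick-and-dirty Python logger I’d run on the DS laptop (a hypothetical helper, not official tooling). It pings the radio and the roboRIO every ~100 ms and writes timestamped results to a CSV, which should make a 0.5 s connect/disconnect oscillation or 25% prematch loss show up clearly. The 10.TE.AM.1 / 10.TE.AM.2 addresses follow the usual FRC convention; substitute your team number.

```python
# Hypothetical DS-laptop connectivity logger (Windows). Pings the robot radio
# and the roboRIO and logs timestamped results for later correlation with the
# DS log's disconnect events.
import subprocess
import time

RADIO = "10.TE.AM.1"    # replace TE.AM with your team number, e.g. 10.2.54.1
ROBORIO = "10.TE.AM.2"  # e.g. 10.2.54.2


def ping_once(host: str, timeout_ms: int = 250) -> bool:
    """One Windows ping; count it as success only if a reply with TTL came back."""
    result = subprocess.run(
        ["ping", "-n", "1", "-w", str(timeout_ms), host],
        capture_output=True,
    )
    return result.returncode == 0 and b"TTL=" in result.stdout


if __name__ == "__main__":
    with open("connectivity_log.csv", "a") as log:
        log.write("unix_time,radio_ok,rio_ok\n")
        while True:
            radio_ok = ping_once(RADIO)
            rio_ok = ping_once(ROBORIO)
            log.write(f"{time.time():.3f},{int(radio_ok)},{int(rio_ok)}\n")
            log.flush()
            time.sleep(0.1)
```

Whether the radio drops too, or only the RIO does, should tell you which leg of the link is flapping.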
I see two things right off the bat in that Task Manager screenshot that scare me. What is that top process listed? It looks like some security tool, and those can absolutely interfere with communications. They can definitely cause massive packet interference without showing up as high CPU utilization.
Additionally, there’s that SOLIDWORKS background process. Not exactly the same, but Autodesk had issues for YEARS where its background processes would cause massive disconnects. You really need to go through and make sure no extra services are running.
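If it’s useful, a quick way to snapshot everything that’s actually running (beyond what Task Manager happens to show) is a sketch like this, using the third-party psutil package (`pip install psutil`); scan the output for AV agents, SOLIDWORKS/CAD background services, update agents, and anything else you can’t account for:

```python
# Hypothetical helper: dump every running process name so stray AV/CAD/update
# services on a DS laptop are easy to spot. Requires: pip install psutil
import psutil

names = sorted(
    {(proc.info["name"] or "").lower() for proc in psutil.process_iter(["name"])}
)
for name in names:
    print(name)
```

On a dedicated DS laptop, that list should be short and boring.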
oh! interesting, thanks! i don’t know what that stuff is; these are laptops used by engineering classes, and solidworks is part of the curriculum there. hm.
we could go back to dedicated vanilla driver-station laptops, i guess.
I would absolutely ensure any DS is always running as a dedicated laptop, with nothing extra installed. We generally test with fresh laptops, as AV software, and many other things that schools like to install, can definitely cause unexpected and undebuggable behavior. This is especially true because our packets kind of look like gaming packets, and a lot of the AVs schools install might try to get in the way of that to keep students from playing games.