Electronic Data Storage, Preservation, and Management

I’m not sure if we’ve ever had this discussion on here, at least not quite in the manner I’d like to get going with this thread. I can’t recall one, so I figured I’d start it.

How does your team store, manage, and preserve electronic data?

We’re talking all types of electronic data. Anything that can be saved to a computer, hard drive, or server. Artwork, photos, videos, CAD models, you name it.

  • How much data is your team currently in possession/control of? (Gigabytes, terabytes)
  • Do you keep everything on one PC, or elsewhere?
  • Do you have a server? Do you manage it, or someone else? What are the specs (drives, OS, etc)?
  • How do you organize, categorize, and sort your data? (specific focus area is photos and videos)
  • Do you sync with a cloud service provider? Which one?
  • Do you have a backup plan? What is it?
  • Do you have any sort of archival process each year?
  • Do you have any specific hardware you use? (types of drives, tape backup, etc).
  • How do you control who can access the data and at what permission level?

I’ll bite:

How much data is your team currently in possession/control of? (Gigabytes, terabytes)

Last I checked, 2220 had two 500 GB hard drives that were full of pictures, plus about 20 GB of stuff in Google Drive, plus our team wiki (which I’m not sure of the size of). All in all (and accounting for stuff that I don’t know the location of), I would estimate that 2220 has about 1.5 to 2 TB of data in various locations. This also isn’t counting raw footage from CA video, because I really have no idea how much of that we have.

Do you keep everything on one PC, or elsewhere?

All the Drive and wiki stuff is cloud-based, but the pictures and video are on external hard drives and backups of those drives, which are kept with the team.

Do you have a server? Do you manage it, or someone else? What are the specs (drives, OS, etc)?

The team rents a relatively small server to manage our parts ordering process, but other than that, no. I wasn’t a part of setting up the server, but it’s pretty low powered.

How do you organize, categorize, and sort your data? (specific focus area is photos and videos)

All of the archived data is organized by date. People can access it, copy it over to their personal or other team machines, and reorganize it there.

Do you sync with a cloud service provider? Which one?

Nope (unless you count Drive, which we use for sign ups, organizational stuff, and CAD files).

Do you have a backup plan? What is it?

Backups of the media drives aren’t made on a regular schedule, to my knowledge. I wasn’t much involved in that side of the team other than needing to access the data.

Do you have any sort of archival process each year?

Other than making sure that events were photographed and getting those photos by date on the externals and Shutterfly, not really.

Do you have any specific hardware you use? (types of drives, tape backup, etc).

Like I said, I believe we have two external HDDs with capacities around 500 GB. Backups and data stuff were mostly managed by our former faculty mentor, who was the tech manager for the school and had access to all sorts of computers and computer leftovers.

How do you control who can access the data and at what permission level?

The externals are only given to students or mentors with a valid reason for having them, or they are only accessed at the school. In Drive we have a rather complicated nested permission structure, and the wiki uses standard Admin/Editor/Viewer permissions.

I know our data practices aren’t the best, but they’re miles better than they were when I first joined the team (I remember one media director had classified all photos as “Before me” and “Me”). It’s an ever-changing process, and I would only anticipate the team getting better at managing our data.

Does anyone know if there have been any presentations on this topic? I feel like it would be very valuable for teams of all levels.

Thanks for your post! I’m hoping others will follow. I agree, a presentation or whitepaper on the topic of data (and IT) management for FIRST teams would be very beneficial.

I can reply back with a little bit about our setup. It’s older hardware but it’s been working well for us. And being older hardware, it’s been very affordable for the value we’re getting out of it.

Dell PowerEdge 1850 with two 160GB 10K RPM SCSI drives in RAID1, 8GB RAM, and two Xeon dual-core processors. We run Windows Server 2008 R2 and Autodesk Vault 2014 Basic Server on here and use it for CAD only. If either drive dies, we’re ok (in theory). We keep an identical drive in “cold storage”.

Rackable Systems 2U, half depth server with four 1TB SATA drives in RAIDZ, running FreeNAS 8.3.2. Dual Xeon Dual Core, 8GB RAM. This is our primary data storage server for all work and archiving other than CAD. We keep photos, videos (mostly finished works, very little raw footage), CAM files, graphics, animations, MATLAB files, software installers, content we’ve created, and as of recently, Windows System Image backups here. The Vault server (above) also backs up to here. The Vault Server and FreeNAS server live in separate buildings. FreeNAS lets us control user access and permissions. We’re currently at about 1.4TB used, and 0.98TB remaining. Within the next month, we’re going to upgrade the drives to WD Red 2TB drives to double the raw capacity to 8TB total, which should give us somewhere around 5.5TB usable in ZFS, I imagine. While we’d much prefer RAIDZ2, RAIDZ is all that’s practical on only 4 drives. The good news is, ZFS is a pretty reliable file system with lots of error checking, and the only time you really need to worry is when you’re in the process of replacing a bad drive (which is a real concern). Also of note, FreeNAS supposedly takes “snapshots” of each dataset on an interval that we set, and these can be rolled back at any time, but I’ve never tested restoring to an earlier snapshot.
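For anyone who wants to sanity-check the "around 5.5TB" guess, here is a rough sketch of the arithmetic in Python. It assumes a single RAIDZ group where one drive's worth of space goes to parity (two for RAIDZ2), and it ignores ZFS metadata and reserved space, so real usable space will come out lower. The function name and constants are made up for illustration and are not part of FreeNAS.

```python
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # many OS and storage tools report binary tebibytes


def raidz_usable_tib(num_drives, drive_tb, parity=1):
    """Rough upper bound on usable space for one RAIDZ group, in TiB.

    Assumption: parity costs `parity` whole drives of capacity.
    Filesystem overhead is ignored, so real numbers are lower.
    """
    if num_drives <= parity:
        raise ValueError("need more drives than parity devices")
    data_drives = num_drives - parity
    return data_drives * drive_tb * TB / TIB


if __name__ == "__main__":
    for label, parity in (("RAIDZ", 1), ("RAIDZ2", 2)):
        usable = raidz_usable_tib(4, 2, parity)
        print(f"4 x 2TB {label}: about {usable:.2f} TiB before overhead")
```

With four 2TB drives this gives roughly 5.46 TiB for RAIDZ and roughly 3.64 TiB for RAIDZ2, which is why RAIDZ2 is a hard sell on only four drives.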

Seagate 3TB external drive, formatted NTFS - This is where the FreeNAS server is backed up to. It’s not automated yet. At least once per year, we manually copy all the network shares to this external drive. It’s kept in “cold storage” in a separate building and is not used for anything other than backups. Obviously, this is not a sustainable way to continue, especially after we upgrade the FreeNAS Server. What we may do is set up a Windows desktop with the four older 1TB drives we pull out, plus the 3TB external, and do automated backups to that. Long term, maybe we build another FreeNAS box and rsync to it.
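As a stopgap before a second FreeNAS box and rsync, the manual copy could be scripted. This is only a minimal sketch, assuming the network shares and the external drive are both mounted as ordinary folders. The function name, the folder paths, and the 2-second timestamp tolerance are my own assumptions, and unlike rsync it never deletes anything from the backup.

```python
import shutil
from pathlib import Path


def mirror_new_or_changed(source, backup):
    """Copy files that are missing from `backup` or differ in size or mtime.

    Returns the relative paths that were copied. Nothing is ever deleted
    from the backup side.
    """
    source, backup = Path(source), Path(backup)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Assumed rule: same size and modification time within
            # 2 seconds means the file is unchanged.
            if s.st_size == d.st_size and abs(s.st_mtime - d.st_mtime) < 2:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Something like `mirror_new_or_changed("/mnt/shares", "/mnt/external")` (both paths hypothetical) could then be run from a scheduled task instead of once a year by hand.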

Dropbox - We have a very small amount (<2GB) of business-team related data (logos, artwork, etc) for access from home when needed.

Google Drive - Mostly for collaborative editing of spreadsheets and presentations. Our accounting is done here, before being entered more formally into Quickbooks.

Website - We have 10GB of hosted space on a cPanel shared virtual server, that we pay monthly for.

AVID ISIS Server - This does not belong to the robotics team, but we can access it. It has somewhere in the neighborhood of 48TB of drives and serves about 160 users in the Cinematography program at our school. All raw footage is loaded onto the ISIS server and then worked with there during production. I’m not aware of the archival and backup processes and standards for video on the ISIS server.

Very little critical data is ever stored on local machines. Our general policy is not to. The only exception may be files for vinyl cutting, but the artwork exists elsewhere.

All of this is something we’ve implemented just in the past two years. I like to think we’re pretty solid on data integrity for an operation our size, at least in theory. Nothing we’re doing would ever compare to proper hardware and practices in a business with a proper IT department, but I like to think we do it pretty well for a school FIRST team. Some of the backup and restoration processes have never been tested, which is a little scary, but in theory we shouldn’t lose much of anything with the way we’re set up.

The big idea is that no data should ever exist for any extended period of time on only one physical storage device, and never only on a device (such as a flash drive) that could easily be lost, stolen, or damaged. Also remember, redundancy is not the same as a backup!
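One way to check the "never on only one device" rule is to hash every file on each device and flag anything whose contents show up on fewer than two of them. The sketch below assumes each device is mounted as a folder. The function names and the device labels are hypothetical, and it compares file contents only, not file names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def files_with_single_copy(locations):
    """Return one example path for each file found on only one device.

    `locations` maps a device label (hypothetical, e.g. "nas") to the
    folder where that device is mounted.
    """
    devices_by_digest = defaultdict(set)  # digest -> labels holding it
    example_path = {}                     # digest -> first path seen
    for label, root in locations.items():
        for p in sorted(Path(root).rglob("*")):
            if p.is_file():
                digest = file_digest(p)
                devices_by_digest[digest].add(label)
                example_path.setdefault(digest, p)
    return sorted(
        example_path[d]
        for d, labels in devices_by_digest.items()
        if len(labels) < 2
    )
```

Two copies of a file on the same device still count as one device, so a duplicate folder on a single external drive would not pass this check.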

3138 just bought a Western Digital MyCloud (I think 3 TB, but I could be wrong). We’ve started to migrate over to that. For CAD, we currently have an Amazon EC2 server running the SolidWorks PDM Server application. All data is stored on an S3 instance with nightly delta backups.