I'm getting this error when running machine learning training:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-1-5a565b55ff83> in <module>()
     27 # Change this bucket if you want to train with your own data. The WPILib bucket contains thousands of high quality labeled images.
     28 # s3://wpilib
---> 29 estimator.fit("s3://redstorm509")

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
    459         self._prepare_for_training(job_name=job_name)
    460 
--> 461         self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
    462         self.jobs.append(self.latest_training_job)
    463         if wait:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs, experiment_config)
   1011             train_args["enable_sagemaker_metrics"] = estimator.enable_sagemaker_metrics
   1012 
-> 1013         estimator.sagemaker_session.train(**train_args)
   1014 
   1015         return cls(estimator.sagemaker_session, estimator._current_job_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image, algorithm_arn, encrypt_inter_container_traffic, train_use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics)
    527         LOGGER.info("Creating training-job with name: %s", job_name)
    528         LOGGER.debug("train request: %s", json.dumps(train_request, indent=4))
--> 529         self.sagemaker_client.create_training_job(**train_request)
    530 
    531     def process(

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    659             error_code = parsed_response.get("Error", {}).get("Code")
    660             error_class = self.exceptions.from_code(error_code)
--> 661             raise error_class(parsed_response, operation_name)
    662         else:
    663             return parsed_response

ClientError: An error occurred (AccessDeniedException) when calling the CreateTrainingJob operation: User: arn:aws:sts::215031180259:assumed-role/AmazonSageMaker-ExecutionRole-20200120T195765/SageMaker is not authorized to perform: sagemaker:CreateTrainingJob on resource: arn:aws:sagemaker:us-east-1:215031180259:training-job/wpi-cpu-2020-01-21-01-12-27-274 with an explicit deny

I’m trying to start machine learning training on AWS SageMaker and I’m getting this error, and I’m honestly unsure what to do. I’ve tried using both my own bucket and the WPILib bucket.

Are you absolutely certain that you are using the correct account? Did you access the AWS account through “Classrooms” -> “AWS Console”?

Can you also go to the IAM Management console, and ensure that AmazonSageMaker-ExecutionRole-20200120T195765 has the AmazonSageMakerFullAccess policy, as well as the AmazonSageMaker-ExecutionPolicy-20200120T195765 policy?
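If you'd rather check this from code than click through the console, here is a rough sketch of the same check. It assumes you pass in a boto3 IAM client whose credentials are allowed to call `iam:ListAttachedRolePolicies`; the helper names `missing_managed_policies` and `check_with_boto3` are made up for illustration and are not part of any WPILib tooling.

```python
def missing_managed_policies(iam_client, role_name, expected_names):
    """Return the expected policy names that are NOT attached to the role.

    iam_client is anything exposing list_attached_role_policies(), such as
    boto3.client("iam"). Results are paginated via IsTruncated / Marker.
    """
    attached = set()
    kwargs = {"RoleName": role_name}
    while True:
        page = iam_client.list_attached_role_policies(**kwargs)
        attached.update(p["PolicyName"] for p in page.get("AttachedPolicies", []))
        if not page.get("IsTruncated"):
            break
        kwargs["Marker"] = page["Marker"]
    return sorted(set(expected_names) - attached)


def check_with_boto3():
    """Hypothetical usage; needs boto3 installed and AWS credentials configured."""
    import boto3

    role = "AmazonSageMaker-ExecutionRole-20200120T195765"
    expected = [
        "AmazonSageMakerFullAccess",
        "AmazonSageMaker-ExecutionPolicy-20200120T195765",
    ]
    missing = missing_managed_policies(boto3.client("iam"), role, expected)
    print("Missing policies:", missing or "none")
```

An empty result only means both policies are attached. It does not rule out a deny coming from somewhere else on the account.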

If you’ve done everything right, then I’ve seen this issue a couple times now; I’m in contact with AWS looking for a solution.
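One thing worth noticing in the error text is the trailing "with an explicit deny". In IAM, an explicit Deny statement in any applicable policy overrides every Allow, so attaching more permissions to the role will not clear it. Below is a small sketch that pulls the pieces out of a message like the one above. The regex is an assumption based only on the wording of that one message, and `parse_access_denied` is a hypothetical helper name.

```python
import re

# Pattern inferred from the single error message quoted in this thread;
# other AWS services or versions may word the message differently.
_ACCESS_DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: (?P<action>\S+)"
    r"(?: on resource: (?P<resource>\S+))?"
    r"(?P<explicit> with an explicit deny)?"
)


def parse_access_denied(message):
    """Return principal, action, resource and an explicit-deny flag, or None."""
    match = _ACCESS_DENIED.search(message)
    if match is None:
        return None
    return {
        "principal": match.group("principal"),
        "action": match.group("action"),
        "resource": match.group("resource"),
        "explicit_deny": match.group("explicit") is not None,
    }
```

In a notebook you could wrap `estimator.fit(...)` in `try` / `except botocore.exceptions.ClientError as err` and pass `str(err)` to this function. If `explicit_deny` comes back `True`, the fix most likely has to happen on the account side rather than in the role's attached policies.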

I get the same error. (Not sure if my last post made it.)
I can’t even file a support ticket from within the AWS console.

I had to file a ticket as if I didn’t have an account at all.

From https://docs.wpilib.org/en/latest/docs/software/examples-tutorials/machine-learning/index.html

We are aware that you will get an exception while trying to train your model with the Jupyter notebook. The WPILib team and Amazon are working hard on a solution and hope to have something in the next few days. Until we do, you will not be able to train your models. We will post the status here as it changes, so please check back often. We are very sorry that this issue occurred and hope to get it resolved as quickly as possible.

Hi, from team 7539 here.

Are there any updates on the situation?

We are working hard on a solution. We will be testing it with team 190 this week, and if all goes well we will release asap.

Again, sorry for this huge inconvenience.

Hi, I have been getting the same error for 2 days. Is there any update?
Thank you so much.

We’ve had to completely redo what kind of AWS account training uses, as well as the docs. An official update will be out soon.

Hi again, by when do you think the official update will be out?

Any update?

Guys, I’m a senior on a relatively small (~8 student) FRC team. I’m one of the only people who know how to code or wire the robot at all.

I’m sorry, but I’m prioritizing getting my team’s robot done right now. I honestly didn’t even plan on working for WPILib during the build season, and then this Amazon issue came up. It’s super inconvenient for everyone involved, especially y’all.

As a 190 student, I will send a trained Power Cell model to an email of your choosing if you PM me.

I have reached out to gcperk20 but as of yet he hasn’t responded. I know the season is “postponed” right now but could someone send me the trained model for testing? Send it to [email protected]
I’d really appreciate it if anyone could help me out. Thanks!

Sent.
