Using Globus Online with Amazon S3

The Globus data transfer service can be used to transfer data to and from Amazon's S3 cloud storage service.

Overview

Globus is a Software-as-a-Service (SaaS) offering that provides file transfer and sharing services, as well as identity, profile, and group management. It provides a high-performance, secure method to transfer data between endpoints. Globus handles the difficult aspects of data transfer by optimizing bandwidth usage, managing security configurations, providing automatic fault recovery, and notifying users of completion and problems. Northwestern University-affiliated users can use Globus to transfer data between Northwestern-controlled endpoints (such as Quest storage and the Research Data Storage Service) and any other endpoint they have access to, including a personal endpoint that can be set up on a laptop or workstation. More information about Globus data transfer can be found at https://www.globus.org/data-transfer.

Amazon S3 is a cloud-based "object storage" service, allowing files to be securely stored and accessed from anywhere. It can scale to handle any amount of data and pricing is based solely on usage, with multiple tiers of service available to allow for flexibility and cost optimization. More information about Amazon S3 can be found at https://aws.amazon.com/s3/.

Globus offers a connector that allows an endpoint to transfer files to and from Amazon S3, and Northwestern IT has set up three of these endpoints.

When to Use Globus

  • Transferring very large data sets (larger than a few GB). Globus uses GridFTP, a high-performance transfer protocol that uses parallel TCP streams to make better use of available bandwidth. For large transfers (TBs and above), it can be significantly faster (about 8x) than scp, rsync, or sftp.
  • Collaboration and data sharing. Many national labs, universities, and data centers around the world use it for data management.
  • "Fire and forget" transfers. You can start a transfer and work on something else while Globus automatically optimizes transfer settings, retries any failed attempts, and emails you when the transfer is done. If the network connection drops, Globus picks up where it left off, and it performs a checksum to check for file corruption and confirm data integrity.

When to Use Amazon S3

  • Backups and data archiving. S3 is a relatively low cost solution for data storage ($0.023/GB/month) and can be used to automatically transfer data to the even lower cost AWS Glacier service for long term archiving of data that will not be frequently accessed.
  • Data analysis using Amazon Web Services. If you will be using Amazon's other services (AWS EC2, AWS Batch, AWS Elastic MapReduce, etc.) to analyze your data, storing it in S3 will provide significantly faster data access than transferring from other services.

The Northwestern Globus S3 connector service can access S3 buckets in the us-east-1 (Northern Virginia), us-east-2 (Ohio), and us-west-2 (Oregon) regions.

Amazon S3 Pricing and Cost Considerations

Amazon S3 pricing can be found here: https://aws.amazon.com/s3/pricing/. For the regions supported by this service, the standard storage tier is $0.023/GB/month. There is also a charge for requests (that is, API commands interacting with the S3 service), but it is usually only a few cents per month unless a very large number of requests are issued.

Data transfer *in* to Amazon S3 is always free. However, data transfer out to the internet is $0.09/GB and data transfer from one S3 region to another costs $0.01/GB or $0.02/GB, depending on the region. Therefore it is important to consider which region you create your buckets in to minimize cost and latency of data transfer.
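
As a rough illustration of how these rates combine, the sketch below estimates a monthly bill from the prices quoted in this section. The function name and its parameters are hypothetical, request charges and any waiver are ignored, and current AWS pricing may differ from these figures.

```python
# Rates quoted in this article (USD); check the AWS pricing page for current values.
STORAGE_PER_GB_MONTH = 0.023   # standard storage tier
EGRESS_PER_GB = 0.09           # transfer out to the internet
INTER_REGION_PER_GB = 0.02     # upper end of the quoted $0.01-$0.02 range


def estimate_monthly_cost(stored_gb, egress_gb=0.0, inter_region_gb=0.0):
    """Hypothetical helper: estimated monthly S3 cost in USD.

    Ignores request charges and any data-transfer waiver.
    Transfer *in* to S3 is free, so it does not appear here.
    """
    cost = (stored_gb * STORAGE_PER_GB_MONTH
            + egress_gb * EGRESS_PER_GB
            + inter_region_gb * INTER_REGION_PER_GB)
    return round(cost, 2)


if __name__ == "__main__":
    # 1000 GB stored, 100 GB downloaded to the internet in one month
    print(estimate_monthly_cost(1000, egress_gb=100))
```

Using the upper inter-region rate keeps the estimate conservative; substitute the rate for your actual region pair if you know it.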

Northwestern does have a consolidated billing account in place with Amazon Web Services, and accounts created within this billing structure have their data transfer out cost waived (provided certain conditions are met). More information about accounts can be found here: http://www.cloud.northwestern.edu/aws/.

Request Access to the Northwestern Globus S3 Connector Service

To get access, fill out the account request form here: https://app.smartsheet.com/b/form/bc77a55967bf4f44a7d2b284b16318bb
Northwestern IT will respond and create the account within two business days.

Configure the Endpoint

Once you have been granted access, you need to create a bucket (if you don't have one already), create an IAM access key with permission to write to the bucket, and install the access key on the endpoint.

Create an S3 Bucket

If you do not already have an S3 bucket, you will need to create one. Log into the AWS console and access the S3 service:

[Screenshot: S3 console link]

Then click the "Create bucket" button:

[Screenshot: Create S3 bucket]

In the screen that appears, give your bucket a name and choose a region. The name must be globally unique and DNS compliant (i.e., only lowercase alphanumeric characters and dashes are allowed). In order for your bucket to be accessible by the Northwestern Globus S3 Connector endpoints, it must be in the US East (N. Virginia), US East (Ohio), or US West (Oregon) region. The default values on the "Properties" and "Permissions" pages of the Create Bucket window should be acceptable.
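
Before creating the bucket, you may want to sanity-check the name and region you plan to use. The following is a minimal sketch, not an official validator: `check_bucket` is a hypothetical helper, and the name pattern is a simplified version of the rule described above (3 to 63 lowercase letters, digits, and dashes, starting and ending with a letter or digit). AWS's full naming rules cover more cases.

```python
import re

# Regions the article lists as reachable by the Northwestern Globus S3 connector.
SUPPORTED_REGIONS = {"us-east-1", "us-east-2", "us-west-2"}

# Simplified rule: 3-63 characters, lowercase letters, digits, and dashes,
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def check_bucket(name, region):
    """Return a list of problems (empty if the name and region look acceptable)."""
    problems = []
    if not _BUCKET_NAME.fullmatch(name):
        problems.append("bucket name is not a valid DNS-style name")
    if region not in SUPPORTED_REGIONS:
        problems.append("region %s is not supported by the connector" % region)
    return problems


if __name__ == "__main__":
    # "my-lab-data" is a placeholder bucket name
    print(check_bucket("my-lab-data", "us-east-2"))
    print(check_bucket("My_Bucket", "eu-west-1"))
```

This check cannot tell you whether the name is already taken; global uniqueness is only verified when you actually create the bucket.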

Generate an AWS IAM Access Key

In order to access your S3 bucket through Globus, you will need to create a user in the AWS Identity and Access Management (IAM) console with a policy attached granting access to the bucket. Then, you will generate an access key and secret key for that user, which is what Globus will use to communicate with AWS.

1. Log into the AWS console and click on the IAM service.

2. Click "Users" in the left navigation and then click the Add user button:

[Screenshot: IAM Add user]

3. In the screen that appears, give your user a name such as "globus-s3-user" and check the checkbox next to "Programmatic Access". You do not need to check the "AWS Management Console Access" checkbox.

4. In the Permissions screen that appears next, click the Attach Existing Policies Directly option. Here, you can search for and attach the `AmazonS3FullAccess` policy. Doing so grants this user full access to all of the S3 buckets in your account. Alternatively, you can create a policy (click the "Create policy" button) and use the wizard to create a policy that has full access to only the bucket you created earlier. You can also use the sample IAM policy below. To use it, click the JSON tab of the Create Policy screen, paste it in, and change "YOUR-BUCKET-NAME-HERE" in both Resource lines to the actual name of your bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME-HERE"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME-HERE/*"
        }
    ]
}
5. (If creating a policy) Give your new policy a name (e.g. "globus-bucket-policy") and save it. Go back to the window where you were creating the IAM user, click the "Refresh" button, and search for your newly created policy. Check the checkbox next to it, then click the "Review" button.

[Screenshot: Attach IAM policy]

6. Click the Create User button and you will be prompted to download a CSV file containing the user's credentials (access key ID and secret access key). Either download the file or copy and paste the credentials from this page. The credentials are displayed only once, so be sure to save them before leaving this page.
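
If you would rather generate the sample policy from step 4 than edit the placeholder by hand, the following sketch builds the same JSON for a given bucket. `build_bucket_policy` is a hypothetical helper written for this article, not part of any AWS tool; its output is intended to be pasted into the JSON tab of the Create Policy screen.

```python
import json


def build_bucket_policy(bucket_name):
    """Return the sample IAM policy from step 4 as a JSON string for one bucket."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            # Lets the user see the list of buckets in the account.
            {"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"},
            # Lets the user list the contents of this one bucket.
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn},
            # Lets the user read, write, and delete objects inside the bucket.
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": bucket_arn + "/*",
            },
        ],
    }
    return json.dumps(policy, indent=4)


if __name__ == "__main__":
    # "my-lab-data" is a placeholder bucket name
    print(build_bucket_policy("my-lab-data"))
```

Note that the bucket-level statement uses the bare bucket ARN while the object-level statement uses the ARN followed by "/*", matching the two Resource lines in the sample policy.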

Install the Access Key on the Endpoint

Once Northwestern IT contacts you to let you know your access to the service has been approved, you will need to install your access key and secret key onto the endpoint.

1. ssh to globusaws1.ci.northwestern.edu (replacing "<netid>" below with your actual NetID):
$ ssh <netid>@globusaws1.ci.northwestern.edu
2. Export your AWS Access Key ID and AWS Secret Access Key as environment variables (pasting in the proper values to replace the placeholders below):
$ export S3_ACCESS_KEY_ID=<Your AWS Access Key ID>
$ export S3_SECRET_ACCESS_KEY=<Your AWS Secret Access Key>
3. Run the following commands to save the credentials to a file where Globus can read them:
$ mkdir -m 0700 -p ~/.globus
$ (umask 077; echo "$(id -un);$S3_ACCESS_KEY_ID;$S3_SECRET_ACCESS_KEY" > ~/.globus/s3)

This file will be shared by all three of Northwestern's Globus S3 endpoints.
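
For reference, the shell commands in step 3 write a single line of the form `<username>;<access key ID>;<secret access key>` to ~/.globus/s3, readable only by you. The sketch below does the same thing in Python. It is an illustration under assumptions: `write_globus_s3_credentials` is a hypothetical helper, and `getpass.getuser()` is used as an approximation of `id -un`.

```python
import getpass
import os


def write_globus_s3_credentials(access_key_id, secret_access_key,
                                path=None, username=None):
    """Write "<username>;<access key ID>;<secret access key>" to the credentials file."""
    path = path or os.path.expanduser("~/.globus/s3")
    username = username or getpass.getuser()  # approximates `id -un`
    directory = os.path.dirname(path)
    if directory:
        # Like `mkdir -m 0700 -p ~/.globus`
        os.makedirs(directory, mode=0o700, exist_ok=True)
    # Create the file readable and writable by the owner only (like umask 077).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("%s;%s;%s\n" % (username, access_key_id, secret_access_key))
    os.chmod(path, 0o600)  # tighten permissions if the file already existed
    return path
```

Passing `path` and `username` explicitly is useful for trying the function out in a scratch directory before touching your real credentials file.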

Transferring Data

You can now use the Northwestern Globus S3 endpoints to transfer data to and from your AWS S3 bucket. Log into the Globus transfer console at https://www.globus.org/app/transfer and search for the "Northwestern AWS us-east-1 (N. Virginia)", "Northwestern AWS us-east-2 (Ohio)", and "Northwestern AWS us-west-2 (Oregon)" endpoints. Activate them as normal and you will see your S3 buckets listed as directories. Note, however, that you must use the endpoint that matches your bucket's region: the us-east-1 endpoint for buckets created in Northern Virginia, the us-east-2 endpoint for buckets created in Ohio, and the us-west-2 endpoint for buckets created in Oregon. Although all buckets are visible from every endpoint, a bucket can only be properly accessed through the endpoint for its own region.
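
The region-to-endpoint rule can be expressed as a simple lookup. The sketch below is only an illustration: the endpoint names are the display names given in this article, and `endpoint_for_region` is a hypothetical helper rather than part of any Globus tool.

```python
# Endpoint display names as given in this article, keyed by bucket region.
ENDPOINT_BY_REGION = {
    "us-east-1": "Northwestern AWS us-east-1 (N. Virginia)",
    "us-east-2": "Northwestern AWS us-east-2 (Ohio)",
    "us-west-2": "Northwestern AWS us-west-2 (Oregon)",
}


def endpoint_for_region(bucket_region):
    """Return the Globus endpoint name to search for, given a bucket's region."""
    try:
        return ENDPOINT_BY_REGION[bucket_region]
    except KeyError:
        raise ValueError("no Northwestern endpoint serves region %r" % bucket_region)


if __name__ == "__main__":
    print(endpoint_for_region("us-east-2"))
```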

Note that the Globus S3 connector will have problems uploading files whose path contains certain non-alphanumeric characters, such as the % character. See the S3 Object Key Naming Guidelines for details, and make sure the files and directories you are uploading follow those naming guidelines.
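
To find potentially troublesome paths before starting a transfer, you could scan them for characters outside the set that the AWS naming guidelines describe as safe (alphanumerics plus ! - _ . * ' ( ), with / as the folder delimiter). This is a minimal sketch with a hypothetical helper name. Exactly which characters the connector rejects is not documented here, so treat a flagged character as worth renaming rather than as a guaranteed failure.

```python
import string

# Characters the AWS guidelines describe as safe in object keys:
# alphanumerics plus ! - _ . * ' ( ), with / acting as the folder delimiter.
SAFE_KEY_CHARS = set(string.ascii_letters + string.digits + "!-_.*'()/")


def unsafe_characters(path):
    """Return the sorted list of characters in path that are outside the safe set."""
    return sorted(set(path) - SAFE_KEY_CHARS)


if __name__ == "__main__":
    # Example paths are placeholders
    for candidate in ["results/run_01/output.csv", "data/50% sample.txt"]:
        print(candidate, unsafe_characters(candidate))
```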

For more information about using Globus to transfer files, see Using Globus Online With Quest.

Keywords: quest, globus, file, transfer, sharing, data transfer, management, aws, s3
Doc ID: 81093
Owner: Research Computing
Group: Northwestern
Created: 2018-03-22 13:40 CST
Updated: 2018-06-22 14:46 CST
Sites: Northwestern
CleanURL: https://kb.northwestern.edu/using-globus-with-s3