
S3cmd: FAQ and Knowledge Base

Common Errors
 Common known issues and their solutions
 What is AttributeError: 'module' object has no attribute 'format_exc'?
 What is error 'KeyError: etag'?
 Why does s3cmd complain about invalid signature after upgrading from older s3cmd?
FAQ
 Does s3cmd support Amazon S3 server-side encryption?
 Does s3cmd support multipart uploads?
 Does s3cmd work on Windows?
 Does s3cmd work with CloudFront in Frankfurt (eu-central-1)?
 Does S3cmd work with Frankfurt region (eu-central-1)?
 Is there a mailing list for s3cmd?
FAQ: Version 1.5
 Is s3cmd version 1.5 slower than earlier releases?
 Why did s3cmd version 1.5 drop python 2.4 support?
Tips, Tricks and More
 About the s3cmd configuration file
 CloudFront support in s3cmd
 Enforcing server-side encryption for all objects in a bucket
 How can I remove a bucket that is not empty?
 How to restrict access to a bucket to specific IP addresses
 How to throttle bandwidth in s3cmd
 Why doesn't 's3cmd sync' support PGP / GPG encryption for files?


Common Errors


Common known issues and their solutions

Please see: wiki on GitHub



What is AttributeError: 'module' object has no attribute 'format_exc'?

You are probably using Python 2.3 or older; however, s3cmd only supports Python 2.4 or newer.



What is error 'KeyError: etag'?

This is an old error in s3cmd that is now fixed. Please upgrade to s3cmd 0.9.8.4 or later.



Why does s3cmd complain about invalid signature after upgrading from older s3cmd?

S3cmd 0.9.5 added support for buckets created in the European Amazon data center. Unfortunately, the change broke access to existing buckets with upper-case characters in their names. This regression long went unnoticed and was only recently fixed in s3cmd 0.9.8.4.

Therefore if you are suddenly getting errors like:

ERROR: S3 error: 403 (Forbidden): SignatureDoesNotMatch

after upgrading from s3cmd 0.9.4 or older to 0.9.5 or newer, you are advised to upgrade even further to s3cmd 0.9.8.4 or newer to regain access to your upper-case named buckets.




FAQ


Does s3cmd support Amazon S3 server-side encryption?

Yes, file encryption can optionally be used to make a backup/upload to S3 more secure. Files can be stored on the Amazon S3 servers encrypted (i.e. at rest).

Server-side encryption is only available starting with s3cmd 1.5.0-beta1.


S3cmd provides two types of file encryption: server-side encryption and client-side encryption.


Server-Side encryption is about data encryption at rest, that is, Amazon S3 encrypts your data as it writes it to disks in its data centers and decrypts it for you when you access it. As long as you authenticate your request and you have access permissions, there is no difference in the way you access encrypted or unencrypted objects. Amazon S3 manages encryption and decryption for you. For example, if you share your objects using a pre-signed URL, the pre-signed URL works the same way for both encrypted and unencrypted objects.

Amazon S3 Server Side Encryption employs strong multi-factor encryption. Amazon S3 encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. Amazon S3 Server Side Encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.

When you upload one or more objects with S3cmd, you can specify in your request if you want Amazon S3 to save your object data encrypted. To specify that you want Amazon S3 to save your object data encrypted use the flag --server-side-encryption. Server-side encryption is optional. Your bucket might contain both encrypted and unencrypted objects. Encrypted objects are marked automatically with the metadata header x-amz-server-side-encryption set to AES256.



With Client-Side encryption, you add an extra layer of security by encrypting data locally BEFORE uploading the files to Amazon S3. Client-side encryption and server-side encryption can be combined and used together. In S3cmd, client-side encryption is applied by specifying the flag -e or --encrypt.



Does s3cmd support multipart uploads?

Yes, the latest version of s3cmd supports Amazon S3 multipart uploads.

Multipart uploads are used automatically when a file to upload is larger than 15MB.
In that case the file is split into multiple parts, each 15MB in size (the last part can be smaller). Each part is uploaded separately and the object is reassembled at the destination when the transfer completes.

With this new feature, if an upload of a part fails, it can be restarted without affecting any of the other parts already uploaded.
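The splitting logic can be sketched as follows (an illustrative model of the default 15MB chunking, not s3cmd's actual implementation):

```python
# Sketch of how a file is split for a multipart upload.
# Mirrors the 15MB default chunk size described above.

def multipart_chunks(file_size, chunk_size_mb=15):
    """Return a list of (offset, length) pairs covering the file."""
    chunk = chunk_size_mb * 1024 * 1024
    parts = []
    offset = 0
    while offset < file_size:
        # The last part may be smaller than the chunk size.
        length = min(chunk, file_size - offset)
        parts.append((offset, length))
        offset += length
    return parts

# A 40MB file becomes three parts: 15MB + 15MB + 10MB.
parts = multipart_chunks(40 * 1024 * 1024)
print(len(parts))  # 3
```

Because each part carries its own offset, a failed part can be retried in isolation, which is exactly the restart behavior described above.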

There are two options related to multipart uploads in s3cmd. They are:

--disable-multipart

Disable multipart uploads for all files

and

--multipart-chunk-size-mb=SIZE
Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded multipart; smaller files are uploaded using the traditional method. SIZE is in megabytes; the default chunk size is 15MB, the minimum allowed chunk size is 5MB, and the maximum is 5GB.



Does s3cmd work on Windows?

Yes; however, being written in Python, s3cmd requires Python 2.4+ to be installed on Windows, and it also requires the Python library dateutil.

Alternatively, you can try S3Express. S3Express is a "native" Windows command line program that works out of the box and does not require any additional library or software to run. S3Express is a commercial program. It's very compact and has a very small footprint: the entire program is less than 5MB. S3Express is ideal for uploading, querying, and listing Amazon S3 objects from the command line on Windows.



Does s3cmd work with CloudFront in Frankfurt (eu-central-1)?

No. CloudFront in Frankfurt currently requires newer signing code than 1.5.0 includes. This issue will be addressed in a future release. Other S3 operations do work for Frankfurt.



Does S3cmd work with Frankfurt region (eu-central-1)?

Yes. S3cmd has supported the Frankfurt S3 region since version 1.5.



Is there a mailing list for s3cmd?

Yes, there are four s3cmd mailing lists available. Old e-mails are archived and searchable, and can be accessed for reference.

The mailing lists are available here:

http://sourceforge.net/p/s3tools/mailman/




FAQ: Version 1.5


Is s3cmd version 1.5 slower than earlier releases?

Version 1.5 is a bit slower because calculating both the md5sum and sha256sum values for every object being uploaded is disk I/O intensive, and both calculations are necessary. Future versions will work to minimize local disk I/O.



Why did s3cmd version 1.5 drop python 2.4 support?

The new signing code implemented in version 1.5 is not easily backwards compatible with Python 2.4 libraries. The minimal requirement is now Python 2.6. Please note that Python 3+ has been supported since S3cmd version 2.




Tips, Tricks and More


About the s3cmd configuration file

The s3cmd configuration file is named .s3cfg and it is located in the user's home directory, e.g. /home/username/ ($HOME).

On Windows the configuration file is called s3cmd.ini and it is located in the %USERPROFILE% application-data folder, usually C:\Users\username\AppData\Roaming\s3cmd.ini


The s3cmd configuration file contains all s3cmd settings. This includes the Amazon access key and secret key for s3cmd to use to connect to Amazon S3.

A basic configuration file is created automatically when you first issue the s3cmd --configure command after installation. You will be asked a few questions about your Amazon access key and secret key and other settings you wish to use, and then s3cmd will save that information in a new config file.

Other advanced settings can be changed (if needed) by editing the config file manually. Some of the settings contain the default values for s3cmd to use. For instance, you could change the multipart_chunk_size_mb default value from 15 to 5, and that would become the new default value for the s3cmd option --multipart-chunk-size-mb.


The following is an example of a s3cmd config file:

[default]
access_key = TUOWAAA99023990001
access_token =
add_encoding_exts =
add_headers =
bucket_location = US
cache_file =
cloudfront_host = cloudfront.amazonaws.com
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
expiry_date =
expiry_days =
expiry_prefix =
follow_symlinks = False
force = False
get_continue = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = True
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
human_readable_sizes = False
ignore_failed_copy = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
list_md5 = False
log_target_prefix =
max_delete = -1
mime_type =
multipart_chunk_size_mb = 15
preserve_attrs = True
progress_meter = True
proxy_host =
proxy_port = 0
put_continue = False
recursive = False
recv_chunk = 4096
reduced_redundancy = False
restore_days = 1
secret_key = sd/ceP_vbb#eDDDK
send_chunk = 4096
server_side_encryption = False
simpledb_host = sdb.amazonaws.com
skip_existing = False
socket_timeout = 300
urlencoding_mode = normal
use_https = True
use_mime_magic = True
verbosity = WARNING
website_endpoint = http://%(bucket)s.s3-website-%(location)s.amazonaws.com/
website_error =
website_index = index.html
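The file is standard INI syntax, so it can also be read programmatically. A small sketch using Python's standard configparser (not part of s3cmd itself); note that values like host_bucket contain %()s placeholders that s3cmd expands on its own, so interpolation must be disabled when parsing:

```python
# Reading .s3cfg-style settings with the standard library.
import configparser

sample = """
[default]
host_bucket = %(bucket)s.s3.amazonaws.com
multipart_chunk_size_mb = 15
use_https = True
"""

# interpolation=None keeps %(bucket)s as a literal string instead of
# raising an error about an unknown interpolation key.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(sample)

print(cfg.get("default", "multipart_chunk_size_mb"))  # 15
print(cfg.getboolean("default", "use_https"))         # True
```

Changing multipart_chunk_size_mb here has the same effect as passing --multipart-chunk-size-mb on every invocation, as described above.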



CloudFront support in s3cmd

CloudFront is Amazon's content delivery network (CDN): a large fleet of webservers distributed across multiple datacentres around the globe that provides fast access to public files stored in your buckets. The idea of a CDN is to bring content as close to the user as possible: a European user browsing your website is served by Europe-based webservers, while clients in Japan accessing the same content are served by CDN servers in Asia. In this case the web content is stored in Amazon S3 and the CDN in use is Amazon CloudFront. See more details at Amazon's CloudFront page.

Since the two services are closely related, it makes sense to have CloudFront support directly in s3cmd. CloudFront support was added in version 0.9.9.

How is CloudFront related to Amazon S3?

  • About buckets — As you know, files uploaded to Amazon S3 are organised in buckets. A bucket can have a name of your choice, but it pays off to name it in a DNS-compatible way. In general that means only lower-case characters from the following groups: a-z, 0-9, - (dash) and . (dot). Buckets with DNS-incompatible names are not usable with CloudFront. A DNS-compatible bucket name is for instance s3tools-test; in s3cmd URI syntax it is s3://s3tools-test
  • About publicly accessible files — A file uploaded to S3 with a Public ACL is accessible to anyone over standard HTTP. For example, upload a file logo.png to the bucket named above:
    s3cmd put --acl-public logo.png s3://s3tools-test/example/logo.png

    The HTTP host name is always http://bucketname.s3.amazonaws.com so in our case the file would be accessible as http://s3tools-test.s3.amazonaws.com/example/logo.png
  • About virtual-hosts — If you don’t like the public URL above check out Amazon S3 Virtual Hosts: if your bucket name is a fully qualified domain name and your DNS is set properly you can refer to the bucket directly with its name. For instance let’s have a bucket called s3://public.s3tools.org and upload the above mentioned logo.png in there:
    s3cmd put --acl-public logo.png s3://public.s3tools.org/example/logo.png
    Create a DNS record for public.s3tools.org to have a CNAME of public.s3tools.org.s3.amazonaws.com:
    public.s3tools.org.   IN   CNAME   public.s3tools.org.s3.amazonaws.com.
    From now on everybody can access the logo as http://public.s3tools.org/example/logo.png – this way you can offload all the static images, pdf documents, etc from your web server to Amazon S3.
  • About CloudFront on the scene — The disadvantage in the above is that your content is in a data centre either in the US or in Europe. If it’s in the EU and your visitor lives in the South Pacific they’ll experience poor access performance, and even if they live in the US it still won’t be optimal. Wouldn’t it be nice to bring your content closer to them? Let Amazon copy it to the CloudFront datacentres in many places around the world and let it do the magic of selecting the closest datacentre for each client. Simply create, for example, a DNS record cdn.s3tools.org pointing to a special CNAME that we’ll find out in a later example, and have all your static content at http://cdn.s3tools.org/.... This cdn.s3tools.org name will resolve to different IP addresses in different parts of the world, always pointing to the closest CloudFront datacentre available. The aforementioned logo.png accessed through the CDN now has the URL http://cdn.s3tools.org/example/logo.png

How to manage CloudFront using s3cmd

  • CloudFront is set up at a bucket level — you can publish one or more of your buckets through CloudFront, creating a CloudFront distribution (CFD) for each bucket in question. To publish our public.s3tools.org bucket let’s do:
    s3cmd cfcreate s3://public.s3tools.org
  • Each CFD has a unique Distribution ID (DistId) in a form of a URI: cf://123456ABCDEF It’s printed in the output of s3cmd cfcreate:
    Distribution created:
    Origin:         s3://public.s3tools.org/
    DistId:         cf://E3RPA4Z4ALGTGO
    DomainName:     d11jv2ffak0j4h.cloudfront.net
    CNAMEs:
    Comment:        http://public.s3tools.org.s3.amazonaws.com/
    Status:         InProgress
    Enabled:        True
    Etag:           E3JGOIONPT9834
  • Each CFD has a unique “canonical” hostname automatically assigned by Amazon at the time the CFD is created. This could be for instance d11jv2ffak0j4h.nrt4.cloudfront.net. It can also be found in the cfcreate output, or later on with cfinfo:
    ~$ s3cmd cfinfo
    Origin:         s3://public.s3tools.org/
    DistId:         cf://E3RPA4Z4ALGTGO
    DomainName:     d11jv2ffak0j4h.cloudfront.net
    Status:         Deployed
    Enabled:        True
  • Apart from the canonical name you can assign up to 10 DNS aliases to each CFD. For example the above canonical name can have an alias of cdn.s3tools.org. Either add the CNAMEs at the time of CFD creation or later with the cfmodify command:
    ~$ s3cmd cfmodify cf://E3RPA4Z4ALGTGO --cf-add-cname cdn.s3tools.org
    Distribution modified:
    Origin:         s3://public.s3tools.org/
    DistId:         cf://E3RPA4Z4ALGTGO
    DomainName:     d11jv2ffak0j4h.cloudfront.net
    Status:         InProgress
    CNAMEs:         cdn.s3tools.org
    Comment:        http://public.s3tools.org.s3.amazonaws.com/
    Enabled:        True
    Etag:           E19WWJ5059E2W3

    At this moment you should update your DNS again:
    cdn.s3tools.org.   IN   CNAME   d11jv2ffak0j4h.cloudfront.net.
  • Run cfinfo to confirm that your change has been deployed. Look for the Status: and Enabled: fields:
    ~$ s3cmd cfinfo cf://E3RPA4Z4ALGTGO
    Origin:         s3://public.s3tools.org/
    DistId:         cf://E3RPA4Z4ALGTGO
    DomainName:     d11jv2ffak0j4h.cloudfront.net
    Status:         Deployed
    CNAMEs:         cdn.s3tools.org
    Comment:        http://public.s3tools.org.s3.amazonaws.com/
    Enabled:        True
    Etag:           E19WWJ5059E2W3
  • Congratulations, you’re set up. Now you should be able to access CloudFront using the host name of your choice: http://cdn.s3tools.org/example/logo.png and serve your visitors faster than ever ;-)
  • Oh, you may want to remove your CloudFront Distributions later, indeed. Simply run s3cmd cfremove cf://E3RPA4Z4ALGTGO to achieve that. Be aware that it will take a couple of minutes to finish, because the CFD must be disabled first and that change must be propagated (“deployed”) before the distribution can actually be removed. It’s perhaps easier to disable it manually using s3cmd cfmodify --disable cf://E3RPA4Z4ALGTGO, go get a coffee, and once you’re back check that cfinfo says Enabled: False and Status: Deployed. At that moment s3cmd cfremove should succeed immediately.



Enforcing server-side encryption for all objects in a bucket

Amazon S3 supports bucket policies that you can use to require server-side encryption for all objects stored in your bucket. For example, the following bucket policy denies the upload-object (s3:PutObject) permission to everyone if the request does not include the x-amz-server-side-encryption header requesting server-side encryption.

{
   "Version":"2012-10-17",
   "Id":"PutObjPolicy",
   "Statement":[{
         "Sid":"DenyUnEncryptedObjectUploads",
         "Effect":"Deny",
         "Principal":{
            "AWS":"*"
         },
         "Action":"s3:PutObject",
         "Resource":"arn:aws:s3:::YourBucket/*",
         "Condition":{
            "StringNotEquals":{
               "s3:x-amz-server-side-encryption":"AES256"
            }
         }
      }
   ]
}

In S3cmd, the --server-side-encryption option adds the x-amz-server-side-encryption header to uploaded objects.
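The effect of the Deny plus StringNotEquals combination can be modeled with a tiny function (a simplified illustration, not AWS's actual policy engine; note that a request with the header missing is also denied, because StringNotEquals matches when the key is absent):

```python
# Simplified model of the bucket policy above: a PUT succeeds only
# when the x-amz-server-side-encryption header is present and AES256.

def put_allowed(headers):
    """Return True if the PUT would pass the Deny condition above."""
    return headers.get("x-amz-server-side-encryption") == "AES256"

print(put_allowed({"x-amz-server-side-encryption": "AES256"}))  # True
print(put_allowed({}))                                          # False
```

This is why an s3cmd upload to such a bucket must include --server-side-encryption: without it the header is absent and the PUT is rejected.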



How can I remove a bucket that is not empty?

You have to empty it first, sorry :-) There are two ways:

  1. The convenient one is available in s3cmd 0.9.9 and newer and is as simple as s3cmd del --recursive s3://bucket-to-delete
  2. The less convenient one available prior to s3cmd 0.9.9 involves creating an empty directory, say /tmp/empty and synchronizing its content (i.e. nothing) to the bucket: s3cmd sync --delete /tmp/empty s3://bucket-to-delete

Once the bucket is empty it can then be removed with s3cmd rb s3://bucket-to-delete



How to restrict access to a bucket to specific IP addresses

To secure our files on Amazon S3, we can restrict access to an S3 bucket to specific IP addresses.

The following bucket policy grants any user permission to perform any S3 action on objects in the specified bucket. However, the request must originate from the range of IP addresses specified in the condition. The condition in this statement identifies the 192.168.143.* range of allowed IP addresses, with one exception: 192.168.143.188.

{
    "Version": "2012-10-17",
    "Id": "S3PolicyIPRestrict",
    "Statement": [
        {
            "Sid": "IPAllow",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::bucket/*",
            "Condition" : {
                "IpAddress" : {
                    "aws:SourceIp": "192.168.143.0/24"
                },
                "NotIpAddress" : {
                    "aws:SourceIp": "192.168.143.188/32"
                }
            }
        }
    ]
}

The IpAddress and NotIpAddress values specified in the condition use the CIDR notation described in RFC 4632. For more information, go to http://www.rfc-editor.org/rfc/rfc4632.txt
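You can sanity-check the two CIDR ranges in the policy with Python's standard ipaddress module (an illustration of the ranges above, not a real policy evaluation):

```python
# Verify locally which source IPs the policy's conditions would admit.
import ipaddress

allowed = ipaddress.ip_network("192.168.143.0/24")   # IpAddress condition
denied = ipaddress.ip_network("192.168.143.188/32")  # NotIpAddress condition

def ip_allowed(ip):
    addr = ipaddress.ip_address(ip)
    return addr in allowed and addr not in denied

print(ip_allowed("192.168.143.10"))   # True
print(ip_allowed("192.168.143.188"))  # False: explicitly excluded
print(ip_allowed("10.0.0.5"))         # False: outside the /24
```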



How to throttle bandwidth in s3cmd

On Linux you can throttle bandwidth using the throttle command line program, e.g.

cat mybigfile | throttle -k 512 | s3cmd put - s3://mybucket/mybigfile

which limits reads from mybigfile to 512 kbps.

Throttle is available via apt-get or yum in all major Linux distributions.


Alternatively, the utility trickle can be used:

trickle -d 250 s3cmd get... would limit the download rate of s3cmd to 250 kilobytes per second.

trickle -u 250 s3cmd put... would limit the upload rate of s3cmd to 250 kilobytes per second.

Trickle can be installed with yum or apt-get on Fedora or Debian/Ubuntu machines. You must have the libevent library (trickle's only dependency) installed before installing trickle; most modern distributions already ship it.

On Windows, the throttle or trickle utilities are not available, but if you are using S3Express instead of s3cmd, you can limit max bandwidth simply by using the flag -maxb of the PUT command.



Why doesn't 's3cmd sync' support PGP / GPG encryption for files?

What the s3cmd sync command does is:

  1. Walk the filesystem to generate a list of local files
  2. Retrieve a list of remote files uploaded to Amazon S3
  3. Compare these two lists to find which local files need to be uploaded and which remote files should be deleted

The information about remote files that we get from Amazon S3 is limited to the names, sizes and MD5 checksums of the stored files. If a stored file is GPG-encrypted we only get the size and MD5 of the encrypted file, not of the original one, and therefore we can't compare the local and remote lists against each other.
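The comparison in steps 1–3 can be sketched like this (a simplified model, not s3cmd's actual code; each side maps a file name to a (size, md5) pair):

```python
# Sketch of the sync decision: compare local and remote listings by
# name, size and md5 to decide what to upload and what to delete.

def plan_sync(local, remote):
    """Return (to_upload, to_delete) lists of file names."""
    to_upload = [name for name, meta in local.items()
                 if remote.get(name) != meta]   # new or changed locally
    to_delete = [name for name in remote
                 if name not in local]          # removed locally
    return to_upload, to_delete

local = {"a.txt": (10, "aaa"), "b.txt": (20, "bbb")}
remote = {"b.txt": (20, "bbb"), "c.txt": (5, "ccc")}

up, rm = plan_sync(local, remote)
print(up)  # ['a.txt']
print(rm)  # ['c.txt']
```

With GPG-encrypted uploads, the remote (size, md5) describes the ciphertext rather than the original file, so the equality test above would fail for every file even when nothing changed, which is why sync cannot work on encrypted objects.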

