S3 - 101
1. S3 - Simple Storage Service
2. Object storage - files, videos, photos, media etc. Two types of storage - block & object.
3. File size can be 0 bytes to 5 terabytes; storage is unlimited - Amazon monitors storage availability in each region & provisions SANs as needed
4. Files are stored in buckets - a bucket is a folder in Amazon terms
5. CloudBerry - provides Explorer-type apps to access S3
6. S3 uses a universal namespace - bucket names must be unique globally
https://s3-eu-west-1.amazonaws.com/acloudguru
7. When you upload a file to S3, an HTTP 200 OK code means it was successful
8. Data consistency model for s3 -
a) Read-after-write consistency for PUTs of new objects
b) Eventual consistency for overwrite PUTs & DELETEs (takes time to propagate)
9. S3 is a key-value store - lexicographic design, so objects are indexed in alphabetical order of key name
Key - name of the file
Value - the data itself, a sequence of bytes
Version ID - identifies each version of the object
Metadata - data about the data, like the upload date etc.
Subresources - do not exist on their own; they exist under an object
a) Access control lists - users/groups that have access to the object. This access can be defined for a single file/object or for a whole bucket
b) Torrent - S3 supports the BitTorrent protocol
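The lexicographic ordering of keys can be seen with a quick sketch (the key names below are made-up examples, not from the course):

```python
# Hypothetical S3 object keys -- S3 indexes keys in lexicographic (byte) order,
# which is why file names appear "alphabetically" in the console.
keys = [
    "2024-01-15/report.csv",
    "2024-01-02/report.csv",
    "images/photo.png",
    "Images/logo.png",   # uppercase sorts before lowercase in byte order
]

for key in sorted(keys):
    print(key)
```

Note that digits sort before uppercase letters, which sort before lowercase, so "2024-…" keys come first and "Images/…" precedes "images/…".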
10. S3 specs
a) built for 99.99% availability - SLA
b) guarantees 99.999999999% (11 9s) durability of stored information - can survive the concurrent failure of 2 data centers
c) Tiered storage available - e.g. files older than 30 days go in one tier, newer ones in another etc.
d) Lifecycle management - configuration settings for moving objects between tiers
e) versioning
f) encryption
g) secure your data using access control lists & bucket policies
11. S3 Storage Tiers/Classes
a) S3 (Standard) - 99.99% availability & 99.999999999% (11 9s) durability
b) S3-IA (Infrequent Access) - priced lower than Standard but charged per retrieval
c) Reduced Redundancy Storage - much cheaper, but durability is only 99.99%
d) Glacier - archival only; restoration is slow, may take 3-5 hours. Cheapest
Refer S3 Tiered Storage specs
Refer S3 vs Glacier
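The 11 9s durability figure can be put in perspective with quick arithmetic (the 10 million object count is just an illustrative assumption):

```python
durability = 0.99999999999            # 11 nines
annual_loss_prob = 1 - durability     # chance of losing a given object in a year
objects = 10_000_000                  # assumed number of stored objects

expected_losses_per_year = objects * annual_loss_prob
years_per_single_loss = 1 / expected_losses_per_year

print(round(years_per_single_loss))   # roughly one lost object every 10,000 years
```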
12. S3 Charges -
Charged for
a)Storage
b)Requests
c)Storage management
d) Data transfer pricing - data coming into S3 is free, but transferring it out or between regions costs
e) Transfer Acceleration
S3-BUCKET Lab
1. Created a bucket: Services -> Storage -> S3 -> Create bucket
2. Uploaded a file.
3. Added users to the bucket
4. Can set encryption - client-side encryption; server-side encryption with Amazon-managed keys (SSE-S3); server-side encryption with KMS (SSE-KMS); server-side encryption with customer-provided keys (SSE-C)
5. Security is through ACL - Access control lists & bucket policies
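For reference, a minimal sketch of what a bucket policy looks like - this one grants public read on every object (the bucket name examplebucket is a placeholder, not from the lab):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::examplebucket/*"
    }
  ]
}
```

ACLs attach permissions per object or bucket, while a policy like this is a single JSON document applied to the whole bucket.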
6. By default all buckets & objects in it are private.
For a bucket there are 4 tabs shown - Overview, Properties, Permissions & Management
similarly for a file also.
Added tags to the object itself - a tag added to the bucket doesn't get passed on to its objects
S3 Versioning Lab
1. Once versioning is enabled it can only be suspended, not removed.
2. It writes/stores every change to a file as a separate object - even a delete is stored (as a delete marker)
3. Provides an additional level of security - MFA (multi-factor authentication) can be required to change versioning state and to delete versions
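As a sketch, the versioning state is itself a small configuration document - this is the shape of the payload `aws s3api put-bucket-versioning` accepts (turning MFA Delete on additionally requires supplying your MFA device serial & a current code on the command line):

```json
{
  "Status": "Enabled",
  "MFADelete": "Enabled"
}
```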
Cross Region Replication
- Enabling CRR only copies future changes from the source to the destination bucket
- existing contents must be copied with the AWS CLI:
aws configure - will ask for access key ID, secret access key, region
aws s3 ls - lists your buckets
aws s3 cp --recursive s3://sourcebucketname s3://destinationbucketname
- delete markers are replicated, but deleting individual versions or delete markers is not replicated
- understand cross-region replication at a high level (exam tip)
- the two buckets must be in different regions, and versioning must be enabled on both
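A sketch of the replication configuration document that `aws s3api put-bucket-replication` expects - the IAM role ARN and destination bucket name here are placeholders:

```json
{
  "Role": "arn:aws:iam::123456789012:role/example-crr-role",
  "Rules": [
    {
      "ID": "ReplicateEverything",
      "Status": "Enabled",
      "Prefix": "",
      "Destination": {
        "Bucket": "arn:aws:s3:::destinationbucketname"
      }
    }
  ]
}
```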
Lifecycle management /rules -lab
Glacier is not available in Singapore & South America - so create your buckets for this lab in some other region
AWS console -> Services -> Storage -> S3 -> Create bucket (no capitals allowed in bucket names) - in the Properties tab enable versioning
on selecting the bucket you get the bucket screen with the Overview, Properties, Permissions & Management tabs
click on the Management tab to find the Lifecycle section
clicking Lifecycle lets you create a lifecycle rule; a pop-up dialog takes you through the rule-setting process
The rule can be set on a whole bucket or an individual file
1. current versions -
a) setting to transition to Infrequent Access (IA) - has to be a minimum of 30 days after creation
b) setting to transition to Glacier archival - has to be in IA for a minimum of 30 days - so 60 days from creation is the minimum here
2. after a file becomes a previous version -
a) setting to transition to IA
b) setting to transition to Glacier archival - here no limit on days
c) setting to expire - when this is set, only a delete marker is added against the current version at the expiry date. If a real delete has to happen, it must be combined with the permanent-delete option
Once you create the rule, the summary screen shows the transitions & expirations.
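The rule built in the console maps to a lifecycle configuration document like this sketch (days chosen to match the minimums above; the rule ID and the 365-day expiry are made-up examples):

```json
{
  "Rules": [
    {
      "ID": "tier-then-archive",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 60, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 365 }
    }
  ]
}
```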
Exam tips
- lifecycle rule configs can be done in conjunction with versioning
- can be applied on current versions / previous versions
- transition rules - minimum 30 days to IA & 60 days (from creation) to Glacier archiving
- previous versions can be permanently deleted
CDN Overview -
Content delivery network - a system of distributed servers that delivers webpages and other web content to users based on their geographic location, the origin of the content & the content delivery server.
Edge location - the location where content is cached; different from an AWS Region
Origin - the source of all the files distributed by the CDN, e.g. a website hosted in London, Europe
Distribution - a CDN which consists of a collection of edge locations.
a) Web distribution - for websites only
b) RTMP - for media streaming - for Adobe Flash media files using the RTMP protocol
Exam tips
Edge locations are not read-only - write/PUT of objects is allowed, and the write is passed back to the object on the origin server
TTL (time to live) - objects stay in the cache until the TTL expires
You can clear (invalidate) cached objects, but you will be charged
CloudFront Distribution - Lab
Services -> Networking & Content Delivery -> CloudFront
create distribution
Origin Domain Name - prefilled with your bucket names
Origin Path - optional path within the origin to request content from; the distribution itself gets a random string of letters & numbers as its domain unless you set a user-friendly alternate domain name
Origin ID/description - must be unique within a distribution
a distribution may have multiple origins
- cache behaviour settings
restrict the distribution to read-only access to the bucket/files
allowed HTTP methods - just GET/HEAD, or also PUT, OPTIONS etc.
restrict viewer access - apply security so only logged-in/signed-in users can access - via signed URLs
use origin cache headers
configure min, max & default TTLs
- distribution settings
price class - use all edge locations or only specific ones
alternate domain name (CNAME)
SSL certificate - default CloudFront certificate or a custom SSL certificate
you can apply geo-restriction also
Security & encryption
Security
-Bucket policies & access control lists
-access control lists can drill down to specific objects in a bucket also
-objects in a bucket are by default private
- access logs - give a log of all requests/access made to your bucket. The logs can be delivered to another bucket or even another account
Encryption
In transit -
when an object is transferred into or out of S3
SSL/TLS - HTTPS
At rest -
Server-side encryption - SSE
- SSE with Amazon-managed keys - SSE-S3
- SSE with the AWS Key Management Service - SSE-KMS - provides envelope management of your encryption keys & also an audit trail for key usage
- SSE with customer-provided keys - SSE-C
Client side encryption
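The SSE options above can also be set as a bucket default; a sketch of the configuration document `aws s3api put-bucket-encryption` takes for SSE-KMS (the KMS key ARN is a placeholder):

```json
{
  "Rules": [
    {
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
      }
    }
  ]
}
```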
Storage Gateway
- connects an on-premises IT environment with cloud storage to provide secure & scalable data transfer & storage from on-premises to the AWS cloud
Your data center -> asynchronous replication -> AWS (S3 or Glacier)
- Storage Gateway (SG) is available as downloadable software in the form of a VM image. It supports VMware ESXi or Microsoft Hyper-V. Once installed in your data center
and connected to your AWS account through the activation process, the gateway can be set up from the AWS console with options that work for you
4 different types of gateways
- File Gateway - uses NFS (Network File System) to store flat files in S3. All data is stored only in S3; nothing on-site
- Volume Gateway - for block storage - takes point-in-time snapshots and stores them in S3 as Amazon EBS (Elastic Block Store) snapshots
Blocks/snapshots are stored incrementally - so only the latest changes are stored
a) Stored volumes - data is asynchronously backed up via iSCSI block storage. All data is stored on-premises and backed up to S3
b) Cached volumes - all data is stored in S3; only the most frequently accessed data is cached on-site
- Tape Gateway (virtual tape library) -
used for backup; works with popular backup applications, but with virtual tape cartridges
Snowball
A petabyte-scale physical data transfer device - used to transfer large amounts of data into and out of AWS without the high network cost
Snowball - 80 TB of data, onboard storage capabilities
Snowball Edge - 100 TB - onboard storage & computing capabilities - basically on-premises data processing with Lambda functions
1024 TB is 1 petabyte
& 1024 petabytes is 1 exabyte
Snowmobile - exabyte-scale data transfer - you can transfer up to 100 PB at a time with Snowmobile
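The device capacities above can be sanity-checked with quick arithmetic:

```python
TB_PER_PB = 1024                  # 1 petabyte = 1024 terabytes
PB_PER_EB = 1024                  # 1 exabyte  = 1024 petabytes

snowball_tb = 80                  # Snowball onboard storage
snowmobile_pb = 100               # max per Snowmobile transfer

snowmobile_tb = snowmobile_pb * TB_PER_PB
snowballs_to_match = snowmobile_tb // snowball_tb

print(snowmobile_tb)              # TB carried in one Snowmobile load
print(snowballs_to_match)         # Snowballs needed to move the same data
```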
Snowball lab
It's under Migration in Services
- click Create Job to create a job for AWS to send you the Snowball; keep pressing Next to enter your address details etc.
the workflow block diagram is shown - the job is midway when the Snowball has been delivered to you
- open the flaps on the right & left narrow sides of the cuboid - one side is the Snowball's Kindle-style display; the other end has the Ethernet ports etc.; the top has the power jack, which needs to be connected to the power cable
log on to AWS and download the Snowball client & install it on your PC; get your credentials & download the manifest
in the CLI:
power up the Snowball
snowball start -i <Snowball IP> -m <path to manifest file> -u <unlock code from credentials>
snowball cp <filename> s3://bucketname (the bucket link can be found in your Create Job request)
copying starts
once done, power it off & create a job for AWS to pick it up
Transfer Acceleration
- instead of uploading directly to a bucket, it lets you upload to an edge location, which then uploads the data to your bucket faster. This service comes at an additional cost. When it is enabled, a unique endpoint URL is given which has s3-accelerate in its domain; through this link you write to the edge location instead of directly to the bucket.
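A sketch of the accelerated endpoint pattern (the bucket name acloudguru is a made-up example):

```python
def accelerate_endpoint(bucket: str) -> str:
    # Transfer Acceleration endpoint pattern: the bucket name is prepended
    # to the fixed s3-accelerate domain.
    return f"https://{bucket}.s3-accelerate.amazonaws.com"

print(accelerate_endpoint("acloudguru"))
```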
Static website hosting
http://pri-staticwebsite.s3-website-us-east-1.amazonaws.com/
create an S3 bucket and in Properties enable static website hosting. Give public read access - only then will the website work. Add the index & error HTML file names in the static website properties, create those index & error files and upload them to the bucket. Now go to the static website settings, click on the endpoint URL, and the website displays.
the URL is always of the form http://bucketname.s3-website-regionname.amazonaws.com
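The endpoint pattern above can be sketched as a small helper (note some newer regions use a dot instead of the hyphen, i.e. s3-website.region):

```python
def website_endpoint(bucket: str, region: str) -> str:
    # Static website endpoint pattern noted above (hyphenated form).
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"

print(website_endpoint("pri-staticwebsite", "us-east-1"))
```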