I got a chance to stop by AWS Summit 2014 when it came to San Francisco at the Moscone Center. It was a free event, mobbed by people apparently hungry to learn more about Amazon’s cloud platform services. I was interested in the talk "Scaling on AWS for the First 10 Million Users." Here are some notes:
In 2014, AWS adds enough server capacity every day to run all of Amazon’s infrastructure as it was back when Amazon was a $7 billion business.
Stages of growth:
Initial: a single machine (e.g., an EC2 instance) running both web and db, with DNS handled by Amazon Route 53.
Users > 100: Split the db off onto its own instance, either a managed database service (e.g., Amazon Relational Database Service (RDS)) or another VM dedicated to the db.
Users > 1,000: Replicate both the web+app machine and the db instance, giving two web+app/db pairs, and keep the dbs in sync through replication. Balance traffic between the pairs with Elastic Load Balancing. Place each pair in a different Availability Zone so that a data center failure in one zone takes down only one pair; your system keeps running in the other zone.
Users > 10k: Add even more pairs, and consider using more Availability Zones.
Beyond: Move static content to Amazon S3 and serve it through Amazon CloudFront as an edge CDN. Use Amazon ElastiCache and/or DynamoDB to cache state and reduce traffic to the db instances.
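To make the caching step concrete, here is a minimal sketch of the cache-aside (lazy loading) pattern. It assumes a plain in-process dict standing in for a cache service like ElastiCache and a lambda standing in for the database query; the names `CacheAside`, `get`, and `invalidate` are hypothetical, not an AWS API. Reads check the cache first and only touch the db on a miss or an expired entry, which is how the cache cuts db traffic.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Hypothetical cache-aside wrapper: read the cache first, fall back to the db on a miss."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 60.0) -> None:
        self._load_from_db = load_from_db  # stand-in for a real database query
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (expiry time, value)
        self.db_reads = 0  # how many times we actually went to the database

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database traffic
        value = self._load_from_db(key)  # miss or expired: read from the db
        self.db_reads += 1
        self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call this after writing `key` to the db so readers don't see stale data.
        self._cache.pop(key, None)


# Illustrative usage: 100 reads of a hot key cost a single db read.
fake_db = {"user:1": "alice"}
cache = CacheAside(lambda k: fake_db[k], ttl_seconds=30)
for _ in range(100):
    cache.get("user:1")
print(cache.db_reads)  # 1
```

Invalidating on write, rather than updating the cache in place, keeps the write path simple at the cost of one extra db read after each change.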
Auto Scaling can use Amazon CloudWatch metrics to decide when to add more web nodes or db instances as load changes.
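As an illustration of the kind of decision involved, here is a hypothetical threshold rule, loosely modeled on a CloudWatch alarm that must stay in breach for several consecutive periods before a scaling action fires. This is not the AWS Auto Scaling API: `ScalingPolicy`, `desired_capacity`, and all thresholds are assumptions chosen for the example, and the sketch only sizes a web tier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingPolicy:
    # Hypothetical thresholds; real values depend on the workload.
    min_nodes: int = 2
    max_nodes: int = 10
    scale_out_cpu: float = 70.0  # add a node when average CPU stays above this
    scale_in_cpu: float = 30.0   # remove a node when average CPU stays below this
    breach_periods: int = 3      # consecutive metric periods required before acting


def desired_capacity(policy: ScalingPolicy, current: int, cpu_samples: List[float]) -> int:
    """Return the new web node count given recent per-period average CPU readings."""
    recent = cpu_samples[-policy.breach_periods:]
    if len(recent) < policy.breach_periods:
        return current  # not enough data points to decide yet
    if all(c > policy.scale_out_cpu for c in recent):
        current += 1  # sustained high load: scale out by one node
    elif all(c < policy.scale_in_cpu for c in recent):
        current -= 1  # sustained low load: scale in by one node
    # Never go below the floor (keeps one node per zone) or above the cap.
    return max(policy.min_nodes, min(policy.max_nodes, current))


print(desired_capacity(ScalingPolicy(), 2, [75.0, 80.0, 90.0]))  # 3
```

Requiring several consecutive breaching periods and clamping to a min/max keeps a single noisy reading from flapping the fleet size up and down.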
Use a service-oriented architecture: it makes it easier to keep your components stateless, replaceable, and scalable. Don’t let components talk directly to each other; use indirection so that either side of the communication can fail without affecting the other.
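A minimal sketch of that indirection, assuming an in-process `queue.Queue` as a stand-in for a managed message queue such as Amazon SQS: the front end only enqueues messages and the worker only dequeues them, so neither needs the other to be up at the same moment. The functions `place_order` and `drain` are hypothetical names for illustration.

```python
import queue
from typing import List


def place_order(q: queue.Queue, order_id: int) -> None:
    """Front-end side: enqueue work and return immediately; no direct call to the worker."""
    q.put({"order_id": order_id})


def drain(q: queue.Queue) -> List[int]:
    """Worker side: process whatever is waiting; the producer need not be running."""
    processed = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break
        processed.append(msg["order_id"])
        q.task_done()
    return processed


orders = queue.Queue()
for i in range(3):
    place_order(orders, i)  # worker is "down": messages simply wait in the queue
print(orders.qsize())  # 3
print(drain(orders))   # [0, 1, 2]  (the worker comes back and catches up)
```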
Overall advice: split processing into pieces that are as loosely coupled and stateless as possible.
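One way to read "stateless" in practice: keep per-user state out of the web nodes and in a shared store, so any node can handle any request and nodes can be added or replaced freely. In this sketch, `SessionStore` is a hypothetical in-memory stand-in for an external store such as a cache cluster or a DynamoDB table, and `WebNode` is a hypothetical web tier node.

```python
from typing import Dict


class SessionStore:
    """Hypothetical stand-in for a shared external store that outlives any web node."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, int]] = {}

    def load(self, session_id: str) -> Dict[str, int]:
        return dict(self._data.get(session_id, {}))  # return a copy, as a remote read would

    def save(self, session_id: str, state: Dict[str, int]) -> None:
        self._data[session_id] = dict(state)


class WebNode:
    """Holds no per-user state, so nodes are interchangeable behind a load balancer."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> int:
        state = self.store.load(session_id)  # fetch state for this request only
        state[item] = state.get(item, 0) + 1
        self.store.save(session_id, state)   # write it back before responding
        return sum(state.values())           # total items in the cart


store = SessionStore()
node_a, node_b = WebNode("a", store), WebNode("b", store)
first = node_a.add_to_cart("s1", "book")   # request 1 lands on node a
second = node_b.add_to_cart("s1", "pen")   # request 2 lands on node b
print(first, second)  # 1 2
```

Because the cart lives in the store, a load balancer is free to send the second request to a different node, and losing node a loses no user data.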
Honestly, though, I didn’t find it as much of a deep dive as advertised, but that may say more about my deeply technical perspective.
The keynote this morning announced steep price drops for AWS products like EC2 and S3. Most people think they were a response to Google’s price drops yesterday. TechCrunch articles: (Google) (AWS)
I also got a chance to talk with people from the DynamoDB and CloudSearch groups at AWS, and was surprised to learn that both have implemented tooling libraries for geospatial support. The CloudSearch support seemed especially interesting: apparently you can push them your data, and they can do area searches (circle or cone) on latitude/longitude data, which is pretty neat. The showstopper might be the pricing (it seemed to be about $1000/TB/month), but it seems worth a look at least: their latencies for returning batches of objects are really, really low.
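CloudSearch runs these queries server-side, and its query syntax isn't reproduced here. As a rough client-side illustration of what a circle search computes, the sketch below filters points by great-circle (haversine) distance from a center, assuming a spherical Earth with a mean radius of 6371 km. The function names and the sample coordinates (approximate Bay Area landmarks) are illustrative only.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points: Iterable[Tuple[str, float, float]],
                  lat: float, lon: float, radius_km: float) -> List[str]:
    """Return names of points inside the circle, nearest first (linear scan)."""
    hits = []
    for name, plat, plon in points:
        d = haversine_km(lat, lon, plat, plon)
        if d <= radius_km:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]


# Approximate coordinates, for illustration only.
landmarks = [
    ("Moscone Center", 37.7842, -122.4016),
    ("Ferry Building", 37.7955, -122.3937),
    ("Oakland City Hall", 37.8053, -122.2725),
    ("San Jose", 37.3382, -121.8863),
]
nearby = within_radius(landmarks, 37.7842, -122.4016, radius_km=5.0)
print(nearby)  # ['Moscone Center', 'Ferry Building']
```

A real geospatial index avoids computing the distance to every point (for example, by first narrowing candidates with a bounding box or geohash prefix); this linear scan only shows the geometry.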
Anyway, the one-day event seemed well attended, and I found people pretty enthusiastic about their work and about cloud-based work in general. Amazon really does have a dizzying array of building blocks for scalable systems. If I were building a scalable web app, AWS seems like a great way to get things (a) up and running and (b) scaled up to Netflix-like scale.