A Q&A with WebLinc co-founder and CTO, Jason Hill, on Workarea's underlying infrastructure and how it supports our customers’ highest revenue days.
Q: What factors did you consider when determining Workarea's hosting infrastructure?
A: Many retail teams selected Workarea as their commerce platform because of its proven ability to scale. We’ve got many customers whose businesses are highly seasonal, highly promotional or prone to traffic spikes. Costume SuperCenter, for example, does more than 80% of their yearly revenue during the month of October. Lime Crime’s massive loyal audience heads to the site whenever they have a product drop or put products on sale. Consequently, we needed to select infrastructure that could respond automatically to changes in performance and compensate with scaling.
Q: Why did you decide on AWS?
A: Security, compliance and availability zones aside, with AWS we’re always a step ahead of our customers’ unforeseen commerce events. It’s impossible to predict when Oprah will talk about The Bouqs or when Heidi Klum will decide to make an appearance wearing Rachel Roy. We take advantage of AWS’s Auto Scaling to react almost instantly to performance degradation, and Elastic Load Balancing to distribute incoming application traffic across multiple targets. The tools AWS provides allow us to do in seconds what used to take days. Plus, we still have complete control over our instances, which lets us do the custom enhancement work that benefits each of our customers. Companies like Netflix, Hulu, Spotify, Airbnb and Pinterest also host their applications in AWS. You can’t ask for better validation than that.
Q: Can you talk more about auto-scaling? How does it work?
A: At its simplest, auto-scaling allows us to dynamically add or remove Amazon EC2 instances according to conditions we define. When CPU usage approaches our predetermined thresholds, more capacity is added. When conditions return to normal, that additional capacity is removed. This benefits our customers by making sure the sites they work hard to maintain are available to their customers. It also represents long-term savings, because decreasing capacity during lulls reduces cost.
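To make the add-and-remove logic above concrete, here is a minimal sketch in Python. It is an illustration only, not Workarea's actual configuration: the `ScalingPolicy` name, the CPU thresholds, and the instance limits are all hypothetical, and a real setup would express these rules through AWS Auto Scaling policies rather than application code.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values below are hypothetical, chosen only for illustration.
    scale_out_cpu: float = 70.0  # add capacity at or above this average CPU %
    scale_in_cpu: float = 30.0   # remove capacity at or below this average CPU %
    min_instances: int = 2
    max_instances: int = 20


def desired_capacity(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count a simple threshold rule would ask for."""
    if avg_cpu >= policy.scale_out_cpu:
        target = current + 1   # load is high: add one instance
    elif avg_cpu <= policy.scale_in_cpu:
        target = current - 1   # load is low: remove one instance
    else:
        target = current       # within the normal band: no change
    # Never go below the floor or above the ceiling.
    return max(policy.min_instances, min(policy.max_instances, target))


policy = ScalingPolicy()
print(desired_capacity(4, 85.0, policy))  # high CPU, so capacity grows
print(desired_capacity(4, 20.0, policy))  # quiet period, so capacity shrinks
```

The floor and ceiling matter as much as the thresholds: the floor keeps the site available during lulls, and the ceiling caps cost if a metric misbehaves.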
Q: What peripheral tools do you use for performance monitoring?
A: This will turn into a novel quickly if I describe them all! I’ll focus on those that relate most to scale:
New Relic monitors and reports on application performance. It provides visibility into the application’s components and how they are performing relative to each other. Alerts are triggered when performance, availability, or error rates exceed defined thresholds.
Pingdom monitors application availability from 50 locations around the world. Critical multistep paths in the purchase process are monitored through Pingdom's Synthetics system. When a system or component is unavailable, Pingdom notifies the on-call team through WebLinc's PagerDuty account.
PagerDuty is like the Grand Central Station of monitoring alerts. All of the monitoring tools trigger PagerDuty alerts, which are sent to on-call representatives.
CloudWatch is a monitoring service for all AWS cloud resources and the applications we run on AWS. We use it to collect and track metrics, monitor log files, set alarms, and automatically react to changes in our AWS resources.
Sentry houses any exception generated by our system components for review by our infrastructure team.
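The threshold-based alerting these tools share can be sketched in a few lines of Python. This is a simplified illustration, not the API of New Relic, Pingdom, or PagerDuty: the metric names and limits are hypothetical, and in practice each tool evaluates its own alert conditions and forwards them to the on-call rotation.

```python
from typing import Dict, List

# Hypothetical metric names and limits, for illustration only.
THRESHOLDS: Dict[str, float] = {
    "response_time_ms": 500.0,
    "error_rate_pct": 1.0,
}


def breached(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose current value exceeds its limit."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )


# A slow response time with a healthy error rate yields a single alert.
current = {"response_time_ms": 750.0, "error_rate_pct": 0.2}
for name in breached(current, THRESHOLDS):
    print(f"ALERT: {name} exceeded its threshold")
```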
Q: When you have the opportunity to prepare for big days, what are the steps you take?
A: If a known commerce event is approaching (think Black Friday or Cyber Monday), the infrastructure team helps our system implementers identify and address performance flaws. While this type of optimization work happens every day, year-round, the extra set of eyes during peak season can be helpful. Simple things like putting fail-safes in place to compensate for third-party technology performance deficiencies can go a long way, especially when every second counts. Because auto-scaling is triggered by high CPU usage, it reacts only after load has already climbed, so we’ll also add Amazon EC2 instances in advance. This way, customers never have to experience a performance downtick.
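One common shape for the kind of third-party fail-safe mentioned above is a timeout with a fallback value. The Python sketch below is an assumption about what such a guard might look like, not a description of Workarea's code: `call_with_fallback`, the simulated slow service, and the timeout values are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for outbound third-party calls (size is arbitrary here).
_executor = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(func, fallback, timeout_seconds):
    """Run func in a worker thread; return fallback if it is slow or fails."""
    future = _executor.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return fallback  # the third party was too slow: degrade gracefully
    except Exception:
        return fallback  # the third party raised an error: degrade gracefully


def slow_recommendations():
    # Stand-in for a hypothetical third-party service that responds slowly.
    time.sleep(0.5)
    return ["live recommendations"]


# The page renders with an empty list instead of waiting on the slow service.
print(call_with_fallback(slow_recommendations, [], timeout_seconds=0.05))
```

The trade-off is that the shopper sees a slightly less rich page rather than a slow one. Note that the timed-out call keeps running in its worker thread, so a production version would also want a timeout on the underlying network request.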
Q: What is the uptime guarantee of Workarea? How did Black Friday and Cyber Monday go this year?
A: We guarantee 99.98% uptime. I can proudly say our customers experienced 100% uptime during last year's peak season and on Black Friday and Cyber Monday this year. We'll work hard for a repeat performance in 2018.
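For a sense of scale, a 99.98% uptime figure can be converted into a downtime budget. The short calculation below is plain arithmetic; the measurement windows (a 365-day year and a 30-day month) are assumptions, since the period over which the guarantee is measured is not stated here.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_pct: float, days: int) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * days * MINUTES_PER_DAY


# Assumed measurement windows: a 365-day year and a 30-day month.
yearly = allowed_downtime_minutes(99.98, 365)
monthly = allowed_downtime_minutes(99.98, 30)
print(f"{yearly:.1f} minutes per 365-day year")
print(f"{monthly:.1f} minutes per 30-day month")
```

Under those assumptions, 99.98% leaves roughly 105 minutes of downtime per year, or under nine minutes in a 30-day month.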