We’ve been using MediaTemple’s grid server for several months and until recently had been generally pleased with them. Our pages took ~0.5 to 1 second to load, which is not great, but not awful. Additionally, we hoped that by being on the grid server, we would be protected from spikes in traffic that could bring a single server down.

Last Friday load times started getting slower (~10 seconds), then over the weekend they got up to anywhere from 45 seconds to over a minute. I called MediaTemple and asked them what the issue was. The rep stated that we were probably just making too many database calls. I then directed him to a page we set up that had zero database calls. The rep responded with a panicked “OK we’ll call you back!” and hung up.

That was Monday morning. It’s now Tuesday night and they are still a mess. Our pages have been alternating between taking over a minute to load and serving up errors. That’s over 36 hours of a professional hosting service being totally jacked. Unbelievable. We’re now on M5 Hosting and won’t ever be on MediaTemple again.

If they truly have had a massive “spike in demand” recently, I wonder if some Facebook apps that were hosted on their grid took off over the weekend. These are the updates that they have been posting on their site:

Web and email latency on Grid-Service Cluster.2
Incident Tracker status: HIGH

Monitoring system updated, AccountCenter maintenance
Wednesday, October 3rd, 2007 at 1:26 pm
By internal metrics Grid Cluster.2 is performing much better than before this incident began. No latency issues were detected today; given that we doubled the amount of RAM and increased the number of servers used as cluster nodes by 25 percent, this was not unexpected. Our teams will continue to re-distribute storage load across the new resources to further reduce I/O related latency. (mt) Engineers have come up with new ways to measure Grid performance and will be adding them to our monitoring systems over the next few days, increasing the likelihood that we will detect symptoms before they affect customers. We have also changed our growth projection formulas so they will better predict when we need to add hardware to the clusters, avoiding issues like this in the future. In the next few days we will be scheduling a maintenance window for the AccountCenter so we can eliminate the main cause of slow page loads in the customer interface. We are leaving this incident open for the next 24 hours while we continue to work on improving performance and monitoring for Cluster.2.

Additional tuning
Tuesday, October 2nd, 2007 at 4:40 pm
After mitigating most of today’s latency issues, our engineering teams are continuing to work on tuning Cluster.2. Areas where we’ve made changes include major hardware additions, firewall rules, load balancers, networking, service configuration, storage tweaks and AccountCenter speed enhancements. The symptoms primarily manifested as latency issues in web page load times, SMTP and FTP. Unsatisfactory performance was also reported in MySQL enabled applications and AccountCenter management features. We consider the service level of Cluster.2 over the last two days to be unacceptable and are doing our utmost to correct the situation.

Performance Improvements
Tuesday, October 2nd, 2007 at 1:46 pm
After making several more tweaks to Grid Cluster.2, including firewall configuration changes and filesystem tuning, performance has improved dramatically. (mt) Engineers have seen vast improvement in basic PHP page load times compared to this morning. All other services including MySQL, SMTP and FTP should see corresponding latency decreases. All nodes have had RAM upgrades and are performing well; even so, the load across Cluster.2 is still higher than we’d like. Our teams are still hard at work on this issue, and we’ll keep updating our customers with our progress. Thank you for your patience.

More new hardware
Tuesday, October 2nd, 2007 at 11:55 am
In order to combat this lingering issue (mt) system engineers have doubled the amount of available RAM in every Grid Cluster.2 node. Combined with the 25% increase in total nodes we are seeing major performance gains for the cluster and latency times are plunging. We are still working furiously on this issue and will update this thread as soon as we have news.

Progress made, some latency returns
Tuesday, October 2nd, 2007 at 7:50 am
As of 6:30AM PDT we detected latency increasing across some nodes of Grid Cluster.2; services like FTP, SMTP and web pages (HTTP) are affected. We have engaged several teams of engineers, data center personnel and third party vendors to bring a resolution to these issues as soon as possible. Again, we thank you for your patience in this matter.

Latency times back to normal
Monday, October 1st, 2007 at 5:47 pm
(mt) Engineers made several changes throughout the day to improve performance of Grid Cluster.2. These changes include the addition of more available nodes, reconfiguring services and various networking tweaks. Our team is closely monitoring Cluster.2 and AccountCenter performance to ensure that the latency issues do not recur.

Continued work
Monday, October 1st, 2007 at 2:27 pm
We are still receiving reports of sporadic latency across Grid Cluster.2. Our engineers are currently working to eliminate any remaining issues that may be causing slow response times. We have also implemented several fixes to the AccountCenter that will help eliminate slow page loads. We will update this thread as soon as we have more information. Thank you for your patience in this matter.

Nodes added, services coming back online
Monday, October 1st, 2007 at 12:32 pm
(mt) Engineers have determined that the latency issue affecting Cluster.2 was due to unexpected growth causing a general lack of computational resources. To resolve the issue, (mt) data center personnel have added more machines to Cluster.2, increasing the number of available nodes by 25 percent. All services including FTP, Email and HTTP are coming back online. Recent demands for computational resources have jumped unexpectedly, which caused degraded performance of Cluster.2. Our engineers are re-evaluating the projected growth formula used to determine when Grid resources need to be added. We apologize for any inconvenience this may have caused.

Web and email latency on Grid-Service Cluster 2
Monday, October 1st, 2007 at 9:48 am
Some customers on Grid-Service Cluster.2 may be experiencing latency to web and email. There may also be some latency for all customers accessing the Account Center. (mt) Media Temple’s Systems Engineers are currently investigating and working to resolve the issue as quickly as possible. Thank you for your patience and understanding.


  1. Jason Mcvearry

    Rob,

    Cheers, and many apologies for any inconvenience. I would elaborate on the issues, but we’ve pretty much documented them as openly as possible on our site.

    More or less this was a complete anomaly, and clearly this level of performance is not up to our standards. Feel free to contact me with your account info and any other questions you may have.

    Best,
    Jason Mcvearry
    jason@mediatemple.net

  2. Jason Mcvearry

    On another note, several clients on the Grid have frequently been dugg and linked on Reddit at the same time and their sites did not go down (or hiccup). The Grid works, but as with any new technology, upgrades and unusual usage patterns can create issues. Thankfully we’ve got a staff of engineers unafraid to miss a few days of sleep.

    Cheers again,
    Jason

  3. michaelt

    You should use a VPS, just like dedicated but cheaper and more flexible. Try Slicehost.

  4. Adam

    Hmm, yes, I called them and they told me they would solve it in one hour. Then one day passed...

  5. Don

    The “maintenance” was supposed to be 1.5 hours, then they changed it to 6, and now “they don’t know.” I am seriously pissed at these guys. I wasted over a hundred dollars in PPC clicks going to a site that isn’t there.

    This massive outage is unacceptable. I am definitely moving my servers elsewhere!

  6. ATP

    This was really a deplorable performance by mediatemple. They already have really bad customer service, and this brings their technical competence into question too:
    http://pakistaniat.com/2007/12/01/atps-disappearance-no-we-were-not-blocked-or-hacked-not-yet/

  7. patrick giagnocavo

    I suppose this is a no-brainer, obvious point, but, if you are running a web site you care about, you should have a way to monitor it remotely and have that remote site send you email when there is a problem.

    E.g. send email if site is down, or if page load for a set page takes longer than say 2 seconds.

    What is great is that as you add more tests and variables to check (I use argus.tcp4me.com as the code that monitors various sites) you can quickly use it as a “dashboard” to figure out what might be the issue.

  8. Jim Goings

    Issues with Media Temple’s grid server are still very much ongoing:
    http://www.jimgoings.com/2008/04/media-temple-kills-my-inner-child/

  9. Jim Goings

    I’ve been with Media Temple on the grid service for about 8 months and it’s been fairly unstable the entire time. My site went down 3 different times on Saturday for example. I run an external monitor to ensure availability and unfortunately, I’m only seeing about 98% uptime right now.

    The worst part is that for about 10% of the time, the site loads very slowly. I wrote more with some details on my blog:
    http://www.jimgoings.com/2008/04/media-temple-kills-my-inner-child/
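
The remote check patrick giagnocavo describes in comment 7 (send email if the site is down, or if a set page takes longer than about 2 seconds to load) might look roughly like the sketch below. It is an illustration only, not the Argus tool he mentions: the function names, the example.com URL and addresses, and the local SMTP relay are all assumptions, and the script would need to run from a machine outside the host being monitored.

```python
import http.client
import time
import urllib.error
import urllib.request
from email.message import EmailMessage

# Threshold taken from the comment ("longer than say 2 seconds").
SLOW_THRESHOLD_SECONDS = 2.0


def classify(status, elapsed, threshold=SLOW_THRESHOLD_SECONDS):
    """Return 'down', 'slow', or 'ok' for a single check result.

    status is the HTTP status code, or None if the request failed outright.
    """
    if status is None or status >= 400:
        return "down"
    if elapsed > threshold:
        return "slow"
    return "ok"


def check_site(url, timeout=30.0):
    """Fetch url once and return (status, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the timing
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        status = None  # DNS failure, refused connection, timeout, etc.
    return status, time.monotonic() - start


def build_alert(url, state, elapsed, sender, recipient):
    """Build the alert email for a failed or slow check."""
    msg = EmailMessage()
    msg["Subject"] = f"[monitor] {url} is {state}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Check of {url} finished in {elapsed:.1f}s with state: {state}")
    return msg


if __name__ == "__main__":
    import smtplib

    # Hypothetical values: replace with a real page and real addresses.
    url = "http://example.com/"
    status, elapsed = check_site(url)
    state = classify(status, elapsed)
    if state != "ok":
        alert = build_alert(url, state, elapsed, "monitor@example.com", "you@example.com")
        # Assumes an SMTP relay is listening on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(alert)
```

Run from cron every few minutes, something like this would have flagged the 10-second page loads described in the post well before they reached a minute. Pointing it at a page with zero database calls, as in the post, also helps separate host latency from application latency.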

  1. Mediatemple Outages Continue : The Last Podcast

    [...] Looks like I am not the only one pondering a switch away from MediaTemple. Technorati tags: mediatemple — Related Posts [...]

  2. Coming Soon: Media Temple Cluster Server (CS) : Daily Hypertext

    [...] with that innovation came some hiccups. Intermittently, over the past year, the grid simply hasn’t been able to handle the loads placed upon it by all the new users. The major problems have been latency with database calls and page load times [...]

  3. My picks: The best service providers for startups « RobWebb2k

    [...] Hosting Service: M5 Hosting Previous mention here. These guys were referred to us by a friend and they have done a great job so far. Stay the hell [...]


