We’ve been using MediaTemple’s grid server for several months and, until recently, had been generally pleased with it. Our pages took ~0.5 to 1 second to load, which is not great, but not awful. Additionally, we hoped that being on the grid server would protect us from spikes in traffic that could bring a single server down. Last Friday, load times started getting slower (~10 seconds), then over the weekend they reached anywhere from 45 seconds to over a minute. I called MediaTemple and asked them what the issue was. The rep stated that we were probably just making too many database calls. I then directed him to a page we set up that had zero database calls. The rep responded with a panicked “OK we’ll call you back!” and hung up. This was Monday morning. It’s now Tuesday night and they are still a mess. Our pages have been alternating between taking over a minute to load and serving up errors. That’s over 36 hours of a professional hosting service being totally jacked. Unbelievable. We’re now on M5 Hosting and won’t ever be on MediaTemple again.

If they truly have had a massive “spike in demand” recently, I wonder if some Facebook apps that were hosted on their grid took off over the weekend. These are the updates that they have been posting on their site:

Web and email latency on Grid-Service Cluster.2
Incident Tracker status: HIGH

Monitoring system updated, AccountCenter maintenance
Wednesday, October 3rd, 2007 at 1:26 pm
By internal metrics, Grid Cluster.2 is performing much better than before this incident began. No latency issues were detected today; given that we doubled the amount of RAM and increased the number of servers used as cluster nodes by 25 percent, this was not unexpected. Our teams will continue to re-distribute storage load across the new resources to further reduce I/O related latency. (mt) Engineers have come up with new ways to measure Grid performance and will be adding them to our monitoring systems over the next few days, increasing the likelihood that we will detect symptoms before they affect customers. We have also changed our growth projection formulas so they will better predict when we need to add hardware to the clusters, avoiding issues like this in the future. In the next few days we will be scheduling a maintenance window for the AccountCenter so we can eliminate the main cause of slow page loads in the customer interface. We are leaving this incident open for the next 24 hours while we continue to work on improving performance and monitoring for Cluster.2.

 

Additional tuning
Tuesday, October 2nd, 2007 at 4:40 pm
After mitigating most of today’s latency issues, our engineering teams are continuing to work on tuning Cluster.2. Areas where we’ve made changes include major hardware additions, firewall rules, load balancers, networking, service configuration, storage tweaks and AccountCenter speed enhancements. The symptoms primarily manifested as latency issues in web page load times, SMTP and FTP. Unsatisfactory performance was also reported in MySQL enabled applications and AccountCenter management features. We consider the service level of Cluster.2 over the last two days to be unacceptable and are doing our utmost to correct the situation.

Performance Improvements
Tuesday, October 2nd, 2007 at 1:46 pm
After making several more tweaks to Grid Cluster.2, including firewall configuration changes and filesystem tuning, performance has improved dramatically. (mt) Engineers have seen vast improvement in basic PHP page load times compared to this morning. All other services including MySQL, SMTP and FTP should see corresponding latency decreases. All nodes have had RAM upgrades and are performing well; even so, the load across Cluster.2 is still higher than we’d like. Our teams are still hard at work on this issue, and we’ll keep updating our customers with our progress. Thank you for your patience.

More new hardware
Tuesday, October 2nd, 2007 at 11:55 am
In order to combat this lingering issue, (mt) system engineers have doubled the amount of available RAM in every Grid Cluster.2 node. Combined with the 25% increase in total nodes, we are seeing major performance gains for the cluster and latency times are plunging. We are still working furiously on this issue and will update this thread as soon as we have news.

Progress made, some latency returns
Tuesday, October 2nd, 2007 at 7:50 am
As of 6:30AM PDT we detected latency increasing across some nodes of Grid Cluster.2. Services like FTP, SMTP and web pages (HTTP) are affected. We have engaged several teams of engineers, data center personnel and third party vendors to bring a resolution to these issues as soon as possible. Again, we thank you for your patience in this matter.

Latency times back to normal
Monday, October 1st, 2007 at 5:47 pm
(mt) Engineers made several changes throughout the day to improve performance of Grid Cluster.2. These changes include the addition of more available nodes, reconfiguring services and various networking tweaks. Our team is closely monitoring Cluster.2 and AccountCenter performance to ensure that the latency issues do not recur.

Continued work
Monday, October 1st, 2007 at 2:27 pm
We are still receiving reports of sporadic latency across Grid Cluster.2. Our engineers are currently working to eliminate any remaining issues that may be causing slow response times. We have also implemented several fixes to the AccountCenter that will help eliminate slow page loads. We will update this thread as soon as we have more information. Thank you for your patience in this matter.

Nodes added, services coming back online
Monday, October 1st, 2007 at 12:32 pm
(mt) Engineers have determined that the latency issue affecting Cluster.2 was due to unexpected growth causing a general lack of computational resources. To resolve the issue, (mt) data center personnel have added more machines to Cluster.2, increasing the number of available nodes by 25 percent. All services including FTP, Email and HTTP are coming back online. Recent demands for computational resources have jumped unexpectedly, which caused degraded performance of Cluster.2. Our engineers are re-evaluating the projected growth formula used to determine when Grid resources need to be added. We apologize for any inconvenience this may have caused.

Web and email latency on Grid-Service Cluster 2
Monday, October 1st, 2007 at 9:48 am
Some customers on Grid-Service Cluster.2 may be experiencing latency to web and email. There may also be some latency for all customers accessing the Account Center. (mt) Media Temple’s Systems Engineers are currently investigating and working to resolve the issue as quickly as possible. Thank you for your patience and understanding.


  1. Rob,

    Cheers, and many apologies for any inconvenience. I would elaborate on the issues, but we’ve pretty much documented them as openly as possible on our site.

    More or less this was a complete anomaly and clearly this level of performance is not up to our standards. Feel free to contact me with your account info and any other questions you may have.

    Best,
    Jason Mcvearry
    jason@mediatemple.net

  2. On another note... several clients on the Grid have frequently been dugg and linked on Reddit at the same time and their sites did not go down (or hiccup). The Grid works, but as with any new technology, upgrades and unusual usage patterns can create issues. Thankfully we’ve got a staff of engineers unafraid to miss a few days of sleep.

    Cheers again,
    Jason

  3. michaelt

    You should use a VPS, just like dedicated but cheaper and more flexible. Try Slicehost.

  4. Hmm, yes, I called them and they told me they would solve it in one hour. Then one day passed...

  5. Don

    The “maintenance” was supposed to be 1.5 hours, then they changed it to 6 and now “they don’t know.” I am seriously pissed at these guys. I wasted over a hundred dollars in PPC clicks going to a site that isn’t there.

    This massive outage is unacceptable. I am definitely moving my servers elsewhere!

  6. ATP

    This was really a deplorable performance by mediatemple. They already have really bad customer service, and this brings into question their technical competence too:
    http://pakistaniat.com/2007/12/01/atps-disappearance-no-we-were-not-blocked-or-hacked-not-yet/

  7. I suppose this is a no-brainer, obvious point, but, if you are running a web site you care about, you should have a way to monitor it remotely and have that remote site send you email when there is a problem.

    E.g. send email if site is down, or if page load for a set page takes longer than say 2 seconds.

    What is great is that as you add more tests and variables to check (I use argus.tcp4me.com as the code that monitors various sites) you can quickly use it as a “dashboard” to figure out what might be the issue.

  8. Issues with Media Temple’s grid server are still very much ongoing:
    http://www.jimgoings.com/2008/04/media-temple-kills-my-inner-child/

  9. I’ve been with Media Temple on the grid service for about 8 months and it’s been fairly unstable the entire time. My site went down 3 different times on Saturday for example. I run an external monitor to ensure availability and unfortunately, I’m only seeing about 98% uptime right now.

    The worst part is that for about 10% of the time, the site loads very slowly. I wrote more with some details on my blog:
    http://www.jimgoings.com/2008/04/media-temple-kills-my-inner-child/

  10. futunet

    Well here we are in May 2009, and the Grid server at MediaTemple has been down for hours tonight. Thousands of sites are down.

    Congratulations on your Epic FAIL (mt)

    How many customers can you lose in one day? You’re about to find out.

  11. my website which is hosted at media temple is now down for the second day!!!!!!

    this is the worst down time i have ever seen since i ever bought a computer or heard about the internet!!!
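
The remote monitoring suggested in comment 7 above (send an email if the site is down, or if a set page takes longer than about 2 seconds to load) could be sketched roughly as follows. This is only a minimal illustration, not the code any commenter actually runs: the function names `classify`, `check_page` and `alert_message`, the 10-second timeout, and the message wording are all assumptions made for the example. Delivering the email is left out; Python’s standard `smtplib` module is one way to do that.

```python
import http.client
import time
import urllib.request

# Hypothetical threshold, taken from the "say 2 seconds" figure in the comment.
SLOW_THRESHOLD_SECONDS = 2.0


def classify(ok, elapsed_seconds, threshold=SLOW_THRESHOLD_SECONDS):
    """Turn one check result into 'down', 'slow', or 'ok'."""
    if not ok:
        return "down"
    if elapsed_seconds > threshold:
        return "slow"
    return "ok"


def check_page(url, timeout=10.0):
    """Fetch the page once and return (ok, elapsed_seconds).

    Any network failure, timeout, or HTTP error response counts as not ok.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body download in the timing
            ok = 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def alert_message(url, status, elapsed_seconds):
    """Build the alert text, or return None when no alert is needed."""
    if status == "ok":
        return None
    return f"{url} is {status} (check took {elapsed_seconds:.1f}s)"
```

Run from a machine outside the hosting provider (for example on a schedule every few minutes), the pieces would be chained as `ok, elapsed = check_page(url)`, then `classify(ok, elapsed)`, then `alert_message(...)`, emailing whatever non-empty message comes back. Keeping the classification separate from the network call makes it easy to add more checks later, in the spirit of the “dashboard” idea in the same comment.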

  1. Mediatemple Outages Continue : The Last Podcast

    [...] Looks like I am not the only one pondering a switch away from MediaTemple. Technorati tags: mediatemple — Related Posts [...]

  2. Coming Soon: Media Temple Cluster Server (CS) : Daily Hypertext

    [...] with that innovation came some hiccups. Intermittently, over the past year, the grid simply hasn’t been able to handle the loads placed upon it by all the new users. The major problems have been latency with database calls and page load times [...]

  3. My picks: The best service providers for startups « RobWebb2k

    [...] Hosting Service: M5 Hosting Previous mention here. These guys were referred to us by a friend and they have done a great job so far. Stay the hell [...]



