At Seagate, we think our modeling and analytics efforts should help our customers to become more successful, competitive and profitable.
Computer models use mathematics to represent the essential aspects of a simplified version of reality (a product, process, phenomenon, object, element, system, etc.). These models typically offer convenience and cost advantages over obtaining the same information experimentally. However, the act of "simplifying reality" usually comes at a price: some loss of information, imperfect correlation with experiment, limited predictive accuracy, and, in the worst cases, an outright wrong answer. This is why computer models must be validated by experiments.
Experiments, in turn, have limitations of their own. They are time consuming, costly, and need to be repeated every time something changes. And when the subject is a storage rack, a cluster, or an entire data center, each experiment can cost tens to hundreds of thousands of dollars. This is where the main value of our work lies.
We, the cloud modeling and data analytics group, believe in theory that is underpinned by good experiment.
The benefits of computer modeling are clear and widely recognized. Some products and technologies are simply impossible without it. For example, the accurate weather predictions we all depend on rely heavily on (super)computer simulations. Air traffic around the world is under computer control around the clock and would be financially unsustainable without mathematical optimization. Children's toys, our cars, the medicine we take, and even our behavior on social networks are all shaped by computer modeling. Even in the world of computer data storage (disk drives), the development and testing of, for example, slider air bearings would be a much longer, less efficient, and less accurate process without advanced computer models.
So computer modeling is cheaper, faster, safer, and more flexible, and it lets us address scenarios that are sometimes beyond what is experimentally possible (the Big Bang comes to mind here, and we don't mean the TV show). One would therefore expect an important and fast-growing area such as cloud storage (and cloud computing) to rely heavily on modeling. However, we at Seagate have noticed that this is not necessarily the case. After interviewing many IT and cloud architects, we concluded that there is a significant gap in the modeling of many storage-related components of a cloud data center: many things are still done using tribal knowledge, rules of thumb, and back-of-the-envelope calculations.
Our Cloud Modeling and Data Analytics (CMDA) organization has been created to address this gap. Our main goals are to become thought leaders in the field of cloud modeling and to create tools that simplify and accelerate data center design, prototyping, and (virtual) testing; help data center and IT architects and professionals answer their daily tradeoff questions; and help Seagate optimize storage products for modern data centers.
The CMDA organization currently has three major focus areas (and corresponding groups):
Cloud Modeling
Cloud Health Analytics
Big Data Analytics
The Cloud Modeling team focuses on providing tools for rapid prototyping, modeling, and optimization of cloud data centers, with an emphasis on data storage solutions. Our efforts are directed at optimizing metrics in four major areas across all scales: total cost of ownership (TCO), performance, power, and reliability/availability. In addition, efforts are under way within the group to calibrate our models against real-world benchmark data and to establish representative benchmark standards. Data center architects and managers should be able to use our toolkit to rapidly make design and business decisions based on state-of-the-art models and thus ensure optimal design and return on investment.
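To make the flavor of such tradeoff questions concrete, here is a deliberately simplified back-of-the-envelope sketch, with entirely invented numbers, of comparing the multi-year TCO of two hypothetical rack configurations. Real models in this space account for much more (failure and rebuild rates, cooling overheads, utilization, and so on); this is only an illustration of the kind of question the toolkit addresses.

```python
# Toy TCO comparison. All prices, power figures, and service costs
# below are invented for illustration; they are not Seagate data.
def tco(capex, power_watts, kwh_price=0.10, years=5, annual_service=0.0):
    """Capital cost plus energy plus service cost over the service life."""
    energy_cost = power_watts / 1000 * 24 * 365 * years * kwh_price
    return capex + energy_cost + annual_service * years

# Two hypothetical racks: cheaper hardware vs. lower power draw.
hdd_rack = tco(capex=50_000, power_watts=4_000, annual_service=1_000)
ssd_rack = tco(capex=120_000, power_watts=1_500, annual_service=500)

print(f"Rack A 5-yr TCO: ${hdd_rack:,.0f}")  # $72,520
print(f"Rack B 5-yr TCO: ${ssd_rack:,.0f}")  # $129,070
```

Even this crude model shows how the answer flips as electricity prices, service life, or service costs change, which is exactly why rules of thumb are not enough.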
The Cloud Health Analytics team specializes in drive-and-up monitoring and collection of data center vital metrics and symptoms, with the ultimate goal of diagnosing, preventing, or reducing the causes of cloud data center downtime and lowering the total cost of data center ownership. Our solutions are based on a highly scalable data collection system combined with an automated analytics engine that uses state-of-the-art big data techniques and machine learning. With this technology at hand, we are able to remotely monitor and report the health status of a large data center and take action toward higher data reliability and lower TCO.
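To illustrate the simplest kind of symptom detection such a pipeline might perform (this sketch is ours for the blog; the metric values and thresholds are invented), here is a rolling-baseline check that flags a sudden jump in a per-drive counter:

```python
# Hypothetical example: flag readings that jump far above their recent
# baseline, the kind of symptom a health-analytics engine might surface.
from statistics import mean, stdev

def flag_anomalies(readings, window=5, n_sigmas=3.0):
    """Return indices where a reading exceeds the rolling mean of the
    previous `window` readings by more than n_sigmas standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and readings[i] > mu + n_sigmas * sigma:
            flagged.append(i)
    return flagged

# Simulated daily reallocated-sector counts for one drive:
counts = [4, 5, 4, 6, 5, 5, 6, 5, 40, 55]
print(flag_anomalies(counts))  # flags index 8, the jump to 40
```

A production system replaces this single-metric threshold with models trained across fleets of drives, but the goal is the same: catch the symptom before it becomes downtime.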
The Big Data Analytics team focuses on the all-important area of complex analytics, ranging from traditional analysis of large datasets to machine learning. They deal with very large, frequently noisy, and often unstructured data sets in order to extract the underlying mechanisms. Typically, these are problems of high dimensionality (many independent variables) that cannot be addressed by traditional techniques such as linear regression or residual analysis. This is where machine learning techniques, such as support vector machines (SVMs) or random forests, come in handy. A modern cloud data center consists of thousands of components, involves dozens of variables affecting its efficiency, cost, performance, and reliability, and requires new, advanced analytics techniques.
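To give a feel for the ensemble idea behind random forests, here is a miniature, stdlib-only sketch: bootstrap-sampled decision stumps voting on a synthetic two-class dataset. The data and the stump-only trees are invented for illustration; real forests use full decision trees, many features, and library implementations.

```python
# Miniature "random forest": an ensemble of decision stumps, each fit
# on a bootstrap sample, combined by majority vote. Illustration only.
import random
from collections import Counter

random.seed(42)

def make_dataset(n=200):
    """Synthetic 2-feature data; the true rule is: class 1 iff x[0] > 0.5."""
    data = []
    for _ in range(n):
        x = [random.random(), random.random()]
        data.append((x, 1 if x[0] > 0.5 else 0))
    return data

def fit_stump(sample):
    """Pick the (feature, threshold) pair with the fewest errors on the sample."""
    best = None
    for f in (0, 1):
        for x, _ in sample:
            t = x[f]
            errs = sum(1 for xi, yi in sample if (1 if xi[f] > t else 0) != yi)
            if best is None or errs < best[0]:
                best = (errs, f, t)
    return best[1], best[2]

def fit_forest(data, n_trees=15):
    """Fit each stump on a bootstrap resample of the training data."""
    return [fit_stump([random.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    votes = Counter(1 if x[f] > t else 0 for f, t in forest)
    return votes.most_common(1)[0][0]

train, test = make_dataset(), make_dataset(100)
forest = fit_forest(train)
accuracy = sum(predict(forest, x) == y for x, y in test) / len(test)
print(f"ensemble accuracy: {accuracy:.2f}")
```

Each stump is a weak learner, but the bootstrap-and-vote combination recovers the true decision boundary well, which is the core intuition behind why random forests handle noisy, high-dimensional data.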
All of the above modeling and analytics efforts should help our customers become more successful, competitive, and profitable.
Let us always remember that, in order to be truly successful, good models must be matched with good experiments. Quoting Richard Feynman: "It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong."
Let us finish this blog entry with a quip attributed to Einstein: "A theory is something nobody believes, except the person who made it. An experiment is something everybody believes, except the person who made it."