IT organizations receive scores of reports about system responsiveness: the PC is slow, frozen, or has crashed. We spend significant time trying to understand the causes. But given the myriad symptoms these issues create, along with competing priorities, we often resort to rebuilding a PC rather than spending days identifying its problem. This solution is only marginally effective because every PC and configuration is unique. And if the problem lies in the settings, applications, or configuration, the rebuilt PC may exhibit the same problem.
Understanding the root cause of a single performance issue can help IT organizations solve and prevent thousands of other issues experienced by users. For example, the same cause of system slowness may present itself as blurry text on one PC, yet appear as an entirely different display problem on another. Manual root-cause analysis can send analysts down a variety of paths while the user waits. And identifying the root cause across the organization’s entire fleet can mean analyzing massive amounts of data, which can be cost-prohibitive.
Targeted Data Collection
At Intel IT, we developed an automated solution to help identify system responsiveness issues using a subset of the organization’s fleet to reduce the volume of data analyzed and to better identify trends. We developed a sample environment that collects data from devices participating in pilot projects, small groups of users who have reported responsiveness issues, and profiles from the broader environment with statistically relevant configurations.
This dynamic collection mechanism is populated both manually and automatically. It flags and collects data, which establishes a baseline for each device. We then aggregate the data and eliminate anything that is unrelated to responsiveness. The remaining data is parsed by time and device, and summarized for trends that tell us when the system was slow or unresponsive and what was running at the time.
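As a rough illustration of the filter, parse, and summarize steps described above, the following minimal sketch groups responsiveness events by device and hour and records what was running at the time. The event fields (device, timestamp, kind, processes), the set of responsiveness categories, and the function name are all hypothetical, chosen only for this example; they do not describe the actual Intel IT implementation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical categories treated as responsiveness-related (an assumption).
RESPONSIVENESS_KINDS = {"hang", "crash", "slow_boot", "high_cpu"}


def summarize_slow_periods(events):
    """Group responsiveness events by (device, hour) and list running processes."""
    summary = defaultdict(lambda: {"count": 0, "processes": set()})
    for event in events:
        # Eliminate anything unrelated to responsiveness.
        if event["kind"] not in RESPONSIVENESS_KINDS:
            continue
        # Parse by time and device: bucket each event into its hour.
        ts = datetime.fromisoformat(event["timestamp"])
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = summary[(event["device"], hour)]
        bucket["count"] += 1
        bucket["processes"].update(event["processes"])
    return dict(summary)


if __name__ == "__main__":
    sample_events = [
        {"device": "pc-01", "timestamp": "2024-01-05T09:15:00",
         "kind": "hang", "processes": ["a.exe", "b.exe"]},
        {"device": "pc-01", "timestamp": "2024-01-05T09:45:00",
         "kind": "hang", "processes": ["b.exe"]},
        {"device": "pc-01", "timestamp": "2024-01-05T09:50:00",
         "kind": "login", "processes": ["c.exe"]},
    ]
    for (device, hour), info in summarize_slow_periods(sample_events).items():
        print(device, hour.isoformat(), info["count"], sorted(info["processes"]))
```

Bucketing by hour is an arbitrary choice for the sketch; a real pipeline would likely pick a window that matches how often its collectors sample.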
Within one month of launching our automated solution, we identified two different problems that each caused one system crash per PC per month across the fleet. We already knew about the crashes because users were reporting them, but we did not know the exact source until we developed the automated solution.
The automated solution also helps us identify the cause of common faults and errors in the environment because they are often associated with responsiveness issues. When we identify and resolve one issue by joining it with performance data, other, often more obscure, symptoms are resolved as well.
This process has allowed us to identify potential performance issues with new software deployments, as well as identify existing problems in the environment and correlate the statistical impact across the fleet. In one pilot test, we recently discovered that the solution we were testing became the top memory-consuming application, reducing the available memory on each PC by 20 percent. Through our process, we were also able to provide detailed data to the solution provider for remediation.
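To make the pilot finding above concrete, here is a minimal sketch of how per-application memory samples might be ranked and how a drop in available memory could be expressed as a percentage. The sample fields (app, memory_mb), the function names, and the numbers in the usage example are illustrative assumptions, not data or code from our environment.

```python
from collections import defaultdict
from statistics import mean


def top_memory_consumers(samples, limit=3):
    """Rank applications by mean memory use (MB) across all sampled devices."""
    by_app = defaultdict(list)
    for sample in samples:
        by_app[sample["app"]].append(sample["memory_mb"])
    ranked = sorted(by_app.items(), key=lambda item: mean(item[1]), reverse=True)
    return [(app, mean(values)) for app, values in ranked[:limit]]


def available_memory_drop(before_mb, after_mb):
    """Percentage drop in available memory from a baseline to a pilot reading."""
    return 100.0 * (before_mb - after_mb) / before_mb


if __name__ == "__main__":
    # Made-up readings from two hypothetical devices.
    samples = [
        {"device": "pc-01", "app": "pilot_agent", "memory_mb": 1200},
        {"device": "pc-02", "app": "pilot_agent", "memory_mb": 1000},
        {"device": "pc-01", "app": "browser", "memory_mb": 900},
        {"device": "pc-02", "app": "mail", "memory_mb": 300},
    ]
    print(top_memory_consumers(samples, limit=2))
    print(available_memory_drop(8000, 6400))
```

Comparing against a per-device baseline, rather than a fleet-wide average, keeps devices with different amounts of installed memory from skewing the percentage.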
While the solution is specific to our PC fleet, we have shared it with other Intel organizations because its scalable infrastructure can be adapted for multiple uses. We have found the solution to be extremely valuable, even as we continue to evolve and improve it.
Read the IT@Intel White Paper, “Boost PC Health and Performance with Sustained, Automated Processes.”