Data-Mining Research May Enhance Flight Safety
UT Dallas and Illinois Team Analyzing Decades of Aviation Information
June 9, 2008
Computer scientists at UT Dallas are developing technology that will sift through mountains of aviation data in search of ways to further enhance flight safety.
Part of a new three-year, $1 million NASA-funded project conducted in collaboration with researchers at the University of Illinois at Urbana-Champaign (UIUC), the work focuses on more than three decades of what are called “anomalous aviation events,” or incidents that deviated from normal flight operations.
Using data-mining techniques that are increasingly popular in searching for kernels of relevant information within enormous amounts of data — crime statistics or genomics data, for instance — researchers hope to identify subtle patterns of aviation events that could predict catastrophe.
“It’s essential to automatically mine large amounts of aviation safety reports in order to understand anomalous events and improve safety,” said Dr. Latifur Khan, an associate professor of computer science and principal investigator for the UT Dallas portion of the research.
“With the rapid growth of the aviation industry worldwide and the increasing power and complexity of aircraft, spacecraft and other aviation systems, data related to aviation safety is growing rapidly, making large-scale manual exploration unrealistic,” he said. “It’s crucial to develop data-mining systems to analyze these aviation reports to help prevent future incidents.”
That’s not easily done, though. Anomalous events vary widely (airspace violations, in-flight encounters with birds and miscommunication between pilots and air traffic controllers, to name a few), the aviation systems involved are complex, and the incidents are reported by many different people throughout the aviation industry. The software required to accomplish the task must be sophisticated enough to discover patterns, correlations and trends within the jumble of information.
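As a rough illustration of what discovering a pattern can mean at its simplest, the sketch below counts how often two kinds of anomaly are reported together. It is not the researchers' software: the function name, the label strings and the sample reports are all invented for this example, and real incident reports would first need their anomalies extracted from free text.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reports):
    """Count how many reports mention each pair of anomaly labels together.

    `reports` is a list of label lists, one list per incident report.
    This input shape is an assumption made for the sketch.
    """
    counts = Counter()
    for labels in reports:
        # Deduplicate and sort so each pair is counted once per report
        # and always appears in the same order.
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data, not drawn from any real database.
sample_reports = [
    ["airspace violation", "miscommunication"],
    ["bird encounter"],
    ["miscommunication", "airspace violation"],
    ["bird encounter", "miscommunication"],
]

pair_counts = cooccurrence_counts(sample_reports)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs that turn up together far more often than chance would suggest are candidates for closer study; judging which of them matter is the harder part that simple counting does not address.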
The primary source for that information is the industry’s Aviation Safety Reporting System (ASRS) database, the world’s largest repository of safety information provided by pilots, controllers, mechanics, flight attendants, dispatchers and others. The database includes almost 150,000 incident reports submitted over more than 30 years.
“The ASRS database is a rich source of information about why events or trends occurred in the opinions of those reporting them,” Dr. Khan said.
UT Dallas researchers will be addressing two primary questions involving this aviation data: What anomalies are associated with a given aviation event, and why did the event occur? These issues are complicated, though, by the “noise” inherent in the data, which stems from typos and grammatical mistakes as well as the use of abbreviations and jargon not widely used outside the aviation industry.
“Non-aviation experts will find it difficult to understand these reports,” said Dr. Vincent Ng, an assistant professor of computer science at UT Dallas and a co-principal investigator for the project. “Traditional text-mining techniques are unlikely to work well on this data because those techniques were developed primarily for use with clean text. So our biggest research challenge lies in developing techniques that can handle highly noisy data.”
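To make the noise problem concrete, here is a minimal sketch of one possible cleanup step: expanding abbreviations and correcting near-miss spellings before any mining takes place. It is an assumption-laden illustration rather than the team's technique. The abbreviation table, the vocabulary list and the 0.8 similarity cutoff are hypothetical, and the spelling repair relies on Python's standard difflib module.

```python
import difflib
import re

# Hypothetical abbreviation table, invented for this sketch.
ABBREVIATIONS = {
    "acft": "aircraft",
    "rwy": "runway",
    "twr": "tower",
    "alt": "altitude",
}

# Hypothetical vocabulary of known-good words used to repair typos.
VOCABULARY = [
    "aircraft", "runway", "tower", "altitude",
    "cleared", "landing", "for", "on", "the",
]


def normalize_token(token):
    """Expand a known abbreviation or snap a typo to a close vocabulary word."""
    lowered = token.lower()
    if lowered.isdigit():
        return lowered  # leave numbers such as runway designators alone
    if lowered in ABBREVIATIONS:
        return ABBREVIATIONS[lowered]
    if lowered in VOCABULARY:
        return lowered
    # The cutoff of 0.8 is an arbitrary choice for this example.
    matches = difflib.get_close_matches(lowered, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else lowered


def normalize_report(text):
    """Split a report into word and number tokens and normalize each one."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    return " ".join(normalize_token(token) for token in tokens)


print(normalize_report("ACFT clered for landng on RWY 27"))
```

With these tables, the sample line comes out as "aircraft cleared for landing on runway 27". A fixed lookup table only covers the abbreviations someone thought to list, which is one reason noisy text of this kind calls for more robust techniques.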
Dr. Khan’s team also includes Dr. Bhavani Thuraisingham, a professor of computer science and director of the UT Dallas Cyber Security Research Center, as well as two Ph.D. students. The UIUC and UT Dallas team is also collaborating with Dr. Anne Kao, a technical fellow at the Boeing Co.
“At UTD we are focusing on developing novel techniques in knowledge discovery and data mining to solve a variety of practical problems in areas such as cyber security, surveillance and data quality,” Dr. Thuraisingham said. “We’re working closely with government, industry, standards groups and others in academia to solve difficult and challenging problems.”