Research in Data Mining Lab

New Research Project (2011 - Present)

SOMUT: Semi-automatic Outbreak Monitoring Using Twitter

It has been shown that the frequency of tweets containing illness-related keywords, tracked over time, has a correlation of 0.98 with the data released by the Centers for Disease Control and Prevention (CDC). Twitter-based data therefore allows an outbreak to be detected one to two weeks ahead of the CDC. However, the collected data inevitably contains outliers, or false alarms; detecting these early increases the user's confidence in the system. This project proposes a methodology that uses summaries of tweet sentences, together with review by an expert in the medical domain, to distinguish real outbreaks from false alarms, ultimately benefiting disease surveillance systems.
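The keyword-frequency signal described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the keyword list, the weekly bucketing, the whitespace tokenization, and the spike threshold (mean plus two standard deviations) are all assumptions. It counts tweets that mention an illness-related keyword in each week, computes the Pearson correlation of those counts with a reference series such as CDC figures, and flags unusually high weeks as candidates for expert review.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical keyword list; the project's actual keywords are not specified.
ILLNESS_KEYWORDS = {"flu", "fever", "cough", "influenza"}


def weekly_keyword_counts(tweets):
    """Count tweets mentioning any illness keyword, bucketed by ISO (year, week).

    `tweets` is an iterable of (date, text) pairs. Tokenization is a naive
    lowercase whitespace split, so "flu," (with punctuation) will not match.
    """
    counts = Counter()
    for day, text in tweets:
        words = set(text.lower().split())
        if words & ILLNESS_KEYWORDS:  # each tweet counts at most once
            year, week, _ = day.isocalendar()
            counts[(year, week)] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant series


def flag_spikes(values, k=2.0):
    """Return indices whose value exceeds mean + k * (population) std dev.

    Flagged weeks are only candidates: in the methodology above, a medical
    expert decides whether a spike is an outbreak or a false alarm.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [i for i, v in enumerate(values) if v > mean + k * std]
```

In use, the weekly counts would be aligned with the reference series week by week before calling `pearson`, and `flag_spikes` would be run on the tweet counts alone.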

Funded Research

Knowledge Discovery in Large VE: A New Approach for Solving Wayfinding Problem

Funded by NSF (09/2003 – 09/2005)

PI: Mehmed Kantardzic

Project Summary

Users of large and highly complex virtual environments (VEs) often experience difficulty in maintaining their spatial orientation within the virtual world. VEs do not in general provide the same rich set of cues for distance, motion, and direction found in the physical environment. The research carried out in this project addresses the problem of handling spatial information in large-scale virtual environments using knowledge discovery techniques. Knowledge extracted and new patterns discovered by mining both the VE database and user navigation patterns play an important role in understanding spatial information and in capturing intrinsic relationships between spatial and non-spatial information. Reorganizing spatial information to accommodate data semantics and to achieve high performance of user navigation in the VE is part of a methodology for a new, user-friendly VE interface.

Toward Autonomic Distributed Data Mining with Intelligent Web Services

Funded by KSEF (07/2003 – 06/2005)

PI: Mehmed Kantardzic

Project Summary

Currently there is no structured, user-oriented framework that performs web-based distributed data mining. Because existing data mining schemes are complex, successful data mining practices cannot necessarily be repeated across an application domain, especially as data size increases or when data and software tools are spread across more than one heterogeneous site. This additional dimension of distributed data mining significantly increases the complexity of each phase of a data mining process. Therefore, data mining requires a structured framework that helps users translate an application domain problem into a set of data mining tasks at a higher level of abstraction, with less effort and without knowing all the technical details of the distributed infrastructure. This project defines a new approach to building a Web-service-based infrastructure for distributed data mining applications.

Use Of Multiple AI Methodologies To Produce A Foundation For An Intelligent Medical System

Funded by NASA (09/2001 - 09/2002)

PI: Mehmed Kantardzic

Project Summary

A computer-based system is being developed to aid the medical diagnostic process in a remote location in the absence of immediate communication with a physician or in the presence of a long communication delay. The system is originally intended for use by crew members of a spacecraft or explorers on a remote planet, but could also be adapted for use on Earth to guide emergency care in remote locations. The system is designed to make the data-entry process intuitively understandable to a user trained in first aid. In the absence of immediate communication with a physician, the AI software would substitute for the physician in assisting the user to perform urgent steps in diagnosis and treatment. The net effect of the system would be to raise the level of patient care to approximately that provided by a paramedic.

Other Projects

Temporal Data Mining: Analysis of Multiple Time Series

Time series analysis has recently become a very popular research topic in data mining. A time series is a well-defined data set obtained through repeated measurements over a specific time period. Hourly measurements of humidity and air temperature, the daily closing price of a company's stock, monthly rainfall, and yearly sales are good examples of time series.
Understanding the underlying structure of a time series helps in developing a mathematical model that can later be used for control, prediction, and similar tasks. Time series analysis has several important applications. One is preventing undesirable events by forecasting them. Another is forecasting undesirable yet unavoidable events in order to lessen their impact preemptively. Prediction is also of particular interest to investors seeking a good return on their investments.
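A common way to turn a time series into a prediction problem is to slide a fixed-length window over it, so that each window of past values becomes a model input and the value that follows becomes the target. The sketch below assumes a univariate series and a caller-chosen window length; it shows one standard framing, not necessarily the one used in this research.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float],
                    window: int) -> Tuple[List[List[float]], List[float]]:
    """Split a univariate series into (lagged inputs, next-step targets).

    Each input holds `window` consecutive past values; the matching
    target is the value that immediately follows them.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    n_pairs = len(series) - window
    inputs = [list(series[i:i + window]) for i in range(n_pairs)]
    targets = [series[i + window] for i in range(n_pairs)]
    return inputs, targets
```

For example, with daily closing prices and `window=5`, each input covers one trading week and the target is the next day's close; the resulting pairs can be passed to any regression model.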
This research analyzes a public-domain data set using several techniques, namely support vector machines, feed-forward neural networks, and nonlinear regression. Their performance is compared and discussed. In addition, a cost-sensitive analysis is performed.
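How costs were assigned in the cost-sensitive analysis is not stated here. As one illustration, the sketch below scores forecasts with an asymmetric cost in which under-prediction and over-prediction carry different per-unit penalties; the penalty values and model names are hypothetical placeholders. Ranking models by total cost rather than by plain error can change which model looks best when one kind of mistake is more expensive than the other.

```python
from typing import Dict, List


def asymmetric_cost(actual: List[float], predicted: List[float],
                    under_cost: float, over_cost: float) -> float:
    """Total cost of a forecast with different per-unit penalties.

    Under-prediction (actual > predicted) costs `under_cost` per unit;
    over-prediction (predicted > actual) costs `over_cost` per unit.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a > p:
            total += under_cost * (a - p)
        else:
            total += over_cost * (p - a)
    return total


def rank_models(actual: List[float], forecasts: Dict[str, List[float]],
                under_cost: float, over_cost: float) -> List[str]:
    """Return model names (hypothetical keys) ordered from lowest to highest cost."""
    return sorted(forecasts,
                  key=lambda name: asymmetric_cost(actual, forecasts[name],
                                                   under_cost, over_cost))
```

Two forecasts with the same absolute error can rank differently: if one under-predicts by 2 units and the other over-predicts by 2 units, the first is cheaper when over-prediction is penalized more heavily, and the second is cheaper when under-prediction is.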