Project

A Repository Database System to do Data Mining in Drug Discovery

The amount of data associated with drug discovery grows exponentially each year, which makes extracting useful information from that data increasingly critical. Even with a central repository to hold these massive amounts of data, organizations need tools that help them extract the most useful information from it. A data warehouse can bring data together in a single format, supplemented by metadata, through a set of input mechanisms known as extraction, transformation, and loading (ETL) tools. Extraction operates either on existing data or on data imported into the database. Transformation translates the data into a format the database can understand, making the new data consistent with the existing data. Finally, the formatted data is loaded into files, and the link address of each file is saved in database tables for further analysis. Analysis of the data includes simple query and reporting, statistical analysis, complex multidimensional analysis, and data mining, in which large quantities of data are searched and analyzed to discover useful patterns or relationships that are then used to predict behavior. The purpose of this project is to produce a repository database of drugs, drug features (properties), and drug targets in which data can be mined and analyzed. Drug targets are proteins that drugs bind to in order to stop the protein's activity. For example, -secretase is a protein associated with Alzheimer’s disease; certain drugs can bind to -secretase to block its function, which in turn may stop the disease. Users can mine the database to predict which chemical properties determine a drug's relative efficacy against a specific target, along with the coefficient for each chemical property.
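The extract, transform, and load flow described above, including the step of saving the file's link address in a database table, can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names, the `datasets` table, the file name, and the sample rows are all hypothetical and do not come from the project itself.

```python
import csv
import io
import os
import sqlite3
import tempfile

# Hypothetical raw input with inconsistent naming and spacing (illustrative values only).
RAW_CSV = """Drug,MolWeight,logP
compound_a, 310.4 ,2.1
compound_b,412.9,3.7
"""


def extract(text):
    """Extract: read raw rows from an imported CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names and types so new rows match existing data."""
    return [
        {
            "drug": row["Drug"].strip().lower(),
            "mol_weight": float(row["MolWeight"]),
            "log_p": float(row["logP"]),
        }
        for row in rows
    ]


def load(rows, data_dir, conn):
    """Load: write the formatted data to a file and record its path in a table."""
    path = os.path.join(data_dir, "drug_features.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["drug", "mol_weight", "log_p"])
        writer.writeheader()
        writer.writerows(rows)
    # Assumed schema: one row per stored file, holding its link address.
    conn.execute("CREATE TABLE IF NOT EXISTS datasets (name TEXT, file_path TEXT)")
    conn.execute("INSERT INTO datasets VALUES (?, ?)", ("drug_features", path))
    conn.commit()
    return path


def run_etl(data_dir, conn):
    """Run the three stages in order and return the stored file's path."""
    return load(transform(extract(RAW_CSV)), data_dir, conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    stored = run_etl(tempfile.mkdtemp(), connection)
    print(connection.execute("SELECT name, file_path FROM datasets").fetchall())
```

Keeping only the file path in the table, rather than the data itself, mirrors the description of loading into files and saving the link address; a real system would likely also store metadata about each file.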
This database can be equipped with different data mining approaches and algorithms, such as linear, non-linear, and classification types of data modeling. The data models have been enhanced with Genetic Evolution (GE) algorithms [1-17]. This paper discusses the implementation with linear data models: Multiple Linear Regression (MLR) [18], Partial Least Squares Regression (PLSR) [19], and Support Vector Machine (SVM) [20].
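To make the idea of a coefficient for each chemical property concrete, here is a minimal sketch of plain Multiple Linear Regression fitted by ordinary least squares. It is not the GE-enhanced modeling the project describes, and the two descriptors and the activity values are synthetic, generated from a known linear relation purely for illustration. The function name `fit_mlr` is hypothetical.

```python
def fit_mlr(features, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend a constant 1.0 to each row so that b[0] is the intercept.
    X = [[1.0] + list(row) for row in features]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; coefficients are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic data: activity = 1.0 + 2.0 * x1 - 0.5 * x2 (not real measurements).
descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
activity = [2.0, 4.5, 5.0, 7.5, 8.5]

intercept, coef_x1, coef_x2 = fit_mlr(descriptors, activity)
print(intercept, coef_x1, coef_x2)
```

Because the synthetic activities lie exactly on a plane, the fit recovers the generating intercept and coefficients up to floating-point error; real assay data would leave residuals, and collinear descriptors would trigger the `ValueError`.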

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.