Privacy Preserving Data Analysis and Publishing in Education
(Funded by TUBITAK, special call on FATIH project, 2014-2016)

An intensive data collection effort covering students as well as instructors will take place within the scope of the FATİH project. Such data is a valuable resource for course and class management as well as for researchers and educators. Enabling researchers to use these education data sources will make it possible to develop new education models, enhance existing education techniques, and identify problems in education. There are two means of making these data sources available: (1) data sharing, in which the data holder shares the student and instructor data and researchers perform analysis and modeling on the data; and (2) sharing of data analysis results, in which the data holder performs the analysis or modeling and shares the results with other researchers and the public. In both cases, the personal data or analysis results could be used for the benefit of society, but also for undesired purposes. Our aim with this project is to identify the privacy risks that will result from the collection, sharing, and analysis of the data collected about students, parents, instructors, and school managers, and to develop techniques to overcome those risks. Our project proposal targets the "Data Anonymization" and "Privacy Preserving Data Analytics" subjects of the BT0102 FATIH Project Security and Privacy call.

Assoc. Prof. Yücel Saygin from Sabanci University will be the principal investigator of the project, working with Asst. Prof. Ali İnan from Isik University and Asst. Prof. Ercan Nergiz from Zirve University.


(STREP, EU-FP7-ICT, UbiPOL (Ubiquitous Participation Platform for Policy Making). Starting date: 01 January 2010.)

UbiPOL aims to develop a ubiquitous platform that allows citizens to be involved in policy making processes (PMPs) regardless of their current location and time. It is suggested that the more citizens find connections between their everyday activities and relevant policies, the more proactive or motivated they become to take part in the PMPs. For this reason, UbiPOL aims to provide context-aware knowledge provision with regard to policy making. That is, citizens using UbiPOL will be able to identify relevant policies and other citizens' opinions whenever they want and wherever they are, according to their everyday life patterns. With the platform, citizens are expected to become more widely aware of relevant policies and of PMPs they can take part in during their everyday lives, which should improve engagement and empowerment. The platform will also provide policy tracking functionality via a workflow engine and an opinion tag concept to improve the transparency of the policy making processes. Finally, the platform enables policy makers to collect citizen opinions more efficiently, since the opinions are collected as soon as they are created in the course of citizens' everyday lives. UbiPOL provides a security and identity management facility to ensure that only authorised citizens can access relevant policies according to their roles in the policy making processes. The delivery of opinion and policy data over the wireless network is secured by the leading-edge encryption algorithms used in the platform's communication kernels. UbiPOL is a scalable platform designed so that at least 100,000 citizens can use the system at the same time (for example, for e-Voting applications) via its well-proven automatic load balancing mechanisms. The privacy-ensuring opinion mining engine prevents unwanted disclosure of citizen identities and also prevents unrelated commercial advertisements from being included in the opinion base, to minimise misuse of the system.

(CA, Funded by EU-FP7-ICT FET OPEN, 2009-2012)
(Sabanci University is the coordinator of this project.)

With GPS-enabled devices and other positioning systems, the mobility behavior of individuals is captured for online or historical data analysis. For example, car insurance companies have started to issue policies based on driving behavior, which is captured through a GPS device installed under a special agreement. Such applications are enabled by mobility data mining, which aims to extract knowledge from mobility data and brings many opportunities as well as risks. The risks arise from the fact that mobility data is mostly about people: where they have been, at what times, how often, and with whom. Therefore, privacy is a major concern for mobility data and needs to be addressed before the opportunities of mobility data mining can be fully harvested. A recently completed EU project, GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery), was the pioneer in this field. The MODAP project, which started in September 2009 with nearly one million euro of funding for three years, aims to continue the efforts of GeoPKDD by coordinating and boosting research activities at the intersection of mobility, data mining, and privacy. MODAP is a timely project, since the privacy risks associated with the mobility behavior of people are still unclear, and mobility data mining technology cannot thrive without sound privacy measures and standards for data collection and data/knowledge publishing. For that reason, MODAP aims to create a platform for technical as well as non-technical people who are interested in mobility data mining together with privacy issues. The site will be the main platform for all types of community activities and will be functional as of October 15, 2009.

Anonymization of Spatio-temporal Data Sets
(TUBITAK Career Grant, 2007-2010)

Service providers can now collect the location information of mobile users and construct their trajectories. The trajectory of an object is, in general, the set of spatio-temporal points for that object sampled at a certain time interval. Using such trajectory information, we can construct the behavioral patterns of people, or of moving objects in general. These patterns can be used for the benefit of society, for example in traffic management, but they can also be used in ways that violate the privacy of individuals. For example, our data can be handed over to third parties for commercial purposes, leading to spam messages when we least expect them. Privacy issues are one of the challenges that mobile services are facing. Data confidentiality and access control have been studied for some time, but privacy preserving data management techniques have been drawing the attention of researchers for the past 5 years. The first step towards privacy is to strip the identity information from the released data. However, it was shown that even when identity information is removed, the confidential data can still be linked to individuals via a collection of attributes called quasi-identifiers. Optimal anonymization of data sets while minimizing data loss was shown to be an NP-hard problem. Considering that the data sources may be gigabytes in size, the problem becomes unmanageable. For spatio-temporal data, things get even more complicated in terms of both privacy and computation. This is because we can infer the work and home addresses of individuals from trajectory information and link that information via the yellow pages to reach the identities of the people following those trajectories. With this project, our aim is to develop methods for spatio-temporal data anonymization in centralized and distributed environments.
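
The linking risk through quasi-identifiers described above can be illustrated with a minimal sketch. It assumes k-anonymity as the privacy notion (the project description does not name a specific model) and uses a simple tabular data set rather than trajectories. The records, the attribute names (`zip`, `birth_year`, `diagnosis`), and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records: "zip" and "birth_year" act as quasi-identifiers,
# "diagnosis" is the confidential attribute. Names and values are made up.
RECORDS = [
    {"zip": "34956", "birth_year": 1980, "diagnosis": "flu"},
    {"zip": "34957", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip": "34720", "birth_year": 1975, "diagnosis": "flu"},
    {"zip": "34725", "birth_year": 1975, "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("zip", "birth_year")


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    quasi-identifier values (0 for an empty data set)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


def generalize_zip(records, digits_kept):
    """Return copies of the records with the zip code truncated to
    `digits_kept` leading digits; the remaining digits are masked with '*'."""
    return [
        {**r, "zip": r["zip"][:digits_kept] + "*" * (len(r["zip"]) - digits_kept)}
        for r in records
    ]


if __name__ == "__main__":
    print(is_k_anonymous(RECORDS, QUASI_IDENTIFIERS, 2))
    print(is_k_anonymous(generalize_zip(RECORDS, 3), QUASI_IDENTIFIERS, 2))
```

In the raw records every quasi-identifier combination is unique, so each person could be singled out by linking with an external source. Masking the last two digits of the zip code places every record in a group of two, at the cost of some data loss. Choosing the generalization that loses the least information is the hard optimization problem referred to above; this sketch only checks a given generalization and does not search for an optimal one.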

(STREP, Funded by EU-IST FET OPEN, 2005-2009)

A flood of data pertinent to moving objects is available today, and will be more in the near future, particularly due to the automated collection of privacy-sensitive telecom data from mobile phones and other location-aware devices. Such wealth of data, referenced both in space and time, may enable novel classes of applications of high societal and economic impact, provided that the discovery of consumable and concise knowledge out of these raw data is made possible. The goal of the GeoPKDD project is to develop theory, techniques and systems for geographic knowledge discovery and delivery, based on new privacy-preserving methods for extracting knowledge from large amounts of raw data referenced in space and time. More precisely, we aim at devising knowledge discovery and analysis methods for trajectories of moving objects; such methods will be designed to preserve the privacy of the source sensitive data.

The fundamental hypothesis is that it is possible, in principle, to aid citizens in their mobile activities by analyzing the traces of their past activities by means of data mining techniques. For instance, behavioral patterns derived from mobile trajectories may allow inducing traffic flow information capable of helping people travel efficiently, helping public administrations in traffic-related decision making for sustainable mobility and security management, and helping mobile operators optimize bandwidth and power allocation on the network. However, it is clear that the use of personal sensitive data raises concerns about citizens' privacy rights. Obtaining the potential benefits by means of a trustworthy technology, designed to prevent infringement of privacy rights, is a highly innovative goal; if fulfilled, it would enable wider social acceptance of many new services of public utility that would find in the advocated form of geographic knowledge a key driver, such as in transport, environment, and risk management.

(CA, Funded by EU-IST FET OPEN, 2005-2008)

This coordination action will bring together newly emerging research in ubiquitous knowledge discovery. Research areas are:
  • data mining in mobile systems, wireless communication networks, calm technologies
  • distributed architectures: distributed data mining, grid, P2P, autonomic computing, agents
  • learning components: statistical learning (incl. online learning), evolutionary computing, anytime learning
  • data types: spatio-temporal, stream, multimedia
  • security & privacy: privacy preserving data mining, intrusion detection
  • HCI & cognitive modelling: user interfaces of ubiquitous discovery systems
This multi-disciplinary approach constitutes a paradigm shift for the field of knowledge discovery, since the idea of a standalone (desktop or workstation) analysis tool is abandoned in favour of process-integrated, distributed, and autonomous analysis systems. Work done in this area merely scratches the surface, is dispersed among several communities, and is at a very early stage.

Integration of the various sub-areas involves considerable risk. The CA KDubiq will act to close the gap and strengthen long-term research and applications in a new and future-oriented discipline: ubiquitous knowledge discovery. This discipline faces many new challenges, e.g. because of technical limitations in memory, CPU power, bandwidth, etc., and can only succeed if privacy and security are addressed in a principled and multi-disciplinary manner.

Web Users Clustering for introducing Personalization in Commercial Web Sites
(Funded by TUBITAK and GSRT, 2006-2008)

E-commerce applications on the World Wide Web (WWW) have gained tremendous popularity, and at the same time they have exposed problems that are due to the lack of a unique structure (the Web is characterized by semi-structured and structured data) and the exponentially increasing volume of transactions (Web users often face long delays and poor quality of service). To resolve such problems, this project proposes the adoption of effective Web user clustering techniques in order to facilitate Web personalization in commercially oriented Web sites. The project will highlight the need to include flexible and scalable Web data clustering schemes in personalization systems for commercial Web sites. The proposed topic is quite challenging due to the high heterogeneity of Web data and the lack of effective clustering schemes in personalization systems. The proposed research collaboration will focus on developing and evaluating Web data clustering approaches in the context of personalization systems.

Access Control Models For Privacy Preserving Data Mining
(Funded by TUBITAK and Egide, 2004-2006)

Data mining has attracted many researchers from universities and research labs, especially during the past 10 years, with the increased capacity for data collection. The data mining field has its roots in machine learning, artificial intelligence, statistics, and databases. The aim of data mining is to analyze large collections of data and make this data useful for the data collectors. The main data sources today are the Web (especially web services) and internet traffic in general, which has multimedia content as well. Data collection efforts from different data sources have accelerated in the past 2 years with the aim of tracking people with possibly malicious interests. However, powerful data mining tools and the ability to integrate distributed data sources regarding the same topic have also raised fears in the public about privacy. Privacy issues have been studied in the context of statistical databases since the 1980s. The aim then was to secure confidential data attributes that could be accessed via powerful query tools running over the databases. Data security in general has always been a core topic in the database community, with the aim of developing flexible access control policies for various databases, including multimedia and Web databases. Privacy is now a general topic in the data mining community, frequently discussed in panels and workshops. The issue is to provide policies for privacy and to develop methods for privacy preserving data mining. In this project we plan to investigate access control methods specifically tailored for data mining tools running on data warehouses as a means of preserving the privacy of people. We will target web services as the application domain.

(Funded by EU-IST FET OPEN, finished in 2003)

In a dynamic, unstable, and ever-changing business environment such as the one in which enterprises conduct e-business, old-fashioned disclosure control and database inference protection techniques are inadequate to ensure complete data privacy. In a recent news article, fears were expressed for the online security of private information because a pharmaceutical company said that it had inadvertently released over the internet the e-mail addresses of more than 0$ of its customers who were on some special type of medication. Although this is an extreme example of direct disclosure, it illustrates the multiple risks that companies may run into if they do not take seriously the need to secure the sensitive information that they handle. For this reason, organisations should be able to evaluate the risk of disclosing information and proceed to adopt new, more efficient approaches for information disclosure control in order to maintain their competitive edge in the market. The work on securing data against intruders attacking the implicit sensitive information in the data has just started and is yet to cover the broad spectrum of data mining techniques. In order to make a publicly available system secure, we must ensure not only that private sensitive data have been trimmed out, but also that certain inference channels have been blocked. In other words, it is not only the data but also the hidden knowledge in this data that should be made secure. Moreover, the need to make our system as open as possible, to the degree that data sensitivity is not jeopardised, calls for various techniques that account for the disclosure control of sensitive data. We aim to investigate all aspects of data (dimensionality, distribution) and data mining methods as a threat to data security. We plan to extend the initial work on data mining against data security to the wide spectrum of data mining methodologies and novel information types.

Click here for CODMINE website.