Privacy Preserving Data Analysis and Publishing in Education
(Funded by TUBITAK, special call on FATIH project, 2014-2016)
There will be an intensive data collection effort within the scope of the FATIH project regarding students as well as instructors. Such data is a valuable source for course and class management, as well as for researchers and educators. Enabling researchers to use these education data sources will make it possible to develop new education models, enhance existing education techniques, and identify problems in education. There are two means of reaching these data sources: (1) data sharing (the data holder shares the student and instructor data, and researchers perform analysis and modeling on the data); (2) sharing of data analysis results (the data holder performs analysis or modeling and shares the results with other researchers and the public). In both cases, the personal data or analysis results could be used for the benefit of society, but also for undesired purposes. Our aim with this project is to identify the privacy risks that will result from the collection, sharing and analysis of the data collected about students, parents, instructors, and school managers, and to develop techniques to overcome those risks. Our project proposal targets the "Data Anonymization" and "Privacy Preserving Data Analytics" subjects of the BT0102 FATIH Project Security and Privacy call.
Assoc. Prof. Yücel Saygin from Sabanci University will be the principal investigator of the project, working with Asst. Prof. Ali İnan from Isik University and Asst. Prof. Ercan Nergiz from Zirve University.
(STREP, EU-FP7-ICT, UbiPOL (Ubiquitous Participation Platform for Policy Making). Starting date: 01 January 2010.)
UbiPOL aims to develop a ubiquitous platform that allows citizens to be involved in policy making
processes (PMPs) regardless of their current location and time. It is suggested that the more
citizens find connections between their everyday activities and relevant policies, the more
proactive and motivated they become to be involved in the PMPs. For this reason, UbiPOL aims to
provide context-aware knowledge provision with regard to policy making. That is, citizens using
UbiPOL will be able to identify relevant policies and other citizens' opinions whenever they
want and wherever they are, according to their everyday life patterns. With the platform, citizens are
expected to become more widely aware of relevant policies and PMPs in which they can be involved during
their everyday lives, leading to improved engagement and empowerment. The platform will also
provide policy tracking functionality via a workflow engine and an opinion tag concept to improve
the transparency of the policy making processes. Finally, the platform enables policy makers to
collect citizen opinions more efficiently, as the opinions are collected as soon as they are created
in the course of citizens' everyday lives. UbiPOL provides a security and identity management
facility to ensure that only authorised citizens can access relevant policies according to their
roles in policy making processes. The delivery of opinion and policy data over the wireless
network is secured, as the platform uses leading-edge encryption algorithms in its communication
kernels. UbiPOL is a scalable platform, ensuring that at least 100,000 citizens can use the system at
the same time (for example, for e-Voting applications) via its well-proven automatic load
balancing mechanisms. The privacy-ensuring opinion mining engine prevents unwanted
disclosure of citizen identities, and it also prevents unrelated commercial
advertisements from being included in the opinion base, to minimise misuse of the system.
(CA, Funded by EU-FP7-ICT FET OPEN, 2009-2012)
(Sabanci University is the coordinator of this project.)
With GPS enabled devices and other positioning systems, mobility behavior of individuals
is captured for online or historical data analysis. For example, car insurance companies
have started to issue policies with respect to the driving behavior which is captured through
a GPS device installed under a special agreement. Such applications are enabled by mobility data mining
which aims to extract knowledge from mobility data with a lot of opportunities as well as risks.
The risks arise from the fact that mobility data is mostly about people, where they have been, at what times,
how often, and with whom. Therefore, privacy is a major concern for mobility data which needs
to be addressed before the opportunities of mobility data mining can be fully harvested.
A recently completed EU project, GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery,
www.geopkdd.eu) was the pioneer in this field. The MODAP project, which started in September
2009 with nearly one million euro funding for three years, aims to continue the efforts of
GeoPKDD by coordinating and boosting the research activities in the intersection of mobility,
data mining, and privacy. MODAP is a timely project since privacy risks associated with the
mobility behavior of people are still unclear, and it is not possible for mobility data mining
technology to thrive without sound privacy measures and standards for data collection, and
data/knowledge publishing. For that reason, MODAP aims to create a platform for technical
as well as non-technical people who are interested in mobility data mining together with privacy issues.
The site www.modap.org will be the main platform for all types of community activities and will be functional
as of October 15, 2009.
Anonymization of Spatio-temporal Data Sets
(TUBITAK Career Grant, 2007-2010)
Service providers can now collect the location information of mobile users and
construct their trajectories. The trajectory of an object is, in general, the set of spatio-temporal points
for that object sampled at a certain time interval. Using such trajectory information, we can construct
the behavioral patterns of people, or of moving objects in general. These patterns can be used for the benefit
of society, for example in traffic management, but they can also be used in ways that violate the privacy of
individuals. For example, our data can be handed over to third parties for commercial purposes, leading
to spam messages when we least expect them. Privacy issues are one of the challenges that mobile services face.
Data confidentiality and access control have been studied for some time, but privacy preserving data
management techniques have been drawing the attention of researchers for the past 5 years. The first step towards
privacy is to strip off the identity information from the
released data. However, it was shown that even when identity information is removed, we can still
link the confidential data to individuals via a collection of attributes called quasi-identifiers.
Optimal anonymization of
data sets while minimizing the data loss was shown to be an NP-Hard problem. Considering that
the data sources may be gigabytes in size, the problem becomes unmanageable. When we consider spatio-temporal
data, things get even more complicated in terms of privacy and computation. This is because
we can infer the work and home addresses of individuals from trajectory information and link that information
via yellow pages to reach the identities of the people following those trajectories. With this project, our aim is
to develop methods for spatio-temporal data anonymization in centralized and distributed environments.
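
The quasi-identifier linkage risk described above can be illustrated with a small sketch. It uses k-anonymity, one common way to formalize the idea; the project text does not name it, so treat it as an assumption. Each combination of quasi-identifier values must be shared by at least k records. The field names (`zip`, `birth_year`, `diagnosis`), the sample records, and the helper functions are all hypothetical and are not taken from the project.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


def generalize_zip(record, keep=3):
    """Coarsen the (hypothetical) zip attribute by masking all but the first `keep` characters."""
    zip_code = record["zip"]
    masked = zip_code[:keep] + "*" * (len(zip_code) - keep)
    return {**record, "zip": masked}


# Hypothetical released table: names are already stripped, but zip and
# birth year together can still single out individuals.
records = [
    {"zip": "34956", "birth_year": 1980, "diagnosis": "flu"},
    {"zip": "34956", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip": "34740", "birth_year": 1975, "diagnosis": "flu"},
    {"zip": "34744", "birth_year": 1975, "diagnosis": "diabetes"},
]
quasi_identifiers = ("zip", "birth_year")

before = is_k_anonymous(records, quasi_identifiers, k=2)
generalized = [generalize_zip(r) for r in records]
after = is_k_anonymous(generalized, quasi_identifiers, k=2)
print(before, after)
```

Generalizing the zip code trades precision for privacy. This trade-off between anonymity and data loss is what makes optimal anonymization hard, and trajectory data adds the further difficulty that the locations themselves act as quasi-identifiers.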
(STREP, Funded by EU-IST FET OPEN, 2005-2009)
A flood of data pertinent to moving objects is available today, and more will be in the near
future, particularly due to the automated collection of privacy-sensitive telecom data from
mobile phones and other location-aware devices. Such wealth of data, referenced both in space
and time, may enable novel classes of applications of high societal and economic impact,
provided that the discovery of consumable and concise knowledge out of these raw data is
made possible. The goal of the GeoPKDD project is to develop theory, techniques and
systems for geographic knowledge discovery and delivery, based on new privacy-preserving
methods for extracting knowledge from large amounts of raw data referenced in space
and time. More precisely, we aim at devising knowledge discovery and analysis methods
for trajectories of moving objects; such methods will be designed to preserve the privacy
of the source sensitive data.
The fundamental hypothesis is that it is possible, in principle, to aid citizens in their
mobile activities by analyzing the traces of their past activities by means of data mining
techniques. For instance, behavioral patterns derived from mobile trajectories may allow
traffic flow information to be inferred, which can help people travel efficiently, help public
administrations in traffic-related decision making for sustainable mobility and security
management, and help mobile operators optimize bandwidth and power allocation
on the network. However, it is clear that the use of personal sensitive data raises
concerns about citizens' privacy rights. Obtaining the potential benefits by means of
a trustable technology, designed to prevent infringing privacy rights, is a highly
innovative goal; if fulfilled, it would enable a wider social acceptance of many new
services of public utility that would find in the advocated form of geographic knowledge
a key driver, such as in transport, environment and risk management.
(CA, Funded by EU-IST FET OPEN, 2005-2008)
This coordination action will bring together newly emerging research in ubiquitous knowledge discovery.
Research areas are:
- data mining in mobile systems, wireless communication networks, calm technologies
- distributed architectures: distributed data mining, grid, P2P, autonomic computing, agents
- learning components: statistical learning (incl. online learning), evolutionary computing, anytime learning
- data types: spatio-temporal, stream, multimedia
- security & privacy: privacy preserving data mining, intrusion detection
- HCI & cognitive modelling: user interfaces of ubiquitous discovery systems
This multi-disciplinary approach constitutes a paradigm shift for the field of knowledge discovery since
the idea of a standalone (desktop or workstation) analysis tool is abandoned in favour of process
integrated, distributed and autonomous analysis systems. Work done in this area merely scratches the
surface, is dispersed among several communities, and is at a very early stage.
Integration of the various sub-areas involves considerable risk. The CA KDubiq will act to close
the gap and strengthen long-term research and applications in a new and future-oriented discipline:
ubiquitous knowledge discovery. It faces many new challenges, e.g. because of technical
limitations in memory, CPU power, bandwidth, etc., and can only succeed if privacy and security are
addressed in a principled and multi-disciplinary manner.
Web Users Clustering for introducing Personalization in Commercial Web Sites
(Funded by TUBITAK and GSRT, 2006-2008)
E-commerce applications over the World Wide Web (WWW) have gained tremendous popularity, and at
the same time they have revealed problems that are due to the lack of a unique structure (the Web
is characterized by semi-structured and structured data) and the exponentially increasing volume of
transactions (Web users often face long delays and poor quality of service). To resolve such
problems, this project proposes the adoption of effective Web users clustering techniques in order
to facilitate Web personalization in commercially-oriented Web sites.
The project will highlight the need to include flexible and scalable Web
data clustering schemes on personalization systems for commercial Web sites.
The proposed topic is quite challenging due to the high heterogeneity of the Web
data and the lack of effective clustering schemes on personalization systems.
The proposed research collaboration will focus on developing and evaluating Web data clustering approaches
in the context of the personalization systems.
Access Control Models For Privacy Preserving Data Mining
(Funded by TUBITAK and Egide, 2004-2006)
Data mining has attracted many researchers from universities and research labs, especially
during the past 10 years, with the increased capacity for data collection. The data mining field has
its roots in machine learning, artificial intelligence, statistics, and databases. The aim of
data mining is analyzing large collections of data and making this data useful for the data collectors.
The main data sources today are the Web (especially web services) and internet traffic in general,
which has multimedia content as well. Data collection efforts from different data sources have
sped up in the past 2 years with the aim of tracking people with possible malicious interests.
However, powerful data mining tools and the ability to integrate distributed data sources
regarding the same topic have also raised fears in the public about privacy. Privacy issues have been
studied in the context of statistical databases since the 1980s. The aim then was to
secure confidential data attributes which could be accessed via powerful query tools running
over the databases. Data security in general was always a core topic in the database community
with the aim of developing flexible access control policies for various databases including
multimedia and Web databases. Privacy is now a general topic of discussion in the data mining
community, frequently raised in panels and workshops. The issue is to provide policies for
privacy and to develop methods for privacy preserving data mining. In this project we plan
to investigate access control methods specifically tailored for data mining tools running
on data warehouses as a means to preserve the privacy of people. Web services will be
the target application domain.
(Funded by EU-IST FET OPEN, finished in 2003)
In a dynamic, unstable and ever-changing business environment such as the one in which enterprises conduct
e-business, the old-fashioned disclosure control and database inference protection techniques are
inadequate to ensure complete data privacy. In a recent news article, fears were expressed for the
online security of private information because a pharmaceutical company said that it had inadvertently
released over the internet the e-mail addresses of more than 0$ of its customers who were on some
special type of medication. Although this is an extreme example of direct disclosure, it signifies
the multiple risks that companies may run into if they do not take seriously the need to
secure the sensitive information that they handle. For this reason, organisations should be able
to evaluate the risk of disclosing information and proceed to adopt new, more efficient approaches
to information disclosure control
in order to maintain their competitive edge in the market.
The work on securing the data against intruders attacking the implicit sensitive information
in the data has just started and is yet to cover the broad spectrum of data mining techniques.
In order to make a publicly available system secure, we must ensure not only that private sensitive data
have been trimmed out, but also that certain inference channels have been blocked.
In other words, it is not only the data but also the hidden knowledge in the data that should be made secure.
Moreover, the need for making our system as open as possible - to the degree that data sensitivity
is not jeopardised - asks for various techniques that account for the disclosure control of sensitive data.
We aim at investigating all aspects of data (dimensionality, distribution) and data mining methods
as a threat to data security. We plan to extend the initial work on data mining against data security
to the wide spectrum of data mining methodologies and novel information types.
Click here for CODMINE website.