Data Sciences

Last updated: Dec 1, 2015
Master II
Tuition: € per year
The Data Sciences master is a track of the Mathematics and Applications master of Université Paris-Saclay. It is operated by École Polytechnique in collaboration with Télécom ParisTech, ENSAE, ENS Cachan and Université Paris-Sud.

Experiments, observations, and numerical simulations in many areas of science and business are currently generating terabytes of data, and in some cases are on the verge of generating petabytes and beyond. Analyses of the information contained in these data sets have already led to major breakthroughs in fields ranging from genomics to astronomy and high-energy physics, and to the development of new information-based industries.

Traditional methods of analysis have been based largely on the assumption that analysts can work with data within the confines of their own computing environment, but the growth of “big data” is changing that paradigm, especially in cases in which massive amounts of data are distributed across locations.

Data mining of these massive data sets is transforming the way we think about crisis response, marketing, entertainment, cyber-security, and national intelligence. It is also transforming how we think about information storage and retrieval. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but also as potential sources of valuable information, discovery, and knowledge. Exploiting them requires sophisticated analysis techniques that go far beyond classical indexing and keyword counting and that aim to find relational and semantic interpretations of the phenomena underlying the data.

Data science and big data are two key areas of interdisciplinary science involving mathematics and computer science. The program offers a thorough presentation of state-of-the-art mathematical and computational methods for the management and analysis of data at potentially very large scale.

A number of challenges in both data management and analysis require new approaches to support the big data era. These challenges span generation of the data, preparation for analysis, and policy-related challenges in its sharing and use, including the following:

  • Dealing with highly distributed data sources with parallel and distributed architectures
  • Tracking data provenance, from data generation through data preparation
  • Coping with sampling biases and with differing data formats and structures
  • Ensuring data integrity, security, and sharing
  • Developing methods for massive data visualization
  • Learning from massive data and enabling predictions
  • Developing scalable and incremental algorithms for real-time analysis and decision-making

The Math Bigdata Master aims to challenge demanding students with current and forward-looking topics that integrate sound fundamental methods and practical applications in current and emerging domains.

The rationale

Data science and big data are two key areas of interdisciplinary science involving mathematics and computer science. The context is the management of heterogeneous data at potentially very large scale.

Data science is a big deal across many industries, from retail to government to biotech. Our alumni have found jobs in industries as varied as:

  • Biotech
  • Energy
  • Finance
  • Gaming and hospitality
  • Government
  • Health care
  • Insurance
  • Internet
  • Manufacturing
  • Pharmaceutical
  • Retail
  • Telecom
  • Travel and transportation
  • Utilities

First Semester

Mandatory courses

High-dimensional statistics (5 ECTS) [Christophe Giraud, Université Paris-Sud]

Probabilistic graphical inference (5 ECTS) [Francis Bach, Guillaume Obozinski, INRIA]

Computational Statistics and optimisation (5 ECTS) [Stéphane Gaiffas, Alexandre Gramfort, Joseph Salmon, Anne Sabourin, Pascal Bianchi]

Machine Learning: from theory to practice (5 ECTS) [Florence d’Alché-Buc, Erwan Le Pennec]

Databases and Big Data Management (10 ECTS) [Michalis Vazirgiannis]

Elective courses

Big Data Ecosystem (5 ECTS)

DataScience Forum (0 ECTS) [Eric Moulines, Joseph Salmon]

Second Semester

Elective courses

Machine learning for structured data (2.5 ECTS) [Florence d’Alché-Buc, Eric Moulines]

Computational Statistics (5 ECTS) [Stéphanie Allassonnière]

Advanced Machine Learning (2.5 ECTS) [Pierre Alquier, Massimiliano Pontil]

Advanced Hadoop (2.5 ECTS)

Graph & Text Analytics for Big Data (5 ECTS) [Michalis Vazirgiannis]

Master Thesis or Internship (20 ECTS, 14 weeks minimum)

Teaching locations