MSc Big Data

Last updated: Dec 1, 2015
Master II
Tuition: € per year
ENSAI is accredited by the French Ministry of Higher Education and Research to deliver a Master of Science in Big Data (known as a Master international or DNM in the French nomenclature). ENSAI is currently the only French engineering school accredited to offer this unique training, which combines Statistics and Computer Science, as an all-English program.

The demand for skills in the field of high-dimensional data processing, otherwise known as Big Data, is increasing dramatically worldwide, yet academic programs in this field remain rare.

ENSAI has many years of experience providing multidisciplinary training (Statistics, Computer Science, and Econometrics), numerous international partnerships, a Big Data Academic Platform established by GENES, and a Big Data research chair financed by the Institut Louis Bachelier and the Fondation du Risque.

The Master of Science in Big Data offered by ENSAI therefore meets a pressing need among corporations and organizations of all kinds for graduate-level training that is nearly non-existent in academic programs at the French, European, and even international level.

The program consists of two semesters of coursework at ENSAI, followed by an internship in the professional world or in academic/research laboratories.

Since the program welcomes students with varying academic backgrounds and skill levels in Computer Science, Mathematics, and Statistics, it is structured to bring all students to the same scientific level in all three fields, building on their existing training, knowledge, and skills.

Therefore, in addition to common courses, the first semester includes courses tailored to students with different profiles. These courses are organized into two tracks, Computer Science and Statistics, and each student follows the track covering the field in which they need more training.

Graduates of the program are skilled Data Scientists. In addition to pursuing a PhD in research, they will have numerous career opportunities with international corporations and data start-ups in the following areas:

- Digital Marketing

- Business Analytics

- Risk Management

- Yield Management

- Industrial applications

- Supply and distribution

- Healthcare industry

- Social network analysis

- Research and development in scientific domains

- Software industry



Probability, Algebra, and Analysis (Lectures: 15h)

This course introduces several essential notions in probability, algebra, and analysis that are required for all subsequent Statistics courses.

Statistical Inference and Hypothesis Testing (Lectures: 10h, Tutorials: 5h)

This course provides a short introduction to some basic notions of Statistics.

Simulation and Monte Carlo Integration Methods (Lectures: 5h, Tutorials: 5h)

This course provides a short introduction to random variable generation and to Monte Carlo integration.
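
As a minimal illustration of the two topics named above, the Python sketch below generates exponential random variables by inverse-transform sampling and estimates an integral by averaging the integrand at uniform draws. The integrand, rate, sample sizes, and seed are arbitrary choices for illustration, not course material.

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exp(rate)."""
    u = rng.random()  # u in [0, 1), so 1 - u is in (0, 1] and the log is defined
    return -math.log(1.0 - u) / rate

def mc_integrate(f, a, b, n, rng):
    """Monte Carlo estimate of the integral of f over [a, b]:
    (b - a) times the average of f at n independent uniform draws."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n

rng = random.Random(42)
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000, rng)
print(f"MC estimate of the integral of x^2 on [0, 1]: {estimate:.4f} (exact value 1/3)")

draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
print(f"Sample mean of Exp(2) draws: {sum(draws) / len(draws):.4f} (exact mean 0.5)")
```

The standard error of the Monte Carlo estimate decreases like 1/sqrt(n) regardless of the dimension of the integration domain, which is what makes the method attractive for high-dimensional integrals.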

Regression Models (Lectures: 10h, Tutorials: 5h)

This course provides an introduction to the linear regression model and its generalizations. Some standard methods for dimension reduction (such as principal component analysis or multiple correspondence analysis) and for clustering (such as hierarchical clustering or k-means clustering) will be introduced. Students will also learn to interpret and analyze software output from R, SAS, or SPAD.
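
The simplest case of the linear regression model, one predictor fitted by ordinary least squares, can be sketched in a few lines of Python. The data below are hypothetical and chosen only so the fit is easy to check by hand; the course itself uses R, SAS, or SPAD.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/(n - 1) factors cancel
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # the fitted line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_simple_ols(xs, ys)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")  # intercept = 0.05, slope = 1.99
```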

Basic Sampling Theory (Lectures: 10h, Tutorials: 5h)

This course provides an introduction to the basic sampling techniques used for finite population sampling and to the properties of the associated estimators. Topics include the Horvitz-Thompson estimator, measures of accuracy, simple random sampling, stratified sampling, and unequal probability sampling.
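
The Horvitz-Thompson estimator of a population total weights each sampled value y_i by the inverse of its inclusion probability pi_i. The Python sketch below implements that formula and checks its unbiasedness empirically under simple random sampling without replacement, where every pi_i equals n/N. The population and replication count are hypothetical.

```python
import random

def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i over sampled units."""
    return sum(y / p for y, p in zip(values, inclusion_probs))

# Hypothetical finite population: y_i = i for i = 1..100, so the true total is 5050
population = list(range(1, 101))
N, n = len(population), 10
rng = random.Random(0)

# Under simple random sampling without replacement, every unit has pi_i = n / N,
# so the estimator reduces to N times the sample mean.
estimates = []
for _ in range(20_000):
    sample = rng.sample(population, n)
    estimates.append(horvitz_thompson_total(sample, [n / N] * n))

print(f"Average HT estimate over replications: {sum(estimates) / len(estimates):.1f} (true total 5050)")
```

The same function handles unequal probability sampling: only the list of inclusion probabilities changes.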



Client-Server Architecture, Java EE (Lectures: 10h, Tutorials: 15h)

The objective of this course is to learn how to develop and deploy a dynamic website using Java.

Cloud Computing (Tutorials: 10h)

The need for storage and computing capacity has been growing since the advent of computers. These resources are now available remotely, in the cloud. These tutorials introduce the economic issues that gave rise to these services, as well as the technical solutions that provide access to them as web services.

Java EE Project (Tutorials: 10h)

This project begins with modeling an application using software engineering methods and continues with its implementation in Java EE.

Computer Networks (Lectures: 20h, Tutorials: 20h)

This course aims to give students the foundations of operating systems within a networked architecture. Always-on connectivity, mobile devices, and connected objects are part of our daily life, and data scientists need to take these technologies into account. During this course, students are given a primer on computer networks and the new interactions they make possible.



Aggregation Methods in Statistics and Combinatorial Complexity (Lectures: 15h, Tutorials: 5h)

This course provides an introduction to aggregation methods used in statistical learning, such as bagging, random forests, and boosting.
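
Of the three methods named above, bagging (bootstrap aggregating) is the simplest to sketch: fit the same base learner on many bootstrap resamples and average the predictions. The Python sketch below uses a one-split regression tree (a "stump") as the base learner; the data, the choice of base learner, and all parameters are illustrative assumptions. It does not cover random forests or boosting.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump): choose the threshold minimizing
    squared error and predict the mean response on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x identical: fall back to a constant prediction
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def bagging(xs, ys, n_estimators, rng):
    """Bootstrap aggregating: one stump per bootstrap resample, predictions averaged."""
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(model(x) for model in models) / len(models)

# Hypothetical data: a noisy step function jumping from 0 to 1 at x = 0.5
rng = random.Random(7)
xs = [i / 50 for i in range(50)]
ys = [(1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.1) for x in xs]
predict = bagging(xs, ys, n_estimators=50, rng=rng)
print(round(predict(0.1), 2), round(predict(0.9), 2))
```

Averaging over bootstrap resamples mainly reduces the variance of an unstable base learner; random forests add random feature selection at each split on top of this scheme.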

Association Rules Mining (Lectures: 5h, Tutorials: 5h)

Association rule mining consists of finding high-probability subsets of values for a finite-dimensional random vector. This kind of problem occurs frequently in practice, particularly in market basket analysis, where the goal is to identify the products customers purchase most frequently as well as association rules between subsets of products.
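
The two quantities behind market basket rules are support (the fraction of baskets containing an itemset) and confidence (support of the rule's items divided by support of its left-hand side). The Python sketch below enumerates single-item rules by brute force over item pairs. It is a toy illustration on hypothetical baskets, not the Apriori algorithm, which would prune candidate itemsets to scale to real data.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Brute-force rules {lhs} -> {rhs} over all item pairs meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support({lhs}, transactions)  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s_ab, confidence))
    return rules

# Hypothetical shopping baskets
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
)]
rules = pair_rules(baskets, min_support=0.4, min_confidence=0.7)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```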

Data Visualization (Tutorials: 10h)

These tutorials provide an introduction to graphical tools for visualizing various data sets, using in particular several R packages as well as the Gephi and GGobi software.

OLAP, Multidimensional Databases (Lectures: 5h, Tutorials: 10h)

This course presents the multidimensional approach, which gives direct access to information from multiple entry points. It is used in Business Intelligence to support decision making and report publishing.
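
The core multidimensional operation, aggregating a measure along a chosen subset of dimensions ("roll-up"), can be sketched in plain Python. The fact table, dimension names, and `roll_up` helper below are hypothetical; real OLAP servers precompute and index such aggregates rather than scanning the table each time.

```python
from collections import defaultdict

# Hypothetical fact table: (region, year, product, sales amount)
facts = [
    ("North", 2014, "A", 120.0),
    ("North", 2014, "B", 80.0),
    ("North", 2015, "A", 150.0),
    ("South", 2014, "A", 60.0),
    ("South", 2015, "B", 90.0),
]
DIMENSIONS = ("region", "year", "product")

def roll_up(facts, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(float)
    for row in facts:
        cube[tuple(row[i] for i in idx)] += row[3]
    return dict(cube)

print(roll_up(facts, ["region"]))          # total sales per region
print(roll_up(facts, ["region", "year"]))  # drill down: per region and year
```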

“Big Data” Databases (Lectures: 5h, Tutorials: 10h)

This course compares conventional approaches to Big Data issues, such as data warehouses, with the very large databases now available to meet the new challenges of Big Data: velocity and variety.

NoSQL (Tutorials: 10h)

These tutorials explain the principles and foundations of the Not Only SQL (NoSQL) approach and give a technical and practical overview of widely used NoSQL technologies, such as BigTable, Cassandra, Redis, and MongoDB.

Penalized Regression (Lectures: 15h, Tutorials: 10h)

For regression models, high-dimensional statistics refers to the situation where the number of predictors is one or more orders of magnitude larger than the sample size.

Variable Selection Methods (Lectures: 10h, Tutorials: 5h)

Variable selection in a high-dimensional setting has received considerable attention in recent years, in particular for regression models involving a large number of possible predictors.

Unix (shell script) (Lectures: 5h, Tutorials: 15h)

During this intensive workshop-style course, students are walked through installing the latest version of Linux on their own machines and learn to use the system in depth, a skill vital for their courses throughout the program.

Parallelized Systems (Lectures: 10h, Tutorials: 10h)

This course presents the founding principles of parallelism used to design systems that simultaneously use different distributed resources. These systems synchronize their computations and share other resources.
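
The basic pattern described here (split the work, compute partial results concurrently, synchronize access to a shared resource) can be sketched with Python's standard `concurrent.futures` and `threading` modules. The workload and the `parallel_sum_of_squares` helper are hypothetical. In CPython, threads illustrate the synchronization pattern but do not speed up pure-Python arithmetic because of the global interpreter lock; processes or distributed workers would be used for that.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def parallel_sum_of_squares(data, n_workers=4):
    """Split `data` into chunks, sum squares per chunk in worker threads,
    and combine the partial sums into a lock-protected shared total."""
    if not data:
        return 0
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(x * x for x in chunk)  # independent work, no shared state
        with lock:                           # synchronize access to the shared accumulator
            total += partial

    size = (len(data) + n_workers - 1) // n_workers  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(worker, chunks))  # consuming the results re-raises any worker exception
    return total

print(parallel_sum_of_squares(list(range(1000))))
```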

French Summer Program (Duration: 9 weeks; Classroom: 200h, Cultural activities: 100h)

Non-French speakers arrive in France two months early for a mandatory, intensive French language and culture course, during which they are hosted by a French family. This allows students to acquire vital skills for daily life and cultural integration.

Language Courses (at CIREFE) (Duration: 2 or 4 hours/week over 11 weeks)

Designed for foreign students following a full-time academic program in Rennes, these weekly evening courses give students the practical written and/or oral French skills needed for everyday life in France.




Functional Data Analysis (Lectures: 15h, Tutorials: 10h)

This course provides an introduction to the modeling and statistical analysis of functional data, and also investigates how functional data can be recovered from discretized observations.

Text Mining, Image Analysis (Lectures: 10h, Tutorials: 5h)

This course provides an introduction to the analysis of specific kinds of data, such as text and images. Text mining is the set of methods used for the automatic processing of natural-language text stored in computer files.
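
A common first step in text mining is to turn each document into term weights. The Python sketch below computes TF-IDF scores (term frequency times the log of inverse document frequency) with naive whitespace tokenization. The corpus is hypothetical, and this is only one of several TF-IDF weighting variants in use.

```python
import math
from collections import Counter

def tf_idf(documents):
    """For each document, map term -> tf * idf, where tf = count / document length
    and idf = log(N / number of documents containing the term)."""
    tokenized = [doc.lower().split() for doc in documents]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                       for term, count in counts.items()})
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical corpus
scores = tf_idf(docs)
print(scores[0])
```

A term that appears in every document, such as "the" here, gets weight zero, while terms concentrated in few documents get the highest weights.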

Compressive Sensing (Lectures: 15h, Tutorials: 5h)

Compressive sensing exploits the sparsity of a signal: it proposes mathematical models for acquiring and reconstructing signals that can be represented by only a few non-zero coefficients in a suitable basis or dictionary.
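
A classic greedy reconstruction method is Orthogonal Matching Pursuit (OMP). It repeatedly selects the measurement-matrix column most correlated with the current residual, then refits by least squares on the selected columns. The pure-Python sketch below recovers a 3-sparse signal of length 100 from 50 random Gaussian measurements. The choice of OMP, the sizes, and the signal are assumptions for illustration; the course may cover other recovery algorithms such as L1 minimization.

```python
import random

def solve_linear(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting (M square, nonsingular)."""
    k = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        z[r] = (A[r][k] - sum(A[r][c] * z[c] for c in range(r + 1, k))) / A[r][r]
    return z

def omp(Phi, y, sparsity):
    """Orthogonal Matching Pursuit: greedily build the support, refitting by least squares."""
    m, n = len(Phi), len(Phi[0])
    support, coef, residual = [], [], list(y)
    for _ in range(sparsity):
        # select the unused column most correlated with the current residual
        j = max((c for c in range(n) if c not in support),
                key=lambda c: abs(sum(Phi[i][c] * residual[i] for i in range(m))))
        support.append(j)
        # least squares on the selected columns via the normal equations
        G = [[sum(Phi[i][a] * Phi[i][b] for i in range(m)) for b in support] for a in support]
        h = [sum(Phi[i][a] * y[i] for i in range(m)) for a in support]
        coef = solve_linear(G, h)
        residual = [y[i] - sum(Phi[i][s] * c for s, c in zip(support, coef)) for i in range(m)]
    x_hat = [0.0] * n
    for s, c in zip(support, coef):
        x_hat[s] = c
    return x_hat

# Hypothetical setup: 50 random measurements of a length-100 signal with 3 non-zeros
rng = random.Random(1)
m, n = 50, 100
true_x = [0.0] * n
for idx, val in ((7, 2.0), (42, -1.5), (81, 1.0)):
    true_x[idx] = val
Phi = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
y = [sum(Phi[i][j] * true_x[j] for j in range(n)) for i in range(m)]
x_hat = omp(Phi, y, sparsity=3)
print({j: round(v, 4) for j, v in enumerate(x_hat) if v != 0.0})
```

With noiseless measurements, once the correct support is found the least-squares refit recovers the non-zero values exactly, even though the system has fewer equations (50) than unknowns (100).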

Parsimonious Representations (Lectures: 10h, Tutorials: 10h)

This course presents additional mathematical models and algorithms for low-dimensional representations of large-scale data. Several applications are considered, for instance pattern recognition and the analysis of large sets of images or videos.

Foundations of Big Data using MapReduce (Lectures: 10h, Tutorials: 10h)

This course presents IT issues arising in the real-time processing of massive and heterogeneous data.
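
The MapReduce programming model splits a job into a map phase that emits key-value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The in-process Python sketch below walks through the canonical word-count example. The input splits are hypothetical, and a real framework such as Hadoop would distribute each phase across machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: emit a (word, 1) pair for each word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

splits = ["big data big ideas", "data science", "Big data tools"]  # hypothetical input splits
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can run in parallel on different machines without coordination beyond the shuffle.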

Storm, Hadoop Distributed File System (Lectures: 5h, Tutorials: 15h)

This course explains the principles and fundamentals of Hadoop, followed by a technical and practical overview of technologies directly related to Hadoop, such as Pig, Hive, HBase, ZooKeeper, Mahout, and Spark.

Programming with Big Data in R using Distributed Memory (Tutorials: 20h)

These tutorials show how to use the main R packages for Big Data: packages for parallel computing, for working with data sets too large to load into memory, for MapReduce programming, and for integrating C, C++, or Fortran code into R.

Statistical Libraries for Big Data (Mahout, SAS, HPA) (Tutorials: 20h)

These tutorials present alternatives to R for Big Data: a commercial solution (SAS High-Performance Analytics) and Apache Mahout, a popular library of machine learning algorithms that scale to large data sets, implemented mainly on Hadoop.

Secure Pairing, Security Services against Piracy, Cryptography (Lectures: 10h, Tutorials: 20h)

This course introduces the main principles of information security, focusing on two of them: cryptography, one of the protection tools that prevent the disclosure, modification, or illegitimate access of data; and secure pairing, which preserves the anonymity of aggregated data.
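
Assuming "secure pairing" refers to linking records across data sets without exchanging raw identifiers, one common approach is to replace each identifier with a keyed hash (HMAC) known only to the linking parties. The sketch below uses Python's standard `hmac`, `hashlib`, and `secrets` modules. The data sets, the key-sharing arrangement, and the `pseudonymize` helper are hypothetical. Keyed hashing is pseudonymization rather than full anonymization: anyone holding the key can re-identify records.

```python
import hashlib
import hmac
import secrets

def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with its HMAC-SHA256 under a secret key (hex-encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Assumption: the key is shared only between the two data holders, never with the analyst
shared_key = secrets.token_bytes(32)

# Hypothetical data sets keyed by a direct identifier
hospital = {"alice@example.org": "diagnosis A", "bob@example.org": "diagnosis B"}
insurer = {"bob@example.org": "claim 17", "carol@example.org": "claim 42"}

# Each holder pseudonymizes locally; only the keyed hashes leave their premises
hospital_p = {pseudonymize(k, shared_key): v for k, v in hospital.items()}
insurer_p = {pseudonymize(k, shared_key): v for k, v in insurer.items()}

# Records can be linked on equal pseudonyms without revealing the raw identifiers
linked = {h: (hospital_p[h], insurer_p[h]) for h in hospital_p.keys() & insurer_p.keys()}
print(list(linked.values()))
```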

Privacy (Lectures: 5h, Tutorials: 5h)

Privacy is a cornerstone of the digital economy. This course presents the regulations and laws in several countries concerning privacy protection and the protection of access to personal data, as well as the technical solutions that make such protections possible.

Big Data Project (Lectures: 5h, Tutorials: 35h)

This project aims to deepen students' knowledge of topics covered in their courses and to introduce new ones. Working in small groups, students tackle projects centered on a current issue.

Courses for International Students: Written and/or Oral French Language Courses (at CIREFE) (Duration: 2 or 4 hours/week over 11 weeks)

Designed for foreign students following a full-time academic program in Rennes, these weekly evening courses give students the practical written and/or oral French skills needed for everyday life in France.

End-of-Studies Internship (Duration: 5 months from May to September)

This final phase of the MSc in Big Data program is a five-month paid internship, which can take place in France or abroad, either in the professional world or in academic/research laboratories.
