
ON JUNE 27, 2019 – PAN PIPER, PARIS


The data-centric conference in Paris

DataXDay is the data-centric conference created for developers and enthusiasts.
Our mission is to share ideas and visions around topics like data processing, streaming, data science, dataOps and security.
For this second edition of DataXDay, data experts will share insights into the latest trends and their knowledge of data engineering and data science.

250 attendees

Come and meet more than 250 data lovers like yourself and let’s talk data!


For developers and Tech Leads

Our goal is to dive deep into some of the hottest technical subjects at the crossroads of data science, data engineering, cloud computing, craftsmanship, dataOps and security.


One Day

On June 27, spend your day enjoying talks from renowned speakers and sharing your ideas on what the future holds.


DataXDay will cover the following topics


Applied Machine Learning Done Right

Data science goes beyond prototyping and experimentation. What are the recipes for creating valuable, good-quality data products? Come and learn how machine learning integrates into business workflows and how AI and deep learning impact industry.


Reactive First

Trends in application building are moving fast, and they bring new ways of dealing with data every day. Stream processing, change data capture, microservices and serverless are changing the vision of what an efficient data pipeline should look like.


Data Intimacy

Data has become essential for businesses, so securing it is vital. Join us to explore the latest architecture standards, craftsmanship techniques and security patterns to ensure data privacy in the cloud or on on-premises infrastructure.


Program

9h - 9h05

Caroline Goulard
CEO, Dataveyes

Talk in French 🇨🇵

Michaël Figuière
Software Engineer, Facebook

Talk in English 🇬🇧

Xavier Leaute
Software Engineer, Confluent

Reactive First
Talk in English 🇬🇧

Learn how we optimized one of the more complex Kafka Streams applications we run at Confluent today. In the process you will become familiar with how to analyze and profile applications using tools such as async-profiler and Java Flight Recorder. In the context of stream processing, this means understanding runtime behavior both at the JVM level and in the interaction with the native libraries we often rely on heavily. We will explain how to identify bottlenecks, not only in terms of CPU usage, but also where your code might be blocking unexpectedly. This talk will go deep down the rabbit hole, from the high-level application structure down to lower-level system calls and back out. Today you need to be able to analyze applications not only on bare metal but also in a container-based world, so we will also talk about the intricacies of performing some of those operations in Docker-based deployment models.

By Xavier Leaute - Software Engineer, Confluent

Romain Sagean
Data Developer - Xebia

Applied Machine Learning Done Right
Talk in French 🇨🇵

Your Data Scientist colleagues have chosen a deep learning model, and it is up to you to take it to production.
How do you parallelize your predictions? Do you have to put Python in production? How do you retrain the model at scale? How do you track its performance?
These are all challenges we have tackled, and we will share our solutions with you.

By Romain Sagean - Data Developer - Xebia

10h50 - 11h15

Kai Wähner
Technology Evangelist - Confluent

Applied Machine Learning Done Right
Talk in English 🇬🇧

This talk shows how to productionize Machine Learning models in mission-critical and scalable real-time applications by leveraging Apache Kafka as a streaming platform. The talk discusses the relation between Machine Learning frameworks such as TensorFlow, DeepLearning4J or H2O and the Apache Kafka ecosystem. A live demo shows how to build a mission-critical Machine Learning environment leveraging different Kafka components: Kafka messaging and Kafka Connect for data movement from and into different sources and sinks, Kafka Streams for model deployment and inference in real time, and KSQL for real-time analytics of predictions, alerts and model accuracy.

By Kai Wähner - Technology Evangelist - Confluent

Stéphane Mareek
CEO - DataCumulus

Reactive First
Talk in English 🇬🇧

Apache Kafka has real-time capability, and everyone knows that! The real challenge facing engineers comes from redesigning existing data pipelines from batch to real time. In this talk, we will work through a case study on how to build an end-to-end real-time data pipeline by building four microservices on top of Apache Kafka. It will give you insights into the Kafka Producer API, Avro and the Confluent Schema Registry, the Kafka Streams High-Level DSL, and Kafka Connect Sinks.

By Stéphane Mareek - CEO - DataCumulus

12h55 - 14h

Jonathan Winandy
Founder and Director - Univalence

Reactive First
Talk in French 🇨🇵

Have you ever run into a truly maddening bug? Have you ever wished you could just hit Ctrl-Z?

Although microservices are more complex to operate than their monolithic counterparts, they make room for architectures that let us analyze and fix the errors of the past and spare us surprises in the future.

After a quick refresher on distributed tracing, we will see how a recent version of Kafka and Jaeger can be used to build a complete system featuring:

data unification and compression,
analysis of the cause and source of bugs and their effects,
and "time travel".
No prior knowledge of "Dapper" or of how phone booths work is required! 😉

By Jonathan Winandy - Founder and Director - Univalence

Guillaume Michel
Machine Learning Engineer - Netatmo

Applied Machine Learning Done Right
Talk in English 🇬🇧

Automated neural network architecture search (NAS) is a computation-intensive task. Tools like Cloud AutoML have lowered the technical barrier to adoption, but they still lack the ability to optimize for multiple criteria at the same time. This talk presents the techniques we use at NETATMO to speed up Dvolver, our multi-objective NAS engine, and how we reduced 50 days of computation on a single GPU to a couple of days on multiple GPUs by leveraging RabbitMQ, Docker and Kubernetes on Google Cloud Platform.

By Guillaume Michel - Machine Learning Engineer - Netatmo

Olga Petrova
Machine Learning DevOps engineer - Scaleway

Applied Machine Learning Done Right
Talk in English 🇬🇧

Generative Adversarial Networks (GANs), praised as “the most interesting idea in the last ten years in Machine Learning” by Yann LeCun, the director of Facebook AI, are unsupervised machine learning algorithms that have been used to generate anything from human faces and hotel rooms to cats. Despite their unsupervised origin, GANs are now increasingly being used for supervised ML tasks, such as face frontalization and super resolution. In this talk, I will present an easy-to-grasp explanation of why a GAN architecture may be useful for achieving photo-realistic results, and discuss which types of supervised ML projects are most suitable for GANs.

By Olga Petrova - Machine Learning DevOps engineer - Scaleway

16h10 - 16h40

Alban Perillat-Merceroz
Software Engineering Manager - Teads

Reactive First
Talk in English 🇬🇧

This talk showcases how we built a platform capable of ingesting and transforming a stream of billions of events a day using BigQuery, and how we use and abuse Redshift to deliver self-served, tailored views to many data visualisation clients and web apps.

By Alban Perillat-Merceroz - Software Engineering Manager - Teads

Jacek Laskowski
Freelance IT Consultant

Reactive First
Talk in English 🇬🇧

Let's talk about state management in Spark Structured Streaming. During this talk you will learn the streaming concepts that are particularly relevant for stateful stream processing in Structured Streaming, e.g. watermark and output modes, but also GroupState and GroupStateTimeout. We will explore simple stateful processing (with the groupBy operator) and more advanced use cases with KeyValueGroupedDataset.mapGroupsWithState and the most advanced operator, KeyValueGroupedDataset.flatMapGroupsWithState. In other words, you will learn how to use the stateful streaming API and understand its internals.

By Jacek Laskowski - Freelance IT Consultant

18h20 - 20h

Pauline Nicolas / Théo Bontempelli
Data Scientist / Data Engineer - Deezer

Applied Machine Learning Done Right
Talk in French 🇨🇵

At Deezer, machine learning is at the heart of many aspects of the product. In the analytics team, we work on several tasks such as prediction and forecasting in order to provide meaningful insights for business and product teams. Through discussions with these teams, we realised that for many projects there was a real interest in productionalizing these models. This is what led us to migrate our work from exploratory Jupyter Notebooks to actual models in production with Scala Spark. For this purpose we have implemented processes ranging from data architecture review to model implementation in Scala Spark. To illustrate this, we will share our experience implementing churn prediction and explain how data scientists and data engineers worked together to make this project a success.

By Pauline Nicolas - Data Scientist - Deezer
and Théo Bontempelli - Data Engineer - Deezer

Thomas Franquelin
Staff Software Engineer - Contentsquare

Data Intimacy
Talk in English 🇬🇧

In the course of 3 years, ContentSquare changed its main data store twice! We first moved from Redshift to Elasticsearch, then to ClickHouse.
During this time, we had to deal with a vast increase in data volume and support more and more features. How did we ensure that we didn’t break our application in the process?
We’ll talk about a simple way to achieve this: replaying production load against the legacy and new systems at the same time, and studying statistical differences between the two in order to pinpoint regressions. We’ll see that this method also makes for coarse-grained but fairly realistic load testing.

By Thomas Franquelin - Staff Software Engineer - Contentsquare

10h50 - 11h15

Guillaume Laforge
Developer Advocate - Google

Data Intimacy
Talk in French 🇨🇵

Millions of people swap selfies on Snapchat or play Super Mario Run every day. Do you know where these applications run? The venerable Google App Engine and the Google Cloud infrastructure let developers focus on their code and leave scaling the application to Google.
Whether you deploy applications with App Engine or functions with Cloud Functions, you no longer have to worry about provisioning servers or clusters, and you pay in proportion to usage, with no fixed costs.

In this session, let's look together at what is under the hood and, above all, at Google's new "serverless" offerings beyond App Engine and Cloud Functions.

By Guillaume Laforge - Developer Advocate - Google

Tom Stringer / Antoine Isnardy
Junior Data Scientist / Data Scientist, NLP Lead - Quantmetry

Applied Machine Learning Done Right
Talk in French 🇨🇵

With the proliferation of digital channels, a company can receive several thousand customer emails per day. An email is sometimes forwarded many times before the right person can handle it, which slows down the response to the customer.

The project we present optimizes email routing using natural language processing algorithms. It aims to pre-process emails in order to classify them better and describe them through keyword extraction. The project is currently in production at MAIF, where it examines 10,000 emails every day. It is now available as open source at https://github.com/MAIF/melusine.

The various technical building blocks rely on complex statistical methods as well as deep learning algorithms such as convolutional neural networks.

By Tom Stringer - Junior Data Scientist - Quantmetry
and Antoine Isnardy - Data Scientist, NLP Lead - Quantmetry

12h55 - 14h

Pauline Chavallard
Data Science Manager - Doctrine

Applied Machine Learning Done Right
Talk in English 🇬🇧

Nearly 4 million decisions are delivered each year by French courts. Our goal at Doctrine is to help lawyers get straight to the point by revealing the structure of court decisions.

In this talk, we'll present how we tackle the challenging problem of extracting the table of contents of court decisions. We will present the solutions we developed iteratively and focus on our best model, built with PyTorch, which uses attention mechanisms on top of a paragraph-embedding representation of the text.

By Pauline Chavallard - Data Science Manager - Doctrine

Walid Haouari
Data Engineer - Xebia

Data Intimacy
Talk in French 🇨🇵

Designing a data architecture has never been easy. Common risks include a poor fit with the actual need, weak performance, lazy bottlenecks, or even only partial achievement of the initial objectives. Most of the time, these problems are directly linked to a lack of resources or to poor resource management.

Incremental Software Architecture is an advanced design method that makes it possible to overcome these risks while ensuring elastic, efficient and cost-effective systems.

Together we will see how to adopt this productivity-oriented approach, step by step, all in a data context.

By Walid Haouari - Data Engineer - Xebia

Victor Landeau
Machine Learning Engineer - OUI.sncf

Applied Machine Learning Done Right
Talk in French 🇨🇵

At Oui.sncf, we have been using machine learning algorithms in some of our production applications for several years now. But this is not without its problems, notably because of the non-deterministic nature of these approaches.

Indeed, how can you confidently develop applications whose expected outputs are not known in advance?

To address this issue, we developed our own testing strategy, suited to the uncertain world of machine learning. This approach is based on three main layers of tests, which we will detail in this talk.

By Victor Landeau - Machine Learning Engineer - OUI.sncf

16h10 - 16h40

Laurent Grangeau / Sylvain Lequeux
Cloud Solution Architect - Sogeti / Consultant - Xebia

Applied Machine Learning Done Right
Talk in French 🇨🇵

Artificial intelligence is revolutionizing every field: medicine, pharmacy, automotive, even computing itself. But the multitude of tools available to developers makes model portability and training complicated, non-repeatable and non-scalable. During this talk, we will see how to deploy Kubeflow, a project that harnesses the power of Kubernetes to train machine learning models based on TensorFlow. We will also see how GPUs and Kubernetes can speed up the training phase of each model. Finally, we will see how to train a machine learning model simply using JupyterHub.

By Laurent Grangeau - Cloud Solution Architect - Sogeti
and Sylvain Lequeux - Consultant - Xebia

Alexia Audevart
Data & Enthusiasm - Datactik

Applied Machine Learning Done Right
Talk in French 🇨🇵

TensorFlow is one of the major frameworks for deep learning. However, if you have built models with TensorFlow 1.x, chances are you found the learning curve a bit steep and the usage not always intuitive, especially if you used the low-level APIs! Google and the TensorFlow development community took the feedback into account and redesigned the framework to make it easier to use! That is what I propose we explore together, drawing links to the previous version. Important note: TensorFlow v1.x will no longer be maintained once the TensorFlow v2.0 release hits the shelves 😉

By Alexia Audevart - Data & Enthusiasm - Datactik

18h20 - 20h


Tickets

Super Early Bird

30€

Sold out

Early Bird

70€

Buy ticket

Regular

95€

Coming soon


Speakers

Caroline Goulard

CEO, Dataveyes

Michaël Figuière

Software Engineer, Facebook

Guillaume Laforge

Developer Advocate, Google

Alban Perillat-Merceroz

Software Engineering Manager, Teads

Guillaume Michel

Machine Learning Engineer, Netatmo

Jacek Laskowski

Freelance IT Consultant

Romain Sagean

Data Developer, Xebia

Sylvain Lequeux

Consultant BigData, Xebia

Laurent Grangeau

Cloud Solution Architect, Sogeti

Olga Petrova

Machine Learning DevOps engineer, Scaleway

Jonathan Winandy

Founder and Director, Univalence

Kai Wähner

Technology Evangelist, Confluent

Stéphane Mareek

CEO, DataCumulus

Walid Haouari

Data Engineer, Xebia

Théo Bontempelli

Data Engineer, Deezer

Antoine Isnardy

Data Scientist | NLP Lead, Quantmetry

Alexia Audevart

Data & Enthusiasm, Datactik

Victor Landeau

Machine Learning Engineer, OUI.sncf

Tom Stringer

Junior Data Scientist, Quantmetry

Pauline Nicolas

Data Scientist, Deezer

Xavier Leaute

Software Engineer, Confluent

Pauline Chavallard

Data Science Manager, Doctrine

Thomas Franquelin

Staff Software Engineer, Contentsquare


Meet and Greet

The Meet & Greet evening is the perfect time to discover DataXDay, meet our sponsors, and chat with international speakers and other data lovers in a relaxed atmosphere.

Registration is free, but places are limited.

Registration

Free

Book my ticket


Contact & access

The venue is located in the 11th arrondissement of Paris, a few steps from Philippe Auguste metro station on Line 2 or a 5-minute walk from Charonne metro station on Line 9.

COME TO THE CONFERENCE

PAN PIPER - 4 impasse Lamier, 75011 Paris


Sponsors

Platinum Sponsor

Gold Sponsor


With the support of


Brought to you by

Caroline Goulard

CEO, Dataveyes

Biography

Caroline Goulard co-founded Dataveyes in 2010, a company specializing in human-data interactions. Before that, she studied data journalism, driven by the conviction that the era of rich data will transform the way we work, learn and communicate.

Talk

Keynote: Trust between humans and data

Michaël Figuière

Software Engineer, Facebook

Biography

Michaël Figuière is a software engineer at Instagram, where he focuses on improving its database infrastructure for better efficiency and reliability. Previously, he worked on other large-scale database deployments at Netflix and Apple, and he led the Cassandra drivers team at DataStax.

Talk

Keynote: Database Infrastructure at Instagram Scale

Guillaume Laforge

Developer Advocate, Google

Biography

Guillaume Laforge is a Developer Advocate for Google Cloud.
He focuses on the "serverless" solutions offered by Google Cloud Platform, including Google App Engine and Cloud Functions.
Guillaume is also a Java Champion, the co-founder of the Apache Groovy programming language, and one of the founding members of the tech podcast Les Cast Codeurs.

Talk

Quickies - What's new in serverless on GCP


Alban Perillat-Merceroz

Software Engineering Manager, Teads

Biography

I'm an Engineering Manager at Teads in Montpellier, helping the Analytics team build a platform to make sense of billions of events a day. Previously, I worked on the search engine, the payment engine and more at Viadeo in San Francisco and Paris. I graduated from Epita.

Talk

16h40 - 17h25 / ROOM 1 - GIVE MEANING TO 100 BILLION ANALYTICS EVENTS A DAY, ANALYTICS AT TEADS


Guillaume Michel

Machine Learning Engineer, Netatmo

Biography

Guillaume is a passionate machine learning engineer. He divides his time between software optimization and neural network architecture research. For the last 5 years, he has developed fast computer vision algorithms on embedded devices for NETATMO's security cameras.

Talk

14h55 - 15h15 / ROOM 1 - HOW TO SCALE NEURAL NETWORK ARCHITECTURE SEARCH WITH RABBITMQ AND KUBERNETES


Jacek Laskowski

Freelance IT Consultant

Biography

Jacek Laskowski is a freelance IT consultant, software engineer and technical instructor specializing in Apache Spark, Apache Kafka and Kafka Streams (with Scala and sbt). He offers software development and consultancy services with hands-on, in-depth workshops and mentoring. Reach out to him at jacek@japila.pl or @jaceklaskowski to discuss opportunities.

Talk

17h35 - 18h20 / ROOM 1 - The Internals of Stateful Stream Processing in Spark Structured Streaming


Romain Sagean

Data Developer, Xebia

Biography

A developer for 4 years, Romain "scauglog" is interested in everything that revolves around the Big Data ecosystem. With a background in UX, he likes to bring these two worlds together and secretly codes in JavaScript.

Talk

10h30 - 10h50 / ROOM 1 - DEEP LEARNING IN PRODUCTION AS SEEN BY A DATA ENGINEER


Sylvain Lequeux

Consultant BigData, Xebia

Talk

16h40 - 17h25 / ROOM 2 - KUBEFLOW: TENSORFLOW ON KUBERNETES


Laurent Grangeau

Cloud Solution Architect, Sogeti

Biography

Laurent Grangeau is a Cloud Solution Architect with more than 10 years of experience. A former Java developer, he has since developed in .NET, with agility and DevOps in mind. He has been experimenting with cloud providers for more than 5 years. A Docker enthusiast from the start, he has experience building microservices and distributed systems. He enjoys automating things and running distributed applications at scale.

Talk

16h40 - 17h25 / ROOM 2 - KUBEFLOW: TENSORFLOW ON KUBERNETES


Olga Petrova

Machine Learning DevOps engineer, Scaleway

Biography

Having spent 10 years in theoretical quantum physics, Olga embarked on a new journey into the exciting field of Deep Learning. She is now working as a Machine Learning engineer at Scaleway, where she spends her time building neural networks and shaping cloud products that open the world of AI to everyone willing to try.

Talk

15h25 - 16h10 / ROOM 1 - HARNESSING THE POWER OF GENERATIVE ADVERSARIAL NETWORKS (GANS) FOR SUPERVISED LEARNING


Jonathan Winandy

Founder and Director, Univalence

Biography

Passionate about data and about programs that work, Jonathan specializes in tooling for and analysis of data flows across the various kinds of information systems.

In everyday life, he is a Kafka devotee, especially of The Trial!

Talk

14h00 - 14h45 / ROOM 1 - THE INCREDIBLE EFFECTIVENESS OF LOG UNIFICATION!


Kai Wähner

Technology Evangelist, Confluent

Biography

Kai Waehner works as a Technology Evangelist at Confluent. Kai’s main area of expertise lies within the fields of Big Data Analytics, Machine Learning / Deep Learning, Messaging, Integration, Microservices, Stream Processing, Internet of Things and Blockchain.

Talk

11h15 - 12h00 / ROOM 1 - HOW TO LEVERAGE THE APACHE KAFKA ECOSYSTEM TO PRODUCTIONIZE MACHINE LEARNING


Stéphane Mareek

CEO, DataCumulus

Biography

Stephane is an enthusiastic member of the Apache Kafka community. He’s passionate about helping others learn, leverage and master Apache Kafka, enabling them to solve their various challenges. Stephane is probably best known to the community for his contributions to learning Apache Kafka through video courses on Udemy. To date, over 30,000 students are learning in the “Apache Kafka Series,” an ensemble of eight courses with over 30 hours of video. Stephane has worked in France, the U.S. and Australia. He has implemented Kafka at scale for various clients during his consulting engagements, in fully secured cloud environments.

Talk

12h10 - 12h50 / ROOM 1 - HOW TO USE APACHE KAFKA TO TRANSFORM A BATCH PIPELINE INTO A REAL-TIME ONE?


Walid Haouari

Data Engineer, Xebia

Biography

Walid is a Data Engineer at Xebia. Passionate about data, he is interested in its entire life cycle. Every day he advocates putting it to work for the benefit of society.

Talk

10h30 - 10h50 / ROOM 2 - INCREMENTAL DATA ARCHITECTURE


Théo Bontempelli

Data Engineer, Deezer

Linkedin

Biographie

Théo holds an MS degree in both mathematics and computer science. After one year as a data scientist at Deezer, he switched teams to become a data engineer. He is currently working on data pipelines and architecture.

Conférence

10h - 10h20 / SALLE 2 - FROM JUPYTER NOTEBOOKS TO MACHINE LEARNING IN PRODUCTION

At Deezer, machine learning is at the heart of many aspects of the product. In the analytics team, we work on several tasks such as prediction and forecasting in order to provide meaningful insights for business and product teams. Through the discussions with these teams, we realised that for many projects, there was a real interest in productionalizing these models. This is what guided us to migrate our work from our exploratory Jupyter Notebooks to actual models in production with Scala Spark. For this purpose we have implemented processes from data architecture review to model implementation in Scala Spark. And to illustrate this, we will share with you our experience on churn prediction implementation and explain how data scientists and data engineers worked together for the success of this project.

Antoine Isnardy

Data Scientist | NLP Lead, Quantmetry

Linkedin

Biographie

Antoine Isnardy has been working at Quantmetry since 2017 as an Experienced Data Scientist Consultant. He helps companies (luxury, banking, life services…) integrate AI into their business. He has worked on a wide range of scientific challenges, including natural language processing. Antoine leads Quantmetry’s NLP practice, a dedicated team in charge of bringing state-of-the-art NLP to the market. Antoine is a graduate of Télécom ParisTech and Ensae ParisTech, and holds a Data Science Master’s degree from Ecole Polytechnique.

Conférence

12h10 - 12h50 / SALLE 2 - Natural language processing for emails qualification

With the proliferation of digital channels, a company can receive thousands of emails a day. As a result, an email may be forwarded many times before it reaches the right person to process it.

Technically, the project involves complex statistical methods as well as deep learning algorithms such as convolutional neural networks. We would be thrilled to present the methods used to calibrate and train these algorithms.

Alexia Audevart

Data & Enthusiasm, Datactik

Linkedin

Biographie

Alexia Audevart is “Data & Enthusiasm” and the founder of the company datactik.
She is immersed in data in all its forms: Big Data, Data Science, Machine Learning, Deep Learning and Artificial Intelligence.
When she is not helping her clients get value from their data, she brings together the Toulouse Data Science meetup community.
She is one of the 100 French people shaping AI according to L’Usine Nouvelle and recently joined the Google Developer Expert community for Machine Learning.

Conférence

17h35 - 18h20 / SALLE 2 - TENSORFLOW 1.X N'EST PLUS, VIVE TENSORFLOW 2.0

TensorFlow is one of the major frameworks for deep learning. However, if you have built models with TensorFlow 1.x, chances are you found the learning curve a bit steep and the usage not always intuitive, especially if you used the low-level APIs! Google and the TensorFlow developer community have taken this feedback into account and redesigned the framework to make it easier to use! This is what I propose we go through together, relating it to the previous version. Important note: TensorFlow v1.x will no longer be maintained once the TensorFlow v2.0 release is out ;-)

Victor Landeau

Machine Learning Engineer, OUI.sncf

Linkedin

Biographie

A machine learning engineer at Oui.sncf, I am passionate about combining data science, software craftsmanship and agile methods.

Conférence

Conférence - Quelle stratégie de test pour vos applicatifs de machine learning

At Oui.sncf, we have been using Machine Learning algorithms in some of our production applications for several years now. But this is not without its problems, notably because of the non-deterministic nature of these approaches.

Indeed, how can you confidently develop applications whose expected outputs are not known in advance?

To address this problem, we have developed our own testing strategy, adapted to the uncertain world of Machine Learning. This approach is based on three main layers of tests, which we will detail in this talk.

Tom Stringer

Junior Data Scientist, Quantmetry

Linkedin

Biographie

Tom Stringer joined Quantmetry in 2018 and developed deep knowledge of Natural Language Processing during a project for MAIF, a major French insurer. He notably developed and industrialized a state-of-the-art deep learning model for email classification. The model was then released as an open source library called Melusine. Tom Stringer is a graduate of CentraleSupélec and ESCP Europe, with a specialization in applied mathematics.

Conférence

12h10 - 12h50 / SALLE 2 - Natural language processing for emails qualification

With the proliferation of digital channels, a company can receive thousands of emails a day. As a result, an email may be forwarded many times before it reaches the right person to process it.

The project we would like to present optimizes the routing of emails by using natural language processing algorithms. The package specifically aims at preprocessing emails to better classify them and to describe them by extracting keywords. The package is currently running in production at MAIF, examining 5,000 emails every day. It has also led to an open source project, now available at https://github.com/MAIF/melusine.

Technically, the project involves complex statistical methods as well as deep learning algorithms such as convolutional neural networks. We would be thrilled to present the methods used to calibrate and train these algorithms.

Pauline Nicolas

Data Scientist, Deezer

Linkedin

Biographie

Pauline is a data scientist at Deezer. She works in the analytics team and is currently in charge of churn prediction and clustering projects.
She holds a master's degree in mathematics and data science.

Conférence

10h - 10h20 / SALLE 2 - FROM JUPYTER NOTEBOOKS TO MACHINE LEARNING IN PRODUCTION

At Deezer, machine learning is at the heart of many aspects of the product. In the analytics team, we work on several tasks such as prediction and forecasting in order to provide meaningful insights for business and product teams. Through the discussions with these teams, we realised that for many projects, there was a real interest in productionalizing these models. This is what guided us to migrate our work from our exploratory Jupyter Notebooks to actual models in production with Scala Spark. For this purpose we have implemented processes from data architecture review to model implementation in Scala Spark. And to illustrate this, we will share with you our experience on churn prediction implementation and explain how data scientists and data engineers worked together for the success of this project.

Xavier Leaute

Software Engineer, Confluent

Conférence

Quickies - KAFKA STREAMS, PROFILING, FLAMEGRAPHS, OH MY!

Learn how we optimized one of the more complex Kafka Streams applications we run at Confluent today. In the process you will become familiar with how to analyze and profile applications using tools such as async-profiler and Java Flight Recorder. In the context of stream processing, this means understanding runtime behavior both at the JVM level and in the interaction with the native libraries we often heavily rely on. We will explain how to identify bottlenecks, not only in terms of CPU usage, but also by understanding where your code might be blocking unexpectedly. This talk will go deep down the rabbit hole, from the high-level application structure down to lower-level system calls and back out. Today you need to be able to analyze applications not only on bare metal but also in a container-based world, so we will also talk about the intricacies of performing some of those operations in Docker-based deployment models.

Pauline Chavallard

Data Science Manager, Doctrine

Linkedin

Biographie

After three years in engineering school, she is now a Data Scientist in Paris at Doctrine.fr. Day to day, she contributes to the understanding of legal documents (court decisions and commentaries, legislative texts) using machine learning methods. Her previous experience has given her expertise in data processing and natural language processing.

Conférence

14h - 14h45 / SALLE 2 - STRUCTURING LEGAL DOCUMENTS WITH DEEP LEARNING

Nearly 4 million decisions are delivered each year by French courts. Our goal at Doctrine is to help lawyers get straight to the point by revealing the structure of court decisions. In this talk, we'll present how we tackle the challenging issue of revealing the table of contents of court decisions. We will present the multiple solutions we iteratively developed and will focus on our best model, built in PyTorch, which uses attention mechanisms on top of a paragraph-embedding representation of the text.

Thomas Franquelin

Staff Software Engineer, Contentsquare

Linkedin

Biographie

Thomas has worked for 10 years in companies big and small, across various industries, as a software engineer, tech lead and staff engineer. He has spent a good part of his career in London, where he developed an appreciation for pub crawls and tongue-in-cheek humour.

Conférence

10h30 - 10h50 / SALLE 2 - Confident Data Migration: Automatic regression testing

Over the course of 3 years, ContentSquare changed its main data store twice! We first moved from Redshift to Elasticsearch, then to ClickHouse.
During this time, we had to deal with a vast increase in data volume, and support more and more features. How did we ensure that we didn’t break our application in the process?
We’ll talk about a simple way to achieve this: replaying production load against the legacy and new systems at the same time, and studying the statistical differences between the two in order to pinpoint regressions. We’ll see that this method also makes for a coarse-grained, but fairly realistic, load test.
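
To make the replay idea above more concrete, here is a minimal sketch in Python. It is not ContentSquare's implementation: the function names (`replay`, `summarize`), the query shape, the 1% tolerance and the two stand-in backends are all assumptions made for illustration. Each recorded query is sent to both systems, the relative difference between the two answers is recorded, and the differences are summarized so that regressions stand out.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical types: a recorded production query, and a backend that
# answers a query with a single numeric metric.
Query = Dict[str, str]
Backend = Callable[[Query], float]


def replay(queries: List[Query], legacy: Backend, candidate: Backend) -> List[float]:
    """Send every recorded query to both backends and return relative differences."""
    diffs = []
    for query in queries:
        old = legacy(query)
        new = candidate(query)
        # Guard against division by zero when the legacy answer is 0.
        denominator = max(abs(old), 1e-9)
        diffs.append(abs(new - old) / denominator)
    return diffs


def summarize(diffs: List[float], tolerance: float = 0.01) -> Dict[str, float]:
    """Summarize differences; a query is a mismatch when it exceeds the tolerance."""
    mismatches = [d for d in diffs if d > tolerance]
    return {
        "count": len(diffs),
        "mismatch_rate": len(mismatches) / len(diffs) if diffs else 0.0,
        "median_rel_diff": statistics.median(diffs) if diffs else 0.0,
        "max_rel_diff": max(diffs, default=0.0),
    }


# Stand-in backends (assumptions): in practice these would query the
# legacy store and the new store respectively.
def legacy_backend(query: Query) -> float:
    return float(len(query["page"]) * 10)


def candidate_backend(query: Query) -> float:
    value = float(len(query["page"]) * 10)
    # Simulated regression on a single page.
    return value * 1.5 if query["page"] == "checkout" else value


recorded = [{"page": "home"}, {"page": "cart"}, {"page": "checkout"}]
report = summarize(replay(recorded, legacy_backend, candidate_backend))
print(report)
```

A summary of this kind can be tracked per query type, so that a rising mismatch rate points at the feature that regressed rather than at the migration as a whole.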