The 4th Conference on Statistics and Data Science will be organized in collaboration with the Department
of Statistics at the Federal University of Bahia, Brazil.
The purpose of CSDS 2022 is to bring together researchers and practitioners from academia
and industry who develop and apply statistical and computational methods for data
science. The conference will provide a forum to share and discuss ways to improve
access to knowledge and to promote interdisciplinary collaborations.
The scientific program is aimed at statisticians and data scientists
interested in quantitative methods for decision making and will include plenary talks,
invited sessions, short courses, round tables, and contributed posters.
NOTE: For a paper to be included in the scientific program, its abstract must be approved by the
Scientific Program Committee and the authors must have submitted the four-slide poster by November 20, 2022.
Alexandra M. Schmidt is Professor of Biostatistics and holds the endowed University Chair in the Department of Epidemiology, Biostatistics and Occupational Health (EBOH) at McGill University. She is an Elected Fellow of the American Statistical Association (2020) and an Elected Member of the International Statistical Institute (2010). She was awarded the Distinguished Achievement Medal (2017) from the American Statistical Association’s Section on Statistics and the Environment and the Abdel El-Shaarawi Young Investigator Award (2008) from The International Environmetrics Society. Her main area of research is the development of flexible spatial and spatio-temporal models.
Degree in Mathematics and Master in Statistics, University of São Paulo/Brazil, and PhD in Biostatistics, University of North Carolina at Chapel Hill/USA. Professor at Federal University of Santa Catarina, working in the Graduate Programs: PPGEP, Department of Production Engineering, and PPGMGA, Department of Informatics and Statistics. Associate Researcher at Vunesp Foundation and Consultant at INEP/MEC in Quantitative Methods for Educational Assessment. He has experience in the areas of Probability and Statistics, with an emphasis on Data Analysis, working mainly on the following topics: Item Response Theory, Educational Assessment, Latent Variable Models, Longitudinal Data and Linear and Nonlinear Hierarchical/Multilevel Models.
Genevera Allen is an Associate Professor of Electrical and Computer Engineering, Statistics, and Computer Science at Rice University and an investigator at the Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital and Baylor College of Medicine. She is also the Founding Director of the Rice Center for Transforming Data to Knowledge, informally called the Rice D2K Lab.
Dr. Allen’s research develops new statistical machine learning tools to help people make reproducible data-driven discoveries. She is known for her methods and theory work in the areas of unsupervised learning, interpretable machine learning, data integration, graphical models, and high-dimensional statistics. Her work is often motivated by solving real scientific problems, especially in the areas of neuroscience and bioinformatics. Dr. Allen is also a leader in data science education. In 2018, she founded the Rice D2K Lab, a campus hub for experiential learning and data science education. Through her leadership of the D2K Lab, Dr. Allen developed new interdisciplinary data science degree programs, established a novel capstone program in data science and machine learning, and led Rice’s engagement with corporate and community partners in data science.
Dr. Allen is the recipient of several honors for both her research and educational efforts, including a National Science Foundation Career Award, Rice University’s Duncan Achievement Award for Outstanding Faculty, the Curriculum Innovation Award, and the School of Engineering’s Research and Teaching Excellence Award. In 2014, she was named to the Forbes “30 Under 30: Science and Healthcare” list. She is also an elected fellow of the International Statistical Institute and the American Statistical Association. Dr. Allen currently serves as an Action Editor for the Journal of Machine Learning Research and a Series Editor for Springer Texts in Statistics. Dr. Allen received her Ph.D. in statistics from Stanford University, under the mentorship of Prof. Robert Tibshirani, and her bachelor’s degree, also in statistics, from Rice University.
McGill University, Canada
American Statistical Association - ASA
Coupled Markov switching count models for monitoring the spread of infectious diseases
Spatio-temporal counts of infectious disease cases often contain an excess of zeros. It is important for decision makers to identify periods of persistence (presence to presence) and reemergence (absence to presence) of a disease. Similarly, when modelling hospital admissions it is of interest to identify epidemic or endemic periods to predict hospital capacity. In this talk I will discuss a class of coupled nonhomogeneous Markov switching models that addresses these issues. Inference and prediction are performed under the Bayesian paradigm. To showcase the ability of the proposed models in addressing the above issues we analyze spatio-temporal counts of dengue fever cases in Rio de Janeiro and COVID-19 hospital admissions in the 30 largest Quebec hospitals. This is joint work with Dirk Douwes-Schultz, PhD student in the Program of Biostatistics, McGill University.
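The two transitions named in the abstract, persistence (presence to presence) and reemergence (absence to presence), can be illustrated with a naive empirical summary of a single count series. The sketch below is not the coupled Markov switching model from the talk: it assumes that every observed zero is a true absence, whereas the talk treats excess zeros with a latent-state model. The function name and the example counts are hypothetical.

```python
def presence_transitions(counts):
    """Empirical persistence and reemergence rates of a count series.

    Assumption (for illustration only): a week with zero reported cases
    is treated as true absence of the disease.
    """
    present = [c > 0 for c in counts]
    from_present = from_absent = persist = reemerge = 0
    for prev, curr in zip(present, present[1:]):
        if prev:
            from_present += 1
            persist += curr      # presence -> presence
        else:
            from_absent += 1
            reemerge += curr     # absence -> presence
    persistence = persist / from_present if from_present else float("nan")
    reemergence = reemerge / from_absent if from_absent else float("nan")
    return persistence, reemergence


# Hypothetical weekly case counts for one district.
weekly_cases = [0, 0, 3, 5, 2, 0, 0, 0, 1, 4, 0, 2]
persistence, reemergence = presence_transitions(weekly_cases)
print(f"persistence: {persistence:.2f}, reemergence: {reemergence:.2f}")
```

In a nonhomogeneous model these two probabilities would depend on covariates and on neighbouring areas rather than being constants estimated from one series.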
Federal University of Bahia, Brazil.
Union College in Schenectady, NY, USA.
Dr. Roger W. Hoerl is the Brate-Peschel Associate Professor of Statistics at Union College in Schenectady, NY. Previously, he led the Applied Statistics Lab at GE Global Research. While at GE, Dr. Hoerl led a team of statisticians, applied mathematicians, and computational financial analysts who worked on some of GE’s most challenging research problems, such as developing personalized medicine protocols, enhancing the reliability of aircraft engines, and management of risk for a half-trillion dollar portfolio.
Dr. Hoerl has been named a Fellow of the American Statistical Association and the American Society for Quality, and has been elected to the International Statistical Institute and the International Academy for Quality. He has received the Brumbaugh and Hunter Awards, as well as the Shewhart Medal, from the American Society for Quality, and the Founders Award and Deming Lectureship Award from the American Statistical Association. While at GE Global Research, he received the Coolidge Fellowship, honoring one scientist a year from among the four global GE Research and Development sites for lifetime technical achievement. His book with Ron Snee, Statistical Thinking: Improving Business Performance, now in its 3rd edition, was called “the most practical introductory statistics textbook ever published in a business context” by the journal Technometrics.
Faculty of Math and Science at Brock University, Canada.
Professor S. Ejaz Ahmed is Professor of Statistics and Dean of the Faculty of Math and Science at Brock University, Canada. Previously, he was Professor and Head of the Mathematics and Statistics Department at the University of Windsor, Canada, and at the University of Regina, Canada, as well as Assistant Professor at the University of Western Ontario, Canada. He holds adjunct professorships at many Canadian and international universities. He has supervised more than 20 Ph.D. students and organized several international workshops and conferences around the globe. He is a Fellow of the American Statistical Association and held the prestigious ASEAN Chair Professorship. His areas of expertise include big data analysis, statistical learning, and shrinkage estimation strategy. He has authored several books and has edited or co-edited several volumes and special issues of scientific journals. He has been the Technometrics Review Editor for the past ten years and is editor or associate editor of many statistical journals. Overall, he has published more than 200 articles in scientific journals and reviewed more than 100 books. He served on the Board of Directors of the Statistical Society of Canada and was Chairman of its Education Committee. He was also Vice President of Communications for The International Society for Business and Industrial Statistics (ISBIS), as well as a member of the "Discovery Grants Evaluation Group" and the "Grant Selection Committee" of the Natural Sciences and Engineering Research Council of Canada.
SERASA EXPERIAN, Brazil.
Frederico is passionate about learning and about supporting decision making with data science approaches suited to the scenario at hand. After a BSc, MSc and PhD in Statistics from the University of São Paulo, he gained a mix of experience ranging from banking (Citi and HSBC) and consulting (freelance and Moody’s Analytics) to insurance (LexisNexis and MAPFRE) and credit risk bureaux (Boa Vista Serviços and Serasa Experian).
Secretary-General of ABJ, Brazil
R, Zero to Hero: Reproducible Reports.
Have you ever thought about writing a report that updates automatically? Imagine being able to use the same document format to generate reports, exercise lists, books, presentations, and even websites, all while mixing text and code in a fully reproducible way. In this short course, participants will build a report from scratch and publish it, using R and Quarto, which was released in 2022. Click here for reference material.
PhD candidate in Statistics at IME-USP. Secretary-General of the Brazilian Jurimetrics Association (Associação Brasileira de Jurimetria, ABJ). Partner at Terranova consultoria and Curso-R treinamentos. Assistant professor of Data Science and Decision at Insper.
University of Porto
The Catholic Porto Business School
Symbolic Data Analysis: Parametric Multivariate Analysis of Interval Data
Symbolic Data Analysis is concerned with analysing data with intrinsic variability, which is to be taken into account. In Data Mining, Multivariate Data Analysis and classical Statistics, the elements under analysis are generally individual entities for which a single value is recorded for each variable - e.g., individuals, described by age, salary, education level, etc. But when the elements of interest are classes or groups of some kind - the citizens living in given towns; car models, rather than specific vehicles - then there is variability inherent to the data. Symbolic data goes beyond the usual data representation model, considering variables whose observed values for each element are no longer necessarily single real values or categories, but may assume the form of sets, intervals, or, more generally, distributions. In this Tutorial we focus on the analysis of interval data, i.e., when the variables’ values are intervals of ℝ, adopting a parametric approach. The proposed modelling allows for multivariate parametric analysis; in particular (M)ANOVA, discriminant analysis, model-based clustering, robust estimation and outlier detection are addressed. The modelling and methods referred to are implemented in the R package MAINT.Data, available on CRAN.
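To make the idea of an interval-valued observation concrete, the sketch below encodes each interval by its midpoint and the logarithm of its range, a representation commonly used when fitting parametric (for example Gaussian) models to interval data. That the tutorial uses exactly this encoding is an assumption, and the code is written in Python for illustration only; it does not use or reproduce the MAINT.Data API. The function names and the town data are hypothetical.

```python
import math


def to_mid_logrange(lower, upper):
    """Encode the interval [lower, upper] as (midpoint, log of range)."""
    if upper <= lower:
        raise ValueError("a non-degenerate interval needs upper > lower")
    return (lower + upper) / 2, math.log(upper - lower)


def from_mid_logrange(mid, log_range):
    """Decode (midpoint, log of range) back to the interval bounds."""
    half = math.exp(log_range) / 2
    return mid - half, mid + half


# Hypothetical data: age ranges observed for the citizens of three towns.
town_ages = {"Town A": (18.0, 65.0), "Town B": (25.0, 40.0), "Town C": (30.0, 90.0)}
for town, (low, high) in town_ages.items():
    mid, log_range = to_mid_logrange(low, high)
    print(f"{town}: midpoint={mid:.1f}, log-range={log_range:.3f}")
```

Taking the logarithm of the range maps a strictly positive quantity to the whole real line, so both coordinates can be modelled without constraints.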
Paula Brito is Associate Professor at the Faculty of Economics of the University of Porto, and member of the Artificial Intelligence and Decision Support Research Group (LIAAD) of INESC TEC, Portugal. She holds a doctorate degree in Applied Mathematics from the University Paris Dauphine, and a Habilitation in Applied Mathematics from the University of Porto. Her current research focuses on the analysis of multidimensional complex data, known as symbolic data, for which she develops statistical approaches and multivariate analysis methodologies. She has been involved in two European research projects and coordinated the Portuguese participation in the H2020 FinTech project. Paula Brito was president of the International Association for Statistical Computing (IASC-ISI) in 2013-2015. She has authored a large number of papers in highly ranked journals in her field, has been an invited speaker at several international conferences, is regularly a member of international program committees, and has been chair of the international conferences COMPSTAT 2008 and IFCS 2022.
Pedro Duarte Silva is an Associate Professor at the Catholic Porto Business School, and member of its research center (CEGE). He holds a doctorate degree in Business Administration from the Terry College of Business of the University of Georgia. His research focuses on the intersection between Data Analysis and Machine Learning, Multivariate Statistics and Operations Research, with a particular focus on the development of novel methodologies for the analysis of big and complex data. He is the author of numerous communications at reputed scientific conferences and his research has been published in highly ranked scientific journals such as the European Journal of Operational Research, Computational Statistics and Data Analysis, Decision Sciences, Computational Statistics, and the Journal of Multivariate Analysis.
Federal University of Santa Catarina, Brazil
Statistical Methods in Educational Assessment: Theory, Applications, Computational Aspects and Challenges.
In this lecture we will present and discuss the main models and concepts of Item Response Theory (IRT), for the measurement of the latent trait proficiency, and of hierarchical/multilevel modeling, for the study of factors associated with proficiency. Theoretical, applied and computational aspects will be covered, with the aim of pointing out some research challenges and topics. Applications of IRT in other areas, such as Psychiatry, Nutrition, Physiotherapy and Engineering, will also be presented.
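As a concrete instance of an IRT model, the sketch below evaluates the two-parameter logistic (2PL) item response function, P(correct | theta) = 1 / (1 + exp(-a (theta - b))), where theta is the latent proficiency, a the item discrimination and b the item difficulty. The 2PL model is chosen here only as a familiar example; the abstract does not say which models the lecture covers, and the item parameters are hypothetical.

```python
import math


def irt_2pl(theta, a, b):
    """Probability of a correct response under the 2PL logistic model.

    theta: latent proficiency of the respondent
    a:     item discrimination (slope)
    b:     item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with discrimination 1.2 and difficulty 0.5.
for theta in (-1.0, 0.5, 2.0):
    p = irt_2pl(theta, a=1.2, b=0.5)
    print(f"theta={theta:+.1f} -> P(correct)={p:.3f}")
```

A respondent whose proficiency equals the item difficulty has a 50% chance of answering correctly, and a larger discrimination makes the curve steeper around that point.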
Paraná Federal University (UFPR), Brazil.
Wagner Hugo Bonat is a Researcher and Lecturer in the Department of Statistics at Paraná Federal University (UFPR), where he has been since 2010. He is the Head of the Data Science and Big Data program (DSBD) and a member of the Laboratory of Statistics and Geoinformation (LEG). He received a B.S. from Paraná Federal University in 2008 and an M.S. from the same university in 2010. He received his Ph.D. in Mathematics and Computer Science from the University of Southern Denmark in 2016. His research focuses on statistical modelling and estimating functions. Much of his work has been on extending the generalized linear model class to deal with multiple response variables. His main contribution is a new class of multivariate regression models called Multivariate Covariance Generalized Linear Models (McGLMs) and the associated R package (mcglm).
California State University, San Bernardino (CSUSB), USA.
Sastry G. Pantula, Dean of the College of Natural Sciences at California State University, San Bernardino (CSUSB), is nationally and internationally recognized as a leader in statistical sciences. Most recently, he served as the Director of Data Analytics programs and a Professor of Statistics at Oregon State University. He served as dean of the College of Science at Oregon State University for four years, from August 2013 to August 2017, after serving a three-year term as Director for the Division of Mathematical Sciences at the National Science Foundation. Pantula spent more than 30 years as a statistics professor at North Carolina State University (NCSU), where he began his academic career in 1982. At NCSU, he also served as the Director of Graduate Programs (1994-2002) and the Head of the Department of Statistics (2002-2010). In all of his administrative roles, he has focused on enhancing the quality, quantity and diversity within the department, the division and the college. His core values are excellence, diversity and harmony: strive for excellence, enhance diversity and foster harmony.
He is a Fellow of the American Association for the Advancement of Science (AAAS) and the American Statistical Association (ASA). He served as ASA president in 2010 and received the ASA Founders Award in 2014. Pantula is a member of the honor societies Phi Kappa Phi, Sigma Xi and Mu Sigma Rho. He is also a member of the NCSU Academy of Outstanding Teachers. Pantula received bachelor’s and master’s degrees in statistics from the Indian Statistical Institute in Kolkata, India, and a Ph.D. in statistics from Iowa State University.
Erasmus School of Economics (ESE), Netherlands.
Patrick J.F. Groenen is a professor of statistics at the Erasmus School of Economics (ESE). He currently is also dean of that school. Professor Groenen's work focuses on data science techniques and their numerical algorithms. He is the co-author of several textbooks on multidimensional scaling published by Springer and has published articles in the top peer-reviewed journals including, among others, the Journal of Machine Learning Research, the Journal of Marketing Research, Psychological Methods, Psychometrika, the Journal of Classification, Computational Statistics and Data Analysis, the British Journal of Mathematical and Statistical Psychology, and the Journal of Empirical Finance.
Federal University of São Carlos (UFSCar), Brazil.
Uncertainty Quantification in Machine Learning
Machine learning methods have an increasing ability to create models with good predictive power. However, even the best models make mistakes. To mitigate the effect of these errors on subsequent decision-making, it is essential to be able to quantify the uncertainty associated with each prediction. In this seminar, I will discuss methods of uncertainty quantification that I have recently developed.
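One widely used way to attach uncertainty to individual predictions is split conformal prediction, sketched below. It is shown only as a generic, well-known technique; the abstract does not name the speaker's methods, and nothing here should be read as describing them. The helper name, the toy data and the pre-fitted model `predict` are hypothetical.

```python
import math
import random


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for x_new with target coverage 1 - alpha.

    Uses absolute residuals on a held-out calibration set as
    nonconformity scores (split conformal prediction).
    """
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the conformal quantile
    if k > n:                              # too few calibration points
        return -math.inf, math.inf
    q = scores[k - 1]
    center = predict(x_new)
    return center - q, center + q


def predict(x):
    """Hypothetical model, assumed to have been fitted on separate data."""
    return 2.0 * x


# Toy calibration set: y = 2x + standard normal noise.
rng = random.Random(0)
calib_x = [rng.uniform(0, 10) for _ in range(200)]
calib_y = [2.0 * x + rng.gauss(0, 1) for x in calib_x]

lo, hi = split_conformal_interval(predict, calib_x, calib_y, x_new=4.0)
print(f"90% prediction interval at x=4: [{lo:.2f}, {hi:.2f}]")
```

The coverage guarantee holds on average over calibration and test draws and relies on exchangeability of the data; it does not require the model itself to be correct.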
Rafael is an Assistant Professor at the Department of Statistics of the Federal University of São Carlos (UFSCar), Brazil. He obtained his PhD degree in the Department of Statistics & Data Science at Carnegie Mellon University (CMU), USA. Prior to that, he completed his undergraduate studies and received a Master’s degree at the University of São Paulo. He is a CNPq Research Fellow and is interested in theory, methodology, applications, and foundations of statistics and machine learning.
Active learning: A way to cope with large unlabelled data sets
In an age where people produce large amounts of data, labeling such data often becomes extremely costly. This annotation problem arises mainly in fields that require specialists who are hard to reach or who have little time to dedicate to curating a data set. One strategy to get around this problem is active learning, which uses machine learning to learn from a small amount of annotated data and then annotate large volumes of unlabelled data with the help of an oracle. In this talk, I will discuss these issues and how the active learning field helps to solve them.
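The loop described above (train on few labels, ask the oracle about the most informative point, repeat) can be sketched with a deliberately simple case: a one-dimensional threshold classifier with pool-based uncertainty sampling. This is a toy illustration, not the methods covered in the talk; the oracle, the threshold value 0.62 and the function names are hypothetical, and the sketch assumes the smallest pool value is negative and the largest is positive.

```python
def oracle(x, threshold=0.62):
    """Stand-in for the human expert who returns the true label on request."""
    return int(x >= threshold)


def active_learn_threshold(pool, ask, budget):
    """Pool-based active learning for a 1-D threshold classifier.

    Each round queries the unlabelled point closest to the current
    decision boundary (the most uncertain one) and then tightens the
    bracket [lo, hi] that is assumed to contain the true boundary.
    """
    lo, hi = min(pool), max(pool)
    labelled = {}
    for _ in range(budget):
        boundary = (lo + hi) / 2
        candidates = [x for x in pool if x not in labelled]
        if not candidates:
            break
        query = min(candidates, key=lambda x: abs(x - boundary))
        labelled[query] = ask(query)
        if labelled[query] == 1:
            hi = min(hi, query)   # smallest known positive
        else:
            lo = max(lo, query)   # largest known negative
    boundary = (lo + hi) / 2
    predictions = {x: int(x >= boundary) for x in pool}
    return predictions, labelled


# 1000 unlabelled points, but only 20 oracle queries allowed.
pool = [i / 1000 for i in range(1000)]
predictions, labelled = active_learn_threshold(pool, oracle, budget=20)
accuracy = sum(predictions[x] == oracle(x) for x in pool) / len(pool)
print(f"labels requested: {len(labelled)}, accuracy on pool: {accuracy:.3f}")
```

Because each query roughly halves the region where the boundary can lie, a handful of labels is enough to annotate the whole pool, whereas labelling points at random would need far more queries for the same precision.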
Luciano Rebouças holds a Ph.D. in Electrical and Computer Engineering from the Institute of Systems and Robotics, University of Coimbra, as well as a master's degree in Mechatronics and a bachelor’s degree in Computer Science from the Federal University of Bahia (UFBA). He is an Associate Professor at the Dept. of Computer Science, Institute of Computing, UFBA, and head of the Intelligent Vision Research Lab (http://ivisionlab.ufba.br). He is a specialist in Computer Vision and Machine Learning, and his applied research focuses mainly on robotics, smart cities, biometric systems, and biomedicine.
Federal University of Paraná, Brazil.
Convolutional Support Vector Model: prediction of coronavirus disease using chest x-rays
The disease caused by the coronavirus (COVID-19) has been plaguing the world for the last two years. In this work, a complete applied study of convolutional support vector machines is presented for classifying patients infected with COVID-19 from chest X-ray data, with a comparison against traditional convolutional neural networks (CNNs). Based on the fitted models, the proposed convolutional support vector machine with a polynomial kernel showed better predictive performance. In addition to the results on real images, the behavior of the models was studied on simulated images, which highlighted the advantages of support vector machine (SVM) models.
Bachelor's degree (2009) and Master's degree (2011) in Statistics from the Federal University of São Carlos (UFSCar). PhD in Statistics (2016) through the Graduate Programs in Statistics (PPGEst-UFSCar) and Graduate Studies in Computer Science (PPG-CC-UFSCar). Lecturer in the Specialization in Data Science & Big Data (DSBD-UFPR), the MBA in Financial Analytics (DAAGE-UTFPR) and the Specialization in Data Science and Big Data (ECD-UFBA). Since August 2021, Assistant Professor at the Department of Statistics, Federal University of Paraná (DEST-UFPR), Curitiba-PR, Brazil. Assistant Professor at the Department of Statistics, Federal University of Bahia (DEST-UFBA), Salvador-BA, Brazil (2017-2021). Lecturer at the Faculty of Technology SENAI-SP, São Carlos-SP, Brazil (2009-2015). His research areas include statistical machine learning, statistical inference, computational methods and big data analytics.
Rice University, USA.
Fast Minipatch Ensemble Strategies for Discovery and Inference
Enormous quantities of data are collected in many industries and disciplines; this data holds the key to solving critical societal and scientific problems. Yet, fitting models to make discoveries from this huge data often poses both computational and statistical challenges. In this talk, we propose a new ensemble learning strategy primed for fast, distributed, and memory-efficient computation that also has many statistical advantages. Inspired by random forests, stability selection, and stochastic optimization, we propose to build ensembles based on tiny subsamples of both observations and features that we term minipatches. While minipatch learning can easily be applied to prediction tasks similarly to random forests, this talk focuses on using minipatch ensemble approaches in unconventional ways: making data-driven discoveries and for statistical inference. Specifically, we will discuss using this ensemble strategy for feature selection, clustering, and graph learning as well as for distribution-free and model-agnostic inference for both predictions and important features. Through huge real data examples from neuroscience, genomics and biomedicine, we illustrate the computational and statistical advantages of our minipatch ensemble learning approaches.
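The core idea of a minipatch, a tiny random subsample of both observations and features, can be sketched for feature selection as follows. This is a simplified illustration under stated assumptions, not the speaker's algorithm: the base selector (an absolute-correlation cutoff), the cutoff values, the patch sizes and the simulated data are all hypothetical choices made to keep the example short.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def minipatch_selection(X, y, n_patches=500, n_rows=30, n_feats=4,
                        corr_cutoff=0.5, seed=0):
    """Selection frequency of each feature across random minipatches.

    Every minipatch draws n_rows observations and n_feats features; a
    feature counts as selected in a patch when its absolute correlation
    with y, computed on that patch only, reaches corr_cutoff.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    sampled = [0] * p
    selected = [0] * p
    for _ in range(n_patches):
        rows = rng.sample(range(n), n_rows)
        feats = rng.sample(range(p), n_feats)
        y_sub = [y[i] for i in rows]
        for j in feats:
            sampled[j] += 1
            x_sub = [X[i][j] for i in rows]
            if abs(pearson(x_sub, y_sub)) >= corr_cutoff:
                selected[j] += 1
    return [s / c if c else 0.0 for s, c in zip(selected, sampled)]


# Simulated data: only features 0 and 5 influence the response.
data_rng = random.Random(1)
n, p = 300, 20
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 3 * row[5] + data_rng.gauss(0, 0.5) for row in X]

freq = minipatch_selection(X, y)
stable = [j for j, f in enumerate(freq) if f >= 0.5]
print("features with selection frequency >= 0.5:", stable)
```

Each patch touches only 30 rows and 4 columns, so the patches are cheap, independent of one another, and easy to distribute; the aggregation step is just a pair of counters per feature.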
(Duke University, USA)
Introduction to Some of Data Science
We review some of the applications of data science in classification, cluster analysis, and text analytic applications. Topics include support vector machines, random forests, boosting, hierarchical agglomerative clustering, mixture models, and latent Dirichlet allocation.
David Banks is a professor of statistics at Duke University and a fellow of the ASA, IMS and AAAS. He is a past editor of the Journal of the American Statistical Association and founding editor of Statistics and Public Policy. His research areas include agent-based models, adversarial risk analysis, dynamic networks, text data, and human rights statistics.
Federal University of Bahia, Brazil.
Statistical Process Control (SPC) for overdispersed count and unit data using R
Strong competition in today's market makes the pursuit of excellence essential for companies. In this context, Statistical Process Control (SPC) is an important and widely used tool. One of its main techniques is the control chart, which shows whether or not a process is in statistical control. Two types of data that have been receiving a lot of attention in the SPC literature are: (i) count data, which are present in various everyday situations (such as the number of nonconforming/defective items in a production line, the number of COVID-19 cases or deaths per epidemiological week, etc.) and often exhibit overdispersion (that is, variance greater than the mean); and (ii) continuous data in the interval (0,1), or unit data, with applications in a wide range of areas, such as ecology, economics and industry, among others (relative air humidity and inflation rate are some examples of unit variables). The main objective of this short course is therefore to discuss SPC in this framework and to present and make available R functions that generate control charts for the statistical monitoring of non-normal processes (e.g., count and unit data) via classical and Bayesian inferential approaches.
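Why overdispersion matters for control charts can be seen in a small numerical sketch. It compares classical three-sigma limits that assume Poisson counts (variance equal to the mean, as in a c-chart) with limits based on the sample standard deviation. The course itself works in R with its own functions; the Python code below is an independent illustration, and the function names and the count series are hypothetical.

```python
import math


def dispersion_index(counts):
    """Sample variance divided by sample mean; values above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


def shewhart_limits(counts, poisson=True):
    """Three-sigma control limits for a count series.

    poisson=True  assumes variance = mean (classical c-chart).
    poisson=False uses the sample standard deviation instead.
    The lower limit is truncated at zero because counts cannot be negative.
    """
    n = len(counts)
    mean = sum(counts) / n
    if poisson:
        sd = math.sqrt(mean)
    else:
        sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / (n - 1))
    return max(0.0, mean - 3 * sd), mean + 3 * sd


# Hypothetical counts of nonconforming items in 12 consecutive samples.
counts = [2, 0, 7, 1, 12, 3, 0, 9, 4, 15, 1, 6]

print(f"dispersion index: {dispersion_index(counts):.2f}")
for label, flag in (("Poisson", True), ("empirical", False)):
    lcl, ucl = shewhart_limits(counts, poisson=flag)
    signals = [c for c in counts if c < lcl or c > ucl]
    print(f"{label}: LCL={lcl:.2f}, UCL={ucl:.2f}, out-of-control points={signals}")
```

With overdispersed counts the Poisson-based limits are too narrow and flag ordinary variation as signals, which is one motivation for charts built on distributions that allow the variance to exceed the mean.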
Paulo Henrique Ferreira da Silva received the B.Sc., M.Sc. and Ph.D. degrees in statistics from the Federal University of São Carlos (UFSCar), Brazil, in 2009, 2011 and 2015, respectively. He is currently a Professor of Statistics at the Institute of Mathematics and Statistics, Federal University of Bahia (UFBA), Brazil. He completed postdoctoral training at the University of São Paulo (USP), Brazil, in 2019. His main research interests include survival and reliability analysis, data mining, and statistical process control.
Federal University of Sergipe (UFS)
Federal University of Bahia (UFBA)