DATA PRODUCTION AND ANALYSIS | Università degli studi di Bergamo

Single-subject course unit
Course code: 110010-ENG

Course details

For students enrolled in the 1st year in academic year: 2021/2022
Course name (Italian): DATA PRODUCTION AND ANALYSIS
Course name (English): Data Production and Analysis
Type of course unit: Core (caratterizzante) course unit
Type of course: Compulsory
Scientific-disciplinary sector: ECONOMIC STATISTICS (SECS-S/03)
Year of study: 1
Academic year offered: 2021/2022
Credits: 6
Course coordinator: 
Other lecturers: 

Other course information

Delivery mode: Conventional (in-person) teaching
Language: English
Term: First semester
Compulsory attendance: No
Contact hours: 48
Independent study hours: 102
Subject area: Statistics and mathematics

Prerequisites

None

Educational goals

“Data Production and Analysis” provides a solid background for professional activities in data production, collection, treatment and analysis, and aims to support students in empirical research in different fields (such as statistics, economics and the social sciences). The course focuses on the official data production system, within the framework of the modernization and coordination of official statistics producers. On the one hand, it fosters a professional approach grounded in official statistics, possibly integrated with other available sources. On the other hand, it introduces the most common data collection and production methods (covering both official statistics and survey data). Such data can be used in various fields (economic and social data, health, and so forth) and for various purposes.

Students will learn to apply appropriate methods at the different steps of the data production or collection process (such as quality checks, missing data treatment, imputation procedures and outlier detection) in order to check or improve data quality. Students will also practice the data production and analysis “cycle” by working on practical cases in both lectures and laboratory sessions. Their ability to process, analyse and model data will be further developed through hands-on use (and programming) of SAS, one of the most widely used statistical software packages worldwide.

The main specific course objectives are:
• Students will be able to design and manage data production processes in different fields.
• Students will be able to assess and improve data quality (understanding the main dimensions of data quality, learning to detect and evaluate data issues, and applying suitable methods to optimize data quality).
• Students will learn how to analyse data with a hierarchical structure (multilevel modelling).
• Students will be able to critically choose and properly use different types of data sources (including censuses, cross-sectional or longitudinal surveys, and administrative sources).
• Students will become familiar with the most common business concepts related to official statistics production (e.g., Generic Statistical Business Process Model, data archiving, metadata management, standard statistical classifications, imputation).

The course is fully consistent with the educational aims of the EMOS (European Master in Official Statistics) label, of which it is a key component, as well as with those of the Master's programme in EDA (Economics and Data Analysis).

Course content

The course is organized into two consecutive and complementary modules.

Module 1 (teacher: Daniele Toninelli)

• GSBPM (Generic Statistical Business Process Model) step by step: how an official statistics business model is organized and how it works in data production processes.
• Role of metadata: enhancing the quality standards of the information produced and providing best practices for clearly communicating, sharing and visualizing statistical or quantitative outputs of any type.
• Data editing and imputation methods in practice: how to professionally fix the most common issues in collected raw data, optimizing their quality, usability and trustworthiness.
• Introduction to the SAS statistical software through a user-friendly interface: how to use SAS Enterprise Guide, and first steps in SAS programming.
• Multilevel modelling: when it is useful, how to estimate such models with SAS, and how to interpret and use the main outputs for decision making in practical contexts.

Module 2 (teacher: Annamaria Bianchi)

• The steps in the data production process and the decisions required at each step; awareness of the interactions between the different steps and of their pros and cons for statistical purposes.
• Probability-based surveys: basic concepts, sampling methods, modes of data collection, errors and the total survey error paradigm, quality framework, European Statistics Code of Practice, questionnaire design, non-response analysis and non-response correction methods, estimation.
• Sample selection, estimation and non-response analysis with SAS.
• Non-probability samples: convenience samples, quota samples, volunteer web panels.
• Coverage and self-selection problems in non-probability samples.

Teaching methods

The course includes both lectures and lab sessions with constant teacher-student interaction and discussion, and it encourages active student participation.
Thematic seminars and workshops (e.g., on programming with SAS) will also be offered.

Personal research projects can also be proposed to students.

Assessment and Evaluation

The course exam is organized into two parts, corresponding to the topics of Module 1 and Module 2; where possible, the two parts will be held 1 to 5 days apart. The maximum score for each part is 31 (i.e., 30 cum laude).
The final grade is the simple average of the two scores.

The assessment of each module may be based on:

• Module 1: a written final exam on theory, including true/false or multiple-choice questions, open-ended questions and/or other types of questions (e.g., short exercises or applications). In addition, proficiency in SAS will be assessed through a practical data analysis challenge and/or personal analyses developed by students.
• Module 2: a written exam including theoretical questions (true/false, multiple-choice and open-ended questions) and exercises.

The final exam score for each module may be supplemented by:
• Presentations (by individual students or groups) of case studies, research results and/or in-depth discussions of specific course topics.
• Evaluation of assignments set by the teachers, including case studies, reports and presentations to classmates.
• Other periodic assessments during the course.

The final (averaged) exam results will be published on the “Sportello internet” (the university's online student services portal); detailed Module 1 exam scores will also be published on the eLearning platform.

Further information

The course includes the start of the official SAS Certification path: students will have the opportunity to begin the process of obtaining the SAS Certification in Base SAS programming.

Important note: if the course takes place partially or fully online, some changes may be made to the course syllabus in order to adapt both the course and the exam to online attendance and participation.