Coding for Data Science and Data Management, DSE UniMI – 2019/2020

The course aims to provide technical skills in coding and scripting for data analysis, and in managing the persistent storage of the data sources and results involved in an analysis. On the one hand, it illustrates the Python programming language and the R framework, covering the essential data structures and control structures of both. On the other hand, it presents the core notions of relational databases, such as keys, integrity, and primary/foreign key constraints, as well as the SQL language for data definition, manipulation, and querying. Recent NoSQL solutions are also discussed, with a special focus on MongoDB, a document-oriented system.
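
As a rough illustration of the relational notions mentioned above (primary keys, foreign keys, integrity, and SQL for data definition, manipulation, and querying), here is a minimal sketch in Python. It assumes SQLite through the standard-library sqlite3 module, which is chosen here only for convenience and is not necessarily the system used in the course. The tables student and grade and their columns are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related (hypothetical) tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    # Data definition: a primary key and a foreign key constraint.
    conn.execute(
        "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE grade (
               student_id INTEGER NOT NULL REFERENCES student(id),
               module     TEXT    NOT NULL,
               score      INTEGER NOT NULL,
               PRIMARY KEY (student_id, module)
           )"""
    )
    # Data manipulation: insert one student and one grade.
    conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO grade VALUES (1, 'R', 30)")
    return conn


def try_insert_orphan(conn):
    """Try to insert a grade for a student that does not exist."""
    try:
        conn.execute("INSERT INTO grade VALUES (99, 'R', 28)")
    except sqlite3.IntegrityError:
        # The foreign key constraint rejected the row.
        return False
    return True


conn = build_demo_db()
# Query: join the two tables on the key relationship.
rows = conn.execute(
    "SELECT s.name, g.module, g.score "
    "FROM student s JOIN grade g ON g.student_id = s.id"
).fetchall()
orphan_accepted = try_insert_orphan(conn)
```

In this sketch the second insert is rejected because no student with id 99 exists, which is the kind of referential integrity a foreign key is meant to guarantee.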

Course Structure

  1. R
  2. Python
  3. Databases

Syllabus (R)

Coding for Data Science and Data Management (R), DSE UniMI – 2019/2020

Lectures (R)

  • Introduction to the R framework and RStudio (html)
  • Basic Data Types (html)
  • Basic Data Structures (html)
  • Basic Operations (html)
  • Time Series (html)
  • Control Structures (html)
  • User-Defined Functions (html)
  • Performance Optimization (html)
  • Data Acquisition (html)
  • Data Visualization (ggplot2) (plotly)
  • Building Interactive Interfaces, Documents and Websites (shiny) (rmarkdown)
  • Building R Packages (structure) (metadata) (data)

Midterm Exam (R)

Midterm exam: R package (pdf) (grades)

Important notice:  

  • the grade obtained in the R midterm is valid until Feb 2021
  • if you passed the R midterm you don’t have to take the R module in the written exam
  • if you decide to take the R module in the written exam, the new result replaces the midterm grade, whatever it is (i.e. the midterm grade will no longer be valid)
  • if you wish to immediately reject the midterm grade you can contact me at [email protected]

COVID-19 Exam (R)

Supported by the Institute for Data Valorization (IVADO), Canada, we aim to provide the research community with a unified data hub that collects worldwide fine-grained data merged with demographics, air pollution, and other exogenous variables helpful for a better understanding of COVID-19. The data are collected with the R package COVID19.

Jump on the mission! Extend the package with a new data source and you will score 30/30 in the R module of the course. Notes:

  • this is a pass-or-fail exam: score 30/30 or fail.
  • if you already passed the previous R midterm, you can only improve your grade.
  • you don’t need to register anywhere to take the exam. Email me after you have successfully extended the package with a new real-time data source and you will score 30/30.
  • there is no formal deadline for the midterm. Try, learn, try again and succeed!
  • after passing the exam, you will be acknowledged among the contributors of the project.
  • this is a real-life, open-source project. You can work on it independently of the exam.
  • get ready to work with an international team of developers. Be patient, be friendly, and focus on ideas. We are all here to learn and improve!

Project website: https://covid19datahub.io

For any questions, contact me at [email protected]