Coding for Data Science and Data Management

This course is part of the MSc in Data Science for Economics, University of Milan, Italy. Academic year 2019/2020.

Description

The course aims at providing technical skills about coding/scripting aspects for data analysis and to manage persistent data storage of sources and results involved in analysis. On the one side, the Python programming language and the R framework are illustrated. The goal is to deal with essential notions about data structures and control structures of both Python and R. On the other side, the goal is to present the core notions of relational databases, such as keys, integrity, and primary/foreign key constraints, as well as the SQL language for data definition, manipulation, and query. Recent and innovative NoSQL solutions are also discussed, with special focus on a document-oriented system called MongoDB.

Course structure

The course is divided in three modules:

  • R (Emanuele Guidotti)
  • Python (Nicolò Cesa-Bianchi)
  • Databases (Stefano Montanelli)

Syllabus (R module)

Introduction to the R framework and R Studio

  • Installation
  • Comments
  • Variable assignment
  • Functions
  • Extension packages
  • R help, StackOverflow, Google

Basic Data Types

  • numeric
  • integer
  • complex
  • logical
  • character

Basic Data Structures

  • vector
  • matrix
  • data.frame
  • list
  • environment

Basic operations

  • Subsetting
  • Arithmetics

Time Series

  • The zoo package
  • The xts package

Control Structures

  • if-else
  • for loops, while loops, repeat, break
  • apply, lapply, sapply

Custom Functions

  • Code quality and modularity
  • How to implement R functions

Speeding up the code

  • The Rcpp package
  • References to parallel computing

Data Acquisition

  • Local & Remote files
  • R packages
  • RESTful APIs
  • Web scraping

Data visualization

  • Base plotting system
  • The ggplot2 package
  • The plotly package

Building interactive interfaces, documents and websites

  • R Shiny
  • Rmarkdown
  • Rbookdown
  • Rblogdown

Building R packages

  • Roxygen
  • Base packages
  • Rcpp packages
  • R Shiny packages

If time allows…

  • Debugging in Rstudio
  • Connect R to Python: the reticulate package
  • Connect R to databases: the RMySQL, RSQLite, RMongo packages
  • Q&A

Exam (R module)

Create an R package which deals with task(s) of your choice. The aim of the project is to asses your ability in dealing with the R programming language. You are free to choose the task(s) you like most.

The project is an individual project, even if you are strongly encouraged to work in groups and help each other. Please do take advantage of online resources and note that copying code snippets is totally fine. On the other hand, copying the whole project or most of it is (obviously) not allowed.

Delivery

Upload the code to your repository1 available at https://github.com/unimi-dse. The package must include a README.md file containing a short description of the package and how to use it.

Deadline

16 February 2020, 23:59.

Evaluation

Passing grade (18/30)

The package must be installed with the following command:

devtools::install_github("unimi-dse/ID")

where ID is the ID of your repository. If the package cannot be installed or raises errors, the exam is failed.

Additional points (9/30)

The package must deal with some of the following topics. Choose the topics you like most. In any case, max 9 points can be earned.

titledescriptionpoints
Data AcquisitionImport data via local/remote files, R packages, APIs or web scraping.3
Data VisualizationVisualize data with ggplot2 or plotly3
Data AnalysisUse R packages to perform some kind of data analysis. The complexity of the analysis is not relevant. The aim is to show your ability to work with R packages.3
Interactive InterfaceBuild an interactive interface with shiny3
Code OptimizationSpeed up the code using Rcpp or parallel computing3

Additional Points (3/30)

Up to 3 additional points are assigned based on:

titledescriptionpoint
READMEUse the Markdown syntax to provide a high quality README.md file.1
DocumentationDocument your functions using roxygen2. It must be possible to access the documentation of your function(s) using the standard syntax ?functioname. If you are developing an interactive interface you still need to document the function that runs the interface.1
Code qualityCoding style and modularity.1

Laude (30/30 with honors)

If you achieve 30/30 in the previous steps, honors can be awarded based on the code quality.

https://www.unimi.it/en/education/degree-programme-courses/2020/coding-data-science-and-data-management


  1. to set up the repository create an account on GitHub and email me with your GitHub username. The email must be sent from your University address and use “DSE Coding Midterm” as subject. On the other hand, I suggest to create your GitHub account using your personal email address. ↩︎