Sessions
Revision as of 11:21, 1 November 2022
Tentative themes for each session
- Thursday August 18th: Introduction meeting File:IntroductionMeeting.pdf
- Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
- Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter
- Thursday September 29th: Session 3 - Streaming Spark. Big-data architectures. Kafka
- Thursday October 13th: Session 4 - Cloud computing. NREC and OpenStack
- Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
- Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
- Thursday November 24th: Session 7 - Essay presentations
- Thursday December 8th: Session 8 - Project demonstrations
Session 1 - Introduction to big data. Big-data processing. Spark
- Kitchin, chapters 1, 4-5
- Chambers & Zaharia, chapters 1-3, 12, 15
- Slides: File:S01-BigData-published.pdf File:S01-Spark-published.pdf
Supplementary:
- Section 1 in Opdahl, A. L., & Nunavath, V. (2020). Big Data. Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data, 15-29. Paper
- Spark 3.3.0 Overview and Quick Start (with Python examples)
Session 2 - More about Spark. Data sources. Twitter
- Chambers & Zaharia, chapters 4-9 (chapter 10 on SQL is also very relevant)
- Kitchin, chapter 3
- Slides: File:S02-OrganisationINFO319-published.pdf File:S02-DataSources-published.pdf File:DanielRosnes-Introduction-to-Tweepy-and-Twitter-API-2.0.pdf File:S02-MoreSpark-published.pdf
Guest presentation: Daniel Rosnes on using Twitter data for the news: Introduction to Twitter API v2 and Tweepy
Supplementary:
- Chambers & Zaharia, chapter 10 (perhaps mandatory too)
- Twitter API v2
- Tweepy: Twitter for Python
- Tweepy Documentation
Session 3 - Streaming Spark. Big-data architectures. Kafka
- Chambers & Zaharia, chapters 20-21
- Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. Short Paper and poster: File:A1-Poster-NIKT2021.pdf
- Kafka Introduction
- Slides: File:S03-StreamingSpark-published.pdf File:S03-MoreSpark-published.pdf File:S03-Kafka-published.pdf File:S03-ResearchMethod-published.pdf
The Guest Talk on architectures and the News Hunter platform is postponed to a later session.
Supplementary:
- kafka-python API
- Structured Streaming Spark Programming Guide
- News Hunter:
- Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. Paper
- Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. Paper
- Design science research method:
- Design Science in Information Systems Research by Alan R. Hevner, Salvatore T. March, Jinsoo Park and Sudha Ram. MIS Quarterly 28(1):75-105, March 2004. (You need to be on UiB's network to access the link. I have also uploaded it under Files in mitt.uib.no, but it may soon be deleted from there.)
- Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2), 4. File:Hevner2007-ThreeCycleView-SJIS.pdf
Session 4 - Cloud computing. NREC and OpenStack
- NREC and OpenStack, the following sections/pages: Introduction, Project application, Logging in, The dashboard, Create a Linux virtual machine (skip: Windows), Using SSH, Working with Security Groups, Create and manage volumes, Create and manage snapshots (skip: images), Instance console
- Slides: File:S04-OpenStack-published.pdf File:S04-UbuntuLinux-published.pdf
Guest presentation: Sohail Khan on computer vision and deep networks for image analysis. Because of size and file-type limitations, his slides and demo code are uploaded to mitt.uib.no under Files.
There are fewer readings for this session because this is where we start running Spark in a cluster, which involves practical work that takes some time. Computer vision, deep networks and image analysis are not mandatory parts of the course, but you may want to use them in your projects. Sohail's presentation will include suggestions for further reading.
Session 5 - Cloud management. Terraform and Ansible.
- Terraform and NREC part I, part II, and part III
- How Ansible Works and the Ansible Community portal
- Slides: File:S05-Terraform-Ansible-published.pdf File:S05-NewsAngler-published.pdf
Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data-ready architecture. Slides: File:MarcGallofre-BigDataArchitecture.pdf
Comment: Hopefully, we can introduce Docker and Kubernetes in later sessions.
Session 6 - Societal issues. Privacy. GDPR
- Kitchin, chapters 13-14 and 17-19
- What is GDPR, the EU’s new data protection law?
Guest presentation: Laurence Dierickx on aspects of big-data quality
Supplementary:
- Kitchin, chapters 12 and 15-16 are also recommended reading
- EU's General Data Protection Regulation (GDPR) - the official legal text