Sessions: Difference between revisions

From info319
* Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]
* [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Structured Streaming Spark Programming Guide]
* [https://www.jstor.org/stable/25148625#metadata_info_tab_contents Design Science in Information Systems Research] by Alan R. Hevner, Salvatore T. March, Jinsoo Park and Sudha Ram. MIS Quarterly 28(1):75-105, March 2004. ''(You need to be on UiB's network to access.)''
* Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian journal of information systems, 19(2), 4. [[File:Hevner2007-ThreeCycleView-SJIS.pdf]]


== Session 4 - Cloud computing. NREC and Openstack ==

Revision as of 21:45, 26 September 2022

Tentative themes for each session

  • Thursday August 18th: Introduction meeting File:IntroductionMeeting.pdf
  • Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
  • Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter
  • Thursday September 29th: Session 3 - Streaming Spark. Big-data architectures. Kafka
  • Thursday October 13th: Session 4 - Cloud computing. NREC and Openstack
  • Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
  • Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
  • Thursday November 24th: Session 7 - Essay presentations
  • Thursday December 8th: Session 8 - Project demonstrations

Session 1 - Introduction to big data. Big-data processing. Spark

Supplementary:

Session 2 - More about Spark. Data sources. Twitter

Guest presentation: Daniel Rosnes on using Twitter data for the news: Introduction to Twitter API v2 and Tweepy

Supplementary:

Session 3 - Streaming Spark. Big-data architectures. Kafka

  • Chambers & Zaharia, chapters 20-21
  • Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. Short Paper and poster: File:A1-Poster-NIKT2021.pdf
  • Kafka Introduction

We may have to postpone the guest talk on architecture and the News Hunter platform to a later session.

Supplementary:

Session 4 - Cloud computing. NREC and Openstack

  • NREC and OpenStack, the following sections/pages: Introduction, Project application, Logging in, The dashboard, Create a Linux virtual machine (skip: Windows), Using SSH, Working with Security Groups, Create and manage volumes, Create and manage snapshots (skip: images), Instance console

Guest presentation: Sohail Khan on computer vision and deep networks for image analysis

Comment: There are few readings for this session because this is where we start running Spark in a cluster, and the practical work will take some time. Computer vision and deep networks for image analysis are not a mandatory part of the course, but something you may want to use in your projects. Sohail's presentation will include suggestions for further reading.

Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes

Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture

Comment: I really hope to have time to introduce Docker and Kubernetes properly in the course. In a pinch, we can also use parts of sessions 7 and 8 for this.

Session 6 - Societal issues. Privacy. GDPR

Guest presentation: Laurence Dierickx on aspects of big-data quality

Supplementary:

Session 7 - Essay presentations

Session 8 - Project demonstrations