Sessions: Difference between revisions

From info319
Line 2: Line 2:
* Thursday August 18th: Introduction meeting [[File:IntroductionMeeting.pdf]]
* Thursday August 18th: Introduction meeting [[File:IntroductionMeeting.pdf]]
* Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
* Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
* Thursday September 15th: Session 2 - More about Spark. Twitter's API and tweepy
* Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter's API and tweepy
* Thursday September 29th: Session 3 - Streaming Spark. Big data architecture. Kafka
* Thursday September 29th: Session 3 - Streaming Spark. Big data architecture. Kafka
* Thursday October 13th: Session 4 - Cloud, NREC and Openstack
* Thursday October 13th: Session 4 - Cloud computing. NREC and Openstack
* Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker
* Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
* Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
* Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
* Thursday November 24th: Session 7 - Essay presentations
* Thursday November 24th: Session 7 - Essay presentations
Line 19: Line 19:
* Spark 3.3.0  [https://spark.apache.org/docs/latest/overview.html Overview] and [https://spark.apache.org/docs/latest/quick-start.html Quick Start (with Python examples)]
* Spark 3.3.0  [https://spark.apache.org/docs/latest/overview.html Overview] and [https://spark.apache.org/docs/latest/quick-start.html Quick Start (with Python examples)]


== Session 2 - More about Spark ==
== Session 2 - More about Spark. Data sources. Twitter's API and tweepy ==
* Chambers & Zaharia, chapters 4-9
* Chambers & Zaharia, chapters 4-9
* Kitchin, chapter 3
Guest presentation: Daniel Rosnes on using Twitter data for the news


Supplementary:
Supplementary:
* Twitter API, tweepy
* [https://developer.twitter.com/en/docs/twitter-api Twitter API v2]
* [https://github.com/tweepy/tweepy Tweepy: Twitter for Python]
* [https://docs.tweepy.org/en/latest/ Tweepy Documentation]


== Session 3 - Streaming Spark. Big data architecture. Kafka ==
== Session 3 - Streaming Spark. Big data architecture. Kafka ==
* Chambers & Zaharia, chapters 20-21
* Chambers & Zaharia, chapters 20-21
* ''Paper on big-data architecture (TBA)''
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] [Poster]
* [https://kafka.apache.org/intro Kafka Introduction]
* [https://kafka.apache.org/intro Kafka Introduction]


== Session 4 - Cloud, NREC and Openstack ==
Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture
* NREC. OpenStack
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] [Poster]


Supplementary:
Supplementary:
* Berven et al. ...
* Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]
* Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]


== Session 5 - Cloud management. Terraform and Ansible. Docker ==
== Session 4 - Cloud computing. NREC and Openstack ==
* NREC. OpenStack
 
Guest presentation: Sohail Khan on computer vision and deep networks for image analysis
 
== Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes ==
* Terraform, Ansible
* Terraform, Ansible
* Docker
* Docker, Kubernetes


== Session 6 - Societal issues. Privacy. GDPR ==
== Session 6 - Societal issues. Privacy. GDPR ==
* Kitchin, chapters 12-19
* Kitchin, chapters 12-19 (only a few of them are mandatory, perhaps 12-13 and 17)
* [https://gdpr.eu/what-is-gdpr/ What is GDPR, the EU’s new data protection law?]
* [https://gdpr.eu/what-is-gdpr/ What is GDPR, the EU’s new data protection law?]
Guest presentation: Laurence Dierickx on aspects of big-data quality
Supplementary:
* Kitchin, the rest of chapters 12-19 (perhaps 14, 18 and 19)
* [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] - the official legal text


== Session 7 - Essay presentations ==
== Session 7 - Essay presentations ==


== Session 8 - Project demonstrations ==
== Session 8 - Project demonstrations ==

Revision as of 15:54, 4 September 2022

Tentative themes for each session

  • Thursday August 18th: Introduction meeting File:IntroductionMeeting.pdf
  • Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
  • Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter's API and tweepy
  • Thursday September 29th: Session 3 - Streaming Spark. Big data architecture. Kafka
  • Thursday October 13th: Session 4 - Cloud computing. NREC and OpenStack
  • Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
  • Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
  • Thursday November 24th: Session 7 - Essay presentations
  • Thursday December 8th: Session 8 - Project demonstrations

Session 1 - Introduction to big data. Big-data processing. Spark

Supplementary:

  • Spark 3.3.0 Overview and Quick Start (with Python examples)

Session 2 - More about Spark. Data sources. Twitter's API and tweepy

  • Chambers & Zaharia, chapters 4-9
  • Kitchin, chapter 3

Guest presentation: Daniel Rosnes on using Twitter data for the news

Supplementary:

  • Twitter API v2
  • Tweepy: Twitter for Python
  • Tweepy Documentation

Session 3 - Streaming Spark. Big data architecture. Kafka

  • Chambers & Zaharia, chapters 20-21
  • Paper on big-data architecture (TBA)
  • Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. Short Paper [Poster]
  • Kafka Introduction

Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture

Supplementary:

  • Berven et al. ...
  • Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. Paper

Session 4 - Cloud computing. NREC and OpenStack

  • NREC. OpenStack

Guest presentation: Sohail Khan on computer vision and deep networks for image analysis

Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes

  • Terraform, Ansible
  • Docker, Kubernetes

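Session 6 - Societal issues. Privacy. GDPR

  • Kitchin, chapters 12-19 (only a few of them are mandatory, perhaps 12-13 and 17)
  • What is GDPR, the EU’s new data protection law?

Guest presentation: Laurence Dierickx on aspects of big-data quality

Supplementary:

  • Kitchin, the rest of chapters 12-19 (perhaps 14, 18 and 19)
  • General Data Protection Regulation (GDPR) - the official legal text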

Session 7 - Essay presentations

Session 8 - Project demonstrations