Sessions
Revision as of 15:54, 4 September 2022
Tentative themes for each session
- Thursday August 18th: Introduction meeting File:IntroductionMeeting.pdf
- Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
- Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter's API and tweepy
- Thursday September 29th: Session 3 - Streaming Spark. Big data architecture. Kafka
- Thursday October 13th: Session 4 - Cloud computing. NREC and OpenStack
- Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
- Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
- Thursday November 24th: Session 7 - Essay presentations
- Thursday December 8th: Session 8 - Project demonstrations
Session 1 - Introduction to big data. Big-data processing. Spark
- Kitchin, chapters 1, 4-5
- Chambers & Zaharia, chapters 1-3, 12, 15
- Slides: File:S01-BigData-published.pdf File:S01-Spark-published.pdf
Supplementary:
- Section 1 in Opdahl, A. L., & Nunavath, V. (2020). Big Data. Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data, 15-29. Paper
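- Spark 3.3.0 [https://spark.apache.org/docs/latest/overview.html Overview] and [https://spark.apache.org/docs/latest/quick-start.html Quick Start (with Python examples)]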
Session 2 - More about Spark. Data sources. Twitter's API and tweepy
- Chambers & Zaharia, chapters 4-9
- Kitchin, chapter 3
Guest presentation: Daniel Rosnes on using Twitter data for the news
Supplementary:
- [https://developer.twitter.com/en/docs/twitter-api Twitter API v2]
- [https://github.com/tweepy/tweepy Tweepy: Twitter for Python]
- [https://docs.tweepy.org/en/latest/ Tweepy Documentation]
Session 3 - Streaming Spark. Big data architecture. Kafka
- Chambers & Zaharia, chapters 20-21
- Paper on big-data architecture (TBA)
- Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] [Poster]
- [https://kafka.apache.org/intro Kafka Introduction]
Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture
Supplementary:
- Berven et al. ...
- Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]
Session 4 - Cloud computing. NREC and OpenStack
- NREC. OpenStack
Guest presentation: Sohail Khan on computer vision and deep networks for image analysis
Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
- Terraform, Ansible
- Docker, Kubernetes
Session 6 - Societal issues. Privacy. GDPR
- Kitchin, chapters 12-19 (only a few of them are mandatory, perhaps 12-13 and 17)
- [https://gdpr.eu/what-is-gdpr/ What is GDPR, the EU’s new data protection law?]
Guest presentation: Laurence Dierickx on aspects of big-data quality
Supplementary:
- Kitchin, the rest of chapters 12-19 (perhaps 14, 18 and 19)
- [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] - the official legal text