Sessions: Difference between revisions

From info319
 
(33 intermediate revisions by the same user not shown)
Line 20: Line 20:


== Session 2 - More about Spark. Data sources. Twitter ==
* Chambers & Zaharia, chapters 4-9
* Chambers & Zaharia, chapters 4-9 (chapter 10 on SQL is also very relevant)
* Kitchin, chapter 3
* Slides: [[File:S02-OrganisationINFO319-published.pdf]] [[File:S02-DataSources-published.pdf]] [[File:DanielRosnes-Introduction-to-Tweepy-and-Twitter-API-2.0.pdf]] [[File:S02-MoreSpark-published.pdf]]


Guest presentation: Daniel Rosnes on using Twitter data for the news
Guest presentation: Daniel Rosnes on using Twitter data for the news: Introduction to Twitter API v2 and Tweepy


Supplementary:
* Chambers & Zaharia, chapter 10 ''(perhaps mandatory too)''
* [https://developer.twitter.com/en/docs/twitter-api Twitter API v2]
* [https://github.com/tweepy/tweepy Tweepy: Twitter for Python]
Line 32: Line 34:
== Session 3 - Streaming Spark. Big-data architectures. Kafka ==
* Chambers & Zaharia, chapters 20-21
* ''Paper on big-data architecture (TBA)''
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] and poster: [[file:A1-Poster-NIKT2021.pdf]]
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] [Poster]
* [https://kafka.apache.org/intro Kafka Introduction]
* Slides: [[file:S03-StreamingSpark-published.pdf]] [[file:S03-MoreSpark-published.pdf]] [[file:S03-Kafka-published.pdf]] [[file:S03-ResearchMethod-published.pdf]]


Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture
''The Guest Talk on architectures and the News Hunter platform is postponed to a later session.''


Supplementary:
* Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. [https://scholar.google.com/scholar?output=instlink&q=info:0K5dB1_9nusJ:scholar.google.com/&hl=en&as_sdt=0,5&as_ylo=2018&scillfp=11776208952974186557&oi=lle Paper]
* [https://kafka-python.readthedocs.io/en/master/ kafka-python API]
* Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]
* [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Structured Streaming Spark Programming Guide]
* News Hunter:
** Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. [https://scholar.google.com/scholar?output=instlink&q=info:0K5dB1_9nusJ:scholar.google.com/&hl=en&as_sdt=0,5&as_ylo=2018&scillfp=11776208952974186557&oi=lle Paper]
** Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]
* Design science research method:
** [https://www.jstor.org/stable/25148625#metadata_info_tab_contents Design Science in Information Systems Research] by Alan R. Hevner, Salvatore T. March, Jinsoo Park and Sudha Ram. MIS Quarterly 28(1):75-105, March 2004. ''(You need to be on UiB's network to access the link - I have uploaded it under Files in mitt.uib.no, but it may soon be deleted from there...)''
** Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian journal of information systems, 19(2), 4. [[File:Hevner2007-ThreeCycleView-SJIS.pdf]]


== Session 4 - Cloud computing. NREC and Openstack ==
* [https://docs.nrec.no/index.html NREC and OpenStack], the following sections/pages: Introduction, Project application, Logging in, The dashboard, Create a Linux virtual machine (skip: Windows), Using SSH, Working with Security Groups, Create and manage volumes, Create and manage snapshots (skip: images), Instance console
* Slides: [[file:S04-OpenStack-published.pdf]] [[file:S04-UbuntuLinux-published.pdf]]


Guest presentation: Sohail Khan on computer vision and deep networks for image analysis
Guest presentation: Sohail Khan on computer vision and deep networks for image analysis. His slides and demo code are uploaded to mitt.uib.no under Files (due to size and file-type limitations).


== Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes ==
There are few readings for this session because this is where we start running Spark in a cluster, and the practical work will take some time. Computer vision and image analysis are not a mandatory part of the course, but you may want to use them in your projects. Sohail's presentation will include suggestions for further reading.
 
== Session 5 - Cloud management. Terraform and Ansible. <!-- Docker and Kubernetes --> ==
* [https://docs.nrec.no/terraform-part1.html TerraForm and NREC part I], [https://docs.nrec.no/terraform-part2.html part II], and [https://docs.nrec.no/terraform-part3.html part III]
* [https://www.ansible.com/overview/how-ansible-works How Ansible Works] and [https://docs.ansible.com/ansible_community.html the Ansible Community portal]
* Material on Docker and Kubernetes (TBA)
<!--
* Docker Docs: [https://docs.docker.com/get-started/overview/ Docker overview] and [https://docs.docker.com/get-started/overview/ Get started]
* [https://kubernetes.io/docs/tutorials/kubernetes-basics/ Learn Kubernetes basics], modules 1-6
-->
* Slides: [[File:S05-Terraform-Ansible-published.pdf]] [[File:S05-NewsAngler-published.pdf]]
 
Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture. Slides: [[File:MarcGallofre-BigDataArchitecture.pdf]]
 
Comment: Hopefully, we can introduce Docker and Kubernetes in later sessions.


== Session 6 - Societal issues. Privacy. GDPR ==
* Kitchin, chapters 12-19 (only a few of them are mandatory, perhaps 12-13 and 17 - TBA)
* Kitchin, chapters 13-14 and 17-19
* [https://gdpr.eu/what-is-gdpr/ What is GDPR, the EU’s new data protection law?]
* Slides: [[File:S06-Privacy.pdf]]


Guest presentation: Laurence Dierickx on aspects of big-data quality
Guest presentation: Ghazaal Sheiki on fact checking. Slides: [[File:GhazaalSheiki-AutomatedFactChecking.pdf]]


Supplementary:
* Kitchin, the rest of chapters 12-19 (perhaps 14, 18 and 19)
* Kitchin, chapters 12 and 15-16 are also recommended reading
* [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] - the official legal text
* EU's [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] - the official legal text


== Session 7 - Essay presentations ==


== Session 8 - Project demonstrations ==

Latest revision as of 21:16, 10 November 2022

Tentative themes for each session

  • Thursday August 18th: Introduction meeting File:IntroductionMeeting.pdf
  • Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
  • Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter
  • Thursday September 29th: Session 3 - Streaming Spark. Big-data architectures. Kafka
  • Thursday October 13th: Session 4 - Cloud computing. NREC and Openstack
  • Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
  • Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
  • Thursday November 24th: Session 7 - Essay presentations
  • Thursday December 8th: Session 8 - Project demonstrations

Session 1 - Introduction to big data. Big-data processing. Spark

Supplementary:

Session 2 - More about Spark. Data sources. Twitter

Guest presentation: Daniel Rosnes on using Twitter data for the news: Introduction to Twitter API v2 and Tweepy

Supplementary:

Session 3 - Streaming Spark. Big-data architectures. Kafka

The Guest Talk on architectures and the News Hunter platform is postponed to a later session.

Supplementary:

  • kafka-python API
  • Structured Streaming Spark Programming Guide
  • News Hunter:
    • Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. Paper
    • Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. Paper
  • Design science research method:
    • Design Science in Information Systems Research by Alan R. Hevner, Salvatore T. March, Jinsoo Park and Sudha Ram. MIS Quarterly 28(1):75-105, March 2004. (You need to be on UiB's network to access the link - I have uploaded it under Files in mitt.uib.no, but it may soon be deleted from there...)
    • Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian journal of information systems, 19(2), 4. File:Hevner2007-ThreeCycleView-SJIS.pdf

Session 4 - Cloud computing. NREC and Openstack

Guest presentation: Sohail Khan on computer vision and deep networks for image analysis. His slides and demo code are uploaded to mitt.uib.no under Files (due to size and file-type limitations).

There are few readings for this session because this is where we start running Spark in a cluster, and the practical work will take some time. Computer vision and image analysis are not a mandatory part of the course, but you may want to use them in your projects. Sohail's presentation will include suggestions for further reading.

Session 5 - Cloud management. Terraform and Ansible.

Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture. Slides: File:MarcGallofre-BigDataArchitecture.pdf

Comment: Hopefully, we can introduce Docker and Kubernetes in later sessions.

Session 6 - Societal issues. Privacy. GDPR

Guest presentation: Ghazaal Sheiki on fact checking. Slides: File:GhazaalSheiki-AutomatedFactChecking.pdf

Supplementary:

Session 7 - Essay presentations

Session 8 - Project demonstrations