Sessions

From info319


== Session 2 - More about Spark. Data sources. Twitter ==
* Chambers & Zaharia, chapters 4-9 (chapter 10 on SQL is also very relevant)
* Kitchin, chapter 3
* Slides: [[File:S02-OrganisationINFO319-published.pdf]] [[File:S02-DataSources-published.pdf]] [[File:DanielRosnes-Introduction-to-Tweepy-and-Twitter-API-2.0.pdf]] [[File:S02-MoreSpark-published.pdf]]


Guest presentation: Daniel Rosnes on using Twitter data for the news: Introduction to Twitter API v2 and Tweepy


Supplementary:
* Chambers & Zaharia, chapter 10 ''(perhaps mandatory too)''
* [https://developer.twitter.com/en/docs/twitter-api Twitter API v2]
* [https://github.com/tweepy/tweepy Tweepy: Twitter for Python]
== Session 3 - Streaming Spark. Big-data architectures. Kafka ==
* Chambers & Zaharia, chapters 20-21
* ''Paper on big-data architecture (TBA)''
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] and poster: [[file:A1-Poster-NIKT2021.pdf]]
* [https://kafka.apache.org/intro Kafka Introduction]
* Slides: [[file:S03-StreamingSpark-published.pdf]] [[file:S03-MoreSpark-published.pdf]] [[file:S03-Kafka-published.pdf]] [[file:S03-ResearchMethod-published.pdf]]


''The Guest Talk on architectures and the News Hunter platform is postponed to a later session.''


Supplementary:
* [https://kafka-python.readthedocs.io/en/master/ kafka-python API]
* [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Structured Streaming Spark Programming Guide]
* News Hunter:
** Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. [https://scholar.google.com/scholar?output=instlink&q=info:0K5dB1_9nusJ:scholar.google.com/&hl=en&as_sdt=0,5&as_ylo=2018&scillfp=11776208952974186557&oi=lle Paper]
** Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://link.springer.com/article/10.1007/s10270-020-00801-w Paper]
* Design science research method:
** Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). [https://www.jstor.org/stable/25148625#metadata_info_tab_contents Design Science in Information Systems Research]. MIS Quarterly, 28(1), 75-105. ''(You need to be on UiB's network to access the link. I have also uploaded the paper under Files in mitt.uib.no, but it may soon be deleted from there.)''
** Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2), 4. [[File:Hevner2007-ThreeCycleView-SJIS.pdf]]


== Session 4 - Cloud computing. NREC and OpenStack ==
* [https://docs.nrec.no/index.html NREC and OpenStack], the following sections/pages: Introduction, Project application, Logging in, The dashboard, Create a Linux virtual machine (skip: Windows), Using SSH, Working with Security Groups, Create and manage volumes, Create and manage snapshots (skip: images), Instance console
* Slides: [[file:S04-OpenStack-published.pdf]] [[file:S04-UbuntuLinux-published.pdf]]


Guest presentation: Sohail Khan on computer vision and deep networks for image analysis. Because of size and file-type limitations, his slides and demo code are uploaded under Files in mitt.uib.no rather than here.


There are fewer readings for this session because this is where we will start running Spark on a cluster, which involves practical work that takes some time. Computer vision and image analysis are not a mandatory part of the course, but you may want to use them in your projects. Sohail's presentation will include suggestions for further reading.


== Session 5 - Cloud management. Terraform and Ansible. <!-- Docker and Kubernetes --> ==
* [https://docs.nrec.no/terraform-part1.html Terraform and NREC part I], [https://docs.nrec.no/terraform-part2.html part II], and [https://docs.nrec.no/terraform-part3.html part III]
* [https://www.ansible.com/overview/how-ansible-works How Ansible Works] and [https://docs.ansible.com/ansible_community.html the Ansible Community portal]
* ''Material on Docker and Kubernetes (TBA)''
<!--
* Docker Docs: [https://docs.docker.com/get-started/overview/ Docker overview] and [https://docs.docker.com/get-started/overview/ Get started]
* [https://kubernetes.io/docs/tutorials/kubernetes-basics/ Learn Kubernetes basics], modules 1-6
-->
* Slides: [[File:S05-Terraform-Ansible-published.pdf]] [[File:S05-NewsAngler-published.pdf]]


Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture. Slides: [[File:MarcGallofre-BigDataArchitecture.pdf]]
 
Comment: Hopefully, we can introduce Docker and Kubernetes in later sessions.


== Session 6 - Societal issues. Privacy. GDPR ==
* Kitchin, chapters 13-14 and 17-19
* [https://gdpr.eu/what-is-gdpr/ What is GDPR, the EU’s new data protection law?]
* Slides: [[File:S06-Privacy.pdf]]


Guest presentation: Ghazaal Sheiki on fact checking. Slides: [[File:GhazaalSheiki-AutomatedFactChecking.pdf]]


Supplementary:
* Kitchin, chapters 12 and 15-16 are also recommended reading
* EU's [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] - the official legal text



Latest revision as of 21:16, 10 November 2022

Tentative themes for each session

  • Thursday August 18th: Introduction meeting File:IntroductionMeeting.pdf
  • Thursday September 1st: Session 1 - Introduction to big data. Big-data processing. Spark
  • Thursday September 15th: Session 2 - More about Spark. Data sources. Twitter
  • Thursday September 29th: Session 3 - Streaming Spark. Big-data architectures. Kafka
  • Thursday October 13th: Session 4 - Cloud computing. NREC and OpenStack
  • Thursday October 27th: Session 5 - Cloud management. Terraform and Ansible. Docker and Kubernetes
  • Thursday November 10th: Session 6 - Societal issues. Privacy. GDPR
  • Thursday November 24th: Session 7 - Essay presentations
  • Thursday December 8th: Session 8 - Project demonstrations

Session 1 - Introduction to big data. Big-data processing. Spark

Supplementary:

Session 2 - More about Spark. Data sources. Twitter

Guest presentation: Daniel Rosnes on using Twitter data for the news: Introduction to Twitter API v2 and Tweepy

Supplementary:

Session 3 - Streaming Spark. Big-data architectures. Kafka

The Guest Talk on architectures and the News Hunter platform is postponed to a later session.

Supplementary:

  • kafka-python API
  • Structured Streaming Spark Programming Guide
  • News Hunter:
    • Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. Paper
    • Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. Paper
  • Design science research method:
    • Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. MIS Quarterly, 28(1), 75-105. (You need to be on UiB's network to access the link. I have also uploaded the paper under Files in mitt.uib.no, but it may soon be deleted from there.)
    • Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2), 4. File:Hevner2007-ThreeCycleView-SJIS.pdf

Session 4 - Cloud computing. NREC and OpenStack

Guest presentation: Sohail Khan on computer vision and deep networks for image analysis. Because of size and file-type limitations, his slides and demo code are uploaded under Files in mitt.uib.no rather than here.

There are fewer readings for this session because this is where we will start running Spark on a cluster, which involves practical work that takes some time. Computer vision and image analysis are not a mandatory part of the course, but you may want to use them in your projects. Sohail's presentation will include suggestions for further reading.

Session 5 - Cloud management. Terraform and Ansible.

Guest presentation: Marc Gallofré Ocaña on the News Hunter platform and its big-data ready architecture. Slides: File:MarcGallofre-BigDataArchitecture.pdf

Comment: Hopefully, we can introduce Docker and Kubernetes in later sessions.

Session 6 - Societal issues. Privacy. GDPR

Guest presentation: Ghazaal Sheiki on fact checking. Slides: File:GhazaalSheiki-AutomatedFactChecking.pdf

Supplementary:

Session 7 - Essay presentations

Session 8 - Project demonstrations