Readings: Difference between revisions

From info319
== Books ==
We will use two textbooks:
Textbooks:
* Rob Kitchin. ''The Data Revolution - Big Data, Open Data, Data Infrastructures & Their Consequences''. Sage, 2014.
* Rob Kitchin. ''The Data Revolution - A Critical Analysis of Big Data, Open Data and Data Infrastructures'', 2nd Edition. Sage, 2021.
** At least chapters 1-5 and some later chapters are mandatory.
** chapters 1, 3-5, 13-14, 17-19 are mandatory (12 and 15-16 are supplementary)
* Bill Chambers and Matei Zaharia: ''Spark: The Definitive Guide - Big Data Processing Made Simple''. O'Reilly, 2018. [[File:Spark-TheDefinitiveGuide.pdf | (PDF)]]
** At least chapters 1-9 and some later chapters are mandatory.
<!--  * preliminary chapter list: Part I - chapters 1-3, Part II - chapters 4-9, Part III - chapter 12, Part IV - chapter 15, Part V - chapters 20-21, Part VI - chapter 24 (ca 260 pages). -->


<!-- GDPR -->
* Bill Chambers and Matei Zaharia: ''Spark: The Definitive Guide - Big Data Processing Made Simple''. O'Reilly, 2018. [[File:Spark-TheDefinitiveGuide.pdf]]
** chapters 1-9, 12, 15, 20-21 are mandatory (chapter 10 on SQL is also highly relevant)
** [https://github.com/databricks/Spark-The-Definitive-Guide GitHub repository with code and data examples]


== Technical introductions ==
== Papers ==
Selected web pages will become available here, including:
Selected papers will become available here, including:
* Spark 3.3.0 [https://spark.apache.org/docs/latest/index.html Overview] and [https://spark.apache.org/docs/latest/quick-start.html Quick Start (with Python examples)]
* [https://arxiv.org/pdf/2012.09109 Section 1] in Opdahl, A. L., & Nunavath, V. (2020). Big Data. Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data, 15-29. Book chapter
<!-- * [https://docs.nrec.no/intro.html NREC Introduction - The Norwegian Research and Education Cloud] -->
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. [https://ojs.bibsys.no/index.php/NIK/article/view/939/792 Short Paper] and poster: [[File:A1-Poster-NIKT2021.pdf]]
* OpenStack
<!-- Architecture stuff:
* TerraForm
* Lambda: Introduced in Nathan Marz and James Warren (2013). Big Data Principles and Best Practices of Scalable Real-Time Data Systems. Slides 14-27 in [http://2014.berlinbuzzwords.de/sites/2014.berlinbuzzwords.de/files/media/documents/michael_hausenblas_-_lambda_architecture.pdf this presentation] give an overview of the idea!
* Ansible
* Kappa: Kreps, J.: Questioning the lambda architecture (2014). [https://www.oreilly.com/radar/questioning-the-lambda-architecture/ White paper]
* [https://kafka.apache.org/intro Kafka Introduction]
* Liquid: Fernandez, Raul Castro, Peter R. Pietzuch, Jay Kreps, Neha Narkhede, Jun Rao, Joel Koshy, Dong Lin, Chris Riccomini, and Guozhang Wang. "Liquid: Unifying nearline and offline big data integration." In CIDR. 2015. [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1088.2602&rep=rep1&type=pdf Paper]
* Sigma: Cassavia, N., & Masciari, E. (2021, March). Sigma: a scalable high performance big data architecture. In 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (pp. 236-239). IEEE. [https://bibsys-almaprimo.hosted.exlibrisgroup.com/primo-explore/openurl?sid=google&auinit=N&aulast=Cassavia&atitle=Sigma:%20a%20scalable%20high%20performance%20big%20data%20architecture&id=doi:10.1109%2FPDP52278.2021.00044&vid=UBB&institution=UBB&url_ctx_val=&url_ctx_fmt=null&isSerivcesPage=true Paper]
* Maamouri, A., Sfaxi, L., & Robbana, R. (2021, December). Phi: A Generic Microservices-Based Big Data Architecture. In European, Mediterranean, and Middle Eastern Conference on Information Systems (pp. 3-16). Springer, Cham. [https://link.springer.com/chapter/10.1007/978-3-030-95947-0_1 Paper]


Additional non-mandatory materials will be made available to support the exercises.
Marc:
You found the other Phi architecture. 😃 The one I meant was: https://ieeexplore.ieee.org/abstract/document/8712381 But both have interesting contributions. The one you found considers the training part, which is not instantiated in the others.


<!-- Twitter, tweepy. GDELT -->
This is the "original publication" of Lambda, a blog entry: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
-->


== Papers ==
<!--
Selected papers will become available here, including:
* Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia (2010). A view of cloud computing. Communications of the ACM 53 (4), 50-58. [https://dl.acm.org/doi/fullHtml/10.1145/1721654.1721672 Paper]
* Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. Short Paper [Poster]
* M Zaharia, M Chowdhury, MJ Franklin, S Shenker, I Stoica (2010). Spark: Cluster computing with working sets. 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10). [https://www.usenix.org/event/hotcloud10/tech/full_papers/Zaharia.pdf Paper]
* Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J Franklin, Scott Shenker, Ion Stoica (2012). Resilient distributed datasets: A Fault-Tolerant abstraction for In-Memory cluster computing. In Proc. 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pp. 15-28. [https://scholar.google.com/citations?view_op=view_citation&hl=en&user=I1EvjZsAAAAJ&citation_for_view=I1EvjZsAAAAJ:Tyk-4Ss8FVUC Paper]
* Karun, A. K., & Chitharanjan, K. (2013, April). A review on hadoop—HDFS infrastructure extensions. In 2013 IEEE conference on information & communication technologies (pp. 132-137). IEEE. [https://scholar.google.com/scholar?output=instlink&q=info:GIm8aG-ScOsJ:scholar.google.com/&hl=en&as_sdt=0,5&scillfp=6854624816870725192&oi=lle Paper]
* Kafka?
-->


Supplementary:
* Section 1 in Opdahl, A. L., & Nunavath, V. (2020). Big Data. Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data, 15-29. Paper
* Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. [https://scholar.google.com/scholar?output=instlink&q=info:pKELE6iBzpAJ:scholar.google.com/&hl=en&as_sdt=0,5&as_ylo=2021&scillfp=4299025271368542631&oi=lle Paper]
* Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. Paper
* Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. [https://scholar.google.com/scholar?output=instlink&q=info:0K5dB1_9nusJ:scholar.google.com/&hl=en&as_sdt=0,5&as_ylo=2018&scillfp=11776208952974186557&oi=lle Paper]
* [https://www.jstor.org/stable/25148625#metadata_info_tab_contents Design Science in Information Systems Research] by Alan R. Hevner, Salvatore T. March, Jinsoo Park and Sudha Ram. MIS Quarterly 28(1):75-105, March 2004. ''(You need to be on UiB's network to access the link - I have uploaded it under Files in mitt.uib.no, but it may soon be deleted from there...)''
* Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian journal of information systems, 19(2), 4. [[File:Hevner2007-ThreeCycleView-SJIS.pdf]]


<!-- Architectures: kappa, lambda, phi, Liquid -->
<!-- Privacy? -->


== Lecture Slides ==
== Technical introductions ==
See the [[Sessions|Session page]] for lecture slides.
Selected web pages will become available here, including:
* [https://kafka.apache.org/intro Kafka Introduction]
* [https://docs.nrec.no/index.html NREC and OpenStack], the following sections/pages: Introduction, Project application, Logging in, The dashboard, Create a Linux virtual machine (skip: Windows), Using SSH, Working with Security Groups, Create and manage volumes, Create and manage snapshots (skip: images), Instance console
* [https://docs.nrec.no/terraform-part1.html TerraForm and NREC part I], [https://docs.nrec.no/terraform-part2.html part II], and [https://docs.nrec.no/terraform-part3.html part III]
* [https://www.ansible.com/overview/how-ansible-works How Ansible Works] and [https://docs.ansible.com/ansible_community.html the Ansible Community portal]
* Docker Docs: [https://docs.docker.com/get-started/overview/ Docker overview] and [https://docs.docker.com/get-started/overview/ Get started]
* [https://kubernetes.io/docs/tutorials/kubernetes-basics/ Learn Kubernetes basics], modules 1-6
* [https://gdpr.eu/what-is-gdpr/ What is GDPR, the EU’s new data protection law?]
 
Supplementary:
* Spark 3.3.0 [https://spark.apache.org/docs/latest/index.html Overview] and [https://spark.apache.org/docs/latest/quick-start.html Quick Start (with Python examples)]
* [https://developer.twitter.com/en/docs/twitter-api Twitter API v2]
* [https://github.com/tweepy/tweepy Tweepy: Twitter for Python]
* [https://docs.tweepy.org/en/latest/ Tweepy Documentation]
* [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Structured Streaming Spark Programming Guide]
* Apache Spark [https://spark.apache.org/docs/latest/api/python/reference/pyspark.ss/index.html Structured Streaming API]
* [https://kafka-python.readthedocs.io/en/master/ kafka-python API]
* EU's [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] - the official legal text
 
<!-- GDELT -->
 
== Lecture slides ==
See the [[Sessions|Session page]] for lecture slides after each session.


== Suitable readings ==
== Readings for each session ==
The [[Sessions|Session page]] contains specific readings for each session.
The [[Sessions|Sessions page]] will suggest specific readings for each session and its associated exercise.

Latest revision as of 13:04, 2 December 2022

Books

Textbooks:

  • Rob Kitchin. The Data Revolution - A Critical Analysis of Big Data, Open Data and Data Infrastructures, 2nd Edition. Sage, 2021.
    • chapters 1, 3-5, 13-14, 17-19 are mandatory (12 and 15-16 are supplementary)

Papers

Selected papers will become available here, including:

  • Section 1 in Opdahl, A. L., & Nunavath, V. (2020). Big Data. Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data, 15-29. Book chapter
  • Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. Short Paper and poster: File:A1-Poster-NIKT2021.pdf


Supplementary:

  • Opdahl, A. L., & Tessem, B. (2021). Ontologies for finding journalistic angles. Software and Systems Modeling, 20(1), 71-87. Paper
  • Berven, A., Christensen, O. A., Moldeklev, S., Opdahl, A. L., & Villanger, K. J. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123, 103321. Paper
  • Design Science in Information Systems Research by Alan R. Hevner, Salvatore T. March, Jinsoo Park and Sudha Ram. MIS Quarterly 28(1):75-105, March 2004. (You need to be on UiB's network to access the link - I have uploaded it under Files in mitt.uib.no, but it may soon be deleted from there...)
  • Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian journal of information systems, 19(2), 4. File:Hevner2007-ThreeCycleView-SJIS.pdf


Technical introductions

Selected web pages will become available here, including:

Supplementary:


Lecture slides

See the Session page for lecture slides after each session.

Readings for each session

The Sessions page will suggest specific readings for each session and its associated exercise.