Readings: Difference between revisions
From info319
* Lambda: Introduced in Nathan Marz and James Warren (2013). Big Data: Principles and Best Practices of Scalable Real-Time Data Systems. Slides 14-27 in [http://2014.berlinbuzzwords.de/sites/2014.berlinbuzzwords.de/files/media/documents/michael_hausenblas_-_lambda_architecture.pdf this presentation] give an overview of the idea.
* Kappa: Kreps, J.: Questioning the lambda architecture (2014). [https://www.oreilly.com/radar/questioning-the-lambda-architecture/ White paper]
* Liquid: Fernandez, Raul Castro, Peter R. Pietzuch, Jay Kreps, Neha Narkhede, Jun Rao, Joel Koshy, Dong Lin, Chris Riccomini, and Guozhang Wang. "Liquid: Unifying nearline and offline big data integration." In CIDR. 2015. [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1088.2602&rep=rep1&type=pdf Paper]
* Sigma: Cassavia, N., & Masciari, E. (2021, March). Sigma: a scalable high performance big data architecture. In 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (pp. 236-239). IEEE. [https://bibsys-almaprimo.hosted.exlibrisgroup.com/primo-explore/openurl?sid=google&auinit=N&aulast=Cassavia&atitle=Sigma:%20a%20scalable%20high%20performance%20big%20data%20architecture&id=doi:10.1109%2FPDP52278.2021.00044&vid=UBB&institution=UBB&url_ctx_val=&url_ctx_fmt=null&isSerivcesPage=true Paper]
* Phi: Maamouri, A., Sfaxi, L., & Robbana, R. (2021, December). Phi: A Generic Microservices-Based Big Data Architecture. In European, Mediterranean, and Middle Eastern Conference on Information Systems (pp. 3-16). Springer, Cham. [https://link.springer.com/chapter/10.1007/978-3-030-95947-0_1 Paper]
Revision as of 11:50, 17 August 2022
Books
We will use two textbooks:
- Rob Kitchin. The Data Revolution - Big Data, Open Data, Data Infrastructures & Their Consequences. Sage, 2014.
  - At least chapters 1-5 and some later chapters are mandatory.
- Bill Chambers and Matei Zaharia: Spark: The Definitive Guide - Big Data Processing Made Simple. O'Reilly, 2018. File:Spark-TheDefinitiveGuide.pdf
  - At least chapters 1-9 and some later chapters are mandatory.
Papers
Selected papers will become available here, including:
- Gallofré, M., Opdahl, A. L., Stoppel, S., Tessem, B., & Veres, C. (2021). The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms. In Proceedings of Norsk IKT-konferanse for forskning og utdanning. Short Paper File:A1-Poster-NIKT2021.pdf
Technical introductions
Selected web pages will become available here, including:
- Spark 3.3.0 Overview and Quick Start (with Python examples)
- NREC and OpenStack, the following sections/pages: Introduction, Project application, Logging in, The dashboard, Create a Linux virtual machine (skip: Windows), Using SSH, Working with Security Groups, Create and manage volumes, Create and manage snapshots (skip: images), Instance console
- Terraform and NREC part I, part II, and part III
- How Ansible Works and the Ansible Community portal
- Kafka Introduction
Additional non-mandatory materials will be made available to support the exercises further.
Lecture slides
See the Sessions page for lecture slides after each session.
Readings for each session and exercise
The Sessions and Exercises pages will suggest specific readings for each session and exercise.
