Programming project

The project shall develop an application that uses big data technologies on social-media and/or other open data. At least part of the project shall use Spark and run in the NREC cloud. The project should be carried out in groups of three, and never more than three. Working individually or in pairs is possible, but not recommended.

This autumn, we specifically invite projects that use big data for the news.

More information about possible projects, deadlines, and other requirements will appear here soon.

Proposing a theme: deadline

Everyone who intends to take the course must be included in a project proposal sent by email to Andreas.Opdahl@uib.no, with all group members on Cc. The subject line must contain the string "INFO319 Project Proposal".

Proposal deadline: Wednesday October 12th, 15:00

Project presentations

Final project presentations: Thursday December 8th, 10:15

Each presentation will be brief: 15 minutes per group plus 5 minutes for questions and comments. These times may be adjusted slightly depending on the number of project groups.

You may demonstrate your project live (most convincing), or you may replay a recorded demonstration (which is good to have as a backup in any case). In addition, I expect each presentation to address/answer at least these points:

  • what have you made? - or: what is your application doing?
  • which technologies have you used (languages, APIs, other software etc.)?
  • where did you get your data from? - and/or: which datasets have you used?
  • why is it a good idea to do this using big data and big data technologies? - or: what does your system do that was not possible (or at least not easy) to do before?
  • exactly what have you done and programmed so far?
  • what are you planning to do in the final few days?
  • have you got any particular problems you need to address?

Project submission

Final project submission: December 12th.

Submit your project through Inspera as a single ZIP archive. The version of your project that you submit should be anonymous.

Since the project is graded, this is an official deadline. If you do not submit on time, you will not be allowed to take the course exam a week later.

Provide a short video (max 5 minutes) that shows your system running, with voice comments.

Comment your code sparsely and in-line. You do not need additional documentation, but you should provide a precise description of how to run your system. For example, explain:

  • which additional packages need to be installed
  • which datasets need to be downloaded
    • do not include large datasets (> 10 MB) in your ZIP file
    • but it is fine to include smaller test datasets
  • if credentials (like a Twitter token) are needed to run the code, where they must be added
  • which other systems must be running first (e.g., Kafka, HDFS, YARN)
  • how to start your system (in particular if it consists of several programs)
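
One possible way to build the submission archive while leaving out large datasets is sketched below. It is only an illustration, not a required tool: the function name build_archive and the constant MAX_BYTES are hypothetical, and the limit assumes that "10M" means roughly 10 MB. The function returns the files it skipped, so you can list them in your run instructions as datasets to download separately.

```python
import os
import zipfile

# Assumption: the "10M" limit is read as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def build_archive(project_dir, zip_path, max_bytes=MAX_BYTES):
    """Zip project_dir into zip_path, skipping files larger than max_bytes.

    Returns the relative paths of the skipped files.
    """
    skipped = []
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(project_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # never add the archive to itself
                rel = os.path.relpath(full, project_dir)
                if os.path.getsize(full) > max_bytes:
                    skipped.append(rel)  # too large: document it instead
                else:
                    archive.write(full, arcname=rel)
    return skipped
```

For example, build_archive("my_project", "submission.zip") would write submission.zip and return the paths that were left out. The script does not check anonymity, so names in code comments, file names, and the video still need to be reviewed by hand.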