<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://info319.wiki.uib.no/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Vimala</id>
	<title>info319 - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://info319.wiki.uib.no/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Vimala"/>
	<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/Special:Contributions/Vimala"/>
	<updated>2026-04-20T15:07:22Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.44.2</generator>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Essay&amp;diff=826</id>
		<title>Essay</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Essay&amp;diff=826"/>
		<updated>2020-11-17T09:26:16Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The essay shall present and discuss selected theory, technology and tools related to big data and emergency management, backed by scholarly and other references.&lt;br /&gt;
&lt;br /&gt;
This autumn, we specifically invite essays related to big data, IoT, social media, or emergency management.&lt;br /&gt;
&lt;br /&gt;
== Theme ==&lt;br /&gt;
Your mandatory essay is supposed to be &amp;quot;an individual, theoretical essay on a chosen topic&amp;quot;. In practice, you propose a theme, and I either accept it as is or guide you towards a more suitable one.&lt;br /&gt;
&lt;br /&gt;
I am open to essays that are not theoretical, but based on your own investigation and data collection, for example through interviews and/or archival analyses. In any case, your essay should demonstrate &amp;quot;thoughtful research and discussion&amp;quot; and be well backed by scholarly literature. More literature backing is needed if you have not done any investigation or data collection of your own beyond your literature search.&lt;br /&gt;
&lt;br /&gt;
== Finding a theme ==&lt;br /&gt;
It is a good idea to start thinking about possible essay themes now, at least to get the process started! Send me your ideas in an email, and I will provide early feedback. All I need is a few keywords or a sentence about one or more ideas you are considering. Later we can talk face to face.&lt;br /&gt;
&lt;br /&gt;
* Here is an example of an initial idea at the keyword stage: &amp;quot;Ethics and privacy in using social media for emergency management.&amp;quot;&lt;br /&gt;
* A step further: &amp;quot;I want to look into the interoperability of different freely available emergency data sources using big data technologies. What information is needed, and who needs it?&amp;quot;&lt;br /&gt;
* Here is an even more developed idea: &amp;quot;Something about how realistic it is to analyse lots of Twitter messages using sentiment analysers. Analysing messages takes computing resources, and there are so many Twitter messages posted after an emergency occurs. I want to write something about the accuracy of the Twitter data analysis.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
== Proposing a theme / selection deadline ==&lt;br /&gt;
Everyone who intends to take the course must send an essay proposal by email to [mailto:vimala.nunavath@usn.no vimala.nunavath@usn.no].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadline:&#039;&#039;&#039; Friday August 28th 1400&lt;br /&gt;
&lt;br /&gt;
The proposal does not have to be long, but the following points must be made clear:&lt;br /&gt;
*    Which problem or opportunity you are planning to address.&lt;br /&gt;
*    Why this problem or opportunity is real.&lt;br /&gt;
*    How you are going to investigate the problem or opportunity.&lt;br /&gt;
*    Which relevant literature or other information sources you know already.&lt;br /&gt;
*    How you are going to find more suitable literature.&lt;br /&gt;
*    Whether you plan to collect your own data and, if so, from where and how.&lt;br /&gt;
*    And, if so, how you are going to present and use them in your essay.&lt;br /&gt;
*    How you plan to structure the essay (coarsely).&lt;br /&gt;
&lt;br /&gt;
== Length ==&lt;br /&gt;
Suggested length is 4000-6000 words. A theoretical essay should normally be a little longer than an essay based on your own data collection. But quality of effort is much more important than length. I encourage submitting the essay wholly or partially as one or several Wikipedia pages, or as clear extensions to existing pages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Essay submission ==&lt;br /&gt;
&#039;&#039;&#039;Hard deadline:&#039;&#039;&#039; Friday November 20th 1400&lt;br /&gt;
&lt;br /&gt;
Essays are submitted through Inspera in PDF format. The file name and title page should clearly indicate who the author is.&lt;br /&gt;
&lt;br /&gt;
== Evaluation criteria ==&lt;br /&gt;
These are the main evaluation criteria we will use:&lt;br /&gt;
*    Clear, good and well-motivated problem.&lt;br /&gt;
*    Extent and coverage of your literature, in particular of the academic research literature.&lt;br /&gt;
*    Depth of literature is more important than breadth.&lt;br /&gt;
*    Originality, in particular of your independent/own contribution.&lt;br /&gt;
*    How well argued and/or empirically backed the contribution is.&lt;br /&gt;
*    Extent: how hard you have worked and how much you have done.&lt;br /&gt;
*    Quality of presentation.&lt;br /&gt;
&lt;br /&gt;
== Essay presentations ==&lt;br /&gt;
The session on Thursday, November 26th will focus on essay presentations. Ten minutes and around 5-8 slides is an appropriate length.&lt;br /&gt;
&lt;br /&gt;
Depending on the number of people who take the course, we may not have much time per essay. This time includes getting your slides up and running, presenting the actual essay, presenting critique, and posing/answering questions. In addition to presenting your own essay, you are supposed to offer comments on at least one other essay.&lt;br /&gt;
&lt;br /&gt;
This means that you need to be really well prepared. A short presentation is much harder than a long one: &amp;quot;Lincoln [... w]hen asked to appear upon some important occasion and deliver a five-minute speech, he said that he had no time to prepare five-minute speeches, but that he could go and speak an hour at any time.&amp;quot; (H. H. Markham, Governor of California, 1893)&lt;br /&gt;
&lt;br /&gt;
== Presentation outline ==&lt;br /&gt;
The presentation form is up to you, but you should probably touch on most of these points:&lt;br /&gt;
* What is the problem you have addressed?&lt;br /&gt;
* Why is this problem important?&lt;br /&gt;
* Which existing area(s) of research and/or practice are you building on?&lt;br /&gt;
** explain the ones we have not yet touched in the course!&lt;br /&gt;
** spend some time on this: for the listeners it may be the most interesting part&lt;br /&gt;
* What is your independent/own contribution?&lt;br /&gt;
** examples: how your field(s) has evolved over time; missing knowledge/research; own data collection, e.g., through interviews; comparing different approaches - in practice or theory; designing a new solution for something; suggesting a framework of how different contributions fit together and are different; ...&lt;br /&gt;
* What has been your strategy to make that contribution?&lt;br /&gt;
** how did you convince the reader that your contribution was sound?&lt;br /&gt;
* Which academic research literature did you use (in addition to the curriculum in INFO319)?&lt;br /&gt;
** show your literature list, but don&#039;t try to explain the papers :-)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Upload your draft presentations in advance, by Wednesday December 4th, to the folder EssayPresentations in Canvas (https://mitt.uib.no/courses/13673/).&lt;br /&gt;
&lt;br /&gt;
== Essay oppositions ==&lt;br /&gt;
These are very brief and even more free-form. You could address two or more of the issues below, or perhaps others. Do not try to address them all:&lt;br /&gt;
* Do you have reasons to think this problem is unclear? If so, which?&lt;br /&gt;
* Do you have reasons to think this problem is less relevant?&lt;br /&gt;
* Are there other areas of research and/or practice that are relevant here?&lt;br /&gt;
* Do you think the plan for an independent/own contribution is realistic? Is it the right type of contribution?&lt;br /&gt;
* Do you think this contribution is likely to be convincing? If not, which problems do you see?&lt;br /&gt;
* Have you got concrete ideas for finding more literature (not just: maybe you can search in the library/on the internet...)?&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Essay&amp;diff=825</id>
		<title>Essay</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Essay&amp;diff=825"/>
		<updated>2020-11-17T09:24:46Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The essay shall present and discuss selected theory, technology and tools related to big data and emergency management, backed by scholarly and other references.&lt;br /&gt;
&lt;br /&gt;
This autumn, we specifically invite essays related to big data, IoT, social media, or emergency management.&lt;br /&gt;
&lt;br /&gt;
== Theme ==&lt;br /&gt;
Your mandatory essay is supposed to be &amp;quot;an individual, theoretical essay on a chosen topic&amp;quot;. In practice, you propose a theme, and I either accept it as is or guide you towards a more suitable one.&lt;br /&gt;
&lt;br /&gt;
I am open to essays that are not theoretical, but based on your own investigation and data collection, for example through interviews and/or archival analyses. In any case, your essay should demonstrate &amp;quot;thoughtful research and discussion&amp;quot; and be well backed by scholarly literature. More literature backing is needed if you have not done any investigation or data collection of your own beyond your literature search.&lt;br /&gt;
&lt;br /&gt;
== Finding a theme ==&lt;br /&gt;
It is a good idea to start thinking about possible essay themes now, at least to get the process started! Send me your ideas in an email, and I will provide early feedback. All I need is a few keywords or a sentence about one or more ideas you are considering. Later we can talk face to face.&lt;br /&gt;
&lt;br /&gt;
* Here is an example of an initial idea at the keyword stage: &amp;quot;Ethics and privacy in using social media for emergency management.&amp;quot;&lt;br /&gt;
* A step further: &amp;quot;I want to look into the interoperability of different freely available emergency data sources using big data technologies. What information is needed, and who needs it?&amp;quot;&lt;br /&gt;
* Here is an even more developed idea: &amp;quot;Something about how realistic it is to analyse lots of Twitter messages using sentiment analysers. Analysing messages takes computing resources, and there are so many Twitter messages posted after an emergency occurs. I want to write something about the accuracy of the Twitter data analysis.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
== Proposing a theme / selection deadline ==&lt;br /&gt;
Everyone who intends to take the course must send an essay proposal by email to [mailto:vimala.nunavath@usn.no vimala.nunavath@usn.no].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadline:&#039;&#039;&#039; Friday August 28th 1400&lt;br /&gt;
&lt;br /&gt;
The proposal does not have to be long, but the following points must be made clear:&lt;br /&gt;
*    Which problem or opportunity you are planning to address.&lt;br /&gt;
*    Why this problem or opportunity is real.&lt;br /&gt;
*    How you are going to investigate the problem or opportunity.&lt;br /&gt;
*    Which relevant literature or other information sources you know already.&lt;br /&gt;
*    How you are going to find more suitable literature.&lt;br /&gt;
*    Whether you plan to collect your own data and, if so, from where and how.&lt;br /&gt;
*    And, if so, how you are going to present and use them in your essay.&lt;br /&gt;
*    How you plan to structure the essay (coarsely).&lt;br /&gt;
*    Whether you plan to publish the essay on some kind of social media.&lt;br /&gt;
&lt;br /&gt;
== Length ==&lt;br /&gt;
Suggested length is 4000-6000 words. A theoretical essay should normally be a little longer than an essay based on your own data collection. But quality of effort is much more important than length. I encourage submitting the essay wholly or partially as one or several Wikipedia pages, or as clear extensions to existing pages.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Essay submission ==&lt;br /&gt;
&#039;&#039;&#039;Hard deadline:&#039;&#039;&#039; Friday November 20th 1400&lt;br /&gt;
&lt;br /&gt;
Essays are submitted through Inspera in PDF format. The file name and title page should clearly indicate who the author is.&lt;br /&gt;
&lt;br /&gt;
== Evaluation criteria ==&lt;br /&gt;
These are the main evaluation criteria we will use:&lt;br /&gt;
*    Clear, good and well-motivated problem.&lt;br /&gt;
*    Extent and coverage of your literature, in particular of the academic research literature.&lt;br /&gt;
*    Depth of literature is more important than breadth.&lt;br /&gt;
*    Originality, in particular of your independent/own contribution.&lt;br /&gt;
*    How well argued and/or empirically backed the contribution is.&lt;br /&gt;
*    Extent: how hard you have worked and how much you have done.&lt;br /&gt;
*    Quality of presentation.&lt;br /&gt;
&lt;br /&gt;
== Essay presentations ==&lt;br /&gt;
The session on Thursday, November 26th will focus on essay presentations. Ten minutes and around 5-8 slides is an appropriate length.&lt;br /&gt;
&lt;br /&gt;
Depending on the number of people who take the course, we may not have much time per essay. This time includes getting your slides up and running, presenting the actual essay, presenting critique, and posing/answering questions. In addition to presenting your own essay, you are supposed to offer comments on at least one other essay.&lt;br /&gt;
&lt;br /&gt;
This means that you need to be really well prepared. A short presentation is much harder than a long one: &amp;quot;Lincoln [... w]hen asked to appear upon some important occasion and deliver a five-minute speech, he said that he had no time to prepare five-minute speeches, but that he could go and speak an hour at any time.&amp;quot; (H. H. Markham, Governor of California, 1893)&lt;br /&gt;
&lt;br /&gt;
== Presentation outline ==&lt;br /&gt;
The presentation form is up to you, but you should probably touch on most of these points:&lt;br /&gt;
* What is the problem you have addressed?&lt;br /&gt;
* Why is this problem important?&lt;br /&gt;
* Which existing area(s) of research and/or practice are you building on?&lt;br /&gt;
** explain the ones we have not yet touched in the course!&lt;br /&gt;
** spend some time on this: for the listeners it may be the most interesting part&lt;br /&gt;
* What is your independent/own contribution?&lt;br /&gt;
** examples: how your field(s) has evolved over time; missing knowledge/research; own data collection, e.g., through interviews; comparing different approaches - in practice or theory; designing a new solution for something; suggesting a framework of how different contributions fit together and are different; ...&lt;br /&gt;
* What has been your strategy to make that contribution?&lt;br /&gt;
** how did you convince the reader that your contribution was sound?&lt;br /&gt;
* Which academic research literature did you use (in addition to the curriculum in INFO319)?&lt;br /&gt;
** show your literature list, but don&#039;t try to explain the papers :-)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Upload your draft presentations in advance, by Wednesday December 4th, to the folder EssayPresentations in Canvas (https://mitt.uib.no/courses/13673/).&lt;br /&gt;
&lt;br /&gt;
== Essay oppositions ==&lt;br /&gt;
These are very brief and even more free-form. You could address two or more of the issues below, or perhaps others. Do not try to address them all:&lt;br /&gt;
* Do you have reasons to think this problem is unclear? If so, which?&lt;br /&gt;
* Do you have reasons to think this problem is less relevant?&lt;br /&gt;
* Are there other areas of research and/or practice that are relevant here?&lt;br /&gt;
* Do you think the plan for an independent/own contribution is realistic? Is it the right type of contribution?&lt;br /&gt;
* Do you think this contribution is likely to be convincing? If not, which problems do you see?&lt;br /&gt;
* Have you got concrete ideas for finding more literature (not just: maybe you can search in the library/on the internet...)?&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Final_exam&amp;diff=824</id>
		<title>Final exam</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Final_exam&amp;diff=824"/>
		<updated>2020-11-13T13:57:33Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The final exam is on December 1st, 2020.&lt;br /&gt;
&lt;br /&gt;
Reading material for the exam:&lt;br /&gt;
&lt;br /&gt;
*Rob Kitchin. The Data Revolution - Big Data, Open Data, Data Infrastructures &amp;amp; Their Consequences. Sage, 2014.&lt;br /&gt;
*Carlos Castillo. Big Crisis Data - Social Media in Disasters and Time-Critical Situations. Cambridge, 2016.&lt;br /&gt;
*Lecture slides in Canvas.&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=823</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=823"/>
		<updated>2020-10-29T15:48:49Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* I will share the Jupyter notebook (which contains the Python code) and the dataset during the practical session&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://github.com/tthustla/setiment_analysis_pyspark/blob/master/Sentiment%20Analysis%20with%20PySpark.ipynb Sentiment Analysis with PySpark]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
*[https://dzone.com/articles/streaming-machine-learning-pipeline-for-sentiment Sentiment analysis with Apache Spark streaming and MLlib] or [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with PySpark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with PySpark] or [https://databricks.com/wp-content/uploads/2015/10/STEP-3-Sentiment_Analysis.html]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3) Real-time Analysis of different trends in COVID-19 cases with Spark:&#039;&#039;&#039;&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=822</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=822"/>
		<updated>2020-10-29T15:48:09Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* I will share the Jupyter notebook (which contains the Python code) during the practical session&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://github.com/tthustla/setiment_analysis_pyspark/blob/master/Sentiment%20Analysis%20with%20PySpark.ipynb Sentiment Analysis with PySpark]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
*[https://dzone.com/articles/streaming-machine-learning-pipeline-for-sentiment Sentiment analysis with Apache Spark streaming and MLlib] or [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with PySpark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with PySpark] or [https://databricks.com/wp-content/uploads/2015/10/STEP-3-Sentiment_Analysis.html]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3) Real-time Analysis of different trends in COVID-19 cases with Spark:&#039;&#039;&#039;&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=821</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=821"/>
		<updated>2020-10-29T15:34:15Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://github.com/tthustla/setiment_analysis_pyspark/blob/master/Sentiment%20Analysis%20with%20PySpark.ipynb Sentiment Analysis with PySpark]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
*[https://dzone.com/articles/streaming-machine-learning-pipeline-for-sentiment Sentiment analysis with Apache Spark streaming and MLlib] or [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with PySpark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with PySpark] or [https://databricks.com/wp-content/uploads/2015/10/STEP-3-Sentiment_Analysis.html]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3) Real-time Analysis of different trends in COVID-19 cases with Spark:&#039;&#039;&#039;&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=820</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=820"/>
		<updated>2020-10-29T15:31:48Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
*[https://dzone.com/articles/streaming-machine-learning-pipeline-for-sentiment Sentiment analysis with Apache Spark streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with PySpark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with PySpark] or [https://databricks.com/wp-content/uploads/2015/10/STEP-3-Sentiment_Analysis.html]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3) Real-time Analysis of different trends in COVID-19 cases with Spark:&#039;&#039;&#039;&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=819</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=819"/>
		<updated>2020-10-29T15:30:06Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
*[https://dzone.com/articles/streaming-machine-learning-pipeline-for-sentiment Sentiment analysis with Apache Spark streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with PySpark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with PySpark]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3) Real-time Analysis of different trends in COVID-19 cases with Spark:&#039;&#039;&#039;&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=818</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=818"/>
		<updated>2020-10-29T15:29:53Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
*[https://dzone.com/articles/streaming-machine-learning-pipeline-for-sentiment/ Sentiment analysis with Apache Spark streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with PySpark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with PySpark]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3) Real-time Analysis of different trends in COVID-19 cases with Spark:&#039;&#039;&#039;&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=817</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=817"/>
		<updated>2020-10-29T15:28:05Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with pyspark:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with pyspark]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=816</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=816"/>
		<updated>2020-10-29T15:27:53Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://towardsdatascience.com/sentiment-analysis-with-pyspark-bc8e83f80c35/ Sentiment Analysis with pyspark]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=815</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=815"/>
		<updated>2020-10-29T15:26:24Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Analysis of different trends in COVID-19 cases with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=814</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=814"/>
		<updated>2020-10-29T15:25:37Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://community.ibm.com/community/user/watsonstudio/blogs/sharyn-richard1/2020/04/15/analysis-of-johns-hopkins-covid-19-case-data/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=813</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=813"/>
		<updated>2020-10-29T15:24:15Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2017/09/twitter-sentiment-analysis-using-spark/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=812</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=812"/>
		<updated>2020-10-29T15:23:09Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=811</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=811"/>
		<updated>2020-10-29T13:17:54Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create an account at http://twitter.com if you do not have one, and log into it.&lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming for Scala]]&lt;br /&gt;
&lt;br /&gt;
For python:&lt;br /&gt;
&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Python Spark Streaming Code]&lt;br /&gt;
*[https://learning.oreilly.com/videos/apache-spark-streaming/9781789808223/9781789808223-video1_5 Python Spark Streaming Video]&lt;br /&gt;
*[https://medium.com/@ch.nabarun/easy-to-play-with-twitter-data-using-spark-structured-streaming-76fe86f1f81c Spark Structured Streaming]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below for Scala:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
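As an optional extension (a sketch, not part of the original guide; the window and slide durations below are illustrative choices), the same hashtags can also be counted over a sliding window:

```scala
// Sketch: count hashtags over a 60-second window, sliding every 5 seconds.
// Assumes the ssc and hashTags values defined above; the durations are
// illustrative, not values prescribed by this guide.
val tagCounts = hashTags
  .map(tag => (tag, 1))
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(5))

// Print a few counts from each batch to the console.
tagCounts.print()
```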
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
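The two boolean arguments can also be written with named parameters for readability (a sketch using the StreamingContext.stop signature):

```scala
// false: keep the underlying SparkContext alive so it can be reused;
// true:  stop gracefully, letting already-received data finish processing.
ssc.stop(stopSparkContext = false, stopGracefully = true)
```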
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=810</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=810"/>
		<updated>2020-10-28T16:07:39Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create an account at http://twitter.com if you do not have one, and log into it.&lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming for Scala]]&lt;br /&gt;
&lt;br /&gt;
For python:&lt;br /&gt;
&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Python Spark Streaming Code]&lt;br /&gt;
*[https://learning.oreilly.com/videos/apache-spark-streaming/9781789808223/9781789808223-video1_5 Python Spark Streaming Video]&lt;br /&gt;
*[https://medium.com/@ch.nabarun/easy-to-play-with-twitter-data-using-spark-structured-streaming-76fe86f1f81c Spark Structured Streaming]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below for Scala:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=809</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=809"/>
		<updated>2020-10-28T15:55:22Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create an account at http://twitter.com if you do not have one, and log into it.&lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming for Scala]]&lt;br /&gt;
&lt;br /&gt;
For python:&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Python Spark Streaming]&lt;br /&gt;
*[https://medium.com/@ch.nabarun/easy-to-play-with-twitter-data-using-spark-structured-streaming-76fe86f1f81c Spark Structured Streaming]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below for Scala:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=808</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=808"/>
		<updated>2020-10-28T08:56:53Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create an account at http://twitter.com if you do not have one, and log into it.&lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Python Spark Streaming]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
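&lt;br /&gt;
Optionally, the hashtag stream can also be aggregated over a sliding window, for example counting how often each hashtag occurred during the last minute. This is an untested sketch: windowed operations require a checkpoint directory, and &#039;&#039;checkpoint&#039;&#039; here is just an example path:&lt;br /&gt;
    ssc.checkpoint(&amp;quot;checkpoint&amp;quot;)&lt;br /&gt;
    val counts = hashTags.map(tag =&amp;gt; (tag, 1)).reduceByKeyAndWindow(_ + _, Seconds(60))&lt;br /&gt;
    counts.foreachRDD(rdd =&amp;gt; rdd.foreach(println))&lt;br /&gt;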
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes].&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
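&lt;br /&gt;
As a sketch toward the Todo item below, the raw message texts can be saved to time-stamped folders with the built-in &#039;&#039;saveAsTextFiles&#039;&#039; action. This is untested, and the prefix &#039;&#039;tweets&#039;&#039; is just an example:&lt;br /&gt;
    val texts = stream.map(status =&amp;gt; status.getText)&lt;br /&gt;
    texts.saveAsTextFiles(&amp;quot;tweets&amp;quot;, &amp;quot;txt&amp;quot;)&lt;br /&gt;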
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=807</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=807"/>
		<updated>2020-10-28T08:56:16Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Python Spark Streaming]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=806</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=806"/>
		<updated>2020-10-26T13:34:12Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Python Spark Streaming]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=805</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=805"/>
		<updated>2020-10-26T13:33:51Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
*[https://github.com/jleetutorial/python-spark-streaming Example]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;your key&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;your key&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=804</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=804"/>
		<updated>2020-10-26T10:24:13Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Guide to Sentiment analysis]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=803</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=803"/>
		<updated>2020-10-26T10:19:50Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [https://wiki.uib.no/info319/index.php/File:GuideS6.pdf Accessing Sentiment analysis Folder]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=802</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=802"/>
		<updated>2020-10-26T10:18:34Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [[:File:GuideS6.pdf|Access to Sentiment analysis folder]]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=801</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=801"/>
		<updated>2020-10-26T10:18:18Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [[:File:GuideS6.pdf|Access to Sentiment analysis folder]]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=800</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=800"/>
		<updated>2020-10-26T10:18:09Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [[:File:GuideS6.pdf|Access to Sentiment analysis folder]]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=799</id>
		<title>Practical session, Spark streaming for sentiment analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_sentiment_analysis&amp;diff=799"/>
		<updated>2020-10-26T10:17:49Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Practical Session&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sentiment analysis with Apache Spark streaming and MLlib&#039;&#039;&#039;&lt;br /&gt;
* [[:File:GuideS6.pdf|Access to Sentiment analysis folder]]&lt;br /&gt;
* [https://stdatalabs.com/2016/09/spark-streaming-part-1-real-time/ Real-time Sentiment Analysis of Twitter data with Spark Streaming and MLlib]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;More useful links&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1) Sentiment analysis with Apache Spark streaming and MLlib:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark tutorial]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Sentiment analysis with IBM Bluemix:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://developer.ibm.com/clouddataservices/2016/01/15/real-time-sentiment-analysis-of-twitter-hashtags-with-spark/ Real-time Sentiment Analysis of Twitter Hashtags with Spark and IBM Bluemix]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=File:GuideS6.pdf&amp;diff=798</id>
		<title>File:GuideS6.pdf</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=File:GuideS6.pdf&amp;diff=798"/>
		<updated>2020-10-26T10:17:12Z</updated>

		<summary type="html">&lt;p&gt;Vimala: File uploaded with MsUpload&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;File uploaded with MsUpload&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=797</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=797"/>
		<updated>2020-10-26T10:09:51Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
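&lt;br /&gt;
As a further illustration (not part of the original session), the hashtags could also be counted over a sliding window. This is only a sketch, assuming the &#039;&#039;hashTags&#039;&#039; stream and the &#039;&#039;ssc&#039;&#039; context defined above:&lt;br /&gt;

```scala
// Sketch (assumes `hashTags` and `ssc` from the steps above):
// count each hashtag over the last 60 seconds, recomputed every 5 seconds.
val tagCounts = hashTags
  .map(tag => (tag, 1))
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(5))

// print the first elements of each windowed batch to the console
tagCounts.print()
```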
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=796</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=796"/>
		<updated>2020-10-26T10:09:29Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=795</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=795"/>
		<updated>2020-10-26T10:05:49Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
* See practical session folder in Canvas&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=794</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=794"/>
		<updated>2020-10-26T10:04:49Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[[C:\Users\vnuna\OneDrive - USN\Info319| Spark Streaming]]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=793</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=793"/>
		<updated>2020-10-26T10:03:22Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark Streaming]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
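To avoid pasting the secret strings directly into the shell, you could read them from environment variables instead. The sketch below is plain Scala and only an illustration: the TWITTER_* variable names are invented for this example, not prescribed by Twitter or twitter4j.&lt;br /&gt;

```scala
// Sketch (assumption): map the twitter4j OAuth property names to invented
// TWITTER_* environment variable names and copy over any values that are set.
val oauthKeys = Seq("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")

def loadTwitterCredentials(env: Map[String, String]): Unit =
  oauthKeys.foreach { key =>
    // "consumerKey" becomes "TWITTER_CONSUMER_KEY", and so on
    val envName = "TWITTER_" + key.replaceAll("([A-Z])", "_$1").toUpperCase
    env.get(envName).foreach(value => System.setProperty("twitter4j.oauth." + key, value))
  }

loadTwitterCredentials(sys.env)
```

With a helper like this, the four System.setProperty lines above collapse into a single call, and the secrets stay out of your shell history and init scripts.&lt;br /&gt;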
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark evaluates lazily, nothing happens until you have defined some transformations on the stream and started it. The next two lines take each incoming message (a twitter4j &#039;&#039;Status&#039;&#039; object), split its text into words, keep only the words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
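To see what these two lines do without a live Twitter connection, the same split-and-filter logic can be run on ordinary Scala strings. The two message texts below are invented examples:&lt;br /&gt;

```scala
// Stand-ins for the text of two streamed messages (invented examples)
val texts = Seq("Big #data and #IoT for emergency management", "no hashtags here")

// Same logic as the DStream transformation above, applied to a plain Seq:
// split each text into words and keep only the words starting with "#"
val tags = texts.flatMap(text => text.split(" ").filter(_.startsWith("#")))

tags.foreach(tag => println(":: " + tag))  // prints ":: #data" and ":: #IoT"
```

On the real stream, &#039;&#039;foreachRDD&#039;&#039; applies the same printing step to each five-second batch of messages.&lt;br /&gt;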
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
The first argument (&#039;&#039;false&#039;&#039;) keeps the underlying SparkContext alive for later use; the second (&#039;&#039;true&#039;&#039;) stops the stream gracefully, so data already received is processed first. Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes].&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=792</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=792"/>
		<updated>2020-10-26T10:01:56Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark Streaming]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
[[Media:SparkStreamingOverview.ogg]]&lt;br /&gt;
[[Media:SparkTwitter.ogg]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=791</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=791"/>
		<updated>2020-10-26T09:57:18Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark Streaming]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=File:Guide-20.pdf&amp;diff=790</id>
		<title>File:Guide-20.pdf</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=File:Guide-20.pdf&amp;diff=790"/>
		<updated>2020-10-26T09:56:46Z</updated>

		<summary type="html">&lt;p&gt;Vimala: Blanked the page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=File:Guide-20.pdf&amp;diff=789</id>
		<title>File:Guide-20.pdf</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=File:Guide-20.pdf&amp;diff=789"/>
		<updated>2020-10-26T09:56:44Z</updated>

		<summary type="html">&lt;p&gt;Vimala: File uploaded with MsUpload&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;File uploaded with MsUpload&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=788</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=788"/>
		<updated>2020-10-26T09:56:37Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark Streaming]&lt;br /&gt;
*[[:File:guide-20.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=File:Guide.pdf&amp;diff=787</id>
		<title>File:Guide.pdf</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=File:Guide.pdf&amp;diff=787"/>
		<updated>2020-10-26T09:55:39Z</updated>

		<summary type="html">&lt;p&gt;Vimala: Blanked the page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=File:Guide.pdf&amp;diff=786</id>
		<title>File:Guide.pdf</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=File:Guide.pdf&amp;diff=786"/>
		<updated>2020-10-26T09:55:36Z</updated>

		<summary type="html">&lt;p&gt;Vimala: File uploaded with MsUpload&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;File uploaded with MsUpload&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=784</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=784"/>
		<updated>2020-10-26T09:54:28Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming]&lt;br /&gt;
*[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark Streaming]&lt;br /&gt;
*[[:File:Guide.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
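&lt;br /&gt;
The split-and-filter step can be tried in plain Scala, without Spark or Twitter access. The sample tweet text below is invented for illustration; note that naive whitespace splitting keeps any punctuation attached to the hashtags:&lt;br /&gt;

```scala
// Plain-Scala sketch of the hashtag filter used in the transformation above.
// The sample tweet text is made up for illustration only.
val sampleText = "Flooding reported near #Bergen follow #INFO319 for updates"
val hashTags = sampleText.split(" ").filter(_.startsWith("#"))
hashTags.foreach(str => println(":: " + str))
// prints:
// :: #Bergen
// :: #INFO319
```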
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=783</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=783"/>
		<updated>2020-10-26T09:54:12Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming Folder]&lt;br /&gt;
*[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark streaming]&lt;br /&gt;
*[[:File:Guide.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=File:SS.zip&amp;diff=782</id>
		<title>File:SS.zip</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=File:SS.zip&amp;diff=782"/>
		<updated>2020-10-26T09:53:16Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.youtube.com/playlist?list=PLKjwSP1bnrvNotmCafZrIgiuihBgoleV0 Spark Streaming]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=781</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=781"/>
		<updated>2020-10-26T09:44:40Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[https://www.youtube.com/watch?v=2NiT6sK8Yf4/ Accessing Spark streaming Folder]&lt;br /&gt;
*[[:File:SS.zip|Spark streaming Tutorial files]]&lt;br /&gt;
*[[:File:Guide.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=780</id>
		<title>Practical session, Spark streaming for Twitter data analysis</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Practical_session,_Spark_streaming_for_Twitter_data_analysis&amp;diff=780"/>
		<updated>2020-10-26T09:43:51Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Register a Twitter app==&lt;br /&gt;
Create a http://twitter.com account if you do not have one and log into it. &lt;br /&gt;
&lt;br /&gt;
Go to https://apps.twitter.com/ . Make sure you are still logged in: you should see a drop-down menu in the upper right-hand corner of the page.&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;Create New App&#039;&#039; and fill in as many details as you can (you can change most of them later). Click &#039;&#039;Create your Twitter application&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Go to &#039;&#039;Keys and Access Tokens&#039;&#039;. Click &#039;&#039;Create my access token&#039;&#039;. You will need the following four key strings later (keep them secret to protect your Twitter account): &lt;br /&gt;
* Consumer Key (API Key)&lt;br /&gt;
* Consumer Secret (API Secret)&lt;br /&gt;
* Access Token&lt;br /&gt;
* Access Token Secret&lt;br /&gt;
&lt;br /&gt;
==Stream Twitter messages into Spark== &lt;br /&gt;
&#039;&#039;&#039;1) You can use the following links to access the tutorial.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*[[:File:v1.mp4|Accessing Spark streaming Folder]]&lt;br /&gt;
*[[:File:SS.zip|Spark streaming Tutorial files]]&lt;br /&gt;
*[[:File:Guide.pdf|Installation guide for Spark streaming]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2) Or you can follow the steps below:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To test that login works, open a &#039;&#039;spark-shell&#039;&#039; (remember the --jars SPARK_JARS option if you defined such an environment variable earlier), and import these APIs:&lt;br /&gt;
    import org.apache.spark._&lt;br /&gt;
    import org.apache.spark.streaming._&lt;br /&gt;
    import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
Set logging level (to avoid warnings from running Spark standalone), and create a new &#039;&#039;spark streaming context (ssc)&#039;&#039;:&lt;br /&gt;
    sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
&lt;br /&gt;
Set system properties for each of your keys and access tokens provided by Twitter earlier:&lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;)  &lt;br /&gt;
    System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;...copy this string from the Twitter app page...&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
Create a Spark stream with messages from Twitter:&lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
Because Spark has lazy evaluation, nothing happens until you have defined some transformations on the stream and started it. The next two lines collect messages from Twitter, identified by their &#039;&#039;status&#039;&#039;, split each message text into words, pick out only those words that start with a &#039;&#039;#&#039;&#039;, and print them to the console:&lt;br /&gt;
    val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
    hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
==Starting and stopping streams==&lt;br /&gt;
You are now ready to start the stream:&lt;br /&gt;
    ssc.start&lt;br /&gt;
&lt;br /&gt;
This should write out current hashtags to the console. After a while, stop the stream with:&lt;br /&gt;
    ssc.stop(false, true)&lt;br /&gt;
Here is the documentation page for [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.StreamingContext Spark&#039;s StreamingContext and other classes.]&lt;br /&gt;
&lt;br /&gt;
When a streaming context has been stopped, it cannot be restarted, but it can be recreated as follows:&lt;br /&gt;
    val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
    val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
You can create an initialisation file, for example &#039;&#039;init-twitter.scala&#039;&#039;, as follows:&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
import org.apache.spark._&lt;br /&gt;
import org.apache.spark.streaming._&lt;br /&gt;
import org.apache.spark.streaming.twitter._&lt;br /&gt;
&lt;br /&gt;
// set Twitter oauth properties&lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerKey&amp;quot;, &amp;quot;2AvbG84msbL34BEHslMsWZTUR&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.consumerSecret&amp;quot;, &amp;quot;gzEENqmZoMl2hhHvDIdaWzIF9ShMSLO0o7gh8csZnfqKdK6Y9H&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessToken&amp;quot;, &amp;quot;14113114-1we8sJQs1z54dWfjWbUwZtDtkQYf3kDOrXLUMBFkZ&amp;quot;)  &lt;br /&gt;
System.setProperty(&amp;quot;twitter4j.oauth.accessTokenSecret&amp;quot;, &amp;quot;6Sq95ezVVRNTuLw7grKzm4czA32VqmlM0QwvaLjWLNl5A&amp;quot;) &lt;br /&gt;
&lt;br /&gt;
// create stream&lt;br /&gt;
sc.setLogLevel(&amp;quot;ERROR&amp;quot;) &lt;br /&gt;
val ssc = new StreamingContext(sc, Seconds(5))  &lt;br /&gt;
val stream = TwitterUtils.createStream(ssc, None)&lt;br /&gt;
&lt;br /&gt;
// define transformations&lt;br /&gt;
val hashTags = stream.flatMap(status =&amp;gt; status.getText.split(&amp;quot; &amp;quot;).filter(_.startsWith(&amp;quot;#&amp;quot;)))&lt;br /&gt;
hashTags.foreachRDD(rdd =&amp;gt; rdd.foreach(str =&amp;gt; println(&amp;quot;:: &amp;quot; + str)))&lt;br /&gt;
&lt;br /&gt;
// start&lt;br /&gt;
ssc.start&lt;br /&gt;
&lt;br /&gt;
// to stop, use: ssc.stop(false, true)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Start &#039;&#039;spark-shell&#039;&#039; with an initialisation file as follows:&lt;br /&gt;
    spark-shell -i init-twitter.scala&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Todo:&#039;&#039;&#039;&lt;br /&gt;
# partition by hashtag, save message text, save bags of words [https://www.toptal.com/apache/apache-spark-streaming-twitter]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Sessions&amp;diff=779</id>
		<title>Sessions</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Sessions&amp;diff=779"/>
		<updated>2020-10-22T12:25:28Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is the schedule for INFO319 in the autumn of 2020. The dates should be fixed at this stage, but the themes may change a little.&lt;br /&gt;
&lt;br /&gt;
== Dates and Tentative Themes ==&lt;br /&gt;
* Monday 2020-08-18 1415: [[Information meeting via Zoom]]&lt;br /&gt;
* Tuesday 2020-08-20 1015: [[Session 1 - Introduction to INFO319 and Emergency Management]]&lt;br /&gt;
* Tuesday 2020-08-20 1400: [[Practical session, trying emergency response tool (Ushahidi)]]&lt;br /&gt;
* Wednesday 2020-08-21 1015: [[Session 2 - Big Data for emergency management]]&lt;br /&gt;
* Wednesday 2020-08-21 1400: [[Practical session, introduction to Spark]],&lt;br /&gt;
* Friday 2020-08-28 14:00 [[Essay | Essay selection deadline]] (email to [mailto:vimala.nunavath@usn.no vimala.nunavath@usn.no])&lt;br /&gt;
* Thursday 2020-10-01 1015: [[Session 3 - Emergency data sources]]&lt;br /&gt;
* Thursday 2020-10-01 1400: [[Practical session, using Spark for emergency datasources]]&lt;br /&gt;
* Friday 2020-10-02 1015: [[Session 4 - Sensors/IoT for Emergency Management]]&lt;br /&gt;
* Friday 2020-10-09 1400: [[Programming project | Project selection deadline]] (email to [mailto:vimala.nunavath@usn.no vimala.nunavath@usn.no])&lt;br /&gt;
* Thursday 2020-10-29 1015: [[Session 5 - Social Media for Emergency Management]]&lt;br /&gt;
* Thursday 2020-10-29 1400: [[Practical session, Spark streaming for Twitter data analysis]]&lt;br /&gt;
* Friday 2020-10-30 1015: [[Session 6 - Machine learning and NLP for EM ]]&lt;br /&gt;
* Friday 2020-10-30 1400: [[Practical session, Spark streaming for sentiment analysis]]&lt;br /&gt;
* Friday 2020-11-20 1400: [[Essay | Essay submission deadline]] (submit PDF-file through Inspera)&lt;br /&gt;
* Wednesday 2020-11-25 1400: [[Programming project | Project deadline]] (submit ZIP-file through Inspera)&lt;br /&gt;
* Thursday 2020-11-26 1015: [[Session 7 - Essay presentations (mandatory)]]&lt;br /&gt;
* Friday 2020-11-27 1015: [[Session 8 - Student programming project presentations (mandatory)]]&lt;br /&gt;
* Tuesday 2020-12-01: Written exam&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[https://tp.uio.no/uib/timeplan/timeplan.php?id=INFO319&amp;amp;type=course&amp;amp;sort=week&amp;amp;sem=18h&amp;amp;lang=en INFO319 timetable]&lt;br /&gt;
&lt;br /&gt;
== Locations ==&lt;br /&gt;
* &#039;&#039;&#039;Information meeting:&#039;&#039;&#039; via Zoom.&lt;br /&gt;
* &#039;&#039;&#039;Regular sessions:&#039;&#039;&#039; Lauritz Meltzers hus (the Social Science / SV building), seminar room 548 (6th floor)&lt;br /&gt;
&lt;br /&gt;
== Format ==&lt;br /&gt;
The regular sessions will be a combination of lectures, student presentations and discussions. You will all be expected to present 3-4 papers/chapters as part of the course. I will try to balance the workload evenly (so if some people get two papers and others three, the two papers will be longer and the three papers shorter).&lt;br /&gt;
&lt;br /&gt;
We will try to finish each session by 1600.&lt;br /&gt;
&lt;br /&gt;
== Readings ==&lt;br /&gt;
The detailed readings for each session will be made available on this page in due time.&lt;br /&gt;
&lt;br /&gt;
== Presenting a chapter from the book ==&lt;br /&gt;
*    Prepare a presentation of about 20 minutes. We will set aside 5-10 more minutes for discussion and comments.&lt;br /&gt;
*    Each slide should cover at least one section from the allocated chapter.&lt;br /&gt;
*    Prepare slides. For a 20 minute presentation, 10 slides is the maximum.&lt;br /&gt;
*    Rehearse a few times beforehand. Talk through the presentation out loud for yourself (not just &amp;quot;inside your head&amp;quot;).&lt;br /&gt;
*    Share your slides by uploading them to the file section here in the portal.&lt;br /&gt;
&lt;br /&gt;
== Presenting a research article ==&lt;br /&gt;
Here are a few points about the paper presentations and preparations:&lt;br /&gt;
*    Make sure you start reading at least a week before, not in the last 2-3 days. You need time to let the paper sink in a bit before you start preparing the presentation. That way it is easier to see and present the big picture.&lt;br /&gt;
*    This may be the first time you read a research paper. I have tried to choose papers that are rather short and simple but, nevertheless, some parts of almost every paper will be hard for you to understand. If you come across difficult details, try to focus on the purpose of what they are doing. When they mention, for example, statistical techniques, I do not expect that you read up on statistics. But explain why they need statistics and tell us the names of the techniques they use and on what data.&lt;br /&gt;
*    Plan each presentation for about 20 minutes. We will set aside 5-10 more minutes for discussion and comments.&lt;br /&gt;
*    Prepare slides. For a 20 minute presentation, 10 slides is the maximum.&lt;br /&gt;
*    Your presentation should try to answer at least the following: What is the problem the paper addresses? Why is this an important problem? Are the authors targeting a particular usage domain? What solutions do they propose? How does the solution work? Have they evaluated the solution? If so, how? If not yet, how are they planning to evaluate it - or how do you think they should evaluate it? What are the limitations of the proposal? Do you see problems with what they are doing?&lt;br /&gt;
*    These questions are not all suitable for all papers, so you must make a pick! Maybe there are other things you should say about the paper too. Some of the papers mostly describe a problem or a case study, for example, so the presentations will be quite different.&lt;br /&gt;
*    Rehearse a few times beforehand. Talk through the presentation out loud for yourself (not just &amp;quot;inside your head&amp;quot;).&lt;br /&gt;
*    Share your slides by uploading them to the file section here in the portal.&lt;br /&gt;
*    Some papers are longer and some shorter, some easier and some harder. This is how it has to be, but I will try to balance it out so that the workload on each of you is as equal as possible.&lt;br /&gt;
&lt;br /&gt;
== Uploading your presentation ==&lt;br /&gt;
Sharing your presentation slides is a mandatory part of the presentation. You can upload your slides through Inspera in this group: &lt;br /&gt;
[https://mitt.uib.no/groups/? mitt.uib.no/groups/?]. If you are not already a member, you can register yourself (the group is open).&lt;br /&gt;
&lt;br /&gt;
Please use file names like this: &amp;quot;Session2-Pathologies-ALO.pdf&amp;quot;, where &amp;quot;Session2&amp;quot; is the session, &amp;quot;Pathologies&amp;quot; is a central term in the paper title and &amp;quot;ALO&amp;quot; are your initials.&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Session_5_-_Social_Media_for_Emergency_Management&amp;diff=778</id>
		<title>Session 5 - Social Media for Emergency Management</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Session_5_-_Social_Media_for_Emergency_Management&amp;diff=778"/>
		<updated>2020-10-19T12:11:52Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Date:&#039;&#039;&#039; Thursday, October 29th, 2020.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Scientific Papers&#039;&#039;&#039;&lt;br /&gt;
* 1) Lu (Lucy) Yan, Alfonso J. Pedraza-Martinez, &#039;&#039;&#039;Social Media for Disaster Management: Operational Value of the Social Conversation&#039;&#039;&#039;, [[:File:Lu.pdf]].&lt;br /&gt;
* 2) Angel Martín, Ana Belén Anquela Julián, Fernando Cos-Gayón, &#039;&#039;&#039;Analysis of Twitter messages using big data tools to evaluate and locate the activity in the city of Valencia (Spain)&#039;&#039;&#039;, [[:File:angel.pdf]].&lt;br /&gt;
* 3) Umair Qazi, Muhammad Imran, Ferda Ofli, &#039;&#039;&#039;GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information&#039;&#039;&#039;, [[:File:umair.pdf]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Chapters from the book: Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data&#039;&#039;&#039;&lt;br /&gt;
* Chapter 5: Social Media Mining for Disaster Management and Community Resilience by Hemant Purohit and Steve Peterson [chapter can be found in Canvas]&lt;br /&gt;
* Chapter 8: Emergency Information Visualization by Hoang Long Nguyen and Rajendra Akerkar [chapter can be found in Canvas]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Suitable readings:&#039;&#039;&#039; &lt;br /&gt;
* [http://spark.apache.org/docs/latest/streaming-programming-guide.html Spark Streaming Programming Guide]&lt;br /&gt;
* [https://mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis/ Spark Streaming, Minimized]&lt;br /&gt;
* [https://www.infoq.com/articles/apache-spark-streaming Spark Streaming]&lt;br /&gt;
* [https://www.the-lazy-dev.com/en/spark-for-beginners-tutorials-apache-spark-streaming-twitter-java-example/ Create a Twitter developer account]&lt;br /&gt;
* Social media and disasters: Current uses, future options, and policy considerations: [[:File:CRS.pdf]]&lt;br /&gt;
* [http://infolab.stanford.edu/~ullman/mmds/ch10.pdf Social Network Graphs]&lt;br /&gt;
&#039;&#039;&#039;Topics:&#039;&#039;&#039;&lt;br /&gt;
* Follow-up:&lt;br /&gt;
** About the programming project topics&lt;br /&gt;
&lt;br /&gt;
* Paper presentations by you:&lt;br /&gt;
**See Canvas for paper allocation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Slides:&#039;&#039;&#039; &lt;br /&gt;
Session-5: See Canvas&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Session_5_-_Social_Media_for_Emergency_Management&amp;diff=777</id>
		<title>Session 5 - Social Media for Emergency Management</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Session_5_-_Social_Media_for_Emergency_Management&amp;diff=777"/>
		<updated>2020-10-19T11:19:45Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Date:&#039;&#039;&#039; Thursday, October 29th, 2020.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Scientific Papers&#039;&#039;&#039;&lt;br /&gt;
* 1) Lu (Lucy) Yan, Alfonso J. Pedraza-Martinez, &#039;&#039;&#039;Social Media for Disaster Management: Operational Value of the Social Conversation&#039;&#039;&#039;, [[:File:Lu.pdf]].&lt;br /&gt;
* 2) Angel Martín, Ana Belén Anquela Julián, Fernando Cos-Gayón, &#039;&#039;&#039;Analysis of Twitter messages using big data tools to evaluate and locate the activity in the city of Valencia (Spain)&#039;&#039;&#039;, [[:File:angel.pdf]].&lt;br /&gt;
* 3) Umair Qazi, Muhammad Imran, Ferda Ofli, &#039;&#039;&#039;GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information&#039;&#039;&#039;, [[:File:umair.pdf]]&lt;br /&gt;
* 4) Ferda Ofli, Firoj Alam, Muhammad Imran, &#039;&#039;&#039;Analysis of Social Media Data using Multimodal Deep Learning for Disaster Response&#039;&#039;&#039;, [[:File:ofli.pdf]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Chapters from the book: Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data&#039;&#039;&#039;&lt;br /&gt;
* Chapter 5: Social Media Mining for Disaster Management and Community Resilience by Hemant Purohit and Steve Peterson [chapter can be found in Canvas]&lt;br /&gt;
* Chapter 8: Emergency Information Visualization by Hoang Long Nguyen and Rajendra Akerkar [chapter can be found in Canvas]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Suitable readings:&#039;&#039;&#039; &lt;br /&gt;
* [http://spark.apache.org/docs/latest/streaming-programming-guide.html Spark Streaming Programming Guide]&lt;br /&gt;
* [https://mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis/ Spark Streaming, Minimized]&lt;br /&gt;
* [https://www.infoq.com/articles/apache-spark-streaming Spark Streaming]&lt;br /&gt;
* [https://www.the-lazy-dev.com/en/spark-for-beginners-tutorials-apache-spark-streaming-twitter-java-example/ Create a Twitter developer account]&lt;br /&gt;
* Social media and disasters: Current uses, future options, and policy considerations: [[:File:CRS.pdf]]&lt;br /&gt;
* [http://infolab.stanford.edu/~ullman/mmds/ch10.pdf Social Network Graphs]&lt;br /&gt;
&#039;&#039;&#039;Topics:&#039;&#039;&#039;&lt;br /&gt;
* Follow-up:&lt;br /&gt;
** About the programming project topics&lt;br /&gt;
&lt;br /&gt;
* Paper presentations by you:&lt;br /&gt;
**See Canvas for paper allocation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Slides:&#039;&#039;&#039; &lt;br /&gt;
Session-5: See Canvas&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Session_6_-_Machine_learning_and_NLP_for_EM&amp;diff=776</id>
		<title>Session 6 - Machine learning and NLP for EM</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Session_6_-_Machine_learning_and_NLP_for_EM&amp;diff=776"/>
		<updated>2020-10-19T11:16:29Z</updated>

		<summary type="html">&lt;p&gt;Vimala: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Date:&#039;&#039;&#039; Friday, October 30th, 2020.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Scientific papers for presentation&#039;&#039;&#039;&lt;br /&gt;
* 1) Ebtesam Alomari, Rashid Mehmood, and Iyad Katib, 2019, &#039;&#039;&#039;Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark&#039;&#039;&#039;, [[:File:ebtesam.pdf]]&lt;br /&gt;
* 2) J. Rexiline Ragini, P.M. Rubesh Anand, Vidhyacharan Bhaskar, 2018, &#039;&#039;&#039;Big data analytics for disaster response and recovery through sentiment analysis&#039;&#039;&#039;, International Journal of Information Management, pp. 1-13, [[:File:P5-2019.pdf]]&lt;br /&gt;
* 3) Deniz Kılınç, 2019, &#039;&#039;&#039;A Spark-based big data analysis framework for real-time sentiment prediction on streaming data&#039;&#039;&#039;, Software: Practice and Experience, pp. 1-13, [[:File:P6-2019.pdf]]&lt;br /&gt;
* 4) Van Quan Nguyen, Tien Nguyen Anh and Hyung-Jeong Yang, &#039;&#039;&#039;Real-time event detection using recurrent neural network in social sensors&#039;&#039;&#039;, [[:File:van.pdf]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Chapter from the book: The Data Revolution by Rob Kitchin&#039;&#039;&#039;&lt;br /&gt;
* Chapter 6: Data Analytics&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Chapters from the book: Big Data in Emergency Management: Exploitation Techniques for Social and Mobile Data&#039;&#039;&#039;&lt;br /&gt;
* Chapter 4: Knowledge Graphs and Natural-Language Processing [chapter will be uploaded in Canvas]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Suitable readings:&#039;&#039;&#039; &lt;br /&gt;
* [https://spark.apache.org/docs/latest/ml-guide.html Machine Learning Library (MLlib) Guide]&lt;br /&gt;
* [https://www.edureka.co/blog/spark-streaming/ Spark Streaming Tutorial – Sentiment Analysis Using Apache Spark]&lt;br /&gt;
* [https://www.technologyreview.com/s/513696/deep-learning/ Artificial intelligence is finally getting smart]&lt;br /&gt;
* [https://www.kaggle.com/kanncaa1/machine-learning-tutorial-for-beginners Machine learning]&lt;br /&gt;
* [https://databricks.com/blog/2014/10/15/application-spotlight-tableau-software.html Data visualization on Tableau with Spark]&lt;br /&gt;
* [https://spark.apache.org/docs/1.4.1/ml-guide.html#pipeline Spark MLlib]&lt;br /&gt;
* Francesco Tarasconi, Michela Farina, Antonio Mazzei, Alessio Bosca, 2017, &#039;&#039;&#039;The Role of Unstructured Data in Real-Time Disaster-related Social Media Monitoring&#039;&#039;&#039;, IEEE International Conference on Big Data, pp. 1-10, [[:File:P6a.pdf]].&lt;br /&gt;
&lt;br /&gt;
* Chapter and paper presentations by you:&lt;br /&gt;
** See Canvas for paper allocation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Slides:&#039;&#039;&#039; &lt;br /&gt;
Session-6: See Canvas&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Useful Links:&#039;&#039;&#039;&lt;br /&gt;
* [https://www.youtube.com/watch?v=5Zg-C8AAIGg The Beauty of Data Visualization by David McCandless]&lt;br /&gt;
* [https://www.youtube.com/watch?v=vFu7EFfC-Xc NLP for Social Media]&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
</feed>