Twitter Tasks
To authenticate with Twitter, run
massmine --task=twitter-auth
Check out the general usage examples to learn how to use MassMine. Below is a description of the tasks available for Twitter.
Task parameters marked with * are required. Of the parameters marked with +, choose only one.
➾ twitter-auth
Sets up MassMine to make data requests under your Twitter account privileges. This task must be run before any other Twitter task; otherwise an error is returned.
Parameters
- auth: (Optional) Path to a credentials file
Example
massmine --task=twitter-auth
➾ twitter-followers
Returns information on each follower of a specified user.
Parameters
- user*: A Twitter user name
Example
massmine --task=twitter-followers --user=quinoa
➾ twitter-friends
Returns information on each friend of a specified user.
Parameters
- user*: A Twitter user name
Example
massmine --task=twitter-friends --user=quinoa
➾ twitter-locations
Returns a list of valid geo-locations as Yahoo Where on Earth Identifiers (WOEIDs) accepted by Twitter. These WOEIDs can be used with the Twitter tasks that accept a WOEID as their geo parameter.
Parameters
-none-
Example
massmine --task=twitter-locations
➾ twitter-rehydrate
Returns “rehydrated” tweets based on supplied tweet IDs. This is Twitter’s preferred method for reviving old tweets. It is also one of the few (or only) ways that Twitter allows researchers to share data with one another. The process involves gathering tweets (using other MassMine Twitter tasks) and then sharing only the tweet ID field of each tweet. This is allowed under specific conditions, and up to a specified number of tweets per unit of time; see Twitter’s API terms and conditions for up-to-date details. The shared tweet IDs can then be “rehydrated” using this task to retrieve each full tweet object. In this roundabout way, a curated data set can be passed from one researcher to another.
Note that there is no guarantee of consistency over time, which reflects Twitter’s attempt to let users control their data. If a tweet is edited or deleted in the intervening time, you will receive the edited version as it exists at the time of rehydration, or nothing if the tweet has been deleted.
Parameters
- query*: A comma separated list of tweet IDs
Example
# Rehydrate a single tweet
massmine --task=twitter-rehydrate --query=595302290619260928
# Rehydrate multiple tweets with a comma-separated list
massmine --task=twitter-rehydrate --query=595302290619260928,595302291349118976
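When shared tweet IDs arrive as a file with one ID per line, they have to be joined into the comma-separated form that the query parameter expects. The following is a minimal shell sketch, assuming standard grep and paste are available; the function name join_ids and the file ids.txt are hypothetical and not part of MassMine.

```shell
# join_ids: read tweet IDs from stdin (one per line) and print them as a
# single comma-separated line. Lines that are not purely numeric
# (blank lines, headers) are dropped.
join_ids() {
  grep -E '^[0-9]+$' | paste -s -d ',' -
}

# Hypothetical usage, with ids.txt holding one tweet ID per line:
#   massmine --task=twitter-rehydrate --query="$(join_ids < ids.txt)"
```

How many IDs can be passed in a single call is not stated here, so a large list may need to be split into smaller batches.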
➾ twitter-sample
Returns a random sample of tweets as they occur in real time. Up to 1% of Twitter’s actual volume is returned. Returns up to a maximum number of tweets requested OR until a specified date/time is reached. Both “count” and “dur” can be specified, in which case the task finishes as soon as either target is reached.
Parameters
- count: (Optional) Maximum number of tweets to return
- dur: (Optional) Deadline, as ‘YYYY-MM-DD HH:MM:SS’
Example
# Request a specified number of tweets
massmine --task=twitter-sample --count=50
# Or, keep collecting until a time is reached
massmine --task=twitter-sample --dur='2015-10-11 14:30:00'
# This will finish whenever 50 tweets or the deadline is reached,
# whichever occurs first
massmine --task=twitter-sample --dur='2015-10-11 14:30:00' --count=50
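Because dur is an absolute deadline rather than a length of time, it can be convenient to compute it relative to the current time. The sketch below assumes a date command that accepts -d "@EPOCH" (GNU coreutils and BusyBox do; BSD/macOS date uses -r EPOCH instead), and it assumes that MassMine reads the deadline in local time, which is not confirmed here. The function name deadline_in_minutes is hypothetical.

```shell
# deadline_in_minutes N: print the local time N minutes from now,
# formatted as 'YYYY-MM-DD HH:MM:SS'.
# Assumption: date supports -d "@EPOCH" (GNU coreutils, BusyBox).
deadline_in_minutes() {
  now=$(date +%s)
  date -d "@$(( now + $1 * 60 ))" '+%Y-%m-%d %H:%M:%S'
}

# Hypothetical usage: sample tweets for roughly the next 30 minutes
#   massmine --task=twitter-sample --dur="$(deadline_in_minutes 30)"
```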
➾ twitter-search
Searches for pre-existing tweets matching a given search phrase. Not all tweets are indexed and made available by Twitter’s search, and searchable tweets are indexed for the last 7 days only. For better search coverage, consider using the twitter-stream task to capture tweets as they occur in real time.
Parameters
- query*: Search query string, using Twitter’s search formatting (see search operators at Twitter’s search site).
- count: (Optional) Maximum number of tweets to return.
- geo: (Optional) Return tweets from a location specified by ‘latitude,longitude,radius’, where radius can be specified either as “mi” (miles) or “km” (kilometers). For example, ‘37.781157,-122.398720,1mi’
- lang: (Optional) Return tweets of a given language, specified by an ISO 639-1 code.
Example
# Looking for love...
massmine --task=twitter-search --query=love --count=300
# ... in only certain places
massmine --task=twitter-search --query=love --count=300 --geo=37.781157,-122.398720,1mi
# ... in French
massmine --task=twitter-search --query=amour --count=300 --lang=fr
➾ twitter-search-30day
Searches for pre-existing tweets matching a given search phrase using Twitter’s Premium service. Note that this endpoint requires a paid account with Twitter. Not all tweets are indexed and made available by Twitter’s search, and searchable tweets are indexed for the last 30 days only.
Important: The user is responsible for managing their requests-per-month rate limits. MassMine will adhere to Twitter’s per-second (10 requests per second) and per-minute (60 requests per minute) rate limits. However, each paid plan also has a requests-per-month limit that is determined by each user’s plan. Users can monitor their monthly rate limit status at Twitter’s developer dashboard.
Rate limits are determined by the number of requests to Twitter’s server. Each request can include up to 500 tweets, so requesting 100 tweets costs the same as requesting 500. MassMine therefore maximizes your return by providing results in 500-tweet chunks, and users should request tweets in increments of 500 when using the count parameter (counts that are not a multiple of 500 are rounded up). For instance, requesting --count=1800 or --count=2000 will both return up to 2000 tweets (2000 exactly, unless Twitter returns fewer than 2000 matches).
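The rounding described above can be sketched with shell integer arithmetic, which helps when budgeting against a requests-per-month limit. This is only an illustration of the arithmetic, not MassMine’s own code; the function names requests_for and effective_count are hypothetical.

```shell
# requests_for N: number of 500-tweet requests needed to cover N tweets.
requests_for() {
  echo $(( ($1 + 499) / 500 ))
}

# effective_count N: N rounded up to the next multiple of 500,
# i.e. the number of tweets actually asked for.
effective_count() {
  echo $(( $(requests_for "$1") * 500 ))
}

effective_count 1800   # prints 2000
effective_count 2000   # prints 2000
requests_for 1800      # prints 4
```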
Parameters
- query*: Search query string, using Twitter’s search formatting (see Twitter’s premium search operators).
- date*: A date range (from:to) spanning no more than the last 30 days. Format the range as ‘YYYY-MM-DD-HH-MM:YYYY-MM-DD-HH-MM’ (single quotes around the range are likely required in your shell)
- count: (Optional) Maximum number of tweets to return (specified in increments of 500)
Example
massmine -t twitter-search-30day -q love -c 500 --date='2020-09-26-00-00:2020-10-22-00-00' -o lovetweets.ndjson
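The hyphen-separated from:to format is easy to mistype by hand. As a rough sketch, the helper below builds it from two timestamps written in the more familiar ‘YYYY-MM-DD HH:MM’ form by replacing spaces and colons with hyphens. The function name date_range is hypothetical, and the helper does not check that the dates are valid or fall within the last 30 days.

```shell
# date_range FROM TO: turn two 'YYYY-MM-DD HH:MM' timestamps into the
# 'YYYY-MM-DD-HH-MM:YYYY-MM-DD-HH-MM' form. Only the separators are
# rewritten; the dates themselves are not validated.
date_range() {
  from=$(printf '%s' "$1" | sed 's/[ :]/-/g')
  to=$(printf '%s' "$2" | sed 's/[ :]/-/g')
  printf '%s:%s\n' "$from" "$to"
}

date_range '2020-09-26 00:00' '2020-10-22 00:00'
# prints 2020-09-26-00-00:2020-10-22-00-00

# Hypothetical usage:
#   massmine -t twitter-search-30day -q love -c 500 \
#     --date="$(date_range '2020-09-26 00:00' '2020-10-22 00:00')"
```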
➾ twitter-stream
Returns tweets as they occur in real time, matching either a search phrase, a user name, or a location. Up to 1% of Twitter’s actual volume is returned. Returns up to a maximum number of tweets requested OR until a specified date/time is reached. Both “count” and “dur” can be specified, in which case the task finishes as soon as either target is reached.
Parameters
- query+: Search query string, using Twitter’s search formatting (see search operators at Twitter’s search site).
- user+: A Twitter user name to track. Multiple user names can be separated with commas.
- geo+: A bounding box described as longitude and latitude pairs, with the southwest corner of the box first, and the northeast corner second. For example, ‘-122.75,36.8,-121.75,37.8’ specifies a box around San Francisco. Multiple boxes can be passed at once, with ‘-122.75,36.8,-121.75,37.8,-74,40,-73,41’ specifying either San Francisco OR New York City.
- lang: (Optional) Return tweets of a given language, specified by an ISO 639-1 code.
- dur: (Optional) Deadline, as ‘YYYY-MM-DD HH:MM:SS’
- count: (Optional) Maximum number of tweets to return.
Example
# Search by keyword, with a max count OR deadline
massmine --task=twitter-stream --query=love --count=300 --dur='2015-10-11 14:30:00'
# Track a user in real time (may only make sense for HIGHLY active accounts).
# Here we track multiple users
massmine --task=twitter-stream --user=nasa,wired --dur='2015-10-11 14:30:00'
# Or, simply grab tweets coming out of New York City
massmine --task=twitter-stream --geo=-74,40,-73,41 --count=300
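The bounding-box order for twitter-stream (longitude first, southwest corner before northeast) differs from the ‘latitude,longitude,radius’ order used by twitter-search, so naming the corners explicitly can prevent mix-ups. The following sketch assembles one or more boxes into the geo string; the function names bbox and join_boxes are hypothetical helpers, not MassMine features.

```shell
# bbox SW_LON SW_LAT NE_LON NE_LAT: print one bounding box with the
# southwest corner first and the northeast corner second.
bbox() {
  printf '%s,%s,%s,%s\n' "$1" "$2" "$3" "$4"
}

# join_boxes BOX...: join several boxes with commas. The body runs in a
# subshell so the IFS change does not leak into the caller.
join_boxes() (
  IFS=,
  printf '%s\n' "$*"
)

sf=$(bbox -122.75 36.8 -121.75 37.8)   # San Francisco
nyc=$(bbox -74 40 -73 41)              # New York City
join_boxes "$sf" "$nyc"
# prints -122.75,36.8,-121.75,37.8,-74,40,-73,41

# Hypothetical usage:
#   massmine --task=twitter-stream --geo="$(join_boxes "$sf" "$nyc")" --count=300
```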
➾ twitter-trends
Returns the top 50 trends for a given location.
Parameters
- geo*: A location specified as a Yahoo Where on Earth Identifier (WOEID). For a list of available WOEIDs, see the twitter-locations task.
Example
# Current trends in Seattle, Washington
massmine --task=twitter-trends --geo=2490383
➾ twitter-trends-nohash
Returns the top 50 trends for a given location, with #hashtags excluded.
Parameters
- geo*: A location specified as a Yahoo Where on Earth Identifier (WOEID). For a list of available WOEIDs, see the twitter-locations task.
Example
# Current trends in Seattle, Washington
massmine --task=twitter-trends-nohash --geo=2490383
➾ twitter-user
Returns the timelines (i.e., the tweet histories) of one or more users, in reverse chronological order.
Parameters
- user*: A Twitter user name, or multiple user names separated by commas.
- count: (Optional) Maximum number of tweets to return, up to 3200 (max limit set by Twitter)
Example
# Let's get the last 10 tweets from NASA
massmine --task=twitter-user --user=nasa --count=10
# We can fetch 10 from both NASA and Wired in one shot:
massmine --task=twitter-user --user=nasa,wired --count=10