Title: | Aggregate Longitudinal Survey Data |
---|---|
Description: | Aggregate Business Tendency Survey Data (and other qualitative surveys) to time series at various aggregation levels. Run aggregation of survey data in a speedy, re-traceable and easily deployable way. Aggregation is substantially accelerated by use of data.table. This package intends to provide an interface that is less general and abstract than data.table and is instead geared towards survey researchers. |
Authors: | Matthias Bannert [aut, cre], Gabriel Bucur [aut] |
Maintainer: | Matthias Bannert <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2025-02-12 05:41:02 UTC |
Source: | https://github.com/cran/panelaggregation |
This data was created by simulation to mimic a firm level dataset stemming from business tendency surveys. The data was simulated because of privacy concerns with micro level firm data. For convenience the dataset contains two different date notations. It also includes 5 qualitative 3-item questions. Business tendency survey data is often weighted with company size represented by the number of employees. Thus the weight column is quantitative and its distribution is somewhat (!) reasonable with respect to the distribution of employees in a typical firm sample.
A data frame with 27000 rows and 13 variables
uid unique company identifier
year numeric year column
weight quantitative weight
question\_1
question\_2
question\_3
question\_4
question\_5
group group to mimic different sectors / branches of trade
altGroup an alternative grouping column
sClass a column denoting discrete size classes small (S), medium (M) and large (L)
date\_qtrly quarterly dates stored in a single column.
Matthias Bannert
Randomly generated in R using the sample generator from https://github.com/mbannert/gateveys/blob/master/R/gateveys.R
This function computes balances (i.e. positive minus negative items) from item shares stored in a wide format data.table.
computeBalance(data_table, multipliers = list(item_pos = 1, item_eq = 0, item_neg = -1))
data_table |
a data.table in wide format containing item shares |
multipliers |
list containing multipliers of items, assigned by item and column names |
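The balance calculation described above can be sketched with plain data.table. This is a minimal illustration under stated assumptions, not necessarily how computeBalance is implemented: the `shares` table and its values are made up, and only the column names (item_pos, item_eq, item_neg) and the default multipliers are taken from the usage above.

```r
library(data.table)

# Hypothetical wide-format shares; the values are invented for illustration.
shares <- data.table(
  date_qtrly = c("2010-01-01", "2010-04-01"),
  item_pos   = c(0.50, 0.30),
  item_eq    = c(0.30, 0.30),
  item_neg   = c(0.20, 0.40)
)

# Default multipliers as shown in the usage of computeBalance.
multipliers <- list(item_pos = 1, item_eq = 0, item_neg = -1)

# Multiply every share column by its multiplier, then sum the results row-wise.
weighted_cols <- Map(function(col, m) shares[[col]] * m,
                     names(multipliers), multipliers)
shares[, balance := Reduce("+", weighted_cols)]

print(shares)
```

With the default multipliers the balance reduces to the positive share minus the negative share; other multipliers would let differently named or differently scored items contribute.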
Matthias Bannert, Gabriel Bucur
This function computes the weighted mean of variable groups from a data.table. computeWeightedMeans is performance-optimized and designed to work well in bulk operations. The function returns a data.table.
computeWeightedMeans(data_table, variables, weight, by)
data_table |
a data.table |
variables |
character name of the variable(s) to focus on. The variables must be in the data.table |
weight |
character name of the data.table column that contains a weight. |
by |
character vector of the columns to group by |
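To make the grouped weighted mean concrete, the following sketch reproduces the idea with plain data.table and base R's `weighted.mean`. It is an illustration only and makes no claim about the internals of computeWeightedMeans; the table `dt` and its values are hypothetical.

```r
library(data.table)

# Hypothetical input: two groups, one weight column, two variables.
dt <- data.table(
  group    = c("A", "A", "B", "B"),
  weight   = c(1, 3, 2, 2),
  balance  = c(10, 20, 0, 40),
  item_pos = c(0.2, 0.6, 0.5, 0.1)
)

variables <- c("balance", "item_pos")

# For each group, apply weighted.mean to every selected variable,
# using the group's slice of the weight column as weights.
wm <- dt[, lapply(.SD, weighted.mean, w = weight),
         by = "group", .SDcols = variables]

print(wm)
```

The result has one row per group and one column per variable, which is the shape needed to feed a further aggregation level.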
Matthias Bannert, Gabriel Bucur
    # TODO: add new weight columns to BTS demo

    # load library and dataset
    library(panelaggregation)
    data(btsdemo)
    head(btsdemo)

    # adapt the levels to positive, equal and negative
    # in order to suit the naming defaults. other levels work too,
    # but you'd need to specify multipliers in computeBalance then
    levels(btsdemo$question_1) <- c("pos","eq","neg")

    # compute the weighted shares and store them in wide format
    # to get a basis for further steps
    level1 <- computeShares(btsdemo,"question_1","weight",
                            by = c("date_qtrly","group", "altGroup", "sClass"))

    # compute balance, don't have to do much here, because
    # (pos, eq, neg) is the default for the possible answers
    level1_wbalance <- computeBalance(level1)

    # Select a particular grouping combination and a timeseries that
    # should be extracted from the level 1 aggregation.
    ts1 <- extractTimeSeries(level1_wbalance,
                             "date_qtrly",
                             list(group = "C", altGroup = "a", sClass = "S"),
                             freq = 4,
                             item = "balance",
                             variable = "question_1")
    ts1

    # Plot a standard R ts using the plot method for ts
    plot(ts1, main = attributes(ts1)$ts_key)

    # Add weight column to the aggregated results
    # In order to join the tables, we need to know what weight to assign to each row.
    # This is done via a common key. A multi-column key such as
    # c('group', 'altGroup') would assign a different weight to each
    # combination (e.g. c('A', 'a')); here the key is 'group' only.
    btsweight1 <- btsdemo[, list(weight = sum(weight)), by = 'group']
    btsagg1 <- joinDataTables(level1_wbalance, btsweight1, 'group')

    # Compute second level aggregation, this time on fewer columns
    # and using a different set of weights.
    level2_balance <- computeWeightedMeans(btsagg1,
                                           c('item_pos', 'item_eq', 'item_neg', 'balance'),
                                           'weight',
                                           c("date_qtrly","group", "sClass"))

    # Select a particular grouping combination and a timeseries that
    # should be extracted from the level 2 aggregation.
    ts2 <- extractTimeSeries(level2_balance,
                             "date_qtrly",
                             list(group = "C", sClass = "S"),
                             freq = 4,
                             item = "balance",
                             variable = "question_1")
    ts2

    # Plot a standard R ts using the plot method for ts
    plot(ts2, main = attributes(ts2)$ts_key)

    # Add weight column to the aggregated results
    # As before, the weights are joined via a common key;
    # this time the key is 'sClass'.
    btsweight2 <- btsdemo[, list(weight = sum(weight)), by = 'sClass']
    btsagg2 <- joinDataTables(level2_balance, btsweight2, 'sClass')

    # Compute third level of aggregation, on the whole sector,
    # using yet another set of weights.
    level3_balance <- computeWeightedMeans(btsagg2, 'balance', 'weight',
                                           c("date_qtrly", "sClass"))

    # Select a particular grouping combination and a timeseries that
    # should be extracted from the level 3 aggregation.
    ts3 <- extractTimeSeries(level3_balance,
                             "date_qtrly",
                             list(sClass = "S"),
                             freq = 4,
                             item = "balance",
                             variable = "question_1")
    ts3

    # Plot a standard R ts using the plot method for ts
    plot(ts3, main = attributes(ts3)$ts_key)
This function makes use of the CJ function of the data.table package to perform a cross join. The function makes sure that the combinations are unique and removes NAs before joining. doUniqueCJ is typically not used as a standalone function but rather inside computeShares.
doUniqueCJ(dt, cols)
dt |
data.table |
cols |
character vector that denotes names of relevant columns |
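The idea of a unique, NA-free cross join can be sketched directly with data.table's `CJ`. This is an assumption-laden illustration rather than the source of doUniqueCJ: the table `dt` is hypothetical, and the way duplicates and NAs are dropped here is just one straightforward option.

```r
library(data.table)

# Hypothetical input with a duplicated combination and a missing value.
dt <- data.table(
  group  = c("A", "A", "B", NA),
  sClass = c("S", "S", "L", "M")
)
cols <- c("group", "sClass")

# Per column: drop NAs, then keep each distinct value once.
unique_vals <- lapply(cols, function(col) {
  x <- dt[[col]]
  unique(x[!is.na(x)])
})
names(unique_vals) <- cols

# CJ builds every combination of the supplied vectors (a cross join).
combos <- do.call(CJ, unique_vals)

print(combos)
```

Building the full grid this way also yields combinations that never occur in the data, which is useful when every group should appear in every period of an aggregated result.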
Matthias Bannert, Gabriel Bucur
This function extracts time series from data.table columns and returns an object of class ts.
extractTimeSeries(data_table, time_column, group_list, freq, item, variable, prefix = "CH.KOF.IND")
data_table |
a data.table |
time_column |
character name of the column which contains the time index |
group_list |
list or NULL |
freq |
integer value, either 4 denoting quarterly frequency or 12 denoting monthly frequency |
item |
character name of the column which contains the item that is extracted from the data.table |
variable |
character name of the variable selected |
prefix |
character prefix attached to the dynamically generated key string to identify the time series. Recommended key format: ISOcountry.provider.source.aggregationLevel.selectedGroup.variable.item |
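The extraction step can be approximated with base R's `ts` on a filtered data.table. The sketch below is illustrative only: the table `dt`, the group label in the key, and the exact key string are assumptions, and extractTimeSeries may build its key differently. Only the `ts_key` attribute name and the default prefix come from this documentation.

```r
library(data.table)

# Hypothetical aggregated results for two groups over four quarters.
dt <- data.table(
  date_qtrly = as.Date(rep(c("2010-01-01", "2010-04-01",
                             "2010-07-01", "2010-10-01"), times = 2)),
  group      = rep(c("C", "D"), each = 4),
  balance    = c(0.10, 0.20, 0.15, 0.30, -0.10, 0.00, 0.10, 0.20)
)

# Select one group and make sure the rows are in chronological order.
sel <- dt[group == "C"][order(date_qtrly)]

# Derive the start period (year, quarter) from the first date.
first_date <- sel$date_qtrly[1]
start_year <- as.integer(format(first_date, "%Y"))
start_qtr  <- (as.integer(format(first_date, "%m")) - 1L) %/% 3L + 1L

# frequency = 4 marks the series as quarterly (12 would be monthly).
ts_c <- ts(sel$balance, start = c(start_year, start_qtr), frequency = 4)

# Attach an identifying key; this particular layout is an assumption.
attr(ts_c, "ts_key") <- paste("CH.KOF.IND", "group_C", "question_1",
                              "balance", sep = ".")

print(ts_c)
```

Storing the key as an attribute keeps the series a standard ts object, so it can be passed to `plot` while the key remains available for titles or export.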
Matthias Bannert, Gabriel Bucur
This function joins two data.table objects, given a common key, which can have different names in the two tables. In the latter case, the sequence of the names is crucial. Make sure that the key columns match exactly.
joinDataTables(dt_1, dt_2, key_1, key_2 = key_1)
dt_1 |
first data.table |
dt_2 |
second data.table |
key_1 |
character vector of key columns for first data.table |
key_2 |
character vector of key columns for second data.table |
joined data.table
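A join on a key whose columns are named differently in the two tables can be sketched with data.table's `merge` method. This is a minimal illustration under assumptions, not the implementation of joinDataTables: both tables and the column name `sector` are hypothetical, and a left join is just one plausible choice.

```r
library(data.table)

# Hypothetical aggregated results keyed by 'group'.
agg <- data.table(
  group   = c("A", "A", "B"),
  balance = c(0.1, 0.2, -0.3)
)

# Hypothetical weights table; the same key is called 'sector' here.
weights <- data.table(
  sector = c("A", "B"),
  weight = c(100, 250)
)

# by.x and by.y are matched by position, so with several key columns
# the order of the names must correspond in both vectors.
joined <- merge(agg, weights, by.x = "group", by.y = "sector", all.x = TRUE)

print(joined)
```

Each row of the aggregated table receives the weight of its group, which is the form needed before computing weighted means at the next aggregation level.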
Matthias Bannert, Gabriel Bucur