Title: | Analysis of Check-All-that-Apply (CATA) Data |
---|---|
Description: | Package contains functions for analyzing check-all-that-apply (CATA) data from consumer and sensory tests. Cochran's Q test, McNemar's test, and Penalty-Lift analysis are provided; for details, see Meyners, Castura & Carr (2013) <doi:10.1016/j.foodqual.2013.06.010>. Cluster analysis can be performed using b-cluster analysis, then evaluated using various measures; for details, see Castura, Meyners, Varela & Næs (2022) <doi:10.1016/j.foodqual.2022.104564>. Methods are adapted to cluster consumers based on their product-related hedonic responses; for details, see Castura, Meyners, Pohjanheimo, Varela & Næs (2023) <doi:10.1111/joss.12860>. |
Authors: | J.C. Castura [aut, cre, ctb] |
Maintainer: | J.C. Castura <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0.26 |
Built: | 2025-02-18 01:58:14 UTC |
Source: | https://github.com/cran/cata |
Calculate the adjusted Rand index (ARI) between two sets of cluster memberships.
ARI(x, y, signif = FALSE, n = 1000)
x | vector of cluster memberships (integers) |
y | vector of cluster memberships (integers) |
signif | conduct significance test; default is FALSE |
n | number of replicates in Monte Carlo significance test |
list of the following:
ari: adjusted Rand index
nari: normalized adjusted Rand index
sim.mean: average value of null distribution (should be close to zero)
sim.var: variance of null distribution
p.value: P value of observed ARI (or NARI) value
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218. doi:10.1007/BF01908075.
Qannari, E.M., Courcoux, P., & Faye, P. (2014). Significance test of the adjusted Rand index. Application to the free sorting task. Food Quality and Preference, 32, 93-97. doi:10.1016/j.foodqual.2013.05.005.
x <- sample(1:3, 20, replace = TRUE)
y <- sample(1:3, 20, replace = TRUE)
ARI(x, y, signif = FALSE)
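As a rough illustration of the adjusted Rand index described above, the following base R sketch computes the index from a contingency table, following Hubert & Arabie (1985). The names ari_sketch and ari_perm_sketch are hypothetical, and the permutation scheme shown is an assumption about how a Monte Carlo test might be set up; it is not taken from the package source.

```r
# Base R sketch of the adjusted Rand index (hypothetical helper, not a package function)
ari_sketch <- function(x, y) {
  tab <- table(x, y)                        # contingency table of the two partitions
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))             # pairs placed together in both partitions
  sum_a <- sum(choose(rowSums(tab), 2))     # pairs placed together in x
  sum_b <- sum(choose(colSums(tab), 2))     # pairs placed together in y
  expected <- sum_a * sum_b / choose(n, 2)  # expected index under random labelling
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Assumed Monte Carlo test: permute y and compare the observed ARI to the null values
ari_perm_sketch <- function(x, y, n = 1000) {
  obs <- ari_sketch(x, y)
  sim <- replicate(n, ari_sketch(x, sample(y)))
  list(ari = obs, sim.mean = mean(sim), sim.var = var(sim),
       p.value = mean(sim >= obs))
}

set.seed(1)
x <- rep(1:3, times = c(7, 7, 6))
res <- ari_perm_sketch(x, x, n = 200)
res$ari  # identical partitions give an ARI of 1
```

Relabelling clusters does not change the index, so ari_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1)) is also 1.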
Converts a three-dimensional array (assessors, products, attributes) to a four-dimensional array of product comparisons (assessors, product comparisons, two outcomes (of type b or c), attributes).
barray(X, values = "bc", type.in = "binary", type.out = "binary")
X | three-dimensional array (assessors, products, attributes) |
values | determines the outcomes returned; default is "bc" |
type.in | type of data submitted; default is "binary" |
type.out | currently only "binary" |
A four-dimensional array of product comparisons having assessors, product comparisons, outcomes (see values parameter), attributes
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
# Get the 4d array of CATA differences for the first 8 consumers
b <- barray(bread$cata[1:8,,])
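To make the array layout above concrete, here is a minimal base R sketch of how the two outcomes could be derived for every product pair. It assumes that outcome b marks an attribute checked for the first product of a pair but not the second, and that outcome c marks the reverse (the usual McNemar convention). The name bc_sketch is hypothetical, and the ordering of product pairs may differ from barray.

```r
# Hypothetical sketch: build an (assessors, product pairs, 2 outcomes, attributes) array
bc_sketch <- function(X) {
  I <- dim(X)[1]; J <- dim(X)[2]; M <- dim(X)[3]
  pairs <- utils::combn(J, 2)                # all product pairs, one per column
  out <- array(0, dim = c(I, ncol(pairs), 2, M))
  for (p in seq_len(ncol(pairs))) {
    first  <- X[, pairs[1, p], , drop = FALSE]
    second <- X[, pairs[2, p], , drop = FALSE]
    out[, p, 1, ] <- first * (1 - second)    # assumed type b: checked for first only
    out[, p, 2, ] <- (1 - first) * second    # assumed type c: checked for second only
  }
  out
}

# Toy data: 2 assessors, 3 products, 2 attributes
set.seed(1)
X <- array(rbinom(12, 1, 0.5), dim = c(2, 3, 2))
dim(bc_sketch(X))  # 2 assessors, 3 product pairs, 2 outcomes, 2 attributes
```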
By default, bcluster calls a function to perform b-cluster analysis by a non-hierarchical iterative ascent algorithm, then inspects results if there are multiple runs.
bcluster(X, inspect = TRUE, inspect.plot = TRUE, algorithm = "n", measure = "b", G = NULL, M = NULL, max.iter = 500, X.input = "data", tol = exp(-32), runs = 1, seed = 2021)
X | three-way array with assessors, products, attributes |
inspect | default (TRUE) inspects results if there are multiple runs |
inspect.plot | default (TRUE) renders the plot from inspect |
algorithm | default is "n" (non-hierarchical) |
measure | default is "b" |
G | number of clusters (required for non-hierarchical algorithm) |
M | initial cluster memberships |
max.iter | maximum number of iterations allowed (default 500) |
X.input | available only for non-hierarchical algorithm; default is "data" |
tol | non-hierarchical algorithm stops if variance over 5 iterations is less than tol (default: exp(-32)) |
runs | number of runs (defaults to 1) |
seed | for reproducibility (default is 2021) |
list with elements:
runs: b-cluster analysis results from bcluster.n or bcluster.h (in a list if runs>1)
inspect: result from inspect (the plot from this function is rendered if inspect.plot is TRUE)
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
# b-cluster analysis on the first 8 consumers and the first 5 attributes
(b1 <- bcluster(bread$cata[1:8,,1:5], G=2, seed = 123))
# Since the seed is the same, the result will be identical to
# (b2 <- bcluster.n(bread$cata[1:8,,1:5], G=2, seed = 123))
b3 <- bcluster(bread$cata[1:8,,1:5], G=2, runs = 5, seed = 123)
Perform b-clustering using the hierarchical agglomerative clustering strategy.
bcluster.h(X, measure = "b", runs = 1, seed = 2021)
X | three-way array (assessors, products, attributes) |
measure | currently only "b" |
runs | number of runs (defaults to 1) |
seed | for reproducibility (default is 2021) |
An object of class hclust from hierarchical b-cluster analysis results (a list of such objects if runs>1), where each hclust object has the structure described in hclust as well as the item retainedB (a vector indicating the retained sensory differentiation at each iteration (merger)).
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
# hierarchical b-cluster analysis on first 8 consumers and first 5 attributes
b <- bcluster.h(bread$cata[1:8,,1:5])
plot(as.dendrogram(b), main = "Hierarchical b-cluster analysis", sub = "8 bread consumers on 5 attributes")
Non-hierarchical b-cluster analysis transfers assessors iteratively to reach a local maximum in sensory differentiation retained.
bcluster.n(X, G, M = NULL, measure = "b", max.iter = 500, runs = 1, X.input = "data", tol = exp(-32), seed = 2021)
X | CATA data organized in a three-way array (assessors, products, attributes) |
G | number of clusters (required for non-hierarchical algorithm) |
M | initial cluster memberships (default: NULL) |
measure | default is "b" |
max.iter | maximum number of iterations allowed (default 500) |
runs | number of runs (defaults to 1) |
X.input | default is "data" |
tol | algorithm stops if variance over 5 iterations is less than tol (default: exp(-32)) |
seed | for reproducibility (default is 2021) |
An object of class bclust.n (or a list of such objects if runs>1), where each such object has the following components:
cluster: vector of the final cluster memberships
totalB: value of the total sensory differentiation in data set
retainedB: value of sensory differentiation retained in b-cluster analysis solution
progression: vector of sensory differentiation retained in each iteration
iter: number of iterations completed
finished: boolean indicating whether the algorithm converged before max.iter
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
# b-cluster analysis on the first 8 consumers and the first 5 attributes
(b <- bcluster.n(bread$cata[1:8, , 1:5], G=2))
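The iterative transfer idea described above for bcluster.n can be illustrated with a greedy sketch: each assessor is moved to another cluster whenever the move increases the sensory differentiation retained, until no move helps. This is an assumption-laden toy, not the package algorithm. The objective is assumed to be the sum over clusters of (b - c)^2 / (b + c), and the names retained_b_sketch and transfer_sketch are hypothetical.

```r
# Assumed objective: within each cluster, sum (b - c)^2 / (b + c) over pairs and attributes
retained_b_sketch <- function(X.b, X.c, membership) {
  total <- 0
  for (g in unique(membership)) {
    in_g <- membership == g
    b_tot <- apply(X.b[in_g, , , drop = FALSE], 2:3, sum)
    c_tot <- apply(X.c[in_g, , , drop = FALSE], 2:3, sum)
    denom <- b_tot + c_tot
    total <- total + sum(ifelse(denom > 0, (b_tot - c_tot)^2 / denom, 0))
  }
  total
}

# Greedy ascent: accept any single-assessor transfer that increases the objective
transfer_sketch <- function(X.b, X.c, membership, G, max.iter = 50) {
  best <- retained_b_sketch(X.b, X.c, membership)
  for (iter in seq_len(max.iter)) {
    improved <- FALSE
    for (i in seq_along(membership)) {
      for (g in setdiff(seq_len(G), membership[i])) {
        trial <- membership
        trial[i] <- g
        val <- retained_b_sketch(X.b, X.c, trial)
        if (val > best + 1e-12) {
          membership <- trial
          best <- val
          improved <- TRUE
        }
      }
    }
    if (!improved) break  # local maximum reached
  }
  list(cluster = membership, retainedB = best, iter = iter)
}

# Toy data: 4 assessors, 1 product pair, 1 attribute
X.b <- array(c(1, 1, 0, 0), dim = c(4, 1, 1))
X.c <- array(c(0, 0, 1, 1), dim = c(4, 1, 1))
res <- transfer_sketch(X.b, X.c, membership = c(1, 2, 1, 2), G = 2)
res$cluster
```

In this toy case the two assessors with type-b responses end up together, as do the two with type-c responses. Like any ascent method, the sketch can stop at a local maximum, which is why several runs are usually compared.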
Conduct Cochran's Q test assuming equal column proportions for matched binary responses versus the alternative hypothesis of unequal column proportions.
cochranQ(X, quiet = FALSE, digits = getOption("digits"))
X | matrix of matched binary responses (assessors in rows, products in columns) |
quiet | logical; default is FALSE |
digits | for rounding |
Method returns test statistic, degrees of freedom, and p value from Cochran's Q test.
Cochran's Q test results (statistic, degrees of freedom, p-value)
Cochran, W.G. (1950). The comparison of percentages in matched samples. Biometrika, 37, 256-266, doi:10.2307/2332378
Meyners, M., Castura, J.C., & Carr, B.T. (2013). Existing and new approaches for the analysis of CATA data. Food Quality and Preference, 30, 309-319, doi:10.1016/j.foodqual.2013.06.010
data(bread)
# Cochran's Q test on the first 50 consumers on the first attribute ("Fresh")
cochranQ(bread$cata[1:50, , 1], digits=3)
# Same, returning only test statistics for the first 4 attributes
t(res <- apply(bread$cata[1:50, , 1:4], 3, cochranQ, quiet=TRUE, digits=3))
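For readers who want to see the statistic behind the test above, here is a minimal base R sketch of Cochran's Q using the standard textbook formula. The name cochran_q_sketch is hypothetical and the sketch is not the package implementation; it only assumes a 0/1 matrix with assessors in rows and products in columns.

```r
# Textbook Cochran's Q for a 0/1 matrix (hypothetical helper, not a package function)
cochran_q_sketch <- function(X) {
  X <- as.matrix(X)
  k <- ncol(X)              # number of matched columns (products)
  col_tot <- colSums(X)     # citations per product
  row_tot <- rowSums(X)     # citations per assessor
  N <- sum(X)
  Q <- (k - 1) * (k * sum(col_tot^2) - N^2) / (k * N - sum(row_tot^2))
  df <- k - 1
  c(Q = Q, df = df, p.value = stats::pchisq(Q, df, lower.tail = FALSE))
}

# Toy data: 4 assessors, 3 products
X <- rbind(c(1, 0, 0),
           c(1, 1, 0),
           c(1, 0, 1),
           c(1, 1, 0))
cochran_q_sketch(X)
```

For this toy matrix the statistic is 2 * (63 - 49) / (21 - 13) = 3.5 on 2 degrees of freedom.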
Apply top-k box coding to scale data. Using the defaults gives top-2 box (T2B) coding.
code.topk(X, zero.below = 8, one.above = 7)
X | input matrix |
zero.below | default is 8 |
one.above | default is 7 |
matrix X with top-k coding applied
Castura, J.C., Meyners, M., Pohjanheimo, T., Varela, P., & Næs, T. (2023). An approach for clustering consumers by their top-box and top-choice responses. Journal of Sensory Studies, e12860. doi:10.1111/joss.12860
# Generate some data
set.seed(123)
X <- matrix(sample(1:9, 100, replace = TRUE), nrow = 5)
# apply top-2 box (T2B) coding
code.topk(X, zero.below = 8, one.above = 7)
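The top-k box coding above can be pictured with a short base R sketch. It assumes that scores strictly below zero.below are coded 0 and scores strictly above one.above are coded 1, which matches top-2 box coding on a 9-point scale under the defaults. The name topk_sketch is hypothetical.

```r
# Hypothetical sketch of top-k box coding under the stated assumption
topk_sketch <- function(X, zero.below = 8, one.above = 7) {
  out <- X
  out[X < zero.below] <- 0   # scores below the threshold become 0
  out[X > one.above] <- 1    # scores above the threshold become 1
  out
}

X <- matrix(c(7, 8, 9, 1), nrow = 2)
topk_sketch(X)  # 8 and 9 are coded 1; 7 and 1 are coded 0
```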
Raw results from CATA and Liking evaluations of six bread product samples by 161 consumers.
A list with 4 items:
$cata: check-all-that-apply (CATA) data (array, 161 consumers x 6 breads x 31 sensory attributes)
$liking: 9-point hedonic scale data (matrix, 161 consumers x 6 breads)
$ideal.cata: check-all-that-apply (CATA) data for ideal bread (matrix, 161 consumers x 31 sensory attributes)
$liking: 9-point hedonic scale data for ideal bread (vector, 161 consumers)
CATA data is coded 1 if the attribute is checked; otherwise it is coded 0.
Meyners, M., Castura, J.C., & Carr, B.T. (2013). Existing and new approaches for the analysis of CATA data. Food Quality and Preference, 30, 309-319, doi:10.1016/j.foodqual.2013.06.010
data(bread)
head(bread$cata)
Evaluate the quality of cluster analysis solutions using measures related to within-cluster product discrimination, between-cluster non-redundancy, overall diversity (coverage), average RV, sensory differentiation retained, and within-cluster homogeneity.
evaluateClusterQuality(X, M, alpha = .05, M.order = NULL, quiet = FALSE, digits = getOption("digits"), ...)
X | three-way array (assessors, products, attributes) |
M | cluster memberships |
alpha | significance level for two-tailed tests (default: .05) |
M.order | can be used to change the cluster numbers (e.g. to label cluster 1 as cluster 2 and vice versa); defaults to NULL |
quiet | logical; default is FALSE |
digits | significant digits (to display) |
... | other parameters |
A list containing cluster analysis quality measures:
$solution:
Pct.b = percentage of the total sensory differentiation retained in the solution
min(NR) = smallest observed between-cluster non-redundancy
Div_G = overall diversity (coverage)
H_G = overall homogeneity (weighted average of within-cluster homogeneity indices)
avRV = average RV coefficient for all between-cluster comparisons
$clusters:
ng = number of cluster members
bg = sensory differentiation retained in cluster
xbarg = average citation rate in cluster
Hg = homogeneity index within cluster (see homogeneity)
Dg = within-cluster product discrimination
$nonredundancy.clusterpairs: square data frame showing non-redundancy for each pair of clusters (low values indicate high redundancy)
$rv.clusterpairs: square data frame with RV coefficient for each pair of clusters (high values indicate higher similarity in product configurations)
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
evaluateClusterQuality(bread$cata[1:8,,1:5], M = rep(1:2, each = 4))
Function to calculate the b-measure, which quantifies the sensory differentiation retained.
getb(X.b, X.c, oneI = FALSE, oneM = FALSE)
X.b | three-way (assessors, product comparisons, attributes) array |
X.c | array of same dimension as X.b |
oneI | indicates whether calculation is for one assessor (default: FALSE) |
oneM | indicates whether calculation is for one attribute (default: FALSE) |
b-measure
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
bread.bc <- barray(bread$cata[1:8,,1:5])
getb(bread.bc[,,1,], bread.bc[,,2,])
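As a hedged illustration of the b-measure above, the sketch below assumes it is the sum, over product pairs and attributes, of (b - c)^2 / (b + c), where b and c are the type-b and type-c counts totalled across assessors and terms with b + c = 0 contribute nothing. The name b_measure_sketch is hypothetical, and the formula is an assumption based on the cited paper rather than the package source.

```r
# Hypothetical sketch of the b-measure under the stated assumption
b_measure_sketch <- function(X.b, X.c) {
  b_tot <- apply(X.b, 2:3, sum)   # type-b counts per product pair and attribute
  c_tot <- apply(X.c, 2:3, sum)   # type-c counts per product pair and attribute
  denom <- b_tot + c_tot
  sum(ifelse(denom > 0, (b_tot - c_tot)^2 / denom, 0))
}

# Toy data: 3 assessors, 1 product pair, 2 attributes
X.b <- array(c(1, 1, 0,  0, 0, 0), dim = c(3, 1, 2))
X.c <- array(c(0, 0, 0,  1, 0, 0), dim = c(3, 1, 2))
b_measure_sketch(X.b, X.c)  # (2 - 0)^2 / 2 + (0 - 1)^2 / 1 = 3
```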
Within a group of N consumers, the homogeneity index lies between 1/N (no homogeneity) and 1 (perfect homogeneity).
homogeneity(X, oneI = FALSE, oneM = FALSE)
X | three-way array (assessors, products, attributes) |
oneI | indicates whether calculation is for one assessor (default: FALSE) |
oneM | indicates whether calculation is for one attribute (default: FALSE) |
homogeneity index
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E.M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39, doi:10.1016/j.foodqual.2018.09.006
data(bread)
# homogeneity index for the first 7 consumers on the first 6 attributes
homogeneity(bread$cata[1:7,,1:6])
Inspect many runs of b-cluster analysis. Calculate sensory differentiation retained and recurrence rate.
inspect(X, G = 2, bestB = NULL, bestM = NULL, inspect.plot = TRUE)
X | list of multiple runs of b-cluster analysis results (e.g. from bcluster.n) |
G | number of clusters (required for non-hierarchical algorithm) |
bestB | total sensory differentiation retained in the best solution. If not provided, then it is determined from the runs provided (in X) |
bestM | cluster memberships for best solution. If not provided, then the best solution is determined from the runs provided (in X) |
inspect.plot | default (TRUE) renders the plot |
A data frame with unique solutions in rows and the following columns:
B: sensory differentiation retained
PctB: percentage of the total sensory differentiation retained
B.prop: proportion of sensory differentiation retained compared to best solution
Raw.agree: raw agreement with best solution
Count: number of runs for which this solution was observed
Index: list index (i.e., run number) of first solution in X corresponding to this row
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
data(bread)
res <- bcluster.n(bread$cata[1:8, , 1:5], G = 3, runs = 3)
(ires <- inspect(res))
# get index of solution retaining the most sensory differentiation (in these runs)
indx <- ires$Index[1]
# cluster memberships for this solution
res[[indx]]$cluster
Computes and returns inter-object distances based on the median of absolute deviations (MAD).
mad.dist(X)
X | objects-by-terms matrix |
object of class dist giving inter-object MAD distances
One citation, one vote! A new approach to the analysis of check-all-that-apply (CATA) data in sensometrics based on L1 norm methods.
data(bread)
CATA.freq <- apply(bread$cata, 2:3, sum)
# median-center columns (attributes)
CATA.swept <- sweep(CATA.freq, 2, apply(CATA.freq, 2, median))
# cluster analysis of products using complete linkage
dist.Products <- mad.dist(CATA.swept)
plot(as.dendrogram(hclust(dist.Products, method = "complete")), main = "Product clusters")
# cluster analysis of attributes using complete linkage
dist.Att <- mad.dist(t(CATA.swept))
plot(as.dendrogram(hclust(dist.Att, method = "complete")), main = "Attribute clusters")
Permutation tests for check-all-that-apply (CATA) data following the 'one citation, one vote' principle. Returns CATA frequency and percentage tables per condition and permutation test results specified.
madperm(X, B = 99, seed = .Random.seed, tests = 1:5, alpha = 0.05, control.fdr = FALSE, verbose = FALSE)
X | a three-way (or four-way) array |
B | permutations in null distribution (default: 99) |
seed | specify a numeric seed for reproducibility; if not provided, a random seed is generated |
tests | numeric vector specifying which tests to conduct; default 1:5 |
alpha | Type I error rate (default: 0.05) |
control.fdr | control False Discovery Rate (using Benjamini-Hochberg (BH) step-up procedure)? (default: FALSE) |
verbose | return null distribution(s) and function call? (default: FALSE) |
list, one per condition:
CATA.table: table of CATA citation percentages
CATA.freq: CATA frequency table
Permutation test results specified by the tests parameter:
Global.Results: list of multivariate (global) results
Univariate.Results: list of univariate results
Elementwise.Results: list of elementwise results
Multivariate.Paired.Results: list of multivariate paired results
Univariate.Paired.Results: list of univariate paired results
also, if verbose is TRUE:
Null.Dist: list of null distributions for tests specified
Call: madperm function call
One citation, one vote! A new approach to the analysis of check-all-that-apply (CATA) data in sensometrics based on L1 norm methods.
data(bread)
# add product names
X <- bread$cata[1:100,,1:5]
dimnames(X)[[2]] <- paste0("P", dimnames(X)[[2]])
# permutation tests for the first 100 consumers and 5 attributes
# will be run with default parameter values for illustrative purposes only
res <- madperm(X, B = 99, seed = 123)
print(res) # inspect results
Pairwise tests are conducted using the two-tailed binomial test. These tests can be conducted after Cochran's Q test.
mcnemarQ(X, quiet = FALSE, digits = getOption("digits"))
X | matrix of matched binary responses (assessors in rows, products in columns) |
quiet | logical; default is FALSE |
digits | for rounding |
Test results for all McNemar pairwise tests conducted via the binomial test
Cochran, W.G. (1950). The comparison of percentages in matched samples. Biometrika, 37, 256-266.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153-157.
Meyners, M., Castura, J.C., & Carr, B.T. (2013). Existing and new approaches for the analysis of CATA data. Food Quality and Preference, 30, 309-319, doi:10.1016/j.foodqual.2013.06.010
data(bread)
# McNemar's exact pairwise test for all product pairs
# on the first 50 consumers and the first attribute ("Fresh")
mcnemarQ(bread$cata[1:50, , 1])
# Same, returning only results for the first 4 attributes
(res <- apply(bread$cata[1:50, , 1:4], 3, mcnemarQ, quiet=TRUE, simplify=FALSE))
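The pairwise tests above rest on a simple idea: for each product pair, only the discordant responses matter, and they are compared against a 50:50 split with a two-tailed binomial test. A minimal base R sketch using stats::binom.test follows. The name mcnemar_exact_sketch and the output layout are hypothetical and will not match the package output.

```r
# Hypothetical sketch: exact McNemar test for every pair of columns in a 0/1 matrix
mcnemar_exact_sketch <- function(X) {
  X <- as.matrix(X)
  pairs <- utils::combn(ncol(X), 2)
  res <- data.frame(col.1 = pairs[1, ], col.2 = pairs[2, ],
                    n10 = NA_integer_, n01 = NA_integer_, p.value = NA_real_)
  for (p in seq_len(ncol(pairs))) {
    x1 <- X[, pairs[1, p]]
    x2 <- X[, pairs[2, p]]
    n10 <- sum(x1 == 1 & x2 == 0)   # checked for first product only
    n01 <- sum(x1 == 0 & x2 == 1)   # checked for second product only
    res$n10[p] <- n10
    res$n01[p] <- n01
    if (n10 + n01 > 0) {            # p value is left NA when there are no discordant pairs
      res$p.value[p] <- stats::binom.test(n10, n10 + n01, p = 0.5)$p.value
    }
  }
  res
}

# Toy data: 5 assessors, 2 products
X <- cbind(c(1, 1, 1, 1, 0), c(0, 0, 0, 0, 1))
mcnemar_exact_sketch(X)
```

Here there are 4 discordant responses in one direction and 1 in the other, so the two-tailed p value is 12/32 = 0.375.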
Penalty-Lift analysis for CATA variables, which is the difference between the average hedonic response when CATA attribute is checked vs. the average hedonic response when CATA attribute is not checked.
plift(X, Y, digits = getOption("digits"), verbose = FALSE)
X | either a matrix of CATA data (assessors, products) or an array (assessors, products, attributes) |
Y | matrix of hedonic data (assessors, products) |
digits | for rounding |
verbose | set to TRUE to return counts and averages (default: FALSE) |
Penalty lift per attribute, with counts and averages if verbose is TRUE.
Meyners, M., Castura, J.C., & Carr, B.T. (2013). Existing and new approaches for the analysis of CATA data. Food Quality and Preference, 30, 309-319, doi:10.1016/j.foodqual.2013.06.010
data(bread)
# penalty lift, based only on the first 12 consumers
# for the first attribute ("Fresh")
plift(bread$cata[1:12,,1], bread$liking[1:12, ], digits = 3)
# for the first 3 attributes with counts and averages
plift(bread$cata[1:12,,1:3], bread$liking[1:12, ], digits = 3, verbose = TRUE)
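The penalty-lift definition above reduces to a difference of two means for a single attribute. The sketch below shows this in base R, assuming a 0/1 CATA matrix and a liking matrix of the same shape. The name plift_sketch is hypothetical and the rounding and verbose output of plift are not reproduced.

```r
# Hypothetical sketch: penalty lift for one attribute
plift_sketch <- function(x, y) {
  x <- as.vector(x)   # 0/1 citations (assessors x products)
  y <- as.vector(y)   # hedonic scores in the same order
  mean(y[x == 1]) - mean(y[x == 0])
}

x <- matrix(c(1, 0, 1, 0), nrow = 2)
y <- matrix(c(8, 4, 6, 2), nrow = 2)
plift_sketch(x, y)  # mean(8, 6) - mean(4, 2) = 4
```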
Calculate RV coefficient
rv.coef(X, Y, method = 1)
X | input matrix (same dimensions as Y) |
Y | input matrix (same dimensions as X) |
method | default is 1 |
RV coefficient
Robert, P., & Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: the RV-coefficient. Journal of the Royal Statistical Society: Series C (Applied Statistics), 25, 257-265.
# Generate some data
set.seed(123)
X <- matrix(rnorm(8), nrow = 4)
Y <- matrix(rnorm(8), nrow = 4)
# get the RV coefficient
rv.coef(X, Y)
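For orientation, the RV coefficient of Robert & Escoufier (1976) can be sketched in a few lines of base R. The sketch assumes both matrices are column-centered first and does not model the method argument of rv.coef. The name rv_sketch is hypothetical.

```r
# Hypothetical sketch: RV coefficient between two matrices with the same rows
rv_sketch <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)   # assumed: column centering
  Y <- scale(Y, center = TRUE, scale = FALSE)
  Sx <- tcrossprod(X)                           # X X'
  Sy <- tcrossprod(Y)                           # Y Y'
  # trace(Sx Sy) equals sum(Sx * Sy) because both matrices are symmetric
  sum(Sx * Sy) / sqrt(sum(Sx * Sx) * sum(Sy * Sy))
}

X <- matrix(c(1, 2, 3, 4, 2, 1, 4, 3), nrow = 4)
rv_sketch(X, 2 * X)  # rescaling a configuration leaves RV at 1
```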
Calculate Salton's cosine measure
salton(X, Y)
X | input matrix (same dimensions as Y) |
Y | input matrix (same dimensions as X) |
Salton's cosine measure
Salton, G., & McGill, M.J. (1983). Introduction to Modern Information Retrieval. Toronto: McGraw-Hill.
# Generate some data
set.seed(123)
X <- matrix(rnorm(8), nrow = 4)
Y <- matrix(rnorm(8), nrow = 4)
# get Salton's cosine measure
salton(X, Y)
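Salton's cosine measure above is the cosine of the angle between the two matrices treated as vectors. A minimal base R sketch follows, assuming no centering or other preprocessing. The name salton_sketch is hypothetical.

```r
# Hypothetical sketch: cosine between two matrices of the same dimensions
salton_sketch <- function(X, Y) {
  sum(X * Y) / sqrt(sum(X^2) * sum(Y^2))
}

X <- matrix(c(1, 0, 0, 0), nrow = 2)
Y <- matrix(c(0, 1, 0, 0), nrow = 2)
salton_sketch(X, X)  # identical matrices: 1
salton_sketch(X, Y)  # no shared non-zero cells: 0
```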
Plot variation in retained sensory differentiation of cluster memberships obtained from b-cluster analysis. This plot can be used to help decide how many clusters to retain.
selectionPlot(x, pctB = NULL, x.input = "deltaB", indx = NULL, ylab = "change in B (K to G)", xlab = NULL)
x | input vector which is either deltaB (default; change in sensory differentiation retained) or B (sensory differentiation retained), as indicated by x.input |
pctB | vector of percentage of the total sensory differentiation retained |
x.input | indicates what x is; default is "deltaB" |
indx | numeric value indicating which point(s) to emphasize |
ylab | label shown on y axis and at selection point |
xlab | label for points along x axis |
Castura, J.C., Meyners, M., Varela, P., & Næs, T. (2022). Clustering consumers based on product discrimination in check-all-that-apply (CATA) data. Food Quality and Preference, 104564. doi:10.1016/j.foodqual.2022.104564.
set.seed(123)
G2 <- bcluster.n(bread$cata[1:8, , 1:5], G = 2, runs = 3)
G3 <- bcluster.n(bread$cata[1:8, , 1:5], G = 3, runs = 3)
G4 <- bcluster.n(bread$cata[1:8, , 1:5], G = 4, runs = 3)
best.indx <- c(which.max(unlist(lapply(G2, function(x) x$retainedB))),
               which.max(unlist(lapply(G3, function(x) x$retainedB))),
               which.max(unlist(lapply(G4, function(x) x$retainedB))))
G1.bc <- barray(bread$cata[1:8, , 1:5])
G1.B <- getb(G1.bc[,,1,], G1.bc[,,2,])
BpctB <- data.frame(retainedB = c(G1.B,
                                  G2[[best.indx[1]]]$retainedB,
                                  G3[[best.indx[2]]]$retainedB,
                                  G4[[best.indx[3]]]$retainedB))
BpctB$pctB <- 100*BpctB$retainedB / G2[[1]]$totalB
BpctB$deltaB <- c(100*(1-BpctB$retainedB[-nrow(BpctB)] / BpctB$retainedB[-1]), NA)
BpctB <- BpctB[-nrow(BpctB),]
opar <- par(no.readonly=TRUE)
par(mar = rep(5,4))
selectionPlot(BpctB$deltaB, BpctB$pctB, indx = 2)
par(opar)
Converts a three-dimensional array (assessors, products, attributes) to a two-dimensional matrix with (assessors, products) rows and (attributes) columns, optionally preceded by two columns of row headers.
toMatrix(X, header.rows = TRUE, oneI = FALSE, oneM = FALSE)
X | three-dimensional array (assessors, products, attributes) |
header.rows | include two columns of row headers; default is TRUE |
oneI | indicates whether calculation is for one assessor (default: FALSE) |
oneM | indicates whether calculation is for one attribute (default: FALSE) |
A matrix with assessor-product combinations in rows and attributes in columns (preceded by 2 columns of headers if header.rows = TRUE)
data(bread)
# convert CATA results from the first 8 consumers and the first 4 attributes
# to a tall matrix
toMatrix(bread$cata[1:8,,1:4])
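The tall layout described above can be reproduced in base R with aperm. The sketch below assumes rows are ordered by assessor and then by product, which may not match the row order used by toMatrix. The name to_tall_sketch and the header column names are hypothetical.

```r
# Hypothetical sketch: (assessors, products, attributes) array to a tall matrix
to_tall_sketch <- function(X) {
  I <- dim(X)[1]; J <- dim(X)[2]; M <- dim(X)[3]
  # put products first so that they vary fastest within each assessor
  tall <- matrix(aperm(X, c(2, 1, 3)), nrow = I * J, ncol = M)
  cbind(assessor = rep(seq_len(I), each = J),
        product  = rep(seq_len(J), times = I),
        tall)
}

X <- array(1:24, dim = c(2, 3, 4))   # 2 assessors, 3 products, 4 attributes
out <- to_tall_sketch(X)
dim(out)  # 6 rows (2 x 3) and 2 header columns plus 4 attribute columns
```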
Apply top-c choices coding to a vector of scale data from a respondent
topc(x, c = 2, coding = "B")
x | input vector (scale data from one respondent) |
c | number of top choices considered to be 'success'; other choices are considered to be 'failure' (default: 2) |
coding | default is "B" |
vector x with top-c choices coding applied
Castura, J.C., Meyners, M., Pohjanheimo, T., Varela, P., & Næs, T. (2023). An approach for clustering consumers by their top-box and top-choice responses. Journal of Sensory Studies, e12860. doi:10.1111/joss.12860
# Generate some data
set.seed(123)
X <- matrix(sample(1:9, 100, replace = TRUE), nrow = 5)
# apply top-2 choice (T2C) coding
apply(X, 1, topc)
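One way to picture the top-c choices coding above is to mark every score that is at least as large as a respondent's c-th highest score. The sketch below does this in base R. How topc handles ties and what the coding argument changes are not modelled here, so the tie rule shown (all tied scores count as top choices) is an assumption. The names topc_sketch and n.top are hypothetical.

```r
# Hypothetical sketch: code a respondent's top choices as 1 and the rest as 0
topc_sketch <- function(x, n.top = 2) {
  cutoff <- sort(x, decreasing = TRUE)[n.top]   # n.top-th largest score
  as.integer(x >= cutoff)                       # ties at the cutoff are all coded 1
}

topc_sketch(c(9, 7, 8, 1))   # the scores 9 and 8 are the top 2 choices
```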
Converts a three-dimensional array (assessors, products, attributes) to a two-dimensional matrix (products, (assessors, attributes))
toWideMatrix(X)
X | three-dimensional array (assessors, products, attributes) |
A matrix with J products in rows and assessor-attribute combinations in columns
data(bread)
# convert CATA results from the first 8 consumers and the first 4 attributes
# to a wide matrix
toWideMatrix(bread$cata[1:8,,1:4])
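The wide layout above can likewise be sketched with aperm in base R. The sketch assumes columns are grouped by assessor with attributes varying within each assessor, which may not match the column order used by toWideMatrix. The name to_wide_sketch is hypothetical.

```r
# Hypothetical sketch: (assessors, products, attributes) array to a wide matrix
to_wide_sketch <- function(X) {
  I <- dim(X)[1]; J <- dim(X)[2]; M <- dim(X)[3]
  # products become rows; attributes vary fastest across columns, then assessors
  matrix(aperm(X, c(2, 3, 1)), nrow = J, ncol = M * I)
}

X <- array(1:24, dim = c(2, 3, 4))   # 2 assessors, 3 products, 4 attributes
out <- to_wide_sketch(X)
dim(out)  # 3 products in rows, 2 x 4 = 8 columns
```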