A Gorn address (Gorn, 1967) is a method of identifying and addressing any node within a tree data structure. This notation is often used for identifying nodes in a parse tree defined by phrase structure rules. The Gorn address is a sequence of zero or more integers conventionally separated by dots, e.g., 0 or 1.0.1. The root which Gorn calls can be regarded as the empty sequence. And the j {\displaystyle j} -th child of the i {\displaystyle i} -th child has an address i . j {\displaystyle i.j} , counting from 0. It is named after American computer scientist Saul Gorn.
Triplet loss is a machine learning loss function widely used in one-shot learning, a setting where models are trained to generalize effectively from limited examples. It was conceived by Google researchers for their prominent FaceNet algorithm for face detection. Triplet loss is designed to support metric learning. Namely, to assist training models to learn an embedding (mapping to a feature space) where similar data points are closer together and dissimilar ones are farther apart, enabling robust discrimination across varied conditions. In the context of face detection, data points correspond to images. == Definition == The loss function is defined using triplets of training points of the form ( A , P , N ) {\displaystyle (A,P,N)} . In each triplet, A {\displaystyle A} (called an "anchor point") denotes a reference point of a particular identity, P {\displaystyle P} (called a "positive point") denotes another point of the same identity in point A {\displaystyle A} , and N {\displaystyle N} (called a "negative point") denotes a point of an identity different from the identity in point A {\displaystyle A} and P {\displaystyle P} . Let x {\displaystyle x} be some point and let f ( x ) {\displaystyle f(x)} be the embedding of x {\displaystyle x} in the finite-dimensional Euclidean space. It shall be assumed that the L2-norm of f ( x ) {\displaystyle f(x)} is unity (the L2 norm of a vector X {\displaystyle X} in a finite dimensional Euclidean space is denoted by ‖ X ‖ {\displaystyle \Vert X\Vert } .) We assemble m {\displaystyle m} triplets of points from the training dataset. The goal of training here is to ensure that, after learning, the following condition (called the "triplet constraint") is satisfied by all triplets ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} in the training data set: ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 + α < ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 {\displaystyle \Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}+\alpha <\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}} The variable α {\displaystyle \alpha } is a hyperparameter called the margin, and its value must be set manually. In the FaceNet system, its value was set as 0.2. Thus, the full form of the function to be minimized is the following: L = ∑ i = 1 m max ( ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 − ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 + α , 0 ) {\displaystyle L=\sum _{i=1}^{m}\max {\Big (}\Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}-\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}+\alpha ,0{\Big )}} == Intuition == A baseline for understanding the effectiveness of triplet loss is the contrastive loss, which operates on pairs of samples (rather than triplets). Training with the contrastive loss pulls embeddings of similar pairs closer together, and pushes dissimilar pairs apart. Its pairwise approach is greedy, as it considers each pair in isolation. Triplet loss innovates by considering relative distances. Its goal is that the embedding of an anchor (query) point be closer to positive points than to negative points (also accounting for the margin). It does not try to further optimize the distances once this requirement is met. This is approximated by simultaneously considering two pairs (anchor-positive and anchor-negative), rather than each pair in isolation. == Triplet "mining" == One crucial implementation detail when training with triplet loss is triplet "mining", which focuses on the smart selection of triplets for optimization. This process adds an additional layer of complexity compared to contrastive loss. A naive approach to preparing training data for the triplet loss involves randomly selecting triplets from the dataset. In general, the set of valid triplets of the form ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} is very large. To speed-up training convergence, it is essential to focus on challenging triplets. In the FaceNet paper, several options were explored, eventually arriving at the following. For each anchor-positive pair, the algorithm considers only semi-hard negatives. These are negatives that violate the triplet requirement (i.e, are "hard"), but lie farther from the anchor than the positive (not too hard). Restated, for each A ( i ) {\displaystyle A^{(i)}} and P ( i ) {\displaystyle P^{(i)}} , they seek N ( i ) {\displaystyle N^{(i)}} such that: The rationale for this design choice is heuristic. It may appear puzzling that the mining process neglects "very hard" negatives (i.e., closer to the anchor than the positive). Experiments conducted by the FaceNet designers found that this often leads to a convergence to degenerate local minima. Triplet mining is performed at each training step, from within the sample points contained in the training batch (this is known as online mining), after embeddings were computed for all points in the batch. While ideally the entire dataset could be used, this is impractical in general. To support a large search space for triplets, the FaceNet authors used very large batches (1800 samples). Batches are constructed by selecting a large number of same-category sample points (40), and randomly selected negatives for them. == Extensions == Triplet loss has been extended to simultaneously maintain a series of distance orders by optimizing a continuous relevance degree with a chain (i.e., ladder) of distance inequalities. This leads to the Ladder Loss, which has been demonstrated to offer performance enhancements of visual-semantic embedding in learning to rank tasks. In Natural Language Processing, triplet loss is one of the loss functions considered for BERT fine-tuning in the SBERT architecture. Other extensions involve specifying multiple negatives (multiple negatives ranking loss).
LamaH (Large-Sample Data for Hydrology and Environmental Sciences) is a cross-state initiative for unified data preparation and collection in the field of catchment hydrology. Hydrological datasets, for example, are an integral component for creating flood forecasting models. == Features == LamaH datasets always consist of a combination of meteorological time series (e.g., precipitation, temperature) and hydrologically relevant catchment attributes (e.g., elevation, slope, forest area, soil, bedrock) aggregated over the respective catchment as well as associated hydrological time series at the catchment outlet (discharge). By evaluating the large and heterogeneous sample (large-sample) of catchments, it is possible to gain insights into the hydrological cycle that would probably not be achievable with local and small-scale studies. The structure of the dataset allows an evaluation based on machine learning methods (deep learning). The accompanying paper explains not only the data preparation but also any limitations, uncertainties and possible applications. == Difference to CAMELS == The LamaH datasets are quite similar to the CAMELS datasets, but additionally feature: Further basin delineations (based on intermediate catchments) and attributes (e.g. flow distance and altitude difference between two topologically adjacent discharge gauges), enabling the setup of an interconnected hydrological network Attributes for classifying catchments and runoff gauges according to the degree and type of (anthropogenic) influence == Availability == LamaH datasets are available for the following regions: Central Europe (Austria and its hydrological upstream areas in Germany, Czech Republic, Switzerland, Slovakia, Italy, Liechtenstein, Slovenia and Hungary) / 859 catchments CAMELS datasets are available for (ranked by publication date): Contiguous USA (exclusive Alaska and Hawaii) / 671 catchments Chile / 516 catchments Brazil / 897 catchments Great Britain / 671 catchments Australia / 222 catchments Both the CAMELS and LamaH datasets are licensed with Creative Commons and are therefore available barrier-free for the public.
Genotypic and phenotypic repair are optional components of an evolutionary algorithm (EA). An EA reproduces essential elements of biological evolution as a computer algorithm in order to solve demanding optimization or planning tasks, at least approximately. A candidate solution is represented by a - usually linear - data structure that plays the role of an individual's chromosome. New solution candidates are generated by mutation and crossover operators following the example of biology. These offspring may be defective, which is corrected or compensated for by genotypic or phenotypic repair. == Description == Genotypic repair, also known as genetic repair, is the removal or correction of impermissible entries in the chromosome that violate restrictions. In phenotypic repair, the corrections are only made in the genotype-phenotype mapping and the chromosome remains unchanged. Michalewicz wrote about the importance of restrictions in real-world applications: "In general, constraints are an integral part of the formulation of any problem". Restriction violations are application-specific and therefore it depends on the current problem whether and which type of repair is useful. They can usually also be treated by a correspondingly extended evaluation and it depends on the problem which measures are possible and which is the most suitable. If a phenotypic repair is feasible, then it is usually the most efficient compared to the other measures. A survey on repair methods used as constraint handling techniques can be found in. Violations of the range limits of genes should be avoided as far as possible by the formulation of the genome. If this is not possible or if restrictions within the search space defined by the genome are involved, their violations are usually handled by the evaluation. This can be done, for example, by penalty functions that lower the fitness. Repair is often also required for combinatorial tasks. The application of a 1- or n-point crossover operator can, for example, lead to genes being missing in one of the child genomes that are present in duplicate in the other. In this case, a suitable genotypic repair measure is to move the surplus genes to the other genome in a positional manner. The use of the aforementioned operators in combinatorial tasks has also proven to be useful in combination with crossover types specially developed for permutations, at least for certain problems. Particularly in combinatorial problems, it has been observed that genotypic repair can promote premature convergence to a suboptimum, but can also significantly accelerate a successful search. Studies on various tasks have shown that this is application-dependent. An effective measure to avoid premature convergence is generally the use of structured populations instead of the usual panmictic ones. Sequence restrictions play a role in many scheduling tasks, for example when it comes to planning workflows. If, for example, it is specified that step A must be carried out before step B and the gene of step B is located before the gene of A in the chromosome, then there is an impermissible gene sequence. This is because the scheduling operation of step B requires the planned end of step A for correct scheduling, but this is not yet scheduled at the time gene B is processed. The problem can be solved in two ways: The scheduling operation of step B is postponed until the gene from step A has been processed. The genome remains unchanged and the repair only influences the genotype-phenotype mapping. Since only the phenotype is changed, this is referred to as phenotypic repair. If, on the other hand, the gene of step B is moved behind the gene of step A, this is a genotypic repair. The same applies to the alternative shift of gene A in front of gene B. In this case, genotypic repair has the disadvantage that it prevents a meaningful restructuring of the gene sequence in the chromosome if this requires several intermediate steps (mutations) that at least partially violate restrictions.
The Open Cloud Computing Interface (OCCI) is a set of specifications delivered through the Open Grid Forum, for cloud computing service providers. OCCI has a set of implementations that act as proofs of concept. It builds upon World Wide Web fundamentals by using the Representational State Transfer (REST) approach for interacting with services. == Scope == The aim of the Open Cloud Computing Interface is the development of an open specification and API for cloud offerings. The focus was on Infrastructure-as-a-Service (IaaS) based offerings but the interface can be extended to support Platform and Software as a Service offerings as well. IaaS is one of three primary segments of the cloud computing industry in which compute, storage and network resources are provided as services. The API is based on a review of existing service-provider functionality and a set of use cases contributed by the working group. OCCI is a boundary API that acts as a service front-end to an IaaS provider’s internal infrastructure management framework. OCCI provides commonly understood semantics, syntax and a means of management in the domain of consumer-to-provider IaaS. It covers management of the entire life-cycle of OCCI-defined model entities and is compatible with existing standards such as the Open Virtualization Format (OVF) and the Cloud Data Management Interface (CDMI). Notably, it serves as an integration point for standardization efforts including Distributed Management Task Force, Internet Engineering Task Force and the Storage Networking Industry Association. == Context == OCCI began in March 2009 and was initially led by RabbitMQ and the Complutense University of Madrid. Today, the working group has over 250 members and includes numerous individuals, industry and academic parties. The OCCI operates under the umbrella of the Open Grid Forum (OGF), using a wiki and a mailing list for collaboration. == Goals == Interoperability: allow different Cloud providers to work together without data schema/format translation, facade/proxying between APIs and understanding and/or dependency on multiple APIs Portability: no technical/vendor lock-in and enable services to move between providers allows clients to easily switch between providers based on business objectives (e.g., cost) with minimal technical costs, thus enabling and fostering competition. Integration: the specification can be implemented with both the latest infrastructures or legacy ones. Extensibility: thanks to the use of a meta-model and capabilities discovery features, an OCCI client is able to interact with any OCCI server using provider-specific OCCI extensions. == Specific Implementations == They implement specific extensions of OCCI for a particular service: IaaS, PaaS, brokering, etc. Several implementations have been announced or released. == Generic Implementations (frameworks) == Here are frameworks to build OCCI APIs. Complementing these are a variety of developer tools. == Alternatives == Alternative approaches include the use of the Cloud Infrastructure Management Interface (CIMI) and related standards set from DMTF and the Amazon Web Services interfaces from Amazon. (The latter have not been endorsed by any known Standards organization). OpenNebula conducted a survey of their users in which the results showed, 38% do not expose cloud APIs, their users only interface through the Sunstone GUI, 36% mostly use the Amazon Web Services API, and 26% mostly use the OpenNebula’s OCCI API or the OCCI API offered by rOCCI.
In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies. == Comparisons == While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. The novel Arithmetic Residuals in K-groups Analysis (ARKA) approach is a supervised dimensionality reduction technique developed by the DTC Laboratory, Jadavpur University that can easily identify activity cliffs in a data set. Activity cliffs are similar in their structures but differ considerably in their activity. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign weightage to each descriptor in each group. ARKA descriptors have also been used to develop classification-based and regression-based QSAR models with acceptable quality statistics. The ARKA descriptors have been used for the identification of activity cliffs in QSAR studies and/or model development by multiple researchers. A tutorial presentation on the ARKA descriptors is available. Recently a multi-class ARKA framework has been proposed for improved q-RASAR model generation.
Blockmodeling is a set or a coherent framework, that is used for analyzing social structure and also for setting procedure(s) for partitioning (clustering) social network's units (nodes, vertices, actors), based on specific patterns, which form a distinctive structure through interconnectivity. It is primarily used in statistics, machine learning and network science. As an empirical procedure, blockmodeling assumes that all the units in a specific network can be grouped together to such extent to which they are equivalent. Regarding equivalency, it can be structural, regular or generalized. Using blockmodeling, a network can be analyzed using newly created blockmodels, which transforms large and complex network into a smaller and more comprehensible one. At the same time, the blockmodeling is used to operationalize social roles. While some contend that the blockmodeling is just clustering methods, Bonacich and McConaghy state that "it is a theoretically grounded and algebraic approach to the analysis of the structure of relations". Blockmodeling's unique ability lies in the fact that it considers the structure not just as a set of direct relations, but also takes into account all other possible compound relations that are based on the direct ones. The principles of blockmodeling were first introduced by Francois Lorrain and Harrison C. White in 1971. Blockmodeling is considered as "an important set of network analytic tools" as it deals with delineation of role structures (the well-defined places in social structures, also known as positions) and the discerning the fundamental structure of social networks. According to Batagelj, the primary "goal of blockmodeling is to reduce a large, potentially incoherent network to a smaller comprehensible structure that can be interpreted more readily". Blockmodeling was at first used for analysis in sociometry and psychometrics, but has now spread also to other sciences. == Definition == A network as a system is composed of (or defined by) two different sets: one set of units (nodes, vertices, actors) and one set of links between the units. Using both sets, it is possible to create a graph, describing the structure of the network. During blockmodeling, the researcher is faced with two problems: how to partition the units (e.g., how to determine the clusters (or classes), that then form vertices in a blockmodel) and then how to determine the links in the blockmodel (and at the same time the values of these links). In the social sciences, the networks are usually social networks, composed of several individuals (units) and selected social relationships among them (links). Real-world networks can be large and complex; blockmodeling is used to simplify them into smaller structures that can be easier to interpret. Specifically, blockmodeling partitions the units into clusters and then determines the ties among the clusters. At the same time, blockmodeling can be used to explain the social roles existing in the network, as it is assumed that the created cluster of units mimics (or is closely associated with) the units' social roles. Blockmodeling can thus be defined as a set of approaches for partitioning units into clusters (also known as positions) and links into blocks, which are further defined by the newly obtained clusters. A block (also blockmodel) is defined as a submatrix, that shows interconnectivity (links) between nodes, present in the same or different clusters. Each of these positions in the cluster is defined by a set of (in)direct ties to and from other social positions. These links (connections) can be directed or undirected; there can be multiple links between the same pair of objects or they can have weights on them. If there are not any multiple links in a network, it is called a simple network. A matrix representation of a graph is composed of ordered units, in rows and columns, based on their names. The ordered units with similar patterns of links are partitioned together in the same clusters. Clusters are then arranged together so that units from the same clusters are placed next to each other, thus preserving interconnectivity. In the next step, the units (from the same clusters) are transformed into a blockmodel. With this, several blockmodels are usually formed, one being core cluster and others being cohesive; a core cluster is always connected to cohesive ones, while cohesive ones cannot be linked together. Clustering of nodes is based on the equivalence, such as structural and regular. The primary objective of the matrix form is to visually present relations between the persons included in the cluster. These ties are coded dichotomously (as present or absent), and the rows in the matrix form indicate the source of the ties, while the columns represent the destination of the ties. Equivalence can have two basic approaches: the equivalent units have the same connection pattern to the same neighbors or these units have same or similar connection pattern to different neighbors. If the units are connected to the rest of network in identical ways, then they are structurally equivalent. Units can also be regularly equivalent, when they are equivalently connected to equivalent others. With blockmodeling, it is necessary to consider the issue of results being affected by measurement errors in the initial stage of acquiring the data. == Different approaches == Regarding what kind of network is undergoing blockmodeling, a different approach is necessary. Networks can be one–mode or two–mode. In the former all units can be connected to any other unit and where units are of the same type, while in the latter the units are connected only to the unit(s) of a different type. Regarding relationships between units, they can be single–relational or multi–relational networks. Further more, the networks can be temporal or multilevel and also binary (only 0 and 1) or signed (allowing negative ties)/values (other values are possible) networks. Different approaches to blockmodeling can be grouped into two main classes: deterministic blockmodeling and stochastic blockmodeling approaches. Deterministic blockmodeling is then further divided into direct and indirect blockmodeling approaches. Among direct blockmodeling approaches are: structural equivalence and regular equivalence. Structural equivalence is a state, when units are connected to the rest of the network in an identical way(s), while regular equivalence occurs when units are equally related to equivalent others (units are not necessarily sharing neighbors, but have neighbour that are themselves similar). Indirect blockmodeling approaches, where partitioning is dealt with as a traditional cluster analysis problem (measuring (dis)similarity results in a (dis)similarity matrix), are: conventional blockmodeling, generalized blockmodeling: generalized blockmodeling of binary networks, generalized blockmodeling of valued networks and generalized homogeneity blockmodeling, prespecified blockmodeling. According to Brusco and Steinley (2011), the blockmodeling can be categorized (using a number of dimensions): deterministic or stochastic blockmodeling, one–mode or two–mode networks, signed or unsigned networks, exploratory or confirmatory blockmodeling. == Blockmodels == Blockmodels (sometimes also block models) are structures in which: vertices (e.g., units, nodes) are assembled within a cluster, with each cluster identified as a vertex; from such vertices a graph can be constructed; combinations of all the links (ties), represented in a block as a single link between positions, while at the same time constructing one tie for each block. In a case, when there are no ties in a block, there will be no ties between the two positions that define the block. Computer programs can partition the social network according to pre-set conditions. When empirical blocks can be reasonably approximated in terms of ideal blocks, such blockmodels can be reduced to a blockimage, which is a representation of the original network, capturing its underlying 'functional anatomy'. Thus, blockmodels can "permit the data to characterize their own structure", and at the same time not seek to manifest a preconceived structure imposed by the researcher. Blockmodels can be created indirectly or directly, based on the construction of the criterion function. Indirect construction refers to a function, based on "compatible (dis)similarity measure between paris of units", while the direct construction is "a function measuring the fit of real blocks induced by a given clustering to the corresponding ideal blocks with perfect relations within each cluster and between clusters according to the considered types of connections (equivalence)". === Types === Blockmodels can be specified regarding the intuition, substance or the insight into the nature of the studied network; this can result in such models as follows: parent-child role systems, organizational hierarchies, systems of
An influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following the maximum expected utility criterion) can be modeled and solved. ID was first developed in the mid-1970s by decision analysts with an intuitive semantic that is easy to understand. It is now adopted widely and becoming an alternative to the decision tree which typically suffers from exponential growth in number of branches with each variable modeled. ID is directly applicable in team decision analysis, since it allows incomplete sharing of information among team members to be modeled and solved explicitly. Extensions of ID also find their use in game theory as an alternative representation of the game tree. == Semantics == An ID is a directed acyclic graph with three types (plus one subtype) of node and three types of arc (or arrow) between nodes. Nodes: Decision node (corresponding to each decision to be made) is drawn as a rectangle. Uncertainty node (corresponding to each uncertainty to be modeled) is drawn as an oval. Deterministic node (corresponding to special kind of uncertainty that its outcome is deterministically known whenever the outcome of some other uncertainties are also known) is drawn as a double oval. Value node (corresponding to each component of additively separable Von Neumann-Morgenstern utility function) is drawn as an octagon (or diamond). Arcs: Functional arcs (ending in value node) indicate that one of the components of additively separable utility function is a function of all the nodes at their tails. Conditional arcs (ending in uncertainty node) indicate that the uncertainty at their heads is probabilistically conditioned on all the nodes at their tails. Conditional arcs (ending in deterministic node) indicate that the uncertainty at their heads is deterministically conditioned on all the nodes at their tails. Informational arcs (ending in decision node) indicate that the decision at their heads is made with the outcome of all the nodes at their tails known beforehand. Given a properly structured ID: Decision nodes and incoming information arcs collectively state the alternatives (what can be done when the outcome of certain decisions and/or uncertainties are known beforehand) Uncertainty/deterministic nodes and incoming conditional arcs collectively model the information (what are known and their probabilistic/deterministic relationships) Value nodes and incoming functional arcs collectively quantify the preference (how things are preferred over one another). Alternative, information, and preference are termed decision basis in decision analysis, they represent three required components of any valid decision situation. Formally, the semantic of influence diagram is based on sequential construction of nodes and arcs, which implies a specification of all conditional independencies in the diagram. The specification is defined by the d {\displaystyle d} -separation criterion of Bayesian network. According to this semantic, every node is probabilistically independent on its non-successor nodes given the outcome of its immediate predecessor nodes. Likewise, a missing arc between non-value node X {\displaystyle X} and non-value node Y {\displaystyle Y} implies that there exists a set of non-value nodes Z {\displaystyle Z} , e.g., the parents of Y {\displaystyle Y} , that renders Y {\displaystyle Y} independent of X {\displaystyle X} given the outcome of the nodes in Z {\displaystyle Z} . == Example == Consider the simple influence diagram representing a situation where a decision-maker is planning their vacation. There is 1 decision node (Vacation Activity), 2 uncertainty nodes (Weather Condition, Weather Forecast), and 1 value node (Satisfaction). There are 2 functional arcs (ending in Satisfaction), 1 conditional arc (ending in Weather Forecast), and 1 informational arc (ending in Vacation Activity). Functional arcs ending in Satisfaction indicate that Satisfaction is a utility function of Weather Condition and Vacation Activity. In other words, their satisfaction can be quantified if they know what the weather is like and what their choice of activity is. (Note that they do not value Weather Forecast directly) Conditional arc ending in Weather Forecast indicates their belief that Weather Forecast and Weather Condition can be dependent. Informational arc ending in Vacation Activity indicates that they will only know Weather Forecast, not Weather Condition, when making their choice. In other words, actual weather will be known after they make their choice, and only forecast is what they can count on at this stage. It also follows semantically, for example, that Vacation Activity is independent on (irrelevant to) Weather Condition given Weather Forecast is known. == Applicability to value of information == The above example highlights the power of the influence diagram in representing an extremely important concept in decision analysis known as the value of information. Consider the following three scenarios; Scenario 1: The decision-maker could make their Vacation Activity decision while knowing what Weather Condition will be like. This corresponds to adding extra informational arc from Weather Condition to Vacation Activity in the above influence diagram. Scenario 2: The original influence diagram as shown above. Scenario 3: The decision-maker makes their decision without even knowing the Weather Forecast. This corresponds to removing informational arc from Weather Forecast to Vacation Activity in the above influence diagram. Scenario 1 is the best possible scenario for this decision situation since there is no longer any uncertainty on what they care about (Weather Condition) when making their decision. Scenario 3, however, is the worst possible scenario for this decision situation since they need to make their decision without any hint (Weather Forecast) on what they care about (Weather Condition) will turn out to be. The decision-maker is usually better off (definitely no worse off, on average) to move from scenario 3 to scenario 2 through the acquisition of new information. The most they should be willing to pay for such move is called the value of information on Weather Forecast, which is essentially the value of imperfect information on Weather Condition. The applicability of this simple ID and the value of information concept is tremendous, especially in medical decision making when most decisions have to be made with imperfect information about their patients, diseases, etc. == Related concepts == Influence diagrams are hierarchical and can be defined either in terms of their structure or in greater detail in terms of the functional and numerical relation between diagram elements. An ID that is consistently defined at all levels—structure, function, and number—is a well-defined mathematical representation and is referred to as a well-formed influence diagram (WFID). WFIDs can be evaluated using reversal and removal operations to yield answers to a large class of probabilistic, inferential, and decision questions. More recent techniques have been developed by artificial intelligence researchers concerning Bayesian network inference (belief propagation). An influence diagram having only uncertainty nodes (i.e., a Bayesian network) is also called a relevance diagram. An arc connecting node A to B implies not only that "A is relevant to B", but also that "B is relevant to A" (i.e., relevance is a symmetric relationship).
Hint (hint.app) is an American software platform that provides astrological content, personality assessments, and relationship compatibility tools. The application was launched in 2018 and is based in Claymont, Delaware. The platform has been described in media coverage as part of a broader trend of astrology-based and self-reflection applications, particularly among younger users. As of 2026, the company reports that it has reached more than 25 million users worldwide. == History == Hint was founded in 2018 and is headquartered in Claymont, Delaware. The platform was developed to address a growing demand among Millennials and Gen Z for structured self-reflection tools that deviate from traditional religious or clinical psychological frameworks. The app has become a prominent figure in the "emotional technology" sector, reaching over 25 million global users by 2026. The platform is frequently cited by sociologists and media outlets as a primary driver of the Open-source intelligence trend, where individuals use digital tools to vet and analyze personal relationships in the dating economy. Media coverage has described the platform as part of a broader trend in which digital tools incorporate astrology and symbolic frameworks into wellness and relationship advice. == Reception == Coverage of Hint has appeared alongside reporting on changing attitudes toward dating and relationships, particularly among younger adults. Surveys reported by media outlets have described shifts in dating behavior, including reduced interest in casual relationships and increased reliance on digital tools for emotional reflection and compatibility assessment. Additional reporting has linked the use of astrology apps to broader trends in emotional fatigue and changing relationship expectations. Lifestyle and culture publications have described Hint, as an example of applications that integrate astrology into digital self-reflection and relationship analysis.
In the theory of Probably Approximately Correct Machine Learning, the Natarajan dimension characterizes the complexity of learning a set of functions, generalizing from the Vapnik–Chervonenkis dimension for boolean functions to multi-class functions. Originally introduced as the Generalized Dimension by Natarajan, it was subsequently renamed the Natarajan Dimension by Haussler and Long. == Definition == Let H {\displaystyle H} be a set of functions from a set X {\displaystyle X} to a set Y {\displaystyle Y} . H {\displaystyle H} shatters a set C ⊂ X {\displaystyle C\subset X} if there exist two functions f 0 , f 1 ∈ H {\displaystyle f_{0},f_{1}\in H} such that For every x ∈ C , f 0 ( x ) ≠ f 1 ( x ) {\displaystyle x\in C,f_{0}(x)\neq f_{1}(x)} . For every B ⊂ C {\displaystyle B\subset C} , there exists a function h ∈ H {\displaystyle h\in H} such that for all x ∈ B , h ( x ) = f 0 ( x ) {\displaystyle x\in B,h(x)=f_{0}(x)} and for all x ∈ C − B , h ( x ) = f 1 ( x ) {\displaystyle x\in C-B,h(x)=f_{1}(x)} . The Natarajan dimension of H is the maximal cardinality of a set shattered by H {\displaystyle H} . It is easy to see that if | Y | = 2 {\displaystyle |Y|=2} , the Natarajan dimension collapses to the Vapnik–Chervonenkis dimension. Shalev-Shwartz and Ben-David present comprehensive material on multi-class learning and the Natarajan dimension, including uniform convergence and learnability. Recently, Cohen et al showed that the Natarajan dimension is the dominant term governing agnostic multi-class PAC learnability.
Silhouette is a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object has been classified. It was proposed by Belgian statistician Peter Rousseeuw in 1987. The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette value ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters. A clustering with an average silhouette width of over 0.7 is considered to be "strong", a value over 0.5 "reasonable", and over 0.25 "weak". However, with an increasing dimensionality of the data, it becomes difficult to achieve such high values because of the curse of dimensionality, as the distances become more similar. The silhouette score is specialized for measuring cluster quality when the clusters are convex-shaped, and may not perform well if the data clusters have irregular shapes or are of varying sizes. The silhouette value can be calculated with any distance metric, such as Euclidean distance or Manhattan distance. == Definition == Assume the data have been clustered via any technique, such as k-medoids or k-means, into k {\displaystyle k} clusters. For data point i ∈ C i {\displaystyle i\in C_{i}} (data point i {\displaystyle i} in the cluster C i {\displaystyle C_{i}} ), calculate a ( i ) {\displaystyle a(i)} , the average distance that i {\displaystyle i} is from all other points in that cluster: a ( i ) = 1 | C i | − 1 ∑ j ∈ C i , i ≠ j d ( i , j ) {\displaystyle a(i)={\frac {1}{|C_{i}|-1}}\sum _{j\in C_{i},i\neq j}d(i,j)} where | C i | {\displaystyle |C_{i}|} is the number of points belonging to cluster C i {\displaystyle C_{i}} , and d ( i , j ) {\displaystyle d(i,j)} is the distance between data points i {\displaystyle i} and j {\displaystyle j} in the cluster C i {\displaystyle C_{i}} (we divide by | C i | − 1 {\displaystyle |C_{i}|-1} because the distance d ( i , i ) {\displaystyle d(i,i)} is not included in the sum). a ( i ) {\displaystyle a(i)} can be interpreted as a measure of how well i {\displaystyle i} is assigned to its cluster (the smaller the value, the better the assignment). We then define the mean dissimilarity of point i {\displaystyle i} to some cluster C j {\displaystyle C_{j}} as the mean of the distance from i {\displaystyle i} to all points in C j {\displaystyle C_{j}} (where C j ≠ C i {\displaystyle C_{j}\neq C_{i}} ). For each data point i ∈ C i {\displaystyle i\in C_{i}} , we now define b ( i ) {\displaystyle b(i)} as the average distance between i {\displaystyle i} and the points in the closest cluster (hence: "min") that i {\displaystyle i} does not belong to: b ( i ) = min j ≠ i 1 | C j | ∑ l ∈ C j d ( i , l ) {\displaystyle b(i)=\min _{j\neq i}{\frac {1}{|C_{j}|}}\sum _{l\in C_{j}}d(i,l)} The cluster with the smallest mean dissimilarity is said to be the "neighboring cluster" of i {\displaystyle i} because it is the next best fit cluster for point i {\displaystyle i} . We now define a silhouette (value) of one data point i {\displaystyle i} s ( i ) = b ( i ) − a ( i ) max { a ( i ) , b ( i ) } {\displaystyle s(i)={\frac {b(i)-a(i)}{\max\{a(i),b(i)\}}}} , if | C i | > 1 {\displaystyle |C_{i}|>1} and s ( i ) = 0 {\displaystyle s(i)=0} , if | C i | = 1 {\displaystyle |C_{i}|=1} , which can also be written as s ( i ) = { 1 − a ( i ) b ( i ) , if a ( i ) < b ( i ) 0 , if a ( i ) = b ( i ) b ( i ) a ( i ) − 1 , if a ( i ) > b ( i ) {\displaystyle s(i)={\begin{cases}1-{\frac {a(i)}{b(i)}},&{\mbox{ if }}a(i)b(i)\\\end{cases}}} From the above definition, s ( i ) {\displaystyle s(i)} is bounded to the interval [ − 1 , 1 ] {\displaystyle [-1,1]} , i.e. − 1 ≤ s ( i ) ≤ 1. {\displaystyle -1\leq s(i)\leq 1.} Note that a ( i ) {\displaystyle a(i)} is not clearly defined for clusters with size = 1, in which case we set s ( i ) = 0 {\displaystyle s(i)=0} . This choice is arbitrary, but neutral in the sense that it is at the midpoint of the bounds, -1 and 1. For s ( i ) {\displaystyle s(i)} to be close to 1 we require a ( i ) ≪ b ( i ) {\displaystyle a(i)\ll b(i)} . As a ( i ) {\displaystyle a(i)} is a measure of how dissimilar i {\displaystyle i} is to its own cluster, a small value means it is well matched. Furthermore, a large b ( i ) {\displaystyle b(i)} implies that i {\displaystyle i} is badly matched to its neighbouring cluster. Thus an s ( i ) {\displaystyle s(i)} close to 1 means that the data is appropriately clustered. If s ( i ) {\displaystyle s(i)} is close to -1, then by the same logic we see that i {\displaystyle i} would be more appropriate if it was clustered in its neighbouring cluster. An s ( i ) {\displaystyle s(i)} near zero means that the datum is on the border of two natural clusters. The mean s ( i ) {\displaystyle s(i)} over all points of a cluster is a measure of how tightly grouped all the points in the cluster are. Thus the mean s ( i ) {\displaystyle s(i)} over all data of the entire dataset is a measure of how appropriately the data have been clustered. If there are too many or too few clusters, as may occur when a poor choice of k {\displaystyle k} is used in the clustering algorithm (e.g., k-means), some of the clusters will typically display much narrower silhouettes than the rest. Thus silhouette plots and means may be used to determine the natural number of clusters within a dataset. One can also increase the likelihood of the silhouette being maximized at the correct number of clusters by re-scaling the data using feature weights that are cluster specific. Kaufman et al. introduced the term silhouette coefficient for the maximum value of the mean s ( i ) {\displaystyle s(i)} over all data of the entire dataset, i.e., S C = max k s ~ ( k ) , {\displaystyle SC=\max _{k}{\tilde {s}}\left(k\right),} where s ~ ( k ) {\displaystyle {\tilde {s}}\left(k\right)} represents the mean s ( i ) {\displaystyle s(i)} over all data of the entire dataset for a specific number of clusters k {\displaystyle k} . The silhouette coefficient describes the best possible clustering possible for a given number of clusters, as measured by the highest average silhouette score for all points in the dataset. == Simplified and medoid silhouette == Computing the silhouette coefficient needs all O ( N 2 ) {\displaystyle {\mathcal {O}}(N^{2})} pairwise distances, making this evaluation much more costly than clustering with k-means. For a clustering with centers μ C I {\displaystyle \mu _{C_{I}}} for each cluster C I {\displaystyle C_{I}} , we can use the following simplified Silhouette for each point i ∈ C I {\displaystyle i\in C_{I}} instead, which can be computed using only O ( N k ) {\displaystyle {\mathcal {O}}(Nk)} distances: a ′ ( i ) = d ( i , μ C I ) {\displaystyle a'(i)=d(i,\mu _{C_{I}})} and b ′ ( i ) = min C J ≠ C I d ( i , μ C J ) {\displaystyle b'(i)=\min _{C_{J}\neq C_{I}}d(i,\mu _{C_{J}})} , which has the additional benefit that a ′ ( i ) {\displaystyle a'(i)} is always defined, then define accordingly the simplified silhouette and simplified silhouette coefficient s ′ ( i ) = b ′ ( i ) − a ′ ( i ) max { a ′ ( i ) , b ′ ( i ) } {\displaystyle s'(i)={\frac {b'(i)-a'(i)}{\max\{a'(i),b'(i)\}}}} S C ′ = max k 1 N ∑ i s ′ ( i ) {\displaystyle SC'=\max _{k}{\frac {1}{N}}\sum _{i}s'\left(i\right)} . If the cluster centers are medoids (as in k-medoids clustering) instead of arithmetic means (as in k-means clustering), this is also called the medoid-based silhouette or medoid silhouette. If every object is assigned to the nearest medoid (as in k-medoids clustering), we know that a ′ ( i ) ≤ b ′ ( i ) {\displaystyle a'(i)\leq b'(i)} , and hence s ′ ( i ) = b ′ ( i ) − a ′ ( i ) b ′ ( i ) = 1 − a ′ ( i ) b ′ ( i ) {\displaystyle s'(i)={\frac {b'(i)-a'(i)}{b'(i)}}=1-{\frac {a'(i)}{b'(i)}}} . == Silhouette clustering == Instead of using the average silhouette to evaluate a clustering obtained from, e.g., k-medoids or k-means, we can try to directly find a solution that maximizes the Silhouette. We do not have a closed form solution to maximize this, but it will usually be best to assign points to the nearest cluster as done by these methods. Van der Laan et al. proposed to adapt the standard algorithm for k-medoids, PAM, for this purpose and call this algorithm PAMSIL: Choose initial medoids by using PAM Compute the average silhouette of this initial solution For each pair of a medoid m and a non-medoid x swap m and x compute the average silhouette of the resulting solution remember the best swap un-swap m and x for the next iteration Perform the best swap and return to
Multispectral remote sensing is the collection and analysis of reflected, emitted, or back-scattered energy from an object or an area of interest in multiple bands of regions of the electromagnetic spectrum (Jensen, 2005). Subcategories of multispectral remote sensing include hyperspectral, in which hundreds of bands are collected and analyzed, and ultraspectral remote sensing where many hundreds of bands are used (Logicon, 1997). The main purpose of multispectral imaging is the potential to classify the image using multispectral classification. This is a much faster method of image analysis than is possible by human interpretation. == Multispectral remote sensing systems == Remote sensing systems gather data via instruments typically carried on satellites in orbit around the Earth. The remote sensing scanner detects the energy that radiates from the object or area of interest. This energy is recorded as an analog electrical signal and converted into a digital value though an A-to-D conversion. There are several multispectral remote sensing systems that can be categorized in the following way: === Multispectral imaging using discrete detectors and scanning mirrors === Landsat Multispectral Scanner (MSS) Landsat Thematic Mapper (TM) NOAA Geostationary Operational Environmental Satellite (GOES) NOAA Advanced Very High Resolution Radiometer (AVHRR) NASA and ORBIMAGE, Inc., Sea-viewing Wide field-of-view Sensor (SeaWiFS) Daedalus, Inc., Aircraft Multispectral Scanner (AMS) NASA Airborne Terrestrial Applications Sensor (ATLAS) === Multispectral imaging using linear arrays === SPOT 1, 2, and 3 High Resolution Visible (HRV) sensors and Spot 4 and 5 High Resolution Visible Infrared (HRVIR) and vegetation sensor Indian Remote Sensing System (IRS) Linear Imaging Self-scanning Sensor (LISS) Space Imaging, Inc. (IKONOS) Digital Globe, Inc. (QuickBird) ORBIMAGE, Inc. (OrbView-3) ImageSat International, Inc. (EROS A1) NASA Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) NASA Terra Multiangle Imaging Spectroradiometer (MISR) === Imaging spectrometry using linear and area arrays === NASA Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Compact Airborne Spectrographic Imager 3 (CASI 3) NASA Terra Moderate Resolution Imaging Spectrometer (MODIS) NASA Earth Observer (EO-1) Advanced Land Imager (ALI), Hyperion, and LEISA Atmospheric Corrector (LAC) === Satellite analog and digital photographic systems === Russian SPIN-2 TK-350, and KVR-1000 NASA Space Shuttle and International Space Station Imagery == Multispectral classification methods == A variety of methods can be used for the multispectral classification of images: Algorithms based on parametric and nonparametric statistics that use ratio-and interval-scaled data and nonmetric methods that can also incorporate nominal scale data (Duda et al., 2001), Supervised or unsupervised classification logic, Hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic output products, Per-pixel or object-oriented classification logic, and Hybrid approaches == Supervised classification == In this classification method, the identity and location of some of the land-cover types are obtained beforehand from a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience. The analyst would locate sites that have similar characteristics to the known land-cover types. These areas are known as training sites because the known characteristics of these sites are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image. Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. All pixels inside and outside of the training sites are evaluated and allocated to the class with the more similar characteristics. === Classification scheme === The first step in the supervised classification method is to identify the land-cover and land-use classes to be used. Land-cover refers to the type of material present on the site (e.g. water, crops, forest, wet land, asphalt, and concrete). Land-use refers to the modifications made by people to the land cover (e.g. agriculture, commerce, settlement). All classes should be selected and defined carefully to properly classify remotely sensed data into the correct land-use and/or land-cover information. To achieve this purpose, it is necessary to use a classification system that contains taxonomically correct definitions of classes. If a hard classification is desired, the following classes should be used: Mutually exclusive: there is not any taxonomic overlap of any classes (i.e., rain forest and evergreen forest are distinct classes). Exhaustive: all land-covers in the area have been included. Hierarchical: sub-level classes (e.g., single-family residential, multiple-family residential) are created, allowing that these classes can be included in a higher category (e.g., residential). Some examples of hard classification schemes are: American Planning Association Land-Based Classification System United States Geological Survey Land-use/Land-cover Classification System for Use with Remote Sensor Data U.S. Department of the Interior Fish and Wildlife Service U.S. National Vegetation and Classification System International Geosphere-Biosphere Program IGBP Land Cover Classification System === Training sites === Once the classification scheme is adopted, the image analyst may select training sites in the image that are representative of the land-cover or land-use of interest. If the environment where the data was collected is relatively homogeneous, the training data can be used. If different conditions are found in the site, it would not be possible to extend the remote sensing training data to the site. To solve this problem, a geographical stratification should be done during the preliminary stages of the project. All differences should be recorded (e.g. soil type, water turbidity, crop species, etc.). These differences should be recorded on the imagery and the selection training sites made based on the geographical stratification of this data. The final classification map would be a composite of the individual stratum classifications. After the data are organized in different training sites, a measurement vector is created. This vector would contain the brightness values for each pixel in each band in each training class. The mean, standard deviation, variance-covariance matrix, and correlation matrix are calculated from the measurement vectors. Once the statistics from each training site are determined, the most effective bands for each class should be selected. The objective of this discrimination is to eliminate the bands that can provide redundant information. Graphical and statistical methods can be used to achieve this objective. Some of the graphic methods are: Bar graph spectral plots Cospectral mean vector plots Feature space plots Cospectral parallelepiped or ellipse plots === Classification algorithm === The last step in supervised classification is selecting an appropriate algorithm. The choice of a specific algorithm depends on the input data and the desired output. Parametric algorithms are based on the fact that the data is normally distributed. If the data is not normally distributed, nonparametric algorithms should be used. The more common nonparametric algorithms are: One-dimensional density slicing Parallelipiped Minimum distance Nearest-neighbor Expert system analysis Convolutional neural network == Unsupervised classification == Unsupervised classification (also known as clustering) is a method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Unsupervised classification require less input information from the analyst compared to supervised classification because clustering does not require training data. This process consists in a series of numerical operations to search for the spectral properties of pixels. From this process, a map with m spectral classes is obtained. Using the map, the analyst tries to assign or transform the spectral classes into thematic information of interest (i.e. forest, agriculture, urban). This process may not be easy because some spectral clusters represent mixed classes of surface materials and may not be useful. The analyst has to understand the spectral characteristics of the terrain to be able to label clusters as a specific information class. There are hundreds of clustering algorithms. Two of the most conceptually simple algorithms are the chain method and the ISODATA method. === Chain method === The algorithm used in this method operates in a two-pass mode (it passes through the multispectral dataset two times. In the first pass, the program reads through the dataset and sequentially builds clusters (groups of p