ChemXSeer


About

Overview

ChemXSeer Name and Formula Search lets you search for chemical names and formulae in academic publications in chemistry. It also supports general searches such as keyword, date, and field searches; for these, see the query syntax documented on the Lucene website. Name and formula searches can also be embedded in a general search to find publications that contain particular keywords, names, and formulae.

This page describes the query parser syntax of ChemXSeer Name and Formula Search 1.0 and explains how to embed name and formula searches in a general search.

After the initial search, a list of name/formula candidates is shown for each name/formula search in the query string. A list of relevant documents is also provided. Users can select some documents directly, or can select one particular name/formula for each name/formula search to narrow down the relevant documents.

 

 

Ranges of Elements

ChemXSeer supports range searches: you define a range for each desired chemical element or sub-structure in a formula. A range can run anywhere from one to infinity, and several ranges can be combined for the same element.

Example:

"C2" specifies two C in any formula; "C1,3" specifies one OR three C; "C1-3" specifies one OR two OR three C; "C3-*" specifies three OR more C; "C1,3, 5-7" specifies one OR three OR five OR six OR seven C; "C1,3, 5-*" specifies one OR three OR five OR more C.

"C2,3H6,8-*" represents formulae with two OR three C, and six OR eight OR more H.

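The range notation above can be sketched as a small parser. This is only an illustration of the syntax described on this page, not ChemXSeer's own parser; the function names parse_ranges, in_ranges, and parse_query are hypothetical, and treating an element with no number as "exactly one" is an assumption.

```python
import re

# One element symbol followed by an optional range list such as "1,3, 5-*".
ELEMENT_RANGE = re.compile(
    r"([A-Z][a-z]?)((?:\d+(?:-(?:\d+|\*))?(?:,\s*\d+(?:-(?:\d+|\*))?)*)?)"
)

def parse_ranges(spec):
    """Turn "1,3, 5-*" into [(1, 1), (3, 3), (5, None)]; None means no upper bound."""
    if not spec.strip():
        return [(1, 1)]  # assumption: no number means exactly one atom
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            ranges.append((int(low), None if high == "*" else int(high)))
        else:
            ranges.append((int(part), int(part)))
    return ranges

def in_ranges(count, ranges):
    """True if count falls inside at least one (low, high) range."""
    return any(low <= count and (high is None or count <= high)
               for low, high in ranges)

def parse_query(query):
    """Map each element in a query such as "C2,3H6,8-*" to its ranges."""
    return {el: parse_ranges(spec) for el, spec in ELEMENT_RANGE.findall(query)}

print(parse_query("C2,3H6,8-*"))
```

Under these assumptions, "H6,8-*" accepts six H or any count from eight upward, and rejects seven.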
 

 

Mass Number and Charges

ChemXSeer can search for isotopes when the mass number of an atom is included in the query, as in "^{2}H". If no mass number is given in the query string, all isotopes match: for example, the input "H" matches species containing normal hydrogen, deuterium, or tritium atoms. If the input is "^{2}H", only deuterium is considered. If the input is "^{1}H", deuterium and tritium are excluded from the search. All four types of formula search support this function.

Charges are considered only in Exact Formula Searches. A charge is written as, for example, "^{2+}" or "^{-}", and can appear only at the end of the query string, as in "Cu^{2+}". Charges are not allowed in the query strings of the other types of search.
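
As a rough illustration of the notation above, the following sketch splits a flat formula string into atoms with optional mass numbers and a trailing charge. The names parse_formula and mass_matches are hypothetical, parentheses and ranges are not handled, and the assumption that an indexed atom carries an explicit mass number is mine rather than something this page states.

```python
import re

# Trailing charge such as "^{2+}" or "^{-}"; only valid at the end of the string.
CHARGE = re.compile(r"\^\{(\d*)([+-])\}$")
# Optional mass number "^{2}", element symbol, optional atom count.
ATOM = re.compile(r"(?:\^\{(\d+)\})?([A-Z][a-z]?)(\d*)")

def parse_formula(text):
    """Return ([(mass_or_None, element, count), ...], charge_or_None)."""
    charge = None
    match = CHARGE.search(text)
    if match:
        magnitude = int(match.group(1) or 1)
        charge = magnitude if match.group(2) == "+" else -magnitude
        text = text[:match.start()]
    atoms = [(int(mass) if mass else None, el, int(n) if n else 1)
             for mass, el, n in ATOM.findall(text)]
    return atoms, charge

def mass_matches(query_mass, atom_mass):
    """A query atom without a mass number matches every isotope."""
    return query_mass is None or query_mass == atom_mass

print(parse_formula("Cu^{2+}"))
print(parse_formula("^{2}H2O"))
```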

 

 

Types of Formula Searches

ChemXSeer supports five basic types of formula search, plus conjunctive searches that combine them. All of them are introduced by the identifier "formula:".

Exact Formula Searches

Exact Formula Searches find formulae with the same sequence of atoms and atom numbers, or of sub-structures and sub-structure numbers. An Exact Formula Search is specified by adding a "=" before the query string. Parentheses "(" and ")" are allowed in the query string, and they match both "(", ")" and "[", "]" in formulae. If no charge is given in the query string, charges are ignored when matching. Ranges are allowed, but unbounded ranges (using "*") are not.

Example:

formula:=CH4

searches for formulae such as CH4, ^{14}CH4, or CH4^{+}, if they exist.

formula:=CH3(CH2)2OH

can find CH3[CH2]2OH, but not CH3CH2CH2OH.

formula:=CH3(CH2)4-6O1,2H

searches for CH3(CH2)4OH, or CH3(CH2)5OH, or CH3(CH2)6OH, or CH3(CH2)4O2H, or CH3(CH2)5O2H, or CH3(CH2)6O2H. Note that "formula:=CH3(CH2)4-*O1,2H" is not allowed.
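
One way to picture how a bounded range in an exact search maps to concrete formulae is to enumerate every combination. The sketch below does that for flat queries with one level of parentheses; the function names are hypothetical and this is not presented as how ChemXSeer evaluates the query internally.

```python
import itertools
import re

# A unit is an element or a parenthesized group, followed by an optional
# bounded range list such as "4-6" or "1,2".
TOKEN = re.compile(r"(\([^()]*\)|[A-Z][a-z]?)(\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*)?")

def counts(spec):
    """Expand "4-6" to [4, 5, 6] and "1,2" to [1, 2]; no number means one."""
    if not spec:
        return [1]
    values = []
    for part in spec.split(","):
        if "-" in part:
            low, high = map(int, part.split("-"))
            values.extend(range(low, high + 1))
        else:
            values.append(int(part))
    return values

def expand(query):
    """List every concrete formula covered by an exact-search query."""
    if "*" in query:
        raise ValueError("unbounded ranges are not allowed in exact searches")
    units = [(unit, counts(spec)) for unit, spec in TOKEN.findall(query.lstrip("="))]
    formulas = []
    for combo in itertools.product(*(options for _, options in units)):
        formulas.append("".join(unit + (str(n) if n != 1 else "")
                                for (unit, _), n in zip(units, combo)))
    return formulas

for formula in expand("=CH3(CH2)4-6O1,2H"):
    print(formula)
```

For the query in the example above, this yields the same six formulae that the text lists.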

 

 

Full Frequency Formula Searches

Full frequency formula searches find formulae with the specified numbers of atoms, ignoring sequential order and structure. Chemical elements that are not specified in the query string must not appear in the matching formulae. Charges are not allowed in the query string; mass numbers are allowed.

Example:

formula:CH4

searches for formulae that have one C and four H and no other chemical elements.

formula:H4-*C1,5-7O2

searches for formulae that have one OR five OR six OR seven C, four OR more H, and two O, and no other chemical elements.

 

 

Partial Frequency Formula Searches

Partial frequency formula searches take the same query strings as full frequency formula searches, except that the query string starts with a "*". The difference is that chemical elements not specified in the query string are allowed to appear in the matching formulae.

Example:

formula:*CH4

searches for formulae that have one C and four H, possibly together with other chemical elements.

formula:*H4-*C1,5-7O2

searches for formulae that have one OR five OR six OR seven C, four OR more H, and two O, possibly together with other chemical elements.
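
The difference between full and partial frequency searches can be sketched as a single matching function. This is a minimal illustration under stated assumptions: candidate formulae are flat strings without parentheses, mass numbers are ignored, and the names parse_query, element_counts, and matches are hypothetical.

```python
import re
from collections import Counter

def parse_query(query):
    """Return (partial, {element: [(low, high), ...]}); high=None means unbounded."""
    partial = query.startswith("*")
    body = query[1:] if partial else query
    constraints = {}
    for element, spec in re.findall(r"([A-Z][a-z]?)([\d,\-*]*)", body):
        ranges = []
        for part in filter(None, spec.split(",")):
            low, sep, high = part.partition("-")
            high = high if sep else low
            ranges.append((int(low), None if high == "*" else int(high)))
        constraints[element] = ranges or [(1, 1)]  # no number means exactly one
    return partial, constraints

def element_counts(formula):
    """Count atoms per element in a flat formula such as "CH3OH"."""
    totals = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        totals[element] += int(n) if n else 1
    return totals

def matches(query, formula):
    partial, constraints = parse_query(query)
    totals = element_counts(formula)
    for element, ranges in constraints.items():
        n = totals.get(element, 0)
        if not any(low <= n and (high is None or n <= high) for low, high in ranges):
            return False
    # A full search rejects elements missing from the query; a partial one allows them.
    return partial or set(totals) <= set(constraints)

print(matches("CH4", "CH3OH"), matches("*CH4", "CH3OH"))
```

Here CH3OH fails the full search "CH4" because of the extra O, but passes the partial search "*CH4".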

 

 

Similarity Formula Searches

Similarity formula searches find formulae that share a similar part with the input formula. The query string starts with a "~". A heuristic approach based on sub-structures is used to measure the similarity between a pair of formulae.

Example:

formula:~CH3(CH2)2COOH

searches for formulae that are similar to CH3(CH2)2COOH.

 

 

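Substructure Formula Searches

Substructure searches find formulae in which a given sub-structure may appear at least once. As the example below shows, the query string starts with a "-". The frequency of the sub-structure is currently not considered. Charges are not allowed in the query string, and a query consisting of a single chemical element, such as "C2", is not treated as a sub-structure. Three types of match are currently provided, with different ranking scores: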

Exact match (high score), reverse match (medium score), parsed match (low score).

Example:

formula:-COOH

Exact match: CH3COOH

Reverse match: HOOCNCHs

Parsed match: CH3CHO2, or CH3CO2H, CH3OCOH, etc.

 

 

Conjunctive Formula Searches

Conjunctive formula searches combine the basic formula searches to find more specific formulae. "[" and "]" are used to group them.

Example:

formula:[*CH4-6 -COOH]

searches for formulae that have one C and four OR five OR six H, possibly together with other elements, and that may have the sub-structure COOH.

 

 

 

Types of Name Searches

ChemXSeer supports two basic types of name search, plus conjunctive searches that combine them. All of them are introduced by the identifier "name:".

 

Similarity Name Searches

Similarity name searches find names that share a similar part with the input name. The query string starts with a "~". A heuristic approach based on substrings is used to measure the similarity between a pair of names.

Example:

name:"~Benzene, 1-methyl-4-(1-methylethyl)-"

searches for names that are similar to "Benzene, 1-methyl-4-(1-methylethyl)-".

 

 

Substring Name Searches

Substring name searches find names that contain the query substring.

Example:

name:methyl

searches for names that have at least one occurrence of the substring "methyl". If the substring contains white space, enclose it in quotation marks, as in name:"acetic acid".

 

 

Conjunctive Name Searches

As with conjunctive formula searches, conjunctive name searches combine the basic name searches to find more specific names. "[" and "]" are used to group them.

Example:

name:[ethyl methyl]

searches for names that contain both the substring "ethyl" and the substring "methyl". Note that the "ethyl" inside "methyl" does not count as an occurrence of "ethyl".
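
The note about "ethyl" and "methyl" can be read as follows: an occurrence of a shorter term does not count if it lies entirely inside an occurrence of a longer query term. The sketch below implements that reading. It is one possible interpretation rather than ChemXSeer's documented algorithm, and the function names are hypothetical.

```python
def occurrences(name, term):
    """Return (start, end) spans of every occurrence of term in name."""
    spans, start = [], name.find(term)
    while start != -1:
        spans.append((start, start + len(term)))
        start = name.find(term, start + 1)
    return spans

def conjunctive_match(name, terms):
    """True if every term occurs at least once outside any longer query term."""
    name = name.lower()
    spans = {term: occurrences(name, term.lower()) for term in terms}
    for term, own in spans.items():
        longer = [span for other, found in spans.items()
                  if len(other) > len(term) for span in found]
        free = [s for s in own
                if not any(l[0] <= s[0] and s[1] <= l[1] for l in longer)]
        if not free:
            return False
    return True

print(conjunctive_match("1-ethyl-2-methylbenzene", ["ethyl", "methyl"]))
print(conjunctive_match("methylbenzene", ["ethyl", "methyl"]))
```

Under this reading, "1-ethyl-2-methylbenzene" matches name:[ethyl methyl], while "methylbenzene" does not, because its only "ethyl" is part of "methyl".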

 

 

 

Embedding Name and Formula Searches in General Search

 

General Searches with Name and Formula Searches

In ChemXSeer General Search, a desired name/formula can be embedded in the input query string using the identifier "name:"/"formula:". Relevant names/formulae will be returned and embedded in the original query string as a sub-group with OR operators to search relevant publications.

Example:

oxygen (formula:[*CH4 ~COOH] OR name:methyl)

First, formulae are searched with formula:[*CH4 ~COOH] and names are searched with name:methyl. Then documents are searched that contain "oxygen" AND (any formula from the results of formula:[*CH4 ~COOH] OR any name from the results of name:methyl). Documents are ranked using the scores from all of these searches.
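
The rewriting step described above, where each embedded name or formula search is replaced by an OR group of its results, might look roughly like this. Everything here is an assumption made for illustration: the function names, the quoting of candidates, and the placeholder results "F1", "F2", and "N1", which stand in for whatever the name and formula searches actually return.

```python
import re

# Matches formula:... or name:... where the value is a [...] group, a quoted
# string, or a bare token that may contain balanced parentheses.
EMBEDDED = re.compile(
    r'\b(formula|name):(\[[^\]]*\]|"[^"]*"|(?:[^\s()]|\([^()\s]*\))+)'
)

def expand_query(query, search):
    """Replace each embedded search with an OR group of its candidates."""
    def substitute(match):
        kind, value = match.group(1), match.group(2)
        candidates = search(kind, value)
        if not candidates:
            return match.group(0)  # leave the term untouched if nothing was found
        return "(" + " OR ".join('"%s"' % c for c in candidates) + ")"
    return EMBEDDED.sub(substitute, query)

def fake_search(kind, value):
    """Hypothetical stand-in for the name/formula search; results are placeholders."""
    table = {
        ("formula", "[*CH4 ~COOH]"): ["F1", "F2"],
        ("name", "methyl"): ["N1"],
    }
    return table.get((kind, value), [])

print(expand_query("oxygen (formula:[*CH4 ~COOH] OR name:methyl)", fake_search))
```

The resulting string keeps the original keyword and grouping, so it can then be handed to the general search as an ordinary query.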