-
What can you do with CWM Global Search?
CWM Global Search currently searches more than 70 free
chemistry databases and scientific journals on the internet
by structure, CAS Registry Number, name, synonym, and free
text.
-
What makes CWM Global Search unique?
CWM Global Search is the only search engine for scientists
that performs a true federated search over all major
chemistry databases and scientific journals.
-
What does ‘federated search’ mean?
Federated search is an information retrieval technology that
allows the simultaneous search of multiple searchable
resources. A user makes a single query request, which is
distributed to the search engines participating in the
federation. The federated search then aggregates the results
received from those search engines for presentation to the
user (see http://en.wikipedia.org/wiki/Federated_search).
Most searches, as in ChemSpider, PubChem, etc., use a
database of their own or go over an index, as Google does.
CWM Global Search instead uses the Web APIs or web services
exposed by the various data sources it includes and searches
all these data sources independently of each other. That
means a given search in a given data source in CWM Global
Search always uses the most current version of that data
source.
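
As an illustration only (this is not the CWM Global Search
implementation), the idea of distributing one query and
aggregating the answers can be sketched in Python. The three
source functions and their names are hypothetical stand-ins
for real web service calls; one of them fails on purpose to
show that the other results still come back.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the web services of three data sources.
def search_source_a(query):
    return [f"A:{query}"]


def search_source_b(query):
    return [f"B:{query}", f"B:{query} (synonym)"]


def search_source_down(query):
    raise ConnectionError("source did not answer")


SOURCES = {
    "source_a": search_source_a,
    "source_b": search_source_b,
    "source_down": search_source_down,
}


def federated_search(query, sources, timeout=10.0):
    """Send one query to every source at once and collect hits per source."""
    hits, errors = {}, {}
    if not sources:
        return hits, errors
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Every source is queried independently of the others.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                hits[name] = future.result(timeout=timeout)
            except Exception as exc:
                # One failing source must not spoil the other results.
                errors[name] = str(exc)
    return hits, errors


if __name__ == "__main__":
    hits, errors = federated_search("catechol", SOURCES)
    for name, records in hits.items():
        print(name, records)
    print("failed:", errors)
```

Because each source is asked at query time, the answer
reflects whatever that source contains at that moment, which
is the point made above.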
-
What is the advantage of a federated search compared with building databases?
You get the most current snapshot of a data source using
federated search. A data source aggregator such as
ChemSpider needs to update its database. The earliest that a
record newly added to a curated data source can show up in
the aggregator is when the owner of that database provides
an update and the aggregator has processed it. In a
federated search, all data sources are searched
independently of each other and the search always uses the
most current version of each data source. This means that as
soon as a new record is added to a given database, this
record can be found.
-
How big is the ‘problem’ of not finding links to curated databases in systems such as ChemSpider or PubChem?
PubChem currently contains around 30 million unique
compounds and ChemSpider currently contains around 25
million unique compounds. It would be a fair assumption that
practically every compound in ChemSpider is also contained
in the PubChem database and vice versa. The reality is that
currently only 11 million compounds in the ChemSpider
database have an associated link to the very same compound
in the PubChem database. PubChem contains 14.6 million links
to the ChemSpider database. That means that about 64% of the
PubChem records have no link from the ChemSpider database
and that about 40% of the ChemSpider records have no link
from the PubChem database.
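
The text does not say how these percentages were derived.
One reading, assumed here, is that each share is one minus
the number of links divided by the size of the database
being linked to. With the rounded counts quoted above, this
gives roughly 63% and 42%, in line with the approximate
figures in the text. The variable names are ours.

```python
# Rounded counts as quoted in the text above.
pubchem_compounds = 30_000_000
chemspider_compounds = 25_000_000
chemspider_links_to_pubchem = 11_000_000
pubchem_links_to_chemspider = 14_600_000


def unlinked_share(links, total):
    """Percentage of records in a database that nobody links to (assumed reading)."""
    return 100.0 * (1.0 - links / total)


# PubChem records that ChemSpider does not link to.
pubchem_unlinked = unlinked_share(chemspider_links_to_pubchem, pubchem_compounds)

# ChemSpider records that PubChem does not link to.
chemspider_unlinked = unlinked_share(pubchem_links_to_chemspider, chemspider_compounds)

if __name__ == "__main__":
    print(f"PubChem records without a ChemSpider link: {pubchem_unlinked:.1f}%")
    print(f"ChemSpider records without a PubChem link: {chemspider_unlinked:.1f}%")
```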
-
Is that a problem for only ‘special’ compound cases?
Not at all! Take as an example ‘Catechol’ (2-hydroxyphenol),
a very simple compound. Of course this compound is present
in both the PubChem database and the ChemSpider database.
But ChemSpider has no link to the corresponding record in
the PubChem database, and the PubChem database has no
reference to the catechol record in ChemSpider. This means
you don’t find safety data if you look only in ChemSpider.
-
Is it a general problem that curated databases update with a great time lag?
Open this link:
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&db=pcsubstance&term=all[filt]
This link shows the records most recently added to the
PubChem database. Take the first record and check whether
you can find a link to it in the ChemSpider database. It is
more or less guaranteed that you will not find that link.
Browse a couple of pages (a couple of hundred, if you like)
to look at other recently added, but not the most recently
added, records. The chance is still extremely high that you
will not find that link in the ChemSpider database.
-
How up-to-date is Google with respect to, e.g., PubChem records?
Use the above link, look up the most recently added
compound, and try to locate that record via a Google search
(e.g. using the name). As in the case of ChemSpider, it is
very unlikely that you will find that record. A record newly
added to a website shows up in Google when the Google index
gets updated with the most current snapshot of the website.
This update frequency depends mostly on how frequently a
site is used. It is easy to imagine that PubChem, a
scientific database, will never be accessed as frequently as
sites such as YouTube, so its update frequency will be
rather low.
-
Do scientists get the most recent information on the internet?
No, not if you use Google. If you want to find the most
recently added information in all databases on the Internet,
you have two options. Either you use the native user
interfaces of all those databases, which of course means you
have to know the URLs of all those databases and how to use
their native user interfaces, or you use CWM Global Search.
It provides a single user interface for all those databases.
You do not need to know their URLs, and you always search
the most current version of the databases. Try it out:
search for "biodegradable delivery" in Google and, for
instance, in PubMed. Google ranks hits by its own algorithm;
PubMed gives you the latest reference first. The latter
seems to be the sensible choice for science.
What does ‘crosslinking’ or ‘extended’ search mean in CWM Global Search?
For every chemistry-related query, i.e. a chemical
structure, a CAS Registry Number, or a chemical name, we try
to retrieve the structure, the CAS Registry Number, and
other chemical names (synonyms, trivial names, brand names,
IUPAC names). We automatically include all this information
in the final query. That means, if you type ‘Catechol’ in
the textbox of the CWM Global Search start page and click on
‘Search for safety information’, we try to find the CAS
Registry Number for catechol, the chemical structure of
catechol, and additional names. Once this step is completed,
we search safety databases on the internet by structure, CAS
Registry Number, and name. This ‘extension’ of the original
query is pretty useful, since not all databases containing
safety information support, for example, structure searches,
but most of them do support searches by CAS Registry Number
and/or chemical name.
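
A minimal sketch of this query extension, under stated
assumptions: the small lookup table stands in for whatever
resolvers are actually used, and the function names
`resolve`, `extend_query` and `queries_for` are ours. It
shows how one input term becomes a structure, a CAS Registry
Number and a list of names, and how each database then
receives only the query types it supports.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Identity:
    structure: str      # SMILES string
    cas: str            # CAS Registry Number
    names: List[str]    # lower-case names and synonyms


# Illustrative lookup table; a real system would ask name/CAS resolvers.
KNOWN_COMPOUNDS = [
    Identity("Oc1ccccc1O", "120-80-9",
             ["catechol", "pyrocatechol", "1,2-dihydroxybenzene"]),
]


def resolve(term: str) -> Optional[Identity]:
    """Find a compound by name, CAS Registry Number or structure string."""
    raw = term.strip()
    key = raw.lower()
    for identity in KNOWN_COMPOUNDS:
        if key == identity.cas or key in identity.names or raw == identity.structure:
            return identity
    return None


def extend_query(term: str) -> Dict[str, List[str]]:
    """Turn one input term into structure, CAS and name queries."""
    identity = resolve(term)
    if identity is None:
        # Nothing could be resolved: fall back to the plain text.
        return {"name": [term]}
    return {
        "structure": [identity.structure],
        "cas": [identity.cas],
        "name": list(identity.names),
    }


def queries_for(extended: Dict[str, List[str]], supported: Set[str]) -> Dict[str, List[str]]:
    """Keep only the query types a given database can handle."""
    return {kind: values for kind, values in extended.items() if kind in supported}


if __name__ == "__main__":
    extended = extend_query("Catechol")
    # A hypothetical safety database without structure search.
    print(queries_for(extended, {"cas", "name"}))
```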
-
What does ‘structure query’ mean in the context of CWM Global Search?
Since we automatically search by structure, name, and CAS
Registry Number whenever possible when you enter a name,
structure, or CAS Registry Number, a structure query means
more than only searching by chemical structure.
-
Can you submit more than one structure query at a time?
CWM Global Search supports SDFile input. That means you can
read in an SDFile containing several structures and search
for all of the contained structures on the internet. As we
do for single structure queries, we also look up CAS
Registry Numbers for every structure in the SDFile and
automatically extend the internet search using those CAS
Registry Numbers.
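
For readers who have not handled SDFiles: the records in an
SDFile are separated by lines containing `$$$$`, so getting
at the individual structures is straightforward. The sketch
below is our own illustration, not the code used by CWM
Global Search. The data field name `CAS` is an assumption,
since field names in SDFiles are chosen by whoever wrote the
file.

```python
def split_sdfile(text):
    """Split SDFile text into one string per record, using the '$$$$' lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(part.strip() for part in current):
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Accept a last record that is not followed by a delimiter.
    if any(part.strip() for part in current):
        records.append("\n".join(current))
    return records


def record_title(record):
    """The first line of a record is the molecule name (may be empty)."""
    lines = record.splitlines()
    return lines[0].strip() if lines else ""


def data_field(record, field_name):
    """Return the first value line of a '> <field_name>' data item, if present."""
    lines = record.splitlines()
    for index, line in enumerate(lines[:-1]):
        if line.startswith(">") and f"<{field_name}>" in line:
            return lines[index + 1].strip()
    return None


# Two tiny records for demonstration.
SAMPLE = "\n".join([
    "methane",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <CAS>",
    "74-82-8",
    "",
    "$$$$",
    "water",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <CAS>",
    "7732-18-5",
    "",
    "$$$$",
])

if __name__ == "__main__":
    for record in split_sdfile(SAMPLE):
        print(record_title(record), data_field(record, "CAS"))
```

Each record obtained this way can then be treated like a
single structure query.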
-
Can you submit more than one query at a time?
Actually, we always run several queries at the same time
automatically. You can also add as many text queries as you
want in the ‘Advanced Search’, for instance to ask the same
query in several languages.
-
Do you support formats other than single structure and SDFile?
Yes, CWM Global Search is most likely the only search engine
that allows the use of reaction files. You can draw or paste
(by dragging) a reaction in the structure box. We split the
reaction into its various members (reactants, reagents, and
products) and search for those structures on the internet.
That means you can search for commercial suppliers and
safety information for all reactants in one go. For
structures from reactions, too, we extend the search with
CAS Registry Numbers and names if those can be found. We
even support RDFile input, i.e. you can search more than one
reaction with a single click.
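
To illustrate what splitting a reaction into its members can
look like, here is a rough sketch for a plain V2000 reaction
file. It assumes the usual layout: a `$RXN` line, three
further header lines, a counts line giving the number of
reactants and products, and one `$MOL` line before each
molecule. Reagents are not covered by this sketch, because
the plain counts line only distinguishes reactants and
products; how CWM Global Search itself handles them is not
described here.

```python
def split_rxn(rxn_text):
    """Split a V2000 reaction file into reactant and product molfiles."""
    lines = rxn_text.splitlines()
    if len(lines) < 5 or lines[0].strip() != "$RXN":
        raise ValueError("not a V2000 reaction file")
    counts = lines[4]
    n_reactants = int(counts[0:3])   # first three characters
    n_products = int(counts[3:6])    # next three characters

    molblocks, current = [], None
    for line in lines[5:]:
        if line.strip() == "$MOL":
            if current is not None:
                molblocks.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        molblocks.append("\n".join(current))

    if len(molblocks) != n_reactants + n_products:
        raise ValueError("counts line does not match the number of molecules")
    return {
        "reactants": molblocks[:n_reactants],
        "products": molblocks[n_reactants:],
    }


def _two_carbon_mol(name, bond_order):
    # Helper that writes a tiny two-carbon molfile for the demonstration.
    atom = "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0"
    return [
        name,
        "  sketch",
        "",
        "  2  1  0  0  0  0  0  0  0  0999 V2000",
        atom,
        atom,
        f"  1  2  {bond_order}  0",
        "M  END",
    ]


# Ethene to ethane, one reactant and one product.
SAMPLE_RXN = "\n".join(
    ["$RXN", "", "  sketch", "", "  1  1", "$MOL"]
    + _two_carbon_mol("ethene", 2)
    + ["$MOL"]
    + _two_carbon_mol("ethane", 1)
)

if __name__ == "__main__":
    members = split_rxn(SAMPLE_RXN)
    print(len(members["reactants"]), "reactant(s),", len(members["products"]), "product(s)")
```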
-
Can you search for reaction methodologies?
Not really. We do not search for reactions, only for the
structures in a reaction.
-
Can you search for compounds and their biological effects?
You can start a structure query (by now you know that a
structure query means searching by structure, CAS Registry
Number, and name) searching over ‘Drug Information’. If you
start with a text string, e.g. ‘rheumatoid arthritis’, and
search over ‘Chemical Databases’, you will quickly get small
molecules with the desired biological activity.
-
Why have we integrated PASS?
There is no system that shows all biological effects of
compounds. PASS predicts over 4000 effects. We have
integrated this prediction tool for single compounds. If you
need to process large SDFiles, you can purchase the program
PASS from AKos GmbH.
-
Can you predict other properties?
You need to start with a chemical structure, or the search
needs to find a chemical structure. In these cases you will
get an answer that leads to 'chemicalize', the property
prediction tool from ChemAxon. It gives logP, rotatable
bonds, rule of five, and much, much more.
-
Can you search for structure-modified proteins?
There are many structure-modified proteins on the Internet,
but drawing those structures is cumbersome. For this we have
integrated Proteax, see below.
-
How do you search with proteins and peptides?
We have integrated Proteax. Proteax is a simple-to-use
peptide/protein editor that helps you to generate chemical
structures from one- or three-letter codes. With a click you
generate the full structure. You can edit this further in
the editor. The generated structure can be searched in
databases such as PubChem, ChEBI, DrugBank, etc., including
the possibility to perform substructure searches and
structure similarity searches.
-
Text vs structure search
You can search by any name or text. You should be aware that
the different search engines have their own way of
interpreting text strings. Sometimes they ignore numbers and
hyphens, create separate strings, and give you unspecific
results. A good idea is to use the menu button "Find
structure for chemical name" and transform your name into a
structure. Afterwards you have to clear the text box. A
search with the structure is faster and more accurate.
-
Can you search over REAXYS?
For organizations that have a
license for REAXYS, we have integrated the possibility to
execute a REAXYS search from the CWM Global Search user
interface.
-
Does CWM Global Search work on an iPad or smart phone?
No, not yet. We still have to
rely on Java for the structure editor and we use Silverlight.
-
Why do we use Silverlight for CWM Global Search?
We based our user interface on RIA (Rich Internet
Application) technology using Microsoft Silverlight. This
allows you to resize, pin/unpin, and expand/collapse many
aspects of the user interface. It supports everything from
large screens and desktop PCs to notebooks and netbooks.
-
Does CWM Global Search support both PC and Macintosh?
Yes, we support all major browsers (Internet Explorer,
Firefox, Chrome, Safari) on both Windows and Macintosh
(MacOS).
-
Can one compare CWM Global Search with SciFinder?
CWM Global Search should
not be compared to SciFinder, and if you look at the
price this is pretty obvious. You will find many answers
either with CWM Global Search or SciFinder, and an
experienced user will know when to start his search in
SciFinder and when in CWM Global Search. If you want to
make a comprehensive search, you probably cannot ignore
the Internet any longer. If you don't find anything
suitable in SciFinder it is probably a good idea to
start CWM Global Search.