
SNOMED-CT Core Subset – Significant Changes in July File

Written by Charlie Harp

October 12, 2009 at 9:52 AM

For those of you evaluating the SNOMED-CT Core Subset, be aware that the NLM has made some non-trivial changes to the format and content of the subset file in the latest (second) release, dated 200908 (July). If you have developed a load program, as we have, that uses the subset file to identify the concepts included in the subset, you will likely need to modify that program.

Here is a summary of the changes:

Term Changes:

  • Nine terms were added and eleven terms were retired from the core subset.

New Terms:

208892001 Closed traumatic dislocation of hip (disorder) Current
165468009 Erythrocyte sedimentation rate (ESR) raised (finding) Current
197321007 Steatosis of liver (disorder) Current
40733004 Infectious disease (disorder) Current
165346000 Laboratory test result abnormal (situation) Current
442234001 Serum cholesterol borderline high (finding) Current
442438000 Influenza due to Influenza A virus (disorder) Current
442551007 Dental caries extending into dentine (disorder) Current
4557003 Preinfarction syndrome (disorder) Current

Retired Terms:

41006004 Depression (finding) Ambiguous
309158009 Laboratory finding abnormal (navigational concept) Current
371330000 Fatty liver (disorder) Duplicate
131016008 Increased thyroid stimulating hormone level (finding) Duplicate
166829003 Serum cholesterol borderline (finding) Ambiguous
191415002 Communicable disease (navigational concept) Current
78431007 Influenza due to Influenza virus, type A, human (disorder) Ambiguous
416103000 Elevated erythrocyte sedimentation rate (finding) Duplicate
50047001 Compound dental caries (disorder) Ambiguous
63079007 Closed traumatic dislocation of hip joint (disorder) Duplicate
64333001 Preinfarction angina (disorder) Duplicate

File Structure Changes:

June Subset July Subset Change
Now uses Description instead of Code!!!
- REPLACED_BY New Field (SNOMED-CT Concept ID)

New Fields:

  • FIRST_IN_SUBSET: The issue year and month when the concept first appeared in the subset.
  • LAST_IN_SUBSET: The issue year and month when the concept last appeared in the subset as a non-retired concept.
  • REPLACED_BY: The Concept ID of the concept replacing a retired concept.


If you developed a program that loads the core subset file, this update likely broke it.

If you are using a text ODBC/OLEDB driver to load the file, the changes to the column names broke it.

If you are reading the file sequentially and splitting each record on the pipe delimiter, the insertion of the FIRST_IN_SUBSET field before the IS_RETIRED field will break your load program, because every field from that position onward shifts by one.
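
One way to soften this kind of breakage is to look fields up by header name rather than by position. Here is a minimal sketch of that idea in Python. The sample data is made up for illustration: SNOMED_CID and SNOMED_FSN are column names I am assuming, the dates are invented, and I am assuming IS_RETIRED holds the text True or False. Check all of these against the actual file. Note that this protects you from inserted fields, not from renamed ones.

```python
import csv
import io

# Hypothetical sample in a pipe-delimited layout with a header row.
# Column names SNOMED_CID and SNOMED_FSN, the dates, and the True/False
# values are assumptions for illustration, not taken from the real file.
SAMPLE = (
    "SNOMED_CID|SNOMED_FSN|CONCEPT_STATUS|FIRST_IN_SUBSET|IS_RETIRED\n"
    "197321007|Steatosis of liver (disorder)|Current|200907|False\n"
    "371330000|Fatty liver (disorder)|Duplicate|200906|True\n"
)


def load_rows(text):
    """Parse pipe-delimited text into dicts keyed by header name."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # treat quote characters in terms as plain text
    )
    return list(reader)


rows = load_rows(SAMPLE)

# Fields are addressed by name, so a newly inserted column does not
# shift the values this code reads.
active_ids = [r["SNOMED_CID"] for r in rows if r["IS_RETIRED"] != "True"]
print(active_ids)
```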

If you created a function that uses the coded values in the CONCEPT_STATUS field to support your load logic, that is now broken by the switch to the text value. (I don't understand this change at all.  It seems to run contrary to the move away from free text.  I would change it back...)
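
If your load logic depends on the coded values, a small normalizing function can accept either form. The sketch below is only that, a sketch: it covers just the three status descriptions that appear in the term lists above, and the numeric codes (0, 2, 4) are my assumption based on the standard SNOMED CT concept status values, so verify them against the release documentation before relying on them.

```python
# Assumed mapping from status text to numeric code. Only the three
# descriptions seen in this post are listed; the codes are an assumption
# to be verified against the SNOMED CT documentation.
STATUS_CODE_BY_TEXT = {
    "current": 0,
    "duplicate": 2,
    "ambiguous": 4,
}


def normalize_status(value):
    """Return a numeric status code for either a code or a description."""
    cleaned = value.strip()
    if cleaned.isdigit():
        # Already a coded value, as in the earlier file format.
        return int(cleaned)
    try:
        return STATUS_CODE_BY_TEXT[cleaned.lower()]
    except KeyError:
        # Fail loudly rather than silently loading an unknown status.
        raise ValueError(f"Unrecognized concept status: {value!r}")


print(normalize_status("Current"))
print(normalize_status("2"))
```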

Needless to say, this update was a painful one for the early adopter. But if you have already created logic based on the inaugural release of the core subset data, an early adopter is what you are, and that is not without risks.

Along with the painful changes that left our load program writhing on the ground, clutching its face and yelling "You broke my nose!" are some useful new additions.

FIRST_IN_SUBSET, LAST_IN_SUBSET and REPLACED_BY_SNOMED_CID are useful lifecycle management fields: they tell you when a term entered the subset, when it was last active, and which concept replaces it once it is retired.
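
To give a feel for how these fields might be used, here is a rough sketch that follows the replacement pointer from a retired concept to its active successor. It assumes rows are already loaded into a dictionary keyed by concept ID, that IS_RETIRED holds the text True or False, and that the replacement column is named REPLACED_BY_SNOMED_CID. The pairing of Fatty liver with Steatosis of liver in the sample is my guess for illustration, not something the file confirmed.

```python
def resolve_active(concept_id, rows_by_id):
    """Follow replacement pointers until an active concept is found.

    Returns the active concept ID, or None if the chain dead-ends,
    loops, or points at a concept missing from the subset.
    """
    seen = set()
    current = concept_id
    while True:
        row = rows_by_id.get(current)
        if row is None:
            return None
        if row["IS_RETIRED"] != "True":
            return current
        replacement = row.get("REPLACED_BY_SNOMED_CID", "").strip()
        if not replacement or replacement in seen:
            return None  # no successor recorded, or a cycle
        seen.add(current)
        current = replacement


# Hypothetical rows; field values are assumptions for illustration.
rows_by_id = {
    "371330000": {"IS_RETIRED": "True", "REPLACED_BY_SNOMED_CID": "197321007"},
    "197321007": {"IS_RETIRED": "False", "REPLACED_BY_SNOMED_CID": ""},
    "41006004": {"IS_RETIRED": "True", "REPLACED_BY_SNOMED_CID": ""},
}

print(resolve_active("371330000", rows_by_id))
```

A retired concept with no recorded replacement returns None, which leaves the decision about what to do with existing records to the caller.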

Patience is a Virtue

If this update frustrated you, I would ask that you focus on the positive and consider that the Core subset is another in a growing line of great, "FREE" work products from our friends at the NLM. 

It is also worth noting that as we in the HIT industry leverage SNOMED-CT, RxNorm and LOINC, the bar will continue to be raised in terms of update frequency and format stability. From the interactions I have had with the NLM, I expect that they are paying attention and will be responsive as we evolve and leverage these products more.

Free Advice

As someone who worked at a commercial content provider, I would encourage the following with respect to all data products.

1.) Do not change field/column names lightly if they are included in the file, as developers will rely on them when loading the file with a text driver.

2.) Avoid inserting fields into a record, as some load programs will operate based on field order. If you append new fields to the end of the record you will be less likely to disrupt the load.

3.) Coded fields are better than text fields...always.
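
To illustrate the second point, here is a small sketch of a loader that reads the status field by position. The three record layouts are hypothetical and only meant to show the effect: appending a field leaves the positional read intact, while inserting one silently returns the wrong value.

```python
def status_by_position(line, index=2):
    """Read the status field the way a position-based loader would."""
    return line.split("|")[index]


# Hypothetical record layouts, invented for illustration.
original = "197321007|Steatosis of liver (disorder)|Current"
appended = original + "|200907"  # new field added at the end
inserted = "197321007|200907|Steatosis of liver (disorder)|Current"  # new field in the middle

print(status_by_position(original))  # the status, as intended
print(status_by_position(appended))  # still the status
print(status_by_position(inserted))  # now the term text, not the status
```

The inserted case is the nasty one because nothing fails: the loader keeps running and quietly stores the wrong data.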

Regardless of the constructive criticism...this is good stuff.  If we at Clinical Architecture can help you better take advantage of it, give us a call!