For those of you evaluating the SNOMED-CT Core Subset, be aware that the NLM has made some non-trivial changes to the format and content of the subset file in the latest (second) release, dated 200908 (July). If you have developed a load program, as we have, that uses the subset file to identify the concepts included in the subset, you will likely need to modify it.
Here is a summary of the changes:
- Nine terms were added and eleven terms were retired from the core subset.
|SNOMED CT Concept ID||Fully Specified Name||Concept Status|
|208892001||Closed traumatic dislocation of hip (disorder)||Current|
|165468009||Erythrocyte sedimentation rate (ESR) raised (finding)||Current|
|197321007||Steatosis of liver (disorder)||Current|
|40733004||Infectious disease (disorder)||Current|
|165346000||Laboratory test result abnormal (situation)||Current|
|442234001||Serum cholesterol borderline high (finding)||Current|
|442438000||Influenza due to Influenza A virus (disorder)||Current|
|442551007||Dental caries extending into dentine (disorder)||Current|
|4557003||Preinfarction syndrome (disorder)||Current|
|309158009||Laboratory finding abnormal (navigational concept)||Current|
|371330000||Fatty liver (disorder)||Duplicate|
|131016008||Increased thyroid stimulating hormone level (finding)||Duplicate|
|166829003||Serum cholesterol borderline (finding)||Ambiguous|
|191415002||Communicable disease (navigational concept)||Current|
|78431007||Influenza due to Influenza virus, type A, human (disorder)||Ambiguous|
|416103000||Elevated erythrocyte sedimentation rate (finding)||Duplicate|
|50047001||Compound dental caries (disorder)||Ambiguous|
|63079007||Closed traumatic dislocation of hip joint (disorder)||Duplicate|
|64333001||Preinfarction angina (disorder)||Duplicate|
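If you want to produce a list like the one above for your own records, a set difference between two releases is enough. The sketch below is a minimal illustration, assuming you have already extracted the non-retired concept IDs from each release file. The function name `diff_releases` is hypothetical.

```python
from typing import Iterable, List, Tuple


def diff_releases(previous_ids: Iterable[str],
                  current_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, retired) concept IDs between two releases of the subset.

    Assumption: both inputs hold only the NON-retired concept IDs of each
    release, since retired rows can remain in the file with a retired flag.
    """
    previous, current = set(previous_ids), set(current_ids)
    added = sorted(current - previous)    # in the new release only
    retired = sorted(previous - current)  # dropped since the old release
    return added, retired


# Illustrative input: "Fatty liver" out, "Steatosis of liver" in.
added, retired = diff_releases(["371330000"], ["197321007"])
```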
File Structure Changes:
|June Subset||July Subset||Change|
|CONCEPT_STATUS||SNOMED_CONCEPT_STATUS||Name change; now uses the description instead of the code!|
|-||FIRST_IN_SUBSET||New Field (YYYYMM)|
|-||LAST_IN_SUBSET||New Field (YYYYMM)|
|-||REPLACED_BY_SNOMED_CID||New Field (SNOMED-CT Concept ID)|
|New Field||What is it?|
|FIRST_IN_SUBSET||This is the issue year and month when the concept first appeared in the subset.|
|LAST_IN_SUBSET||This is the issue year and month when the concept last appeared in the subset as a non-retired concept.|
|REPLACED_BY_SNOMED_CID||Concept ID of the concept replacing a retired concept.|
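To show how these fields might be used, here is a rough sketch that answers two common questions: "was this concept active in release X?" and "what should a retired concept be mapped to now?" It assumes a blank LAST_IN_SUBSET means the concept is still active and that the replacement column holds a single concept ID. The `CoreConcept` type, the helper names, and the release codes in the example are hypothetical and do not come from the actual files.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CoreConcept:
    """One subset row reduced to its lifecycle fields (hypothetical type)."""
    cid: str
    first_in_subset: str           # FIRST_IN_SUBSET, YYYYMM
    last_in_subset: Optional[str]  # LAST_IN_SUBSET, YYYYMM; assumed None while active
    replaced_by: Optional[str]     # replacement concept ID; None if not replaced


def was_in_subset(concept: CoreConcept, release: str) -> bool:
    """True if the concept was a non-retired member in the given YYYYMM release."""
    # Fixed-width YYYYMM strings compare correctly as plain strings.
    if release < concept.first_in_subset:
        return False
    return concept.last_in_subset is None or release <= concept.last_in_subset


def resolve_current(cid: str, concepts: Dict[str, CoreConcept]) -> str:
    """Follow replacement links from a (possibly retired) concept to the latest one."""
    seen = set()
    while cid in concepts and concepts[cid].replaced_by:
        if cid in seen:
            raise ValueError(f"cyclic replacement chain at {cid}")
        seen.add(cid)
        cid = concepts[cid].replaced_by
    return cid


# Illustrative values only: a retired "Fatty liver" row pointing at "Steatosis of liver".
concepts = {
    "371330000": CoreConcept("371330000", "200907", "200907", "197321007"),
    "197321007": CoreConcept("197321007", "200908", None, None),
}
```

Building this dictionary once per load and passing stored problem-list IDs through `resolve_current` is one way to redirect retired entries to their replacements.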
If you developed a program that loads the core subset file, this update likely broke it.
If you are using a text ODBC/OLEDB driver to load the file, the column name change broke it.
If you access the fields sequentially, splitting each record on the pipe delimiter, the insertion of the FIRST_IN_SUBSET field before the IS_RETIRED field will break your load program.
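One way to make a loader tolerant of both kinds of change is to read the header line and look up each column by name, with a small alias table for renamed columns. The sketch below assumes a pipe-delimited file whose first line is the header. Only CONCEPT_STATUS, SNOMED_CONCEPT_STATUS, FIRST_IN_SUBSET and LAST_IN_SUBSET come from the release changes described here; the SNOMED_CID header name and the helper names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical alias table: each logical field lists the header names it has
# appeared under, newest first. SNOMED_CID is an assumed header name.
REQUIRED_COLUMNS = {
    "SNOMED_CID": ("SNOMED_CID",),
    "STATUS": ("SNOMED_CONCEPT_STATUS", "CONCEPT_STATUS"),
}
OPTIONAL_COLUMNS = {
    "FIRST_IN_SUBSET": ("FIRST_IN_SUBSET",),
    "LAST_IN_SUBSET": ("LAST_IN_SUBSET",),
}


def _position(index: Dict[str, int], candidates: Iterable[str]) -> Optional[int]:
    """Return the position of the first candidate name present in the header."""
    for name in candidates:
        if name in index:
            return index[name]
    return None


def load_core_subset(lines: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Parse a pipe-delimited subset file by header name instead of field order."""
    rows = iter(lines)
    header = next(rows).rstrip("\r\n").split("|")
    index = {name.strip(): i for i, name in enumerate(header)}

    positions: Dict[str, Optional[int]] = {}
    for key, candidates in REQUIRED_COLUMNS.items():
        pos = _position(index, candidates)
        if pos is None:
            # Fail loudly on a format change instead of loading misaligned data.
            raise ValueError(f"missing required column {key}; header was {header}")
        positions[key] = pos
    for key, candidates in OPTIONAL_COLUMNS.items():
        positions[key] = _position(index, candidates)  # None if absent in this release

    records = []
    for line in rows:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        records.append({
            key: fields[pos] if pos is not None and pos < len(fields) else None
            for key, pos in positions.items()
        })
    return records
```

Because each column is located by name, inserting a field anywhere in the record or renaming a column listed in the alias table no longer shifts the values your program reads.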
If you created a function that relies on the coded values in the CONCEPT_STATUS field for your load logic, it is now broken by the switch to text values. (I don't understand this change at all. It seems to run contrary to the move away from free text. I would change it back...)
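If downstream logic still expects numeric codes, one workaround is to translate the status description back into a code at load time. This sketch assumes the earlier file used the SNOMED CT RF1 concept status values (0 = current, 2 = duplicate, 4 = ambiguous, and so on). Check that against your copy of the earlier release before relying on it. The spellings of the less common descriptions are also assumptions, and `status_code` is a hypothetical helper name.

```python
# Assumption: codes follow the SNOMED CT RF1 concept status values.
# Only "Current", "Duplicate" and "Ambiguous" are confirmed by the changes above;
# the other spellings are guesses.
RF1_STATUS_CODES = {
    "current": 0,
    "retired": 1,
    "duplicate": 2,
    "outdated": 3,
    "ambiguous": 4,
    "erroneous": 5,
    "limited": 6,
    "moved elsewhere": 10,
    "pending move": 11,
}


def status_code(value: str) -> int:
    """Accept either a coded status (older file) or a description (newer file)."""
    text = value.strip()
    if text.isdigit():
        return int(text)  # already coded, pass it through
    try:
        return RF1_STATUS_CODES[text.lower()]
    except KeyError:
        raise ValueError(f"unrecognized concept status: {value!r}") from None
```

Raising an error on unknown descriptions, rather than defaulting to a code, means the next unannounced wording change stops the load instead of silently corrupting it.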
Needless to say, this update was a painful one for early adopters. But if you have already built logic on the inaugural release of the core subset data, an early adopter is what you are, and that is not without risks.
Along with the painful changes that left our load program writhing on the ground, clutching its face and yelling "You broke my nose!", come some useful new additions.
The FIRST_IN_SUBSET, LAST_IN_SUBSET and REPLACED_BY_SNOMED_CID fields support lifecycle management: they record when a term entered the subset, when it was last active, and which concept replaced it, making term availability much easier to manage.
Patience is a Virtue
If this update frustrated you, I would ask that you focus on the positive and consider that the Core subset is another in a growing line of great, "FREE" work products from our friends at the NLM.
It is also worth noting that as we in the HIT industry leverage SNOMED-CT, RxNorm and LOINC, the bar will continue to rise for update frequency and format stability. Based on my interactions with the NLM, I expect they are paying attention and will be responsive as we evolve and rely on these products more.
As someone who has worked at a commercial content provider, I would encourage the following practices for all data products:
1.) Do not change field/column names lightly if they are included in the file; developers will rely on those names when loading the information with a text driver.
2.) Avoid inserting fields into the middle of a record, as some load programs operate based on field order. If you append new fields to the end of the record, you are less likely to disrupt the load.
3.) Coded fields are better than text fields...always.
Constructive criticism aside, this is good stuff. If we at Clinical Architecture can help you take better advantage of it, give us a call!