


  • FASDIM

    Presentation

    FASDIM stands for Fast And Simple De-Identification Method. It is a method designed for the automated removal of PHI (Protected Health Information). Although it is based on pattern matching, its originality is that a pre-existing list of authorized words is not necessary: such a list can be constructed in the course of the method. However, if you intend to de-identify French letters, you will save some time, as a word list already comes with the source code. The software is open source and can be downloaded from this page.
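
    To illustrate the idea of an authorized-word list that is built along the way, here is a minimal Python sketch. It is not the FASDIM source code (which is written in PHP) and it is not claimed to reproduce the published method: the function names `candidate_words` and `scrub`, the frequency threshold and the `XXX` mask are all assumptions made for this example. It counts the words of the corpus, proposes the frequent ones as candidates for the authorized list, and masks every word that is not on the list.

```python
import re
from collections import Counter

# Runs of letters only (accented letters included, digits and "_" excluded).
WORD = re.compile(r"[^\W\d_]+")


def candidate_words(letters, min_count=2):
    """Return the words seen at least min_count times, most frequent first.

    Hypothetical helper: these are only candidates for the authorized
    list and are meant to be reviewed by a human before use.
    """
    counts = Counter(w.lower() for text in letters for w in WORD.findall(text))
    return [w for w, n in counts.most_common() if n >= min_count]


def scrub(text, authorized, mask="XXX"):
    """Replace every word that is not in the authorized set by the mask."""
    def keep_or_mask(match):
        word = match.group(0)
        return word if word.lower() in authorized else mask
    return WORD.sub(keep_or_mask, text)


letters = [
    "Madame Dupont a une pneumonie.",
    "Monsieur Martin a une pneumonie.",
]
candidates = candidate_words(letters)
# In this toy example the candidates are accepted without review.
authorized = set(candidates)
print(scrub(letters[0], authorized))  # prints "XXX XXX a une pneumonie."
```

    In this sketch a name that appears often enough would also become a candidate, which is why the candidates are meant to be reviewed by hand. Dates and numbers are not handled here and would need their own patterns.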

    Is FASDIM the method I need?

    FASDIM is probably the method you need if your situation looks like this:

    • you don't have any de-identification software, and no existing method is available for your language
    • you have 40 hours (all included) to anonymize 100,000 free-text discharge letters (in DOC or TXT format)
    • or you have a smaller collection of documents 

    FASDIM is probably not the best option if:

    • other methods are widely available (e.g. for the English language)
    • you already have a learning set of manually de-identified and annotated letters (then you should prefer machine learning approaches)
    • you want to automatically annotate the documents (e.g. to tag precisely the first name, the last name, the date, etc.)
    • you have more than 5,000,000 letters to de-identify, with no time to spend and with perfect accuracy required

    Does FASDIM obtain good results?

    The FASDIM method has been published in the International Journal of Medical Informatics (IJMI, IF=2.061). The detailed results are available in the scientific paper (free full-text). The main results are:

    • Accuracy: 
      • Recall: 98.1% of the PHI items are deleted (of the items that remain, 63.7% are places, 23% are healthcare professionals, and 0% are patient names)
      • Precision: 89.2% of the deleted terms are PHIs
      • Harmonic mean (F-measure): 93.4%
    • Safe over-scrubbing: although some words are erroneously deleted, this does not alter the medical meaning of the reports:
      • 99.02% of the medical terms are appropriately preserved, and more specifically:
      • 99.49% of the diagnoses 
      • 99.66% of the medical procedures
    • Fast and simple implementation: 
      • the implementation from scratch (without preexisting material) required 40 hours (including software development) to de-identify 27,000 letters with the best accuracy
      • however, if you have to de-identify French letters, the source code and a list of authorized words are already available!
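
    The harmonic mean listed under Accuracy can be checked from the recall and precision figures. The short Python sketch below does the arithmetic; the function name `f_measure` is only a label chosen for this example.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures reported above: precision 89.2%, recall 98.1%.
f1 = f_measure(precision=0.892, recall=0.981)
print(f"{f1:.1%}")  # prints 93.4%
```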

    How can I get FASDIM?

    Two steps:

    This distribution works in the following environment:

    • MS Windows
    • with a MySQL database
    • with PHP installed (a web server is not necessary, only the CLI mode is used)
    • input: Word *.doc documents or *.txt plain text

    However, if you are familiar with PHP code, it will be easy for you to adapt it to other environments. 

    How can I learn more?

    To learn more about FASDIM, you can:

    http://emmanuel.chazard.org - Copyright 2001-2024