Important Notice: This repository is no longer under active development. We have developed a new version of the program called ipacheckcorrections. The new program is part of the ipacheck package and is housed here. Thank you for your support and understanding.

readreplace

readreplace modifies the dataset currently in memory by making replacements that are specified in an external dataset, the replacements file.

The list of differences saved by the SSC program cfout is designed for later use by readreplace. After the addition of a new variable to the cfout differences file that holds the new (correct) values, the file can be used as the readreplace replacements file.

readreplace is available through SSC: type ssc install readreplace in Stata to install.
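A minimal sketch of installing the program and checking that Stata can find it, assuming an internet connection and write access to the ado directory:

```stata
* Install readreplace from SSC
ssc install readreplace

* Confirm where the installed file lives and open its help file
which readreplace
help readreplace
```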

Certification script

The certification script of readreplace is cscript/readreplace.do. If you are new to certification scripts, you may find this Stata Journal article helpful. See this guide for more on readreplace testing.

Stata help file

Converted automatically from SMCL:

log html readreplace.sthlp readreplace.md

The help file looks best when viewed in Stata as SMCL.

Title

    readreplace -- Make replacements that are specified in an external
        dataset


Syntax

        readreplace using filename, id(varlist) variable(varname)
          value(varname) [options]

    options               Description
    -------------------------------------------------------------------------
    Main
    * id(varlist)         variables for matching observations with the
                            replacements specified in the using dataset
    * variable(varname)   variable in the using dataset that indicates the
                            variables to replace
    * value(varname)      variable in the using dataset that stores the new
                            values

    Import
      insheet             use insheet to import filename; the default
      use                 use use to load filename
      excel               use import excel to import filename
      import(options)     options to specify to the import command
    -------------------------------------------------------------------------
    * id(), variable(), and value() are required.


Description

    readreplace modifies the dataset currently in memory by making
    replacements that are specified in an external dataset, the replacements
    file.

    The list of differences saved by the SSC program cfout is designed for
    later use by readreplace. After the addition of a new variable to the
    cfout differences file that holds the new (correct) values, the file can
    be used as the readreplace replacements file.


Remarks

    readreplace changes the contents of existing variables by making
    replacements that are specified in a separate dataset, the replacements
    file. The replacements file should be in long format, with one
    observation per replacement to make.  Replacements are described by
    a variable that contains the name of the variable to change, specified to
    option variable(), and a variable that stores the new value for the
    variable, specified to option value(). The replacements file should also
    hold variables shared by the dataset in memory that indicate the subset
    of the data for which each change is intended; these are specified to
    option id(), and are used to match observations in memory to their
    replacements in the replacements file.

    Below, an example replacements file is shown with three variables:
    uniqueid, to be specified to id(), Question, to be specified to
    variable(), and CorrectValue, to be specified to value().

    +--------------------------------------+
    | uniqueid     Question   CorrectValue |
    |--------------------------------------|
    |      105     district             13 |
    |      125          age              2 |
    |      138       gender              1 |
    |      199     district             34 |
    |        2   am_failure              3 |
    +--------------------------------------+
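    As a rough sketch, the replacements file in the table above could be
    built and saved as a Stata dataset like this. The file name
    corrections.dta is a hypothetical choice, not one that readreplace
    requires.

```stata
* Build the example replacements file, one observation per replacement
clear
input uniqueid str10 Question CorrectValue
105 "district"   13
125 "age"         2
138 "gender"      1
199 "district"   34
  2 "am_failure"  3
end

* Save under a hypothetical name and show the result
save corrections, replace
list, noobs
```

    A file saved this way would be passed to readreplace with option use,
    along with id(uniqueid) variable(Question) value(CorrectValue).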

    For each observation of the replacements file, readreplace essentially
    runs the following replace command:

    replace Question_value = CorrectValue_value if uniqueid == uniqueid_value

    That is, the effect of readreplace here is the same as these five replace
    commands:

    replace district   = 13 if uniqueid == 105
    replace age        = 2  if uniqueid == 125
    replace gender     = 1  if uniqueid == 138
    replace district   = 34 if uniqueid == 199
    replace am_failure = 3  if uniqueid == 2
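    The loop below is a simplified sketch of that idea, not the actual
    implementation of readreplace: it handles only numeric values and does
    none of the validation or storage type handling that readreplace
    performs. The data and the tempfile name master are made up for
    illustration.

```stata
* Hypothetical master data with two observations to correct
clear
input uniqueid district age
105 12 30
125  7 20
end
tempfile master
save `master'

* Hypothetical replacements, one observation per replacement
clear
input uniqueid str10 Question CorrectValue
105 "district" 13
125 "age"       2
end

* Copy each replacement into local macros
local n = _N
forvalues i = 1/`n' {
    local id`i'  = uniqueid[`i']
    local var`i' = Question[`i']
    local val`i' = CorrectValue[`i']
}

* Apply the replacements to the master data one at a time
use `master', clear
forvalues i = 1/`n' {
    replace `var`i'' = `val`i'' if uniqueid == `id`i''
}

* Check that both replacements were made
assert district == 13 if uniqueid == 105
assert age == 2 if uniqueid == 125
```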

    The variable specified to value() may be either numeric or string.

    The replacements file may be one of the following formats:

        o Comma-separated data. This is the default format, but you may
            specify option insheet; either way, readreplace will use insheet
            to import the replacements file. You can also specify any options
            for insheet to option import().
        o Stata dataset. Specify option use to readreplace, passing any
            options for use to import().
        o Excel file. Specify option excel to readreplace, passing any
            options for import excel to import().
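    The three formats can be sketched as follows, assuming readreplace is
    installed. All file names and data here are hypothetical. Note that
    import(case) is used for the comma-separated file so that the variable
    names keep their capitalization; without it, lowercase names would be
    specified instead, as in the first example in the Examples section.

```stata
* Hypothetical replacements, written in three formats
clear
input uniqueid str10 Question CorrectValue
105 "district" 13
125 "age"       2
end
export delimited using corrections.csv, replace
save corrections, replace
export excel using corrections.xlsx, firstrow(variables) replace

* Hypothetical master data
clear
input uniqueid district age
105 12 30
125  7 20
end
tempfile master
save `master'

* Comma-separated data: insheet is the default importer
readreplace using corrections.csv, id(uniqueid) ///
    variable(Question) value(CorrectValue) import(case)

* Stata dataset
use `master', clear
readreplace using corrections.dta, id(uniqueid) ///
    variable(Question) value(CorrectValue) use

* Excel file: firstrow tells import excel to read names from row 1
use `master', clear
readreplace using corrections.xlsx, id(uniqueid) ///
    variable(Question) value(CorrectValue) excel import(firstrow)
```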

    readreplace may be employed for a variety of purposes, but it was
    designed to be used as part of a data entry process in which data is
    entered two times for accuracy.  After the second entry, the two separate
    entry datasets need to be reconciled.  cfout can compare the first and
    second entries, saving the list of differences in a format that is useful
    for data entry teams.  Data entry operators can then add a new variable
    to the differences file for the correct value.  Once this variable has
    been entered, load either of the two entry datasets, then run readreplace
    with the new replacements file.

    The GitHub repository for readreplace is here.  Previous versions may be
    found there: see the tags.


Remarks for promoting storage types

    readreplace will change variables' storage types in much the same way as
    replace, promoting storage types according to these rules:

        1.  Storage types are only promoted; they are never compressed.
        2.  The storage type of float variables is never changed.
        3.  If a variable of integer type (byte, int, or long) is replaced
            with a noninteger value, its storage type is changed to float or
            double according to the current set type setting.
        4.  If a variable of integer type is replaced with an integer value
            that is too large or too small for its current storage type, it
            is promoted to a longer type (int, long, or double).
        5.  When needed, str# variables are promoted to a longer str# type or
            to strL.
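    Because these rules mirror those of replace, they can be illustrated
    with plain replace commands. The sketch below uses replace rather than
    readreplace, and the variable names are made up; it is meant only to
    show rules 3, 4, and 5 in action.

```stata
clear
set obs 1

* Rule 4: 1000 does not fit in a byte, so the variable is promoted to int
generate byte visits = 1
replace visits = 1000
assert "`: type visits'" == "int"

* Rule 3: a noninteger value promotes an integer type to float or
* double, depending on the current -set type- setting
generate int score = 5
replace score = 2.5
describe score

* Rule 5: a longer string promotes str3 to a longer str# type
generate str3 code = "abc"
replace code = "abcdef"
assert "`: type code'" == "str6"
```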


Examples

    Make the changes specified in correctedValues.csv
        . use firstEntry
        . readreplace using correctedValues.csv, id(uniqueid)
            variable(question) value(correctvalue)

    Same as the previous readreplace command, but passes option case to
    insheet so that the variable names in the replacements file keep their
    capitalization
        . use firstEntry
        . readreplace using correctedValues.csv, id(uniqueid)
            variable(Question) value(CorrectValue) import(case)

    Same as the previous readreplace command, but loads the replacements file
    as a Stata dataset
        . use firstEntry
        . readreplace using correctedValues.dta, id(uniqueid)
            variable(Question) value(CorrectValue) use


Stored results

    readreplace stores the following in r():

    Scalars
      r(N)           number of real changes

    Macros
      r(varlist)     variables replaced

    Matrices
      r(changes)     number of real changes by variable
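    A short sketch of inspecting these results after a run, assuming
    readreplace is installed. The data and the file name corrections.dta
    are hypothetical.

```stata
* Hypothetical replacements saved as a Stata dataset
clear
input uniqueid str10 Question CorrectValue
105 "district" 13
125 "age"       2
end
save corrections, replace

* Hypothetical master data
clear
input uniqueid district age
105 12 30
125  7 20
end

readreplace using corrections.dta, id(uniqueid) ///
    variable(Question) value(CorrectValue) use

* List everything stored in r(), then look at each result in turn
return list
display "Real changes: " r(N)
display "Variables replaced: `r(varlist)'"
matrix list r(changes)
```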


Authors

    Ryan Knight
    Matthew White

    For questions or suggestions, submit a GitHub issue or e-mail
    researchsupport@poverty-action.org.


Also see

    Help:  [D] generate

    User-written:  cfout, bcstats, mergeall
