How To Merge Data Sets In Stata

This article discusses how to combine multiple Excel data files into one Stata file. Let's suppose we have 15 Excel files saved in CSV format, each containing daily stock market data for one company. To combine the data for all 15 companies into one file, we follow three steps:

Step 1: Set the Current Directory

Your working directory is the location where all your files are saved. Once you specify this directory in Stata, you can refer to files by their names alone instead of typing out the full path every time a file or dataset needs to be mentioned. To do this, we use the cd command followed by the directory path enclosed in double quotes:

                cd "E:\Combining data from multiple files"

Your directory path may be longer if your files are located inside a nested folder. In this example, the file for the first company is named '1.csv', the file for the second is named '2.csv', and so on.
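Before running any loops, it can help to confirm that the working directory really contains the files they expect. The sketch below assumes the files are named 1.csv to 15.csv as described above; the local name missing is only an illustrative choice.

```stata
* Report any of the expected files 1.csv ... 15.csv that are not in the working directory
display "Working directory: `c(pwd)'"
local missing 0
forvalues i = 1/15 {
    capture confirm file "`i'.csv"    // _rc is nonzero when the file does not exist
    if _rc {
        display as error "not found: `i'.csv"
        local ++missing
    }
}
display "`missing' of 15 expected files are missing"
```

If the count is not zero, check the cd path and the file names before moving on.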

Step 2: Use a Loop to Import and Save Your Files in Stata Format

Once the directory containing the relevant files is set, we need to save the CSV files in Stata format before combining them, because CSV files cannot be appended directly. Since there are multiple files to convert, it is more efficient to use a loop for this job. The following loop imports each CSV file into Stata and saves it as a Stata data file.

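                forvalues i = 1/15 {
                    clear                          // start each pass with no data in memory
                    import delimited "`i'.csv"     // load 1.csv, 2.csv, ..., 15.csv
                    save "`i'.dta", replace        // save as 1.dta, 2.dta, ..., 15.dta
                }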

Let's deconstruct the code above. The first line indicates that the code inside the curly braces will run 15 times. The local `i' mentioned in the code takes the value 1 the first time the loop runs, the value 2 the second time, and so on up to 15. Note that every time the local i is mentioned, it is enclosed between the symbols ` and '. The first of these is found next to the 1 key on most keyboards and is called the grave accent or backtick. The symbol at the end of the local is an ordinary single quote (apostrophe).
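A quick way to watch the local at work is to print it instead of importing anything. This sketch loops over only the first three values and collects the file names in a local called names (an illustrative name).

```stata
* Show how the local i is substituted on each pass of the loop
local names ""
forvalues i = 1/3 {
    display "pass `i': import `i'.csv, then save `i'.dta"
    local names `names' `i'.csv    // add this file name to the list
}
display "files visited: `names'"
```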

The clear command inside the loop ensures that no data remain loaded in Stata, so each file is imported onto a clean, empty slate.

The import delimited command imports CSV files into Stata. The local `i' supplies the file name because it takes the values 1 to 15 in turn, as specified by forvalues i = 1/15, which matches how our files are named. This command loads the CSV data into Stata.

The last line inside the loop saves the loaded data as a Stata data (.dta) file. Again, we use the local `i' to name the files 1 to 15, this time with a .dta extension. The replace option ensures that any existing file named 1.dta (through 15.dta) is overwritten by the dataset we just loaded.
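The sketch below lets you try Step 2 without the real stock files. It writes three small hypothetical CSV files (the folder csv_demo and the variables symbol, date and close are made up for illustration) and then runs the same import-and-save loop over them. Here the clear option of import delimited replaces the separate clear command, and varnames(1) states explicitly that the first row holds the variable names.

```stata
* Hypothetical demo data: three small CSV files in a subfolder csv_demo
capture mkdir "csv_demo"
forvalues i = 1/3 {
    clear
    set obs 2
    generate str4 symbol = "CO`i'"
    generate date = `i' * 10 + _n
    generate double close = 100 + _n
    export delimited using "csv_demo/`i'.csv", replace
}

* The Step 2 loop applied to the demo files
forvalues i = 1/3 {
    import delimited using "csv_demo/`i'.csv", clear varnames(1)
    save "csv_demo/`i'.dta", replace
}
```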

All 15 .dta files will be saved in the directory specified earlier with the cd command. If you wish to save these new files in a separate folder inside that directory, called, say, 'folder2', you can adjust the save command as follows:

                save folder2\\`i'.dta, replace

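Note the two backslashes. A single backslash placed directly before the ` symbol tells Stata to treat the ` as an ordinary character, so the local `i' would not be expanded into the file number; doubling the backslash avoids this. A forward slash (folder2/`i'.dta) also works, including on Windows. The folder must already exist, because save does not create folders.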
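To see the backslash issue for yourself, the sketch below sets the local i to 7 (an arbitrary value) and prints three versions of the path. It also creates folder2 first; capture simply suppresses the error if the folder already exists.

```stata
* Create folder2 in the working directory; capture ignores the error if it exists
capture mkdir "folder2"

local i 7
display "folder2\`i'.dta"     // single backslash before the backtick: i is NOT expanded
display "folder2\\`i'.dta"    // doubled backslash: i is expanded to 7
display "folder2/`i'.dta"     // forward slash: i is expanded, and Stata accepts / on Windows
```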

Step 3: Combine All the Files Together

After converting your CSV files into Stata's .dta format, they are ready to be combined into one file. We will once again use a loop to append all 15 files together.

Now it is worth going over the two ways of joining two or more datasets. Appending datasets means adding more observations (rows), i.e., combining data that are structurally similar. Merging datasets means adding more variables (columns), matched on one or more key variables.
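Both operations can be seen on tiny made-up data. In the sketch below, the symbol AAA, the day numbers and the close and volume variables are all hypothetical. append stacks new rows under the existing ones, while merge 1:1 lines up observations on the key variables symbol and day and adds the volume column.

```stata
* Hypothetical closing prices for days 1 and 2
clear
input str4 symbol int day double close
"AAA" 1 10.5
"AAA" 2 10.7
end
tempfile days12
save `days12'

* Append: same variables, more rows (day 3 plus days 1 and 2)
clear
input str4 symbol int day double close
"AAA" 3 10.9
end
append using `days12'
count                                  // 3 observations, still 3 variables

* Merge: same rows, one more column, matched on symbol and day
tempfile prices
save `prices'
clear
input str4 symbol int day double volume
"AAA" 1 500
"AAA" 2 650
"AAA" 3 400
end
merge 1:1 symbol day using `prices'    // _merge == 3 marks rows found in both datasets
```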

In this case, we want to append the datasets, because they all have the same variables, with different observations in each file (daily stock time series being stacked into panel data). To append the files, we use the following code:

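                clear                              // drop the 15.csv data left over from Step 2, or it is appended twice
                forvalues i = 1/15 {
                    append using folder2\\`i'.dta  // add the rows of file i
                }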

The loop works the same way as described earlier. The append using syntax simply tells Stata to add the observations of a second, subsequently specified dataset to the dataset currently in memory.
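append also accepts several files in one call, and its generate() option records which file each observation came from. The sketch below is self-contained: it saves three tiny hypothetical datasets in a made-up folder append_demo and then appends them in a single command, starting from an empty dataset.

```stata
* Hypothetical toy data: three small .dta files in a subfolder append_demo
capture mkdir "append_demo"
forvalues i = 1/3 {
    clear
    set obs 2
    generate str4 symbol = "CO`i'"
    save "append_demo/`i'.dta", replace
}

* Start from an empty dataset, then append all files at once
clear
local dtafiles ""
forvalues i = 1/3 {
    local dtafiles `dtafiles' append_demo/`i'.dta
}
append using `dtafiles', generate(source)    // source = 1, 2, 3 identifies the file of origin
```

Because memory is empty before the append, the source variable here equals the file number.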

Using the user-written unique command (install it with ssc install unique), we can check how many unique values a variable has. The variable holding each company's stock symbol here is called 'symbol'.

                unique symbol              

As expected, the command reports that this variable has 15 unique values. The tabulate command goes further and lists those 15 values.

                tabulate symbol              

This returns a table listing the 15 unique symbols, the number of observations for each company, each company's percentage share of the observations in the dataset, and the cumulative percentage.
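If you would rather not install a user-written command, the built-in egen function tag() gives the same count. The sketch below uses three made-up symbols so it runs on its own; on the combined stock data you would skip the input block. The variable name first and the local nsymbols are illustrative.

```stata
* Hypothetical data standing in for the combined file
clear
input str4 symbol
"AAA"
"AAA"
"BBB"
end

* tag() marks exactly one observation per distinct value of symbol
egen byte first = tag(symbol)
count if first
local nsymbols = r(N)
drop first
display "`nsymbols' distinct values of symbol"
```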

Combining Files with Different Names

Our task above was made much easier by the fact that the CSV files to be combined were named conveniently. What if the files have different, arbitrary names? In that case, we again start by changing the directory with the cd command as shown before, followed by this code:

                local files : dir "E:\Combining data from multiple files" files "*.csv"
                foreach file in `files' {
                    clear
                    import delimited "`file'"      // quotes allow file names with spaces
                    save "`file'.dta", replace     // name.csv is saved as name.csv.dta
                }

The first line creates a local macro called 'files'. It stores the list of all the files you wish to combine: you first state the directory where the files are stored, and the pattern "*.csv" then tells Stata to collect the names of all files in that directory with a .csv extension. The asterisk is used as a wildcard.
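Before importing, you can check what the dir function actually collected. The sketch below uses . (the current working directory, already set with cd) instead of the full path, and counts the names with the word count function; the local n is illustrative.

```stata
* Collect the .csv names in the current working directory
local files : dir . files "*.csv"
local n : word count `files'
display "`n' CSV file(s) found"
foreach file in `files' {
    display "`file'"
}
```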

foreach file in `files' sets up the loop to run once for each file name stored in the local files. The rest of the loop executes in the same way as described previously. Each file is saved under its original name with .dta appended, so a file called, for example, apple.csv is saved as apple.csv.dta.
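If you prefer apple.dta to apple.csv.dta, you can strip the extension before saving. The sketch below shows the string manipulation on a hypothetical list of two file names (apple.csv and big co.csv are made up); stem is an illustrative local name.

```stata
* Hypothetical file names, as the dir function would return them
local files `""apple.csv" "big co.csv""'

foreach file in `files' {
    * Drop the .csv extension so the output is apple.dta rather than apple.csv.dta
    local stem = subinstr("`file'", ".csv", "", .)
    display "`file' would be saved as `stem'.dta"
}
```

In the real loop, the display line would become save "`stem'.dta", replace, with stem computed just before it. Note that subinstr() removes every occurrence of ".csv" in the name, not only the final one.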

To combine these files, we again use the local command, this time to find the files in the directory that have .dta as their extension.

                local files : dir "E:\Combining Data from multiple files" files "*.dta"              

The list of these .dta files is again saved in a local macro called 'files'. They are appended using the following loop:

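                clear                              // start empty so the last imported file is not appended twice
                foreach file in `files' {
                    append using "`file'"          // add the rows of each .dta file
                }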

All the data for the 15 firms whose file names are stored in the local 'files' will be appended one by one.

Source: https://thedatahall.com/how-to-combine-multiple-csv-excel-files-in-stata/

Posted by: morrishisems.blogspot.com
