Download the PHP package geoffroy-aubry/awk-csv-parser without Composer
Package: awk-csv-parser
Short description: AWK and Bash code to easily parse CSV files, with possibly embedded commas and quotes.
License: LGPL-3.0+
Information about the package awk-csv-parser
Awk CSV parser
AWK and Bash code to easily parse CSV files, with possibly embedded commas and quotes.
Table of Contents
- Features
- Known limitations
- Links
- Requirements
- Usage
- Examples
- Installation
- Copyrights & licensing
- Change log
- Continuous integration
- Git branching model
Features
- Parse CSV files with only Bash and Awk.
- Allows CSV data to be processed with standard UNIX shell commands.
- Properly handle CSV data that contain field separators (commas by default) and field enclosures (double quotes by default) inside enclosed data fields.
- Process CSVs from stdin pipe as well as from multiple command line file arguments.
- Handle any character both for field separator and field enclosure.
- Can rewrite CSV records with a multi-character output field separator, CSV enclosure characters removed and escaped enclosures unescaped.
- Lines need not all contain the same number of fields.
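To see why the handling of embedded separators matters, here is a small sketch of what a naive comma split does to an enclosed field. It uses only standard awk, and the sample record is made up for illustration:

```shell
# Sample record (made up for illustration): the first field contains a comma.
record='"Bolivia, Plurinational State of",BO,BOL'

# Naive approach: split on every comma, ignoring enclosures.
# The enclosed field is wrongly cut in two, so awk counts 4 fields, not 3.
naive_count=$(printf '%s\n' "$record" | awk -F',' '{ print NF }')
echo "naive field count: $naive_count"
```

A parser that honours field enclosures should report three fields for the same record.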
Known limitations
- Does not yet handle embedded newlines inside data fields.
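Given this limitation, it may help to check a file for embedded newlines before parsing it. The sketch below is a heuristic, not part of the package: it assumes the default double-quote enclosure, and the function name find_unbalanced_lines is hypothetical. It flags lines holding an odd number of enclosures, which usually means a field starts or ends on that line without being closed on it:

```shell
# Print the line numbers of lines holding an odd number of double quotes.
# Splitting on '"' yields NF-1 quotes, so an even NF means an odd quote count.
# Empty lines (NF == 0) are skipped.
find_unbalanced_lines() {
    awk -F'"' 'NF > 0 && NF % 2 == 0 { print FNR }' "$@"
}

# Usage: lines 2 and 3 together hold one field split by a newline.
printf '%s\n' 'a,"b",c' 'x,"broken' 'field",y' | find_unbalanced_lines
```

Middle lines of a field spanning three or more lines are not flagged, so an empty result is a good sign rather than a guarantee.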
Links
- Wikipedia: Comma-separated values
- RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files
Other Awk implementations:
Requirements
- Bash v4 (2009) and above
- GNU Awk 3.1+
Tested on Debian/Ubuntu Linux.
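A quick way to check these requirements on a given machine is sketched below. The function name check_requirements is hypothetical, and the check is deliberately coarse: it confirms that Bash is at major version 4 or above and that awk identifies itself as GNU Awk, but it does not verify the 3.1+ minimum.

```shell
# Report whether the interpreters named in the requirements are available.
check_requirements() {
    # Bash: ask bash itself for its major version (0 if bash is absent).
    bash_major=$(bash -c 'echo "${BASH_VERSINFO[0]}"' 2>/dev/null) || bash_major=0
    if [ "${bash_major:-0}" -ge 4 ]; then
        echo "bash: OK (major version $bash_major)"
    else
        echo "bash: version 4 or above not found"
    fi

    # GNU Awk: its --version banner starts with "GNU Awk".
    awk_banner=$(awk --version </dev/null 2>/dev/null | head -n 1)
    case $awk_banner in
        "GNU Awk"*) echo "awk: OK ($awk_banner)" ;;
        *)          echo "awk: GNU Awk not found on PATH as 'awk'" ;;
    esac
}

check_requirements
```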
Usage
Displayed by the -h or --help option. Text version:
Description
    AWK and Bash code to easily parse CSV files, with possibly embedded commas and quotes.
Usage
    awk-csv-parser.sh [OPTION]… [<CSV-file>]…
Options
    -e <character>, --enclosure=<character>
        Set the CSV field enclosure. One character only, '"' (double quote) by default.
    -o <string>, --output-separator=<string>
        Set the output field separator. Multiple characters allowed, '|' (pipe) by default.
    -s <character>, --separator=<character>
        Set the CSV field separator. One character only, ',' (comma) by default.
    -h, --help
        Display this help.
    <CSV-file>
        CSV file to parse.
Discussion
    – The last record in the file may or may not have an ending line break.
    – Each line may not contain the same number of fields throughout the file.
    – The last field in the record must not be followed by a field separator.
    – Fields containing field enclosures or field separators must be enclosed in field
      enclosure.
    – A field enclosure appearing inside a field must be escaped by preceding it with
      another field enclosure. Example: "aaa","b""bb","ccc"
Examples
    Parse a CSV and display records without field enclosure, fields pipe-separated:
        awk-csv-parser.sh --output-separator='|' resources/iso_3166-1.csv
    Remove CSV's header before parsing:
        tail -n+2 resources/iso_3166-1.csv | awk-csv-parser.sh
    Keep only first column of multiple files:
        awk-csv-parser.sh a.csv b.csv c.csv | cut -d'|' -f1
    Keep only first column, using multiple UTF-8 characters output separator:
        awk-csv-parser.sh -o '⇒⇒' resources/iso_3166-1.csv | awk -F '⇒⇒' '{print $1}'
    You can directly call the Awk script:
        awk -f csv-parser.awk -v separator=',' -v enclosure='"' --source '{
            csv_parse_record($0, separator, enclosure, csv)
            print csv[2] " ⇒ " csv[0]
        }' resources/iso_3166-1.csv
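The escaping rules listed under Discussion can be illustrated from the writer's side. The sketch below is not part of the package: csv_quote_field is a hypothetical helper that prepares one field for a CSV record, enclosing it only when the rules require it and doubling every embedded enclosure. It assumes a single-character separator and enclosure, as the options above require.

```shell
# Quote one field: csv_quote_field <field> [<separator>] [<enclosure>]
# The field is passed through the environment so awk does not interpret
# backslashes in it, and index()/substr() avoid regex metacharacter issues.
csv_quote_field() {
    FIELD=$1 SEP=${2:-,} ENC=${3:-'"'} awk 'BEGIN {
        field = ENVIRON["FIELD"]; sep = ENVIRON["SEP"]; enc = ENVIRON["ENC"]
        if (index(field, sep) || index(field, enc)) {
            out = ""
            # Double each embedded enclosure, one occurrence at a time.
            while ((i = index(field, enc)) > 0) {
                out = out substr(field, 1, i) enc
                field = substr(field, i + 1)
            }
            field = enc out field enc
        }
        print field
    }'
}

csv_quote_field 'b"bb'    # prints "b""bb"
csv_quote_field 'a,b'     # prints "a,b"
csv_quote_field 'aaa'     # prints aaa
```

Unlike the help text's example, this sketch leaves fields such as aaa unenclosed, which the rules permit since enclosure is only mandatory for fields containing a separator or an enclosure.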
Examples
Excerpt from resources/iso_3166-1.csv (full version):
1. Parse a CSV and display records without field enclosure, output fields pipe-separated
Result:
2. Remove CSV header, keep only first column and grep fields containing separator
Result:
3. You can directly call the Awk script
Result:
4. Technical example
Content of tests/resources/ok.csv:
Test:
Result:
5. Errors
Content of tests/resources/invalid.csv:
Test:
Result:
Installation
Debian/Ubuntu
- Move to the directory where you wish to store the source.
- Clone the repository:
- You should be on the stable branch. If not, switch your clone to that branch:
- You can create a symlink to awk-csv-parser.sh:
- It's ready for use:
OS X
Because the Mac OS X versions of readlink and sed are based on BSD and differ slightly from their GNU counterparts, you need to install the GNU utilities:
With the --with-default-names option, the GNU utilities replace those of OS X.
Otherwise, the GNU utilities are prefixed with a g, and you have to edit the scripts src/awk-csv-parser.sh and tests/all-tests.sh to replace readlink and sed with greadlink and gsed respectively.
Then follow the Debian/Ubuntu installation process.
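As an alternative to editing the scripts by hand, the choice between a plain and a g-prefixed utility could be made at run time. This is only a sketch under assumptions: gnu_tool is a hypothetical helper that is not part of the package, and it relies on GNU tools mentioning "GNU" in their --version output.

```shell
# Print the name of a GNU variant of a tool: prefer the plain name when it
# already reports itself as GNU, else fall back to the g-prefixed name.
gnu_tool() {
    if "$1" --version </dev/null 2>/dev/null | grep -q 'GNU'; then
        echo "$1"
    elif command -v "g$1" >/dev/null 2>&1; then
        echo "g$1"
    else
        echo "no GNU $1 found" >&2
        return 1
    fi
}

# Usage: pick the right commands once, then call them through variables.
SED=$(gnu_tool sed) || SED=
READLINK=$(gnu_tool readlink) || READLINK=
echo "sed: ${SED:-missing}, readlink: ${READLINK:-missing}"
```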
Copyrights & licensing
Licensed under the GNU Lesser General Public License v3 (LGPL version 3). See LICENSE file for details.
Change log
See CHANGELOG file for details.
Continuous integration
Launch unit tests:
Git branching model
The git branching model used for development is the one described and assisted by the twgit tool: https://github.com/Twenga/twgit.