This is an updated parser for CSV, TSV, and similar delimited files, built on PHP’s built-in Standard PHP Library (SPL). It can be dropped in almost anywhere, but as usual it’s wise to namespace it.
What sets this apart from the usual suspects like file_get_contents() is that it can handle much larger files without running out of memory. file_get_contents() works in a pinch, but it reads the entire file into a single string. For large files that can exhaust PHP’s memory limit and crash the script, or at the very least degrade the user experience.
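To make the difference concrete, here is a minimal sketch that reads the same file both ways. The temporary file and the variable names are assumptions made up for the demo, not part of the class further down.

```php
<?php
// Hypothetical demo data: 1000 short lines in a temporary file.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, implode("\n", range(1, 1000)) . "\n");

// Whole-file approach: the entire contents sit in one string.
$all = file_get_contents($path);
$wholeBytes = strlen($all);

// Streaming approach: only the current line is held at any time.
$file = new SplFileObject($path, 'r');
$lines = 0;
$longest = 0;
while (!$file->eof()) {
    $line = $file->fgets();
    if ($line === '') {
        continue; // the final read at end of file comes back empty
    }
    $lines++;
    $longest = max($longest, strlen($line));
}

$file = null;   // release the file handle before deleting
unlink($path);
```

Here $wholeBytes grows with the size of the file, while $longest only depends on the longest single line, which is the property the class relies on.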
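NoRewindIterator is just what it sounds like: an iterator that can’t be rewound. Calling rewind() on it does nothing, so the file is traversed exactly once, front to back. That suits a generator, which can’t be restarted once it has advanced, and it lets us read the header row by hand and then continue with foreach from where we left off. Once the last iteration is reached, the iterator is spent and can be left to the garbage collector.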
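A small sketch of that behaviour, using a hypothetical rows() generator as a stand-in for a file reader (the function and its data are invented for illustration):

```php
<?php
// Hypothetical generator standing in for a file reader.
function rows(): Generator
{
    yield ['id', 'name'];   // header row
    yield [1, 'alpha'];
    yield [2, 'beta'];
}

// A bare generator refuses to be rewound once it has advanced,
// and foreach always begins by calling rewind().
$bare = rows();
$bare->current();
$bare->next();
$bareThrew = false;
try {
    foreach ($bare as $row) {
        // never reached
    }
} catch (Exception $e) {
    $bareThrew = true;
}

// Wrapped in NoRewindIterator, rewind() becomes a no-op,
// so foreach simply continues from the current position.
$iterator = new NoRewindIterator(rows());
$header = $iterator->current(); // first row
$iterator->next();              // step past the header

$body = [];
foreach ($iterator as $row) {
    $body[] = $row;
}
```

After this runs, $header holds the first row and $body holds only the two data rows.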
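We’re also using yield instead of returning the results, which turns each fileIterator method into a generator: values are produced one at a time as the caller asks for them, rather than being collected into an array first.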
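As a rough illustration of a generator that both yields values and returns a count, here is a sketch with a hypothetical countedLines() function working on an in-memory array instead of a file:

```php
<?php
// Hypothetical generator: yields each line, then returns how many it yielded.
function countedLines(array $lines): Generator
{
    $count = 0;
    foreach ($lines as $line) {
        yield $line;   // hand one value to the caller and pause here
        $count++;
    }
    return $count;     // available via getReturn() once iteration has finished
}

$generator = countedLines(['a', 'b', 'c']);

$seen = [];
foreach ($generator as $line) {
    $seen[] = $line;
}

// getReturn() may only be called after the generator has completed.
$total = $generator->getReturn();
```

When a generator is wrapped in a NoRewindIterator, the wrapper's getInnerIterator() method should give access to the generator itself, and through it to getReturn().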
namespace Application\Iterator;

use Exception;
use InvalidArgumentException;
use SplFileObject;
use NoRewindIterator;

/**
 * Class LargeFile
 * @package Application\Iterator
 */
class LargeFile
{
    const ERROR_UNABLE = 'ERROR: Unable to open file';
    const ERROR_TYPE   = 'ERROR: Type must be "ByLength", "ByLine", or "Csv"';
    const ERROR_BYTES  = 'ERROR: "ByLength" needs a positive number of bytes';

    /**
     * @var SplFileObject
     */
    protected $file;

    /**
     * @var array
     */
    protected $allowed_types = ['ByLine', 'ByLength', 'Csv'];

    /**
     * LargeFile constructor.
     *
     * Populates $file with a new SplFileObject instance.
     *
     * @param string $filename
     * @param string $mode
     *
     * @throws Exception if the file does not exist
     */
    public function __construct($filename, $mode = 'r')
    {
        if (!file_exists($filename)) {
            $message  = __METHOD__ . ' : ' . self::ERROR_UNABLE . PHP_EOL;
            $message .= strip_tags($filename) . PHP_EOL;
            throw new Exception($message);
        }
        $this->file = new SplFileObject($filename, $mode);
    }

    /**
     * Reads the file one line at a time with SplFileObject::fgets().
     *
     * Suitable for text files that contain line feeds.
     *
     * @return \Generator yields each line; the generator's return value
     *                    is the number of reads performed
     */
    protected function fileIteratorByLine()
    {
        $count = 0;
        while (!$this->file->eof()) {
            yield $this->file->fgets();
            $count++;
        }
        return $count;
    }

    /**
     * Reads the file $numBytes at a time with SplFileObject::fread().
     *
     * Suitable for larger binary files.
     *
     * @param int $numBytes
     *
     * @return \Generator yields each chunk; the generator's return value
     *                    is the number of reads performed
     */
    protected function fileIteratorByLength($numBytes)
    {
        $count = 0;
        while (!$this->file->eof()) {
            yield $this->file->fread($numBytes);
            $count++;
        }
        return $count;
    }

    /**
     * Reads the file one record at a time with SplFileObject::fgetcsv().
     *
     * @return \Generator yields each parsed row; the generator's return value
     *                    is the number of reads performed
     */
    protected function fileIteratorCsv()
    {
        $count = 0;
        while (!$this->file->eof()) {
            yield $this->file->fgetcsv();
            $count++;
        }
        return $count;
    }

    /**
     * Returns file iterator
     *
     * @param string   $type     one of "ByLine", "ByLength", or "Csv"
     * @param int|null $numBytes chunk size; only used by "ByLength"
     *
     * @return NoRewindIterator
     *
     * @throws InvalidArgumentException on an unknown type or a missing chunk size
     */
    public function getIterator($type = 'ByLine', $numBytes = null)
    {
        if (!in_array($type, $this->allowed_types)) {
            $message = __METHOD__ . ' : ' . self::ERROR_TYPE . PHP_EOL;
            throw new InvalidArgumentException($message);
        }
        // fread() cannot work with a null or non-positive length, so fail early.
        if ($type === 'ByLength' && (int) $numBytes < 1) {
            $message = __METHOD__ . ' : ' . self::ERROR_BYTES . PHP_EOL;
            throw new InvalidArgumentException($message);
        }
        // Build the method name, e.g. "fileIteratorCsv", and wrap its generator.
        $iterator = 'fileIterator' . $type;
        return new NoRewindIterator($this->$iterator($numBytes));
    }
}
Example of usage:
$largeFile = new Application\Iterator\LargeFile($file);
$iterator = $largeFile->getIterator('Csv');

$rawHeaders = $iterator->current();
$headerIterator = new \ArrayIterator;
foreach ($rawHeaders as $header) {
    // in this case we're creating header rows for DB insertion, hence the back ticks.
    $headerIterator->append(strtolower("`$header`"));
}
$headers = $headerIterator->getArrayCopy();

$iterator->next();
foreach ($iterator as $row) {
    // statements for each $row
}
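The class as written calls fgetcsv() with its default settings, so it splits on commas. For tab-separated input, one option is to configure an SplFileObject directly. The following is only a sketch under that assumption: the sample data and temporary file are invented, and it bypasses LargeFile rather than extending it.

```php
<?php
// Hypothetical sample data written to a temporary file for the demo.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($path, "ID\tName\n1\talpha\n2\tbeta\n");

$file = new SplFileObject($path, 'r');

// READ_CSV makes iteration return parsed rows; the other flags
// drop line endings and skip blank lines such as the one at end of file.
$file->setFlags(
    SplFileObject::READ_CSV
    | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
    | SplFileObject::DROP_NEW_LINE
);

// Tab as the separator; enclosure and escape are passed explicitly.
$file->setCsvControl("\t", '"', '\\');

$rows = [];
foreach ($file as $row) {
    $rows[] = $row;
}

$file = null;   // release the file handle before deleting
unlink($path);
```

Each element of $rows is an array of string fields, with the header row first, much like the rows produced by the 'Csv' iterator above.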