Aquí hay un script de PowerShell lo improvisado que demuestra algunos métodos diferentes de contar líneas en un archivo de texto, junto con el tiempo y la memoria necesaria para cada método. Los resultados (a continuación) muestran claras diferencias en los requisitos de tiempo y memoria. Para mis pruebas, parece que el punto óptimo fue Get-Content, usando una configuración de ReadCount de 100. Las otras pruebas requirieron mucho más tiempo y/o uso de memoria.
#$testFile = 'C:\test_small.csv' # 245 lines, 150 KB
#$testFile = 'C:\test_medium.csv' # 95,365 lines, 104 MB
$testFile = 'C:\test_large.csv' # 285,776 lines, 308 MB
# Using ArrayList just because they are faster than Powershell arrays, for some operations with large arrays.
$results = New-Object System.Collections.ArrayList
function AddResult {
param([string] $sMethod, [string] $iCount)
$result = New-Object -TypeName PSObject -Property @{
"Method" = $sMethod
"Count" = $iCount
"Elapsed Time" = ((Get-Date) - $dtStart)
"Memory Total" = [System.Math]::Round((GetMemoryUsage)/1mb, 1)
"Memory Delta" = [System.Math]::Round(((GetMemoryUsage) - $dMemStart)/1mb, 1)
}
[void]$results.Add($result)
Write-Output "$sMethod : $count"
[System.GC]::Collect()
}
function GetMemoryUsage {
# return ((Get-Process -Id $pid).PrivateMemorySize)
return ([System.GC]::GetTotalMemory($false))
}
# Get-Content -ReadCount 1
[System.GC]::Collect()
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = 0
Get-Content -Path $testFile -ReadCount 1 |% { $count++ }
AddResult "Get-Content -ReadCount 1" $count
# Get-Content -ReadCount 10,100,1000,0
# Note: ReadCount = 1 returns a string. Any other value returns an array of strings.
# Thus, the Count property only applies when ReadCount is not 1.
@(10,100,1000,0) |% {
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = 0
Get-Content -Path $testFile -ReadCount $_ |% { $count += $_.Count }
AddResult "Get-Content -ReadCount $_" $count
}
# Get-Content | Measure-Object
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = (Get-Content -Path $testFile -ReadCount 1 | Measure-Object -line).Lines
AddResult "Get-Content -ReadCount 1 | Measure-Object" $count
# Get-Content.Count
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = (Get-Content -Path $testFile -ReadCount 1).Count
AddResult "Get-Content.Count" $count
# StreamReader.ReadLine
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = 0
# Use this constructor to avoid file access errors, like Get-Content does.
$stream = New-Object -TypeName System.IO.FileStream(
$testFile,
[System.IO.FileMode]::Open,
[System.IO.FileAccess]::Read,
[System.IO.FileShare]::ReadWrite)
if ($stream) {
$reader = New-Object IO.StreamReader $stream
if ($reader) {
while(-not ($reader.EndOfStream)) { [void]$reader.ReadLine(); $count++ }
$reader.Close()
}
$stream.Close()
}
AddResult "StreamReader.ReadLine" $count
$results | Select Method, Count, "Elapsed Time", "Memory Total", "Memory Delta" | ft -auto | Write-Output
Aquí hay resultados para el archivo de texto que contienen ~ líneas 95k, 104 MB:
Method Count Elapsed Time Memory Total Memory Delta
------ ----- ------------ ------------ ------------
Get-Content -ReadCount 1 95365 00:00:11.1451841 45.8 0.2
Get-Content -ReadCount 10 95365 00:00:02.9015023 47.3 1.7
Get-Content -ReadCount 100 95365 00:00:01.4522507 59.9 14.3
Get-Content -ReadCount 1000 95365 00:00:01.1539634 75.4 29.7
Get-Content -ReadCount 0 95365 00:00:01.3888746 346 300.4
Get-Content -ReadCount 1 | Measure-Object 95365 00:00:08.6867159 46.2 0.6
Get-Content.Count 95365 00:00:03.0574433 465.8 420.1
StreamReader.ReadLine 95365 00:00:02.5740262 46.2 0.6
Aquí hay resultados para un archivo más grande (que contienen ~ líneas 285K, 308 MB):
Method Count Elapsed Time Memory Total Memory Delta
------ ----- ------------ ------------ ------------
Get-Content -ReadCount 1 285776 00:00:36.2280995 46.3 0.8
Get-Content -ReadCount 10 285776 00:00:06.3486006 46.3 0.7
Get-Content -ReadCount 100 285776 00:00:03.1590055 55.1 9.5
Get-Content -ReadCount 1000 285776 00:00:02.8381262 88.1 42.4
Get-Content -ReadCount 0 285776 00:00:29.4240734 894.5 848.8
Get-Content -ReadCount 1 | Measure-Object 285776 00:00:32.7905971 46.5 0.9
Get-Content.Count 285776 00:00:28.4504388 1219.8 1174.2
StreamReader.ReadLine 285776 00:00:20.4495721 46 0.4
Cabe señalar que el segundo enfoque cuenta sólo las líneas con texto. Si hay líneas vacías, no se cuentan. – Vladislav