I have read many questions about this, but I couldn't find one that was fast enough. I think there must be a better way to insert a large number of rows into a MySQL database.

I use the following code to insert 100k rows into my MySQL database:
```csharp
public static void CSVToMySQL()
{
    string ConnectionString = "server=192.168.1xxx";
    string Command = "INSERT INTO User (FirstName, LastName ) VALUES (@FirstName, @LastName);";
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    {
        mConnection.Open();
        for (int i = 0; i < 100000; i++) // inserting 100k items
            using (MySqlCommand myCmd = new MySqlCommand(Command, mConnection))
            {
                myCmd.CommandType = CommandType.Text;
                myCmd.Parameters.AddWithValue("@FirstName", "test");
                myCmd.Parameters.AddWithValue("@LastName", "test");
                myCmd.ExecuteNonQuery();
            }
    }
}
```
This takes about 40 seconds for 100k rows. How can I do this faster or more efficiently?
It might be faster to insert via a DataTable/DataAdapter, or several rows at a time:

```sql
INSERT INTO User (Fn, Ln) VALUES (@Fn1, @Ln1), (@Fn2, @Ln2)...
```
Due to security concerns, I can't load the data into a file and run MySQLBulkLoad on it.
One way to speed things up would be to wrap all the inserts into one transaction (SQL Server code):
```csharp
using (SqlConnection conn = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    conn.Open();
    SqlTransaction transaction = conn.BeginTransaction();
    try
    {
        foreach (string commandString in dbOperations)
        {
            SqlCommand cmd = new SqlCommand(commandString, conn, transaction);
            cmd.ExecuteNonQuery();
        }
        transaction.Commit(); // Here the execution is committed to the DB
    }
    catch (Exception)
    {
        transaction.Rollback();
        throw;
    }
    conn.Close();
}
```
Another way is to load the CSV file into a DataTable and use the batching feature of DataAdapter:
```csharp
DataTable dtInsertRows = GetDataTable();

SqlConnection connection = new SqlConnection(connectionString);
SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
command.CommandType = CommandType.StoredProcedure;
command.UpdatedRowSource = UpdateRowSource.None;

// Set the parameters with the appropriate source column names
command.Parameters.Add("@PersonId", SqlDbType.Int, 4, dtInsertRows.Columns[0].ColumnName);
command.Parameters.Add("@PersonName", SqlDbType.VarChar, 100, dtInsertRows.Columns[1].ColumnName);

SqlDataAdapter adpt = new SqlDataAdapter();
adpt.InsertCommand = command;
// Specify the number of records to be inserted/updated in one go. Default is 1.
adpt.UpdateBatchSize = 2;

connection.Open();
int recordsInserted = adpt.Update(dtInsertRows);
connection.Close();
```
You can find a good example of this here.
Or you can use the MySQL BulkLoader C# class:
```csharp
var bl = new MySqlBulkLoader(connection);
bl.TableName = "mytable";
bl.FieldTerminator = ",";
bl.LineTerminator = "\r\n";
bl.FileName = "myfileformytable.csv";
bl.NumberOfLinesToSkip = 1;
var inserted = bl.Load();
Debug.Print(inserted + " rows inserted.");
```
And if you do multiple inserts in one command, you may still squeeze out an inch or two by using a StringBuilder instead of a string.
I did a small test combining three things: MySqlDataAdapter, a transaction, and UpdateBatchSize. It is about 30 times faster than your first example. MySQL is running on a separate box, so latency is involved. The batch size may need some tuning. Code follows:
```csharp
string ConnectionString = "server=xxx;Uid=xxx;Pwd=xxx;Database=xxx";
string Command = "INSERT INTO User2 (FirstName, LastName ) VALUES (@FirstName, @LastName);";
using (var mConnection = new MySqlConnection(ConnectionString))
{
    mConnection.Open();
    MySqlTransaction transaction = mConnection.BeginTransaction();

    // Obtain a dataset; obviously a "select *" is not the best way...
    var mySqlDataAdapterSelect = new MySqlDataAdapter("select * from User2", mConnection);
    var ds = new DataSet();
    mySqlDataAdapterSelect.Fill(ds, "User2");

    var mySqlDataAdapter = new MySqlDataAdapter();
    mySqlDataAdapter.InsertCommand = new MySqlCommand(Command, mConnection);
    mySqlDataAdapter.InsertCommand.Parameters.Add("@FirstName", MySqlDbType.VarChar, 32, "FirstName");
    mySqlDataAdapter.InsertCommand.Parameters.Add("@LastName", MySqlDbType.VarChar, 32, "LastName");
    mySqlDataAdapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    for (int i = 0; i < 50000; i++)
    {
        DataRow row = ds.Tables["User2"].NewRow();
        row["FirstName"] = "1234";
        row["LastName"] = "1234";
        ds.Tables["User2"].Rows.Add(row);
    }

    mySqlDataAdapter.UpdateBatchSize = 100;
    mySqlDataAdapter.Update(ds, "User2");
    transaction.Commit();

    stopwatch.Stop();
    Debug.WriteLine(" inserts took " + stopwatch.ElapsedMilliseconds + "ms");
}
```
This way may not be faster than the StringBuilder approach, but it is parameterized:
```csharp
/// <summary>
/// Bulk insert some data; uses parameters
/// </summary>
/// <param name="table">The table name</param>
/// <param name="inserts">Holds the list of data to insert</param>
/// <param name="batchSize">Executes the insert after this many rows</param>
/// <param name="progress">Progress reporting</param>
public void BulkInsert(string table, MySQLBulkInsertData inserts, int batchSize = 100, IProgress<double> progress = null)
{
    if (inserts.Count <= 0) throw new ArgumentException("Nothing to Insert");

    string insertcmd = string.Format("INSERT INTO `{0}` ({1}) VALUES ", table,
                                     inserts.Fields.Select(p => p.FieldName).ToCSV());
    StringBuilder sb = new StringBuilder();

    using (MySqlConnection conn = new MySqlConnection(ConnectionString))
    using (MySqlCommand sqlExecCommand = conn.CreateCommand())
    {
        conn.Open();
        sb.AppendLine(insertcmd);
        for (int i = 0; i < inserts.Count; i++)
        {
            sb.AppendLine(ToParameterCSV(inserts.Fields, i));
            for (int j = 0; j < inserts[i].Count(); j++)
            {
                sqlExecCommand.Parameters.AddWithValue(
                    string.Format("{0}{1}", inserts.Fields[j].FieldName, i), inserts[i][j]);
            }
            // Execute if we are on the batch size or the last item
            if (i > 0 && (i % batchSize == 0 || i == inserts.Count - 1))
            {
                sb.Append(";");
                sqlExecCommand.CommandText = sb.ToString();
                sqlExecCommand.ExecuteNonQuery();
                // Reset the StringBuilder
                sb.Clear();
                sb.AppendLine(insertcmd);
                if (progress != null)
                {
                    progress.Report((double)i / inserts.Count);
                }
            }
            else
            {
                sb.Append(",");
            }
        }
    }
}
```
This uses the following helper classes:
```csharp
/// <summary>
/// Helper class to bulk insert data into a table
/// </summary>
public struct MySQLFieldDefinition
{
    public MySQLFieldDefinition(string field, MySqlDbType type) : this()
    {
        FieldName = field;
        ParameterType = type;
    }

    public string FieldName { get; private set; }
    public MySqlDbType ParameterType { get; private set; }
}

/// <summary>
/// You need to ensure the field names are in the same order as the object[] array
/// </summary>
public class MySQLBulkInsertData : List<object[]>
{
    public MySQLBulkInsertData(params MySQLFieldDefinition[] fieldnames)
    {
        Fields = fieldnames;
    }

    public MySQLFieldDefinition[] Fields { get; private set; }
}
```
And this helper method:
```csharp
/// <summary>
/// Return a CSV string of the parameter names for the given row
/// </summary>
/// <exception cref="ArgumentNullException"></exception>
private string ToParameterCSV(IEnumerable<MySQLFieldDefinition> p, int row)
{
    string csv = p.Aggregate(string.Empty, (current, i) =>
        string.IsNullOrEmpty(current)
            ? string.Format("@{0}{1}", i.FieldName, row)
            : string.Format("{0},@{2}{1}", current, row, i.FieldName));
    return string.Format("({0})", csv);
}
```
Maybe not super elegant, but it works well. I needed progress tracking, so that is included; feel free to remove that part.
This produces SQL commands similar to your desired output.
Edit: ToCSV:
```csharp
/// <summary>
/// Return a CSV string of the values in the list
/// </summary>
/// <param name="intValues"></param>
/// <param name="separator"></param>
/// <param name="encloser"></param>
/// <returns></returns>
/// <exception cref="ArgumentNullException"></exception>
public static string ToCSV<T>(this IEnumerable<T> intValues, string separator = ",", string encloser = "")
{
    string result = String.Empty;
    foreach (T value in intValues)
    {
        result = String.IsNullOrEmpty(result)
            ? string.Format("{1}{0}{1}", value, encloser)
            : String.Format("{0}{1}{3}{2}{3}", result, separator, value, encloser);
    }
    return result;
}
```
Execute the command in a Transaction and reuse the same command instance for each iteration. For further performance optimization, send 100 queries in one command. Going for parallel execution could give even better performance (Parallel.For), but make sure each parallel loop gets its own MySqlCommand instance.
```csharp
public static void CSVToMySQL()
{
    string ConnectionString = "server=192.168.1xxx";
    string Command = "INSERT INTO User (FirstName, LastName ) VALUES (@FirstName, @LastName);";
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    {
        mConnection.Open();
        using (MySqlTransaction trans = mConnection.BeginTransaction())
        {
            using (MySqlCommand myCmd = new MySqlCommand(Command, mConnection, trans))
            {
                myCmd.CommandType = CommandType.Text;
                for (int i = 0; i <= 99999; i++) // inserting 100k items
                {
                    myCmd.Parameters.Clear();
                    myCmd.Parameters.AddWithValue("@FirstName", "test");
                    myCmd.Parameters.AddWithValue("@LastName", "test");
                    myCmd.ExecuteNonQuery();
                }
                trans.Commit();
            }
        }
    }
}
```
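The Parallel.For variant mentioned above could be sketched like this. This is only an illustration, not tested code from the answer: the method name is made up, and it relies on the `Parallel.For<TLocal>` overload with a thread-local initializer so each partition gets its own connection and MySqlCommand:

```csharp
using System;
using System.Threading.Tasks;
using MySql.Data.MySqlClient;

public static void ParallelCSVToMySQL()
{
    string connectionString = "server=192.168.1xxx";
    string commandText = "INSERT INTO User (FirstName, LastName) VALUES (@FirstName, @LastName);";

    Parallel.For(0, 100000,
        // localInit: runs once per worker, so each thread owns its command
        () =>
        {
            var conn = new MySqlConnection(connectionString);
            conn.Open();
            var cmd = new MySqlCommand(commandText, conn);
            cmd.Parameters.Add("@FirstName", MySqlDbType.VarChar);
            cmd.Parameters.Add("@LastName", MySqlDbType.VarChar);
            return cmd;
        },
        // body: reuses the thread-local command for each row
        (i, state, cmd) =>
        {
            cmd.Parameters["@FirstName"].Value = "test";
            cmd.Parameters["@LastName"].Value = "test";
            cmd.ExecuteNonQuery();
            return cmd;
        },
        // localFinally: clean up each thread's connection
        cmd =>
        {
            cmd.Connection.Close();
            cmd.Dispose();
        });
}
```

Note that without a transaction this still pays autocommit overhead per row; the parallelism mainly helps hide network latency.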
In case Add instead of AddWithValue does not escape strings, you must do it in advance to avoid SQL injection and syntax errors.
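Escaping ahead of time with MySqlHelper.EscapeString (from MySql.Data) might look like this; the input value here is made up:

```csharp
string raw = "O'Brien"; // a value containing a quote
string safe = MySqlHelper.EscapeString(raw); // quote is backslash-escaped
string sql = string.Format("INSERT INTO User (LastName) VALUES ('{0}');", safe);
```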
Build the INSERT statements with only 1000 rows at a time. That should run 10 times faster than what you started with (1 row per INSERT). Doing all 100K at once is risky and possibly slower: risky because you might blow out some limit (packet size, etc.); slower because of the huge ROLLBACK log needed. COMMIT after each batch, or use autocommit=1.
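A minimal sketch of this batching idea (the class and method names are made up, and only the statement building is shown; each returned statement would be run with ExecuteNonQuery and committed on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchedInsertBuilder
{
    // Builds one multi-row INSERT statement per chunk of `batchSize` rows.
    // Values should already be escaped (e.g. via MySqlHelper.EscapeString).
    public static List<string> Build(IEnumerable<(string First, string Last)> rows, int batchSize = 1000)
    {
        var statements = new List<string>();
        foreach (var chunk in rows
                 .Select((row, index) => (row, index))
                 .GroupBy(x => x.index / batchSize))
        {
            var values = string.Join(",",
                chunk.Select(x => $"('{x.row.First}','{x.row.Last}')"));
            statements.Add("INSERT INTO User (FirstName, LastName) VALUES " + values + ";");
        }
        return statements;
    }
}
```

For 100k rows this yields 100 statements of 1000 rows each, keeping every statement well under typical max_allowed_packet limits.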
Here is my "multiple inserts" code.

Inserting 100k rows took only 3 seconds instead of 40!
```csharp
public static void BulkToMySQL()
{
    string ConnectionString = "server=192.168.1xxx";
    StringBuilder sCommand = new StringBuilder("INSERT INTO User (FirstName, LastName) VALUES ");
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    {
        List<string> Rows = new List<string>();
        for (int i = 0; i < 100000; i++)
        {
            Rows.Add(string.Format("('{0}','{1}')",
                MySqlHelper.EscapeString("test"),
                MySqlHelper.EscapeString("test")));
        }
        sCommand.Append(string.Join(",", Rows));
        sCommand.Append(";");
        mConnection.Open();
        using (MySqlCommand myCmd = new MySqlCommand(sCommand.ToString(), mConnection))
        {
            myCmd.CommandType = CommandType.Text;
            myCmd.ExecuteNonQuery();
        }
    }
}
```
The generated SQL statement looks like this:
```sql
INSERT INTO User (FirstName, LastName) VALUES ('test','test'),('test','test'),... ;
```
Update: Thanks to Salman, I added MySqlHelper.EscapeString to avoid the code injection that is otherwise handled internally when you use parameters.