Merge patch series "Update lpfc to revision 14.4.0.5"
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.4.0.5
This patch set contains bug fixes related to HBA state clean ups, FCP
discovery on older adapters, kref imbalances, log message improvements,
and support for a new diagnostic loopback testing mode.
The patches were cut against Martin's 6.12/scsi-queue tree.
Justin Tee [Thu, 12 Sep 2024 23:24:46 +0000 (16:24 -0700)]
scsi: lpfc: Support loopback tests with VMID enabled
The VMID feature adds an extra application services header to each frame.
As such, the loopback test path is updated to accommodate the extra
application header.
Changes include filling in APPID and WQES bit fields for XMIT_SEQUENCE64
commands, a special loopback source APPID for verifying received loopback
data matches what is sent, and increasing ELS WQ size to accommodate the
APPID field in loopback test mode.
Justin Tee [Thu, 12 Sep 2024 23:24:45 +0000 (16:24 -0700)]
scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNING
Revise certain log messages marked as KERN_ERR LOG_TRACE_EVENT to
KERN_WARNING and use the lpfc_vlog_msg() macro to still log the event. The
benefit is that events of interest are still logged and the entire trace
buffer is not dumped with extraneous logging information when using default
lpfc_log_verbose driver parameter settings.
Also, delete the keyword "fail" from such log messages as they aren't
really causes for concern. The log messages are more for warnings to a SAN
admin about SAN activity.
Justin Tee [Thu, 12 Sep 2024 23:24:44 +0000 (16:24 -0700)]
scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance
Deleting an NPIV instance requires all fabric ndlps to be released before
an NPIV's resources can be torn down. Failure to release fabric ndlps
beforehand opens kref imbalance race conditions. Fix by forcing the DA_ID
to complete synchronously with usage of wait_queue.
Justin Tee [Thu, 12 Sep 2024 23:24:43 +0000 (16:24 -0700)]
scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo handler
With a FLOGI outstanding and loss of physical link connection to the fabric
for the duration of dev_loss_tmo, there is a fabric ndlp kref imbalance
that decrements the kref and sets the NLP_IN_RECOV_POST_DEV_LOSS flag at
the same time.
The issue is that when the FLOGI completion routine executes, the fabric
ndlp could already be freed because of the final kref put from the
dev_loss_tmo handler. Fix by early returning before the ndlp kref put if
the ndlp is deemed a candidate for NLP_IN_RECOV_POST_DEV_LOSS in the FLOGI
completion routine.
Justin Tee [Thu, 12 Sep 2024 23:24:42 +0000 (16:24 -0700)]
scsi: lpfc: Restrict support for 32 byte CDBs to specific HBAs
An older generation of HBAs are failing FCP discovery due to usage of an
outdated field in FCP command WQEs.
Fix by checking the SLI Interface Type register for applicable support of
32 Byte CDB commands, and restore a setting for a WQE path using normal 16
byte CDBs.
Fixes: af20bb73ac25 ("scsi: lpfc: Add support for 32 byte CDBs") Cc: stable@vger.kernel.org # v6.10+ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240912232447.45607-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Thu, 12 Sep 2024 23:24:41 +0000 (16:24 -0700)]
scsi: lpfc: Update phba link state conditional before sending CMF_SYNC_WQE
It's possible for the driver to send a CMF_SYNC_WQE to nonresponsive
firmware during reset of the adapter. The phba link_state conditional
check is currently a strict == LPFC_LINK_DOWN, which does not cover
initialization states before reaching the LPFC_LINK_UP state.
Update the phba->link_state conditional to < LPFC_LINK_UP so that all
initialization states are covered before allowing sending CMF_SYNC_WQE.
Update taking of the hbalock to be during this link_state check as well.
Justin Tee [Thu, 12 Sep 2024 23:24:40 +0000 (16:24 -0700)]
scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd()
During HBA stress testing, a spam of received PLOGIs exposes a resource
recovery bug causing leakage of lpfc_sqlq entries from the global
phba->sli4_hba.lpfc_els_sgl_list.
The issue is in lpfc_els_flush_cmd(), where the driver attempts to recover
outstanding ELS sgls when walking the txcmplq. Only CMD_ELS_REQUEST64_CRs
and CMD_GEN_REQUEST64_CRs are added to the abort and cancel lists. A check
for CMD_XMIT_ELS_RSP64_WQE is missing in order to recover LS_ACC usages of
the phba->sli4_hba.lpfc_els_sgl_list too.
Fix by adding CMD_XMIT_ELS_RSP64_WQE as part of the txcmplq walk when
adding WQEs to the abort and cancel list in lpfc_els_flush_cmd(). Also,
update naming convention from CRs to WQEs.
scsi: mpi3mr: Improve wait logic while controller transitions to READY state
During controller transitioning to READY state, if the controller is found
in transient states ("becoming ready" or "reset requested"), driver waits
for 510 secs even if the controller transitions out of these states
early. This causes an unnecessary delay of 510 secs in the overall firmware
initialization sequence.
Poll the controller state periodically (every 100 milliseconds) while
waiting for the controller to come out of the mentioned transient
states. Once the controller transits out of the transient states, come out
of the wait loop.
scsi: mpi3mr: Use firmware-provided timestamp update interval
Make driver use the timestamp update interval value provided by firmware in
the driver page 1. If firmware fails to provide non-zero value, then the
driver will fall back to the driver defined macro.
scsi: mpi3mr: Enhance the Enable Controller retry logic
When enabling the IOC request and polling for controller ready status, poll
for controller fault and reset history bit. If the controller is faulted
or the reset history bit is set, retry the initialization a maximum of
three times (2 retries) or if the cumulative time taken for all retries
exceeds 510 seconds.
Martin Wilck [Thu, 12 Sep 2024 13:43:08 +0000 (15:43 +0200)]
scsi: sd: Fix off-by-one error in sd_read_block_characteristics()
Ff the device returns page 0xb1 with length 8 (happens with qemu v2.x, for
example), sd_read_block_characteristics() may attempt an out-of-bounds
memory access when accessing the zoned field at offset 8.
Fixes: 7fb019c46eee ("scsi: sd: Switch to using scsi_device VPD pages") Cc: stable@vger.kernel.org Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20240912134308.282824-1-mwilck@suse.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Daniel Wagner [Thu, 12 Sep 2024 08:58:28 +0000 (10:58 +0200)]
scsi: pm8001: Do not overwrite PCI queue mapping
blk_mq_pci_map_queues() maps all queues but right after this, we overwrite
these mappings by calling blk_mq_map_queues(). Just use one helper but not
both.
Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20240912-do-not-overwrite-pci-mapping-v1-1-85724b6cec49@suse.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chen Ni [Thu, 5 Sep 2024 02:35:21 +0000 (10:35 +0800)]
scsi: pmcraid: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects. Although
that is not the case here, it is seems best to use ';' unless ',' is
intended.
Found by inspection. No functional change intended. Compile tested only.
During system resume, sd_start_stop_device() submits a START STOP UNIT
command to the SCSI device that is being resumed. That command is not
retried in case of a unit attention and hence may fail. An example:
Make the SCSI core retry the START STOP UNIT command if the device reports
that it has been powered on or that it has been reset.
Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240904210304.2947789-1-bvanassche@acm.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomas Henzl [Tue, 3 Sep 2024 14:47:29 +0000 (16:47 +0200)]
scsi: mpi3mr: A performance fix
Commit 0c52310f2600 ("hrtimer: Ignore slack time for RT tasks in
schedule_hrtimeout_range()") effectivelly shortens a sleep in a polling
function in the driver. That is causing a performance regression as the
new value of just 2us is too low, in certain tests the perf drop is ~30%.
Fix this by adjusting the sleep to 20us (close to the previous value).
Reported-by: Jan Jurca <jjurca@redhat.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240903144729.37218-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 8db8f6ce556a ("scsi: ufs: qcom: Add missing interconnect bandwidth
values for Gear 5") updated the ufs_qcom_bw_table for Gear 5. However, it
missed updating the cfg_bw value for the max mode.
Hence update the cfg_bw value for the max mode for UFS 4.x devices.
Fixes: 8db8f6ce556a ("scsi: ufs: qcom: Add missing interconnect bandwidth values for Gear 5") Cc: stable@vger.kernel.org Signed-off-by: Manish Pandey <quic_mapa@quicinc.com> Link: https://lore.kernel.org/r/20240903063709.4335-1-quic_mapa@quicinc.com Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Yan Zhen [Mon, 2 Sep 2024 01:33:03 +0000 (09:33 +0800)]
scsi: fusion: mptctl: Use min() macro
Using the real macro is usually more intuitive and readable when the
original file is guaranteed to contain the minmax.h header file and compile
correctly.
scsi: libcxgbi: Remove an unused field in struct cxgbi_device
Usage of .dev_ddp_cleanup() in libcxgbi was removed by commit 5999299f1ce9
("cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support") on 2016-07.
.csk_rx_pdu_ready() and debugfs_root have apparently never been used since
introduction by commit 9ba682f01e2f ("[SCSI] libcxgbi: common library for
cxgb3i and cxgb4i")
Remove the now unused function pointer from struct cxgbi_device.
Brian King [Tue, 3 Sep 2024 13:47:09 +0000 (08:47 -0500)]
scsi: ibmvfc: Add max_sectors module parameter
There are some scenarios that can occur, such as performing an upgrade of
the virtual I/O server, where the supported max transfer of the backing
device for an ibmvfc HBA can change. If the max transfer of the backing
device decreases, this can cause issues with previously discovered
LUNs. This patch accomplishes two things. First, it changes the default
ibmvfc max transfer value to 1MB. This is generally supported by all
backing devices, which should mitigate this issue out of the box. Secondly,
it adds a module parameter, enabling a user to increase the max transfer
value to values that are larger than 1MB, as long as they have configured
these larger values on the virtual I/O server as well.
Rafael Rocha [Thu, 5 Sep 2024 17:39:21 +0000 (12:39 -0500)]
scsi: st: Fix input/output error on empty drive reset
A previous change was introduced to prevent data loss during a power-on
reset when a tape is present inside the drive. This commit set the
"pos_unknown" flag to true to avoid operations that could compromise data
by performing actions from an untracked position. The relevant change is
commit 9604eea5bd3a ("scsi: st: Add third party poweron reset handling")
As a consequence of this change, a new issue has surfaced: the driver now
returns an "Input/output error" even for empty drives when the drive, host,
or bus is reset. This issue stems from the "flush_buffer" function, which
first checks whether the "pos_unknown" flag is set. If the flag is set, the
user will encounter an "Input/output error" until the tape position is
known again. This behavior differs from the previous implementation, where
empty drives were not affected at system start up time, allowing tape
software to send commands to the driver to retrieve the drive's status and
other information.
The current behavior prioritizes the "pos_unknown" flag over the
"ST_NO_TAPE" status, leading to issues for software that detects drives
during system startup. This software will receive an "Input/output error"
until a tape is loaded and its position is known.
To resolve this, the "ST_NO_TAPE" status should take priority when the
drive is empty, allowing communication with the drive following a power-on
reset. At the same time, the change should continue to protect data by
maintaining the "pos_unknown" flag when the drive contains a tape and its
position is unknown.
Signed-off-by: Rafael Rocha <rrochavi@fnal.gov> Link: https://lore.kernel.org/r/20240905173921.10944-1-rrochavi@fnal.gov Fixes: 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
These patches are based on Martin Petersen's 6.12/scsi-queue tree
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
6.12/scsi-queue
There are two functional changes:
smartpqi-add-fw-log-to-kdump
smartpqi-add-counter-for-parity-write-stream-requests
There are three minor bug fixes:
smartpqi-fix-stream-detection
smartpqi-fix-rare-system-hang-during-LUN-reset
smartpqi-fix-volume-size-updates
The other two patches add PCI-IDs for new controllers and change the
driver version.
This set of changes consists of:
* smartpqi-add-fw-log-to-kdump
During a kdump, the driver tells the controller to copy its logging
information to some pre-allocated buffers that can be analyzed
later.
This is a "feature" driven capability and is backward compatible
with existing controller FW.
This patch renames some prefixes for OFA (Online-Firmware Activation
ofa_*) buffers to host_memory_*. So, not a lot of actual functional
changes to smartpqi_init.c, mainly determining the memory size
allocation.
We added a function to notify the controller to copy debug data into
host memory before continuing kdump.
Most of the functional changes are in smartpqi_sis.c where the
actual handshaking is done.
* smartpqi-fix-stream-detection
Correct some false write-stream detections. The data structure used
to check for write-streams was not initialized to all 0's causing
some false write stream detections. The driver sends down streamed
requests to the raid engine instead of using AIO bypass for some
extra performance. (Potential full-stripe write verses Read Modify
Write).
False detections have not caused any data corruption. Found by
internal testing. No known externally reported bugs.
Adding some counters for raid_bypass and write streams. These two
counters are related because write stream detection is only checked
if an I/O request is eligible for bypass (AIO).
The bypass counter (raid_bypass_cnt) was moved into a common
structure (pqi_raid_io_stats) and changed to type __percpu. The
write stream counter is (write_stream_cnt) has been added to this
same structure.
These counters are __percpu counters for performance. We added a
sysfs entry to show the write stream count. The raid bypass counter
sysfs entry already exists.
Useful for checking streaming writes. The change in the sysfs entry
write_stream_cnt can be checked during AIO eligible write
operations.
* smartpqi-add-new-controller-PCI-IDs
Adding support for new controller HW. No functional changes.
* smartpqi-fix-rare-system-hang-during-LUN-reset
We found a rare race condition that can occur during a LUN reset. We
were not emptying our internal queue completely.
There have been some rare conditions where our internal request
queue has requests for multiple LUNs and a reset comes in for one of
the LUNs. The driver waits for this internal queue to empty. We were
only clearing out the requests for the LUN being reset so the
request queue was never empty causing a hang.
The Fix:
For all requests in our internal request queue:
Complete requests with DID_RESET for queued requests for the
device undergoing a reset.
Complete requests with DID_REQUEUE for all other queued requests.
Found by internal testing. No known externally reported bugs.
* smartpqi-fix-volume-size-updates
The current code only checks for a size change if there is also a
queue depth change. We are separating the check for queue depth and
the size changes.
Found by internal testing. No known bugs were filed.
* smartpqi-update-version-to-2.1.30-031
No functional changes.
Don Brace [Tue, 27 Aug 2024 18:55:01 +0000 (13:55 -0500)]
scsi: smartpqi: update driver version to 2.1.30-031
Update driver version to 2.1.30-031.
Reviewed-by: David Strahan <david.strahan@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Gerry Morong <gerry.morong@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-8-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Murthy Bhat [Tue, 27 Aug 2024 18:54:59 +0000 (13:54 -0500)]
scsi: smartpqi: fix rare system hang during LUN reset
Correct a rare case where in a LUN reset occurs on a device and I/O
requests for other devices persist in the driver's internal request queue.
Part of a LUN reset involves waiting for our internal request queue to
empty before proceeding. The internal request queue contains requests not
yet sent down to the controller.
We were clearing the requests queued for the LUN undergoing a reset, but
not all of the queued requests. Causing a hang.
For all requests in our internal request queue:
Complete requests with DID_RESET for queued requests for the device
undergoing a reset.
Complete requests with DID_REQUEUE for all other queued requests.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-6-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
David Strahan [Tue, 27 Aug 2024 18:54:58 +0000 (13:54 -0500)]
scsi: smartpqi: add new controller PCI IDs
All PCI ID entries in Hex.
Add new cisco pci ids:
VID / DID / SVID / SDID
---- ---- ---- ----
9005 028f 1137 02fe
9005 028f 1137 02ff
9005 028f 1137 0300
Add new h3c pci ids:
VID / DID / SVID / SDID
---- ---- ---- ----
9005 028f 193d 0462
9005 028f 193d 8462
Add new ieit pci ids:
VID / DID / SVID / SDID
---- ---- ---- ----
9005 028f 1ff9 00a3
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: David Strahan <David.Strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-5-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: smartpqi: add counter for parity write stream requests
Add sysfs entry to check for write stream requests.
Move existing raid_bypass_cnt into a structure named pqi_raid_io_stats and
add member write_stream_cnt. These two counters are related because write
stream detection is only checked if an I/O request is eligible for bypass
(AIO).
Example usage:
lsscsi
[15:1:0:0] disk Adaptec LOGICAL VOLUME 0129 /dev/sdae
cat /sys/block/sdae/device/ssd_smart_path_enabled
1
^
|
+---- NOTE: here bypass has been enabled on device sdae
To read the counter for parity write stream requests:
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Co-developed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-4-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct stream detection by initializing the structure
pqi_scsi_dev_raid_map_data to 0s.
When the OS issues SCSI READ commands, the driver erroneously considers
them as SCSI WRITES. If they are identified as sequential IOs, the driver
then submits those requests via the RAID path instead of the AIO path.
The 'is_write' flag might be set for SCSI READ commands also. The driver
may interpret SCSI READ commands as SCSI WRITE commands, resulting in IOs
being submitted through the RAID path.
Note: This does not cause data corruption.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-3-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Murthy Bhat [Tue, 27 Aug 2024 18:54:55 +0000 (13:54 -0500)]
scsi: smartpqi: Add fw log to kdump
Add controller logs to kdump.
Driver allocates DMA memory and communicates this address to FW. In the
event of system crash, host driver notifies the firmware about the crash
and firmware posts all the necessary logs in the pre-allocated host buffer
for firmware debugging.
Once firmware notifies the completion of the log uploading to the host
memory and host continues with the OS crash dump saving.
This is a "feature" driven capability and is backward compatible with
existing controller FW.
Rename some prefixes for OFA (Online-Firmware Activation ofa_*) buffers to
host_memory_*. So, not a lot of actual functional changes to
smartpqi_init.c, mainly determining the memory size allocation.
Added a function to notify the controller to copy debug data into host
memory before continuing kdump.
Most of the functional changes are in smartpqi_sis.c where the actual
handshaking is done.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240827185501.692804-2-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: qla2xxx: Remove the unused 'del_list_entry' field in struct fc_port
The 'del_list_entry' field in "struct fc_port" is unused.
The field was introduced in commit 2d70c103fd2a ("[SCSI] qla2xxx: Add LLD
target-mode infrastructure for >= 24xx series") in 2012-05 and the last
user was removed in commit 726b85487067 ("qla2xxx: Add framework for async
fabric discovery") in 2017-02.
Bao D. Nguyen [Tue, 27 Aug 2024 23:14:13 +0000 (16:14 -0700)]
scsi: ufs: core: Remove ufshcd_urgent_bkops()
ufshcd_urgent_bkops() is a wrapper function. It only calls
ufshcd_bkops_ctrl(). Remove it to simplify the ufs core driver. Replace any
references to ufshcd_urgent_bkops() with ufshcd_bkops_ctrl().
In addition, remove the second parameter in the ufshcd_bkops_ctrl() because
the information can be retrieved from the first parameter.
Merge patch series "Simplify multiple create*_workqueue() invocations"
Bart Van Assche <bvanassche@acm.org> says:
Hi Martin,
Multiple SCSI drivers use snprintf() to format a workqueue name before
invoking one of the create*_workqueue() macros. This patch series
simplifies such code by passing the format string and arguments to
alloc_workqueue(). Additionally, the structure members that are only
used as a temporary buffer for formatting workqueue names are
removed. Please consider this patch series for the next merge window.
Bart Van Assche [Thu, 22 Aug 2024 19:59:22 +0000 (12:59 -0700)]
scsi: core: Simplify an alloc_workqueue() invocation
Let alloc_workqueue() format the workqueue name. Remove the
work_q_name[] member from struct Scsi_Host because it is no longer
used by any SCSI driver nor by the SCSI core.
Bart Van Assche [Thu, 22 Aug 2024 19:59:21 +0000 (12:59 -0700)]
scsi: ufs: Simplify alloc*_workqueue() invocation
Let alloc*_workqueue() format the workqueue name instead of calling
snprintf() explicitly.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240822195944.654691-18-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Let alloc_workqueue() format the workqueue name instead of calling
snprintf() explicitly. Not setting shost->work_q_name is safe because
there is no code that reads the value set by the removed code.
Bart Van Assche [Thu, 22 Aug 2024 19:59:05 +0000 (12:59 -0700)]
scsi: Expand all create*_workqueue() invocations
The workqueue maintainer wants to remove the create*_workqueue() macros
because these macros always set the WQ_MEM_RECLAIM flag and because these
only support literal workqueue names. Hence this patch that replaces the
create*_workqueue() invocations with the definition of this macro. The
WQ_MEM_RECLAIM flag has been retained because I think that flag is necessary
for workqueues created by storage drivers. This patch has been generated by
running spatch and git clang-format. spatch has been invoked as follows:
Commit d703ce2f7f4d ("iscsi/iser-target: Convert to command priv_size
usage") remove iscsit_alloc_cmd() but left declaration.
And finally, a few other declarations were never implenmented since
introduction in commit e48354ce078c ("iscsi-target: Add iSCSI fabric
support for target v4.1").
Dan Carpenter [Thu, 15 Aug 2024 11:29:05 +0000 (14:29 +0300)]
scsi: elx: libefc: Fix potential use after free in efc_nport_vport_del()
The kref_put() function will call nport->release if the refcount drops to
zero. The nport->release release function is _efc_nport_free() which frees
"nport". But then we dereference "nport" on the next line which is a use
after free. Re-order these lines to avoid the use after free.
Fixes: fcd427303eb9 ("scsi: elx: libefc: SLI and FC PORT state machine interfaces") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/b666ab26-6581-4213-9a3d-32a9147f0399@stanley.mountain Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dan Carpenter [Thu, 15 Aug 2024 11:24:36 +0000 (14:24 +0300)]
scsi: ufs: ufshcd-pltfrm: Signedness bug in ufshcd_parse_clock_info()
The "sz" variable needs to be a signed type for the error handling to work
as intended. Fortunately, there is some sanity checking on "sz" on the
next line, so negative values would be caught and it doesn't really affect
runtime.
Fixes: eab0dce11dd9 ("scsi: ufs: ufshcd-pltfrm: Use of_property_count_u32_elems() to get property length") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/404a4727-89c6-410b-9ece-301fa399d4db@stanley.mountain Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Avri Altman [Sun, 11 Aug 2024 14:37:57 +0000 (17:37 +0300)]
scsi: ufs: Add HCI capabilities sysfs group
The standard register map of UFSHCI is comprised of several groups. The
first group (starting from offset 0x00), is the host capabilities group.
It contains some interesting information that otherwise is not available,
e.g. the UFS version of the platform etc.
Reviewed-by: Keoseong Park <keosung.park@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20240811143757.2538212-3-avri.altman@wdc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Avri Altman [Sun, 11 Aug 2024 14:37:56 +0000 (17:37 +0300)]
scsi: ufs: Prepare to add HCI capabilities sysfs
Prepare so we'll be able to read various other HCI registers. While at it,
fix the HCPID & HCMID register names to stand for what they really are.
Also replace the pm_runtime_{get/put}_sync() calls in auto_hibern8_show to
ufshcd_rpm_{get/put}_sync() as any host controller register reads should.
Reviewed-by: Keoseong Park <keosung.park@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20240811143757.2538212-2-avri.altman@wdc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge patch series "NCR5380: Bug fixes and other improvements"
Finn Thain <fthain@linux-m68k.org> says:
This series begins with some work on the mac_scsi driver to improve
compatibility with SCSI2SD v5 devices. Better error handling is needed
there because the PDMA hardware does not tolerate the write latency
spikes which SD cards can produce.
A bug is fixed in the 5380 core driver so that scatter/gather can be
enabled in mac_scsi.
Several patches at the end of this series improve robustness and
correctness in the core driver.
This series has been tested on a variety of mac_scsi hosts. A variety
of SCSI targets was also tested, including Quantum HDD, Fujitsu HDD,
Iomega FDD, Ricoh CD-RW, Matsushita CD-ROM, SCSI2SD and BlueSCSI.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: NCR5380: Remove redundant result calculation from NCR5380_transfer_pio()
NCR5380_transfer_pio() returns an ambiguous value which is ignored by
callers. Make it void and remove the redundant calculation. Adopt
kernel-doc format for the updated description.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: NCR5380: Handle BSY signal loss during information transfer phases
Improve robustness by checking for a lost BSY signal during the information
transfer loop. The status register is being polled anyway, so a BSY check
costs nothing. BSY signal loss could be caused by a target error or a
kicked plug etc. A bus reset is another possibility but that is already
handled and hostdata->connected would be NULL.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: NCR5380: Initialize buffer for MSG IN and STATUS transfers
Following an incomplete transfer in MSG IN phase, the driver would not
notice the problem and would make use of invalid data. Initialize 'tmp'
appropriately and bail out if no message was received. For STATUS phase,
preserve the existing status code unless a new value was transferred.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: mac_scsi: Enable scatter/gather by default
Now that FLAG_DMA_FIXUP has itself been fixed up, it can be used to enable
scatter/gather. Increase the default value for sg_tablesize to SG_ALL for
those systems which are compatible with FLAG_DMA_FIXUP.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: NCR5380: Check for phase match during PDMA fixup
It's not an error for a target to change the bus phase during a transfer.
Unfortunately, the FLAG_DMA_FIXUP workaround does not allow for that -- a
phase change produces a DRQ timeout error and the device borken flag will
be set.
Check the phase match bit during FLAG_DMA_FIXUP processing. Don't forget to
decrement the command residual. While we are here, change shost_printk()
into scmd_printk() for better consistency with other DMA error messages.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: mac_scsi: Disallow bus errors during PDMA send
SD cards can produce write latency spikes on the order of a hundred
milliseconds. If the target firmware does not hide that latency during DATA
IN and OUT phases it can cause the PDMA circuitry to raise a processor bus
fault which in turn leads to an unreliable byte count and a DMA overrun.
The Last Byte Sent flag is used to detect the overrun but this mechanism is
unreliable on some systems. Instead, set a DID_ERROR result whenever there
is a bus fault during a PDMA send, unless the cause was a phase mismatch.
Finn Thain [Wed, 7 Aug 2024 03:36:28 +0000 (13:36 +1000)]
scsi: mac_scsi: Refactor polling loop
Before the error handling can be revised, some preparation is needed.
Refactor the polling loop with a new function, macscsi_wait_for_drq().
This function will gain more call sites in the next patch.
After a bus fault, capture and log the chip registers immediately, if the
NDEBUG_PSEUDO_DMA macro is defined. Remove some printk(KERN_DEBUG ...)
messages that aren't needed any more. Don't skip the debug message when
bytes == 0. Show all of the byte counters in the debug messages.
Ranjan Kumar [Thu, 8 Aug 2024 12:54:17 +0000 (18:24 +0530)]
scsi: mpi3mr: Update consumer index of reply queues after every 100 replies
Instead of updating the ConsumerIndex of the Admin and Operational
ReplyQueues after processing all replies in the queue, the index will now
be periodically updated after processing every 100 replies.
Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240808125418.8832-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ranjan Kumar [Thu, 8 Aug 2024 12:54:16 +0000 (18:24 +0530)]
scsi: mpi3mr: Return complete ioc_status for ioctl commands
The driver masked the loginfo available bit in the iocstatus before passing
it to the applications, causing a mismatch in error messages between Linux
and other operating systems.
Modify driver to return unmasked (complete) iocstatus, including the
loginfo available bit, for the MPI commands sent through the ioctl
interface.
Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240808125418.8832-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: ufs: ufshcd-pltfrm: Use of_property_count_u32_elems() to get property length
Replace of_get_property() with the type specific
of_property_count_u32_elems() to get the property length.
This is part of a larger effort to remove callers of of_get_property() and
similar functions. of_get_property() leaks the DT property data pointer
which is a problem for dynamically allocated nodes which may be freed.
scsi: ufs: ufshcd-pltfrm: Use of_property_present()
Use of_property_present() to test for property presence rather than
of_find_property(). This is part of a larger effort to remove callers of
of_find_property() and similar functions. of_find_property() leaks the DT
struct property and data pointers which is a problem for dynamically
allocated nodes which may be freed.
John Garry [Mon, 5 Aug 2024 11:33:15 +0000 (11:33 +0000)]
scsi: block: Don't check REQ_ATOMIC for reads
We check in submit_bio_noacct() if flag REQ_ATOMIC is set for both read and
write operations, and then validate the atomic operation if set. Flag
REQ_ATOMIC can only be set for writes, so don't bother checking for reads.
Fixes: 9da3d1e912f3 ("block: Add core atomic write support") Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240805113315.1048591-3-john.g.garry@oracle.com Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The functional changes of note to smartpqi are for: multipath failover
and improving the accuracy of our RAID bypass counter.
For multipath we are:
Reverting commit 94a68c814328 ("scsi: smartpqi: Quickly propagate
path failures to SCSI midlayer") because under certain rare
conditions involving encryption-enabled devices, a false path
failure is reported to the SML causing multipath to failover to
the other path.
Improving errors returned from the driver back to the SML by
checking for error codes returned from the firmware and returning
the correct ASC/ASCQ codes to the SML.
The other two patches add PCI-IDs for new controllers and change the
driver version.
Don Brace [Thu, 11 Jul 2024 19:47:04 +0000 (14:47 -0500)]
scsi: smartpqi: Update driver version to 2.1.28-025
Update driver version to 2.1.28-025.
Reviewed-by: Mike Tran <mike.tran@microchip.com> Reviewed-by: Gerry Morong <gerry.morong@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-6-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Thu, 11 Jul 2024 19:47:03 +0000 (14:47 -0500)]
scsi: smartpqi: Improve handling of multipath failover
Improve multipath failovers by mapping firmware errors into I/O errors.
In some rare instances, firmware does not return the proper error code for
I/O errors caused by a multipath path failure.
Map I/O errors returned by firmware into errors that help the multipath
layer to detect the failure of a path.
Reviewed-by: Gerry Morong <gerry.morong@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-5-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct a rare multipath failure issue by reverting commit 94a68c814328
("scsi: smartpqi: Quickly propagate path failures to SCSI midlayer") [1].
Reason for revert: The patch propagated the path failure to SML quickly
when one of the path fails during IO and AIO path gets disabled for a
multipath device.
But it created a new issue: when creating a volume on an encryption-enabled
controller, the firmware reports the AIO path is disabled, which cause the
driver to report a path failure to SML for a multipath device.
There will be a new fix to handle "Illegal request" and "Invalid field in
parameter list" on RAID path when the AIO path is disabled on a multipath
device.
Fixes: 94a68c814328 ("scsi: smartpqi: Quickly propagate path failures to SCSI midlayer") Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Gilbert Wu <Gilbert.Wu@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-4-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Thu, 11 Jul 2024 19:47:01 +0000 (14:47 -0500)]
scsi: smartpqi: Improve accuracy/performance of raid-bypass-counter
The original implementation of this counter used an atomic variable.
However, this implementation negatively impacted performance in some
configurations.
Switch to using per_cpu variables.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Co-developed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-3-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add new h3c pci_id:
VID / DID / SVID / SDID
---- ---- ---- ----
UN RAID P4408-Mr-2 9005 / 028f / 193d / 1110
Add new powerleader pci ids:
VID / DID / SVID / SDID
---- ---- ---- ----
PL SmartROC PM8204 9005 / 028f / 1f3a / 0104
Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: David Strahan <David.Strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240711194704.982400-2-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge patch series "Update lpfc to revision 14.4.0.4"
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.4.0.4
This patch set contains diagnostic logging improvements, a minor clean
up when submitting abort requests, a bug fix related to reset and
errata paths, and modifications to FLOGI and PRLO ELS command
handling.
The patches were cut against Martin's 6.11/scsi-queue tree.