Computer Vision in Swimming: How AI Is Changing Stroke Analysis

Cono Presti 12 min read

A coach on the pool deck used to rely on a stopwatch, a good eye, and decades of pattern recognition to diagnose what was wrong with a stroke. Now a phone clipped to a tripod can extract the elbow angle, catch timing, and stroke efficiency from every length of a practice, and flag the specific frame where the elbow dropped two degrees under fatigue. That is not a future-tense claim. Pose estimation models are detecting joint angles in swimming video at 96% accuracy and agreeing with international technique experts 94% of the time.

The Short Answer

Computer vision in swimming uses deep learning models called convolutional neural networks (CNNs) to detect 17+ body landmarks frame-by-frame in video, then feeds those landmarks to temporal models (LSTM networks) that classify strokes and extract metrics. Research systems achieve 96% accuracy on elbow-angle detection and 94% expert agreement on technique assessment. Real-world platforms such as Dartfish, iSWIM, and a growing set of app-based tools bring that analysis to pool decks and home training. But AI still can't interpret the why behind the numbers, so it remains a tool that amplifies good coaches rather than replacing them.

What Computer Vision in Swimming Actually Is

Computer vision is the field of AI that teaches machines to interpret visual information the way humans do, including identifying objects, tracking motion, and measuring distances. Applied to swimming, the specific technique that matters is pose estimation: a model that looks at a single video frame and marks the location of 17 or more body landmarks, typically the wrists, elbows, shoulders, hips, knees, and ankles. Run that across every frame of a stroke, and you have a skeleton overlay that moves with the swimmer.

The underlying technology is a convolutional neural network, or CNN. A CNN is a deep learning architecture that processes images through layers of filters. The early layers detect edges and textures, middle layers find body parts, and late layers identify the full pose. It's the same family of models that powers face detection on your phone. What makes swimming harder than most sports is the visual environment: refraction at the water surface, bubbles from the kick, varying underwater light, and body parts occluded by splash. Research models trained specifically on swimming footage, like the SwimmerNET architecture published in 2023, have reached 94% agreement with international technique experts on marker-less underwater pose estimation.

What this means practically: coaches can now see exactly what a swimmer is doing, not just how fast they went. A 50-meter swim used to generate one data point, the split. That same swim, run through pose estimation, generates thousands of frame-level measurements about joint angles, stroke rhythm, and body position. It is the difference between reading a scoreboard and reading an MRI.

Four-stage diagram showing how pose estimation works: raw video frame, body landmark detection on 17 key joints, skeleton mapping connecting those joints, and the metrics extracted from the resulting skeleton
Pose estimation turns a video frame into a skeleton of 17+ landmarks, then into measurable metrics. Every stroke becomes thousands of data points instead of a single split.
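
For a sense of what a 17-landmark skeleton looks like as data, here is a minimal Python sketch. It assumes the COCO keypoint ordering, a widely used 17-point convention; the sources cited here don't say which layout any particular swimming model uses, and the `Landmark` class and `as_named_pose` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# COCO keypoint order: one common 17-landmark convention.
# Assumption: a swimming-specific model may use a different layout.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


@dataclass
class Landmark:
    x: float           # pixel column in the frame
    y: float           # pixel row in the frame
    confidence: float  # model confidence, expected in [0, 1]


def as_named_pose(raw):
    """Turn 17 (x, y, confidence) triples into a name -> Landmark dict."""
    if len(raw) != len(COCO_KEYPOINTS):
        raise ValueError("expected exactly 17 landmarks")
    return {name: Landmark(*triple) for name, triple in zip(COCO_KEYPOINTS, raw)}
```

A full clip is then just a list of these dictionaries, one per frame, which is what every downstream metric reads from.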

How AI Analyzes Stroke Mechanics

The video-to-data pipeline runs in five stages. First, the camera captures footage. A phone, a GoPro, or a camera in a dedicated underwater housing will all work at this stage. Second, the system extracts individual frames, typically 30 or 60 per second. Third, a CNN runs pose estimation on each frame, tagging landmarks. Fourth, a temporal model, usually a Long Short-Term Memory (LSTM) network, stitches the per-frame poses together into a sequence and models the stroke cycle as motion over time, not as static snapshots. Fifth, the system extracts metrics and either displays them or flags deviations from a baseline.

Vertical flow diagram of the AI analysis pipeline: raw video input, frame extraction, pose detection by CNN, temporal analysis by LSTM, stroke classification, and coaching feedback output
The CNN-LSTM pipeline. The CNN identifies body landmarks in each frame; the LSTM understands how those landmarks move through the stroke cycle over time.
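
The shape of that pipeline can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any named platform works: `measure_frame` is a hypothetical callable standing in for the CNN pose step plus one measurement (say, elbow angle in degrees), and a simple trailing moving average stands in for the LSTM, since a real temporal model needs a trained network.

```python
def moving_average(values, window=3):
    """Stand-in for the temporal model: trailing average over recent frames."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def analyze_clip(frames, measure_frame, baseline, tolerance, window=3):
    """Run the per-frame step, smooth over time, and flag deviations.

    frames        -- any sequence of frame objects (stages 1-2 happen upstream)
    measure_frame -- hypothetical callable: frame -> one numeric measurement
    baseline      -- the value the measurement is expected to stay near
    tolerance     -- how far from baseline counts as a deviation
    """
    per_frame = [measure_frame(f) for f in frames]   # stage 3 (pose + metric)
    smoothed = moving_average(per_frame, window)     # stage 4 stand-in
    flagged = [i for i, v in enumerate(smoothed)     # stage 5: flag deviations
               if abs(v - baseline) > tolerance]
    return smoothed, flagged
```

Smoothing before flagging is a deliberate choice in this sketch: a single noisy frame does not trigger a flag, but a sustained drift does.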

The metrics that come out the other end are the ones coaches have always wanted to quantify but never could, at least not consistently. Elbow angle is the marquee metric. Research in the Young Scientist Journal out of Vanderbilt found that optimal elbow angle during the catch phase of freestyle sits between 80 and 100 degrees. Too open and you lose propulsion; too closed and you lose leverage. The same study demonstrated 96% accuracy in elbow-angle detection using standard video footage, with no specialized equipment. Hand entry depth, hip rotation, knee bend during recovery, and the timing of each phase of the stroke cycle all come out of the same pipeline.
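
Once the shoulder, elbow, and wrist landmarks are located, the elbow angle itself is plain geometry. A minimal sketch, assuming 2D pixel coordinates from a single camera, so the result is a projected angle rather than a true 3D one. `joint_angle` and `in_catch_range` are illustrative names, and the 80 to 100 degree defaults are the range cited above.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b, in degrees, formed by the segments b->a and b->c.

    For the elbow: a = shoulder, b = elbow, c = wrist, each an (x, y) pair.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("landmarks must not coincide")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding drift
    return math.degrees(math.acos(cosine))


def in_catch_range(angle, low=80.0, high=100.0):
    """True when the angle falls inside the cited catch-phase range."""
    return low <= angle <= high
```

Because the camera flattens depth, the same arm can read a few degrees differently from different viewpoints, which is one reason camera placement matters as much as the model.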

Real-time analysis is no longer a theoretical claim. Modern pipelines, running on consumer-grade hardware, can display metrics on a tablet within a second of the swimmer finishing a length. A coach on deck sees "elbow dropped to 68° on the third-to-last stroke of each 50" and has a specific, defensible cue to give. That is a different conversation than "you looked tired at the end."

Side-view illustration of a swimmer's arm showing an angle arc with the 80-100 degree optimal zone highlighted in green, and red zones marking angles that are too open or too closed, with efficiency ratings on a scale below
The optimal elbow angle during the catch phase sits in the 80 to 100° range. AI can detect a three-degree drift that the human eye misses entirely across a 30-minute practice.

An athlete working through an exercise with a trainer providing coaching cues during a dryland session
The coach's job is the same, but the inputs are different. A single video clip now extracts more biomechanical information than an hour of manual observation, and the on-deck conversation gets to start with the data instead of the guess.

Real-World AI Tools Coaches Are Using

Dartfish is the incumbent. Founded more than 20 years ago, it started as video-analysis software for alpine skiing and expanded across sports. Its modern swimming suite layers computer vision on top of that foundation: automatic object tracking, side-by-side comparison of two athletes or two attempts by the same athlete, drawing tools for live annotation, and integration with external sensor data. Its Dartfish 360 platform adds automatic pose detection and algorithmic flagging of technique deviations. Olympic programs and NCAA Division I swimming programs use it routinely. Pricing typically sits in the professional-software range, in the low thousands of dollars per year depending on the tier.

iSWIM is the newer, AI-native entry. It positions itself specifically around race analysis: upload a race video, get back automatic split detection, stroke-rate extraction, and comparison against reference footage. It's cheaper than Dartfish and aimed more at individual swimmers and club programs than national teams. Its underwater analysis is improving quickly but still works best with clear, well-lit pool footage.

Beyond the two headline platforms, a cluster of newer tools is emerging: mobile-first apps that run pose estimation directly on a phone's processor, research-adjacent systems like the SWIM-360 project that combine video with wearable IMU sensors, and open-source implementations on GitHub that developers and technical coaches can adapt. The SwimAnalytics platform bridges the consumer and professional markets with subscription pricing around $100 to $500 per year, which puts real AI analysis within reach of any serious age-group or college program.

Comparison matrix showing four AI swimming analysis platforms (Dartfish, iSWIM, SwimAnalytics, mobile apps) across real-time capability, underwater support, cost tier, and accuracy percentage
Four categories of AI swimming analysis tools. The headline numbers converge around 94 to 96% accuracy; the real differentiators are cost, underwater support, and how much setup each one demands.

Three horizontal gauge bars showing pose estimation accuracy metrics: 96% elbow angle detection, 94% expert agreement on technique assessment, and 92% stroke recognition, with source citations below each
Published accuracy benchmarks from peer-reviewed research. The 94% expert-agreement figure is particularly notable, because it means the AI's technique assessment agrees with international coaches almost as often as those coaches agree with each other.

The practical workflow is simpler than the technology sounds. A coach sets up a phone or tablet on a tripod at the end of the lane, records a 30- to 60-second clip, and loads it into the platform. Within a few seconds to a few minutes, depending on whether processing happens on-device or in the cloud, the system returns the metrics, a skeleton overlay on the video, and a list of flagged frames. The coach then reviews the frames with the swimmer, using the visual as a conversation starter rather than a verdict.

Side-by-side before and after comparison of a swimmer's arm position: on the left, a red-marked suboptimal elbow angle with lower speed measurement, on the right, a green-marked optimal 90-degree angle with a higher speed and 17% improvement highlighted
A documented case: flag the dropped elbow, correct to the 80 to 100° range, and stroke efficiency climbs. The mechanism is simple; the reliability of spotting it across hundreds of strokes is what AI adds.

The Specific Data AI Extracts from Your Stroke

Modern pose estimation pipelines produce three categories of data: mechanical, temporal, and efficiency. Each one maps to decisions a coach has always made intuitively, now backed by a number.

Mechanical metrics describe body geometry at any given frame. Elbow angle (degrees). Hand entry position and angle of attack. Hip rotation (degrees off horizontal). Knee bend during the recovery phase. Wrist position relative to the elbow at the catch. These are the metrics research calls "kinematic," the shape of the motion.
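
Hip rotation can be approximated the same way as any other landmark angle. The sketch below assumes a head-on camera (for example, at the end of the lane) and 2D pixel coordinates for the two hip landmarks; it reports an unsigned tilt, so it says how far the hip line is from horizontal but not which side is lower. `hip_tilt_degrees` is an illustrative name.

```python
import math


def hip_tilt_degrees(left_hip, right_hip):
    """Unsigned angle, in degrees, between the hip line and the horizontal.

    Each argument is an (x, y) pixel pair. The result is in [0, 90]:
    0 means the hips are level in the image, 90 means fully on edge.
    """
    dx = abs(right_hip[0] - left_hip[0])
    dy = abs(right_hip[1] - left_hip[1])
    if dx == 0 and dy == 0:
        raise ValueError("hip landmarks must not coincide")
    return math.degrees(math.atan2(dy, dx))
```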

Temporal metrics describe timing. Stroke rate (cycles per minute). Catch duration (how long the hand spends in the propulsive phase). Recovery duration. Intra-cyclic velocity fluctuation, the change in speed within a single stroke cycle. The ratio of left-arm to right-arm timing, which reveals asymmetry that swimmers almost never feel on their own.
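
Two of these timing metrics are simple enough to sketch directly. The code assumes the pipeline has already produced timestamps (in seconds) for the same event in consecutive stroke cycles, such as each right-hand entry, plus per-arm phase durations. The asymmetry formula here, the difference divided by the average of the two sides, is one reasonable definition and an assumption; the source does not say which definition any platform uses.

```python
def stroke_rate_per_minute(cycle_times):
    """Stroke cycles per minute from timestamps of one repeating event."""
    if len(cycle_times) < 2:
        raise ValueError("need at least two timestamps")
    elapsed = cycle_times[-1] - cycle_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    cycles = len(cycle_times) - 1
    return 60.0 * cycles / elapsed


def timing_asymmetry_percent(left_durations, right_durations):
    """Left-right difference in mean duration, as a percent of their average."""
    left = sum(left_durations) / len(left_durations)
    right = sum(right_durations) / len(right_durations)
    return 100.0 * abs(left - right) / ((left + right) / 2.0)
```

For example, four hand entries at 0, 1.5, 3.0, and 4.5 seconds are three cycles in 4.5 seconds, or 40 cycles per minute.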

Efficiency metrics are the derived ones. Distance per stroke. Propulsive-force indicators calculated from the product of hand speed and catch angle. Drag assessments inferred from body alignment. Energy-expenditure estimates from the integration of all the above. These are the hardest to measure accurately and the most useful for training decisions.
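
Distance per stroke is the easiest of these to make concrete, and it ties back to speed through a standard identity: speed equals stroke rate times distance per stroke. A minimal sketch, assuming the distance and stroke count cover the same stretch of free swimming (ideally excluding the push-off and glide); the function names are illustrative.

```python
def distance_per_stroke(distance_m, stroke_cycles):
    """Meters covered per stroke cycle over a measured stretch."""
    if stroke_cycles <= 0:
        raise ValueError("stroke_cycles must be positive")
    return distance_m / stroke_cycles


def swim_speed_m_per_s(stroke_rate_per_min, dps_m):
    """Speed from stroke rate (cycles/min) and distance per stroke (m)."""
    return (stroke_rate_per_min / 60.0) * dps_m
```

The identity is why the two numbers are always read together: a swimmer can hold speed by raising stroke rate while distance per stroke quietly falls, which is exactly the trade a coach wants to see.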

What the Data Sounds Like in Practice
  • "Your elbow drops 2° per 50 under fatigue." Identifies the exact moment conditioning gives out and technique follows.
  • "Left-right stroke timing asymmetry of 4%." Quantifies an imbalance the swimmer can't feel.
  • "Catch phase is 0.05 seconds shorter on the third length." Shows where power is leaking.
  • "Average elbow angle 92°, within optimal range." Confirms what's working, not just what's broken.
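
A statement like the first one is a trend line, and a trend line is a least-squares slope. The sketch below assumes the pipeline has already reduced each 50 to one number, such as the mean elbow angle at the catch, and fits a straight line through those numbers in order. `drift_per_interval` is an illustrative name, and a straight-line fit is a simplifying assumption; real fatigue curves need not be linear.

```python
def drift_per_interval(values):
    """Least-squares slope of values against their index (0, 1, 2, ...).

    With one value per 50, the result reads as "degrees per 50".
    Negative means the metric is falling across the set.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two intervals")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

Mean elbow angles of 92, 90, 88, and 86 degrees over four 50s give a slope of -2.0, which is the "drops 2° per 50" cue in numeric form.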

The point of any of these numbers isn't the number itself. It's the conversion into coaching cues. "Your hand is entering three inches too deep" is actionable in a way that "your entry looks off" never was. And "you've improved elbow-angle consistency by 4% over the last month" is motivating in a way that vague reassurance isn't.

What AI Cannot Do (Why Coaches Still Matter)

The accuracy numbers are impressive and the limitations are real. Pose estimation depends on video quality, and video quality in swimming is harder than in almost any other sport. Murky water, poor lighting, reflections at the surface, bubbles during the pull: all of them degrade the model's ability to find landmarks. Underwater camera angles are constrained. Most pool facilities don't have permanent camera installations, which means setup happens at every session.

Beyond the technical limits, there's the interpretive gap. AI can tell you that a swimmer's elbow drops 2° late in a set. It cannot tell you whether that drop is a strength problem, a fatigue problem, a technique misunderstanding, or a shoulder impingement. All four produce the same data signature and require radically different coaching responses. A coach who knows the athlete's training history, injury status, and psychology picks up on the distinction immediately. An AI sees a number.

Two swimmers in training gear standing beside an indoor pool between sets
The best results come from combining AI precision with human coaching intuition. Technology provides the data; the coach interprets it within the context of each swimmer's unique physiology, training history, and goals.

Individual variation is the other big gap. The research consensus on optimal elbow angle is 80 to 100°, but the range that works for a 6'4" sprinter isn't identical to the range that works for a 5'6" distance swimmer with a proportionally longer torso. Good coaches know their athlete's physiology and adjust the target. A cloud-based AI trained on aggregate data gives aggregate advice.

Venn diagram with two circles: the blue AI circle contains capabilities like detecting joint angles, tracking timing, and measuring distance; the orange Coach circle contains interpreting context, emotional coaching, customized cues, and motivation; the overlap in the middle shows shared capabilities in technique feedback and trend identification
The capabilities that overlap, including technique feedback, progress tracking, and flagged deviations, are where AI amplifies a coach. The parts that don't overlap are where human coaches remain irreplaceable.

Research on explainable AI (XAI) for coaching, including the SWIM-360 project published in 2024, explicitly frames the best outcome as a partnership. The AI provides objective, high-frequency measurement. The coach provides context, interpretation, motivation, and the judgment call about when to push a technique correction and when to let a swimmer swim through a rough practice. The pattern is the same one that played out with GPS in cycling and with ball-tracking in tennis: the technology didn't replace the coach, it gave the coach better information and changed the conversation.

What's Next: The Future of AI in Swimming

The near-term roadmap (meaning the next two to three years) is mostly about accessibility. Mobile-first apps that run pose estimation directly on a phone's processor, skipping the cloud entirely, are already shipping. Multi-angle analysis, where several phones or fixed cameras capture the same swim and the AI reconstructs 3D body position, is moving from research labs into commercial products. In-pool cameras that give real-time feedback mid-length (imagine a heads-up display on a pair of goggles) are being prototyped. Prices are dropping: the same capability that cost a mid-size program $15,000 five years ago is starting to sit in a $300 app subscription.

The medium-term picture, roughly 2028 to 2032, is where it gets more ambitious. Predictive models that say "your form will degrade over the final 200m of this race, here's how to adjust pacing" are being trained right now on elite race datasets. Personalized AI coaches that adapt their advice to an individual swimmer's baseline, rather than giving generic elite-range feedback, are the subject of active research. Injury prediction, flagging the biomechanical signatures that precede shoulder overuse, for example, is one of the more promising applications because it addresses a real and costly problem in the sport.

Horizontal roadmap timeline showing three phases: Today (2026) with pose estimation, video analysis, and post-race review; 2028-2030 with real-time feedback, wearable integration, and in-pool monitoring; 2032 and beyond with predictive AI, autonomous coaching, and fully personalized training programs
Where swimming AI is heading. The progression is accessibility first, then real-time integration, then predictive and personalized coaching. Each phase makes the previous one cheaper and more reliable.

The honest assessment, though, is that swimming tech is still an emerging field relative to other sports. A 2025 systematic review published in Discover Applied Sciences analyzed 42 peer-reviewed studies on AI in swimming from 2018 to 2024 and found that the overwhelming majority focused on stroke classification, teaching models to tell a butterfly pull from a freestyle pull. Turn detection and fatigue monitoring were a second cluster. Full recommendation systems, meaning AI that takes all the data and tells a specific swimmer what to do differently in training tomorrow, essentially don't exist yet in peer-reviewed form. That is not a pessimistic note. It is a statement about how much room there still is for the technology to mature.

Within five years, AI analysis is likely to be standard equipment at every competitive pool. Within ten, personalized AI coaching will probably be accessible to age-group swimmers at a price point comparable to a tempo trainer today. The question for swimmers isn't whether to engage with it but when. The ones who learn to read the numbers early will have the same advantage that early adopters of underwater filming had in the 1990s.

Key Takeaways

  • Computer vision in swimming means pose estimation: deep-learning models that identify 17+ body landmarks per frame of video and track them through the stroke cycle.
  • Accuracy is real and documented. 96% elbow-angle detection. 94% expert agreement on technique assessment. These are peer-reviewed numbers, not vendor claims.
  • The CNN-LSTM pipeline is the standard. CNNs find body parts in each frame. LSTMs understand motion across frames. The combination produces metrics coaches couldn't measure consistently before.
  • Real-world tools are here. Dartfish for elite programs. iSWIM for race analysis. SwimAnalytics and mobile apps for individual swimmers and club teams. Costs range from $100 to thousands per year.
  • AI measures, coaches interpret. The same 2° elbow drop can be fatigue, a technique issue, a strength issue, or early injury. Coaches distinguish between them. AI doesn't.
  • Swimming tech is still emerging. Stroke classification is mature. Full AI recommendation systems aren't. The next five years will close that gap.

In the companion post, we break down which specific metrics actually predict fast times, including the numbers AI extracts and what they mean for training decisions.

Cover photo by Kindel Media via Pexels. Training photo by Ardit Mbrati via Pexels. Poolside photo by João Godoy via Pexels. Custom diagrams by The Pool Deck.
